cURL: following links from websites?

2016-07-19, 13:06 #1

Member

Joined: Dec 2014

Posts: 71

The links change all the time but always have the same class, so I first fetch the page with curl, then look for the constant class (_56pjv), and then follow the link.

I'm already running into trouble at the step of detecting the link to follow. I get this error:

Fatal error: Uncaught Error: Call to a member function getElementsByTagName() on null

Code:


// Create a user agent so websites don't block you
$userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.1';

// Create the initial link you want.
$target_url = "topphemligurl";

// Initialize curl and following options
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);

// Grab the html from the page
$html = curl_exec($ch);

// Error handling
if(!$html){
     echo "Error";
     exit();
}

// Create a new DOM Document to handle scraping
$dom = new DOMDocument();
@$dom->loadHTML($html);

// get your element, you can do this numerous ways like getting by tag, id or using a DOMXPath object
// This example gets elements with id forward-link which might be a div or ul or li, etc
// It then gets all the a tags (links) within all those divs, uls, etc
// Then it takes the first link in the array of links and then grabs the href from the link
$search = $dom->getElementById('_56pjv');
$forwardlink = $search->getElementsByTagName('a');
$forwardlink = $forwardlink->item(0);
$forwardlink = $getNamedItem('href');
$href = $forwardlink->textContent;

// Now that you have the link you want to follow/click to
// Set the target_url for the cUrl to the new url
curl_setopt($ch, CURLOPT_URL, $target_url);

$html = curl_exec($ch);

curl_close ($ch);

This is what the HTML looks like for the page I fetched with curl:

HTML code:

<a class="_56pjv" href="http://blabla.com" rel="nofollow me" target="_blank">blabla.com</a>


2016-12-05, 06:14 #2

Member

Joined: Nov 2004

Posts: 134

Hi,

Code:


$search = $dom->getElementById('_56pjv');

$forwardlink = $search->getElementsByTagName('a');

It looks like $search ends up as null (getElementById() returns null when no element matches) and is therefore not an object.
The link you are looking for does not appear to have any id assigned; "_56pjv" is its class.
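
To illustrate the point, here is a minimal sketch that runs the same lookup against the sample markup from the first post, without any network access. The `$message` variable is just something I made up for the demonstration; checking for null before calling methods on the result is what avoids the fatal error.

```php
<?php
// The sample markup from the first post: the anchor has a class but no id.
$html = '<a class="_56pjv" href="http://blabla.com" rel="nofollow me" target="_blank">blabla.com</a>';

$dom = new DOMDocument();
@$dom->loadHTML($html); // @ silences warnings about non-standard markup

// getElementById() only looks at the id attribute, so passing a class name finds nothing.
$search = $dom->getElementById('_56pjv');

// Guard against null before calling any method on the result.
if ($search === null) {
    $message = 'No element with id "_56pjv" was found';
} else {
    $message = 'Found: ' . $search->getAttribute('href');
}

echo $message, "\n";
```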

Replace

HTML code:

<a class="_56pjv" href="http://blabla.com" rel="nofollow me" target="_blank">blabla.com</a>

With

HTML code:

<a class="_56pjv" href="http://blabla.com" rel="nofollow me" target="_blank" id="_56pjv">blabla.com</a>

Then it should work. Otherwise you will have to use a different CSS selector, and that settles it!
Hint: https://developer.mozilla.org/en-US/...ntsByClassName
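
If you cannot change the remote page's markup, one way to select by class in PHP is DOMXPath, which the comments in the original code already mention. The following is only a sketch under a few assumptions: `findLinkByClass` is a hypothetical helper name, the class name is assumed to contain no quote characters, and the HTML is the sample from the first post. Note also that `$getNamedItem('href')` in the original code is not a defined function; `getAttribute('href')` on the element does the job.

```php
<?php
/**
 * Return the href of the first <a> whose class list contains $class,
 * or null if no such link exists. (Hypothetical helper, not from the thread.)
 */
function findLinkByClass(string $html, string $class): ?string
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // @ silences warnings about non-standard markup

    $xpath = new DOMXPath($dom);
    // Match $class as a whole word inside the space-separated class attribute.
    // Assumes $class contains no quote characters.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $href = $nodes->item(0)->getAttribute('href');
    return $href !== '' ? $href : null;
}

// Sample markup from the first post.
$html = '<a class="_56pjv" href="http://blabla.com" rel="nofollow me" target="_blank">blabla.com</a>';
$href = findLinkByClass($html, '_56pjv');

echo $href ?? 'no link found', "\n";

// To follow the link with the existing curl handle, pass $href rather than
// $target_url (the original code sets the old URL again):
// curl_setopt($ch, CURLOPT_URL, $href);
// $html = curl_exec($ch);
```

With the sample markup, `$href` holds the link's href value; when no link carries the class, the helper returns null, so the caller can bail out instead of hitting the "member function on null" error.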
