Extracting web pages with PHP

Published: 2022-12-10 21:06:26

1. How to fetch a web page's source code in PHP

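Use the file_get_contents function to fetch the source code. Pass it the page's URL and it returns the page as a string, which you can then format or parse as needed.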
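Here is a minimal sketch of the same idea with basic error handling. It assumes PHP 7.1+. `fetch_source()` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice. file_get_contents returns false on failure, so the sketch checks for that instead of passing the result along blindly.

```php
<?php
// fetch_source() is a hypothetical helper, not part of PHP.
function fetch_source(string $url, int $timeoutSeconds = 10): string
{
    // 'http' options apply to http:// and https:// URLs; local paths ignore them.
    $context = stream_context_create(['http' => ['timeout' => $timeoutSeconds]]);

    // @ hides the warning; the return value is checked explicitly instead.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        $error = error_get_last();
        throw new RuntimeException('Could not fetch ' . $url . ': ' . ($error['message'] ?? 'unknown error'));
    }
    return $html;
}

// Example (assumed URL): echo fetch_source('http://www.example.com/');
```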

2. Several ways to fetch web page content in PHP

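Here is a quick roundup of the ways to fetch a web page's content in PHP:
Use file_get_contents to fetch the content with GET.
Use fopen to open the URL and fetch the content with GET.
Use the curl library. Before using it, you may need to check php.ini to confirm the curl extension is enabled.
Use file_get_contents to fetch a URL with POST.
Use fopen to open the URL and fetch the content with POST.
Use fsockopen to open the URL and fetch the complete response, including both header and body.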
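The fsockopen approach in the last item speaks HTTP directly over a socket, so you get the raw header and body together. Below is a minimal sketch. It assumes plain HTTP on port 80, and `fetch_raw()` and `split_http_response()` are hypothetical helper names. Sending an HTTP/1.0 request with `Connection: close` avoids chunked transfer encoding, so the response can simply be read until EOF.

```php
<?php
// Hypothetical helper: split a raw HTTP response at the first blank line.
function split_http_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return ['header' => $parts[0], 'body' => $parts[1] ?? ''];
}

// Hypothetical helper: send a GET request over a raw socket.
function fetch_raw(string $host, string $path = '/', int $port = 80, int $timeout = 5): array
{
    $fp = fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException("fsockopen failed: $errstr ($errno)");
    }
    // HTTP/1.0 + Connection: close, so the server closes the socket when done.
    fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $raw = '';
    while (!feof($fp)) {
        $raw .= fread($fp, 8192);
    }
    fclose($fp);
    return split_http_response($raw);
}

// Example (assumed host): $r = fetch_raw('www.example.com'); echo $r['header'];
```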

3. Ways to get a web page's source code in PHP

You can use the following approaches:

Method 1: file_get_contents

<?php
$url = 'http://www.hxfzzx.com/news/fzfj/';
$fh = file_get_contents($url);
echo $fh;

Further reading

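PHP (PHP: Hypertext Preprocessor) is a general-purpose, open-source scripting language. Its syntax borrows from C, Java, and Perl, which makes it easy to learn. It is widely used, mainly for web development. PHP's distinctive syntax mixes elements of C, Java, and Perl with syntax of its own. It can execute dynamic web pages faster than CGI or Perl.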

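Compared with dynamic pages built in other programming languages, PHP embeds the program in an HTML document (HTML being an application of SGML, the Standard Generalized Markup Language) and executes it there. This is much more efficient than CGI, which has to generate all of the HTML markup itself. PHP can also execute compiled code. Compilation can encrypt and optimize the code so that it runs faster.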

4. How to get the content of a web page in PHP

1. file_get_contents
PHP code:


<?php
$url = "http://www.jb51.net";
$contents = file_get_contents($url);
// If Chinese text comes out garbled, convert it from GB2312 to UTF-8 with the line below
//$contents = iconv("gb2312", "utf-8", $contents);
echo $contents;
?>
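The commented-out iconv line hard-codes GB2312. The sketch below reads the charset from the page's own `<meta>` tag instead. It assumes that declaration is accurate. `to_utf8()` is a hypothetical helper, and its regex only covers the common `charset=` forms. Pages that declare their charset only in the HTTP Content-Type header would need that header checked as well.

```php
<?php
// to_utf8() is a hypothetical helper; it trusts the page's own charset declaration.
function to_utf8(string $html): string
{
    // Matches both <meta charset="gb2312"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
    if (!preg_match('/<meta[^>]+charset=["\']?([\w-]+)/i', $html, $m)) {
        return $html; // no declaration found: leave the bytes untouched
    }
    $charset = strtoupper($m[1]);
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $html;
    }
    // iconv returns false for unknown charsets or invalid input.
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}

// Example: echo to_utf8(file_get_contents("http://www.jb51.net"));
```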

2. curl
PHP code:


<?php
$url = "http://www.jb51.net";
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require user authentication, uncomment the following two lines
//curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
//curl_setopt($ch, CURLOPT_USERPWD, US_NAME.":".US_PWD);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>
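Here is a slightly more defensive version of the curl code. It assumes the curl extension is loaded. `curl_fetch()` is a hypothetical helper, and the timeouts and redirect policy are illustrative choices. It checks curl_exec()'s false return and the HTTP status code rather than echoing whatever comes back.

```php
<?php
// curl_fetch() is a hypothetical helper, not part of the curl extension.
function curl_fetch(string $url, int $timeout = 5): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds allowed for connecting
        CURLOPT_TIMEOUT => $timeout * 2,     // seconds allowed for the whole transfer
        CURLOPT_FOLLOWLOCATION => true,      // follow 301/302 redirects
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('curl error: ' . curl_error($ch));
    }
    // 0 for non-HTTP protocols; for http(s) treat 4xx/5xx as failures.
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP status $status for $url");
    }
    return $body; // the handle is released when $ch goes out of scope
}

// Example: echo curl_fetch("http://www.jb51.net");
```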

3. fopen -> fread -> fclose
PHP code:


<?php
$handle = fopen ("http://www.jb51.net", "rb");
$contents = "";
do {
$data = fread($handle, 1024);
if (strlen($data) == 0) {
break;
}
$contents .= $data;
} while(true);
fclose ($handle);
echo $contents;
?>
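The manual fread loop can be shortened with stream_get_contents(), which reads until EOF. The sketch below assumes allow_url_fopen is enabled for remote URLs. `read_all()` is a hypothetical helper. The try/finally ensures the handle is closed even if reading fails.

```php
<?php
// read_all() is a hypothetical helper, not part of PHP.
function read_all(string $url): string
{
    $handle = @fopen($url, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $url");
    }
    try {
        // Reads everything up to EOF, replacing the manual fread() loop.
        $contents = stream_get_contents($handle);
    } finally {
        fclose($handle);
    }
    if ($contents === false) {
        throw new RuntimeException("Cannot read $url");
    }
    return $contents;
}

// Example: echo read_all("http://www.jb51.net");
```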

Notes:
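1. file_get_contents and fopen require allow_url_fopen to be enabled on the host. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
2. curl requires the curl extension to be enabled on the host. On Windows, edit php.ini, remove the semicolon in front of extension=php_curl.dll, and copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.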
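Both requirements can be checked at runtime before choosing a method. In the sketch below, `available_fetchers()` is a hypothetical helper name. Because ini_get() returns the setting as a string, the sketch normalizes it with filter_var().

```php
<?php
// available_fetchers() is a hypothetical helper, not part of PHP.
function available_fetchers(): array
{
    // ini_get() returns "1", "On", "" and so on; convert that to a real boolean.
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    return [
        'file_get_contents/fopen' => $urlFopen,
        'curl' => extension_loaded('curl'),
        'fsockopen' => function_exists('fsockopen'),
    ];
}

print_r(available_fetchers());
```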

5. Extracting web page information with PHP regular expressions

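Use int preg_match(string pattern, string subject [, array matches]) for this. I haven't written out the full implementation because I have no environment to test it in, so here is the approach instead.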
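Here is a small illustration of how preg_match() reports its result. It uses a fragment of the sample listing row that appears in this section. preg_match() returns 1 when it finds a match and 0 when it does not. The whole match goes into $matches[0], and each parenthesized group follows it. A capture group around the digits therefore yields the bare number directly.

```php
<?php
// Fragment taken from the sample row discussed in this section.
$subject = '<a href="detailnew.jsp?id=803088">';

// (\d+) is a capture group, so the digits land in $matches[1].
if (preg_match('/id=(\d+)/', $subject, $matches) === 1) {
    echo $matches[0], "\n"; // id=803088
    echo $matches[1], "\n"; // 803088
}
```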

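Read the scraped result into $str, then split it with preg_match_all("/<[^>]+.+>/", $str, $split_word) to get the array $split_word. With the default PREG_PATTERN_ORDER, the full matches are stored in $split_word[0], so the result should look something like this:
$split_word[0][0] = '<li><table><tr>'
$split_word[0][1] = ' <td width="574"><a href="detailnew.jsp?id=803088">駐村幹部</a></td>'
……
Then search the array elements one by one. First, loop over them looking for condition 1, the id number:
preg_match("/id=\d+/i", $split_word[0][$n], $id_value)
The match is stored in the array $id_value, for example $id_value[0] = "id=xxxxx". To get just the digits, you have to extract them from this result a second time, or put them in a capture group (/id=(\d+)/i) and read $id_value[1].
Extract the remaining fields the same way. The corresponding regular expressions are as follows.
For the link title, first extract the link with /<a[^>]+>[^<]+<\/a>/i. Use [^<]+ for the link text rather than \w+, because without the /u modifier \w typically matches only ASCII letters, digits, and underscores and would miss Chinese titles. Then extract the title from that result with a Chinese-character pattern, and mind the encoding. In GB2312, each Chinese character is two bytes in the range \x80-\xff, so four or more characters is /[\x80-\xff]{8,}/. In UTF-8, use /[\x{4e00}-\x{9fa5}]{4,}/u.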

……

That is the approach. It is tedious, because no single regular expression can match all of the information in one pass.
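The sketch below puts the steps together and pulls the id and title out of rows shaped like the sample row above. It assumes the page has already been converted to UTF-8 and that ids appear as `id=NNN` in the link's href. `extract_links()` is a hypothetical helper. Capture groups return the id and title in one pass. As the four-character pattern above intends, titles shorter than four Chinese characters are dropped.

```php
<?php
// extract_links() is a hypothetical helper; it expects UTF-8 input.
function extract_links(string $html): array
{
    $pattern = '/<a[^>]+href="[^"]*\bid=(\d+)[^"]*"[^>]*>([^<]+)<\/a>/iu';
    // Returns 0 (no match) or false (e.g. invalid UTF-8): nothing to extract.
    if (!preg_match_all($pattern, $html, $m, PREG_SET_ORDER)) {
        return [];
    }
    $rows = [];
    foreach ($m as $match) {
        $title = trim($match[2]);
        // Keep only titles of four or more CJK characters (U+4E00 to U+9FA5).
        if (preg_match('/^[\x{4e00}-\x{9fa5}]{4,}$/u', $title)) {
            $rows[] = ['id' => $match[1], 'title' => $title];
        }
    }
    return $rows;
}

$row = ' <td width="574"><a href="detailnew.jsp?id=803088">駐村幹部</a></td>';
print_r(extract_links($row));
```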

6. Fetching the content of a specified web page in PHP

I. Use file_get_contents to fetch a URL with POST

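<?php
$url = 'http://www.domain.com/test.php?id=123';
$data = array('foo' => 'bar');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method' => 'POST',
        // Each header line must end with \r\n
        'header' => "Content-type: application/x-www-form-urlencoded\r\n" .
                    "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$ctx = stream_context_create($opts);
// The second argument is use_include_path; @ hides the warning, so check for false
$html = @file_get_contents($url, false, $ctx);
if ($html === false) {
    echo "request failed";
} else {
    echo $html;
}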
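The POST pattern above is easier to reuse as a function. The sketch below makes the same assumptions (a form-encoded body and allow_url_fopen enabled). `post_context_options()` and `http_post()` are hypothetical helper names, and the timeout value is arbitrary. Keeping the options array separate also lets you inspect the request without sending it.

```php
<?php
// Hypothetical helper: build stream-context options for a form POST.
function post_context_options(array $fields, int $timeout = 10): array
{
    $body = http_build_query($fields);
    return [
        'http' => [
            'method' => 'POST',
            // Each header line ends with CRLF, as HTTP requires.
            'header' => "Content-Type: application/x-www-form-urlencoded\r\n"
                      . 'Content-Length: ' . strlen($body) . "\r\n",
            'content' => $body,
            'timeout' => $timeout,
        ],
    ];
}

// Hypothetical helper: send the POST and return the response body.
function http_post(string $url, array $fields): string
{
    $ctx = stream_context_create(post_context_options($fields));
    $html = @file_get_contents($url, false, $ctx);
    if ($html === false) {
        throw new RuntimeException("POST to $url failed");
    }
    return $html;
}

// Example: echo http_post('http://www.domain.com/test.php?id=123', ['foo' => 'bar']);
```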

II. Use file_get_contents to fetch content with GET

<?php
$url = 'http://www.domain.com/?para=123';
$html = file_get_contents($url);
echo $html;
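When the GET parameters come from variables, http_build_query() handles the URL-encoding. The sketch below uses a hypothetical `build_get_url()` helper. It joins with & when the base URL already has a query string.

```php
<?php
// build_get_url() is a hypothetical helper, not part of PHP.
function build_get_url(string $base, array $params): string
{
    if ($params === []) {
        return $base;
    }
    // Use & if the URL already carries a query string, otherwise start one with ?.
    $sep = strpos($base, '?') === false ? '?' : '&';
    return $base . $sep . http_build_query($params);
}

echo build_get_url('http://www.domain.com/', ['para' => 123]), "\n";
```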

III. Use fopen to open a URL and fetch content with GET

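<?php
$url = 'http://www.domain.com/?para=123';
$fp = fopen($url, 'r');
// Stream metadata; for http:// URLs, 'wrapper_data' holds the response header lines
$header = stream_get_meta_data($fp);
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 1024);
}
echo "url header: " . implode("\n", $header['wrapper_data']) . "\n";
echo "url body: $result";
fclose($fp);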
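The 'wrapper_data' lines are raw strings, and when redirects are followed they include every intermediate response. The sketch below turns them into a status code plus a name => value array for the final response. It assumes the http:// wrapper, and `parse_response_headers()` is a hypothetical helper. In this simplified version, repeated headers such as Set-Cookie keep only their last value.

```php
<?php
// parse_response_headers() is a hypothetical helper, not part of PHP.
function parse_response_headers(array $lines): array
{
    $status = 0;
    $headers = [];
    foreach ($lines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A new status line (e.g. after a redirect) starts a fresh header set.
            $status = (int) $m[1];
            $headers = [];
        } elseif (strpos($line, ':') !== false) {
            [$name, $value] = explode(':', $line, 2);
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return ['status' => $status, 'headers' => $headers];
}

// Example: print_r(parse_response_headers($header['wrapper_data']));
```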

IV. Use fopen to open a URL and fetch content with POST

<?php
$data = array('foo2' => 'bar2', 'foo3' => 'bar3');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method' => 'POST',
        // Each header line must end with \r\n
        'header' => "Content-type: application/x-www-form-urlencoded\r\n" .
                    "Cookie: cook1=c3; cook2=c4\r\n" .
                    "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$context = stream_context_create($opts);
$html = fopen('http://www.test.com/zzzz.php?id=i3&id2=i4', 'rb', false, $context);
// Read the whole response body (a single fread call returns at most 1024 bytes)
$w = stream_get_contents($html);
fclose($html);
echo $w;
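Writing the Cookie line by hand gets error-prone once there are several cookies. The sketch below builds the line from an array. It assumes URL-encoding the values is acceptable to the target server, and `build_cookie_header()` is a hypothetical helper. PHP_QUERY_RFC3986 encodes spaces as %20 rather than +.

```php
<?php
// build_cookie_header() is a hypothetical helper, not part of PHP.
function build_cookie_header(array $cookies): string
{
    // http_build_query() URL-encodes names and values; "; " separates cookie pairs.
    return 'Cookie: ' . http_build_query($cookies, '', '; ', PHP_QUERY_RFC3986) . "\r\n";
}

echo build_cookie_header(['cook1' => 'c3', 'cook2' => 'c4']);
```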

V. Use the curl library. Before using it, you may need to check php.ini to confirm the curl extension is enabled

<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, 'http://www.domain.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
