Extracting Web Pages with PHP

Published: 2022-12-10 21:06:26

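1. How to fetch a web page's source with PHP

You can use the file_get_contents function to fetch the source. Pass the page's URL to the function and it returns the page as a single string, which you can then parse or format as needed.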
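A minimal sketch of that idea, with the failure case handled explicitly. The helper name fetch_source and the example URL are hypothetical; file_get_contents returns false on failure, so the result is checked before it is used.

```php
<?php
/**
 * Fetch a URL (or a local path) and return its contents as a string,
 * or null if the read fails. fetch_source is a hypothetical helper name.
 */
function fetch_source(string $url): ?string
{
    // @ suppresses the warning PHP emits on failure; the result is checked instead.
    $source = @file_get_contents($url);
    return $source === false ? null : $source;
}

// Example usage (placeholder URL; remote URLs need allow_url_fopen = On):
// $html = fetch_source('http://www.example.com/');
// if ($html !== null) {
//     echo htmlspecialchars($html); // escape so the markup is shown as text
// }
```

Returning null rather than false keeps the return type a plain nullable string, which is easier to check at the call site.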

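2. Several ways to get web page content in PHP

A quick roundup of the ways to get web page content in PHP:
Use file_get_contents to fetch the content with a GET request.
Use fopen to open the URL and fetch the content with a GET request.
Use the curl library. Before using it, check php.ini to confirm that the curl extension is enabled.
Use file_get_contents to fetch the URL with a POST request.
Use fopen to open the URL and fetch the content with a POST request.
Use fsockopen to connect to the host and read the complete response, including the headers and the body.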
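Of the methods listed, fsockopen is the only one that hands back the raw response. A rough sketch of how that could look, assuming a plain HTTP server on port 80; the function names fsockopen_get and split_http_response are hypothetical, and the host and path are supplied by the caller.

```php
<?php
/**
 * Split a raw HTTP response into its header block and body.
 * Returns [headers, body]; body is '' when no blank-line separator is found.
 */
function split_http_response(string $raw): array
{
    $parts = explode("\r\n\r\n", $raw, 2);
    return [$parts[0], $parts[1] ?? ''];
}

/**
 * Send a plain GET request over a socket and return [headers, body],
 * or null if the connection fails.
 */
function fsockopen_get(string $host, string $path = '/', int $timeout = 5): ?array
{
    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }
    // An HTTP/1.0 request should not be answered with chunked encoding,
    // which keeps the body readable without further decoding.
    $request = "GET {$path} HTTP/1.0\r\nHost: {$host}\r\nConnection: close\r\n\r\n";
    fwrite($fp, $request);
    $raw = stream_get_contents($fp);
    fclose($fp);
    return split_http_response($raw === false ? '' : $raw);
}

// Example usage (placeholder host):
// $response = fsockopen_get('www.example.com', '/');
// if ($response !== null) {
//     [$headers, $body] = $response;
// }
```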

3. What are the ways to get a web page's source with PHP

You can refer to the following methods:

Method 1: using file_get_contents

$url = "http://www..com/"; // not used below

$fh = file_get_contents('http://www.hxfzzx.com/news/fzfj/');
echo $fh;

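Further reading

PHP (full name "PHP: Hypertext Preprocessor") is a general-purpose, open-source scripting language. Its syntax borrows from C, Java, and Perl alongside constructs of its own, which makes it easy to learn. It is widely used, mainly for web development, and it can serve dynamic pages faster than CGI or Perl.

Unlike many other languages used for dynamic pages, PHP code is embedded in the HTML document (HTML being an application of SGML) and executed there, which is considerably more efficient than a CGI program that generates all of the HTML markup itself. PHP can also run precompiled code; compilation can be used both to obfuscate the code and to optimize it so that it runs faster.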

4. How to get the content of a web page with PHP

1. file_get_contents
PHP code:

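<?php
$url = "http://www.jb51.net";
$contents = file_get_contents($url);
// If Chinese text comes out garbled, convert the encoding with the line below
// (this assumes the page is served as GB2312):
//$contents = iconv("gb2312", "utf-8", $contents);
echo $contents;
?>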
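The commented-out iconv line can be turned into a small helper that converts only when the content is not already valid UTF-8. This is a sketch under assumptions: to_utf8 is a hypothetical name, and the source charset is supplied by the caller (for example from the page's Content-Type header) rather than detected.

```php
<?php
/**
 * Convert page content to UTF-8 when it is not already valid UTF-8.
 * $sourceCharset is an assumption supplied by the caller; it is not detected here.
 */
function to_utf8(string $contents, string $sourceCharset = 'GB2312'): string
{
    // preg_match with the /u modifier fails on invalid UTF-8, which makes it
    // a cheap validity check that needs no extra extension.
    if (preg_match('//u', $contents) === 1) {
        return $contents; // already valid UTF-8, leave it untouched
    }
    $converted = @iconv($sourceCharset, 'UTF-8', $contents);
    // Fall back to the original bytes if the conversion fails.
    return $converted === false ? $contents : $converted;
}

// Example usage:
// echo to_utf8(file_get_contents('http://www.jb51.net'));
```

The validity check is a heuristic: a byte sequence in another encoding can occasionally also be valid UTF-8, so a charset taken from the response headers is more reliable when it is available.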

2. curl
PHP code:

<?php
$url = "http://www.jb51.net";
$ch = curl_init();
$timeout = 5; // connection timeout in seconds
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// For pages that require HTTP authentication, add the two lines below
// (US_NAME and US_PWD are constants you must define yourself):
//curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
//curl_setopt($ch, CURLOPT_USERPWD, US_NAME.":".US_PWD);
$contents = curl_exec($ch);
curl_close($ch);
echo $contents;
?>

3. fopen -> fread -> fclose
PHP code:

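<?php
$handle = fopen("http://www.jb51.net", "rb");
if ($handle === false) {
    exit("Could not open the URL");
}
$contents = "";
// Read in 1024-byte chunks until the end of the stream
while (!feof($handle)) {
    $data = fread($handle, 1024);
    if ($data === false) {
        break;
    }
    $contents .= $data;
}
fclose($handle);
echo $contents;
?>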

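Notes:
1. Using file_get_contents and fopen on remote URLs requires that the hosting environment has allow_url_fopen enabled. To enable it, edit php.ini and set allow_url_fopen = On. When allow_url_fopen is off, neither fopen nor file_get_contents can open remote files.
2. Using curl requires that the hosting environment has the curl extension enabled. On Windows, edit php.ini and remove the semicolon in front of extension=php_curl.dll; you may also need to copy ssleay32.dll and libeay32.dll to C:\WINDOWS\system32. On Linux, install the curl extension.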
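Both requirements can be checked from PHP itself before choosing a method. A minimal sketch; fetch_capabilities is a hypothetical helper name.

```php
<?php
/**
 * Report which of the fetch methods are usable in the current
 * PHP configuration.
 */
function fetch_capabilities(): array
{
    return [
        // Needed by file_get_contents() and fopen() for http:// URLs.
        'allow_url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // Needed by the curl_* functions.
        'curl' => extension_loaded('curl'),
    ];
}

foreach (fetch_capabilities() as $name => $enabled) {
    echo $name, ': ', $enabled ? 'enabled' : 'disabled', PHP_EOL;
}
```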

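5. Extracting web page information with PHP regular expressions

Use int preg_match(string pattern, string subject [, array matches]) for this. I won't write out a full implementation because I have no environment to test it in; the approach is described below.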

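Read the scraped result into $str, then split it with preg_match_all("/<[^>]+.+>/", $str, $split_word). The full matches end up in the array $split_word[0], which should look something like this:
$split_word[0][0] = '<li><table><tr>'
$split_word[0][1] = '<td width="574"><a href="detailnew.jsp?id=803088">驻村干部</a></td>'
……
Then search the array elements one by one. First, loop over them looking for criterion 1, the id number:
preg_match("/id=\d+/i", $split_word[0][$n], $id_value)
The match is stored in the array $id_value, for example $id_value[0] = "id=xxxxx". To get just the digits, either run a second extraction on that result or use a capturing group, /id=(\d+)/i, in which case $id_value[1] holds the digits.
Extract the remaining fields the same way, with the corresponding regular expressions:
For the link title, first extract /<a[^>]+>[^<]+<\/a>/i ([^<]+ rather than \w+, since \w does not reliably match Chinese characters unless the /u modifier is used), then pull the Chinese text out of that result. For GB2312 text use /[\x80-\xff]{4,}/; each Chinese character is two bytes in that range, so {4,} means at least two characters. For UTF-8 text use /[\x{4e00}-\x{9fa5}]+/u. Take care to pick the character class that matches the page's encoding.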

……

That is the approach. It is tedious, because no single regular expression will match all of the information in one go.
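For the specific link format in the sample above, the id and the title can be pulled out together with capturing groups. This is a sketch rather than a general HTML parser: extract_links is a hypothetical helper name, the input is assumed to be UTF-8, and the pattern assumes double-quoted href attributes.

```php
<?php
/**
 * Extract (id, title) pairs from links of the form
 * <a href="detailnew.jsp?id=NNN">title</a>.
 * Assumes $html is UTF-8 and that href values are double-quoted.
 */
function extract_links(string $html): array
{
    $pattern = '/<a\s[^>]*href="[^"]*\bid=(\d+)[^"]*"[^>]*>([^<]+)<\/a>/iu';
    $rows = [];
    // PREG_SET_ORDER groups the captures per match: [full, id, title].
    if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $rows[] = ['id' => (int) $m[1], 'title' => trim($m[2])];
        }
    }
    return $rows;
}

$sample = '<li><table><tr>' . "\n"
    . ' <td width="574"><a href="detailnew.jsp?id=803088">驻村干部</a></td>';
print_r(extract_links($sample));
```

For pages whose markup varies more than this, a DOM parser such as PHP's DOMDocument is usually less fragile than regular expressions.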

6. Getting the content of a specific web page with PHP

I. Use file_get_contents to fetch a URL with a POST request

<?php
$url = 'http://www.domain.com/test.php?id=123';
$data = array('foo' => 'bar');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        // Header lines must be separated by \r\n
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$ctx = stream_context_create($opts);
// The second argument is use_include_path, so pass false rather than ''
$html = @file_get_contents($url, false, $ctx);
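The context-building step can be wrapped in a reusable function that also sets a timeout and keeps the response body on HTTP error codes. A sketch under assumptions: build_post_context is a hypothetical helper name, and the example URL is a placeholder. timeout and ignore_errors are options of PHP's http stream wrapper; Content-Length is left out because the wrapper adds it when content is set.

```php
<?php
/**
 * Build a stream context for a form-encoded POST request.
 * build_post_context is a hypothetical helper name.
 */
function build_post_context(array $fields, float $timeout = 5.0)
{
    $body = http_build_query($fields);
    return stream_context_create([
        'http' => [
            'method'        => 'POST',
            'header'        => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content'       => $body,
            'timeout'       => $timeout, // read timeout in seconds
            'ignore_errors' => true,     // still return the body on 4xx/5xx
        ],
    ]);
}

// Example usage (placeholder URL):
// $html = file_get_contents('http://www.example.com/test.php', false,
//                           build_post_context(['foo' => 'bar']));
```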

II. Use file_get_contents to fetch content with a GET request

<?php
$url = 'http://www.domain.com/?para=123';
$html = file_get_contents($url);
echo $html;

III. Use fopen to open the URL and fetch content with a GET request

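<?php
$url = 'http://www.domain.com/?para=123'; // example URL, as in the GET example above
$fp = fopen($url, 'r');
$meta = stream_get_meta_data($fp); // stream metadata, including the response headers
$result = '';
while (!feof($fp)) {
    $result .= fgets($fp, 1024);
}
// For http:// URLs the header lines are in $meta['wrapper_data']
echo "url header: " . implode("\n", $meta['wrapper_data']) . "\n";
echo "url body: $result";
fclose($fp);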
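The header information returned by stream_get_meta_data is a list of raw header lines for http:// URLs, so the status code has to be parsed out of it. A small sketch; last_status_code is a hypothetical helper name, and the input shape is assumed to match $meta['wrapper_data'].

```php
<?php
/**
 * Return the final HTTP status code from a list of response header lines,
 * or null if no status line is present.
 */
function last_status_code(array $headerLines): ?int
{
    $status = null;
    foreach ($headerLines as $line) {
        // After redirects the list holds several status lines; keep the last one.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Example usage:
// $fp = fopen('http://www.example.com/', 'r');
// $meta = stream_get_meta_data($fp);
// echo last_status_code($meta['wrapper_data']);
```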

IV. Use fopen to open the URL and fetch content with a POST request

<?php
$data = array('foo2' => 'bar2', 'foo3' => 'bar3');
$data = http_build_query($data);
$opts = array(
    'http' => array(
        'method'  => 'POST',
        'header'  => "Content-type: application/x-www-form-urlencoded\r\n" .
                     "Cookie: cook1=c3; cook2=c4\r\n" .
                     "Content-Length: " . strlen($data) . "\r\n",
        'content' => $data
    )
);
$context = stream_context_create($opts);
$html = fopen('http://www.test.com/zzzz.php?id=i3&id2=i4', 'rb', false, $context);
$w = fread($html, 1024); // reads at most the first 1024 bytes of the response
echo $w;
fclose($html);

V. Use the curl library. Before using it, check php.ini to confirm that the curl extension is enabled

<?php
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, 'http://www.domain.com/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
echo $file_contents;
