
Java crawler code

Published: 2022-03-06 06:47:54

❶ Looking for Java code for a web crawler

Is it for a specific Tieba forum, or Tieba forums in general?

❷ Desperately looking for Java web crawler code

I can't do it myself, but I know someone who definitely can: my classmate, QQ 820215725. Not an ad.

❸ Java web crawler

The source code is as follows:
package com.cellstrain.icell.util;

import java.io.*;
import java.net.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A simple crawler implemented in Java.
 */
public class Robot {
    public static void main(String[] args) {
        URL url = null;
        URLConnection urlconn = null;
        BufferedReader br = null;
        PrintWriter pw = null;
        // String regex = "http://[\\w+\\.?/?]+\\.[A-Za-z]+";
        String regex = "https://[\\w+\\.?/?]+\\.[A-Za-z]+"; // URL matching pattern
        Pattern p = Pattern.compile(regex);
        try {
            url = new URL("网址"); // URL to crawl; the original author crawled a biology website here
            urlconn = url.openConnection();
            // Write the extracted links to SiteURL.txt on drive D
            pw = new PrintWriter(new FileWriter("D:/SiteURL.txt"), true);
            br = new BufferedReader(new InputStreamReader(
                    urlconn.getInputStream()));
            String buf = null;
            while ((buf = br.readLine()) != null) {
                Matcher buf_m = p.matcher(buf);
                while (buf_m.find()) {
                    pw.println(buf_m.group());
                }
            }
            System.out.println("Crawl finished ^_^");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the resources only if they were actually opened, to avoid a NullPointerException
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (pw != null) {
                pw.close();
            }
        }
    }
}

❹ Looking for Java web crawler source code

package com.heaton.bot;

import com.heaton.bot.*;
import java.net.*;

/**
 * The SpiderWorker class performs the actual work of
 * spidering pages. It is implemented as a thread
 * that is created by the spider class.
 *
 * Copyright 2001-2003 by Jeff Heaton (http://www.jeffheaton.com)
 *
 * @author Jeff Heaton
 * @version 1.2
 */
public class SpiderWorker extends Thread {

    /**
     * The URL that this spider worker
     * should be downloading.
     */
    protected String target;

    /**
     * The owner of this spider worker class,
     * should always be a Spider object.
     * This is the class that this spider
     * worker will send its data to.
     */
    protected Spider owner;

    /**
     * Indicates if the spider is busy or not.
     * true = busy
     * false = idle
     */
    protected boolean busy;

    /**
     * A descendant of the HTTP object that
     * this class should be using for HTTP
     * communication. This is usually the
     * HTTPSocket class.
     */
    protected HTTP http;

    /**
     * Constructs a spider worker object.
     *
     * @param owner The owner of this object, usually
     * a Spider object.
     * @param http
     */
    public SpiderWorker(Spider owner, HTTP http) {
        this.http = http;
        this.owner = owner;
    }

    /**
     * Returns true or false to indicate if
     * the spider is busy or idle.
     *
     * @return true = busy
     * false = idle
     */
    public boolean isBusy() {

Source: http://www.diybl.com/course/3_program/java/javajs/200797/69988.html

❺ Writing a web crawler in Java: need the code and the workflow, urgent

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;
import javax.swing.*;
import javax.swing.table.*;

// A web crawler (here "crawling" means the same thing as fetching or capturing pages)
public class SearchCrawler extends JFrame {
    // Maximum number of URLs to keep
    private static final String[] MAX_URLS = {"50", "100", "500", "1000"};

    // Cache of robots.txt disallow lists
    private HashMap disallowListCache = new HashMap();

    // Search GUI controls
    private JTextField startTextField;
    private JComboBox maxComboBox;
    private JCheckBox limitCheckBox;
    private JTextField logTextField;
    private JTextField searchTextField;
    private JCheckBox caseCheckBox;
    private JButton searchButton;

    // Search status GUI controls
    private JLabel crawlingLabel2;
    private JLabel crawledLabel2;
    private JLabel toCrawlLabel2;
    private JProgressBar progressBar;
    private JLabel matchesLabel2;

    // Table listing the search matches
    private JTable table;

    // Flag indicating whether the crawler is currently crawling
    private boolean crawling;

    // Writer used to log matches to a file
    private PrintWriter logFileWriter;

    // Constructor for the web crawler
    public SearchCrawler() {
        // Set the application title bar
        setTitle("Search Crawler");
        // Set the window size
        setSize(600, 600);

        // Handle the window-closing event
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                actionExit();
            }
        });

        // Set up the File menu
        JMenuBar menuBar = new JMenuBar();
        JMenu fileMenu = new JMenu("File");
        fileMenu.setMnemonic(KeyEvent.VK_F);
        JMenuItem fileExitMenuItem = new JMenuItem("Exit", KeyEvent.VK_X);
        fileExitMenuItem.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                actionExit();
            }
        });
        fileMenu.add(fileExitMenuItem);
        menuBar.add(fileMenu);
        setJMenuBar(menuBar);

❻ 200 points for Java web crawler source code

http://search.gougou.com/search?search=%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB&id=2

❼ Java source code for a web crawler

// Java crawler demo

import java.io.File;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DownMM {
    public static void main(String[] args) throws Exception {
        // out is the output directory; note that it must end with a backslash
        String out = "D:\\JSP\\pic\\java\\";
        try {
            File f = new File(out);
            if (!f.exists()) {
                f.mkdirs();
            }
        } catch (Exception e) {
            System.out.println("no");
        }

        String url = "http://www.mzitu.com/share/comment-page-";
        Pattern reg = Pattern.compile("<img src=\"(.*?)\"");
        for (int j = 0, i = 1; i <= 10; i++) {
            URL uu = new URL(url + i);
            URLConnection conn = uu.openConnection();
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko");
            // Read the whole response body as one string and match image tags against it
            Scanner sc = new Scanner(conn.getInputStream());
            Matcher m = reg.matcher(sc.useDelimiter("\\A").next());
            while (m.find()) {
                // Copy each matched image stream to a uniquely named file in the output directory
                Files.copy(new URL(m.group(1)).openStream(), Paths.get(out + UUID.randomUUID() + ".jpg"));
                System.out.println("Downloaded: " + j++);
            }
        }
    }
}

❽ Java source code for a web crawler

Send me your email address. I've seen you asking about this for days.

❾ How do you implement web page parsing for a crawler in Java?

The principle of a crawler is simply to fetch the page content and then parse it; crawlers differ only in how they fetch pages and how they parse the content.
You can simply use HttpClient to send GET/POST requests, take the response, and then pull out what you want with string slicing or regular expressions.
Or you can use a ready-made library such as Jsoup or crawler4j, which makes extracting information much more convenient, as in the sketch below.
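To make the library route concrete, here is a minimal sketch using Jsoup to fetch a page and pull out its title and links. The URL is only a placeholder and the selector is just an example, not code taken from any of the answers above.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page (the URL is a placeholder)
        Document doc = Jsoup.connect("https://example.com/")
                .userAgent("Mozilla/5.0")   // send a browser-like User-Agent
                .timeout(10_000)            // 10-second timeout
                .get();

        // The parsed DOM can be queried with CSS selectors instead of regexes
        System.out.println("Title: " + doc.title());
        for (Element link : doc.select("a[href]")) {
            // absUrl resolves relative hrefs against the page's base URL
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
    }
}

Because Jsoup parses the HTML into a DOM and tolerates malformed markup, selecting elements this way is generally more robust than the regex-based extraction shown in the answers above.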

❿ Looking for source code for a web crawler written in Java

I don't know what you want to use it for. There are too many kinds of web crawlers; you need to be more specific before anyone can write one for you. I wrote a set myself for scraping Qvod videos. Tell me which site you want to scrape and I'll write one for you.
