
Java crawler code

Published: 2022-03-06 06:47:54

❶ Looking for Java code for a web crawler

Is it for a specific Tieba forum, or Tieba forums in general?

❷ Desperately looking for Java web crawler code

I can't do it myself, but I know someone who definitely can: my classmate, QQ 820215725. Not an ad.

❸ Java web crawler

The source code is as follows:
package com.cellstrain.icell.util;

import java.io.*;
import java.net.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A simple crawler implemented in Java.
 */
public class Robot {
    public static void main(String[] args) {
        URL url = null;
        URLConnection urlconn = null;
        BufferedReader br = null;
        PrintWriter pw = null;
        // String regex = "http://[\\w+\\.?/?]+\\.[A-Za-z]+";
        String regex = "https://[\\w+\\.?/?]+\\.[A-Za-z]+"; // URL matching pattern
        Pattern p = Pattern.compile(regex);
        try {
            url = new URL("网址"); // URL to crawl; the original author crawled a biology website here
            urlconn = url.openConnection();
            // Write the extracted links to SiteURL.txt on drive D
            pw = new PrintWriter(new FileWriter("D:/SiteURL.txt"), true);
            br = new BufferedReader(new InputStreamReader(
                    urlconn.getInputStream()));
            String buf = null;
            while ((buf = br.readLine()) != null) {
                Matcher buf_m = p.matcher(buf);
                while (buf_m.find()) {
                    pw.println(buf_m.group());
                }
            }
            System.out.println("Crawl finished ^_^");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the resources only if they were actually opened, to avoid a NullPointerException
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            if (pw != null) {
                pw.close();
            }
        }
    }
}

❹ Looking for Java web crawler source code

package com.heaton.bot;

import com.heaton.bot.*;
import java.net.*;

/**
 * The SpiderWorker class performs the actual work of
 * spidering pages. It is implemented as a thread
 * that is created by the spider class.
 *
 * Copyright 2001-2003 by Jeff Heaton (http://www.jeffheaton.com)
 *
 * @author Jeff Heaton
 * @version 1.2
 */
public class SpiderWorker extends Thread {

    /**
     * The URL that this spider worker
     * should be downloading.
     */
    protected String target;

    /**
     * The owner of this spider worker class,
     * should always be a Spider object.
     * This is the class that this spider
     * worker will send its data to.
     */
    protected Spider owner;

    /**
     * Indicates if the spider is busy or not.
     * true = busy
     * false = idle
     */
    protected boolean busy;

    /**
     * A descendant of the HTTP object that
     * this class should be using for HTTP
     * communication. This is usually the
     * HTTPSocket class.
     */
    protected HTTP http;

    /**
     * Constructs a spider worker object.
     *
     * @param owner The owner of this object, usually
     * a Spider object.
     * @param http
     */
    public SpiderWorker(Spider owner, HTTP http) {
        this.http = http;
        this.owner = owner;
    }

    /**
     * Returns true or false to indicate if
     * the spider is busy or idle.
     *
     * @return true = busy
     * false = idle
     */
    public boolean isBusy() {

Source: http://www.diybl.com/course/3_program/java/javajs/200797/69988.html

❺ Writing a web crawler in Java: need the code and the workflow, urgent

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;
import javax.swing.*;
import javax.swing.table.*;

// A web crawler (here "crawling" means the same thing as fetching or capturing pages)
public class SearchCrawler extends JFrame {
    // Maximum number of URLs to keep
    private static final String[] MAX_URLS = {"50", "100", "500", "1000"};

    // Cache of robots.txt disallow lists
    private HashMap disallowListCache = new HashMap();

    // Search GUI controls
    private JTextField startTextField;
    private JComboBox maxComboBox;
    private JCheckBox limitCheckBox;
    private JTextField logTextField;
    private JTextField searchTextField;
    private JCheckBox caseCheckBox;
    private JButton searchButton;

    // Search status GUI controls
    private JLabel crawlingLabel2;
    private JLabel crawledLabel2;
    private JLabel toCrawlLabel2;
    private JProgressBar progressBar;
    private JLabel matchesLabel2;

    // Table listing the search matches
    private JTable table;

    // Flag indicating whether the crawler is currently crawling
    private boolean crawling;

    // Writer used to log matches to a file
    private PrintWriter logFileWriter;

    // Constructor for the web crawler
    public SearchCrawler() {
        // Set the application title bar
        setTitle("Search Crawler");
        // Set the window size
        setSize(600, 600);

        // Handle the window-closing event
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                actionExit();
            }
        });

        // Set up the File menu
        JMenuBar menuBar = new JMenuBar();
        JMenu fileMenu = new JMenu("File");
        fileMenu.setMnemonic(KeyEvent.VK_F);
        JMenuItem fileExitMenuItem = new JMenuItem("Exit", KeyEvent.VK_X);
        fileExitMenuItem.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                actionExit();
            }
        });
        fileMenu.add(fileExitMenuItem);
        menuBar.add(fileMenu);
        setJMenuBar(menuBar);

❻ 200 points for Java web crawler source code

http://search.gougou.com/search?search=%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB&id=2

❼ Java source code for a web crawler

// Java crawler demo

import java.io.File;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DownMM {
    public static void main(String[] args) throws Exception {
        // out is the output directory; note that it must end with a backslash
        String out = "D:\\JSP\\pic\\java\\";
        try {
            File f = new File(out);
            if (!f.exists()) {
                f.mkdirs();
            }
        } catch (Exception e) {
            System.out.println("no");
        }

        String url = "http://www.mzitu.com/share/comment-page-";
        Pattern reg = Pattern.compile("<img src=\"(.*?)\"");
        for (int j = 0, i = 1; i <= 10; i++) {
            URL uu = new URL(url + i);
            URLConnection conn = uu.openConnection();
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko");
            // Read the whole response body as one string and match image tags against it
            Scanner sc = new Scanner(conn.getInputStream());
            Matcher m = reg.matcher(sc.useDelimiter("\\A").next());
            while (m.find()) {
                // Copy each matched image stream to a uniquely named file in the output directory
                Files.copy(new URL(m.group(1)).openStream(), Paths.get(out + UUID.randomUUID() + ".jpg"));
                System.out.println("Downloaded: " + j++);
            }
        }
    }
}

❽ Java source code for a web crawler

Send me your email address. I've seen you asking about this for days.

❾ How do you implement web page parsing for a crawler in Java?

The principle of a crawler is simply to fetch the page content and then parse it; crawlers differ only in how they fetch pages and how they parse the content.
You can simply use HttpClient to send GET/POST requests, take the response, and then pull out what you want with string slicing or regular expressions.
Or you can use a ready-made library such as Jsoup or crawler4j, which makes extracting information much more convenient, as in the sketch below.
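To make the library route concrete, here is a minimal sketch using Jsoup to fetch a page and pull out its title and links. The URL is only a placeholder and the selector is just an example, not code taken from any of the answers above.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupExample {
    public static void main(String[] args) throws Exception {
        // Fetch and parse the page (the URL is a placeholder)
        Document doc = Jsoup.connect("https://example.com/")
                .userAgent("Mozilla/5.0")   // send a browser-like User-Agent
                .timeout(10_000)            // 10-second timeout
                .get();

        // The parsed DOM can be queried with CSS selectors instead of regexes
        System.out.println("Title: " + doc.title());
        for (Element link : doc.select("a[href]")) {
            // absUrl resolves relative hrefs against the page's base URL
            System.out.println(link.text() + " -> " + link.absUrl("href"));
        }
    }
}

Because Jsoup parses the HTML into a DOM and tolerates malformed markup, selecting elements this way is generally more robust than the regex-based extraction shown in the answers above.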

❿ Looking for source code for a web crawler written in Java

I don't know what you want to use it for. There are too many kinds of web crawlers; you need to be more specific before anyone can write one for you. I wrote a set myself for scraping Qvod videos. Tell me which site you want to scrape and I'll write one for you.
