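Resource Overview
A Tianyancha (天眼查) crawler that uses several Python parsing libraries and an IP proxy pool. For learning purposes only.
Code Snippet and File Information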
# -*- coding:utf-8 -*-
import requests
from lxml import etree
import random
import re
# import HTMLParser
from html.parser import HTMLParser
import time
# Proxy pool: one entry is meant to be picked per request.
proxy = [
    'http://112.83.86.88:2589',
    'https://117.92.128.239:2444',
    'https://117.94.120.55:4734',
    'https://116.149.201.121:6436',
    'https://111.72.104.133:4184',
    'https://113.103.151.180:4217',
    'https://60.189.139.208:4241',
    'https://222.191.171.98:4263',
    'https://182.108.168.108:4234',
    'https://115.209.194.193:4270',
]
# User-Agent pool used to vary the browser identity between requests.
USER_AGENTS = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",
    "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
    "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",
    "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",
    "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",
    "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]
cookie = [
#‘cloud_token=bc34c50c90c2446c83aed5cb1be47d45; Hm_lpvt_e92c8d65d92d534b0fc290df538b4758=1544282809; RTYCID=74927dd837fb4732a031f393165e04eb; TYCID=f81d6a20af4d11e88c773753f20cd2b6; _gid=GA1.2.1641349744.1544194803; undefined=f81d6a20af4d11e88c773753f20cd2b6; CT_TYCID=dd01fb472ae5479ba38a69ae86aeb2f7; ssuid=4039911408; _ga=GA1.2.176006030.1535961067; Hm_lvt_e92c8d65d92d534b0fc290df538b4758=1544203531154420887815442286711544282750; tyc-user-info=%257B%2522myQuestionCount%2522%253A%25220%2522%252C%2522integrity%2522%253A%25220%2525%2522%252C%2522
Attribute   Size      Date        Time   Name
----------- --------- ----------  -----  ----
File        2009648   2018-10-09  14:20  天眼查爬虫\Java架构师课程大纲.jpg
File          22070   2019-01-08  17:21  天眼查爬虫\pachong.py
File          21567   2018-12-11  18:26  天眼查爬虫\天眼查爬虫_学习.py
Directory         0   2019-01-22  10:29  天眼查爬虫
----------- --------- ----------  -----  ----
Total       2053285                      4 entries
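
The snippet above is cut off before the `proxy` and `USER_AGENTS` lists are used, so how the original script consumes them is not shown. A common pattern, sketched here as an assumption rather than the author's actual code, is to pick one proxy and one User-Agent at random for every request. The helper name `build_request_kwargs`, the sample pool values, and the 10-second timeout are all hypothetical.

```python
import random

# Hypothetical sample pools for illustration; in practice these would be
# the `proxy` and `USER_AGENTS` lists defined in the script.
PROXY_POOL = [
    "http://127.0.0.1:8080",
    "http://127.0.0.1:8081",
]
UA_POOL = [
    "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",
    "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",
]


def build_request_kwargs(proxy_pool, ua_pool, rng=random):
    """Return keyword arguments for requests.get() with a random proxy and User-Agent."""
    proxy_url = rng.choice(proxy_pool)
    return {
        # requests expects a mapping from URL scheme to proxy URL.
        "proxies": {"http": proxy_url, "https": proxy_url},
        "headers": {"User-Agent": rng.choice(ua_pool)},
        # Assumed timeout so a dead proxy does not hang the crawler.
        "timeout": 10,
    }


kwargs = build_request_kwargs(PROXY_POOL, UA_POOL)
# With the requests library installed, a fetch would then look like:
#   response = requests.get(url, **kwargs)
```

Building the arguments in a separate function keeps the rotation logic testable without touching the network, and a failed request can simply call the helper again to retry through a different proxy.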