Resource Description
用Python写网络爬虫PDF&源码.rar
Code Snippet and File Information
# -*- coding: utf-8 -*-
import urllib2
import urlparse


def download1(url):
    """Simple downloader"""
    return urllib2.urlopen(url).read()


def download2(url):
    """Download function that catches errors"""
    print 'Downloading:', url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = None
    return html


def download3(url, num_retries=2):
    """Download function that also retries 5XX errors"""
    print 'Downloading:', url
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # retry 5XX HTTP errors
                html = download3(url, num_retries - 1)
    return html


def download4(url, user_agent='wswp', num_retries=2):
    """Download function that includes user agent support"""
    print 'Downloading:', url
    headers = {'User-agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    try:
        html = urllib2.urlopen(request).read()
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # retry 5XX HTTP errors
                html = download4(url, user_agent, num_retries - 1)
    return html


def download5(url, user_agent='wswp', proxy=None, num_retries=2):
    """Download function with support for proxies"""
    print 'Downloading:', url
    headers = {'User-agent': user_agent}
    request = urllib2.Request(url, headers=headers)
    opener = urllib2.build_opener()
    if proxy:
        proxy_params = {urlparse.urlparse(url).scheme: proxy}
        opener.add_handler(urllib2.ProxyHandler(proxy_params))
    try:
        html = opener.open(request).read()
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = None
        if num_retries > 0:
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # retry 5XX HTTP errors
                html = download5(url, user_agent, proxy, num_retries - 1)
    return html


download = download5


if __name__ == '__main__':
    print download('http://example.webscraping.com')
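The snippet above is Python 2 (urllib2, urlparse, print statements) and will not run on Python 3. A minimal Python 3 sketch of the final download5 is given below, assuming only the standard library; in Python 3 urllib2 and urlparse were split into urllib.request, urllib.error, and urllib.parse, and this port is not part of the archive itself.

```python
# Python 3 sketch of the book's retrying downloader (not from the archive).
import urllib.request
import urllib.error
import urllib.parse


def download(url, user_agent='wswp', proxy=None, num_retries=2):
    """Download url with a custom user agent and optional proxy,
    retrying 5XX server errors up to num_retries times."""
    print('Downloading:', url)
    headers = {'User-agent': user_agent}
    request = urllib.request.Request(url, headers=headers)
    opener = urllib.request.build_opener()
    if proxy:
        # map the URL's scheme (http/https) to the given proxy address
        proxy_params = {urllib.parse.urlparse(url).scheme: proxy}
        opener.add_handler(urllib.request.ProxyHandler(proxy_params))
    try:
        html = opener.open(request).read()
    except urllib.error.URLError as e:
        print('Download error:', e.reason)
        html = None
        if num_retries > 0:
            # only HTTPError carries a .code attribute
            if hasattr(e, 'code') and 500 <= e.code < 600:
                # retry 5XX HTTP errors
                html = download(url, user_agent, proxy, num_retries - 1)
    return html
```

Usage is unchanged: download('http://example.webscraping.com') returns the page bytes, retries on a 5XX response, and returns None on any other failure.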
Attr        Size      Date       Time  Name
----------- --------- ---------- ----- ----
File        174       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\.hg_archival.txt
File        2364      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\common.py
File        553       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\iteration_crawler1.py
File        846       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\iteration_crawler2.py
File        931       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\li
File        1149      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\li
File        4649      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\li
File        445       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter01\sitemap_crawler.py
File        554       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\bs_example.py
File        462       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\common.py
File        4816      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\li
File        371       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\lxm
File        2293      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\performance.py
File        333       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\regex_example.py
File        700       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\scrape_callback1.py
File        940       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter02\scrape_callback2.py
File        3686      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter03\disk_cache.py
File        3230      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter03\downloader.py
File        3183      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter03\li
File        2356      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter03\mongo_cache.py
File        818       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\alexa_cb.py
File        564       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\alexa_fn.py
File        3026      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\mongo_queue.py
File        2736      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\process_crawler.py
File        471       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\process_test.py
File        375       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\sequential_test.py
File        2491      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\threaded_crawler.py
File        475       2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter04\threaded_test.py
File        2747      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter05\browser_render.py
File        1101      2015-09-28 13:29 用Python写网络爬虫PDF&源码\用Python写网络爬虫PDF&源码\用Python写爬虫-源码\chapter05\search1.py
............ (157 more file entries omitted)
Related Resources
- Latest Python offline help documentation, PDF format
- python pandas manual
- tesseract-ocr with Chinese language pack
- Python scripting for Word, Excel, and Access
- Python web-scraping tutorial slides (PPT)
- Python automated testing framework: pytest.pdf
- Python Algorithms tutorial, Chinese edition (high-resolution, with bookmarks)
- Django-based blog system source code (Python learning project)
- Qianfeng Kaige Python, Chapter 4: Tornado
- Hands-On Transfer Learning with Python
- Andrew Ng machine learning course assignments, Python code
- Introductory Python courseware and code (organized)
- Machine learning letter classification in Python
- python Django student-union management system.zip
- OpenCV 3 Computer Vision with Python, Chinese edition (Liu Bo)
- Automated Testing in Practice with Python (Chongshi)
- Gray Hat Python, Chinese edition
- python3 Chinese-character recognition lexicon model
- OpenCV 3 Computer Vision with Python, source code
- Weibo sentiment analysis, Python code
- opencv_python-3.4.3-cp37-cp37m-win_amd
- Andrew Ng machine learning programming assignments, Python 3 version
- Python Unix and Linux System Administration guide
- opencv_python-3.4.2-cp37-cp37m-win_amd64.whl
- Michael Nielsen's "Neural Networks and Deep
- Python Data Science Handbook (English PDF with table of contents)
- python3 RSA implementation (without calling an RSA library)
- "Deep Learning with Python", 2018 Chinese edition PDF + English edition
- Deep Learning With Python, Chinese edition + English edition + code