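Resource overview
Python crawler that scrapes novels from 136书屋 (136book.com): beautifulsoup4.py
It uses the beautifulsoup4 package to parse HTML and XML, and urllib to open and work with URLs.
Install beautifulsoup4 before use; urllib ships with the Python standard library and needs no separate installation. This example uses Python 2.7.
Code snippet and file information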
# coding=utf-8
from urllib import URLopener
from bs4 import BeautifulSoup as BS
import os
import sys

if __name__ == '__main__':
    # Local base folder (Windows path)
    Bfolder = r"D:\LILUO\6.MyTools\12.beautifulsoup4\books"
    url = "http://www.136book.com/"
    html = URLopener().open(url)
    soup = BS(html.read(), "html.parser")
    a = soup.find_all(name='a')
    # Map each book URL to its title
    BookDict = {}
    for each in a:
        href = each.get('href')
        # Anchors without an href return None, so check before the substring test
        if href and "http://www.136book.com/" in href:
            if each.get('title'):
                BookDict[href] = each.get('title')
    html.close()
    for burl in BookDict:
        #burl = "http://www.136book.com/zetianji/"
        bhtml = URLopener().open(burl)
        bsoup = BS(bhtml.read(), "html.parser")
        ba = bsoup.find_all(name='a')
        path = Bfol
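
The link-collection step above (keep only `<a>` tags whose `href` points into the site and that carry a `title`) can be tried offline without any third-party package. The sketch below is not the original script: it targets Python 3 and swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`, purely to illustrate the filtering logic. The names `BookLinkParser` and `collect_book_links`, and the sample markup with its titles, are made up for illustration.

```python
from html.parser import HTMLParser

SITE = "http://www.136book.com/"  # same site root as in the snippet


class BookLinkParser(HTMLParser):
    """Hypothetical helper: collects {href: title} for in-site <a> tags."""

    def __init__(self, site):
        super().__init__()
        self.site = site
        self.books = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        title = attr.get("title")
        # Skip anchors without an href, links to other sites,
        # and links that have no (or an empty) title.
        if href and self.site in href and title:
            self.books[href] = title


def collect_book_links(html, site=SITE):
    """Return a dict mapping book URLs to titles found in an HTML string."""
    parser = BookLinkParser(site)
    parser.feed(html)
    parser.close()
    return parser.books


if __name__ == "__main__":
    # Made-up sample page; only the first link passes all three checks.
    sample = """
    <a href="http://www.136book.com/zetianji/" title="Ze Tian Ji">Ze Tian Ji</a>
    <a href="http://www.136book.com/">Home</a>
    <a name="top">Top</a>
    <a href="http://example.com/x/" title="Elsewhere">Elsewhere</a>
    """
    print(collect_book_links(sample))
```

Keeping the filter in a function that takes an HTML string, rather than a live response, makes it possible to check the logic on saved pages before pointing the crawler at the site.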