Resource description
Python script that scrapes download links from a web page
Code snippet and file information
# -*- coding: utf-8 -*-
########################################################
# Find gudaiyanqing xiaoshuo on http://www.bookben.com #
########################################################
import time
import urllib.request

from bs4 import BeautifulSoup

num = 0
web = "http://m.bookben.com"
url = "http://m.bookben.com/gudaiyanqing"
result = "######Get the update of novel website on " + url + "\n" + "\n"
date_mark = time.strftime('%Y-%m-%d', time.localtime(time.time()))
time_mark = time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime(time.time()))

# Get the update of bookben.com website
main_page = urllib.request.urlopen(url).read().decode('gb2312', errors='replace')
main_soup = BeautifulSoup(main_page, "lxml")
main_classes = main_soup.find_all('li', class_='li_bg')
for main_links in main_classes:
    for main_link in main_links.find_all('a'):
        novel_name = main_link.get_text()
        novel_url = web + main_link.get('href')
        num = num + 1
        print(str(num) + novel_name + novel_url)
        # Fetch and parse each novel's own page
        novel_page = urllib.request.urlopen(novel_url).read().decode('gb2312', errors='replace')
        novel_soup = BeautifulSoup(novel_page, "lxml")
        novel_date = novel_sou
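The link-extraction step in the snippet above (find every <a> inside <li class="li_bg"> and prepend the site root) can be tried offline with a rough sketch like the one below. It is not the author's code: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and SAMPLE_HTML, ListingParser and extract_links are hypothetical names invented for illustration. The sample markup is only an assumption about what the listing page might look like.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<ul>
  <li class="li_bg"><a href="/book/1.html">Novel A</a></li>
  <li class="li_bg"><a href="/book/2.html">Novel B</a></li>
  <li class="other"><a href="/about.html">About</a></li>
</ul>
"""


class ListingParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags inside <li class="li_bg">."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (title, href) pairs
        self._in_item = False  # True while inside a matching <li>
        self._href = None      # href of the <a> currently being read
        self._text = []        # text chunks of the <a> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            classes = (attrs.get("class") or "").split()
            self._in_item = "li_bg" in classes
        elif tag == "a" and self._in_item:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "li":
            self._in_item = False


def extract_links(html, base_url):
    """Return (title, absolute_url) pairs for the listing entries in html."""
    parser = ListingParser()
    parser.feed(html)
    return [(title, urljoin(base_url, href)) for title, href in parser.links]


links = extract_links(SAMPLE_HTML, "http://m.bookben.com")
for num, (title, link) in enumerate(links, start=1):
    print(num, title, link)
```

Using urljoin rather than plain string concatenation is a design choice of this sketch: it also copes with href values that are already absolute, which web + href would not.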