Python: Scraping Download Links from a Web Page
# Find gudaiyanqing xiaoshuo on http://www.bookben.com #
# -*- coding: utf-8 -*-
import time
import urllib.request
from bs4 import BeautifulSoup

num = 0
web = "http://m.bookben.com"
url = "http://m.bookben.com/gudaiyanqing"
result = "######Get the update of novel website on " + url + "\n" + "\n"
date_mark = time.strftime('%Y-%m-%d', time.localtime(time.time()))
time_mark = time.strftime('%Y-%m-%d-%H-%M-%S', time.localtime(time.time()))

# Get the update of bookben.com website
main_page = urllib.request.urlopen(url).read().decode('gb2312', errors='replace')
main_soup = BeautifulSoup(main_page, "lxml")
# Links are taken from every <li class="li_bg"> element on the listing page
main_classes = main_soup.find_all('li', class_='li_bg')
for main_links in main_classes:
    for main_link in main_links.find_all('a'):
        novel_name = main_link.get_text()
        novel_url = web + main_link.get('href')
        num = num + 1
        print(str(num) + novel_name + novel_url)
        # Fetch and parse the page of the individual novel
        novel_page = urllib.request.urlopen(novel_url).read().decode('gb2312', errors='replace')
        novel_soup = BeautifulSoup(novel_page, "lxml")
        # novel_date = novel_sou    (the listing is cut off at this point in the source)
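The listing builds each novel URL with `web + main_link.get('href')`, which only works when every `href` is a root-relative path. The sketch below shows one possible way to make that step more robust with `urllib.parse.urljoin`. It is not the original author's code. To stay runnable without network access or third-party packages, it swaps BeautifulSoup for the standard-library `html.parser`. The sample markup, the `LinkCollector` class, and the `extract_links` helper are all hypothetical, and the real page structure on bookben.com may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical sample markup standing in for the listing page.
SAMPLE_HTML = """
<ul>
  <li class="li_bg"><a href="/book/1.html">Novel A</a></li>
  <li class="li_bg"><a href="http://m.bookben.com/book/2.html">Novel B</a></li>
  <li class="other"><a href="/about.html">About</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (text, absolute_url) pairs for <a> tags inside <li class="li_bg">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_item = False   # True while inside an <li class="li_bg">
        self._href = None       # href of the <a> currently being read
        self._text = []         # text fragments of the current <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._in_item = "li_bg" in (attrs.get("class") or "").split()
        elif tag == "a" and self._in_item and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            # urljoin resolves relative hrefs and leaves absolute ones unchanged
            self.links.append((name, urljoin(self.base_url, self._href)))
            self._href = None
        elif tag == "li":
            self._in_item = False


def extract_links(html, base_url):
    """Return the list of (name, absolute_url) pairs found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML, "http://m.bookben.com/gudaiyanqing")
for name, link in links:
    print(name, link)
```

With BeautifulSoup the same idea reduces to replacing the string concatenation in the loop with `urljoin(url, main_link.get('href'))`.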