A Simple Python Crawler for Novels

Size: 3 KB

File type: .py

Coins: 2

Downloads: 1

Published: 2021-06-04
Language: Python
Tags: python

Resource description

A simple crawler for novels, provided for learning and reference only. Feel free to discuss any questions.

Code snippet and file information

import requests
from bs4 import BeautifulSoup
import sys
import time


class download(object):

    def __init__(self):
        self.server = 'https://www.biqukan.com'
        self.target = 'https://www.biqukan.com/1_1094/'
        self.names = []  # chapter titles
        self.nums = 0    # number of chapters
        self.urls = []   # chapter links
        # Note: advertising 'br' only works if a Brotli decoder is installed for requests.
        self.headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
            'Connection': 'keep-alive',
            'user-agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
        }

    """
    Description: collect the download links
    Parameters:
        none
    Returns:
        none
    Modify:
        2018-12-08
    """
    def get_download_url(self):
        req = requests.get(self.target, headers=self.headers)
        html = req.text
        # Locate the chapter list container, then every link inside it.
        div_bf = BeautifulSoup(html, 'html5lib')
        div = div_bf.find_all('div', class_='listmain')
        a_bf = BeautifulSoup(str(div[0]), 'html5lib')
        a = a_bf.find_all('a')
        # Skip the first 15 links; also skip volume headings, which are not chapters.
        for each in a[15:]:
            if each.string == "正文" or each.string == "正文卷":
                continue
            self.names.append(each.string)
            self.urls.append(self.server + each.get('href'))
        # Count only the chapters actually kept, so nums matches names and urls.
        self.nums = len(self.names)

    """
    Description: fetch the content of a chapter
    Parameters:
        target - download link (string)
    Returns:
        texts - chapter content (string)
    Modify:
        2018-12-08
    """
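
The link-collection step in get_download_url can be tried offline, without touching the site. The following is a minimal sketch, not the author's code: it swaps BeautifulSoup/html5lib for the standard library's html.parser so it runs with no extra packages. The names ChapterLinkParser, extract_chapters, SAMPLE_HTML and the skip parameter (which mirrors the a[15:] slice) are hypothetical, and the sample markup is invented for illustration rather than taken from the real page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Volume headings that the original snippet skips because they are not chapters.
SKIP_TITLES = {"正文", "正文卷"}


class ChapterLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags inside <div class="listmain">."""

    def __init__(self):
        super().__init__()
        self.depth = 0     # <div> nesting depth inside listmain (0 means outside)
        self.href = None   # href of the <a> currently open, if any
        self.text = []     # text fragments of the current <a>
        self.links = []    # collected (title, href) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.depth or "listmain" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.href, self.text = attrs["href"], []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
        elif tag == "a" and self.href is not None:
            self.links.append(("".join(self.text).strip(), self.href))
            self.href = None


def extract_chapters(html, base_url, skip=0):
    """Return [(title, absolute_url), ...], dropping the first `skip` links and volume headings."""
    parser = ChapterLinkParser()
    parser.feed(html)
    parser.close()
    return [
        (title, urljoin(base_url, href))
        for title, href in parser.links[skip:]
        if title not in SKIP_TITLES
    ]


# Invented sample markup (an assumption about the page layout, for illustration only).
SAMPLE_HTML = """
<div class="listmain">
  <dl>
    <dd><a href="#">正文卷</a></dd>
    <dd><a href="/1_1094/1.html">Chapter 1</a></dd>
    <dd><a href="/1_1094/2.html">Chapter 2</a></dd>
  </dl>
</div>
<a href="/other/">outside the list</a>
"""

if __name__ == "__main__":
    for title, url in extract_chapters(SAMPLE_HTML, "https://www.biqukan.com"):
        print(title, url)
```

Returning the filtered list and taking its length afterwards keeps the chapter count in step with the titles and URLs, which a count taken before filtering would not.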
