Simple Python web crawler source code, with explanations.

Size: 4KB

File type: .py

Published: 2021-06-18
Language: Python
Tags: python, crawler

Resource description

Practice code I wrote myself: a small crawler program that scrapes some ghost stories.

Code snippet and file information

import requests
from lxml import etree
import pymysql
import time


class kunbubooks(object):
    def __init__(self):
        # self.URL = "http://www.bestgushi.com/o/kongbu/index.html"
        self.headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; InfoPath.3)'}

        # Create the database connection object and the cursor object
        # self.db = pymysql.connect(host='localhost',
        #                           user='root',
        #                           password='123456',
        #                           database='gushi',
        #                           charset='utf8')
        # self.cursor = self.db.cursor()



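    def bookurllist(self, URL):
        # The three request steps: send the request, set the encoding, read the text
        res = requests.get(URL, headers=self.headers)
        res.encoding = 'gbk'
        html = res.text

        parseHtml = etree.HTML(html)
        # Find the links to all stories on one listing page
        books = parseHtml.xpath("//div[@class='gs']/h3/a/@href|//div[@class='gs yt']/h3/a/@href")

        for b in books:
            self.book(b)   # Loop over every story link and hand each one to the book method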

    def book(self, b):
        res = requests.get(b, headers=self.headers)
        res.encoding = 'gbk'  # Some pages contain characters that gb2312 cannot decode, so gbk is used
        html = res.text

        parseHtml = etree.HTML(html)
        book = parseHtml.xpath("//div[@id='zzzxcwqsdas']//p//text()")  # List of the story's text fragments
        bookname = parseHtml.xpath("//div[@class='gushi']/h1/a/text()")  # Story title

        for x in range(250):
            y = b[:-5] + '_' + str(x) + '.html'  # Build the URL of each sub-page of the story
            # Request each sub-page URL in turn
            res1 = requests
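
The snippet above is cut off inside the sub-page loop, so what it does with each sub-page is unknown. The following is only a minimal sketch of how such a loop could look, not the author's implementation. The names `page_urls`, `decode_gbk`, `collect_pages` and the `fetch` callable are hypothetical, and stopping at the first missing sub-page is an assumption. `fetch` stands in for the network call so that the example runs offline; GBK is used for decoding because it is a superset of GB2312.

```python
from typing import Callable, Iterator, List, Optional


def page_urls(story_url: str, max_pages: int = 250) -> Iterator[str]:
    """Yield candidate sub-page URLs: <base>_0.html, <base>_1.html, ..."""
    base = story_url[:-len(".html")]  # same slicing as b[:-5] in the snippet
    for x in range(max_pages):
        yield f"{base}_{x}.html"


def decode_gbk(raw: bytes) -> str:
    """Decode page bytes as GBK, which also covers every GB2312 character."""
    return raw.decode("gbk", errors="replace")


def collect_pages(
    story_url: str,
    fetch: Callable[[str], Optional[bytes]],
    max_pages: int = 250,
) -> List[str]:
    """Fetch sub-pages in order and stop at the first missing one.

    Assumption: `fetch` returns the page bytes, or None when the page
    does not exist. The original snippet does not show its stop condition.
    """
    pages = []
    for url in page_urls(story_url, max_pages):
        raw = fetch(url)
        if raw is None:
            break
        pages.append(decode_gbk(raw))
    return pages


if __name__ == "__main__":
    # Hypothetical in-memory "site" used instead of real HTTP requests.
    fake_site = {
        "http://example.com/story_0.html": "第一页".encode("gbk"),
        "http://example.com/story_1.html": "龍的故事".encode("gbk"),
    }
    print(collect_pages("http://example.com/story.html", fake_site.get))
```

With a real site, `fetch` could wrap `requests.get` and return `res.content` for a successful response and `None` otherwise. Stopping early also avoids sending all 250 requests for a story that has only a few sub-pages.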
