Resource description

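Source code for a Python web crawler that scrapes novel data from the Qidian (qidian.com) novel site. The code targets Python 2 (it imports urllib2). Web crawling is an essential skill for anyone learning Python, and this project is a good place to start.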


Code snippet and file information

#encoding:utf-8
import urllib2
import sys
class HtmlDownLoader(object):
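    # Local file-system encoding (not used by the methods shown here)
    type = sys.getfilesystemencoding()

    def download(self, url):
        """Fetch url and return the raw response body, or None on failure."""
        if url is None:
            return None
        response = urllib2.urlopen(url)
        if response.getcode() != 200:
            return None
        data = response.read()
        return data

    # The downloaded content is GBK-encoded, so it needs special handling
    def download_script(self, url):
        """Fetch url and return the body decoded from GBK to unicode, or None."""
        if url is None:
            return None
        response = urllib2.urlopen(url)
        if response.getcode() != 200:
            return None
        data = response.read()
        return data.decode("GBK")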


"""
if __name__ == "__main__":
    #url = "http://read.qidian.com/BookReader/JtLeEdQdeLBQv4sKnwMhGg2.aspx"
    url="http://read.qidian.com/BookReader/JtLeEdQdeLBQv4sKn
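The class above relies on urllib2, which exists only in Python 2. Below is a minimal Python 3 sketch of the same two methods, assuming the standard-library urllib.request module. The timeout parameter and the choice to return None on any network error (instead of letting exceptions propagate, as the original does) are assumptions added here, not part of the original project.

```python
# Python 3 sketch of HtmlDownLoader. The timeout parameter and the
# "return None on any network error" behaviour are assumptions,
# not taken from the original project.
import urllib.request


class HtmlDownLoader:
    def __init__(self, timeout=10):
        # Seconds to wait for the server before giving up (assumed default)
        self.timeout = timeout

    def download(self, url):
        """Return the raw response body as bytes, or None on failure."""
        if url is None:
            return None
        try:
            with urllib.request.urlopen(url, timeout=self.timeout) as response:
                if response.status != 200:
                    return None
                return response.read()
        except OSError:
            # URLError, HTTPError (4xx/5xx) and socket timeouts all
            # derive from OSError in Python 3
            return None

    def download_script(self, url):
        """Return the body decoded from GBK to str, or None on failure."""
        data = self.download(url)
        if data is None:
            return None
        # The pages are served as GBK bytes, so decode them explicitly
        return data.decode("gbk")
```

If a page contains characters outside GBK, decoding with "gb18030", a superset of GBK, is a common alternative.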

Type        Size (bytes)  Date        Time   Name
----------  ------------  ----------  -----  ----
File                  13  2016-09-25  09:31  qidian_spider\.idea\.name
File                 159  2016-09-25  09:31  qidian_spider\.idea\encodings.xml
File                 693  2016-09-25  09:31  qidian_spider\.idea\misc.xml
File                 401  2016-09-25  12:36  qidian_spider\.idea\modules.xml
File                 451  2016-09-25  12:36  qidian_spider\.idea\qidian_spider.iml
File               42133  2016-09-28  16:50  qidian_spider\.idea\workspace.xml
File                 965  2016-09-28  00:38  qidian_spider\novel_spider\html_downloader.py
File                1215  2016-09-28  16:42  qidian_spider\novel_spider\html_downloader.pyc
File                 658  2016-09-25  09:16  qidian_spider\novel_spider\html_output.py
File                1423  2016-09-25  09:32  qidian_spider\novel_spider\html_output.pyc
File                9858  2016-09-28  00:45  qidian_spider\novel_spider\html_parser.py
File               10001  2016-09-28  16:42  qidian_spider\novel_spider\html_parser.pyc
File               41108  2016-09-28  16:42  qidian_spider\novel_spider\output.html
File                2121  2016-09-28  16:42  qidian_spider\novel_spider\spider_main.py
File                 677  2016-09-25  09:00  qidian_spider\novel_spider\url_manager.py
File                1691  2016-09-25  09:32  qidian_spider\novel_spider\url_manager.pyc
File                   0  2016-09-24  22:17  qidian_spider\novel_spider\_init_.py
Directory              0  2016-09-28  16:50  qidian_spider\.idea
Directory              0  2016-09-28  16:42  qidian_spider\novel_spider
Directory              0  2016-09-25  09:32  qidian_spider
----------  ------------  ----------  -----  ----
Total             113567                     20 entries
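The listing shows the crawler split into url_manager.py, html_downloader.py, html_parser.py, html_output.py and spider_main.py, but only the downloader's code appears on this page. The following is a hypothetical sketch of one common way such modules fit together: a URL manager that separates pending URLs from visited ones, and a main loop that downloads, parses and queues new links. The names UrlManager, crawl, download, parse and max_pages are illustrative assumptions, not the original project's API.

```python
# Hypothetical sketch of a URL manager plus main crawl loop. The names
# UrlManager, crawl, download, parse and max_pages are illustrative only;
# they are not taken from the original project's files.
from typing import Callable, Iterable, List, Optional, Set, Tuple


class UrlManager:
    """Keeps URLs waiting to be crawled apart from URLs already crawled."""

    def __init__(self) -> None:
        self.new_urls: Set[str] = set()
        self.old_urls: Set[str] = set()

    def add_new_url(self, url: Optional[str]) -> None:
        # Ignore empty URLs and anything already queued or already crawled
        if url and url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls: Iterable[str]) -> None:
        for url in urls:
            self.add_new_url(url)

    def has_new_url(self) -> bool:
        return bool(self.new_urls)

    def get_new_url(self) -> str:
        # set.pop() returns an arbitrary element, so crawl order is unspecified
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url


def crawl(
    root_url: str,
    download: Callable[[str], Optional[str]],
    parse: Callable[[str, str], Tuple[List[str], dict]],
    max_pages: int = 100,
) -> List[dict]:
    """Download each queued URL, parse it, and queue the links it yields.

    Stops when no URLs remain or max_pages records have been collected.
    Returns the collected records (the data an output module would write).
    """
    urls = UrlManager()
    urls.add_new_url(root_url)
    records: List[dict] = []
    while urls.has_new_url() and len(records) < max_pages:
        url = urls.get_new_url()
        html = download(url)
        if html is None:
            continue  # skip pages that failed to download
        new_urls, record = parse(url, html)
        urls.add_new_urls(new_urls)
        records.append(record)
    return records
```

Passing download and parse in as functions keeps the loop independent of the network, so it can be exercised with fakes. In a real run, download could be a method like download_script from the original class (ported to Python 3), and parse a function that extracts chapter links and text from a Qidian page; both wirings are assumptions.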

