Resource overview

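A Scrapy project that crawls the article list of cnblogs (博客园) and saves it to a local database. This is a practice project from my recent study of web crawlers. For a detailed walkthrough of the source code, see the blog post: https://blog.csdn.net/xiaocy66/article/details/83834261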

Code snippet and file information

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class CnblogItem(scrapy.Item):
    # Avatar
    avatar = scrapy.Field()

    # Title
    title = scrapy.Field()

    # Category text
    category = scrapy.Field()

    # Category id
    category_id = scrapy.Field()

    # Channel
    channel = scrapy.Field()

    # Creation time
    created_time = scrapy.Field()

    # Current id
    cur_id = scrapy.Field()

    # User name
    user_name = scrapy.Field()

    # Author nickname
    nickname = scrapy.Field()

    # Cover image URL
    logo_url = scrapy.Field()

    # User detail page URL
    user_url = scrapy.Field()

    # Display time
    showtime = scrapy.Field()

    # Display time as text, e.g. "August 2018", "19 hours ago", "2 days ago"
    show_datetime = scrapy.Field()

    # Source
    source_from = s

Attribute        Size  Date       Time   Name
----------- ---------  ---------- -----  ----
Directory           0  2018-11-03 20:43  cnblog\
Directory           0  2018-11-03 20:45  cnblog\cnblog\
Directory           0  2018-11-03 20:45  cnblog\cnblog\db\
File             2268  2018-11-03 20:45  cnblog\cnblog\db\dbhelper.py
File             1686  2018-11-03 20:45  cnblog\cnblog\db\init.sql
File              161  2018-11-03 20:45  cnblog\cnblog\db\__init__.py
Directory           0  2018-11-03 21:23  cnblog\cnblog\db\__pycache__\
File             2202  2018-11-03 21:23  cnblog\cnblog\db\__pycache__\dbhelper.cpython-36.pyc
File              144  2018-11-03 21:23  cnblog\cnblog\db\__pycache__\__init__.cpython-36.pyc
File             1265  2018-11-03 23:29  cnblog\cnblog\items.py
File             3597  2018-11-03 20:43  cnblog\cnblog\middlewares.py
File              488  2018-11-03 20:45  cnblog\cnblog\pipelines.py
File             3442  2018-11-03 21:25  cnblog\cnblog\settings.py
Directory           0  2018-11-03 20:44  cnblog\cnblog\spiders\
File             2980  2018-11-03 22:41  cnblog\cnblog\spiders\cnblogspider.py
File              161  2018-07-12 05:14  cnblog\cnblog\spiders\__init__.py
Directory           0  2018-11-04 21:36  cnblog\cnblog\spiders\__pycache__\
File             2598  2018-11-04 21:36  cnblog\cnblog\spiders\__pycache__\cnblogspider.cpython-36.pyc
File              149  2018-11-03 20:44  cnblog\cnblog\spiders\__pycache__\__init__.cpython-36.pyc
File                0  2018-07-12 05:14  cnblog\cnblog\__init__.py
Directory           0  2018-11-04 21:36  cnblog\cnblog\__pycache__\
File              738  2018-11-04 21:36  cnblog\cnblog\__pycache__\items.cpython-36.pyc
File              833  2018-11-03 21:23  cnblog\cnblog\__pycache__\pipelines.cpython-36.pyc
File              444  2018-11-03 21:25  cnblog\cnblog\__pycache__\settings.cpython-36.pyc
File              141  2018-11-03 20:44  cnblog\cnblog\__pycache__\__init__.cpython-36.pyc
File              255  2018-11-03 20:43  cnblog\scrapy.cfg
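
The listing contains a pipelines.py and a db\ package, but their contents are not shown here. As a rough illustration of how scraped items could be saved to a local database, here is a minimal sketch of a Scrapy-style item pipeline. It is not the author's code: the class name SqliteArticlePipeline, the article table, and the choice of SQLite (via Python's standard sqlite3 module) are all assumptions made for the example, and only four of the item fields are stored.

```python
import sqlite3


class SqliteArticlePipeline:
    """Hypothetical pipeline: stores a few item fields in a SQLite table.

    Assumption: the real project may use a different database and schema
    (see db\\dbhelper.py and db\\init.sql, which are not shown here).
    """

    def __init__(self, db_path=":memory:"):
        # ":memory:" keeps the sketch self-contained; use a file path to persist.
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Scrapy calls this when the spider starts: connect and create the table.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS article ("
            "cur_id TEXT PRIMARY KEY, title TEXT, nickname TEXT, created_time TEXT)"
        )

    def process_item(self, item, spider):
        # INSERT OR REPLACE avoids duplicate rows when the same article is seen again.
        self.conn.execute(
            "INSERT OR REPLACE INTO article (cur_id, title, nickname, created_time) "
            "VALUES (?, ?, ?, ?)",
            (
                item.get("cur_id"),
                item.get("title"),
                item.get("nickname"),
                item.get("created_time"),
            ),
        )
        self.conn.commit()
        # Returning the item lets any later pipelines keep processing it.
        return item

    def close_spider(self, spider):
        # Scrapy calls this when the spider finishes: release the connection.
        self.conn.close()
```

If a class like this lived in cnblog\cnblog\pipelines.py, it would typically be enabled through the ITEM_PIPELINES setting in settings.py, for example ITEM_PIPELINES = {"cnblog.pipelines.SqliteArticlePipeline": 300}. The item passed to process_item only needs to behave like a mapping, which both scrapy.Item instances and plain dicts do.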
