Tuchong (图虫网) Image Crawler in Python

Size: 2 KB

File type: .py

Coins: 1

Downloads: 0

Published: 2021-06-02
Language: Python
Tags: Python crawler

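Resource description

A Python crawler that downloads images from Tuchong (图虫网) by topic. Enter a topic name and the script automatically downloads the matching images into the corresponding directory.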
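The "topic name in, directory out" step described above can be sketched as follows. This is only an illustration under stated assumptions: the listing does not show the request URL, so `LIST_URL_TEMPLATE` is a hypothetical placeholder, and `build_list_url` and `prepare_folder` are names introduced here rather than taken from the original script.

```python
import os
import urllib.parse

# Hypothetical endpoint template; the listing does not show the real URL.
LIST_URL_TEMPLATE = "https://example.com/rest/tags/{topic}/posts?page={page}&count={count}"


def build_list_url(topic, page=1, count=20):
    """Percent-encode the topic name and place it in the URL template."""
    encoded = urllib.parse.quote(topic)
    return LIST_URL_TEMPLATE.format(topic=encoded, page=page, count=count)


def prepare_folder(topic, root="."):
    """Create (if needed) and return a download folder named after the topic."""
    folder = os.path.join(root, topic)
    os.makedirs(folder, exist_ok=True)
    return folder
```

Percent-encoding matters because topic names on a Chinese-language site are often non-ASCII, and `urllib.request` rejects URLs containing raw non-ASCII characters.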

Code snippet and file information

import urllib.request
import urllib.parse
import os
import time
import json

def url_open(url):
    # Send a browser-like User-Agent so the request looks like an ordinary page visit.
    headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393")

    req = urllib.request.Request(url)
    req.add_header(headers[0], headers[1])
    response = urllib.request.urlopen(req)
    html = response.read()  # raw bytes; callers decode as needed

    return html
    
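def get_pages(url, count):
    # Return "site_id/post_id" path fragments for every post in the JSON list at `url`.
    # `count` is accepted but not used in this part of the listing.
    pages = []

    html = url_open(url).decode('utf-8')
    target = json.loads(html)
    for mytag in target["postList"]:
        tag1 = mytag['site_id']
        tag2 = mytag['post_id']
        # str() guards against the IDs arriving as JSON numbers rather than strings.
        tag = str(tag1) + '/' + str(tag2)
        pages.append(tag)

    return pages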

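def find_imgs(url):
    html = url_open(url).decode('utf-8')
    img_addrs = []

    # 31 is the length of the marker string, so a+31 is where the image URL starts.
    a = html.find('class="multi-photo-image" src="')
    while a != -1:
        b = html.find('.jpg', a, a+255)

        if b != -1:
            img_addrs.append(html[a+31 : b+4])
        else:
            b = a + 31

        # The source listing is cut off at this point. The two statements below are a
        # reconstruction (an assumption, not the author's text) so that the loop terminates.
        a = html.find('class="multi-photo-image" src="', b)

    return img_addrs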
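
The marker-scanning loop in find_imgs can be tried without any network access. The following is a minimal sketch of the same technique applied to an in-memory string. The function name `extract_img_addrs`, its parameters, and the sample HTML are assumptions made for illustration. It uses `len(marker)` in place of the hard-coded 31 and, because the listing is truncated, supplies its own loop-advance step.

```python
MARKER = 'class="multi-photo-image" src="'


def extract_img_addrs(html, marker=MARKER, ext=".jpg", window=255):
    """Collect each URL that follows `marker` and ends in `ext` within `window` characters."""
    img_addrs = []
    a = html.find(marker)
    while a != -1:
        start = a + len(marker)  # first character of the URL
        b = html.find(ext, start, a + window)
        if b != -1:
            img_addrs.append(html[start:b + len(ext)])
            next_from = b + len(ext)
        else:
            # No matching extension nearby: skip this marker and keep scanning.
            next_from = start
        a = html.find(marker, next_from)
    return img_addrs


# Made-up sample markup, used only to exercise the function.
sample = (
    '<img class="multi-photo-image" src="https://img.example.com/1.jpg" alt="">'
    '<img class="multi-photo-image" src="https://img.example.com/2.jpg" alt="">'
)
print(extract_img_addrs(sample))
```

Deriving the offset from the marker keeps the slice correct if the marker text ever changes, which a literal 31 would not.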
