realweibo.py

Size: 3KB

File type: .py

Cost: 2 coins

Downloads: 1

Published: 2021-06-06
Language: Python
Tags: crawler, python, Weibo

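Resource description

Given any keyword, the script searches Sina Weibo and scrapes, for each matching post, the blogger id, post text, repost count, comment count, like count, and publication time.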
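
As an illustration of the record described above, the six fields could be held in a small data class and written out as CSV. This is only a sketch: the class name WeiboRecord, the helper records_to_csv, the English field names, and the sample row are all assumptions made for illustration, and the standard-library csv module stands in for whatever export the original script performs.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class WeiboRecord:
    """One scraped post (hypothetical field names, not taken from the script)."""
    blogger_id: str
    text: str
    reposts: int
    comments: int
    likes: int
    published: str


def records_to_csv(records):
    """Serialize records to CSV text with one header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(WeiboRecord)])
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buf.getvalue()


# Made-up sample row, used only to show the output shape.
sample = [WeiboRecord("example_user", "hello", 3, 1, 10, "2021-06-06 12:00")]
csv_text = records_to_csv(sample)
```

Keeping the counts as integers and the text as a plain string makes later filtering or sorting straightforward, whichever export format is chosen.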

Code snippet and file information

from urllib.parse import urlencode
import requests
from pyquery import PyQuery as pq
import os
import re
import xlwt
import pandas as pd

# Directory containing this script, with a trailing path separator.
current_Path = os.path.dirname(os.path.abspath(__file__)) + os.sep


base_url = 'https://s.weibo.com/'

headers = {
    'Host': 'm.weibo.cn',
    # The HTTP header is spelled "Referer"
    'Referer': 'https://weibo.com/zzk1996?is_all=1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36 Edg/80.0.361.48'
}

# Search: fetch one page of results for the given keyword
def get_Research(research_Words, page):
    params = {
        'q': research_Words,
        'Refer': 'index',
        'page': str(page)
    }
    url = 'https://s.weibo.com/weibo?' + urlencode(params)
    # print(url)
    # print(urlencode(params))

    try:
        response = requests.get(url)
        if response.status_code == 200:
            return response.text
        return None  # non-200 responses are treated as failures
    except requests.ConnectionError:
        return None


def get_Information(research_Words, page):
    res = []
    html = get_Research(research_Words, page)
    if html is None:
        # The request failed or returned a non-200 status; nothing to parse.
        return res
    doc = pq(html)
    # print(doc)
    # Save the raw HTML for debugging.
    with open(current_Path + 'test.txt', 'w+', encoding='utf8') as f:
        f.write(html)
    # items = doc(".content").items()
    items = doc("div[class='card']").items()

    for li in items:
        temp_Info_Dict = {}

        # Extract the nickname
        info = li.find('div')('.name')
        nick_Name = info.attr('nick-name')
        temp_Info_Dict['博主id'] = nick_Name  # key: blogger id
        # Extract the post text
        # text = li('.txt')
        text = li("p[node-type='feed_list_content_full']>a")
        temp_Info_Dict['微博正文'] = text.text()  # key: post text
        if temp_Info_Dict['微博正文'] == '':
            pass  # the preview is cut off at this point
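
The way get_Research assembles its search URL can be tried without any network access. The sketch below reuses the endpoint and the three query parameters from the listing; the function name build_search_url and the sample keyword are assumptions introduced for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same endpoint string as in the listing above.
SEARCH_ENDPOINT = "https://s.weibo.com/weibo?"


def build_search_url(keyword: str, page: int) -> str:
    """Return the search URL for one results page (hypothetical helper)."""
    params = {"q": keyword, "Refer": "index", "page": str(page)}
    # urlencode percent-encodes non-ASCII keywords as UTF-8 by default.
    return SEARCH_ENDPOINT + urlencode(params)


url = build_search_url("爬虫", 2)

# Decode the query string again to confirm the parameters round-trip.
query = parse_qs(urlsplit(url).query)
```

Because urlencode handles the escaping, a Chinese keyword can be passed in directly and never appears unescaped in the resulting URL.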
