Python Sina Weibo crawler: scraping posts and user information (with source code and ...)

Size: 111KB

File type: .rar

Published: 2023-11-27
Language: Python
Tags: python, Sina crawler, Sina Weibo, selenium, source code

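Resource description

This is a Sina Weibo crawler implemented with Python and Selenium. It is a free resource that I hope you find useful. It is a very basic crawler, but it does at least run. The rar archive contains the source code and sample crawl output.

Related articles by the author:
http://blog.csdn.net/eastmount/article/details/50720436 [Python crawler] Scraping Sina Weibo content and user information with Selenium
http://blog.csdn.net/eastmount/article/details/51231852 [Python crawler] Scraping Sina Weibo client user information, hot topics and comments with Selenium (part 1)

What is crawled: user information and post information from the Sina Weibo mobile site.
User information: user ID, user name, number of posts, number of followers, number of accounts followed, etc.
Post information: whether the post is a repost or an original, like count, repost count, comment count, publication time, post text, etc.

Installation:
1. Install Python first; the author used Python 2.7.8.
2. Install pip or easy_install.
3. Run pip install selenium to install Selenium, a tool for automated testing and crawling.
4. Edit the user name and password in the code, filling in your own.
5. Run the program; it launches Firefox automatically and logs in to Weibo.

Note: the mobile site's pages are simpler and cleaner and avoid some of the limitations of dynamic loading, but its drawback is that lists such as posts or follower IDs show only 20 pages. The client site may load content such as comments and posts dynamically, but its information is more complete.

[Source] Crawling mobile-site posts
spider_selenium_sina_content.py
Input: a list of celebrity user IDs, each visited as URL + user ID (the IDs can be taken from a user's following list): SinaWeibo_List_best_1.txt
Output: post information and basic user information: SinaWeibo_Info_best_1.txt
Megry_Result_Best.py: collates the users' posts for a given day, for example April 23, 2016.

[Source] Crawling client-site posts
weibo_spider2.py crawls the client site. Comments there are loaded dynamically, and that part is still being worked out.

By: Eastmount 2016-04-24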
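The input step described above (a text file of user IDs, each visited as URL + user ID) could look roughly like the sketch below. It is written for Python 3 and is not the code shipped in the archive. The helper names read_user_ids and build_profile_url, the one-ID-per-line file layout, and the second sample ID are assumptions made for illustration; the base address follows the weibo.cn pattern that appears in the code snippet's comments.

```python
import os
import tempfile

# Assumed base address of the mobile site; user IDs are appended to it.
BASE_URL = "http://weibo.cn/"


def read_user_ids(path):
    """Return the non-empty lines of a UTF-8 text file as user IDs.

    Assumes one ID per line; "utf-8-sig" also tolerates a leading BOM.
    """
    with open(path, encoding="utf-8-sig") as handle:
        stripped = [line.strip() for line in handle]
    return [user_id for user_id in stripped if user_id]


def build_profile_url(user_id, base_url=BASE_URL):
    """Join the base address and a user ID into a profile URL."""
    return base_url.rstrip("/") + "/" + user_id


if __name__ == "__main__":
    # Write a throwaway ID list so the sketch runs without the real input file.
    # The second ID is made up.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write("5824697471\n\n1234567890\n")
    try:
        for user_id in read_user_ids(tmp.name):
            print(build_profile_url(user_id))
    finally:
        os.remove(tmp.name)
```

In a real run, each URL would be passed to driver.get() after logging in, and the scraped fields appended to the output file.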

Code snippet and file information

# coding=utf-8

"""
Created on 2016-04-24 @author: Eastmount

Purpose: crawl Sina Weibo user information and post comments
Site: http://weibo.cn/ (less data than http://weibo.com/)

"""

import time            
import re            
import os    
import sys  
import codecs  
import shutil
import urllib 
from selenium import webdriver        
from selenium.webdriver.common.keys import Keys        
import selenium.webdriver.support.ui as ui        
from selenium.webdriver.common.action_chains import ActionChains



# Start a browser first: headless PhantomJS or Firefox
#driver = webdriver.PhantomJS(executable_path="G:\phantomjs-1.9.1-windows\phantomjs.exe")
driver = webdriver.Firefox()
wait = ui.WebDriverWait(driver, 10)


# Global variables: files used for reading input and writing output
inforead = codecs.open("SinaWeibo_List_best_1.txt", 'r', 'utf-8')
infofile = codecs.open("SinaWeibo_Info_best_1.txt", 'a', 'utf-8')


#********************************************************************************
#                            Step 1: log in to weibo.cn
#        This method works for weibo.cn (data is sent in plain text); for weibo.com,
#        see a junior schoolmate's method of setting POST data and headers
#                LoginWeibo(username, password): takes the user name and password
#********************************************************************************

def LoginWeibo(username, password):
    try:
        # Enter the user name and password to log in
        print u'Preparing to log in to Weibo.cn...'
        driver.get("http://login.sina.com.cn/")
        elem_user = driver.find_element_by_name("username")
        elem_user.send_keys(username) # user name
        elem_pwd = driver.find_element_by_name("password")
        elem_pwd.send_keys(password)  # password
        #elem_rem = driver.find_element_by_name("safe_login")
        #elem_rem.click()             # secure login

        # Important: pause so a captcha can be typed in by hand
        # (needed on the mobile site, http://login.weibo.cn/login/)
        time.sleep(20)

        #elem_sub = driver.find_element_by_xpath("//input[@class='smb_btn']")
        #elem_sub.click()              # click the login button (it has no name attribute)
        elem_pwd.send_keys(Keys.RETURN)
        time.sleep(2)

        # Get the cookies. Recommended reading: http://www.cnblogs.com/fnng/p/3269450.html
        print driver.current_url
        print driver.get_cookies()  # cookie information, stored as dicts
        print u'Cookie key/value pairs:'
        for cookie in driver.get_cookies():
            #print cookie
            for key in cookie:
                print key, cookie[key]

        # driver.get_cookies() is a list (here with a single element); each cookie is a dict
        print u'Login succeeded...'


    except Exception as e:
        print "Error: ", e
    finally:
        print u'End LoginWeibo!\n\n'


#********************************************************************************
#        Step 2: visit a user's page such as http://weibo.cn/5824697471 and collect information
#                                VisitPersonPage()
#        Common encoding error: UnicodeEncodeError: 'ascii' codec can't encode characters
#********************************************************************************

def VisitPersonPage(user_id):

    try:
        global infofile       # global file variable
        url = "http:/
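The snippet above is cut off at this point. As a side note on the cookie loop in LoginWeibo: Selenium's get_cookies() returns a list of dicts, each carrying at least a "name" and a "value" key. The sketch below, written for Python 3, shows the same nested loop plus one way to collapse that list into a plain name-to-value mapping. The helper name cookies_to_dict and the sample cookies are made up for illustration, and no browser is started.

```python
def cookies_to_dict(cookies):
    """Collapse a list of cookie dicts into a {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


if __name__ == "__main__":
    # Made-up sample shaped like the result of driver.get_cookies().
    sample_cookies = [
        {"name": "session_id", "value": "abc123",
         "domain": ".example.com", "path": "/"},
        {"name": "lang", "value": "zh-cn",
         "domain": ".example.com", "path": "/"},
    ]

    # Same nested loop as in LoginWeibo, using the Python 3 print function.
    for cookie in sample_cookies:
        for key in cookie:
            print(key, cookie[key])

    # Compact view: one entry per cookie.
    print(cookies_to_dict(sample_cookies))
```

A mapping like this is convenient when only the cookie values are needed, for example to check whether a login cookie is present before crawling.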

Attributes       Size  Date       Time   Name
----------- ---------  ---------- -----  ----
       File      5628  2016-04-24 20:31  新浪微博爬虫\[源码] 爬取客户端微博信息\SinaWeibo_Info_best_1.txt
       File        27  2016-04-24 03:45  新浪微博爬虫\[源码] 爬取客户端微博信息\SinaWeibo_List_best_1.txt
       File      8119  2016-04-24 20:31  新浪微博爬虫\[源码] 爬取客户端微博信息\weibo_spider2.py
       File     17680  2016-04-24 21:18  新浪微博爬虫\[源码] 爬取移动端个人信息关注id和粉丝id （速度慢）\SinaWeibo_Info_1.txt
       File        50  2016-04-24 21:17  新浪微博爬虫\[源码] 爬取移动端个人信息关注id和粉丝id （速度慢）\SinaWeibo_List_1.txt
       File     14884  2016-04-24 21:19  新浪微博爬虫\[源码] 爬取移动端个人信息关注id和粉丝id （速度慢）\spider_selenium_sina_info_other_userid_all.py
       File     13386  2016-04-24 20:55  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）\2016-04-23\20160423_SinaWeibo_Num_Best.txt
       File      1595  2016-04-24 20:55  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）\2016-04-23\Megry_Result_Best.py
       File    237289  2016-04-24 20:52  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）\2016-04-23\SinaWeibo_Info_best_1.txt
       File       189  2016-04-24 20:46  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）\2016-04-23\SinaWeibo_List_best_1.txt
       File     12115  2016-04-24 20:54  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）\2016-04-23\spider_selenium_sina_content.py
       File       840  2016-04-24 21:02  新浪微博爬虫\运行配置过程.txt
        Dir         0  2016-04-24 20:55  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）\2016-04-23
        Dir         0  2016-04-24 20:42  新浪微博爬虫\[源码] 爬取客户端微博信息
        Dir         0  2016-04-24 21:18  新浪微博爬虫\[源码] 爬取移动端个人信息关注id和粉丝id （速度慢）
        Dir         0  2016-04-24 20:46  新浪微博爬虫\[源码] 爬取移动端微博信息（强推）
        Dir         0  2016-04-24 21:18  新浪微博爬虫
----------- ---------  ---------- -----  ----
               311802                    17
