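Resource overview
(1) Open the comments section of a movie on Douban and, based on the HTML structure, capture three pieces of information:
a. the star rating each account gave (5, 4, 3, 2, or 1 star);
b. the comment text each account left;
c. the HTTP link to the next page of comments.
(2) Once all the information has been collected, process it:
a. count the total for each star level and the total number of accounts that gave a rating;
b. merge all the comment text together and clean up whitespace and other irregular formatting in the comments.
(3) Use matplotlib to draw a pie chart of the share of each rating level, jieba to segment the text into words, and wordcloud to generate a word cloud image.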
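Steps (1) and (2) can be sketched roughly as follows. This is a minimal illustration, not the resource's own code: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages, and the class names it matches (`allstarNN rating`, `short`, `next`) as well as the sample HTML are assumptions about the page markup. `CommentPageParser` and `summarize` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class CommentPageParser(HTMLParser):
    """Collect star ratings, comment texts and the next-page link.

    The matched class names are assumptions about the page markup.
    """

    def __init__(self):
        super().__init__()
        self.stars = []        # one int (1-5) per rated comment
        self.comments = []     # one raw string per comment
        self.next_href = None  # relative link to the next page, if present
        self._in_comment = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "rating" in classes:
            # e.g. class="allstar40 rating" is read as 4 stars
            for cls in classes:
                match = re.fullmatch(r"allstar(\d)0", cls)
                if match:
                    self.stars.append(int(match.group(1)))
        elif tag == "span" and "short" in classes:
            self._in_comment = True
            self.comments.append("")
        elif tag == "a" and "next" in classes:
            self.next_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_comment = False

    def handle_data(self, data):
        if self._in_comment:
            self.comments[-1] += data


def summarize(stars, comments):
    """Return (count per star level, number of raters, merged clean text)."""
    counts = Counter(stars)
    per_level = {level: counts.get(level, 0) for level in (5, 4, 3, 2, 1)}
    total = sum(per_level.values())
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    cleaned = [re.sub(r"\s+", " ", c).strip() for c in comments]
    text = " ".join(c for c in cleaned if c)
    return per_level, total, text


# Made-up sample page, used only to exercise the sketch offline.
SAMPLE_HTML = """
<div class="comment-item">
  <span class="allstar50 rating"></span>
  <span class="short">Loved it,
   worth watching</span>
</div>
<div class="comment-item">
  <span class="allstar30 rating"></span>
  <span class="short">  so-so  </span>
</div>
<div class="comment-item">
  <span class="allstar50 rating"></span>
  <span class="short">great   movie</span>
</div>
<a href="?start=20&amp;limit=20&amp;sort=new_score" class="next">next</a>
"""

parser = CommentPageParser()
parser.feed(SAMPLE_HTML)
per_level, total, text = summarize(parser.stars, parser.comments)
print(per_level)         # counts keyed by star level
print(total)             # number of accounts that rated
print(text)              # merged, whitespace-normalized comments
print(parser.next_href)  # link to follow for the next page
```

For step (3), the values of `per_level` would presumably be what gets passed to `plt.pie`, and `text` is the kind of string that could be handed to `jieba.cut` before building the word cloud.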
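To target a different movie, modify url=https://movie.douban.com/subject/26430636/comments?start=0&limit=20&sort=new_score&status=P&percent_type=
Here "26430636" is the movie's ID. Replacing it with another movie's ID lets the script read that movie's comments and produce its rating chart with matplotlib and its word cloud with wordcloud.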
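As a rough sketch of how the URL could be assembled for any movie ID, and how a relative next-page link could be resolved against it, the snippet below uses only `urllib.parse`. The helper name `comments_url`, the placeholder ID `1234567`, and the shape of the relative link are assumptions made for illustration; the query parameters are copied from the URL above.

```python
from urllib.parse import urlencode, urljoin

# Path pattern taken from the URL in the description.
BASE = "https://movie.douban.com/subject/{movie_id}/comments"


def comments_url(movie_id, start=0, limit=20):
    """Build the comments URL for one movie (hypothetical helper)."""
    query = urlencode({
        "start": start,
        "limit": limit,
        "sort": "new_score",
        "status": "P",
        "percent_type": "",
    })
    return BASE.format(movie_id=movie_id) + "?" + query


first_page = comments_url("26430636")

# A next-page link that contains only a query string is resolved
# against the current page URL; the link text here is an assumed example.
next_page = urljoin(
    first_page,
    "?start=20&limit=20&sort=new_score&status=P&percent_type=",
)

# "1234567" is a placeholder ID, not a real movie.
other_movie = comments_url("1234567")

print(first_page)
print(next_page)
print(other_movie)
```

Keeping the ID as a function argument means the rest of the pipeline does not need to change when a different movie is analyzed.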
Code snippet and file information
import requests
from bs4 import BeautifulSoup
import random
import matplotlib.pyplot as plt
import jieba
from wordcloud import WordCloud
import PIL
import numpy as np

# Pool of User-Agent strings to send with requests.
agents = [
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:17.0; Baiduspider-ads) Gecko/17.0 Firefox/17.0",
    "Mozilla/5.0 (Linux; U; Android 2.3.6; en-us; Nexus S Build/GRK39F) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Avant Browser/1.2.789rel1 (http://www.avantbrowser.com)",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5",
    "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.310.0 Safari/532.9",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.514.0 Safari/534.7",
    "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/534.14 (KHTML, like Gecko) Chrome/9.0.601.0 Safari/534.14",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.14 (KHTML, like Gecko) Chrome/10.0.601.0 Safari/534.14",
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.20 (KHTML, like Gecko) Chrome/11.0.672.2 Safari/534.20",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.27 (KHTML, like Gecko) Chrome/12.0.712.0 Safari/534.27",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.24 Safari/535.1",
    "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.120 Safari/535.2",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.36 Safari/535.7",
    "Mozilla/5.0 (Windows; U; Windows NT 6.0 x64; en-US; rv:1.9pre) Gecko/2008072421 Minefield/3.0.2pre",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9b4) Gecko/2008030317 Firefox/3.0b4",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10",
    "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)",
    "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6 GTB5",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; tr; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0E)",
    "Mozilla/5.0 (Windows; U; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; BIDUBrowser 7.6)",
    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0",
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.99 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64; Trident/7.0; Touch; LCJB; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
    # "Mozill... (listing truncated here)
]
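The snippet breaks off before showing how the `agents` list is used. Since `random` is imported, one plausible use is picking a User-Agent at random for each request; the sketch below assumes that, and `random_headers` is a hypothetical helper rather than part of the original file. Only two entries from the list are repeated here to keep the example short.

```python
import random

# Two entries copied from the list above; the full pool would be used in practice.
agents = [
    "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0",
    "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
]


def random_headers(agent_pool):
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(agent_pool)}


headers = random_headers(agents)
print(headers)

# The dict could then be passed along with a request, for example:
#   response = requests.get(url, headers=headers)
```

Drawing a fresh header for every page request, instead of once per run, would vary the User-Agent across the paginated comment pages.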