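Resource Overview
This is a Python implementation of a web-crawler-based system for scraping and analyzing movie reviews. The package includes the source code and full documentation. The system consists mainly of three modules: a popular movie ranking, a word cloud of review content, and a pie chart of viewer satisfaction. Note that the code has a bug (it ran fine for me last year, and I don't know why it no longer works this year), so please don't download it if that is a problem for you!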
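
The overview does not say what the bug is. One thing worth checking, as an assumption rather than a diagnosis, is that every request carries a browser-like User-Agent header, since some sites reject requests that lack one. A minimal sketch using only the standard library follows; the helper names `build_request` and `fetch_html` are hypothetical, while the URL and User-Agent string mirror the code preview on this page.

```python
from urllib import request

# User-Agent string mirroring the one in the code preview on this page.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 "
              "(KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1")


def build_request(url, user_agent=USER_AGENT):
    """Return a urllib Request that carries a browser-like User-Agent.

    Hypothetical helper: it only builds the request, no network call is made.
    """
    return request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download a page and decode it as UTF-8 (this does hit the network)."""
    with request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_html('https://movie.douban.com/nowplaying/hangzhou/')` would then return the page HTML, provided the site accepts the request.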

Code Snippet and File Information
from urllib import request
from bs4 import BeautifulSoup as bs

# Browser-like User-Agent, sent with the requests below.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1'}

# Fetch the "now playing" page for Hangzhou.
req = request.Request('https://movie.douban.com/nowplaying/hangzhou/', headers=headers)
resp = request.urlopen(req)
html_data = resp.read().decode('utf-8')

soup = bs(html_data, 'html.parser')
nowplaying_movie = soup.find_all('div', id='nowplaying')
nowplaying_movie_list = nowplaying_movie[0].find_all('li', class_='list-item')

# Collect the id and title of each film.
nowplaying_list = []
for item in nowplaying_movie_list:
    nowplaying_dict = {}
    nowplaying_dict['id'] = item['data-subject']
    for tag_img_item in item.find_all('img'):
        nowplaying_dict['name'] = tag_img_item['alt']
    nowplaying_list.append(nowplaying_dict)

print('Top films on the Douban chart:')
for i in range(len(nowplaying_list)):
    print('NO.', (i + 1), '\t', nowplaying_list[i]['name'])
# print(nowplaying_list)
import requests

# Comments page of the second film in the list (index 1), first 20 comments.
requrl = 'https://movie.douban.com/subject/' + nowplaying_list[1]['id'] + '/comments' + '?' + 'start=0' + '&limit=20'
# Pass the headers by keyword; a second positional argument would be treated as query parameters.
resp = requests.get(requrl, headers=headers)
html_data = resp.text
soup = bs(html_data, 'html.parser')
comment_div_lits = soup.find_all('div', class_='comment')
# print(comment_div_lits)

# Reviewer names.
eachAudiList = []
for person in comment_div_lits:
    b = person.find_all('a', class_='')
    eachAudiList.append(b[0].string)
# print(eachAudiList)

# Comment dates.
eachTimeList = []
for time in comment_div_lits:
    a = time.find_all('span', class_='comment-time')
    eachTimeList.append(a[0].text.split()[0])
# print(eachTimeList)

# Comment text.
eachCommentList = []
for item in comment_div_lits:
    i = item.find_all('p')[0].text
    eachCommentList.append(i)
# print(eachCommentList)

# Join all comments into one string for the word cloud.
comments = ''
for k in range(len(eachCommentList)):
    comments = comments + (str(eachCommentList[k])).strip()
# print(comments)

print('------------------Comments from viewers-----------------------------------------')
for i in range(len(eachCommentList)):
    print(eachAudiList[i] + ' wrote:')
    print(eachCommentList[i])
    print('\t\t\t', eachTimeList[i])
from wordcloud import WordCloud
import jieba
import matplotlib.pyplot as plt

# Segment the comments with jieba and render them as a word cloud.
wordlist_after_jieba = jieba.cut(comments, cut_all=True)
wl_space_split = " ".join(wordlist_after_jieba)
my_wordcloud = WordCloud(background_color="white", width=1000, height=860, font_path="font.ttf").generate(wl_space_split)
plt.imshow(my_wordcloud)
plt.axis("off")
plt.show()
import requests

# Film detail page, which carries the rating breakdown.
requrl = 'https://movie.douban.com/subject/' + nowplaying_list[1]['id'] + '/' + '?' + 'from=showing'
resp = requests.get(requrl, headers=headers)
html_data = resp.text
soup = bs(html_data, 'html.parser')
assess = soup.find_all('div', class_='ratings-on-weight')
# print(assess[0])
assess_dit = {}
for ass in range(len(assess)):
    x = assess[ass].find_all('div', class_='item')
    star = []
    percent = []
    for y in x:
        z = y.find_all('span')
        star.append(z[0].string.split
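
The preview above stops while it is collecting the star labels and their percentages, and the rest of the script is not shown. Since the overview lists a viewer satisfaction pie chart, here is a minimal sketch of how such a chart might be drawn with matplotlib. It is not the author's code: the helper names `parse_percent` and `plot_satisfaction`, the sample data, and the assumption that percentages arrive as strings such as `30.0%` are all hypothetical.

```python
def parse_percent(text):
    """Convert a string such as '45.3%' to the float 45.3."""
    return float(text.strip().rstrip('%'))


def plot_satisfaction(stars, percents, title="Viewer satisfaction"):
    """Draw a pie chart of the rating shares (requires matplotlib)."""
    # Imported here so that parse_percent stays usable without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(percents, labels=stars, autopct='%1.1f%%')
    ax.axis('equal')  # keep the pie circular
    ax.set_title(title)
    plt.show()


# Made-up sample data for illustration only, not taken from Douban.
sample_stars = ['5 stars', '4 stars', '3 stars', '2 stars', '1 star']
sample_percents = [parse_percent(p)
                   for p in ['30.0%', '40.0%', '20.0%', '7.0%', '3.0%']]
```

Calling `plot_satisfaction(sample_stars, sample_percents)` opens a window with the chart. English labels are used here so that no extra font configuration is needed.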
Attribute   Size      Date        Time   Name
----------- --------- ----------  -----  ----
File        5828044   2019-06-11  13:35  python程序设计\font.ttf
File        3707      2020-04-01  16:53  python程序设计\python语言程序设计.py
File        529408    2020-04-01  16:58  python程序设计\文档.doc
File        211       2020-04-01  17:01  python程序设计\附录.txt
Directory   0         2020-04-01  17:08  python程序设计\