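Resource overview
Crawls Douban reviews of the TV series 天盛长歌 (The Rise of Phoenixes), removes the stop words from them, and generates a word cloud.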
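The stop-word step mentioned above can be sketched roughly as follows. This is a minimal illustration, not the resource's own code: `filter_stop_words`, `word_frequencies`, and the sample data are hypothetical, and the token list is assumed to come from a segmenter such as `jieba.lcut`.

```python
from collections import Counter


def filter_stop_words(tokens, stop_words):
    """Return the tokens that are neither blank nor in stop_words."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def word_frequencies(tokens, stop_words):
    """Count how often each remaining token occurs."""
    return Counter(filter_stop_words(tokens, stop_words))


# Hypothetical sample data; in practice the tokens would come from
# something like jieba.lcut(review_text) and the stop words from a file.
sample_tokens = ["剧情", "的", "很", "好", "剧情", " ", "演员", "的"]
sample_stop_words = {"的", "很"}
freqs = word_frequencies(sample_tokens, sample_stop_words)
print(freqs.most_common(2))
```

A frequency mapping like this could then be handed to `WordCloud.generate_from_frequencies`; for Chinese text the `WordCloud` instance would probably also need a `font_path` pointing at a font with CJK glyphs.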
Code snippet and file information
#coding=utf-8
import requests
from lxml import etree
import random
import pymysql
import jieba.analyse
import re
# from scipy.misc import imread
# from wordcloud import WordCloud
# from wordcloud import ImageColorGenerator
# import matplotlib.pyplot as plt
# from os import path
from PIL import Image, ImageSequence
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator
def geturl(url, IP_pools):
    # Pick a random browser User-Agent so the requests look less uniform.
    USER_AGENTS = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:61.0) Gecko/20100101 Firefox/61.0",
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36",
    ]
    Agent_Value = random.choice(USER_AGENTS)
    headers = {
        "User-Agent": Agent_Value,
        "Host": "movie.douban.com",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    }
    # Try up to three times, each time through a randomly chosen proxy.
    for _ in range(3):
        try:
            ip_one = random.choice(IP_pools)
            print(ip_one)
            proxies1 = {'http': "http://" + ip_one}
            print(url)
            r = requests.get(url=url, headers=headers, proxies=proxies1, timeout=5)
            print(r.status_code)
            assert r.status_code == 200
            return etree.HTML(r.content)
        except Exception:
            continue
    # All three attempts failed; the function falls through and returns None.
    print("**" * 20 + "An error occurred!" + "**" * 20)
def get_IP():
    con = pymysql.connect(host='192.168.0.136', user='root', passwd='oysm=K8cV6eldcv', db='lh', port=3306,
                          charset='utf8')
    if con:
        print("ok")
        cur = con.cursor()
        if cur:
            # Parameterized query: fetch the IP and port of proxies whose score is "T".
            sql_read = "select IP, port from ip_pool where score = %s "
            cur.execute(sql_read, "T")
            con.commit()
            lines = cur.fetchall()
            a_list = []
            for i in lines:
                li = i[0] + ":" + i[1]
                # print(li)
                a_lis
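The listing is cut off at this point. As a rough sketch of the conversion the visible lines perform (joining each `(IP, port)` row into a `host:port` string, which `geturl` then wraps in a `proxies` mapping), the helpers below use the hypothetical names `rows_to_addresses` and `to_requests_proxies` and made-up sample rows. They are an assumption-based illustration, not the remainder of the original `get_IP`.

```python
def rows_to_addresses(rows):
    """Join (ip, port) rows into "ip:port" strings, skipping incomplete rows."""
    addresses = []
    for row in rows:
        if len(row) < 2 or not row[0] or not row[1]:
            continue
        # The f-string accepts the port as either a string or an integer.
        addresses.append(f"{row[0]}:{row[1]}")
    return addresses


def to_requests_proxies(address):
    """Build a proxies mapping in the shape geturl passes to requests.get."""
    return {"http": "http://" + address}


# Made-up rows standing in for the result of cur.fetchall().
sample_rows = [("10.0.0.1", "8080"), ("10.0.0.2", 3128), ("", "80")]
pool = rows_to_addresses(sample_rows)
print(pool)
print(to_requests_proxies(pool[0]))
```

Formatting the port with an f-string avoids the `TypeError` that plain string concatenation would raise if the database column were stored as an integer.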