Python Taobao Review Scraper

Size: 4KB

File type: .py

Coins: 1

Downloads: 0

Published: 2021-05-13
Language: Python
Tags: review scraping, Python


Resource description

A Python script I wrote myself that scrapes Taobao product reviews and also retrieves the product images.
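The part of the script that saves product images is not visible in the preview on this page, so the following is only a minimal sketch of how that step might look. It assumes that the image URLs may be protocol-relative (starting with `//`); the helper names `normalize_image_url`, `image_filename` and `download_image`, the base URL, and the numbered file-name scheme are all hypothetical and are not taken from the original file.

```python
import os
import urllib.request
from urllib.parse import urljoin, urlparse


def normalize_image_url(pic_url, base="https://s.taobao.com/"):
    """Turn a possibly protocol-relative URL (//host/path) into an absolute one.

    The base URL is an assumption; it only supplies the scheme when missing.
    """
    return urljoin(base, pic_url)


def image_filename(pic_url, index):
    """Build a local file name such as '003.jpg' from the URL's extension."""
    ext = os.path.splitext(urlparse(pic_url).path)[1] or ".jpg"
    return f"{index:03d}{ext}"


def download_image(pic_url, folder, index):
    """Download one image into folder and return the saved path (needs network)."""
    url = normalize_image_url(pic_url)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, image_filename(url, index))
    urllib.request.urlretrieve(url, path)
    return path
```

Keeping the URL handling separate from the download call means the first two helpers can be checked without any network access.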


Code snippet and file information

import requests
import re
import io
import sys
import os
import urllib.request
import time

# Re-wrap stdout so that Chinese text prints correctly on a GB18030 console
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='gb18030')
headers = {"User-Agent": "Mozilla/4.0(compatible;MSIE7.0;WindowsNT5.1;Trident/4.0;SE2.XmetaSr1.0;SE2.XmetaSr1.0;.NETCLR2.0.50727;SE2.XmetaSr1.0)"}
def getHTMLText(url):
    # Fetch the HTML of the product listing page; return "" on any request error
    try:
        r = requests.get(url, headers=headers, timeout=30)
        return r.text
    except requests.RequestException:
        return ""


def parsePage(url):
    # Build the list of products found on the listing page
    infoList = []
    html = getHTMLText(url)
    # print(html)
    plt = re.findall(r'view_price":"([\s\S]*?)"', html)
    tlt = re.findall(r'raw_title":"([\s\S]*?)"', html)
    clt = re.findall(r'view_sales":"([\s\S]*?)"', html)
    ilt = re.findall(r'nid":"([\s\S]*?)"', html)
    photolt = re.findall(r'pic_url":"([\s\S]*?)"', html)
    # zip stops at the shortest list, so a missing field cannot raise IndexError
    for price, title, customer, item_id, photo in zip(plt, tlt, clt, ilt, photolt):
        infoList.append([price, title, customer, item_id, photo])
    printGoodsList(infoList)
	
def GetComment(goods_filename, url, itemId):
    # Fetch the reviews of one product and save them in that product's folder
    web_data = requests.get(url, headers=headers)
    # Strip characters that are unsuitable for a folder name
    goods_filename = goods_filename.replace("?", "").replace("、", "").replace("\\", "").replace("*", "").replace("“", "").replace("”", "").replace("<", "").replace(">", "").replace("|", "").replace('/', '')
    # print(web_data.text)
    spuId = re.search(r'spuId=([0-9]*)', web_data.text).group(1)
    sellerId = re.search(r'sellerId=([0-9]*)', web_data.text).group(1)
    # The seller ID and the product IDs are now known
    time.sleep(3)
    comment_url = "https://rate.tmall.com/list_detail_rate.htm?itemId=" + itemId + "&spuId=" + spuId + "&sellerId=" + sellerId + "&order=3&currentPage=1&append=0&content=1"
    web_data = requests.get(comment_url, headers=headers)
    f = open('E:\\淘宝爬取内容\\' + goods_filename + "/pinglun.txt", "w")
    # print(web_data.text)
    try:
        comment_num = re.search(r'lastPage":([0-9]*)', web_data.text).group(1)
        # Number of review pages
        if int(comment_num) > 3:
            # Shown as range(13) in this preview; the preview has lost its commas, so range(1, 3) is also possible
            for n in range(13):
                comment_url = str("https://rate.tmall.com/list_detail_rate.htm?itemId=" + itemId + "&spuId=" + spuId + "&sellerId=" + sellerId + "&order=3&curren
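The preview breaks off while the per-page review URL is being built. As a rough illustration of that step only, the sketch below assembles the same query string with `urllib.parse.urlencode` instead of string concatenation. The parameter names and values (`itemId`, `spuId`, `sellerId`, `order=3`, `currentPage`, `append=0`, `content=1`) come from the first-page URL in the listing; the function names and the `max_pages` cap are hypothetical, and nothing here is confirmed to match the rest of the original file.

```python
from urllib.parse import urlencode

RATE_ENDPOINT = "https://rate.tmall.com/list_detail_rate.htm"


def build_comment_url(item_id, spu_id, seller_id, page=1):
    """Return the review-list URL for one page, using the listing's parameters."""
    params = {
        "itemId": item_id,
        "spuId": spu_id,
        "sellerId": seller_id,
        "order": 3,
        "currentPage": page,
        "append": 0,
        "content": 1,
    }
    return RATE_ENDPOINT + "?" + urlencode(params)


def comment_page_urls(item_id, spu_id, seller_id, last_page, max_pages=3):
    """Yield URLs for pages 1..min(last_page, max_pages) (max_pages is an assumed cap)."""
    for page in range(1, min(last_page, max_pages) + 1):
        yield build_comment_url(item_id, spu_id, seller_id, page)
```

Building the query from a dictionary keeps each `&name=value` pair explicit, which avoids the kind of damage seen in the preview, where `&currentPage` was turned into `¤tPage` by HTML entity decoding.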
