Resource Description
untitled2.py
Code Snippet and File Information
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 20 12:23:53 2018
@author: linzhenglai
"""
import time
import json
import xlwt
import requests
from bs4 import BeautifulSoup

# Requesting the page without headers returns 403 (no access permission),
# so browser-like request headers are sent with every request.

# Entry page
url = 'http://chs.meituan.com/'
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Proxy-Connection': 'keep-alive',
    'Host': 'chs.meituan.com',
    'Referer': 'http://chs.meituan.com/',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Content-Type': 'text/html;charset=utf-8',
    'Cookie': '_lxsdk_cuid=164c9bed44ac8-0bf488e0cbc5d9-5b193413-1fa400-164c9bed44bc8; __mta=248363576.1532393090021.1532393090021.1532393090021.1; rvct=70%2C1; ci=70; iuuid=30CB504DBAC7CCDD72645E3809496C48229D8143D427C01A5532A4DDB0D42388; cityname=%E9%95%BF%E6%B2%99; _lxsdk=30CB504DBAC7CCDD72645E3809496C48229D8143D427C01A5532A4DDB0D42388; _ga=GA1.2.1889738019.1532505689; uuid=2b2adb1787947dbe0888.1534733150.0.0.0; oc=d4TCN9aIiRPd6Py96Y94AGxfsjATZHPGsCDua9-Z_NQHsXDcp6WlG2x7iJpYzpSLttNvEucwm_D_SuJ7VRJkLcjqV6Nk8s_q3VyOJw5IsVJ6RJPL3qCgybGW3vxTkMHr9A4yChReTafbZ7f93F1PkCyUeFBQV4D-YXoVoFV5h3o; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; client-id=97664882-24cd-4743-b21c-d25de878708e; lat=28.189822; lng=112.97422; _lxsdk_s=165553df04a-bc8-311-ba7%7C%7C6'
}
# with open(r'美团西安美食.csv', 'w', newline='', encoding='UTF-8') as csvfile:
# Get the homepage source (to extract the category links: food, movies, ...)
# def get_start_links(url):
#     html = requests.get(url).text  # request the homepage and get its HTML
#     # print(html)
#     soup = BeautifulSoup(html, 'lxml')  # parse the page
#     # links = 'http://chs.meituan.com/meishi/
#     links = [link.find('span').find('a')['href'] for link in soup.find_all('span', class_='nav-text-wrapper')]
#     # print(links)
#     return links
# Get the shop ids embedded in each category page.
# find: returns the first match (a single element, or None if nothing matches);
# find_all: returns every match (a list, or [] if nothing matches).
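As a standalone sketch of the find/find_all contrast described above (using an inline HTML string rather than the Meituan page):

```python
from bs4 import BeautifulSoup

# Two links and no <span>, to exercise both the hit and the miss cases.
demo = BeautifulSoup('<div><a href="/a">1</a><a href="/b">2</a></div>', 'html.parser')

print(demo.find('a')['href'])                   # first match only: '/a'
print([a['href'] for a in demo.find_all('a')])  # every match: ['/a', '/b']
print(demo.find('span'))                        # no match: None
print(demo.find_all('span'))                    # no match: []
```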
def get_detail_id(category_url):
    html = requests.get(category_url, headers=headers).text
    # print(html)
    soup = BeautifulSoup(html, 'lxml')  # parse the page
    texts = soup.find_all('script')
    # print(texts)
    # The shop data sits as a JS assignment inside the 15th <script> tag.
    text = texts[14].get_text().strip()
    # print(text)
    text = text[19:-1]  # slice off the surrounding JS wrapper to leave bare JSON
    result1 = json.loads(text)
    # print(result1)
    result2 = result1['poiLists']
    result3 = result2['poiInfos']
    # print(result3)
    Info_list = []
    for it in result3:
        # print(it)
        Info_list.append(it['poiId'])
    return Info_list
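The script-tag extraction above can be exercised without the live site. In this sketch the `window._appState` wrapper and the `poiId` values are made up for illustration (the real page's script content differs), but the wrapper prefix is 19 characters long, so the same `[19:-1]` slice applies:

```python
import json

# A made-up <script> payload in the shape the scraper expects:
# a 19-character JS assignment prefix wrapping a JSON object.
script_text = 'window._appState = {"poiLists": {"poiInfos": [{"poiId": 111}, {"poiId": 222}]}};'

# Drop the assignment prefix and the trailing ';' to leave bare JSON,
# mirroring text[19:-1] in get_detail_id.
json_text = script_text[19:-1]
data = json.loads(json_text)

ids = [it['poiId'] for it in data['poiLists']['poiInfos']]
print(ids)  # [111, 222]
```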