Resource Description
A crawler (XPath-based) that batch-retrieves university expert information (by school or department) and saves each expert's name, research interests, email, etc. to a .csv file.

Code Snippet and File Information
import requests
import re
from lxml import etree
import json
import csv
# url = 'http://zsy.jlu.edu.cn/info/1021/8364.htm'  # single-page test URL
csvfile = open("吉林哲学.csv", "w", newline='', encoding='utf-8')
writer = csv.writer(csvfile)
# write the header row first: name, research interests, email
writer.writerow(["姓名", "研究方向", "邮箱"])
wangzhi_list = [
    'info/1022/5251.htm', 'info/1023/5199.htm', 'info/1021/5191.htm', 'info/1020/5093.htm',
    'info/1021/5261.htm', 'info/1022/5171.htm', 'info/1020/5092.htm', 'info/1021/5259.htm',
    'info/1022/5242.htm', 'info/1020/5061.htm', 'info/1020/5105.htm', 'info/1020/5103.htm',
    'info/1020/5098.htm', 'info/1020/5100.htm', 'info/1022/5265.htm', 'info/1022/5264.htm',
    'info/1023/5201.htm', 'info/1021/5189.htm', 'info/1022/5184.htm', 'info/1022/5268.htm',
    'info/1022/5269.htm', 'info/1022/5267.htm', 'info/1022/5270.htm', 'info/1021/5204.htm',
    'info/1022/5196.htm', 'info/1021/7131.htm', 'info/1023/5271.htm', 'info/1022/5198.htm',
    'info/1023/5182.htm', 'info/1021/5173.htm', 'info/1020/5106.htm', 'info/1021/7580.htm',
    'info/1020/5091.htm', 'info/1023/5200.htm', 'info/1021/8364.htm', 'info/1022/5197.htm',
    'info/1020/5099.htm', 'info/1020/5104.htm', 'info/1022/5266.htm', 'info/1021/5260.htm',
    'info/1020/5080.htm', 'info/1023/5202.htm', 'info/1022/5195.htm', 'info/1021/5193.htm',
    'info/1023/5177.htm', 'info/1020/5058.htm', 'info/1020/5079.htm', 'info/1020/5169.htm',
    'info/1021/5186.htm', 'info/1023/5172.htm', 'info/1023/7348.htm', 'info/1023/5174.htm',
    'info/1020/5053.htm', 'info/1023/7344.htm', 'info/1020/5095.htm', 'info/1021/5187.htm',
    'info/1020/5090.htm', 'info/1020/5097.htm', 'info/1023/5203.htm', 'info/1021/5181.htm',
    'info/1020/5096.htm', 'info/1020/5101.htm', 'info/1020/5081.htm', 'info/1021/5188.htm',
    'info/1020/5082.htm', 'info/1020/5258.htm', 'info/1020/5088.htm', 'info/1020/5085.htm',
    'info/1021/5178.htm', 'info/1020/5089.htm', 'info/1020/5059.htm', 'info/1020/5062.htm',
    'info/1021/5194.htm', 'info/1020/5094.htm', 'info/1020/5077.htm', 'info/1020/5087.htm',
    'info/1020/7448.htm', 'info/1021/5190.htm', 'info/1020/5108.htm', 'info/1020/5107.htm',
    'info/1021/5263.htm', 'info/1021/5262.htm', 'info/1020/5056.htm', 'info/1021/5179.htm',
    'info/1020/5066.htm', 'info/1020/5170.htm', 'info/1020/5102.htm', 'info/1020/8365.htm',
    'info/1020/5074.htm', 'info/1020/5180.htm', 'info/1020/5183.htm'
]
for wangzhi in wangzhi_list:
    url = 'http://zsy.jlu.edu.cn/' + wangzhi
    # url = 'http://zsy.jlu.edu.cn/info/1020/5061.htm'
    # url = 'https://teachers.jlu.edu.cn/pyjslb.jsp?totalpage=11&PAGENUM=' + str(i) + '&urltype=tsites.PinYinTeacherList&wbtreeid=1001&py=' + k + '&lang=zh_CN'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
    }
    response = requests.get(url, headers=headers, verify=False)
    data = response.content.decode()
    xpath_data = etree.HTML(data)
    links = xpath_data.xpath('//span/text()')
    print(links)
    try:
        # (the snippet is truncated here in the original archive; the body of
        # the try block that extracts each field is not included)
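The snippet breaks off inside the `try:` block, so the field-extraction step is missing. A minimal sketch of what that step might look like, assuming the page's `<span>` texts carry the name first and label the other fields with "研究方向" and "邮箱" (these labels are assumptions taken from the CSV header, not from the actual page markup):

```python
import re

def parse_expert(span_texts):
    """Pull name, research interests, and email out of a profile page's
    <span> texts. The label conventions here are assumptions; the real
    page layout may differ."""
    name = span_texts[0].strip() if span_texts else ''
    direction, email = '', ''
    for text in span_texts:
        # an email address can appear anywhere in the span texts
        m = re.search(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+', text)
        if m:
            email = m.group(0)
        if '研究方向' in text:
            # keep whatever follows the label, colon either half- or full-width
            direction = text.split(':')[-1].split(':')[-1].strip()
    return [name, direction, email]

# hypothetical span texts, for illustration only
row = parse_expert(['张三', '研究方向:外国哲学', '邮箱:zhangsan@jlu.edu.cn'])
print(row)  # → ['张三', '外国哲学', 'zhangsan@jlu.edu.cn']
```

Each returned row could then be passed to `writer.writerow(row)` inside the loop, matching the header written at the top of the script.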
Attr        Size      Date       Time  Name
----------- --------- ---------- ----- ----
File        8858      2019-05-11 12:27 爬虫代码实现\吉林哲学.csv
File        5290      2019-05-11 12:27 爬虫代码实现\研究方向.py
File        727       2019-05-11 11:34 爬虫代码实现\网址.py
File        143       2019-05-15 17:34 爬虫代码实现\说明.txt
Dir         0         2019-05-15 17:22 爬虫代码实现
----------- --------- ---------- ----- ----
            15018                      5
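The archive's 网址.py presumably builds the `wangzhi_list` of profile URLs from the department's faculty index page; it is not reproduced on this page. A minimal sketch, assuming the index page links each profile with a relative `info/.../NNNN.htm` href (the pattern is inferred from the hard-coded list above):

```python
import re

def extract_profile_links(html):
    """Collect relative faculty-profile links (info/.../NNNN.htm) from an
    index page's HTML. A simple href regex stands in for the XPath the
    original script may have used."""
    hrefs = re.findall(r'href="([^"]+)"', html)
    return [h for h in hrefs if h.startswith('info/') and h.endswith('.htm')]

# illustration with a stub of an index page
sample = ('<html><body><a href="info/1020/5061.htm">张三</a>'
          '<a href="news/123.htm">新闻</a></body></html>')
print(extract_profile_links(sample))  # → ['info/1020/5061.htm']
```

Harvesting the list this way avoids hand-maintaining the 90-odd URLs when the faculty roster changes.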