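Resource description
A script written with selenium for batch-downloading from Google Scholar. Before using it, download and configure the WebDriver for Firefox.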
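The prerequisite above can be checked before the script is run. The following is only a minimal Python 3 sketch: it assumes the Firefox driver is the `geckodriver` executable and that it is expected on `PATH`. `find_driver` is a hypothetical helper, not part of the original script.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable on PATH, or None."""
    # Assumption: the driver is looked up by name on PATH.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("geckodriver was not found on PATH")
    else:
        print("geckodriver found at", path)
```

If the check reports that nothing was found, the directory holding the driver probably still needs to be added to `PATH`.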
Code snippet and file information
from selenium import webdriver
import time
import requests
import re
import sys

# Python 2 only: force UTF-8 as the default string encoding.
reload(sys)
sys.setdefaultencoding('utf-8')


def getpdfurl(tstr):
    # Collect the unique href targets in tstr that end in "pdf".
    # repdflink = r'href="(.+?pdf)"'
    repdflink = r'href="([\S]+pdf)'
    pdflink = re.findall(repdflink, tstr)
    pdflink = list(set(pdflink))
    return pdflink


def writefile(filepath, text):
    # Overwrite filepath with text.
    file_object = open(filepath, 'w')
    file_object.write(text)
    file_object.close()


def googlescholararticle(qword, recordnum):
    # Load one page of Google Scholar results in Firefox and read its HTML source.
    driver = webdriver.Firefox()
    googlescholarurl = 'https://scholar.google.com/scholar'
    url = googlescholarurl + '?start=' + str(recordnum) + '&q=' + qword
    driver.get(url)
    data = driver.page_source
    driver.quit()
    # writefile('res.txt', '\n'.join(getpdfurl(data)))
    return getpd
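The two browser-independent steps in the snippet, building the results URL and pulling PDF links out of the page source, can be tried without launching Firefox. The following is a minimal Python 3 sketch and not the author's code: `build_scholar_url` and `extract_pdf_links` are hypothetical names, the query is percent-encoded with `urlencode` (the original concatenates it raw), and the pattern is stricter than the original because it stops at the closing quote.

```python
import re
from urllib.parse import urlencode

# Same base URL as in the snippet above.
SCHOLAR_URL = "https://scholar.google.com/scholar"


def build_scholar_url(qword, recordnum):
    """Return the results-page URL with the query string percent-encoded."""
    return SCHOLAR_URL + "?" + urlencode({"start": recordnum, "q": qword})


def extract_pdf_links(html):
    """Return the sorted, de-duplicated href values that end in 'pdf'."""
    # Stop at the closing quote so a match cannot run into the next attribute.
    return sorted(set(re.findall(r'href="([^"\s]+?pdf)"', html)))


if __name__ == "__main__":
    print(build_scholar_url("deep learning", 10))
    sample = '<a href="http://a.org/x.pdf">x</a> <a href="/y.html">y</a>'
    print(extract_pdf_links(sample))
```

Encoding the query keeps search terms that contain spaces or non-ASCII characters from producing a malformed URL, and sorting the result makes the output order stable between runs.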