Resource overview
Uses selenium + pyquery to scrape job postings from Lagou (拉勾网) and imports the scraped data into a MySQL database.
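The parse-and-import step mentioned above is not part of the visible excerpt, so the following is only a minimal sketch of how it might look. The function names (build_insert, parse_jobs, save_jobs), the column names, and the data-* attribute names are all hypothetical; only the `#s_position_list .item_con_list .con_list_item` selector comes from the listing itself.

```python
import re

# Column names are hypothetical; the original table layout is not shown.
COLUMNS = ("position", "company", "salary")


def build_insert(table, columns=COLUMNS):
    """Return a parameterized INSERT statement using pymysql's %s placeholders."""
    # Table and column names cannot be bound as parameters, so validate them.
    for name in (table, *columns):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    cols = ", ".join(f"`{c}`" for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({cols}) VALUES ({marks})"


def parse_jobs(html):
    """Extract one tuple per job card from a results page using pyquery."""
    from pyquery import PyQuery as pq  # third-party, imported lazily

    doc = pq(html)
    rows = []
    for item in doc("#s_position_list .item_con_list .con_list_item").items():
        # The data-* attribute names are assumptions about the page markup.
        rows.append((
            item.attr("data-positionname"),
            item.attr("data-company"),
            item.attr("data-salary"),
        ))
    return rows


def save_jobs(rows, table, host, user, password, db):
    """Insert the parsed rows into MySQL in a single batch."""
    import pymysql  # third-party, imported lazily

    conn = pymysql.connect(host=host, user=user, password=password,
                           database=db, charset="utf8mb4")
    try:
        with conn.cursor() as cur:
            cur.executemany(build_insert(table), rows)
        conn.commit()
    finally:
        conn.close()
```

Row values are passed as bound parameters, while the table name is checked against a strict pattern because SQL identifiers cannot be parameterized. A call might look like save_jobs(parse_jobs(browser.page_source), TableName, host, user, password, db), assuming the target table already exists with matching columns.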
Code snippet and file information
import re
import time
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq
# from config import *
import pymysql

browser = webdriver.Chrome()
wait = WebDriverWait(browser, 10)

# Search keyword and MySQL connection settings
key_word = 'python爬虫'
host = "localhost"
user = "root"
password = "******"
db = "lagou"
TableName = 'shenzhen'

# Position of each city in the hot-city filter (used with :nth-child below)
sitys = {'beijing': '1', 'shanghai': '2', 'shenzhen': '3', 'guangzhou': '4', 'hangzhou': '5',
         'chengdou': '6', 'nanjing': '7', 'wuhan': '8', 'xian': '9', 'xiamen': '10'}
key_sity = 'guangzhou'


def search():
    try:
        url = 'https://www.lagou.com/'
        browser.get(url)
        # Close the pop-up dialog shown on the home page
        close_submit = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#cboxClose')))
        close_submit.click()
        search_input = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#search_input')))
        submit = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#search_button')))
        time.sleep(1)
        search_input.clear()
        search_input.send_keys(key_word)
        submit.click()
        # Pick the target city from the hot-city filter
        city_select = wait.until(EC.element_to_be_clickable((
            By.CSS_SELECTOR,
            '#filterCollapse > div:nth-child(1) > div.choose-detail > li > div.other-hot-city > div > a:nth-child(%s)' %
            sitys[key_sity])))
        city_select.click()
        total_page = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#s_position_list > div.item_con_pager > div > span:nth-child(5)')))
        job_num = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#tab_pos > span')))
        return total_page.text, job_num.text
    except TimeoutException as e:
        # WebDriverWait.until raises selenium's TimeoutException, not the built-in TimeoutError
        print(e)
        return search()


def get_html():
    # Wait until the job list has rendered, then return the page source
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#s_position_list .item_con_list .con_list_item')))
    html = browser.page_source
    return html


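def next_page():
    counter = 1
    get_products()  # defined elsewhere in the full file; not shown in this excerpt
    # The start of this pattern is elided ('···') in the excerpt
    pattern = re.compile('···.*?"pager_not_current">(.*?)', re.S)
    total_page = re.findall(pattern, get_html())[0].strip()
    try:
        f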
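
In the listing above, search() retries by calling itself whenever a wait times out, so a page that never loads would recurse without limit. One possible alternative, shown here only as a sketch, is a small helper with a fixed attempt budget. The name retry and its parameters are hypothetical and do not appear in the original file.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempt budget is spent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)  # brief pause before the next try
```

With Selenium this could be used as retry(do_search, exceptions=(TimeoutException,)), where do_search is a hypothetical version of search() without the recursive call and TimeoutException is imported from selenium.common.exceptions.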