Resource Overview
Uses selenium and pyquery to scrape job postings from Lagou (拉勾网) and imports the scraped records into a MySQL database.
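The snippet listed on this page is cut off before the storage step, so the following is only a minimal sketch of how scraped rows might be written to MySQL with pymysql. The helper names `build_insert_sql` and `save_jobs`, the column names, and the connection settings are assumptions made for illustration, not the author's code.

```python
import re


def build_insert_sql(table, columns):
    """Build a parameterized INSERT statement using pymysql's %s placeholders.

    Table and column names cannot be passed as query parameters, so they are
    checked against a conservative identifier pattern instead.
    """
    for name in [table, *columns]:
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")
    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders})"


def save_jobs(rows, table, columns, **connect_kwargs):
    """Insert rows (a list of tuples) into an existing table; return the row count."""
    import pymysql  # imported here so build_insert_sql works without the driver

    sql = build_insert_sql(table, columns)
    connect_kwargs.setdefault("charset", "utf8mb4")  # keep Chinese text intact
    connection = pymysql.connect(**connect_kwargs)
    try:
        with connection.cursor() as cursor:
            cursor.executemany(sql, rows)
        connection.commit()
    finally:
        connection.close()
    return len(rows)
```

With the settings from the listing, a call might look like `save_jobs(rows, "shenzhen", ["title", "company", "salary"], host="localhost", user="root", password=password, database="lagou")`, assuming a table with those hypothetical columns already exists.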
Code Snippet and File Information
import re
import time
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from pyquery import PyQuery as pq
# from config import *
import pymysql

browser = webdriver.Chrome()
wait = WebDriverWait(browser, 10)  # wait up to 10 seconds for each condition

# Search keyword ("Python crawler") and MySQL connection settings
key_word = 'python爬虫'
host = "localhost"
user = "root"
password = "******"
db = "lagou"
TableName = 'shenzhen'

# Position of each city's link in the hot-city filter, used with nth-child
sitys = {'beijing': '1', 'shanghai': '2', 'shenzhen': '3', 'guangzhou': '4', 'hangzhou': '5',
         'chengdou': '6', 'nanjing': '7', 'wuhan': '8', 'xian': '9', 'xiamen': '10'}
key_sity = 'guangzhou'


def search():
    try:
        url = 'https://www.lagou.com/'
        browser.get(url)
        # Close the pop-up dialog shown when the home page loads
        close_submit = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#cboxClose')))
        close_submit.click()
        # Type the keyword into the search box and submit
        search_input = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#search_input')))
        submit = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#search_button')))
        time.sleep(1)
        search_input.clear()
        search_input.send_keys(key_word)
        submit.click()
        # Filter the results by the chosen city
        city_select = wait.until(EC.element_to_be_clickable((
            By.CSS_SELECTOR,
            '#filterCollapse > div:nth-child(1) > div.choose-detail > li > div.other-hot-city > div > a:nth-child(%s)'
            % sitys[key_sity])))
        city_select.click()
        total_page = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#s_position_list > div.item_con_pager > div > span:nth-child(5)')))
        job_num = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#tab_pos > span')))
        return total_page.text, job_num.text
    except TimeoutException:
        # WebDriverWait.until raises selenium's TimeoutException, not the built-in TimeoutError
        print('Timed out waiting for the page, retrying search()')
        return search()


def get_html():
    # Wait until at least one job item is present, then return the page source
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#s_position_list .item_con_list .con_list_item')))
    html = browser.page_source
    return html


def next_page():
    counter = 1
    get_products()
    pattern = re.compile('···.*?"pager_not_current">(.*?)', re.S)
    total_page = re.findall(pattern, get_html())[0].strip()
    try:
        f
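`search()` above is written to retry by calling itself when a wait times out, which recurses without limit if the page never loads. One possible alternative, sketched here under assumptions, is a bounded retry loop. The helper `retry` and its parameters are hypothetical and not part of the original file; with Selenium one would pass `TimeoutException` from `selenium.common.exceptions` as the exception type.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call func() until it succeeds or `attempts` tries have been used.

    Re-raises the last caught exception once the attempts run out.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)  # brief pause before the next try
```

A call such as `retry(search_once, attempts=3, exceptions=(TimeoutException,))` would then give up after three timeouts, where `search_once` stands for a hypothetical version of `search()` without the recursive call.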