Scraping job listings from Zhaopin (智联招聘) with Python 3.x

Size: 6KB

File type: .rar

Published: 2021-06-08
Language: Python
Tags: python, web scraping, Zhaopin (智联招聘)

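Description

A Python 3.x script that scrapes job listings from the Zhaopin (智联招聘) recruitment site. It should come in handy if you are looking to change jobs.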

Code snippet and file information

#-*- coding: utf-8 -*-
import re
import csv
import requests
from tqdm import tqdm
from urllib.parse import urlencode
from requests.exceptions import RequestException

def get_one_page(city, keyword, region, page):
    '''
    Fetch one page of search results and return its HTML
    '''
    paras = {
        'jl': city,         # city to search in
        'kw': keyword,      # search keyword
        'isadv': 0,         # whether to enable the advanced search options
        'isfilter': 1,      # whether to filter the results
        'p': page,          # page number
        're': region        # short for region (district); 2005 is Haidian, 2061 is Wuhan
    }

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        'Host': 'sou.zhaopin.com',
        'Referer': 'https://www.zhaopin.com/',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9'
    }

    url = 'https://sou.zhaopin.com/jobs/searchresult.ashx?' + urlencode(paras)
    try:
        # Request the page and get the HTML back
        response = requests.get(url, headers=headers)
        # Use the status code to check whether the request succeeded
        if response.status_code == 200:
            return response.text
        return None
    except RequestException:
        return None
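
The query string that get_one_page appends to the search URL can be inspected without sending a request. The following is a minimal sketch using only the standard library; the parameter values are hypothetical examples chosen for illustration, and the search endpoint itself may no longer accept them.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical example values; only the parameter names come from the snippet.
params = {
    'jl': '武汉',           # city
    'kw': 'python工程师',   # keyword
    'isadv': 0,
    'isfilter': 1,
    'p': 1,                 # page number
    're': 2061,             # region code
}

# urlencode percent-encodes non-ASCII text as UTF-8 and joins pairs with '&'.
query = urlencode(params)
url = 'https://sou.zhaopin.com/jobs/searchresult.ashx?' + query
print(url)

# parse_qs reverses the encoding, which is a quick way to check the result.
decoded = parse_qs(query)
print(decoded['jl'], decoded['kw'])
```

Numbers are converted to strings during encoding, so the decoded values all come back as strings.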

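def parse_one_page(html):
    '''
    Parse the HTML and yield the useful fields
    '''
    # Parse with a regular expression.
    # NOTE: the HTML tags inside this pattern, and in the two replace() calls
    # below, were stripped when this page was extracted. The tag and class
    # names used here are a reconstruction and should be treated as assumptions.
    pattern = re.compile('<a style=.*? target="_blank">(.*?)</a>.*?'        # job title
        '<td class="gsmc"><a href="(.*?)" target="_blank">(.*?)</a>.*?'     # company URL and company name
        '<td class="zwyx">(.*?)</td>', re.S)                                # monthly salary

    # Find every match in the page
    items = re.findall(pattern, html)

    for item in items:
        job_name = item[0]
        job_name = job_name.replace('<b>', '')
        job_name = job_name.replace('</b>', '')
        yield {
            'job': job_name,
            'website': item[1],
            'company': item[2],
            'salary': item[3]
        }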
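
To show how a multi-group pattern with re.S turns a results page into dictionaries, here is a small self-contained sketch. The HTML fragment, its class names, and the helper name parse are all invented for this example; the real Zhaopin markup is not shown in the snippet and has very likely changed since the script was written.

```python
import re

# Hypothetical markup, made up for illustration only.
sample_html = '''
<tr>
  <td class="job"><a href="/job/1" target="_blank"><b>Python</b> Engineer</a></td>
  <td class="company"><a href="http://example.com/acme" target="_blank">Acme</a></td>
  <td class="salary">10000-15000</td>
</tr>
'''

# re.S lets '.' match newlines, so one pattern can span several table cells.
pattern = re.compile(
    r'<td class="job"><a href=".*?" target="_blank">(.*?)</a>.*?'        # job title
    r'<td class="company"><a href="(.*?)" target="_blank">(.*?)</a>.*?'  # company URL and name
    r'<td class="salary">(.*?)</td>',                                    # salary
    re.S,
)

def parse(html):
    """Yield one dict per listing found in html (hypothetical helper)."""
    for job, website, company, salary in pattern.findall(html):
        # Drop the highlight tags wrapped around the matched keyword.
        job = job.replace('<b>', '').replace('</b>', '')
        yield {'job': job, 'website': website, 'company': company, 'salary': salary}

rows = list(parse(sample_html))
print(rows)
```

Regular expressions are brittle against markup changes; an HTML parser would usually be more robust, but the sketch keeps to the approach the snippet uses.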

def write_csv_file(path, headers, rows):
    '''
    Write the header row and the data rows to a CSV file
    '''
    # Set the encoding so that writing Chinese text does not raise an error
    # newline='' prevents an extra blank line after every row
    with open(path, 'a', encoding='gb18030', newline='') as f:
        f_csv = csv.DictWriter(f, headers)
        f_csv.writeheader()
        f_csv.writerows(rows)

def write_csv_headers(path, headers):
    '''
    Write the header row
    '''
    with open(path, 'a', encoding='gb18030', newline='') as f:
        f_csv = csv.DictWriter(f, headers)
        f_csv.writeheader()

def write_csv_rows(path, headers, rows):
    '''
    Write the data rows
    '''
    with open(path, 'a', encoding='gb18030', newline='') as f:
        f_csv = csv.DictWriter(f, headers)
        f_csv.writerows(rows)
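
Splitting the header and row writers apart makes it possible to write the header once and then append one batch of rows per page. A minimal sketch of that pattern follows, assuming a temporary file and made-up rows; it also reads the file back to confirm that the gb18030 encoding round-trips Chinese text.

```python
import csv
import os
import tempfile

fieldnames = ['job', 'website', 'company', 'salary']
# Temporary location so the example leaves no file in the working directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# Write the header exactly once.
with open(path, 'a', encoding='gb18030', newline='') as f:
    csv.DictWriter(f, fieldnames).writeheader()

# Made-up rows standing in for the results of two pages.
pages = [
    [{'job': 'Python工程师', 'website': 'http://example.com/a',
      'company': 'A公司', 'salary': '8000-12000'}],
    [{'job': '爬虫工程师', 'website': 'http://example.com/b',
      'company': 'B公司', 'salary': '10000-15000'}],
]

# Append each page's rows; mode 'a' keeps what was written earlier.
for rows in pages:
    with open(path, 'a', encoding='gb18030', newline='') as f:
        csv.DictWriter(f, fieldnames).writerows(rows)

# Read the file back with the same encoding.
with open(path, encoding='gb18030', newline='') as f:
    loaded = list(csv.DictReader(f))

print(len(loaded), loaded[0]['job'])
```

Because the files are opened in append mode, running the header step twice against the same path would produce a second header line in the middle of the data.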

def main(city, keyword, region, pages):
    '''
    Main function
    '''
    filename = 'zl_' + city + '_' + keyword + '.csv'
    headers = ['job', 'website', 'company', 'salary']
    write_csv_headers(filename, headers)
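
The preview of main stops after the header is written, so the rest of the function is not shown. The following is only a sketch of how a page loop could tie the pieces together, not the author's code. The function name crawl, its callable parameters, and the assumption that page numbers start at 1 are all hypothetical; the fetch, parse, and write steps are passed in so the loop can be exercised with stubs instead of network requests.

```python
def crawl(pages, fetch_page, parse_page, write_rows):
    """Fetch pages 1..pages, parse each one, and hand its rows to write_rows.

    Returns the total number of rows written. All names here are
    hypothetical and not taken from the original script.
    """
    total = 0
    for page in range(1, pages + 1):   # assumes page numbering starts at 1
        html = fetch_page(page)
        if html is None:               # failed request: skip this page
            continue
        rows = list(parse_page(html))
        write_rows(rows)
        total += len(rows)
    return total

# Stubs standing in for the network and the CSV file.
fake_pages = {1: 'a,b', 2: None, 3: 'c'}
collected = []

total = crawl(
    3,
    fake_pages.get,                                        # returns None for page 2
    lambda html: ({'job': name} for name in html.split(',')),
    collected.extend,
)
print(total, collected)
```

In the script itself, the three callables would presumably wrap get_one_page, parse_one_page, and write_csv_rows, and the imported tqdm could wrap the range to show a progress bar.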

Attribute        Size  Date       Time   Name
----------- ---------  ---------- -----  ----
File            14712  2018-06-16 21:18  智联招聘\zl_530_python工程师.csv
File            63708  2018-06-16 21:28  智联招聘\zl_武汉_python工程师.csv
File             3770  2018-06-16 21:32  智联招聘\智联招聘.py
Directory           0  2018-06-16 21:33  智联招聘
----------- ---------  ---------- -----  ----
                82190                    4
