Python-based job site analysis: source code and scraped data.zip

Size: 4.98 MB

File type: .zip

Coins: 2

Downloads: 0

Published: 2023-11-10
Language: Python
Tags: python, recruitment project


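Resource description

Everyday use of job sites shows that they give no big-picture view of which talent and technologies the IT market currently demands.
This project uses a Python crawler to scrape big-data job postings from a large mainstream job site, analyzes them on the back end, and presents the main market demands as a rose chart, a funnel chart, and a map.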
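
As a rough illustration of the analysis step described above, the sketch below counts postings per city and per education level and shapes the counts into (name, value) pairs, the kind of input that pie, rose, and funnel charts are typically built from. The sample rows, the field names `city` and `education`, and the helper `count_by` are assumptions made for illustration; none of them come from the project's source.

```python
from collections import Counter

# Hypothetical sample rows; the real project works from its scraped Excel data.
rows = [
    {"city": "Shanghai", "education": "Bachelor"},
    {"city": "Beijing", "education": "Bachelor"},
    {"city": "Shanghai", "education": "Master"},
    {"city": "Shenzhen", "education": "College"},
    {"city": "Shanghai", "education": "Bachelor"},
]


def count_by(records, field):
    """Return (name, count) pairs for one field, most frequent first."""
    # Skip records where the field is missing or empty.
    counts = Counter(r[field] for r in records if r.get(field))
    return counts.most_common()


city_pairs = count_by(rows, "city")
education_pairs = count_by(rows, "education")
print(city_pairs)
print(education_pairs)
```

Ordering the pairs from most to least frequent is one reasonable choice for a funnel chart, where the widest band usually sits on top; a map or rose chart does not depend on the order.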


Code snippet and file information

import urllib.request
import xlwt              # write Excel files with the xlwt module
import re                # regular expressions
import urllib.parse      # URL parsing, joining, encoding, and decoding
import time              # time utilities
# Request headers that imitate a browser
header = {
    'Host': 'search.51job.com',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Mobile Safari/537.36'
}
def getfront(page, item):       # page is the page number, item is the search keyword
    result = urllib.parse.quote(item)   # percent-encode the keyword first
    ur1 = result + ',2,' + str(page) + '.html'
    ur2 = 'https://search.51job.com/list/000000,000000,0000,00,9,99,'
    res = ur2 + ur1   # join the two parts into the full URL
    a = urllib.request.urlopen(res)
    html = a.read().decode('gbk')   # read the page source and decode it from GBK
    return html
def getInformation(html):
    # re.compile builds a reusable pattern object; re.S lets "." match line breaks as well.
    # NOTE: the HTML tags inside this pattern were lost on this page, so it is reproduced as shown.
    reg = re.compile(r'class="t1 ">.*? tle="(.*?)" href="(.*?)".*? tle="(.*?)" href="(.*?)".*?(.*?).*?(.*?).*?(.*?).*?', re.S)
    items = re.findall(reg, html)
    return items
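# Create the workbook
excel1 = xlwt.Workbook()
# Add a sheet; cell_overwrite_ok=True avoids the error raised when a cell is written more than once
sheet1 = excel1.add_sheet('Job', cell_overwrite_ok=True)
sheet1.write(0, 0, '序号')        # row number
sheet1.write(0, 1, '职位')        # job title
sheet1.write(0, 2, '公司名称')    # company name
sheet1.write(0, 3, '公司地点')    # company location
sheet1.write(0, 4, '公司性质')    # company nature
sheet1.write(0, 5, '薪资')        # salary
sheet1.write(0, 6, '学历要求')    # education requirement
sheet1.write(0, 7, '工作经验')    # work experience
sheet1.write(0, 8, '公司规模')    # company size
sheet1.write(0, 9, '公司类型')    # company type
sheet1.write(0, 10, '公司福利')   # company benefits
sheet1.write(0, 11, '发布时间')   # publish date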

number = 1    # index of the next record to save to the Excel sheet
item = input('Enter a job keyword (a big-data-related role): ')
for j in range(1, 10000):   # page range; adjust as needed
    try:
        print("Scraping page " + str(j) + " ...")
        html = getfront(j, item)      # fetch the page source
        for i in getInformation(html):
            try:
                '''
                i[0]: job title
                i[1]: job URL
                i[2]: company name
                i[4]: company location
                i[5]: salary
                i[6]: publish date
                company[0][0]: company nature
                job_need[2][0]: education requirement
                job_need[1][0]: work experience
                company[0][1]: company size
                company[0][2]: company type
                welfare: company benefits
                '''
                url1 = i[1]          # job URL
                res1 = urllib.request.urlopen(url1).read().decode('gbk')
                company = re.findall(re.compile(r'.*?tle="(.*?)">.*?tle="(.*?)">.*?tle="(.*?)">.*?', re.S), res1)
                job_need = re.findall(re.compile(r'.*?  |  (.*?)  |  (.*?)  |  .*?', re.S), res1)
                welfare = re.findall(re.compile(r'
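
To make the two helper steps in the snippet above easier to follow, here is a small offline sketch: it percent-encodes a search keyword with `urllib.parse.quote` and pulls fields out of HTML with a `re.S` pattern. The base URL `https://example.com/list/`, the sample HTML, and the names `build_url` and `parse_jobs` are all hypothetical; the markup is invented for the demonstration and is not 51job's real page structure.

```python
import re
import urllib.parse

BASE_URL = "https://example.com/list/"  # hypothetical endpoint, not the real site


def build_url(keyword, page):
    """Percent-encode the keyword and append the page number."""
    return BASE_URL + urllib.parse.quote(keyword) + "," + str(page) + ".html"


# Invented markup; the tag and class names are assumptions for this demo.
SAMPLE_HTML = """
<p class="t1">
  <a title="Data Engineer" href="https://example.com/job/1">Data Engineer</a>
</p>
<p class="t1">
  <a title="Hadoop Developer"
     href="https://example.com/job/2">Hadoop Developer</a>
</p>
"""

# re.S lets "." cross line breaks, so one match can span several lines of markup.
JOB_PATTERN = re.compile(r'class="t1">.*?title="(.*?)"\s+href="(.*?)"', re.S)


def parse_jobs(html):
    """Return a list of (title, url) tuples found in the HTML."""
    return JOB_PATTERN.findall(html)


print(build_url("大数据", 1))
for title, url in parse_jobs(SAMPLE_HTML):
    print(title, url)
```

Running the sketch against a fixed string keeps it independent of the network and of any changes to a live site's markup. For anything beyond a quick experiment, an HTML parser is generally less brittle than a regular expression.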


 Attribute        Size  Date        Time   Name
 ---------  ----------  ----------  -----  ----
 File           752640  2020-01-08  17:48  51job2.xls
 File             4677  2020-01-07  16:03  51job_view.py
 File             6796  2020-01-06  14:49  51job_view2.py
 File              646  2020-01-03  20:57  chowbai_error_info.txt
 File          3671828  2020-01-08  17:48  大数据城市需求分布图.html (map of big-data demand by city)
 File          3649615  2020-01-08  17:48  学历要求饼图.html (pie chart of education requirements)
 File          3647659  2020-01-08  17:48  工作经验要求漏斗图.html (funnel chart of work-experience requirements)
 File          3109888  2020-01-08  17:10  51job.xls

 

						
							
								