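Resource overview
Scrapes information from the Renrendai (人人贷) website; this release updates the earlier version of the code.
Code snippet and file information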
# -*- coding: utf-8 -*-
"""
Created on Mon Aug 13 11:10:39 2018
@author: 95647
"""
from selenium import webdriver
import time
import json
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from selenium.webdriver.firefox.options import Options  # options for the headless browser login
import requests
from pandas import DataFrame
import threading

time_start = time.perf_counter()  # time.clock() was removed in Python 3.8
#driver = webdriver.PhantomJS(executable_path=r'''C:\Users\95647\Desktop\小工具\phantomjs-2.1.1-windows\bin\phantomjs.exe''')
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}
# headers copied from the browser's developer tools (F12)
# define functions to parse the website
# use the credentials of an account you have registered
username = "******"  # username
password = u"*****"  # password
# driver = webdriver.Firefox()
# use a headless browser to log in
options = Options()
options.add_argument('-headless')
driver = webdriver.Firefox(options=options)  # log in with headless Firefox
def LoginRRD(username, password):
    try:
        print(u'Ready to log in to the Renrendai website...')
        driver.get("https://www.renrendai.com/login")
        # Selenium 3 locator API; Selenium 4 uses driver.find_element(By.CLASS_NAME, ...)
        login_in_pswd = driver.find_element_by_class_name("tab-password")  # switch to password login
        login_in_pswd.click()
        time.sleep(2)
        driver.find_element_by_id("login_username").send_keys(username)
        time.sleep(0.5)
        driver.find_element_by_id("J_pass_input").send_keys(password)
        time.sleep(0.5)
        driver.find_element_by_xpath(r"""/html/body/div[2]/div/div/div[2]/div[2]/div/div[1]/button""").click()
        time.sleep(2)  # wait for the user home page to load; scraping right away reports "not logged in"
        print(u'Login successful!')
    except Exception as e:
        print("Error:", e)
    finally:
        print(u'End Login!\n')
loanid_e = []  # loan IDs whose pages could not be parsed

def parse_userinfo(loanid, idx):  # parse the borrower information for one loan
    # global login_status
    global loanid_e
    login_status = False
    urll = "https://www.renrendai.com/loan-%s.html" % str(loanid)
    driver.get(urll)
    html = BeautifulSoup(driver.page_source, 'lxml')
    # f = open("htm%s.txt" % idx, "w")
    # f.write(html.decode("utf-8").replace('\xa9', "@"))
    # f.close
    info = html.findAll('div', class_="loan-user-info")  # the site renames this class frequently
    try:
        userinfo = {}
        items = info[0].findAll('span', {"class": "pr20"})
    except IndexError:  # no matching div on the page
        loanid_e.append(loanid)
    else:
        for item in items:
            var = item.get_text()  # label text inside the span
            value = item.parent.text.replace(var, "")  # rest of the parent's text is the value
            userinfo[var] = value
        data = pd.DataFrame(userinfoi
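
The label/value extraction in parse_userinfo can be tried offline, without a browser or a login. The following is a minimal sketch, not the author's code: it swaps BeautifulSoup for the standard-library html.parser so it runs with no extra packages, and the HTML fragment, the labels Age and Education, and the names LabelValueParser and parse_label_values are all made up for illustration. The real page markup may differ.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collects {label: value} pairs from <span class="pr20">label</span>value."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._in_label = False  # currently inside a label span
        self._label = None      # label still waiting for its value text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "pr20" in classes:
            self._in_label = True
            self._label = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_label = False

    def handle_data(self, data):
        if self._in_label:
            self._label += data
        elif self._label is not None and data.strip():
            # The first non-blank text after the label span is taken as the value.
            self.pairs[self._label.strip()] = data.strip()
            self._label = None


def parse_label_values(html):
    """Return the label/value pairs found in an HTML string."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical fragment (assumption): mimics the structure the scraper looks for.
SAMPLE_HTML = """
<div class="loan-user-info">
  <div><span class="pr20">Age</span>35</div>
  <div><span class="pr20">Education</span>Bachelor</div>
</div>
"""

print(parse_label_values(SAMPLE_HTML))
```

Saving a copy of driver.page_source to disk and feeding it to a function like this is one way to check, after the site renames its classes, whether the selectors still match before running a full crawl.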