Scraping Qichacha (企查查) Data with Python (Python爬取企查查数据.pdf)

Document ID: 18631285
Uploaded: 2023-08-23
Format: PDF
Pages: 3
Size: 229.31KB

"Python爬取企查查数据.pdf" was shared by a member of 冰点文库 (Bingdian Wenku) and can be read online there.

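My work required data from Qichacha (企查查), so I wrote a scraper for it. This version modifies my earlier script so that it can scrape all of the data.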
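
Scraping "all of the data" means requesting every page of search results. The script's main loop appears to do this by iterating over page numbers `x` and building one URL per page. Below is a minimal standard-library sketch of that URL-building step. The endpoint `https://example.com/search` and the query parameter names `key` and `p` are hypothetical placeholders, because the real Qichacha search URL is truncated in the source.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real Qichacha search URL is truncated in the source.
BASE_URL = "https://example.com/search"


def page_urls(key_word: str, first_page: int, last_page: int) -> list[str]:
    """Return one search URL per page, from first_page to last_page inclusive.

    The query parameter names "key" (search keyword) and "p" (page number)
    are assumptions for illustration, not Qichacha's documented interface.
    """
    urls = []
    for page in range(first_page, last_page + 1):
        # urlencode percent-encodes non-ASCII keywords as UTF-8.
        query = urlencode({"key": key_word, "p": page})
        urls.append(f"{BASE_URL}?{query}")
    return urls
```

For example, `page_urls("新能源", 1, 1)` returns `["https://example.com/search?key=%E6%96%B0%E8%83%BD%E6%BA%90&p=1"]`. The script's `range(400, 500)` covers the same 100 pages as `page_urls(key_word, 400, 499)`. Each URL would then be passed to `craw()` in turn.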

The code is as follows:

# -*- coding: utf-8 -*-
import requests
import lxml  # not used directly; BeautifulSoup(..., 'lxml') needs it installed
from bs4 import BeautifulSoup
import xlwt


def craw(url, key_word):
    User_Agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0'
    headers = {
        # 'Host': ...,  # value missing from the source; requests sets Host from the URL
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Connection': 'keep-alive',
        'User-Agent': User_Agent,
        'Cache-Control': 'max-age=0',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate',
        'Referer': 'http://...',  # rest of the URL is missing from the source
    }
    response = requests.get(url, headers=headers, timeout=10)
    response.encoding = 'utf-8'  # set before reading response.text
    if response.status_code != 200:
        print(response.status_code)
        print('ERROR')
        return  # skip this page
    soup = BeautifulSoup(response.text, 'lxml')
    com_names = soup.find_all(class_='ma_h1')     # company names
    peo_names = soup.find_all(class_='a-blue')    # legal representatives
    peo_phones = soup.find_all(class_='m-t-xs')   # detail rows: phone, email, address, capital, date
    print('Scraping data; please do not open the Excel file')
    for i in range(len(com_names)):
        n = 1 + 3 * i          # row holding the phone number and email
        m = i + 2 * (i + 1)    # row holding the address (equals 3*i + 2)
        try:
            peo_phone = peo_phones[n].find(string=True).strip()
            com_place = peo_phones[m].find(string=True).strip()
            zhuceziben = peo_phones[3 * i].find(class_='m-l').get_text()   # registered capital
            chenglishijian = peo_phones[3 * i].contents[5].get_text()      # date of establishment
            email = peo_phones[n].contents[1].get_text()
        except Exception:
            print('exception')
            # Use blanks so every list stays the same length and Excel rows stay aligned.
            peo_phone = com_place = zhuceziben = chenglishijian = email = ''
        com_name_list.append(com_names[i].get_text())
        peo_name_list.append(peo_names[i].get_text() if i < len(peo_names) else '')
        peo_phone_list.append(peo_phone)
        com_place_list.append(com_place)
        zhuceziben_list.append(zhuceziben)
        chenglishijian_list.append(chenglishijian)
        email_list.append(email)


if __name__ == '__main__':
    # Module-level lists that craw() appends to, one entry per company
    com_name_list = []
    peo_name_list = []
    peo_phone_list = []
    com_place_list = []
    zhuceziben_list = []
    chenglishijian_list = []
    email_list = []
    key_word = input('Enter the keyword you want to search for: ')
    print('Searching, please wait...')
    for x in range(400, 500):
        if x == 1:  # never true for x in range(400, 500); kept as in the source
            url = r'http://...'  # rest of the URL is missing from the source
        # The rest of the loop body is missing from the source; it presumably
        # builds the URL for page x and calls craw(url, key_word).
    workbook = xlwt.Workbook(encoding='utf-8')
    sheet1 = workbook.add_sheet('xlwt', cell_overwrite_ok=True)
    # Excel style: bold Times New Roman
    style = xlwt.XFStyle()   # initialize the style
    font = xlwt.Font()       # create a font
    font.name = 'Times New Roman'
    font.bold = True         # bold
    style.font = font        # attach the font to the style
    # Pass the style when writing, e.g. sheet1.write(0, 1, 'xxxxx', style)
    print('Saving data; please do not open the Excel file')
    # Header row
    name_list = ['Company name', 'Legal representative', 'Contact number',
                 'Registered capital', 'Date of establishment', 'Company address', 'Company email']
    for cc in range(len(name_list)):
        sheet1.write(0, cc, name_list[cc], style)
    # One row per company
    for i in range(len(com_name_list)):
        sheet1.write(i + 1, 0, com_name_list[i], style)        # company name
        sheet1.write(i + 1, 1, peo_name_list[i], style)        # legal representative
        sheet1.write(i + 1, 2, peo_phone_list[i], style)       # contact number
        sheet1.write(i + 1, 3, zhuceziben_list[i], style)      # registered capital
        sheet1.write(i + 1, 4, chenglishijian_list[i], style)  # date of establishment
        sheet1.write(i + 1, 5, com_place_list[i], style)       # company address
        sheet1.write(i + 1, 6, email_list[i], style)           # email address
    # Save the Excel file; an existing file with the same name is overwritten
    workbook.save(r'E:\test.xls')
    print('Excel file saved successfully')

The result of running the code is as follows:

[Figure: execution result of the script]
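
The least obvious part of `craw()` is how it indexes the flat list of `m-t-xs` elements. Since `m = i + 2*(i + 1)` simplifies to `3*i + 2`, company `i` reads three elements:

- `3*i` for registered capital and founding date
- `1 + 3*i` for phone and email
- `3*i + 2` for the address

So each company uses three consecutive elements. That layout is inferred from the index arithmetic, not checked against Qichacha's HTML. The sketch below makes the grouping explicit, with plain Python lists standing in for the parsed elements. The helper names `row_indices` and `group_rows` are hypothetical.

```python
def row_indices(i: int) -> tuple[int, int, int]:
    """Return the m-t-xs indices the scraper reads for company i.

    Capital/date row: 3*i; phone/email row: n = 1 + 3*i; address row: m = i + 2*(i + 1).
    """
    n = 1 + 3 * i
    m = i + 2 * (i + 1)  # simplifies to 3*i + 2
    return 3 * i, n, m


def group_rows(rows: list, size: int = 3) -> list[list]:
    """Split a flat list into consecutive groups of `size` elements.

    With size=3, group i holds exactly the elements at row_indices(i).
    """
    if len(rows) % size != 0:
        # A partial trailing group means the page did not match the assumed layout.
        raise ValueError(f"expected a multiple of {size} rows, got {len(rows)}")
    return [rows[start:start + size] for start in range(0, len(rows), size)]
```

For example, `group_rows(["cap0", "tel0", "addr0", "cap1", "tel1", "addr1"])` returns `[["cap0", "tel0", "addr0"], ["cap1", "tel1", "addr1"]]`. If the layout assumption holds, looping over `group_rows(peo_phones)` keeps each company's fields together without hand-computed indices. Raising on a partial group also makes a layout change visible, instead of silently shifting every later row.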