First, a look at the result:
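Today's demo fetches a list of free proxy IPs. The post names IP海 as the source, but the code below actually requests the free list at kuaidaili.com.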
❝ Without further ado, here is the code. It is commented in detail, so just read through it! ❞
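```vba
Option Explicit

' Sleep (delay) function from the Windows API; PtrSafe is required on VBA7 (Office 2010 and later)
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

' Fetches the HTML source of one page of the free proxy list
Function 取得网页源码(Optional ByVal pages As Integer = 1) As String
    On Error GoTo er
    Dim iurl As String: iurl = "https://www.kuaidaili.com/free/inha/" & pages
    Dim strText As String
    ' Read the page source
    With CreateObject("WinHttp.WinHttpRequest.5.1") ' request object
        .Open "GET", iurl, False ' request parameters (synchronous GET)
        .send ' send the request
        ' Get the source
        strText = .responseText
    End With
    取得网页源码 = strText
    Exit Function
er:
    取得网页源码 = "Query failed: " & Err.Description
End Function

' Parses several pages and appends the table rows to the "IP地址池" worksheet
Sub 解析网页源码()
    Dim sht As Worksheet: Set sht = Worksheets("IP地址池")
    Dim p As Integer, i As Long, j As Long, k As Long, rw As Long
    Dim xmldocstr As String
    Dim HTMLDoc As Object, TDElements As Object, infotb As Object
    Dim heads As Object, Contents As Object, Content As Object
    sht.Range("A1:AA65536").ClearContents ' clear previous results
    ' Test run: fetch 5 pages
    For p = 1 To 5
        xmldocstr = 取得网页源码(p)
        ' Rough sanity check: a very short response means the request failed
        If Len(xmldocstr) < 100 Then Exit Sub
        ' Parse the HTML
        Set HTMLDoc = CreateObject("htmlfile")
        HTMLDoc.body.innerHTML = xmldocstr
        ' Locate the HTML table (second child of the element with id="list")
        Set TDElements = HTMLDoc.getElementById("list")
        If TDElements Is Nothing Then Exit Sub ' page layout not as expected
        Set infotb = TDElements.Children(1)
        ' Read the header row (thead > tr)
        Set heads = infotb.Children(0).Children(0)
        For j = 0 To heads.Cells.Length - 1
            ' Write the header into row 1
            sht.Cells(1, j + 1) = heads.Children(j).innerText
        Next j
        ' Read the body rows (tbody)
        Set Contents = infotb.Children(1)
        For i = 0 To Contents.Rows.Length - 1
            Set Content = Contents.Children(i)
            ' Find the last used row so the new row is appended below it
            rw = sht.Cells(sht.Rows.Count, 1).End(xlUp).Row
            For k = 0 To Content.Cells.Length - 1
                ' Write the cell contents
                sht.Cells(rw + 1, k + 1) = Content.Children(k).innerText
            Next k
            DoEvents ' keep Excel responsive during the loop
        Next i
        ' Delay in milliseconds; if page 2 cannot be fetched, increase this value
        Sleep 800
        DoEvents
    Next p
End Sub
```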
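The loop above writes one worksheet cell at a time. A common alternative in Excel VBA is to collect a whole table into a 2-D array and assign it to a range in a single write, which is usually much faster for longer lists. The sketch below assumes a late-bound MSHTML table element, like the one the code locates under `id="list"`; `TableToArray` and `DemoTableToArray` are hypothetical names, not part of the original post.

```vba
' Hypothetical helper (not from the original post): copies every row of an
' HTML table element (header and body rows alike) into a 1-based 2-D array.
' Assumption: no row has more cells than the first row; extra cells are dropped.
Function TableToArray(ByVal tbl As Object) As Variant
    Dim nRows As Long, nCols As Long, r As Long, c As Long
    Dim arr() As Variant
    nRows = tbl.Rows.Length
    If nRows = 0 Then Exit Function ' returns Empty for an empty table
    nCols = tbl.Rows.Item(0).Cells.Length
    ReDim arr(1 To nRows, 1 To nCols)
    For r = 0 To nRows - 1
        For c = 0 To tbl.Rows.Item(r).Cells.Length - 1
            If c < nCols Then arr(r + 1, c + 1) = tbl.Rows.Item(r).Cells.Item(c).innerText
        Next c
    Next r
    TableToArray = arr
End Function

' Hypothetical usage: parse a tiny table and write it to the sheet in one go
Sub DemoTableToArray()
    Dim doc As Object, data As Variant
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table><tr><th>IP</th><th>PORT</th></tr>" & _
                         "<tr><td>10.0.0.1</td><td>8080</td></tr></table>"
    data = TableToArray(doc.getElementsByTagName("table").Item(0))
    ' One range assignment instead of one write per cell
    Worksheets("IP地址池").Range("A1").Resize(UBound(data, 1), UBound(data, 2)).Value = data
End Sub
```

With this approach the `DoEvents` calls inside the inner loops are no longer needed, since the sheet is touched only once per page.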
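If you're interested in web scraping, feel free to add me on WeChat through the account backend or join the group chat, and let's discuss it together! Note: never let a crawler touch on privacy issues, and it's best to follow the Robots protocol (robots.txt).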
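Following the Robots protocol starts with reading the site's robots.txt, which by convention is served from the site root. The sketch below is a deliberately simplified check, assuming only the `User-agent: *` group matters and that `Disallow` values are plain path prefixes; it ignores `Allow` lines, wildcards, and grouped `User-agent` lines, all of which the full standard (RFC 9309) covers. `IsDisallowedByRobots` and `DemoRobotsCheck` are hypothetical names, and whether kuaidaili.com actually serves a robots.txt is not verified here.

```vba
' Minimal sketch (hypothetical helper): returns True if `path` matches a
' Disallow rule in the "User-agent: *" group of the given robots.txt text.
' Simplifications: prefix matching only; Allow lines and wildcards are ignored;
' each User-agent line starts a new group (grouped agents are not merged).
Function IsDisallowedByRobots(ByVal robotsText As String, ByVal path As String) As Boolean
    Dim lines As Variant, i As Long, ln As String
    Dim inStarGroup As Boolean, rule As String
    lines = Split(Replace(robotsText, vbCr, ""), vbLf)
    For i = LBound(lines) To UBound(lines)
        ln = Trim$(lines(i))
        ' Strip trailing comments
        If InStr(ln, "#") > 0 Then ln = Trim$(Left$(ln, InStr(ln, "#") - 1))
        If LCase$(Left$(ln, 11)) = "user-agent:" Then
            inStarGroup = (Trim$(Mid$(ln, 12)) = "*")
        ElseIf inStarGroup And LCase$(Left$(ln, 9)) = "disallow:" Then
            rule = Trim$(Mid$(ln, 10))
            ' An empty Disallow value means nothing is disallowed
            If Len(rule) > 0 And Left$(path, Len(rule)) = rule Then
                IsDisallowedByRobots = True
                Exit Function
            End If
        End If
    Next i
End Function

' Hypothetical usage: fetch robots.txt before crawling (network errors are not handled)
Sub DemoRobotsCheck()
    Dim robots As String
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "https://www.kuaidaili.com/robots.txt", False
        .send
        If .Status = 200 Then robots = .responseText
    End With
    If IsDisallowedByRobots(robots, "/free/inha/1") Then
        MsgBox "robots.txt disallows this path; skip crawling it."
    End If
End Sub
```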
Copyright © 2013 - 2025 Tencent Cloud. All Rights Reserved. 腾讯云 版权所有