2013年11月2日星期六

Extract Specific Link From Web Pages Using Excel VBA

A website visitor wants help on how to automate the extraction of a specific hyper-link from web pages using Excel VBA
'Hello Dr. Takyar,

 It was pleasure talking to you the other day.  Hope you and your friends/relatives were safe from the recent storm.

 Here is the task that I would like to automate with Excel VBA and seeking help on.  I liked your code for extracting table data from multiple web pages and creating and pasting  it in respective worksheets - http://www.youtube.com/watch?v=qbOdUaf4yfI However my task is a bit different which is the following:

 1)  There are several (100s or 1000s) web-links in column A.
 2)  A connection is established with a web-link's webpage.
 3)  On the connected webpage, the first HREF link that contains the text "About" is copied and it's web-link is  pasted in the adjacent cell in column B.  If "About" HREF
 link is not found on connected webpage, then the cell is populated with "NOT FOUND".
 4)  The connection is closed and steps 1 - 3 are repeated until there are no more web-links in column A.

 Some examples we can try with are the following web-links in
 column A:

 Column A                                Column B

 1)  https://www.google.com/       https://www.google.com/intl/en/about/
 2)  https://www.facebook.com/    https://www.facebook.com/facebook
 3)  http://www.youtube.com/       http://www.youtube.com/yt/about/
 4)  http://www.yahoo.com/          http://info.yahoo.com/
 5)  http://www.wikipedia.org/       NOT FOUND
 6)  http://www.linkedin.com/       http://www.linkedin.com/about-us

 Note:
 1)  When a respective "About" HREF link is copied from the webpage of a web-link in Column A, and the same is pasted in
 Column B, the web-link of the "About" HREF link is displayed in Column B.
 2)  "About" HREF links on webpages can be one of several variations like - About, About Us, About "Company_Name", etc.  Whatever it is, we want to capture the first HREF link that contains the text - "About".
 Please feel free to email me with any questions or if you need me to call you.

 best wishes,

 Romi Ghose'

Here is the complete VBA with proper remarks:
Sub GetAboutUsLinks()
'First define all the variables
Dim ie As Object 'Internet Explorer
Dim html As Object ' HTML document
Dim myLinks As Object ' Links collection
Dim myLink As Object  'Single Link
Dim result As String
Dim myURL As String  'Web Links on worksheet
Dim LastRow As Integer ' VBA execution should stop here

Set ie = CreateObject("InternetExplorer.Application")

LastRow = Sheet1.Cells(Rows.Count, "A").End(xlUp).Row
'Loop through all the web links on the worksheet one by one and then do some things
For i = 2 To LastRow
'Get the link from the worksheet and assign it to the variable
myURL = Sheet1.Cells(i, 1).Value
'Now go to the website
ie.navigate myURL
'Keep the internet explorer visible
ie.Visible = True
'Ensure that the web page has downloaded completely
While ie.readyState <> 4
DoEvents
Wend
'Get the data from the web page that is in the links and assign it to the variable
result = ie.document.body.innerHTML
'create a new html file
Set html = CreateObject("htmlfile")
'now place all the data extracted from the web page into the new html document
html.body.innerHTML = result

Set myLinks = html.getElementsByTagName("a")
'loop through the collected links and get a specific link defined by the conditions
For Each myLink In myLinks
If Right$(myLink, 13) = "about-us.html" Or Right$(myLink, 10) = "about.html" Or Right$(myLink, 8) = "about-us" Or Right$(myLink, 5) = "about" Then
Sheet1.Cells(i, "B").Value = myLink
End If
'go to the next link
Next myLink
'once the last web link on the sheet has been visited close the internet explorer
If i = LastRow Then
ie.Quit
End If
' go to the next web link on the worksheet
Next i

End Sub



Watch the video on YouTube

没有评论:

发表评论