Thursday, September 8, 2011

VBA in Excel to automate IE for crawling a web page


If you want to open a website and go through the results of a webpage using VBA 
you can achieve it by first including a reference to Microsoft HTML Object Library 
in your VBA editor.


The following snippet of code should be a good starting point of how you can achieve the same
Sub GoToWebSiteAndPlayAround()

Dim appIE As Object ' InternetExplorer.Application
Dim sURL As String

Application.ScreenUpdating = False
Set appIE = CreateObject("InternetExplorer.Application")

'URL with the search term 'Cancer' at Science
sURL = "http://www.google.com?q=vba" 'this URL to be replaced by your target web page

With appIE
    .navigate sURL
    ' uncomment the line below if you want to watch the code execute, or for debugging
    '.Visible = True
End With

' loop until the page finishes loading
Do While appIE.readyState <> 4
    DoEvents
Loop

'Get info from HTML by ID and Name
Dim outerDiv, innerSpan, requiredtext, HTMLDoc
Dim spanCollection, outerSpan
   Set HTMLDoc = appIE.document
   Set outerDiv = HTMLDoc.getElementById("outerDivClassName")
        Set spanCollection = outerDiv.getElementsByTagName("SPAN")
       
        For Each outerSpan In spanCollection
            If outerSpan.className = "outerSpanClassName" Then
                requiredtext = outerSpan.innerHTML
                Debug.Print requiredtext
            End If
        Next

Application.ScreenUpdating = True
appIE.Quit
End Sub



2 comments:

  1. Hi Hitesh... what if you want to determine the final URL for the page?

    ReplyDelete
  2. Hi Jon,

    In case the url is redirected you can wait for the required url /web page title to appear. Please have a look at this link for more help http://vba-corner.livejournal.com/4623.html. Let me know if this resolves your requirement

    ReplyDelete