Control browser navigation with Browser Helper Objects

Phillip Perkins shows you how to build a rudimentary content filter, which uses the functionality of Browser Helper Objects (BHOs) to control browser navigation.

I once read an article about AdBlock, an extension for Mozilla browsers that lets users enter filter expressions to limit the content delivered to the browser. It occurred to me that I could use something similar for Internet Explorer (IE). In this article, I'll build a rudimentary content filter that uses the functionality of Browser Helper Objects (BHOs) to control browser navigation. The only things I'll need to handle are images and page sources.

A BHO is an ActiveX DLL that runs in the process space of IE. It can use the same functionality as the WebBrowser Control and respond to the same events. When IE navigates to a page, several events occur. One of these is the BeforeNavigate2 event, which replaces the WebBrowser Control's older BeforeNavigate and FrameBeforeNavigate events.

The BeforeNavigate2 event fires before the browser navigates to a particular URL. Several parameters are passed to this event, including pDisp (a Dispatch pointer to the event source), the URL, navigation Flags, and PostData. You can use the pDisp pointer to set a local variable declared as a WebBrowser Control. This gives us all the functionality of the control, such as the Stop() method. The Stop() method stops the current navigation (setting the event's Cancel parameter to True cancels it as well), and we can then substitute our own URL by calling the Navigate2() method.

Another event that takes place is the NavigateComplete2 event, which replaces the older NavigateComplete and FrameNavigateComplete events. The NavigateComplete2 event occurs when the browser finishes navigating to the intended URL. However, at this point, images, objects, and scripts may still be downloading. During this event, we can use the Document property of our IE object to iterate through the images to change the src value of nefarious images.

To identify miscreant URLs, we'll use regular expressions. You can store the regular expression in a separate text file so that it's easy to change. We test the URLs of the pages and image sources against that expression. Any URL that fails our check (which, in this case, means it passes the RegExp object's Test() method) will be redirected to a friendlier URL, such as an ACCESS DENIED page or image.
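As a rough illustration of that check, here is a minimal sketch in Visual Basic 6. The helper names BuildPattern and IsBlockedUrl are hypothetical, as is the assumption that the filter file holds one expression per line; the article itself does not specify the file's layout. The sketch needs a reference to the Microsoft VBScript Regular Expressions 5.5 library.

```vb
Option Explicit

' Hypothetical helper (an assumption, not part of the article's code):
' joins the lines of a filter file into a single alternation pattern,
' assuming one regular expression per line.
Public Function BuildPattern(ByVal fileText As String) As String
    Dim lines() As String
    Dim i As Long
    Dim entry As String
    Dim result As String

    ' Normalize line endings, then split into individual expressions.
    lines = Split(Replace(fileText, vbCrLf, vbLf), vbLf)
    For i = LBound(lines) To UBound(lines)
        entry = Trim$(lines(i))
        ' Skip blank lines so a trailing newline cannot end up in the pattern.
        If Len(entry) > 0 Then
            If Len(result) > 0 Then result = result & "|"
            result = result & entry
        End If
    Next i
    BuildPattern = result
End Function

' Hypothetical helper: True when the URL matches the filter pattern.
Public Function IsBlockedUrl(ByVal url As String, ByVal pattern As String) As Boolean
    Dim re As RegExp

    ' An empty pattern matches every string, so treat it as "block nothing".
    If Len(pattern) = 0 Then Exit Function

    Set re = New RegExp
    re.Pattern = pattern
    re.IgnoreCase = True    ' assumption: filters should ignore case
    IsBlockedUrl = re.Test(url)
    Set re = Nothing
End Function
```

With a file containing the two made-up expressions ads\.example\.com and /banners/, BuildPattern would yield ads\.example\.com|/banners/, and IsBlockedUrl would flag http://ads.example.com/x.gif while letting http://www.example.org/index.htm through. Skipping blank lines matters because a stray line break read from the file would otherwise become a literal part of the pattern.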

To create the functionality for the BHO, we have to implement the IObjectWithSite interface in Visual Basic. Adding a reference to the type library file included in the sample download lets you implement this interface. You must also add references to the Microsoft Scripting Runtime library, the Microsoft VBScript Regular Expressions 5.5 library, the Microsoft Internet Controls library, and the Microsoft HTML Object Library.

Here's the code:

Option Explicit
Option Base 0

Implements IObjectWithSiteTLB.IObjectWithSite
Dim WithEvents m_ie As InternetExplorer
Dim m_Site As IUnknownVB
Dim m_lError As Long
Dim m_sError As String
Dim sURLs As String

Private Sub Class_Initialize()
    Dim fso As Scripting.FileSystemObject
    Dim ts As Scripting.TextStream
    Set fso = New Scripting.FileSystemObject
    Set ts = fso.OpenTextFile(App.Path & "/urls.txt", ForReading, False)
    sURLs = ts.ReadAll()
    Set ts = Nothing
    Set fso = Nothing
End Sub

Private Sub IObjectWithSite_GetSite(ByVal priid As _
    IObjectWithSiteTLB.GUIDPtr, ppvObj As IObjectWithSiteTLB.VOIDPtr)
    m_Site.QueryInterface priid, ppvObj
End Sub

Private Sub IObjectWithSite_SetSite(ByVal pSite As
    Set m_Site = pSite
    Set m_ie = pSite
End Sub

Private Sub m_ie_BeforeNavigate2(ByVal pDisp As Object, URL As Variant, _
    Flags As Variant, TargetFrameName As Variant, PostData As Variant, _
    Headers As Variant, Cancel As Boolean)
    Dim re As RegExp
    Set re = New RegExp
    Dim wbc As WebBrowser
    m_lError = 0
    m_sError = ""
    re.Pattern = sURLs
    If re.Test(URL) Then
        Cancel = True
        Set wbc = pDisp
        wbc.Navigate2 "file:///" & Replace(Replace(App.Path & _
            "/access_denied.htm", ":", "|", 1, 1), "\", "/"), Flags, TargetFrameName
    End If
    Set re = Nothing
    Set wbc = Nothing
End Sub

Private Sub m_ie_NavigateComplete2(ByVal pDisp As Object, URL As Variant)
    Dim img As MSHTML.HTMLImg
    Dim doc As MSHTML.HTMLDocument
    Dim re As RegExp
    Set doc = m_ie.document
    Set re = New RegExp
    re.Pattern = sURLs
    For Each img In doc.images
        ' Swap any image whose source matches the filter for the local placeholder.
        If re.Test(img.src) Then
            img.src = "file:///" & Replace(Replace(App.Path & _
                "/access_denied.jpg", ":", "|", 1, 1), "\", "/")
        End If
    Next img
    Set re = Nothing
    Set img = Nothing
    Set doc = Nothing
End Sub

You can ignore most of the plumbing in the IObjectWithSite_GetSite and IObjectWithSite_SetSite procedures. The important thing to note is that the m_ie variable is set to the pSite parameter in the IObjectWithSite_SetSite procedure. This gives us a local reference to the InternetExplorer object and, because m_ie is declared WithEvents at the top of the class code, access to that object's events. When this class initializes, the URL restrictions load from the urls.txt file in the application's folder.

In the m_ie_BeforeNavigate2 event handler, you'll see that I get a local instance of the WebBrowser Control using the pDisp parameter. I create a RegExp object to test the URL. If the URL is an unwanted URL, the URL is changed to a local HTML file. This file is a basic HTML file that gently tells the user that they cannot view their intended URL.
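The listing builds the local file URL inline with nested Replace calls, once for the page and once for the image. One way to factor that out is sketched below. LocalFileUrl is a hypothetical helper that is not part of the original code, and it emits the drive letter with a colon (file:///C:/...) rather than the older pipe form (file:///C|/...) that the listing produces, so treat it as a variation rather than a drop-in copy.

```vb
Option Explicit

' Hypothetical helper (not in the article's listing): builds a file URL
' for a file that sits in the given folder.
Public Function LocalFileUrl(ByVal folder As String, ByVal fileName As String) As String
    Dim fullPath As String

    fullPath = folder
    ' App.Path normally has no trailing backslash, but a drive root does.
    If Right$(fullPath, 1) <> "\" Then fullPath = fullPath & "\"
    fullPath = fullPath & fileName

    ' File URLs use forward slashes; the drive colon is left as-is here.
    LocalFileUrl = "file:///" & Replace(fullPath, "\", "/")
End Function
```

Inside the event handler, the redirect could then read wbc.Navigate2 LocalFileUrl(App.Path, "access_denied.htm"), Flags, TargetFrameName, which keeps the path handling in one place.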

In the m_ie_NavigateComplete2 event handler, I get a local instance of the HTMLDocument. I then use that instance to iterate through all the images and check each src property for unwanted URLs. If I find one, I change the src property to a local JPEG image that tells the user that the image isn't allowed. The drawback to this approach is that when the user refreshes the page, the original image will be downloaded and displayed.

For IE to use this BHO, you must create a new key under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects in the registry, named with the CLSID of the IEAdBlocker.AdBlock object (including the curly braces). After you compile the code, you can find this CLSID under the HKEY_CLASSES_ROOT\IEAdBlocker.AdBlock\Clsid entry in the registry.
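If you would rather script that step than edit the registry by hand, the following sketch shows one possible approach using the Windows Script Host shell object. RegisterBho is a hypothetical routine, not part of the sample download. It assumes the DLL has already been compiled and registered so that the ProgID exists, and that it runs with enough rights to write to HKEY_LOCAL_MACHINE.

```vb
Option Explicit

' Hypothetical routine (an assumption, not from the article's download):
' looks up the CLSID of the compiled class and adds it as a BHO key.
Public Sub RegisterBho()
    Const BHO_ROOT As String = _
        "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects\"
    Dim sh As Object
    Dim clsid As String

    Set sh = CreateObject("WScript.Shell")

    ' A trailing backslash makes RegRead return the key's default value,
    ' which for a ProgID's Clsid key is the CLSID in curly braces.
    clsid = sh.RegRead("HKCR\IEAdBlocker.AdBlock\Clsid\")

    ' A trailing backslash makes RegWrite create a key instead of a value.
    sh.RegWrite BHO_ROOT & clsid & "\", ""

    Set sh = Nothing
End Sub
```

IE reads this key when a new browser window starts, so open a fresh window after running the routine to check whether the BHO loads.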

This code is not a complete solution; this method simply shows what functionality is available with IE and BHOs. If you're feeling brave, you can dig deeper into IE by exploring the functionality available for C++ developers. The crafty could even create the type libraries necessary to expose other interfaces available to IE, such as the IDownloadManager. Download the source code for this example.

Note: Editing the registry is risky, so make sure you have a verified backup before making any changes.
