DIY: Filter content for free with DansGuardian

Jack Wallen demonstrates how easy it is to install the free and effective proxy and content filtering system DansGuardian on Ubuntu 11.04.

DansGuardian is the best proxy and content filtering system I have used; it's free, simple to set up, reliable, and easy to configure. DansGuardian is a command-line only tool that requires a text editor for configuration and can be as complex or as simple as you like. With this system, you can perfectly tailor web content filtering to meet your exact needs; for instance, you can filter by domain, keyword, extension, and more.

The default configuration for DansGuardian is geared toward primary school systems, but it can quickly be adjusted to meet your needs. DansGuardian runs on Linux, FreeBSD, OpenBSD, NetBSD, Mac OS X, HP-UX, and Solaris. Let's get this content filtering party started with the installation and then move on to configuration and use.

Installing DansGuardian

I will demonstrate how simple it is to install DansGuardian on an Ubuntu 11.04 system. DansGuardian is already in the default repositories, so all that is necessary is to do the following:

  1. Open a terminal window.
  2. Issue the command sudo apt-get install dansguardian.
  3. Enter the sudo password.
  4. Accept any dependencies necessary.
  5. Allow the installation to complete.

This does not complete the process; DansGuardian needs the tool tinyproxy to function. Tinyproxy will serve as the DansGuardian proxy server. To install this tool, do the following:

  1. Open a terminal window.
  2. Issue the command sudo apt-get install tinyproxy.
  3. Enter the sudo password.
  4. Accept any dependencies necessary.
  5. Allow the installation to complete.
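
The two installs above can also be done in one pass. The following is a minimal sketch, assuming an apt-based system with sudo available; run and install_filter_stack are hypothetical helper names, and DRY_RUN is an illustrative switch that prints the commands instead of running them.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (illustrative switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install DansGuardian and tinyproxy together.
# -y accepts the dependency prompt automatically; drop it to confirm by hand.
install_filter_stack() {
    run sudo apt-get update &&
    run sudo apt-get install -y dansguardian tinyproxy
}
```

Calling install_filter_stack with DRY_RUN=1 set first shows what would be executed without touching the system.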

You'll find all of the files and folders that need to be configured in /etc/dansguardian/. There is one main configuration file, /etc/dansguardian/dansguardian.conf, and within the /etc/dansguardian/lists folder are all of the configuration files for the different types of filtering that can be done, which include: extension, IP, MIME type, phrase, regular expression, site, and URL.

You could just fire up both DansGuardian and tinyproxy, configure your browsers to use the server hosting DansGuardian as their proxy server, and enjoy content-filtered web browsing. However, the default setup is geared toward primary schools, so the content filtering might be rather tight for your needs. Let's look at how this system is set up.

Configuring DansGuardian and tinyproxy

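First, line up the ports by opening /etc/dansguardian/dansguardian.conf and /etc/tinyproxy.conf. Browsers connect to DansGuardian on its filterport (8080 by default), and DansGuardian passes requests on to the proxy named by proxyip and proxyport (port 3128 by default). Tinyproxy listens on port 8888 by default, so proxyport and the tinyproxy Port have to match. Because both daemons run on the same machine, filterport must stay different from the tinyproxy port. Which ports you choose will depend upon your needs, but 8080 for filterport is a good place to start. If that works for your network, all you need to do is point proxyport at the tinyproxy port.

You will find the port configuration options here:

DansGuardian (/etc/dansguardian/dansguardian.conf)

# the port that DansGuardian listens to.

filterport = 8080

# the port DansGuardian connects to proxy on

proxyport = 8888

tinyproxy (/etc/tinyproxy.conf)

Port 8888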
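
Before editing, it can help to read the current values out of both files. The following is a minimal sketch, assuming the stock layouts (key = value lines in dansguardian.conf, Port N in tinyproxy.conf). The names dg_get, tp_get, and dg_set are hypothetical helpers, not part of either package.

```shell
#!/bin/sh
# Print the value of a "key = value" option from dansguardian.conf.
dg_get() {  # usage: dg_get FILE KEY
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" | tail -n 1
}

# Print the value of a "Key value" directive from tinyproxy.conf.
tp_get() {  # usage: tp_get FILE KEY
    sed -n "s/^[[:space:]]*$2[[:space:]][[:space:]]*\([^#[:space:]]*\).*/\1/p" "$1" | tail -n 1
}

# Rewrite an existing "key = value" line in place, keeping a .bak copy.
dg_set() {  # usage: dg_set FILE KEY VALUE
    sed -i.bak "s/^[[:space:]]*$2[[:space:]]*=.*/$2 = $3/" "$1"
}
```

For example, dg_get /etc/dansguardian/dansguardian.conf proxyport shows the port DansGuardian forwards requests to, and tp_get /etc/tinyproxy.conf Port shows where tinyproxy listens. Run dg_set with sudo rights if the file is owned by root.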

Once the ports match, it's time to begin setting up the system filtering.

Configuring filtering

All filtering is handled in the /etc/dansguardian/lists directory, which is where you'll find numerous files that handle different types of filtering. The three most popular types/files are:

  • bannedsitelist
  • bannedurllist
  • bannedphraselist

Within each of these files, you can add phrases, sites, and URLs to be banned. Let's say you want to ban the word woodchucks. To do so, open the bannedphraselist file and add the following at the bottom of the file:

<woodchucks>

Now, let's say you want to ban the site woodchucks.com. To do this, open the bannedsitelist file and add the following to the bottom:

woodchucks.com

The above will block all pages from the site woodchucks.com. If you want to block only part of that site, say woodchucks.com/naughtybits/, use the bannedurllist file instead and add the following to the bottom of that file:

woodchucks.com/naughtybits

Now you have the phrase, site, and URL blocked, and it's time to start the daemons.
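
The three edits above can be scripted so that repeated runs do not add duplicate lines. This is a minimal sketch, assuming the list files live in /etc/dansguardian/lists and that each entry sits on its own line. The function names and the LISTS variable are hypothetical.

```shell
#!/bin/sh
# Directory holding the list files; override LISTS to try this elsewhere.
LISTS=${LISTS:-/etc/dansguardian/lists}

# Append ENTRY to LISTFILE unless an identical line is already there.
add_entry() {  # usage: add_entry LISTFILE ENTRY
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Phrases are wrapped in angle brackets, as in the <woodchucks> example.
ban_phrase() { add_entry "$LISTS/bannedphraselist" "<$1>"; }
ban_site()   { add_entry "$LISTS/bannedsitelist" "$1"; }
ban_url()    { add_entry "$LISTS/bannedurllist" "$1"; }
```

With write access to the lists directory, ban_phrase woodchucks, ban_site woodchucks.com, and ban_url woodchucks.com/naughtybits reproduce the three examples.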

Starting the daemons

To start both DansGuardian and tinyproxy, first start up tinyproxy with the command:

sudo /etc/init.d/tinyproxy start

Now start DansGuardian with the command:

sudo /etc/init.d/dansguardian start
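
The two commands can be wrapped so that DansGuardian is only started if tinyproxy came up. This is a minimal sketch that keeps the order given above; start_filter_stack, INIT_DIR, and SUDO are hypothetical names introduced for illustration.

```shell
#!/bin/sh
INIT_DIR=${INIT_DIR:-/etc/init.d}   # where the init scripts live
SUDO=${SUDO-sudo}                   # set SUDO= (empty) when already root

# Start tinyproxy first, then DansGuardian; stop at the first failure.
start_filter_stack() {
    $SUDO "$INIT_DIR/tinyproxy" start &&
    $SUDO "$INIT_DIR/dansguardian" start
}
```

If either init script exits with an error, start_filter_stack returns that error, which makes a failed start easier to spot.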

Configuring the browsers

You need to configure the browsers on your network to use the content filtering proxy. To do this, set each browser's proxy to the IP address of the machine hosting DansGuardian and the port DansGuardian listens on (filterport, 8080 by default). Once you've done that, test the browser by pointing it to woodchucks.com. The browser should not load the site; DansGuardian normally shows an access-denied page instead. If it does, make sure the browser can go to a non-blocked site (such as TechRepublic). If the browser is blocked at woodchucks.com but is allowed through to techrepublic.com, you now have a working content filtering system for your network.
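
The same check can be made from a terminal without touching any browser settings. This is a minimal sketch, assuming curl is installed; status_via_proxy and the CURL variable are hypothetical names, and 192.168.1.10 in the usage note is a placeholder for your DansGuardian machine.

```shell
#!/bin/sh
CURL=${CURL:-curl}

# Print the HTTP status code curl sees for URL when routed through the proxy.
# "000" means no HTTP response arrived (connection refused, timeout, ...).
status_via_proxy() {  # usage: status_via_proxy PROXY_HOST PROXY_PORT URL
    "$CURL" -s -o /dev/null --max-time 15 \
        -x "http://$1:$2" -w '%{http_code}\n' "$3"
}
```

Running status_via_proxy 192.168.1.10 8080 http://woodchucks.com/ and then the same command with http://www.techrepublic.com/ should give two different results if filtering is working. The exact code returned for a blocked page depends on your DansGuardian version and reporting settings, so compare the two rather than expecting a particular number.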

About

Jack Wallen is an award-winning writer for TechRepublic and Linux.com. He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website getjackd.net.

2 comments
tonytamaulipas

I installed just dansguardian with tinyproxy in a computer lab in a school, and they ran perfectly. The problem started after the network administrator installed a squid proxy applied to the entire school network. Squid was only installed but not configured to block anything; this poses a problem for me because students can access everything. Well, here is the real problem: the computer where I installed dansguardian can't connect to the internet directly, only if it is configured to use the squid proxy. I configured it and the computer can surf normally, but the other machines can't reach any webpage. What can I do?

mwclarke1

I use it, mostly as an add-on to another open source firewall project. There are GUI configuration tools available also. This is a true content filtering solution, not just another pretty URL filter. It can filter on the actual content of a page also; if certain words, phrases, or combinations appear, it can block those as well. There are several ways to block, including Boolean expressions for the daring, but it is really simple to configure and use. You also have block-lists, categorized by type of content, that can be allowed or blocked. There are several free lists that can also be automatically updated. It can block based on IP addresses as well. It can do just about any type of filtering you can think of.