Writing Java applications that access Web pages is straightforward, thanks to the excellent support provided by Java’s core library. However, to be complete, any such application must support proxies and HTTP authentication. Fortunately, starting with version 1.2, Java provides native support for authentication. With some effort, you can add similar support for prior versions.

Using proxies
Technically speaking, a proxy is just an agent that receives requests and forwards them to the ultimate destination or to another proxy. Proxies are typically used to implement caches and firewalls.

At the HTTP level, making a request through a proxy is not much different from a regular request. Basically, the request is sent to the proxy instead of the real destination, and the address is fully qualified so that the proxy can find the target host.

Java provides support for proxies in the form of special system properties. All you need to do is set the property http.proxyHost to the proxy address and http.proxyPort to the proxy port. For example, suppose you have a proxy at the address proxy.mycompany.com:8132. The code fragment below configures Java’s HTTP protocol implementation to use that proxy:
System.getProperties().setProperty( "http.proxyHost", "proxy.mycompany.com" );
System.getProperties().setProperty( "http.proxyPort", "8132" );

This support is enough for simple cases. But some proxies, especially firewalls, are configured to require authentication before requests can pass through. In that case, the request must carry an authorization, which leads us to HTTP authentication.

HTTP authentication
The HTTP protocol supports protecting resources so that a suitable authorization must be supplied to access them. When a request is made to such a resource, the Web server responds with a 401 (Unauthorized) error code (see RFC2616). In this case, the response includes a WWW-Authenticate header specifying a scheme and a realm.

The scheme defines the method to be used to supply the authorization. Currently, two schemes are specified: basic and digest (see RFC2617). I will focus on the basic scheme because it’s more common and easier to implement, although the digest scheme is stronger and provides higher security.

The realm
The realm is an arbitrary string that defines a protection space (a set of protected resources) within the same host. A single host can have several realms, and all resources within the same realm share the same authorizations—that is, an authorization that is valid in a request for a given resource must be valid for any subsequent request for other resources within the same realm.

Suppose you try to access a protected resource within the realm “Protected Territory” of some Web server. The response will include the following header (assuming the Web server uses basic authentication):
WWW-Authenticate: Basic realm="Protected Territory"
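As a rough illustration of how an application might pull the scheme and realm out of such a header value, here is a minimal sketch. The class name AuthChallenge and its parse method are hypothetical names chosen for this example. It assumes a single challenge per header with a quoted realm, and it does not handle escaped quotes inside the realm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits a challenge header value into scheme and realm. */
class AuthChallenge {
    // Scheme token first, then a quoted realm parameter somewhere after it.
    private static final Pattern CHALLENGE = Pattern.compile(
            "^\\s*(\\S+)\\s+.*?realm\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    final String scheme;
    final String realm;

    private AuthChallenge(String scheme, String realm) {
        this.scheme = scheme;
        this.realm = realm;
    }

    /** Returns null when the value is missing or does not look like a challenge. */
    static AuthChallenge parse(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        Matcher m = CHALLENGE.matcher(headerValue);
        return m.find() ? new AuthChallenge(m.group(1), m.group(2)) : null;
    }

    public static void main(String[] args) {
        AuthChallenge c = parse("Basic realm=\"Protected Territory\"");
        System.out.println(c.scheme + " / " + c.realm);
    }
}
```

The scheme tells the application how to answer the challenge, and the realm is what it would typically show to the user when asking for credentials.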

You must resend the request and include the header Authorization specifying a valid user name and password. The means of obtaining the user name and password is left to the application. Browsers, for example, usually present a dialog box displaying the host and realm and ask for the user name and password.

The Authorization header must name the authentication scheme, followed by the user name and password in the form <username>:<password>, encoded as a base64 string (see RFC2045). So, if the user name is Aladdin and the password is “open sesame”, the header to send would be:
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
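The header value above can be reproduced with a few lines of code. The sketch below assumes Java 8 or later, where java.util.Base64 is part of the standard library. On older versions you would substitute your own base64 routine. The class name BasicCredentials is hypothetical, and the choice of ISO-8859-1 for converting the characters to bytes is an assumption, since it makes no difference for plain ASCII credentials.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds the value of a Basic Authorization header. */
class BasicCredentials {
    static String headerValue(String user, String password) {
        // The credentials are joined with a colon and then base64 encoded.
        String pair = user + ":" + password;
        byte[] bytes = pair.getBytes(StandardCharsets.ISO_8859_1); // charset is an assumption
        return "Basic " + Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println(headerValue("Aladdin", "open sesame"));
    }
}
```

Note that base64 is an encoding, not encryption: anyone who sees the header can recover the password, which is why the basic scheme is considered weaker than digest.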

The same is true for proxies, except that proxies respond with a 407 (Proxy Authentication Required) error code and the header Proxy-Authenticate to a request that does not include a suitable authorization. The authorization must be supplied through the Proxy-Authorization header in the request.

Because the proxy is specified through the application's configuration, the proxy user name and password usually are too. This way, the application does not need to wait for a 407 error code before asking for the required information and resending the request.

HTTP authentication in Java
In the class URLConnection, Java provides all the necessary pieces to implement HTTP authentication. After the connection has been opened to the server (after calling the connect method), a header can be accessed through the method getHeaderField(String), which returns the value of a header as a string, given its name.

So after performing a request, use the method getHeaderField to get the WWW-Authenticate header. If the method returns null, the header is not present and the request does not need authorization. Otherwise, parse the value returned to get the realm, and use it to get a user name and password. Then, resend the request, this time using the method setRequestProperty to set the Authorization header.
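The procedure just described might look roughly like the following. This is only a sketch under several assumptions: the class and method names are hypothetical, it handles the basic scheme only, it takes the user name and password as parameters instead of asking the user, it makes a single retry, and it relies on java.util.Base64 (Java 8 or later).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch of the challenge-and-retry flow for the basic scheme. */
class AuthenticatedGet {
    /** Returns an Authorization value answering a Basic challenge, or null if none applies. */
    static String answer(String challenge, String user, String password) {
        if (challenge == null) {
            return null; // no WWW-Authenticate header: the resource is not protected
        }
        if (!challenge.trim().regionMatches(true, 0, "Basic", 0, 5)) {
            return null; // other schemes (such as digest) are not handled in this sketch
        }
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.ISO_8859_1));
    }

    static URLConnection get(URL url, String user, String password) throws IOException {
        URLConnection first = url.openConnection();
        first.connect();
        String authorization = answer(first.getHeaderField("WWW-Authenticate"), user, password);
        if (authorization == null) {
            return first;
        }
        // A URLConnection serves one request only, so the retry needs a new one.
        URLConnection second = url.openConnection();
        second.setRequestProperty("Authorization", authorization);
        second.connect();
        return second;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: AuthenticatedGet <url> <user> <password>");
            return;
        }
        URLConnection c = get(URI.create(args[0]).toURL(), args[1], args[2]);
        System.out.println(c.getHeaderField(0)); // for HTTP, typically the status line
    }
}
```

A fuller version would parse the realm from the challenge, use it to prompt the user, and loop until the server accepts the credentials or the user gives up.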

You can repeat this procedure until the authorization is granted or the user cancels the operation. You can save the user name and password to provide the authorization for future requests within the same realm. According to the HTTP authentication specification (RFC2617), a client can assume that all paths at the same level as the requested resource, or deeper, belong to the same realm.
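One simple way to save authorizations for reuse is a map keyed by host and realm. The sketch below is an illustration only. The class AuthorizationCache and its methods are hypothetical names, and it stores the ready-made header value rather than the raw user name and password.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical cache of Authorization header values, keyed by host and realm. */
class AuthorizationCache {
    private final Map<String, String> byHostAndRealm = new HashMap<>();

    // Host names are case-insensitive; the realm is compared exactly as received.
    private static String key(String host, String realm) {
        return host.toLowerCase(Locale.ROOT) + "\n" + realm;
    }

    void put(String host, String realm, String authorizationValue) {
        byHostAndRealm.put(key(host, realm), authorizationValue);
    }

    /** Returns the saved header value, or null if nothing is saved for this realm. */
    String get(String host, String realm) {
        return byHostAndRealm.get(key(host, realm));
    }

    void remove(String host, String realm) {
        byHostAndRealm.remove(key(host, realm));
    }

    public static void main(String[] args) {
        AuthorizationCache cache = new AuthorizationCache();
        cache.put("www.example.com", "Protected Territory", "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(cache.get("WWW.EXAMPLE.COM", "Protected Territory"));
    }
}
```

If the server rejects a cached value with another 401, the entry should be removed and the user asked again.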

Although the standard Java classes do not provide base64 support, implementing base64 is not difficult, and several implementations are publicly available (see Listing A).

You can set authorization for a proxy in the same way. Just use the method setRequestProperty to set the header Proxy-Authorization in each request.
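Putting the proxy configuration and the proxy authorization together could look like the sketch below. The class ProxiedRequest and its methods are hypothetical, the example covers the basic scheme over plain HTTP only, and it assumes java.util.Base64 (Java 8 or later) for the encoding.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical sketch: configure a proxy and send its credentials with each request. */
class ProxiedRequest {
    /** Points Java's HTTP implementation at the proxy through the system properties. */
    static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    /** Builds the value of the Proxy-Authorization header for the basic scheme. */
    static String proxyAuthorization(String user, String password) {
        String pair = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.ISO_8859_1));
    }

    /** Opens (but does not yet connect) a request that already carries the proxy credentials. */
    static URLConnection open(String url, String user, String password) throws IOException {
        URLConnection connection = URI.create(url).toURL().openConnection();
        connection.setRequestProperty("Proxy-Authorization", proxyAuthorization(user, password));
        return connection;
    }

    public static void main(String[] args) {
        useProxy("proxy.mycompany.com", 8132);
        System.out.println(proxyAuthorization("Aladdin", "open sesame"));
    }
}
```

Because the credentials come from configuration, the header is sent on the first attempt and the 407 round trip is avoided.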

To make it easier to add HTTP authentication to Java applications, I wrote the class HttpGet (Listing B), which provides support for proxies as well. See Table A for the complete API.
Table A

Method                  Description
removeAuthorization     Removes the authorization previously set for a given host and realm
setAuthorization        Sets the user name and password to be used for future requests to a given host and realm
setProxy                Sets the proxy to be used for requests
setProxyAuthorization   Sets the user name and password to use for the proxy
doGet                   Performs an HTTP GET request to a given URL and returns a connected URLConnection object; queries the user for user names and passwords, as needed, to access protected resources

The most important method is doGet, which performs an HTTP GET request on a URL and returns a connected URLConnection object. It uses the method authorize to get a user name and password for a given realm. The default implementation presents a dialog box asking for this information. If you need a custom implementation, just subclass HttpGet and override the method authorize. Authorizations are cached for reuse if the same realm is accessed again.

Listing C shows a simple example that makes use of the class HttpGet to get the contents of a given URL (specified in the command line) and print it to the standard output.

Authentication and Java 1.2
Java 1.2 and later versions provide native support for authentication in the form of the class Authenticator. All you need to do is to subclass it and override the method getPasswordAuthentication. The method must get a user name and password and return them as a PasswordAuthentication object.

You must also register an instance of your Authenticator implementation using the method Authenticator.setDefault. After that, Java will call its getPasswordAuthentication method whenever it hits a protected resource.
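A minimal Authenticator subclass might look like the sketch below. The class name FixedAuthenticator is hypothetical, and it returns credentials supplied in advance. A real application would more likely prompt the user, using the requesting host and prompt (the realm, in the HTTP case) that Authenticator makes available.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

/** Hypothetical Authenticator that always answers with the same credentials. */
class FixedAuthenticator extends Authenticator {
    private final String user;
    private final char[] password;

    FixedAuthenticator(String user, char[] password) {
        this.user = user;
        this.password = password.clone();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // These getters describe who is asking; for HTTP the prompt is the realm.
        System.err.println("Credentials requested by " + getRequestingHost()
                + " for \"" + getRequestingPrompt() + "\"");
        return new PasswordAuthentication(user, password);
    }

    public static void main(String[] args) {
        // Register once; later URLConnection requests consult it when challenged.
        Authenticator.setDefault(new FixedAuthenticator("Aladdin", "open sesame".toCharArray()));
    }
}
```

The password is handled as a char array because that is what PasswordAuthentication expects.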

The good thing about this approach is that Java manages all the low-level details. In addition, Authenticator is not restricted to HTTP authentication but works for any protocol. The downside is that it is available only in Java 1.2 and later. Listing D shows the example from Listing C, rewritten to use the Authenticator class.

Proxy and HTTP authentication support are essential in any application that deals with Web pages. As we’ve seen here, Java 1.2 provides native support for this authentication, and with a bit of effort, you can add such support in any Java version.

My implementation illustrates the basics of this process, but there’s a lot of room for improvement. (You can grab the source code for the article here.) For instance, you might want to add support for other types of requests, since this version works only for GET. In a future article, I will revisit this issue and discuss additional refinements.