
How do I... Determine the distance between ZIP codes using C#?

Calculating the distance between ZIP codes has become a common feature in search engines. For instance, if you're searching for restaurants, a Web site will often allow you to enter a ZIP code and display all restaurants within X miles of that ZIP code. Zach Smith shows you how to implement a ZIP code distance calculator using the .NET Framework and C#.

The TechRepublic Download version of this blog post includes a PDF version of the article as well as a Visual Studio project file that contains all of the code for the ZIP code calculator.

Getting the data

Before starting, we need a list of all ZIP codes and their latitude/longitude coordinates. You can get this data from the US Census Bureau; the data that I'm using is from the 2000 census.

On the Census Bureau's page, the data is available for download by clicking the "ZCTAs (ZIP Code Tabulation Area)" link under the "ASCII text versions of" heading. This file includes information on each ZCTA (the Census Bureau's approximation of a ZIP code's area), including population numbers, latitude/longitude coordinates, housing information, geography data, and the state in which the ZIP code resides.

You will need to take this data and either import it into a database or transform it into an XML document.
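If you go the XML route, the conversion might look roughly like the sketch below. It assumes you have already extracted the four fields the calculator needs into hypothetical comma-separated lines of the form `State,Code,Latitude,Longitude`; that intermediate format and the `CensusImport` helper are illustrations, not the Census Bureau's actual file layout or the article's own code.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Xml.Serialization;

// Simplified record matching the ZIPCode properties described in the article.
public class ZIPCode
{
    public string State { get; set; }
    public string Code { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

// Hypothetical helper: converts pre-extracted "State,Code,Latitude,Longitude"
// lines into ZIPCode objects and writes them out as an XML document.
public static class CensusImport
{
    public static List<ZIPCode> ParseLines(IEnumerable<string> lines)
    {
        var codes = new List<ZIPCode>();
        foreach (string line in lines)
        {
            string[] parts = line.Split(',');
            if (parts.Length != 4)
            {
                continue; // skip blank rows and rows without exactly four fields
            }

            codes.Add(new ZIPCode
            {
                State = parts[0].Trim(),
                Code = parts[1].Trim(),
                // InvariantCulture so "40.5" parses the same regardless of locale
                Latitude = double.Parse(parts[2].Trim(), CultureInfo.InvariantCulture),
                Longitude = double.Parse(parts[3].Trim(), CultureInfo.InvariantCulture)
            });
        }
        return codes;
    }

    // Serializes the list; XmlSerializer names the root element <ArrayOfZIPCode>.
    public static void WriteXml(List<ZIPCode> codes, TextWriter writer)
    {
        var serializer = new XmlSerializer(typeof(List<ZIPCode>));
        serializer.Serialize(writer, codes);
    }
}
```

Writing through a `TextWriter` lets the same method target a `StreamWriter` for the real file or a `StringWriter` while testing.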

Using the data

I am using an XML version of the data that can be deserialized into a generic List<T> object. This lets me search the list and retrieve the latitude/longitude coordinates for the distance calculations. I created it by parsing the file provided by the Census Bureau and copying the fields I needed into a custom C# class called ZIPCode, which contains the following:

  • Instance Properties
    • State - The abbreviation of the state the ZIP code is in
    • Code - The five-digit ZIP code
    • Latitude - The latitude of the ZIP code
    • Longitude - The longitude of the ZIP code
  • Static Properties and Methods
    • CodeList - A List<ZIPCode> property containing all the ZIP codes
    • LoadData - A method that loads all of the ZIP code data into the CodeList property
    • Distance - A method that returns the distance between two ZIP codes as a double
As mentioned above, we simply deserialize the data to get a List<T> object that is fully populated with all of the ZIP codes. Figure A shows what the XML data looks like, and Figure B shows the deserialization code that populates the List<T> object:

Figure A

What the XML looks like

Figure B

The deserialization code
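A minimal sketch of a ZIPCode class with the instance and static members listed above, assuming the XML was written by XmlSerializer (whose root element for a List<ZIPCode> is <ArrayOfZIPCode>). Having LoadData take a TextReader is an assumption so the same method works with a StreamReader over the file or a StringReader in tests; the code in Figure B may differ.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class ZIPCode
{
    // Instance properties: one object per ZIP code.
    public string State { get; set; }
    public string Code { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }

    // Static list shared by all lookups; XmlSerializer ignores static members.
    private static List<ZIPCode> _codeList = new List<ZIPCode>();

    public static List<ZIPCode> CodeList
    {
        get { return _codeList; }
    }

    // Deserializes an <ArrayOfZIPCode> document into CodeList.
    // Pass a StreamReader for a file or a StringReader for in-memory XML.
    public static void LoadData(TextReader reader)
    {
        var serializer = new XmlSerializer(typeof(List<ZIPCode>));
        _codeList = (List<ZIPCode>)serializer.Deserialize(reader);
    }
}
```

For example, `using (var reader = new StreamReader("zipcodes.xml")) { ZIPCode.LoadData(reader); }` would load a file; the file name is hypothetical.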
Once the List<T> is populated, we can search through it and access the data. The search uses the List<T>.Find method, as shown in Figure C.

Figure C

The Find method

This code searches the _codeList variable and returns the first element whose z.Code equals ZIPCode1, a string holding the ZIP code to find (Find returns null if there is no match). We then do the same for the other ZIP code and use the latitude and longitude coordinates from the two objects to calculate the distance.
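The lookup described above can be sketched as follows. FindCode and ZIPLookup are hypothetical names; the sketch assumes a List<ZIPCode> like the one LoadData produces and turns Find's null-on-no-match result into an exception so a bad ZIP code never reaches the distance math.

```csharp
using System;
using System.Collections.Generic;

public class ZIPCode
{
    public string State { get; set; }
    public string Code { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class ZIPLookup
{
    // Returns the first ZIPCode whose Code matches, or throws if none does.
    // Hypothetical helper; List<T>.Find itself returns null for no match.
    public static ZIPCode FindCode(List<ZIPCode> codeList, string zip)
    {
        ZIPCode match = codeList.Find(z => z.Code == zip);
        if (match == null)
        {
            throw new ArgumentException("Unknown ZIP code: " + zip, nameof(zip));
        }
        return match;
    }
}
```

List<T>.Find scans the list linearly, so if you look up many codes, building a Dictionary<string, ZIPCode> keyed on Code once is a reasonable alternative.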

Calculating the distance

This article uses a form of the Haversine formula to calculate the distance between ZIP codes. The Haversine formula gives the great-circle distance between two points on a sphere, in this case the earth. The result isn't 100 percent accurate because the earth is not a perfect sphere, and more precise ellipsoidal methods exist, but for distances between ZIP codes it is close enough that you won't need to get out the tape measure!

I'm not going to go into the math behind the Haversine formula, but if you would like to learn more, here is another good resource. This second link is particularly interesting because it gives a few alternatives to the Haversine formula.
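For reference, the textbook form of the Haversine formula is shown below, with latitudes \(\varphi_1, \varphi_2\) and longitudes \(\lambda_1, \lambda_2\) in radians and \(R\) the earth's mean radius (roughly 3,959 miles, or 6,371 km). The article's own derivation may arrange the terms differently.

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```

Because \(c\) is an angle in radians, \(d\) comes out in the same unit as \(R\). As a quick check, two points on the equator one degree of longitude apart give \(c = \pi/180\), so \(d \approx 3959 \times 0.017453 \approx 69.1\) miles.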

Figure D shows the complete function used to calculate the distance between two ZIP codes in C#.

Figure D

The complete function

This method accepts two ZIP codes and retrieves the data for them from the List<T> object. It then converts the latitude/longitude points to radians and feeds those values into the distance formula. The result of this formula is the distance, in miles, between the two points.
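A minimal sketch of a method along the lines described above: look up both codes, convert degrees to radians, and apply the Haversine formula. It assumes a mean earth radius of 3,959 miles and takes the list as a parameter (rather than reading a static CodeList) so it is easy to test; the clamp on the intermediate value guards against floating-point rounding pushing it slightly outside [0, 1]. The code in Figure D may differ.

```csharp
using System;
using System.Collections.Generic;

public class ZIPCode
{
    public string State { get; set; }
    public string Code { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }

    // Assumed mean earth radius in miles.
    private const double EarthRadiusMiles = 3959.0;

    private static double ToRadians(double degrees)
    {
        return degrees * Math.PI / 180.0;
    }

    // Great-circle distance in miles between two ZIP codes found in codeList.
    public static double Distance(List<ZIPCode> codeList, string zip1, string zip2)
    {
        ZIPCode a = codeList.Find(z => z.Code == zip1);
        ZIPCode b = codeList.Find(z => z.Code == zip2);
        if (a == null || b == null)
        {
            throw new ArgumentException("Both ZIP codes must exist in the list.");
        }
        if (zip1 == zip2)
        {
            return 0.0; // same ZIP code: skip the trig entirely
        }

        double lat1 = ToRadians(a.Latitude);
        double lat2 = ToRadians(b.Latitude);
        double dLat = lat2 - lat1;
        double dLon = ToRadians(b.Longitude - a.Longitude);

        // Haversine: h = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
        double h = Math.Pow(Math.Sin(dLat / 2), 2)
                 + Math.Cos(lat1) * Math.Cos(lat2) * Math.Pow(Math.Sin(dLon / 2), 2);
        h = Math.Min(1.0, Math.Max(0.0, h)); // guard against rounding outside [0, 1]

        double c = 2 * Math.Atan2(Math.Sqrt(h), Math.Sqrt(1 - h));
        return EarthRadiusMiles * c;
    }
}
```

Using Atan2 rather than Asin for the final step is a common choice because it stays well-behaved when h is very close to 0 or 1.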


Comments
CharlieSpencer

Wouldn't you need an accompanying utility to tell you your current ZIP code? I only know the ZIP codes for the houses I've lived in. I also don't know when I've crossed the border between one and another; the USPS still hasn't painted the border lines on the streets.

grumfellow

Appreciate you providing this code: saved me lots of time. Thanks!

Dr Dij

If you're using the 2000 census, note that a large number of ZIP codes have been added since then. This type of algorithm and its variations are very useful for many purposes, such as finding the closest dealer for a customer given the ZIP code. Several websites, including a Malaysian one and melissadata.com, let you calculate distances online, either to verify your program or if you only need a couple of distances. Melissa also sells a subscription to the updated ZIP lists, and probably many brokers sell up-to-date data that includes the centroid latitude/longitude of each ZIP.

Other gotchas: if the ZIP code includes 'APO / AE' in the town name, then it is NOT a physical location but a mail drop for military mail. The person at that ZIP could be anywhere in the world. If you want distances within a ZIP code (between street addresses), you'll need to buy another database containing lat/lon for street addresses, and standardize each address to postal standard format before running the calculations. If the two ZIP codes match, skip the calculation and set the distance to zero. Very rarely, a trig function result will approach zero and equal zero at the precision you are using, so you may need to set error traps for certain functions.

What you should NOT do: do NOT use the 'least squares' method. Spherical trig is the only way to go. Least squares is very inaccurate because a lat-long 'square' is really a trapezoid that gets narrower toward the poles and wider at the equator.

For commercial uses, you can either find the closest dealer or parse a customer list and narrow it down to a set distance from your store for a mailing, say to everyone on your list within 20 miles. Cities often have 10+ ZIP codes. If you have an event, pick the ZIP code for the exact address in that city and calculate distances from your customer list to it. If for some reason you need to use all the ZIP codes in a city, the previously mentioned databases can be searched for all ZIP codes in a city; iterate over all the ZIPs for the area you want, calculate the distances, and select any record within your cutoff distance.

If you want, say, the N closest dealers, you can write the calculated distances from a customer to every dealer on your dealer list to a temp table with a secondary index on distance + dealer#, then read the N closest dealers and append them to your customer record.

Also remember that searching for ZIP codes that are one digit different is a totally inaccurate way to find the closest ZIP. There are often large jumps and gaps in ZIP code numbering. Calculating actual distances ensures you will find the closest ZIPs; for example, we found a case in our data file where the closest dealer to a Rhode Island customer was in the next state. Happy computing!
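As a rough sketch of the "N closest dealers" and "within 20 miles" approaches described in this comment, the C# below ranks dealers in memory rather than in a temp table. Dealer, DealerSearch, and their members are hypothetical names; each dealer is assumed to be located at its ZIP code's latitude/longitude, and distances use an assumed mean earth radius of 3,959 miles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical dealer record located by its ZIP code's coordinates.
public class Dealer
{
    public int DealerNumber { get; set; }
    public string Zip { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
}

public static class DealerSearch
{
    private const double EarthRadiusMiles = 3959.0; // assumed mean radius

    // Haversine distance in miles between two points given in degrees.
    private static double HaversineMiles(double lat1Deg, double lon1Deg, double lat2Deg, double lon2Deg)
    {
        double toRad = Math.PI / 180.0;
        double lat1 = lat1Deg * toRad;
        double lat2 = lat2Deg * toRad;
        double dLat = lat2 - lat1;
        double dLon = (lon2Deg - lon1Deg) * toRad;
        double h = Math.Pow(Math.Sin(dLat / 2), 2)
                 + Math.Cos(lat1) * Math.Cos(lat2) * Math.Pow(Math.Sin(dLon / 2), 2);
        h = Math.Min(1.0, Math.Max(0.0, h)); // trap rounding just outside [0, 1]
        return EarthRadiusMiles * 2 * Math.Atan2(Math.Sqrt(h), Math.Sqrt(1 - h));
    }

    // Up to n dealers nearest the customer, closest first; ties broken by dealer number.
    public static List<Dealer> ClosestDealers(string customerZip, double customerLat, double customerLon,
                                              IEnumerable<Dealer> dealers, int n)
    {
        return dealers
            .Select(d => new
            {
                Dealer = d,
                // Same ZIP code: distance is zero, so skip the calculation.
                Miles = d.Zip == customerZip
                    ? 0.0
                    : HaversineMiles(customerLat, customerLon, d.Latitude, d.Longitude)
            })
            .OrderBy(x => x.Miles)
            .ThenBy(x => x.Dealer.DealerNumber)
            .Take(n)
            .Select(x => x.Dealer)
            .ToList();
    }

    // Dealers within maxMiles of the customer, closest first.
    public static List<Dealer> WithinMiles(double customerLat, double customerLon,
                                           IEnumerable<Dealer> dealers, double maxMiles)
    {
        return dealers
            .Select(d => new { Dealer = d, Miles = HaversineMiles(customerLat, customerLon, d.Latitude, d.Longitude) })
            .Where(x => x.Miles <= maxMiles)
            .OrderBy(x => x.Miles)
            .Select(x => x.Dealer)
            .ToList();
    }
}
```

Ordering by distance and then by dealer number plays the role of the distance + dealer# secondary index; for a large customer file, doing the same work in the database, as the comment suggests, may scale better.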

benbur

...that looks great :D

NickNielsen

APO AE is not "anywhere in the world." It may be anywhere in a particular part of the world, though. There are three APO "states": AA, AE, & AP. These are assigned based on the home location of the post office.
- AA is the Americas: North, South, and Central.
- AP is the Pacific region, including Japan, Australia, Southeast Asia as far west as (I think) Burma and India, and expeditionary post offices under USPACCOM.
- AE is Europe and Africa, and includes Southwest Asia and the Indian Ocean islands, and expeditionary post offices under USCENTCOM.
FPO and DPO follow the same structure.

jim.wright

What sources, free or not, are available for Canadian postal codes or Mexican cities?

Justin James

I second all of these. I was responsible for overseeing a "find within XYZ radius" system last year, and the devil really is in the details. For starters, we had to weigh the speed of the calculations (particularly when it is a function used within a SQL query to narrow a set down) against their accuracy. We were willing to give up some accuracy, especially since those lat/long pairs are for the *center* of the ZIP code anyway.

The flip side of cities having many ZIP codes is that in many areas one ZIP covers hundreds of square miles. As such, in most scenarios you are going to lose as much accuracy (or even more!) by using a ZIP code's lat/long alone as you will through your choice of formula.

Picture this: in the middle of Midestaho (fake state), there are 2 ZIP codes, each roughly 50 miles by 50 miles in size. Since it is the Midwest, everything is divided into neat little squares. A street runs along the border between them; half of the street is in one ZIP, half is in the other. There are 2 houses facing each other, one on each side of this road, so they are roughly 100 or 200 feet apart. A geodistance formula based on the lat/long of each ZIP code will show them to be 50 miles apart, give or take a few feet. Needless to say, if you used this to find the nearest gas station when your tank is on "E", hope you can call a tow truck to get you to the gas station 2 miles away that looked 75 miles away to the locator system. :)

J.Ja