The local government in Louisville, KY is sponsoring a contest for applications that best utilize TARC General Transit Feed Specification (GTFS) data. (TARC is the metro area's public transportation system.) The contest piqued my interest in GTFS, so I dove into the specification with many ideas on how to put the data to use.
What is GTFS?
GTFS defines a common format for public transportation schedules and their associated geographic information. It was spearheaded by Google and a group of developers, with the goal of integrating with Google Maps so users could find public transit options when trying to get from point A to point B. Public transit agencies make their data available via GTFS feeds.
A goal of open government initiatives like Data.gov is transparency. As with open source software, the community finds creative ways to use and refine openly published data such as GTFS feeds.
How does GTFS work?
An early goal of GTFS was simplicity, so small agencies could adopt the standard as easily as their larger counterparts. For this reason, GTFS uses comma-separated values (CSV) files.
A GTFS feed is a compressed zip file containing CSV files that provide data on the many aspects of a transit system: routes, stops, stop times, and so on. The online reference provides details on the files and fields, but the following list includes the required files for a GTFS feed (there are many optional files).
- agency.txt: Data on the transit agency providing the feed.
- calendar.txt: Service dates, defined by days of the week and a start and end date, that determine when a trip runs.
- routes.txt: The transit routes the agency operates.
- stop_times.txt: The times a vehicle arrives at and departs from each stop on each trip, linking trips.txt to stops.txt.
- stops.txt: The individual locations where vehicles pick up or drop off riders.
- trips.txt: The individual trips, or runs, made along each route.
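Because a feed is just a zip archive of CSV files, a quick sanity check is to confirm the required files are present before loading anything. The following is a minimal sketch using only Python's standard library; the helper name `missing_required_files` is hypothetical, and the file list mirrors the one above. Newer revisions of the GTFS reference treat some of these files as conditionally required (for example, calendar.txt may be replaced by calendar_dates.txt), so check the current spec before relying on this list.

```python
import io
import zipfile

# Required files as listed in this article; the current GTFS reference
# makes some of them only conditionally required.
REQUIRED_FILES = {
    "agency.txt",
    "calendar.txt",
    "routes.txt",
    "stop_times.txt",
    "stops.txt",
    "trips.txt",
}


def missing_required_files(feed):
    """Return a sorted list of required files absent from a GTFS zip.

    `feed` may be a file path or a binary file-like object.
    (Hypothetical helper, not part of any GTFS tool.)
    """
    with zipfile.ZipFile(feed) as archive:
        # Feeds normally keep their .txt files at the root of the archive.
        present = set(archive.namelist())
    return sorted(REQUIRED_FILES - present)


if __name__ == "__main__":
    # Build a small in-memory zip that lacks two required files.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ["agency.txt", "routes.txt", "stops.txt", "trips.txt"]:
            archive.writestr(name, "")
    buffer.seek(0)
    print(missing_required_files(buffer))  # ['calendar.txt', 'stop_times.txt']
```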
As an example, here are the first few lines of TARC's routes.txt; the first line holds the column headers.
route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
1,,1,4th Street Trolley,,3,,47C995,FFFFFF
12,,12,Twelfth Street,,3,,829ECE,FFFFFF
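Python's standard csv module is enough to read rows like these. This sketch parses the sample above into dictionaries keyed by the header row; the `describe_routes` helper and the partial `ROUTE_TYPES` table are illustrative assumptions. In the GTFS reference, route_type 3 denotes bus service, and route_color and route_text_color are six-digit hex values without a leading #.

```python
import csv
import io

# The sample lines shown above from TARC's routes.txt (header plus two rows).
SAMPLE = """route_id,agency_id,route_short_name,route_long_name,route_desc,route_type,route_url,route_color,route_text_color
1,,1,4th Street Trolley,,3,,47C995,FFFFFF
12,,12,Twelfth Street,,3,,829ECE,FFFFFF
"""

# Partial mapping of GTFS route_type codes; see the GTFS reference for the rest.
ROUTE_TYPES = {0: "Tram/light rail", 1: "Subway/metro", 2: "Rail", 3: "Bus"}


def describe_routes(text):
    """Yield (short name, long name, mode, CSS color) for each route row."""
    for row in csv.DictReader(io.StringIO(text)):
        mode = ROUTE_TYPES.get(int(row["route_type"]), "Other")
        # Colors are stored as hex digits without '#'; the field may be empty.
        color = "#" + row["route_color"] if row["route_color"] else None
        yield row["route_short_name"], row["route_long_name"], mode, color


for route in describe_routes(SAMPLE):
    print(route)
# ('1', '4th Street Trolley', 'Bus', '#47C995')
# ('12', 'Twelfth Street', 'Bus', '#829ECE')
```

When reading the actual file from disk, opening it with `encoding="utf-8-sig"` guards against a byte-order mark that some feed producers may include.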
Since the data is comma delimited, it is easy to pull into a database and put to use. The caveat is that the data changes frequently: TARC places a new zip file on its site each morning, and older zip files are moved to another directory so only the current zip files are in the root of the site.
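As a rough sketch of the "pull it into a database" step, the following loads GTFS CSV text into an in-memory SQLite database with Python's built-in sqlite3 module, storing every column as TEXT, and then joins stop_times to trips and stops to list departures at one stop. The `load_table` helper and the tiny sample rows (including the stop name "Sample Stop") are hypothetical; a real loader would read each .txt file from the day's zip and would likely add indexes and proper column types.

```python
import csv
import io
import sqlite3


def load_table(conn, name, text):
    """Create table `name` from CSV text, using the header row as columns."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{col}" TEXT' for col in header)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', reader)


# Hypothetical sample rows; real feeds have many more columns and rows.
STOPS = "stop_id,stop_name\nS1,Sample Stop\n"
TRIPS = "route_id,service_id,trip_id\n1,WKDY,T100\n1,WKDY,T101\n"
STOP_TIMES = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T100,07:00:00,07:00:00,S1,1\n"
    "T101,07:30:00,07:30:00,S1,1\n"
)

conn = sqlite3.connect(":memory:")
load_table(conn, "stops", STOPS)
load_table(conn, "trips", TRIPS)
load_table(conn, "stop_times", STOP_TIMES)

# stop_times links each trip to the stops it serves, so departures at a stop
# come from joining it to trips (for the route) and stops (for the name).
query = """
SELECT t.route_id, st.departure_time
FROM stop_times AS st
JOIN trips AS t ON t.trip_id = st.trip_id
JOIN stops AS s ON s.stop_id = st.stop_id
WHERE s.stop_name = ?
ORDER BY st.departure_time
"""
for route_id, departure in conn.execute(query, ("Sample Stop",)):
    print(route_id, departure)
# 1 07:00:00
# 1 07:30:00
```

Sorting times as text works here only because the sample times are zero-padded. GTFS also allows H:MM:SS values and hours past 24 for trips that run after midnight, so converting times to seconds is the safer choice in a real loader.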
The GoogleTransitDataFeed project page has a number of tools for working with GTFS data. For example, the ScheduleViewer application loads the data from a GTFS zip file and presents it in Google Maps. Figure A shows TARC data loaded; Figure B shows one route selected.
A daily snapshot raises issues for those who want to use transit data in real time, that is, to know about problems at stops and where vehicles are currently located. To address this, the GTFS standard was extended with a real-time component, aptly called GTFS-realtime.
Real-time GTFS data augments the normal GTFS data, so routes, trips, and so forth are still defined in standard GTFS. The real-time component provides updates on trips, vehicle positions, and service interruptions. This data is served over HTTP, but rather than CSV it uses protocol buffers, Google's mechanism for serializing structured data, which is described as a smaller, faster alternative to XML. The standard seems simple, but finding examples was problematic because most code samples are provided in Java and Python, which is not surprising given Google's involvement.
The details of working with protocol buffers and real-time GTFS data are beyond the scope of this article. I have been working with base GTFS, but I thought it was important to point out that real-time data is available.
A good resource
One of the most frustrating aspects of working with GTFS is the lack of technical information; fortunately, I stumbled upon The Definitive Guide to GTFS by Quentin Zervaas. He said he wrote the book as a way to share the GTFS knowledge he had accumulated from working with it. He has developed a mobile application called TRANSITTIMES+ along with an online resource for public transit data.
The book is a great resource for learning about GTFS data, as well as how to interact with it once it is in a database. The only drawback is that it lacks GTFS-realtime information, but he hopes to write a follow-up book that covers it.
Given Mr. Zervaas' extensive GTFS experience, I took the opportunity to ask him about the evolution of GTFS:
"The beauty of GTFS is its simplicity, which means it's unlikely to evolve in a big way any time soon. Having said that, there are a few aspects to it which aren't quite as strong as others. For instance, the way the fare data works is geared towards only a handful of agencies. It's very difficult (or impossible) to model the fare structure of many agencies in GTFS.
The biggest evolution in the last couple of years has been the introduction of GTFS-RealTime. This complements GTFS nicely: GTFS has static data for the mid/long-term, GTFS-RT has info that changes on a really short-term basis."
Outside of this book and the Google GTFS resources, few resources are available on using GTFS, especially GTFS-realtime.
The GTFS standard makes public transit data readily available so developers can consume the data to build applications that simplify navigating transit systems. One drawback is the lack of information on GTFS usage, but hopefully that will change as GTFS continues to be adopted.
I see a lot of uses for GTFS data beyond facilitating public transit usage; for instance, the data could be combined with other data sources like the census to get a clearer picture of ridership or to help target advertising to certain areas. As with a lot of these open data sources, the possibilities are exciting.
Have you taken a look at GTFS? If so, what are your impressions of the spec, and how are you using it? Have you found a great resource on GTFS that I didn't mention? Let us know in the discussion.
Tony Patton has worn many hats over his 15+ years in the IT industry while witnessing many technologies come and go. He currently focuses on .NET and Web Development while trying to grasp the many facets of supporting such technologies in a production environment on a daily basis.