Cutting-edge data mining and other intelligence tools could redefine privacy as we know it.
By John Borland
Staff Writer, CNET News.com
To show why the government's terrorist-finding databases don't work, Elizabeth LaForest points to her own case file: While her arrest records are authentic, these days the 89-year-old Roman Catholic nun doesn't run afoul of the law except when taking part in peace demonstrations.
So Sister LaForest joined the American Civil Liberties Union this year to sue the state of Michigan, charging that a controversial law enforcement data-sharing program there was breaking the state's privacy laws.
"I have been very concerned about our privacy rights--they're all being taken away," she said in an interview from her retirement home in Oakland County, Mich. "That's one of the rights we have in the Constitution."
Privacy organizations have fought an uphill battle on Fourth Amendment protections since the terrorist attacks of Sept. 11, 2001, particularly in the area of high-tech surveillance. But the debate has taken on particular urgency with advancements in "data mining," a technology used to identify patterns based on the millions of bits of information stored in public and commercial computer systems.
The use of data mining represents a technological sea change in the way the federal government gathers, stores and analyzes information on its own citizens--perhaps the most substantial transformation in domestic intelligence since the excesses of FBI Director J. Edgar Hoover led to sweeping privacy law reforms in the mid-1970s.
The trend has already led to a string of colossal public-relations disasters, including defunct initiatives such as the Pentagon's Total Information Awareness program and the CAPPS II airline passenger-screening project, but the promise of data-mining and data-sharing technologies remains too tempting for government agencies to resist.
Civil libertarians say they recognize the importance of such technologies in counterterrorism efforts but stress that these programs must be accompanied by parallel changes in laws. Rather than deal with objections after the fact, they say the government must build privacy protections into these programs from their inception.
"We need to write those rules, and they need to be part of the system," said James Dempsey, executive director of the nonprofit Center for Democracy and Technology.
In theory, data-mining technologies are supposed to sift millions of bits of information on background and activities, surfacing patterns that might eventually help authorities identify terrorists before they strike. But no technology is perfect, and civil-rights advocates warn about the prospect of bad data and "false positives"--incorrect identification matches--in an era when terrorist suspects are held for months or years without being charged.
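The false-positive worry is fundamentally arithmetic: when the behavior being hunted is extremely rare, even a highly accurate screen flags far more innocent people than genuine suspects. The sketch below makes that concrete with wholly hypothetical numbers; nothing here reflects any real system's accuracy.

```python
# Illustrative base-rate arithmetic; all figures are hypothetical.

def false_positive_count(population, true_suspects, accuracy):
    """Return (suspects correctly flagged, innocents wrongly flagged)
    for a screen that classifies each record with the given accuracy."""
    innocents = population - true_suspects
    true_hits = true_suspects * accuracy       # suspects correctly flagged
    false_hits = innocents * (1 - accuracy)    # innocents wrongly flagged
    return true_hits, false_hits

# Even a 99%-accurate screen over 300 million records, hunting
# 3,000 genuine suspects, buries them under ~3 million innocents.
hits, misses = false_positive_count(300_000_000, 3_000, 0.99)
print(f"true hits: {hits:,.0f}  false positives: {misses:,.0f}")
```

Under these made-up numbers, roughly a thousand innocent people are flagged for every real suspect found.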
The most ambitious data-mining proposal came quickly after Sept. 11 in the form of the Total Information Awareness project, launched in 2002 and run by retired Rear Adm. John Poindexter at the Defense Advanced Research Projects Agency (DARPA). Poindexter, a key figure in the Iran-Contra scandal, envisioned "ultra-large-scale" database technologies that would search commercial and government databases while monitoring information streams such as retail transactions to identify "signature" behaviors of terrorists.
"We must become much more efficient and more clever in the ways we find new sources of data, mine information from the new and old, generate information, make it available for analysis, convert it to knowledge and create actionable options," Poindexter said in a conference address in 2002. "Certain agencies and apologists talk about connecting the dots, but one of the problems is to know which dots to connect."
After an outcry among civil liberties groups, the TIA program was scaled back before Congress eliminated its funding. Many technologists, led by the Association for Computing Machinery, said the technical problems with Poindexter's vision were as deep as its potential privacy abuses.
"Computers are not magic," said Barbara Simons, the association's former president. "Programs are only as good as the people who write them and feed the data into them."
CAPPS II (Computer Assisted Passenger Prescreening System)--a proposed successor to today's rudimentary passenger-screening system--met a similar fate after its own flirtation with what the government calls "automated risk assessment."
In early 2003, the Transportation Security Administration proposed the new CAPPS II system, which would cross-reference airline passenger information with commercial data services and secret intelligence information. Passengers would be assigned a "risk score" and be screened or blocked from flying accordingly.
Privacy groups complained again, saying the program amounted to discrimination against travelers on the basis of information that could not be checked or challenged. As DARPA had with TIA, the TSA relented, announcing last month that a new system called Secure Flight would be put in place instead.
David Stone, assistant secretary of homeland security, said that the new system will check passenger information against a "no fly" list maintained by the government to hunt for those with known terrorist connections but that it will not assign a risk score.
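A plain watch-list check of the kind Stone describes is, at its simplest, a name lookup with some normalization. The sketch below is illustrative only--the names are invented and real systems use far more sophisticated matching--but it also hints at why such checks misfire: crude normalization misses near-variant names.

```python
# Toy no-fly-list lookup. Names and the normalization rule are
# illustrative; real matching is considerably more elaborate.

def normalize(name):
    """Lowercase, collapse whitespace and hyphens."""
    return " ".join(name.lower().replace("-", " ").split())

NO_FLY = {normalize(n) for n in ["John Q. Doe", "Jane Roe"]}

def on_no_fly_list(passenger_name):
    return normalize(passenger_name) in NO_FLY

print(on_no_fly_list("JOHN Q. DOE"))  # case difference: still matches
print(on_no_fly_list("Jon Doe"))      # near-variant: simple lookup misses it
```

The second call shows the flip side of the false-positive problem: exact matching also produces false negatives on misspelled or transliterated names.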
"CAPPS II ultimately got shut down because when you compared the need for information sharing against the rights of the citizens, we found it unbalanced," said Greg Baroni, president of the global public-sector unit at Unisys, a company that has significant technology contracts with the TSA. "That didn't mean that the goal would be thwarted. Alternative plans went into motion."
Despite these setbacks, data sharing and data mining are making steady inroads within federal security agencies. The most immediately realistic tasks have proven to be sharing data between agencies and simultaneous searching of multiple databases in and out of government circles.
One of the most public examples of this kind of new search tool is the program targeted in Sister LaForest's lawsuit, called MATRIX. That system, whose name stands for Multistate Anti-Terrorism Information Exchange, has also drawn considerable criticism, with 11 of the original 16 participating states withdrawing from its pilot project for either cost or privacy concerns.
MATRIX was created largely by Florida technology company Seisint, recently acquired by LexisNexis, with help from state officials and a grant from the Department of Homeland Security. The pilot program has taken public records from five states--information such as driver's license and vehicle registrations--and made that searchable along with Seisint's own mass of commercial data.
That means, for example, that an investigator might type in a partial license plate number seen by a witness and immediately get results for all the possible vehicles within a 50-mile radius of the sighting, as well as driver's license photos for the vehicle owners.
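That kind of partial-plate lookup can be sketched as a substring query over vehicle records. The records and field names below are invented; a real system would also filter by geography, vehicle description and registration state.

```python
# Minimal sketch of a partial-plate search; data is hypothetical.
import re

vehicles = [
    {"plate": "QRT4521", "owner": "A. Smith", "state": "FL"},
    {"plate": "QRT9921", "owner": "B. Jones", "state": "FL"},
    {"plate": "ZXB4521", "owner": "C. Brown", "state": "GA"},
]

def match_partial_plate(fragment, records):
    """Return every record whose plate contains the witnessed fragment."""
    pattern = re.compile(re.escape(fragment))
    return [r for r in records if pattern.search(r["plate"])]

for hit in match_partial_plate("452", vehicles):
    print(hit["plate"], hit["owner"])
```

The speed gain officials describe comes from running exactly this sort of query across many states' records at once, rather than mailing requests to each agency.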
Law enforcement officials say privacy concerns stemming from the program are misplaced. The same checks of arrest records, driver's license details, addresses and other information would be done, anyway, they say, but would take days to complete rather than hours.
"It is a tremendous time-saving tool," said Mark Zadre, chief of investigations at the Office of Statewide Intelligence in Florida's Department of Law Enforcement. "The system does not solve anything--it's just an investigation tool. We hit a 'submit,' not a 'solve,' button."
The Michigan lawsuit against the program is based on a 1980 state law that bars Michigan from sharing records with out-of-state intelligence agencies.
"We're not asking to stop the world and prohibit technology from moving forward," said Noel Saleh, a lawyer for the ACLU Fund of Michigan. "But if we're going to have this kind of expanded information, we need to have some government and privacy oversight to make sure the rights of individuals are not trampled on."
According to documents the ACLU obtained through a Freedom of Information Act request, the original MATRIX program sent 120,000 names to the federal government that scored high on a list of possible terrorist indicators, including age, gender, credit rating and "proximity to 'dirty' addresses."
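Indicator-based scoring of the kind the ACLU documents describe can be pictured as a weighted checklist: each matching attribute adds to a score, and records above a threshold get forwarded. The weights, indicators and threshold below are invented--the actual MATRIX criteria were never made public.

```python
# Hypothetical indicator-scoring pass; weights and threshold are invented.

WEIGHTS = {
    "age_in_flagged_range": 2,
    "low_credit_rating": 1,
    "near_dirty_address": 3,
}

def score(record, weights):
    """Sum the weights of every indicator present on the record."""
    return sum(w for key, w in weights.items() if record.get(key))

people = [
    {"name": "P1", "age_in_flagged_range": True, "near_dirty_address": True},
    {"name": "P2", "low_credit_rating": True},
]

THRESHOLD = 4
flagged = [p["name"] for p in people if score(p, WEIGHTS) >= THRESHOLD]
print(flagged)  # only records crossing the threshold are forwarded
```

Note that none of the indicators involves any criminal act, which is the heart of the civil-liberties objection: a high score reflects circumstance, not conduct.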
Other data search tools are potentially even more powerful, though less public.
A recent report from the Government Accountability Office identified a handful of data-mining tools with similar capabilities used by federal agencies. One of these was the Verity K2 Enterprise system used by the Defense Intelligence Agency to "identify foreign terrorists or U.S. citizens connected to foreign terrorism activities."
Officials at the defense agency did not return calls asking for further details. Executives from Verity declined to say specifically how their technology was used but did describe in general how it might function.
Verity's K2 is a search and indexing tool, rather than a database itself. It can target many types of sources, such as Web sites, internal intelligence databases or even the flow of information over an agency-monitored network. While not a monitoring device itself, K2 organizes results from these sources and can notify investigators when information relevant to a query comes up.
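An arrangement like the one described--an index over many sources, plus standing queries that alert an analyst when new material matches--can be sketched with an inverted index. The class and method names below are my own invention, not Verity's API, and the matching is deliberately simplistic.

```python
# Sketch of a search-and-alert layer; names are hypothetical, not Verity's.
from collections import defaultdict

class SearchIndex:
    def __init__(self):
        self.postings = defaultdict(set)   # term -> doc ids containing it
        self.docs = {}                     # doc id -> raw text
        self.standing_queries = {}         # analyst -> set of watched terms

    def watch(self, analyst, terms):
        """Register a standing query for an analyst."""
        self.standing_queries[analyst] = {t.lower() for t in terms}

    def add_document(self, doc_id, text):
        """Index a new document; return analysts whose queries it matches."""
        terms = set(text.lower().split())
        self.docs[doc_id] = text
        for t in terms:
            self.postings[t].add(doc_id)
        return [a for a, q in self.standing_queries.items() if q & terms]

idx = SearchIndex()
idx.watch("analyst1", ["charter", "flight"])
alerts = idx.add_document("doc1", "Charter flight records from vendor")
print(alerts)
```

The key property, as Feit suggests, is that the tool organizes and routes results rather than making any determination itself.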
"The idea of clicking a magic button and having software issue an APB to arrest someone in Iowa is not going to happen, and shouldn't happen," said Andrew Feit, Verity's senior vice president of marketing. But software can "start to look for relationships between events and documents that can be truly data mined out of a database," he said.
The tension between privacy advocates and ambitious technology programs has slowed progress in data tools, and some researchers believe that basic compromises between security and privacy have been ignored by both sides.
"People incorrectly believe that we must sacrifice one to get more of the other," said Dennis McBride, president of the Potomac Institute for Policy Studies, an independent think tank that focuses on technology policy. "In our overzealous knee-jerk protection of privacy, we have thrown out the baby with the bath water."
Indeed, many on both sides of the issue believe that a model for compromise may now be emerging in proposals for a new network linking federal homeland security and state law enforcement databases.
The need for this coordination--and the difficulty of achieving it--was underscored in a recent report by the Department of Homeland Security's own internal auditor, which said the government has failed to consolidate 12 terrorist "watch lists."
A broader national debate on data mining and data sharing may be inevitable, however. Each individual creates a staggering amount of data when going through daily life, and increasingly, these bits of information are stored in giant databases at research companies like LexisNexis or ChoicePoint. Privacy laws written as long as 30 years ago are ill-suited for these modern tools, whether they are being used by government agencies, corporations or private citizens.
"The ultimate threat to our privacy is that everything of any significance at all becomes available," said Barry Steinhardt, director of the ACLU's Technology and Liberty Program. "It can be searched, mined, and predictions are going to be made about us on the basis of spare bits of data. That's what bears watching."
In the near term, the balance between privacy and safety will likely remain weighted on the side of security, especially in the first presidential election year since the Sept. 11 terrorist attacks.
"If you're applying to drive a tanker truck through a tunnel in New York City, the city has a right to know who you are," said ChoicePoint Chief Marketing Officer James Lee. "We would say there is no universal right to anonymity. There is only a right to privacy."