
When your genome costs less than your iPhone: The beautiful, terrifying future of DNA sequencing

Mapping the human genome was one of humanity's greatest scientific breakthroughs. Now, the cloud and supercomputing are taking it to new heights, bringing breathtaking and disturbing possibilities.

Dan Lane has always had a keen interest in the next big thing. He owned an early set of Google Glass, got his first RFID implant 10 years ago. Working in technology, he'd always been curious to try out the latest piece of kit or new service. So when 23andMe, a company that offers customers insight into their own DNA, opened up in the UK, he decided he'd give that a try too.

23andMe works like this: you get sent a small tube in the post, you spit into it, and send it away to be analysed. That little sample of saliva may not seem like much, but it holds a few of your cheek cells, each of which contains your entire genome: all the DNA that makes you, you. A few weeks after the tube arrives at the lab, 23andMe sends you back a personalised interpretation of your DNA.

Services like 23andMe can unpick your genome to discover information on a range of traits: how you react to certain drugs; your risk of developing particular diseases; what characteristics you might pass on to your children; and even elements of your physical make-up—whether your earwax is wet or dry, for example, or whether you're likely to enjoy the soapy taste of coriander.

The testing kit sat on Dan's desk for a few weeks after it arrived. He spat in the tube, forgot about it for a while longer, remembered it a week later, and sent it off.

The service seemed like "an interesting, geeky expensive toy," Dan said. "I thought I'd give it a try and see if anything interesting came up."

"At the time I sent the test off, I happened to be talking to my mum and I mentioned I was doing a DNA test. She didn't say anything, but I imagine at that point she was probably thinking 'oh crap'."

Something interesting was about to come up.

A few weeks later, the results were in. An email dropped into Dan's inbox telling him he could go online and see what his DNA had to say about him. He was interested to know how our genes make us what we are, but he was less bothered about the ancestry information 23andMe provides—details of where in the world your maternal and paternal DNA hails from. He knew both sets of his parental DNA would just show a standard European genetic mix.

He thumbed through the results on his phone: no elevated risks of disease to worry about, nothing surprising in his physical characteristics, all quite boring, all quite plain.

Our DNA fingerprint is unique.
Image: Isak Ivan / iStockphoto

He scrolled on to the ancestry information—maternal DNA from Europe, as expected. His paternal DNA originated in South Asia though—now that was interesting. Dan assumed a few generations back, there was a branch of the family tree that had sprung from somewhere unknown to him.

"I stopped looking at that point, and thought, 'that was a bit strange'. My dad had died in 2010, so he wasn't around to ask, so I asked my aunt, his sister. I said 'do you know if any of our family history goes back to India?', still not suspecting the slightest thing, thinking 'isn't that interesting?'

"She messaged me back saying, 'I know where you're coming from, I know what you're asking about, I'll tell you everything I know'."

Dan's aunt told him that his parents had never known for certain whether his biological father was the man his mother married—the man Dan had always known as his father—or another man, who was, in fact, from India. 23andMe had settled beyond a doubt a question that had remained unanswered for over 30 years, and one that Dan had never even known to ask.

Unlocking the secrets of the genome, for $99

All the complexity of the human race is written in a language of just four chemicals, each known by a letter. Those chemicals, called bases, form pairs: A pairs with T, C with G. Every person has three billion of those pairs in their genome, and the order in which those pairs line up one after another acts as a blueprint to build that unique individual. The way the bases are ordered is what gives us all nearly the same physical layout—brains, limbs, skin, eyes, and so on—but makes one of us Marilyn Monroe and one Albert Einstein, one of us me, one you.
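The pairing rule can be written down in a few lines. What follows is a minimal sketch rather than anything a sequencing lab would run: the function names and the example strand are invented for illustration, and it assumes a strand contains only the four letters A, T, C and G.

```python
# Pairing rules as described in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that sits opposite each base of `strand`, in order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    example = "ATTCAGT"  # an arbitrary, made-up stretch of bases
    print(complement(example))          # TAAGTCA
    print(reverse_complement(example))  # ACTGAAT
```

Because each base has exactly one partner, knowing one strand is enough to reconstruct the other, which is why a genome can be written as a single string of letters.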

Just by observing the sequence of those bases, stored in each and every human body cell, researchers can create a portrait of the person they come from, from their hair colour to the likelihood they'll get breast cancer to their sensitivity to bitter tastes.

While the $10,000 cost of sequencing a genome makes it prohibitive for most people and organisations, an alternative—known as genotyping—is already available from companies such as 23andMe and Personalis Genomics, at a far lower cost.

Of the human genome's three billion base pairs, only several million are likely to differ from person to person. And thanks to the genomic maps that the Human Genome Project made possible, by tracking which variants of those bases a person has, companies can sequence portions of people's genomes for a relatively low cost.
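The idea of checking only the positions known to vary can be sketched as below. Everything here is hypothetical: the reference string, the panel of positions and the sample are invented, and a real genotyping service reads its markers with laboratory chips rather than from a ready-made string. The sketch only illustrates why looking at a short list of known sites is so much cheaper than reading everything.

```python
# Invented reference sequence and marker panel, for illustration only.
REFERENCE = "GATTACAGATTACA"
PANEL = [2, 5, 9, 12]  # zero-based positions assumed to vary between people


def genotype(sample: str, reference: str = REFERENCE, panel=PANEL) -> dict:
    """Report the sample's base at each panel position and flag differences."""
    calls = {}
    for pos in panel:
        calls[pos] = {
            "reference": reference[pos],
            "sample": sample[pos],
            "variant": sample[pos] != reference[pos],
        }
    return calls


if __name__ == "__main__":
    person = "GACTACAGAGTACA"  # made-up sample, same length as the reference
    for pos, call in genotype(person).items():
        print(pos, call)
```

Only four positions are inspected here, however long the sequence is; the rest of the genome is never read, which is also why such a test can miss mutations that are not on its panel.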


For under $100, 23andMe will give you a snapshot of your sensitivity to a number of drugs, as well as dozens of other biological markers.

It is, however, just that: a snapshot.

"23andMe provides you with genetic information, but does not sequence your entire genome or perform predictive or diagnostic tests," the company states on its website.

And because it doesn't sequence everything, it will miss some things. Take breast cancer risk, for example: there are many different known genetic mutations that elevate a person's likelihood of contracting breast cancer, and 23andMe only looks for a few of them.

The company also highlights the limitations of what it does look for, describing genomic information as "difficult to interpret on its own". It adds, "We review the most up-to-date biomedical literature on genetic associations and provide you your genotype information in the context of current scientific knowledge." It can't tell you if you have a disease, or whether you will get it in the future, just whether your odds are higher or lower than the average.

Your genes are not your destiny, as 23andMe rightly points out. Lifestyle and environment have a significant part to play in determining your future health. Peeking at your DNA is only getting a look at half the story: genes don't affect health in isolation, and our personal choices—how we eat, where we live—have a similarly significant impact on future health. Your genome may suggest you have a lower than average risk of colon cancer, for example, but if you drink heavily and eat a lot of processed red meat, you may find you've worn away that advantage.

Nonetheless, the FDA judged in 2013 that the 'health' portion of the tests that 23andMe offered in the US was sufficiently medical in nature that it should be pulled. It took the decision because, it said, the tests were intended for the "diagnosis of disease or other conditions or in the cure, mitigation, treatment, or prevention of disease" and had not been given the necessary market approval. It also warned that there was a risk that a false positive or negative associated with certain disease markers could have disastrous consequences.

There are other areas of concern around personal genotyping services. If you haven't taken a genotyping test, for now, the only place your genome is stored is in the nucleus of each of your cells. However, once you've used a genotyping service, your genetic information is also the property of the company you've sent it off to.

It's no longer under your sole control, and who your genetic information gets shared with is now at the discretion of the provider. 23andMe and rivals will have to hand over genomic information if the appropriate legal requests are made, for example. The wisdom of that is undeniable for the investigation of criminal cases. But, particularly in light of mass warrantless surveillance conducted by the NSA and its international counterparts, it's a distinctly uncomfortable prospect.

The company's privacy statement says it "may share anonymized and aggregate information with third parties; anonymized and aggregate information is any information that has been stripped of your name and contact information and aggregated with information of others or anonymized so that you cannot reasonably be identified as an individual." This could clearly become a vexed notion in the future when whole genomes are sequenced by such services. Can you truly anonymise something that is unique to only one person in the world?

23andMe has its own research arm which allows users to opt-in to share their data with organisations using genomics to investigate health. For example, customers can volunteer to take part in a study the company is currently running with Pfizer to investigate genetic links to inflammatory bowel disease by contributing their genetic data and filling out a survey.

23andMe highlights the risks associated with taking part in research in this way (risks that are the same for both public and commercial efforts) including that a breach could result in genetic information being put into the public domain and becoming associated with your identity.

"Identification of your individual-level data from those summaries would be extremely difficult, but it is possible that a third party that has obtained some of your genetic data could compare that partial data to the published results and infer some of your other personal information.... Although 23andMe cannot provide a 100% guarantee that your data will be safe, 23andMe has strong policies and procedures in place to minimize the possibility of a breach," 23andMe's privacy statement says.

Yet it's only by providing as much detail as possible that consumers will really help research efforts, and themselves, thanks to the interplay of genome and environment in disease. Imagine someone had contributed both their lifestyle and genetic data to a study that allowed scientists to identify which genomic and environmental markers cause an increased risk of a rare cancer, say, and consequently to find an effective way to prevent it developing. Then imagine that data was so well anonymised that the contributor couldn't be traced and alerted to the potentially life-saving treatment.

There are incredible advantages to individuals' sharing their genomic data, of course, and not just to the individuals themselves. Adding more and more genomic information to researchers' databases allows for better analysis and enables scientists to work out why certain diseases affect the people they do, and how they can be better treated. New, more effective drugs can be developed, and the interplay between genotype and environment can be better understood.

However, while sharing as much of our data as possible stands to benefit both us as individuals and the rest of the human race, it equally puts us at greatest risk of having that data exploited in a way that could harm us.

Who owns your genome?

In terms of gaining better insights into human health from genome sequencing, businesses will soon want a slice of the action too.

For health insurers, knowing whether their clients have a greater likelihood of developing a condition that could bring years of expensive medical treatment would have a clear benefit.

And perhaps not solely health insurers: if a predisposition to risk-taking behaviour was associated with a certain gene sequence, home or travel insurers may want to get in on the act too. Those with bad news in their genes would likely see their premiums jacked up, while their luckier brethren would be given a cut, much like health insurance discounts for gym goers. Of course, one way to try to dodge the whole scenario is to decline to share the information, though it's not hard to imagine consumers who decide not to give away genetic information will be suspected of having something to hide, and landed with a higher bill for their silence.

Similarly, employers could request the sequence of an employee or potential hire to work out whether they're likely to remain in good health during their tenure. While there's currently legislation in some parts of the world that forbids discrimination on genetic grounds, given how often legislation panders to the interests of commercial companies rather than the electorate as a whole, it's not unreasonable to imagine that this won't be the case for long.

Another challenging question with the advent of genetic sequencing is just how much of our DNA is really ours? Having discovered a mutation that causes, say, a higher risk of bowel cancer, is a company entitled to patent it?


Some companies argue they should be—that having spent time and money discovering the mutation and so enabling tests for its existence to be developed, they should be allowed to patent it in order to make their money back. Others argue that such mutations exist naturally in the human body and that if any patent were to be granted, it would be to mother nature, rather than a commercial interest.

Test cases have already been making their way through the courts, with varying results. Myriad Genetics, a co-discoverer of mutations on the BRCA1 and BRCA2 genes that lead to a high risk of breast cancer, went on to file patents on the mutations, and sued a rival maker of a test for them. The US Court of Appeals for the Federal Circuit found against Myriad, saying: "Myriad did not create or alter any of the genetic information encoded in the BRCA1 or BRCA2 genes. The location and order of the nucleotides existed in nature before Myriad found them... As the Supreme Court made clear, neither naturally occurring compositions of matter, nor synthetically created compositions that are structurally identical to the naturally occurring compositions, are patent eligible."

Not all courts take such an enlightened view, however. Australian courts have backed Myriad, and other countries have allowed companies to patent sections of naturally occurring DNA.

The worrying implications of patentable DNA cannot be overstated. If a single company is allowed to patent something you or I may have in our bodies right now, it would be able to control how, where, and at what cost testing can be carried out. Your mother may have a breast cancer mutation, but finding out if she passed it on to you could end up a highly expensive process. Similarly, human-created DNA sequences remain patentable, much like today's pharmaceuticals. We may one day find ourselves paying for the insertion of a sequence that can improve brain function or prevent the onset of a particular disease. And, as patents last 20 years and such sequences may be hereditary, it's not inconceivable we may find ourselves paying for our children too.

For now, genotyping services are something of a blunt instrument when it comes to improving human health, but one that's becoming ever sharper as tech improves and scientific research continues. If you're feeling particularly future-gazy, remember you could potentially clone someone from their sequenced DNA. While there's a limit to how useful a genome is to someone with criminal intent today, the potential for harm in the future is broad and terrifying. Yet, its potential for good is similarly broad, and equally breathtaking.

The advent of DNA

Genome sequencing may sound like a simple process since there are a number of companies that can analyse your genome for around $99 and a wait of only a few weeks. But, it's a process based on some of the most revolutionary science of the twentieth century.

The structure of DNA, and with it the basics of how the molecule works, was only worked out in the 1950s. However, it wasn't until 1990 that a concerted effort was launched to understand what each tiny detail of human DNA meant. That's when the Human Genome Project began, charged with producing the first completely sequenced human genome: a task that would take over a decade, involve researchers from across the US, Europe, China, and Japan, and drive a radical transformation in medical technology.

When scientists first began sequencing DNA in the mid-1970s, the process—known as Sanger sequencing—was technology-light and time-heavy.

The DNA used for sequencing was divided into chunks and inserted into bacteria. The bacteria divide and reproduce extremely quickly, making new copies of the DNA as they go.

bacteria-on-agar.jpg
A sample of bacteria being grown on agar
Image: Bill Branson, NIH

In its natural state DNA is a helix, two strands wrapped around each other. Once the DNA was removed from the bacteria—no mean feat in itself—the helix was split in two. The base pairs that would normally glue the two strands together would be cut in half, each one sticking out like ribs from a backbone. A series of chemical reactions would be run, allowing different chemicals to stick to one of the four bases.

After the fragile DNA was removed from the gel, it would be left to dry and then exposed to photographic film. As those four chemicals were radioactive, wherever one had stuck to the DNA strand, it showed as a dark mark on the film.

In the 1970s and most of the 1980s, there were no machines to record the sequence of bases, just researchers with rulers.

"You'd sit and have a colleague write down the letters as you'd read up the gel [plate that held the DNA strands]. You'd move a ruler from the bottom of the gel up to the top, and somebody would record 'A, T, T, C, A, G, T'. Because of the process for pouring the gels, very often they weren't perfect. The lanes were wavy and moving around, not nice and straight like when you see pictures of them. It was hard," Dr. Jeff Schloss, genome technology programme director at the National Human Genome Research Institute, remembers.

In the late 1980s, a new process attached a different fluorescent dye to each base rather than a radioactive chemical. That sped up throughput fourfold. Rather than needing to run four separate reactions, a single reaction run once could do the same job.

Capillary arrays in the 1990s accelerated sequencing still further. Instead of having technicians load the strands of DNA into gel plates by hand, machines could automatically suck them into arrays of 96 capillaries, and read them by laser. It allowed the project to start putting together sections of the genome that were up to 1,000 bases long.

In the early days, the bigger sequencing labs would develop their own software to analyse their results, but there was little standardisation between systems, and the level of skills needed to use the software tools was so advanced that relatively few researchers could take advantage of them.

While the introduction of capillary array systems and improvements in computing driven by Moore's Law sped up the pace and cut the cost of sequencing, the software tools scientists needed to analyse the results weren't keeping up.

"You were always choking on data," Schloss said.

The Human Genome Project made it mandatory that any reads of the genome of a certain size should be made public within 24 hours (a rival commercial effort, run concurrently by a private genome-sequencing company called Celera, preferred to keep its discoveries to itself).

However, at the beginning of the Human Genome Project's life, disclosure could only go so far. Sharing the long read was possible insofar as scientists "could FTP the data. I'm not sure if this was mainframes still. It was probably mini computers, [the UNIVACs], and that sort of generation of computer. There was not a whole lot of this you could really do given the amount of data with PCs. In 1989, 1990 we were using a PC hooked up to a gel reader that you could use to read slab sequencing gels. PCs were in people's labs, but I think it was just too much data [to share]," said Schloss.

A colony picker selects and transfers bacterial colonies to a growth plate
Image: Maggie Bartlett

By the time the Human Genome Project wrapped up in 2003, things were very different. Scientists were able to download results from other labs and run their own analyses on the data.

"Today, in the commercial DNA sequencing technology and analysis business, the tools are very widely accessible and almost anybody with a little bit of knowledge can do some pretty sophisticated DNA sequence analysis," Schloss said.

As well as causing a data glut, the arrival of capillary arrays also led to the introduction of robotics in DNA sequencing labs. Traditionally, growing the bacteria that housed the DNA was a job for students: for hours on end, the scientists of tomorrow would take sterile toothpicks, pick up bacteria, and introduce them into a tube of growth medium over and over again.

But manual methods of "colony picking" weren't fast enough to keep the capillary arrays fed, and robots were called in to pick up the slack. Using computer vision, the colony pickers were able to take over. Once the colonies were mature, other robots would remove the DNA from the bacteria, while yet more would subject it to the chemical processes necessary to detect the sequence of bases.

Initially, labs like the MIT and Harvard-affiliated Broad Institute made the robots themselves. Commercial varieties soon followed and, once again, the pace at which the human genome's secrets were revealed accelerated.

How technology played its part

When the Human Genome Project published the human genome in 2003, the cost of sequencing each genome ran to tens of millions of dollars—still way too high to take sequencing from high-level scientific research to the at-home consumer service it is today.

In the decade and more that has passed since the first human genome was published, new DNA sequencing techniques have been developed that have helped drive the cost down to more manageable levels: at the end of 2014, rather than millions, the cost of sequencing an entire human genome was around $10,000.

One such technique is sequencing by synthesis, and it all began with beads.

Sections of DNA to be sequenced were divided into two short strands, and one half was attached to the surface of the bead. But one lonely strand on one lonely bead is not enough to sequence part of a genome. There need to be thousands of separate copies of the DNA covering the bead, giving it a fuzzy genomic beard. How did pioneering scientists manage to get the beads they needed? "It was really very clever," Schloss said. "They made salad dressing."


By shaking up a mixture of water, oil, and a few other ingredients—the sequencing equivalent of oil, vinegar, and mustard—each bead would end up isolated in its own water droplet along with all the right chemicals for a process called PCR to take place. PCR, short for polymerase chain reaction, can turn a single fragment of DNA into tens of thousands of fragments, all stuck to the same single bead.
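A rough sense of how a single fragment becomes tens of thousands comes from simple doubling. The sketch below assumes the textbook idealisation that every PCR cycle doubles the number of copies; the `efficiency` parameter and both function names are introduced here for illustration and do not come from any lab protocol.

```python
def copies_after(cycles: int, start: int = 1, efficiency: float = 1.0) -> float:
    """Copies present after `cycles` rounds of amplification.

    efficiency=1.0 is the idealised case where every copy is duplicated
    in every cycle; real reactions fall somewhat short of that.
    """
    return start * (1 + efficiency) ** cycles


def cycles_needed(target: int, start: int = 1, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles that reaches at least `target` copies."""
    cycles = 0
    copies = float(start)
    while copies < target:
        copies *= 1 + efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(copies_after(15))       # 32768.0
    print(cycles_needed(10_000))  # 14
```

Under that idealised assumption, roughly 14 or 15 cycles are enough to coat a bead with tens of thousands of copies of one starting fragment.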

The droplets could then be broken and the beads spread out, ready for sequencing. Scientists would then run chemical processes to attach bases to the single strand, one by one. In one system, as each base was added to the DNA, a particular bioluminescent molecule would be given off, while another preferred using a Sanger sequencing-style option of adding a fluorescent marker.

While the light from a single molecule would be too hard for even the most advanced systems to pick up, with thousands of DNA fragments, the system can detect which bases are going where.

Alas, DNA isn't always cooperative. Not every third A on each of those thousands of fragments will add its complementary base at the same time, meaning there's a risk of the colours getting mixed up. That's where some smart tech comes in.

Sequencing-by-synthesis companies "got really good at using computer algorithms to say, 'We know what's going in most of the molecules in that cluster, and we can subtract out the noise'. And by using improved image analysis, they were able to get much longer read lengths than using the raw signal that started having mixed colors," Schloss said.

The fuzzy beads are also used in another of the next-generation sequencing technologies called Ion Torrent, developed by Life Technologies. Once the beads are covered in DNA fragments, they're put into wells on the surface of a chip and a series of chemical reactions is run. As each base on the DNA strand pairs with another base, a hydrogen ion is given off, causing a change in the pH of the solution in the well. The change in pH is converted to a voltage that the chip can detect, which is in turn translated into the sequence of bases that describes the genome.
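How a string of voltage readings might become a string of letters can be shown with a toy model. This is a deliberately simplified sketch, not the vendor's algorithm: it assumes the four bases are washed over the chip one at a time in a fixed, repeating order (the `FLOW_ORDER` below is invented), and that the voltage in each wash is roughly proportional to how many bases were added in that step. The readings are made up.

```python
FLOW_ORDER = "TACG"  # assumed repeating order in which bases are offered


def call_bases(voltages, volts_per_base: float = 1.0, flow_order: str = FLOW_ORDER) -> str:
    """Turn one voltage reading per wash into a sequence of bases.

    Each reading is rounded to the nearest whole number of incorporated
    bases; a reading near zero means that base was not added in that wash.
    """
    called = []
    for i, reading in enumerate(voltages):
        count = max(0, round(reading / volts_per_base))
        called.append(flow_order[i % len(flow_order)] * count)
    return "".join(called)


if __name__ == "__main__":
    # Invented, slightly noisy readings for two passes through the wash order.
    readings = [1.05, 0.02, 1.9, 0.1, 0.0, 1.1, 0.97, 0.05]
    print(call_bases(readings))  # TCCAC
```

The rounding step is where noise matters: a reading of 1.9 is taken to mean two identical bases in a row, so the further a signal drifts from a whole number, the less certain the call.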

A PCR reaction in progress.
Image: iStockphoto

Without the need for bulky equipment to study the faint glow of fluorescence or bioluminescence—no lasers or photographic equipment are required, just chips to spot voltage changes—the likes of Ion Torrent meant that sequencing hardware got smaller and smaller. While the capillary array systems were the size of a fridge, their Ion Torrent counterparts are closer to the dimensions of a microwave. And as existing technologies become increasingly sophisticated and new ones are developed, they're likely to get smaller still.

Even more technology breakthroughs are poised to make a splash in sequencing. Take nanopore sequencing, for example. A segment of DNA can be ratcheted through the tiniest of holes, base by base. Think of a membrane as the world's smallest needle, and your DNA as the thread. Electrodes are placed on either side of a pore in the membrane, and the DNA is pulled through the pore. The flow of ions through the pore will change depending on which base is passing through it, information that can be used to sequence the DNA. And not only that, nanopore sequencing will also allow scientists to tell whether a base is methylated or hydroxymethylated, both of which can impact human health. Too many methylated Cs could lead to cancer, for example.

Nanopore sequencing is in its early days, however, and there's much research still focusing on what the membrane should be made of. Some camps favour a lipid bilayer—similar to the coating around each of the cells in the human body—while others prefer materials such as graphene.

Early lipid systems have already hit the market, including one by Oxford Nanopore that plugs into the USB port of a computer.

Thanks to the wave of miniaturisation that's now upon us, getting your whole genome sequenced could eventually be a matter of popping in to your family doctor's office.

A cloudy future

Since the Human Genome Project gave us the first complete map of our genes in 2003, scientists have been working to understand its secrets at an ever more granular level.

In the years since the human genome was published, scientists have been studying the genomes of individuals in an effort to understand which genetic sequences make them more likely to develop certain diseases. With this information at their disposal, researchers can work out how those conditions could be treated more effectively or prevented altogether.

Like so many other areas of research, the pace of discovery in genomics is accelerating in parallel with the growing computing power at its disposal.

A Staubli robot in use in a lab.
Image: Maggie Bartlett, NHGRI

According to Dr. Wu Feng, professor of computing at Virginia Tech and head of its Synergy Lab, which provides computing tools to researchers and scientists, technology is one of the three pillars on which modern scientific research now rests.

"Computing has now become the third pillar of discovery, alongside the traditional pillars of theory and experimentation," he said.

It's a pillar that is becoming increasingly key in healthcare, helping researchers fight existing diseases, and work out how to tackle new ones.

When a new virus is discovered, rather than send scientists into a wet lab to work out its genetic make-up, the virus' genome can be quickly sequenced and compared to the genomes of known viruses. By studying similarities between the new and known viruses, clinicians will have a better idea of how to treat the newcomer. If it's similar to a known virus, the treatment that works for that should prove effective on the unknown one too.
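One simple way to compare a new sequence against a library of known ones is to count how many short "words" of DNA they share. The sketch below uses that idea (a Jaccard score over overlapping four-letter words); it is an illustration of the general approach, not the method used by any lab mentioned here, and the virus names and sequences are invented.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Return the set of every overlapping k-letter word in `seq`."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard score: shared words divided by all distinct words in either."""
    words_a, words_b = kmers(a, k), kmers(b, k)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def closest_known(unknown: str, known: dict, k: int = 4) -> str:
    """Name of the known sequence that scores highest against `unknown`."""
    return max(known, key=lambda name: similarity(unknown, known[name], k))


if __name__ == "__main__":
    library = {  # made-up sequences standing in for known viruses
        "virus_a": "ATGCGTACGTTAGC",
        "virus_b": "TTTTGGGGCCCCAAAA",
    }
    newcomer = "ATGCGTACGTTAGG"
    print(closest_known(newcomer, library))  # virus_a
```

Every sequence in the library has to be scored against the newcomer, so the work grows with the size of the library, which is where extra computing power starts to matter.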

"The more compute resources you have available to you to compute on all these different known viruses and compare to the unknown virus, the faster you'll be able to identify the correlations between the known and unknown viruses," Feng said.

"But not everyone has a supercomputer and this is where the cloud is really coming in. We're already making use of the cloud in other ways to make ourselves more productive, but what we're doing now is trying to take the next step in terms of commoditizing the use of the cloud for scientific discovery."

For modern researchers, the computing problems that genomics presents are not so far away from those that plagued their colleagues in the 1990s and 2000s: a constantly growing deluge of data and a lack of computing resources to process it at a decent speed.

While the data generated by the genomics world is doubling around every six months, Moore's Law means compute power doubles only every 24 months or so.

Each sequenced human genome accounts for around 3GB of data. That might not sound so bad to deal with, but typically, mapping one genome to another involves using data sets around 10 or 20 times bigger than the genome itself. So, a 3GB genome would require handling between 30GB and 60GB of data.
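The arithmetic behind those figures is easy to check. The sketch below takes the numbers quoted in the text (3GB per genome, working sets 10 to 20 times larger, data doubling every six months, compute doubling every 24) and works out what they imply; the function names are made up for the example.

```python
GENOME_GB = 3  # size of one sequenced genome, as quoted in the text


def working_set_gb(genome_gb: float = GENOME_GB, factors=(10, 20)) -> tuple:
    """Data to be handled when mapping one genome, for each size factor."""
    return tuple(genome_gb * factor for factor in factors)


def growth_factor(months: float, doubling_months: float) -> float:
    """How many times larger something is after `months` of steady doubling."""
    return 2 ** (months / doubling_months)


if __name__ == "__main__":
    print(working_set_gb())                                # (30, 60)
    data = growth_factor(24, doubling_months=6)
    compute = growth_factor(24, doubling_months=24)
    print(data, compute, data / compute)                   # 16.0 2.0 8.0
```

On those figures, over two years the data grows sixteenfold while compute power merely doubles, leaving an eightfold gap to be filled by more machines or smarter software.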

For those with modest supercomputing capabilities—say eight or 16 nodes—tackling a single genome is half a day's work. But when handling more serious workloads, labs will quickly find themselves needing thousands of nodes. "Not many people are going to have that sort of supercomputing in their backyard," says Feng.

Again, that's where the cloud has come in, offering researchers a new, more flexible alternative to building out their high-performance computing facilities. When a virus is discovered—a new Ebola-like disease surfaces, for example—labs can rapidly scale up their computing power to compare it to other known viruses as they hunt for treatments that are likely to be effective. Nonetheless, the capabilities of onsite hardware remain key. Virginia Tech uses a hybrid client plus cloud model, using Microsoft's Azure, transferring some data to the cloud while doing some pre-processing locally.

Before long, as both local and cloud power march onward, that client could be your smartphone, working in concert with a thumb-sized attachment that could analyse blood samples, to sequence your genome. The day could come in the not so distant future where a consultation with a doctor could end up with the patient popping into the waiting room while their genome is sequenced.

Right now when a person calls into their doctor with high cholesterol, what drug they leave the practice with depends on the doctor's best guess about which will be more effective. Fast sequencing could change that.

"What people do now is they kind of throw darts at it: 'Let's put you on Lipitor and come back in six months and we'll see what happens. If it didn't work, we'll put you on something else'. Instead, what if you had the encrypted chemical makeup of Crestor and an encrypted chemical makeup of Lipitor and you were able to simulate the reaction of taking that drug with your genetic makeup that you stored up in cloud?" asked Feng.

Personalised medicine, according to Feng, could be a matter of years, rather than decades, away.

The germ of personalised medicine systems is already out there. Using current-gen sequencing systems, it's possible today to get some level of insight into how a patient will fare with anti-clotting medicines such as warfarin and clopidogrel, which are helping reduce the risk of heart attack and stroke among those with high blood pressure.

Genotyping and sequencing are just the start of our understanding of our genetic makeup. As time rolls on, those tiny threads of DNA will tell us more about ourselves, our health, and our future than we ever dreamed possible.

23andMe, and me?

Dan had asked me earlier if I would be taking a 23andMe test myself. I told him I wouldn't. "Why would you not want to know?" he asked. Initially, it was because I'd been concerned about what I'd find out: that I'd see the cause of my own eventual death way too early.

"Not everyone has a supercomputer and this is where the cloud is really coming in... What we're doing now is trying to take the next step in terms of commoditizing the use of the cloud for scientific discovery."
Dr. Wu Feng

That argument didn't stay with me long. When it comes to health, knowledge really can be power. If you're aware you're more at risk of a particular disease, you may be able to take certain actions or start treatment to delay its onset, mitigate its effects, or reduce the risk of it occurring altogether. Take Google cofounder Sergey Brin, for example. He took the 23andMe test and found his likelihood of developing Parkinson's was elevated compared to the general population. As a result, he now drinks more coffee and exercises, both of which are thought to cut the risk of the disease.

Although my fear around confronting the unknown by taking the 23andMe test had been overturned, I was still not convinced. I'd happily donate my genome anonymously to a research project, but the thought of sharing it with a commercial company made me feel more than a little uncomfortable.

For now, due to the limitations of commercial genotyping, it seems family history is as good an indication as any of my future health and ancestry. 23andMe can't tell me that I have my great aunt's nose, the way finding an old black-and-white photo in my granny's loft did.

And yet, there could be more secrets locked up in my genome than I think.

23andMe has a feature called "DNA relatives" that works a little like a social network. If a customer opts in to the service, 23andMe will alert them to other users with genetic makeup that bears a resemblance to theirs and can even make an introduction. Since the service has limited geographic range at present, most people are likely to find very distantly related fifth or sixth cousins, branches of their family tree they didn't know existed, and may not even be able to work out how they're linked.


But very occasionally, the service will reveal stories like Dan Lane's. Reportedly, a man in New York who had been adopted as a child found his birth brother through 23andMe's social network and approached him through the service. Until that point, his brother hadn't known he had a sibling who had been put up for adoption. For that family and others like it, what becomes known cannot be unknown.

Knowing what you know now, I asked Dan, would you still take the test?

"If I could go back and choose whether or not to take it, I'd still take it. I don't regret taking it at all. Some of my friends have said 'Wow, you must be really angry at your parents', but I don't see why anger would come into it. My dad was the man who raised me and, if anything, I respect him more knowing that he always knew I might not be his son and not even wanting to know, not thinking 'We should do a test and find out'... From a family point of view, it doesn't change anything to me."

Stories like Dan's are likely to become more common as sequencing services grow. While DNA may tell you who you're related to, it can't always show you who your family is. It can tell you about the genes that created you, but what you do with that information is always yours to decide.

About

Jo Best has been covering IT for the best part of a decade for publications including silicon.com, Guardian Government Computing and ZDNet in both London and Sydney.