Sharp Blue: Translucent DNA databases



Today I was thinking about the plans of various governments to set up databases of the DNA of their citizens for the purposes of law enforcement. Clearly many implementations of such plans would pose a substantial problem for those of us who believe in civil liberties, but today’s insight was that this needn’t be the case. The main problem is that most proposed schemes not only answer the question “Whose DNA matches this sample that was found at a crime scene?” but potentially also questions that they should not answer, such as:

  • “What ethnic group is Alice?”
  • “Is Bob the father of Carol?”
  • “Do Dave’s genes make him more than usually at risk of developing cancer or heart disease?”

My insight was that it’s possible to design schemes that allow us to answer the former question, but none of the others. The basic idea is that genes are really digital information (each base corresponding to two bits), and we can manipulate such information in various interesting ways. Most particularly, we can store in our DNA database not the actual sequences of the parts of the genome we choose to use as a “genetic fingerprint”, but a one-way hash of the sequences. We then destroy the original sample and our digital version of the sequence. Then, given a sample from a crime scene we can use the same algorithm to form the hash of the sample’s relevant sequences and compare this hash with the entries in our database. If it matches one, we can read out the name and say “Aha! The sample from the crime scene is from Fred!” On the other hand, we can’t go back from the hash to the actual sequence. The questions that we don’t want to allow people to answer with the database all rely on knowing the actual genes that are present, and so we can’t answer them even in principle.
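As a rough sketch of the scheme above, here is how enrolment and lookup might look in Python. Everything named here is an assumption made for illustration: the two-bit encoding table, the choice of SHA-256 as the one-way hash, the `TranslucentDnaDatabase` class, and the made-up marker sequences. It also assumes the sequencing is exact, so the same person always yields the same strings.

```python
import hashlib
from typing import Dict, List, Optional

# Hypothetical encoding: each base maps to two bits.
BASE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_sequence(seq: str) -> bytes:
    """Pack a DNA string into bytes at two bits per base.

    A 4-byte length prefix keeps concatenated markers unambiguous.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | BASE_BITS[base]
    n_bytes = (2 * len(seq) + 7) // 8
    return len(seq).to_bytes(4, "big") + value.to_bytes(n_bytes, "big")


def fingerprint_hash(markers: List[str]) -> str:
    """One-way hash of the marker sequences (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256()
    for seq in markers:
        digest.update(pack_sequence(seq))
    return digest.hexdigest()


class TranslucentDnaDatabase:
    """Stores only hash -> name; the sequences themselves are never kept."""

    def __init__(self) -> None:
        self._names_by_hash: Dict[str, str] = {}

    def enroll(self, name: str, markers: List[str]) -> None:
        # Only the hash is stored; the caller is expected to destroy the
        # sample and the digital sequence afterwards.
        self._names_by_hash[fingerprint_hash(markers)] = name

    def lookup(self, markers: List[str]) -> Optional[str]:
        # Hash the crime-scene sample the same way and look for an exact match.
        return self._names_by_hash.get(fingerprint_hash(markers))


db = TranslucentDnaDatabase()
db.enroll("Fred", ["GATTACA", "CCGTA"])
match = db.lookup(["GATTACA", "CCGTA"])      # sample matching Fred's markers
no_match = db.lookup(["GATTACA", "CCGTT"])   # sample from someone else
```

The database can say whose hash a sample matches, but it holds nothing from which a sequence could be read back directly.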

(This is, of course, the same approach as that used for storing passwords on computer systems, and a very small corner of the interesting field of translucent databases. It also relies on the mapping from the sample to a number being exact: a one-way hash turns even a single differing base into a completely different value, so near matches can't be detected, and I'm not sure whether current genetic fingerprinting schemes are repeatable enough for that. It was still a fun thing to think about though!)
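To see why exactness matters, here is a small illustration in Python. The two reads are invented, and SHA-256 over the plain ASCII string is an assumed stand-in for whatever hash a real scheme would use.

```python
import hashlib


def sequence_hash(seq: str) -> str:
    """Hash a sequence string with SHA-256 (an assumed choice of hash)."""
    return hashlib.sha256(seq.encode("ascii")).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(x != y for x, y in zip(a, b))


clean_read = "GATTACAGATTACA"
noisy_read = "GATTACAGATTACT"  # hypothetical read with the last base misread

clean_hash = sequence_hash(clean_read)
noisy_hash = sequence_hash(noisy_read)

print(clean_hash)
print(noisy_hash)
print(differing_hex_digits(clean_hash, noisy_hash), "of 64 hex digits differ")
```

The two digests share no useful structure, so a lookup either matches exactly or fails outright; there is no notion of "close".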
