Electricity and CRISPR used to write data to bacterial DNA
In recent years, researchers have used DNA to encode everything from an operating system to malware. Rather than being a technological curiosity, these efforts were serious attempts to take advantage of DNA’s properties for long-term storage of data. DNA can remain chemically stable for hundreds of thousands of years, and we’re unlikely to lose the technology to read it, something you can’t say about things like ZIP drives and MO disks.
So far, though, writing data to DNA has involved converting the data to a sequence of bases on a computer and then ordering that sequence from someplace that operates a chemical synthesizer; living things don’t actually enter into the picture. Separately, a group of researchers had been figuring out how to record biological events by modifying a cell’s DNA, allowing them to read out the cell’s history. A group at Columbia University has now figured out how to merge the two efforts and write data to DNA using voltage differences applied to living bacteria.
CRISPR and data storage
The CRISPR system has been developed as a way of editing genes or cutting them out of DNA entirely. But the system first came to the attention of biologists because it inserted new sequences into DNA. For all the details, see our Nobel coverage, but for now, just know that part of the CRISPR system involves identifying DNA from viruses and inserting copies of it into the bacterial genome in order to recognize it should the virus ever appear again.
The group at Columbia has figured out how to use this to record memories in bacteria. Let’s say you have a process that activates genes in response to a specific chemical, like a sugar. The researchers diverted this to also activate a system that makes copies of a circular piece of DNA called a plasmid. Once the copy number was high, they activated the CRISPR system. Given the circumstances, it was most likely to insert a copy of the plasmid DNA into the genome. When the sugar was not present, it would generally insert something else.
Using this system, it was possible to tell whether a bacterium had been exposed to the sugar in its past. It’s not perfect, since the CRISPR system doesn’t always insert something when you want it to, but it does work on average. So you just have to sequence enough bacteria to figure out the average sequence of events.
To adapt this for data storage, the researchers used two plasmids. One is the same as described above: present at low levels when a specific signal is absent, and present at very high levels when the signal’s around. The second is always present at moderate levels. When CRISPR is activated, it tends to insert sequences from whichever plasmid is present at higher levels, as shown in the diagram below.
On its own, this only stores one bit. But the process can be repeated, creating a stretch of DNA that’s a series of inserts derived from the red and blue plasmids, with the identity being determined by whether the signal was present or not.
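To make the bit-per-insert idea concrete, here is a minimal sketch of how a series of inserts could be read back as bits. The labels `SIGNAL` and `REFERENCE`, the choice of which plasmid counts as a one, and the assumption that the inserts are already listed in the order they were written are all illustrative choices, not details taken from the researchers' work.

```python
# Hypothetical labels for where each insert came from (not from the paper).
SIGNAL = "signal_plasmid"        # abundant only when the signal is present
REFERENCE = "reference_plasmid"  # always present at moderate levels


def inserts_to_bits(inserts):
    """Map a list of insert origins, assumed to be in write order, to bits."""
    bits = []
    for origin in inserts:
        if origin == SIGNAL:
            bits.append("1")  # assumption: signal-plasmid insert means 1
        elif origin == REFERENCE:
            bits.append("0")  # assumption: reference-plasmid insert means 0
        else:
            raise ValueError(f"unknown insert origin: {origin!r}")
    return "".join(bits)


def bits_to_signal_schedule(bits):
    """For each write round, say whether the signal should be applied."""
    return [b == "1" for b in bits]
```

Under these assumptions, writing `"101"` means applying the signal in the first and third rounds only, and a clean readout of signal, reference, signal decodes back to `"101"`.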
Giving it a jolt
It’s a neat system but pretty far removed from the sorts of things we normally associate with the production of data—the output of a sensor reading or calculation is rarely a sugar or antibiotic mixed in with a bunch of bacteria. Getting bacteria to respond to an electrical signal turned out to be relatively simple. E. coli is able to alter the activity of genes depending on whether it’s in an oxidizing or reducing chemical environment. And the researchers could alter the environment by applying voltage differences to a specific chemical in the culture with the bacteria.
More specifically, the voltage difference would alter the oxidative state of a chemical called ferrocyanide. That in turn caused the bacteria to alter the activity of genes. By engineering the plasmid so that it responded to the same signal as these genes, the researchers were able to control the levels of plasmid by applying different voltages. And they could then record the level of that plasmid by activating the CRISPR system in these cells.
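A toy model can show why controlling plasmid levels is enough to bias what gets recorded. The sketch below assumes that CRISPR picks its insert in proportion to how many copies of each plasmid are around; that proportionality, the copy numbers, and every name in the code are made up for illustration and are not measurements or methods from the study.

```python
import random

# Hypothetical copy numbers, chosen only to illustrate "low", "moderate", "high".
COPIES_REFERENCE = 10     # second plasmid, always at moderate levels
COPIES_SIGNAL_LOW = 1     # signal plasmid when the electrical signal is absent
COPIES_SIGNAL_HIGH = 100  # signal plasmid when the electrical signal is present


def write_round(signal_on, rng):
    """Pick which plasmid supplies the insert, weighted by copy number."""
    signal_copies = COPIES_SIGNAL_HIGH if signal_on else COPIES_SIGNAL_LOW
    weights = [signal_copies, COPIES_REFERENCE]
    return rng.choices(["signal", "reference"], weights=weights, k=1)[0]


def fraction_signal(signal_on, trials=10_000, seed=0):
    """Estimate how often the signal plasmid is the one that gets recorded."""
    rng = random.Random(seed)
    hits = sum(write_round(signal_on, rng) == "signal" for _ in range(trials))
    return hits / trials
```

With these invented numbers, most inserts come from the signal plasmid when the signal is on and most come from the reference plasmid when it is off, which is all the encoding scheme needs.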
It’s pretty easy to see how each of the inserts in a series could be considered a zero or a one, depending on the identity of the insert. But remember that this system isn’t perfect; pretty regularly, CRISPR would insert nothing when it’s activated, which would shift all the ensuing bits. As this process is random, the longer the series of bits you try to encode, the more likely it becomes that at least one of them ends up being skipped.
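The arithmetic behind that last point is simple if you assume each round is skipped independently with the same probability: the chance of at least one skip in n rounds is 1 - (1 - p)^n. The sketch below computes that and shows how a skipped round shifts the bits that follow. The skip probability is a placeholder; the article does not give the real rate.

```python
def prob_at_least_one_skip(n_rounds, p_skip):
    """Chance that one or more of n independent rounds records nothing."""
    return 1.0 - (1.0 - p_skip) ** n_rounds


def record_with_skips(bits, skipped_rounds):
    """Return what ends up in the DNA if the given round indices are skipped.

    Skipped rounds leave no insert, so every later bit moves up one position.
    """
    return "".join(b for i, b in enumerate(bits) if i not in skipped_rounds)
```

For example, with an illustrative 10 percent skip rate, a three-bit message has roughly a 27 percent chance of losing at least one bit, while an eight-bit message is up to about 57 percent. And if the first round of `"101"` is skipped, the cell records `"01"`, which is indistinguishable from a shorter message.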
To limit this problem, the researchers kept their data to three bits per bacterial population. Even then, they had to train a supervised learning algorithm to reconstruct the most probable series of bits based on an average of the sequences found in the population. And, even with that, the system failed to recognize the series of bits about six percent of the time. In the end, they settled on using a parity bit that was the sum of the first two to allow error correction, and then edited lots of populations in parallel.
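Here is a minimal sketch of the parity idea, assuming the "sum of the first two" is taken modulo 2 so that it fits in a single bit. The function names are hypothetical, and this shows only the consistency check; how the researchers combined it with their learning algorithm to actually correct errors is not something this sketch tries to reproduce.

```python
def add_parity(b0, b1):
    """Return the two data bits plus a third bit equal to their sum modulo 2."""
    return (b0, b1, (b0 + b1) % 2)


def parity_ok(triplet):
    """Check whether a decoded three-bit group is self-consistent."""
    b0, b1, parity = triplet
    return (b0 + b1) % 2 == parity
```

On its own, a check like this can flag a three-bit group in which a single bit was misread, but it cannot say which bit was wrong, so a flagged population would need to be resolved some other way.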
(By giving each population’s plasmids a unique sequence tag called a “barcode,” it was possible to mix a lot of them into a single population after the bits were encoded and still untangle everything once the DNA was sequenced.)
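Untangling a mixed population is essentially a group-by on the barcode. The sketch below assumes each sequencing read has already been reduced to a (barcode, decoded bits) pair, and it uses a simple majority vote per barcode as a stand-in for the supervised learning step described above; both the input format and the vote are simplifications for illustration.

```python
from collections import Counter, defaultdict


def demultiplex(reads):
    """Group decoded bit strings by the barcode of the population they came from."""
    by_barcode = defaultdict(list)
    for barcode, bits in reads:
        by_barcode[barcode].append(bits)
    return dict(by_barcode)


def consensus(by_barcode):
    """Pick the most common bit string seen for each barcode (majority vote)."""
    return {
        barcode: Counter(bit_strings).most_common(1)[0][0]
        for barcode, bit_strings in by_barcode.items()
    }
```

Given reads from two hypothetical barcodes where one read has lost a bit to a skipped insert, the majority vote still recovers the intended bits for each population, as long as the correct readout is the most common one.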
With everything in place, they successfully stored and read out “Hello world!” They even put the bacteria into some potting soil for a week and showed that they were able to recover the message. (Storing them in the freezer obviously works better.) They estimate that the message can be retained for at least 80 generations of bacteria.
Let’s be clear: as a storage medium, in its current form, this is pretty terrible. If you wanted to put some data into DNA, you’d be much better off having the DNA chemically synthesized. But it is intriguing to think we could go straight from electrical signals to altered DNA, and there may be some ways to improve the system now that it has been established.