Monday, August 31, 2015

The importance of being barcoded

by Owen S. Wangensteen

When, back in 2003, Canadian biologist Paul Hebert proposed the creation of a public database of short sequences (DNA-barcodes) that could by themselves identify biological specimens, he did not meet with the unanimous approval anyone would expect nowadays. Detractors were many, and they made a lot of scientific noise. In fact, as Laurence Packer once pointed out, scientific papers published from 2004 to 2007 criticizing and opposing DNA-barcoding were more numerous (and had more citations) than papers written by the first “barcoders” who embraced the idea. Then a miracle occurred: as the public databases grew, DNA-barcoding started to prove its utility in a myriad of ways. From long-disputed basic taxonomy issues to medico-legal forensics, by way of river ecology, agricultural sciences, control of disease outbreaks, food security and biodiversity conservation, all these fields took advantage of DNA-barcoding in ways only a few would have dreamt of some years before.

The reluctance was mainly based on some false concerns and a few ancestral fears. Morphological taxonomists argued that no short sequence could replace their meticulous morphological observations and thorough interpretations, while privately fearing for their jobs. Geneticists argued that genomes were far too complex to be reduced to a sequence of just three hundred base pairs. Ecologists thought that molecular methods were too expensive and beyond their field of expertise. All of them were basically wrong.

DNA-barcoding does not intend (and is never going) to replace taxonomists. At most, DNA-barcoding is meant to replace traditional identification keys (which were probably bound to disappear anyway!). Identification keys are usually a nightmare even for expert taxonomists and are completely useless for speciose groups that include thousands or tens of thousands of species. DNA-barcoding will only facilitate the work of taxonomists, who are still needed (in fact, more needed than ever) for identifying problematic specimens. Given the estimates of the global number of species and the current rate at which new species are described, we will need taxonomists for at least the next 500 years, and probably much longer. That is more than most scientists working in other areas of knowledge could bet on.

It is true that the genomes of non-model organisms are large, complex, gigantic monsters waiting to be tamed. But the fact is that all you need to unambiguously identify the species to which a specimen belongs is a fragment of just a few hundred base pairs from a highly variable genomic region (in most cases, just 200 base pairs will do!). Mitochondrial DNA (and, specifically, the cytochrome oxidase I gene, COI) has proven its power to achieve this task in many groups of organisms (notably animals; other genes are more commonly used for green plants or fungi). If you need your sequences just for identification purposes, all you need is to sequence a mere 200 base pairs of COI. Fast and easy.

And inexpensive. Sequencing costs have done nothing but plummet over the last three decades, and they are predicted to keep decreasing on a month-to-month basis. Astounding sequencing technologies that grew up under the big umbrella of the Human Genome Project have already arrived in average-sized laboratories. Nowadays, DNA-barcoding of an unknown specimen is something that a freshman biology student could perform during his or her first lab course. Costs and complexity are no longer valid excuses.

I recently understood the pivotal importance of barcoding (i.e. the urgent need to obtain sequences from as many different species as you can and deposit them in a public database as soon as possible) when I started analyzing the data we got from bulk-sequencing environmental samples collected in shallow marine hard-bottom communities of selected marine reserves in Spain. This kind of research is crucial to understand how marine communities are changing (and they are changing indeed, probably at a faster pace than we tend to think). We need to sample these communities and characterize them as soon as possible if we want to be able to detect any changes in the near future, before it's too late. The sensible way to do this is by metabarcoding them. That is: put the sample in a bag, blend it with a kitchen blender, extract the DNA of every organism present in the mixture, use this DNA to amplify a useful genetic marker (COI is the usual choice), and then try to match the millions of sequences you get to the current contents of public barcoding databases, in order to identify the organisms that are present in your samples (and probably their abundances, but that's another story I will be writing some other day). The first steps of this procedure proved to be surprisingly easy. It is the final identification step that currently limits the utility of this technique. We have literally millions of different COI sequences, representing all the biodiversity hidden in our samples, but we are able to identify (that is, assign a species name to) just a small percentage of these sequences. Marine barcoding databases are not as developed as terrestrial ones. Small and microscopic organisms are much more underrepresented in the databases than big, macroscopic ones. A surprisingly high percentage of marine diversity is still unknown, even for shallow ecosystems on the Mediterranean or temperate Atlantic European shores, which have been extensively studied by marine biologists at least since Aristotle.
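
To make the limiting step concrete, here is a minimal Python sketch of matching reads against a reference database. Everything in it is an assumption made for illustration: the species names and sequences are invented, the 0.9 identity threshold is arbitrary, and the comparison is a naive position-by-position match between equal-length fragments. Real metabarcoding pipelines rely on proper alignment or dedicated taxonomic assignment tools against public databases, so take this as a toy picture of the idea, not as the method we used.

```python
from collections import Counter

# Hypothetical toy reference database: species name -> barcode fragment.
# These sequences are invented for illustration; they are not real COI data.
REFERENCE = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGAACGTTCGAACGA",
}


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign(read, reference=REFERENCE, threshold=0.9):
    """Return the best-matching species name, or None if nothing is close enough."""
    best_name, best_score = None, 0.0
    for name, ref in reference.items():
        score = identity(read, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


def summarize(reads, reference=REFERENCE, threshold=0.9):
    """Count reads per assigned species and the number left without a name."""
    counts = Counter(assign(r, reference, threshold) for r in reads)
    unassigned = counts.pop(None, 0)
    return dict(counts), unassigned


if __name__ == "__main__":
    reads = [
        "ACGTACGTACGTACGTACGT",  # identical to Species A
        "ACGTACGTACGTACGTACGA",  # one mismatch against Species A
        "ACGTTCGAACGTTCGAACGA",  # identical to Species B
        "TTTTTTTTTTGGGGGGGGGG",  # resembles nothing in the database
    ]
    assigned, unassigned = summarize(reads)
    print("assigned:", assigned)
    print("unassigned reads:", unassigned)
```

The last read is the interesting one: it may well come from a real organism, but because nothing similar has been deposited in the database, it stays nameless. That is exactly the gap that more barcoding would fill.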

If we want to begin to understand the biodiversity changes that are happening now in our beloved marine ecosystems, we really need to undertake the titanic task of sequencing the DNA-barcodes of as many marine species as possible and releasing the data to our public databases. I am confident that metabarcoding will become the main tool of marine ecologists in the next few years. We need to unlock its full power as soon as possible by expanding our databases and filling the main gaps. This is no task for an individual or a small group of people. Given the overwhelmingly high number of different marine species, this is a task for several hundred marine taxonomists, working across the world, identifying species in their different groups of expertise, sequencing specimens, curating taxonomies, releasing data, and expecting no greater reward for releasing these data than helping other marine biologists build the greatest and most useful marine taxonomy enterprise of all time.

As Harry Truman once said, it is amazing what you can accomplish if you do not care who gets the credit.

Comparison of DNA-barcoding and environmental-DNA metabarcoding approaches to biodiversity monitoring. In DNA-barcoding, a single individual is analyzed, whereas metabarcoding can analyze whole communities by using the power of next-generation sequencing technologies.

Hebert et al. 2003. Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B 270, 313–321. doi: 10.1098/rspb.2002.2218