One of Jonah’s advances in helping clients with metabarcoding was to develop a closed reference database for higher plants to support long-term continuity of data. In essence, we permanently assign each consensus sequence a unique OTU ID. That way, we have continuity over time in which species a particular OTU represents. When a client comes back a year later with more diet samples to analyze, we can sequence the samples, match the sequences to our closed reference database, and then seamlessly compare the output over time.
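To make the idea concrete, here is a minimal Python sketch of closed reference matching. It is an illustration under stated assumptions, not our production pipeline: the OTU IDs, the sequences, and the function names `assign_reads` and `compare_runs` are all made up, and reads are matched by exact identity only. A real pipeline would presumably match at a similarity threshold instead.

```python
from collections import Counter

# Hypothetical closed reference: permanent OTU ID -> consensus sequence.
# IDs and sequences are invented for illustration only.
CLOSED_REFERENCE = {
    "OTU_0001": "ACGTACGTAC",
    "OTU_0002": "ACGTTCGTAC",
    "OTU_0003": "TTGTACGGAC",
}

# Reverse lookup so each read can be matched with a single dict access.
SEQUENCE_TO_OTU = {seq: otu_id for otu_id, seq in CLOSED_REFERENCE.items()}


def assign_reads(reads):
    """Count reads per permanent OTU ID; unmatched reads are tallied under None."""
    return Counter(SEQUENCE_TO_OTU.get(read) for read in reads)


def compare_runs(run_a, run_b):
    """Return {otu_id: (count_in_a, count_in_b)} for every OTU seen in either run."""
    otus = sorted(k for k in set(run_a) | set(run_b) if k is not None)
    return {otu: (run_a.get(otu, 0), run_b.get(otu, 0)) for otu in otus}


# Two sequencing runs, a year apart, matched against the same reference.
year_one = assign_reads(["ACGTACGTAC", "ACGTACGTAC", "TTGTACGGAC"])
year_two = assign_reads(["ACGTACGTAC", "ACGTTCGTAC", "GGGGGGGGGG"])
comparison = compare_runs(year_one, year_two)
```

Because both runs are keyed by the same permanent IDs, the comparison is a simple lookup, with no need to re-cluster the older data.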
This week we are beta-testing a new closed reference database, this time for phytoplankton using the 23S region (Sherwood and Presting 2007). Now, clients who wish to examine patterns of phytoplankton in water can be assured that they can compare their results across multiple runs.
We’ve had to tackle a few issues in preparing 23S for long-term use. 23S is a good marker in that it amplifies both cyanobacteria and eukaryotic algae, so the relative abundance of the two groups can be assessed. We’ve also assessed the topology of the 23S gene tree to ensure that it broadly represents the phylogeny of plastids**. That way, when we find sequences whose hosts have not been sequenced yet, we have some assurance as to their likely taxonomic identity. Unknown sequences found in water will be assigned a permanent OTU. When the source organism for that OTU is sequenced, we update the database to reflect this, improving the results over time.
**It’s interesting to look at the 23S gene tree. You can see where taxa like eustigmatophytes acquired their plastids and the process of endosymbiosis for dinoflagellates (they often nest in with diatoms).
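The handling of unknown sequences described above can be sketched in Python as follows. This is a hypothetical illustration, not a description of how our database is actually implemented: the `OtuRegistry` class, its method names, the ID format, and the placeholder taxon label are all assumptions. The point is only that the OTU ID is issued once and never changes, while the taxonomy attached to it can be filled in later.

```python
class OtuRegistry:
    """Hypothetical registry: each OTU ID is issued once and never changed."""

    def __init__(self):
        self._id_by_sequence = {}   # consensus sequence -> permanent OTU ID
        self._taxonomy_by_id = {}   # permanent OTU ID -> taxonomy, or None

    def get_or_create(self, sequence):
        """Return the permanent ID for a sequence, minting one if it is new."""
        if sequence not in self._id_by_sequence:
            new_id = f"OTU_{len(self._id_by_sequence) + 1:04d}"
            self._id_by_sequence[sequence] = new_id
            self._taxonomy_by_id[new_id] = None  # source organism not yet known
        return self._id_by_sequence[sequence]

    def annotate(self, otu_id, taxonomy):
        """Attach a taxonomy to an existing OTU without touching its ID."""
        if otu_id not in self._taxonomy_by_id:
            raise KeyError(f"unknown OTU ID: {otu_id}")
        self._taxonomy_by_id[otu_id] = taxonomy

    def taxonomy(self, otu_id):
        """Return the taxonomy for an OTU, or None if it is still unassigned."""
        return self._taxonomy_by_id[otu_id]


registry = OtuRegistry()
first_seen = registry.get_or_create("ACGTACGTAC")   # unknown sequence from a water sample
before = registry.taxonomy(first_seen)              # None: taxonomy-free for now

# Later, the source organism is sequenced and the database is updated.
registry.annotate(first_seen, "placeholder taxon")
seen_again = registry.get_or_create("ACGTACGTAC")   # same ID as before
```

Older results that reference the OTU pick up the new taxonomy automatically, since nothing about the ID itself has changed.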
The taxonomic database for 23S is still developing. We have about 1400 taxa in the closed ref database right now, of which about 900 represent phytoplankton. We’re actively working on expanding this database, but in the meantime, analyses of assemblages can occur with taxonomy-free OTUs just as if the taxonomy were assigned.
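As one example of an assemblage analysis that needs no taxonomy, the Python sketch below computes the Bray-Curtis dissimilarity between two samples keyed only by OTU ID. Bray-Curtis is our choice for illustration here, not necessarily the metric used in any given project, and the sample counts and the `bray_curtis` function name are made up.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two {otu_id: read_count} samples.

    0.0 means identical assemblages; 1.0 means no OTUs are shared.
    """
    otus = set(sample_a) | set(sample_b)
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    difference = sum(abs(sample_a.get(o, 0) - sample_b.get(o, 0)) for o in otus)
    return difference / total


# Hypothetical read counts; none of these OTUs needs an assigned taxonomy.
site_one = {"OTU_0001": 6, "OTU_0002": 4}
site_two = {"OTU_0001": 2, "OTU_0003": 8}
dissimilarity = bray_curtis(site_one, site_two)
```

The calculation only ever touches OTU IDs and counts, so it gives the same answer whether or not names have been attached to those OTUs.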
Final beta-testing with select clients is occurring right now. Once any last issues are worked out, the 23S closed ref approach should be available for long-term and large-scale projects.