Once our analyses are finished, most users will receive just two files. The first is the report file, which quantifies how the sequences found in each sample are divided among OTUs (operational taxonomic units). The second is the library matching file, which quantifies how closely the sequences found in the samples match the sequences in our library.

Report file

The first file, called the report file, breaks down how many sequences of each OTU were found in each sample.

The report file has one column for the OTU ID. Each subsequent column gives the number of sequences of each OTU in one sample. The column header is the sample ID of the vial in which you placed the sample.

Here is a sample portion of a report:

OTU ID   S0001   S0002   S0003
1          246     124      34
2            1     120     156
3            0       0     150
4          321      12       3
sum        568     256     343

To get the relative abundances, divide each cell by its column sum and express the result as a percentage:

OTU ID   S0001    S0002    S0003
1        43.31%   48.44%    9.91%
2         0.18%   46.88%   45.48%
3         0.00%    0.00%   43.73%
4        56.51%    4.69%    0.87%

In the example above, 4 OTUs are listed. For the first sample (S0001), 43.31% of the sequences belonged to OTU ID #1 and 0.18% to OTU ID #2. In contrast, for sample S0002, 48.44% belonged to OTU ID #1 and 46.88% to OTU ID #2. Note that the percentages in each column always sum to 100%.
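The division described above can be sketched in a few lines of Python. This is only an illustration: the counts are typed in by hand from the sample report, and the names counts and relative_abundance are made up for this example rather than part of anything we deliver.

```python
# Raw sequence counts per OTU, copied by hand from the sample report above.
counts = {
    "S0001": {1: 246, 2: 1, 3: 0, 4: 321},
    "S0002": {1: 124, 2: 120, 3: 0, 4: 12},
    "S0003": {1: 34, 2: 156, 3: 150, 4: 3},
}


def relative_abundance(sample_counts):
    """Return each OTU's share of one sample's sequences, in percent."""
    total = sum(sample_counts.values())
    if total == 0:
        # A sample with no sequences has no meaningful percentages.
        return {otu: 0.0 for otu in sample_counts}
    return {otu: 100.0 * n / total for otu, n in sample_counts.items()}


# Hypothetical helper structure: sample ID -> {OTU ID -> percentage}.
percentages = {sample: relative_abundance(c) for sample, c in counts.items()}

for sample, shares in percentages.items():
    line = ", ".join(f"OTU {otu}: {pct:.2f}%" for otu, pct in shares.items())
    print(f"{sample}: {line}")
```

Because each cell is divided by its own column sum, the percentages for any one sample add up to 100%, which is a quick way to check the arithmetic.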

Library matching file

The library matching file will look something like this…

OTU ID   Genus          Match    Coverage
1        Callitropsis   100      100
1        Juniperus      100      100
1        Calocedrus     98.75    100
1        Cunninghamia   98.75    100
1        Cryptomeria    98.75    100
3        Triticum       100      100
3        Poa            100      100
3        Eremopoa       100      100

The first column is the OTU ID. This is the identification number you will use to match OTUs in this file to the OTUs in your report file.

The second column is the genus of the species in our library that match, or closely match, the OTU.

The third column is the percentage of base pairs in the OTU sequence that exactly match the reference sequence in our library.

The fourth column is the query coverage of the sequence. This tells you what percentage of the query sequence is aligned with the reference sequence. We primarily use this to filter out shorter sequences that might be in the library but might represent a different part of the genome.

In the example above, OTU 1 has species in 2 genera, Callitropsis and Juniperus, that match the sequence found in the sample at 100%. Species in 3 other genera are present in our library but match the sample sequence only at the 98.75% level. This is likely just one base pair off; for example, one mismatch in an 80-base alignment gives 79/80 = 98.75%.

To use this file, start with the OTU ID in the report file and then use the library matching file as a reference. If a particular sequence is assigned to OTU ID #1, then it likely comes from a species of Callitropsis or Juniperus. Note that the species could instead belong to a closely related genus that is not in our library.
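The lookup described above can be sketched as follows. This is a minimal illustration, not our pipeline: the rows are typed in by hand from the sample library matching file, and library_rows and best_genera are hypothetical names chosen for this example.

```python
# Rows copied by hand from the sample library matching file above:
# (OTU ID, genus, match %, coverage %).
library_rows = [
    (1, "Callitropsis", 100.0, 100.0),
    (1, "Juniperus", 100.0, 100.0),
    (1, "Calocedrus", 98.75, 100.0),
    (1, "Cunninghamia", 98.75, 100.0),
    (1, "Cryptomeria", 98.75, 100.0),
    (3, "Triticum", 100.0, 100.0),
    (3, "Poa", 100.0, 100.0),
    (3, "Eremopoa", 100.0, 100.0),
]


def best_genera(rows, otu_id):
    """Return the genera sharing the highest match percentage for one OTU."""
    hits = [row for row in rows if row[0] == otu_id]
    if not hits:
        # No genus is listed for this OTU in the library matching file.
        return []
    top = max(match for _, _, match, _ in hits)
    return sorted(genus for _, genus, match, _ in hits if match == top)


print(best_genera(library_rows, 1))  # ['Callitropsis', 'Juniperus']
print(best_genera(library_rows, 3))  # ['Eremopoa', 'Poa', 'Triticum']
print(best_genera(library_rows, 2))  # []
```

An empty result, as for OTU 2 here, means no genus is listed for that OTU, so its sequence has to be checked by hand.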

If an OTU does not match any species in our library at the 97% level, no species will be listed for it in the library matching file. You will have to query a public database manually to find the best match. To do this, open the FASTA (pronounced FAST-AY) file and find the OTU ID in question. Copy the sequence next to it. Go to http://blast.ncbi.nlm.nih.gov/Blast.cgi and select nucleotide blast. Paste the OTU sequence into the box that asks for the FASTA sequence, then click BLAST.
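If the FASTA file is large, finding one OTU by eye can be tedious. The sketch below pulls a sequence out by its ID. It assumes that each record starts with a ">" line whose first word is the OTU ID, which may not match the exact header format of the file you receive; read_fasta is a hypothetical helper, and the two short sequences are invented placeholders, not real data.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping record ID to sequence.

    Assumption: the record ID is the first word after ">" on a header line.
    """
    records = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current_id = fields[0] if fields else ""
            records[current_id] = []
        elif current_id is not None:
            # Sequences may wrap over several lines; collect every piece.
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


# Invented placeholder records, used only to show the parsing.
example = """>1
ACGTACGTAC
GTACGT
>2
TTGACCA
""".splitlines()

sequences = read_fasta(example)
print(sequences["2"])  # the sequence to paste into the BLAST query box
```

For a real file, the same function can be given an open file handle instead of the example lines, since it only needs something that yields one line at a time.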