
Mock Community Example

Arkadiy-Garber edited this page Jan 20, 2019 · 13 revisions

Note: All code and data files below are found in the TaxonSluice GitHub subdirectory "mock_dataset".

The mock “OTU table” represents an analysis of a marine sediment environment. The dataset comprises three independent environmental samples (Samples A-C) along with three sample-specific blanks (Blanks A-C). Each sample is paired with a blank alphabetically (the blank for Sample_A is Blank_A, etc.), and the pairs are color-matched below (Table 1).

Running this dataset through TaxonSluice produced the following results (summarized below in Table 2):

OTUs 00006 and 00009 are automatically removed because they are exclusive to blank samples. OTUs 00002, 00004, 00005, 00007, 00008, and 00010 are flagged as potential contaminants. Lastly, based on three independent criteria, only OTUs 00001 and 00003 in this mock dataset are cleared as very unlikely to represent kit or environmental contaminants. At this stage the algorithm has removed the contaminants that were recovered only from blanks and, based on user-defined thresholds, has flagged potential contaminants. For each flagged OTU, the user is given a summary of the ten closest matches in the SILVA database (Table 3). It is then up to the user to decide whether to retain or remove each flagged OTU, based on whatever information is available for its closest relatives in SILVA. This approach is also imperfect, since the database may be biased towards highly sampled environments; however, in every instance, this step provides as much information as is currently available for the user to make decisions and, ultimately, gauge the strength of their data.
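The triage described above can be sketched in a few lines of Python. This is a minimal illustration, not TaxonSluice's actual implementation: the counts are invented, the OTU and column names are hypothetical, and the single rule "any reads in a blank means flagged" is an assumption standing in for the user-defined thresholds and the three criteria, which this page does not spell out.

```python
# Hypothetical layout: three samples, each paired with a blank.
SAMPLES = ["Sample_A", "Sample_B", "Sample_C"]
BLANKS = ["Blank_A", "Blank_B", "Blank_C"]

# Invented counts for illustration only (not the values in Table 1).
otu_table = {
    "OTU_x": {"Sample_A": 120, "Sample_B": 95, "Sample_C": 210,
              "Blank_A": 0, "Blank_B": 0, "Blank_C": 0},
    "OTU_y": {"Sample_A": 40, "Sample_B": 0, "Sample_C": 18,
              "Blank_A": 35, "Blank_B": 0, "Blank_C": 0},
    "OTU_z": {"Sample_A": 0, "Sample_B": 0, "Sample_C": 0,
              "Blank_A": 0, "Blank_B": 12, "Blank_C": 0},
}


def triage(counts):
    """Classify one OTU from its per-column read counts."""
    in_samples = any(counts[s] > 0 for s in SAMPLES)
    in_blanks = any(counts[b] > 0 for b in BLANKS)
    if in_blanks and not in_samples:
        return "removed"   # exclusive to blanks
    if in_blanks:
        return "flagged"   # seen in blanks and samples: needs user review
    if in_samples:
        return "cleared"   # never seen in a blank
    return "empty"         # no reads anywhere


results = {otu: triage(counts) for otu, counts in otu_table.items()}
print(results)
```

In this sketch only the "flagged" group would go on to the SILVA lookup; the "removed" and "cleared" groups need no user decision.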

Below we outline the rationale used to make a TaxonSluice-enabled decision on each of our flagged OTUs. We note that this is a subjective procedure and does not safeguard against false positives or false negatives. Further, we strongly caution users about relying on 'isolation source' data for the closest 16S matches in SILVA when identifying contaminants; the SILVA database, like any other database, is inherently biased, and strong matches are not necessarily indicative of an OTU's environmental relevance.

Flagged OTU00002: Discard - This OTU was present in two of three independent blank samples. In both instances, its sequence abundance was within the same order of magnitude in the environmental sample and the blank. Close relatives were isolated from soil rather than marine habitats. The OTU was thus discarded from the dataset.
Flagged OTU00004: Discard - This OTU was present in two of three independent blank samples. In one instance (Sample-Blank Pair B), the blank had two orders of magnitude more sequence counts than the environmental sample. Close relatives were isolated from the mammalian oral cavity rather than marine habitats. The OTU was thus discarded from the dataset.
Flagged OTU00005: Keep - This OTU was recovered from one of three independent blanks and two of three independent environmental samples. Its sequence abundance is two orders of magnitude lower in the blank than in one of the environmental samples. Further, all close relatives for which environmental data exist inhabit marine environments. The OTU was thus kept in the dataset.
Flagged OTU00007: Keep - The sequence abundance of this OTU is two orders of magnitude lower in the blank than in the environmental sample where it was detected. All close relatives for which environmental data exist inhabit marine environments. The OTU was thus kept in the dataset.
Flagged OTU00010: Discard - This OTU was present in two of three independent blank samples. In one instance, its sequence abundance was within the same order of magnitude in an environmental sample and a blank. Close relatives were isolated from soil rather than marine habitats. The OTU was thus discarded from the dataset.
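The "orders of magnitude" comparisons in these rationales can be made concrete with a base-10 log ratio of the sample count to its paired blank count. The sketch below is only an illustration under stated assumptions: `magnitude_gap` is a hypothetical helper, the counts are invented, and reading "within the same order of magnitude" as a ratio under tenfold is our interpretation rather than a TaxonSluice setting.

```python
import math


def magnitude_gap(sample_count, blank_count):
    """Return log10(sample / blank), or None if either count is zero.

    A value near 0 means similar abundance in sample and blank; +2 means the
    sample has about 100x more reads; -2 means the blank has about 100x more.
    """
    if sample_count <= 0 or blank_count <= 0:
        return None
    return math.log10(sample_count / blank_count)


# Invented (sample, blank) counts for three hypothetical pairs.
pairs = {"A": (1500, 12), "B": (30, 25), "C": (0, 0)}

for label, (sample, blank) in pairs.items():
    gap = magnitude_gap(sample, blank)
    if gap is None:
        note = "no comparison possible (a count is zero)"
    elif abs(gap) < 1:
        note = "same order of magnitude (assumed cutoff: under tenfold)"
    else:
        note = f"about {abs(gap):.1f} orders of magnitude apart"
    print(f"Pair {label}: {note}")
```

A large positive gap supports keeping an OTU and a gap near zero or below supports discarding it, but as the rationales above show, the number is weighed together with the isolation sources of the closest relatives rather than used alone.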

Our final, TaxonSluice-parsed and user-pared, “clean” OTU table (omitting blanks) is shown below (Table 4).
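Producing such a table amounts to dropping the blank columns and every OTU that was removed or discarded. A minimal sketch, assuming the table is held as a nested dictionary; the helper name `clean_table`, the `Blank_` column prefix, and the toy counts are all hypothetical and do not reproduce Table 4.

```python
def clean_table(otu_table, keep, blank_prefix="Blank_"):
    """Keep only the OTUs in `keep`, and drop every blank column."""
    return {
        otu: {col: n for col, n in counts.items()
              if not col.startswith(blank_prefix)}
        for otu, counts in otu_table.items()
        if otu in keep
    }


# Invented counts for illustration only.
toy_table = {
    "OTU_kept": {"Sample_A": 120, "Blank_A": 1},
    "OTU_discarded": {"Sample_A": 40, "Blank_A": 35},
}

final = clean_table(toy_table, keep={"OTU_kept"})
print(final)
```

The `keep` set here would hold the cleared OTUs plus the flagged OTUs the user decided to retain.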
