TY - UNPB
T1 - THAPBI PICT - a fast, cautious, and accurate metabarcoding analysis pipeline
AU - Cock, Peter
AU - Cooke, David E.L.
AU - Thorpe, Peter
AU - Pritchard, Leighton
PY - 2023/4/6
Y1 - 2023/4/6
N2 - THAPBI PICT is an open source software pipeline for metabarcoding analysis with multiplexed Illumina paired-end reads, including where different amplicons are sequenced together. We demonstrate using worked examples with our own and public data sets how, with appropriate primer settings and a custom database, THAPBI PICT can be applied to other amplicons and organisms, and used for reanalysis of existing datasets. The core dataflow of the implementation is (i) data reduction to unique marker sequences, often called amplicon sequence variants (ASVs), (ii) dynamic thresholds for discarding low abundance sequences to remove noise and artifacts (rather than error correction by default), before (iii) classification using a curated reference database. The default classifier assigns a label to each query sequence based on a database match that is either perfect, or a single base pair edit away (substitution, deletion or insertion). Abundance thresholds for inclusion can be set by the user or automatically using per-batch negative or synthetic control samples. Output is designed for practical interpretation by nonspecialists and includes a read report (ASVs with classification and counts per sample), sample report (samples with counts per species classification), and a topological graph of ASVs as nodes with short edit distances as edges. Source code available from https://github.com/peterjc/thapbi-pict/ with documentation including installation instructions.
AB - THAPBI PICT is an open source software pipeline for metabarcoding analysis with multiplexed Illumina paired-end reads, including where different amplicons are sequenced together. We demonstrate using worked examples with our own and public data sets how, with appropriate primer settings and a custom database, THAPBI PICT can be applied to other amplicons and organisms, and used for reanalysis of existing datasets. The core dataflow of the implementation is (i) data reduction to unique marker sequences, often called amplicon sequence variants (ASVs), (ii) dynamic thresholds for discarding low abundance sequences to remove noise and artifacts (rather than error correction by default), before (iii) classification using a curated reference database. The default classifier assigns a label to each query sequence based on a database match that is either perfect, or a single base pair edit away (substitution, deletion or insertion). Abundance thresholds for inclusion can be set by the user or automatically using per-batch negative or synthetic control samples. Output is designed for practical interpretation by nonspecialists and includes a read report (ASVs with classification and counts per sample), sample report (samples with counts per species classification), and a topological graph of ASVs as nodes with short edit distances as edges. Source code available from https://github.com/peterjc/thapbi-pict/ with documentation including installation instructions.
KW - metabarcoding
KW - open source software
KW - amplicon sequence variants
UR - https://github.com/peterjc/thapbi-pict/
U2 - 10.1101/2023.03.24.534090
DO - 10.1101/2023.03.24.534090
M3 - Working Paper/Preprint
BT - THAPBI PICT - a fast, cautious, and accurate metabarcoding analysis pipeline
ER -