Title: | Dada-BLAST-Taxon Assign-Condense Metabarcode Analysis |
---|---|
Description: | The 'DBTC' package first uses the 'dada2' R tools to analyse metabarcode data, then uses the BLAST algorithm to search unknown sequences against local databases, and finally takes the reduced matched results and provides best taxonomic assignments. |
Authors: | Robert G Young [aut, cre, cph] |
Maintainer: | Robert G Young |
License: | GPL-2 or GPL-3 |
Version: | 0.1.0 |
Built: | 2024-10-26 04:42:17 UTC |
Source: | https://github.com/rgyoung6/dbtc |
This function takes a file selection and combines all 'taxaAssign' files in that directory into a single 'taxaAssignCombined.tsv' output file.
combine_assign_output(fileLoc = NULL, numCores = 1, verbose = TRUE)
Argument | Description |
---|---|
fileLoc | The location of a file in a directory where all of the 'taxaAssign' files are located (Default NULL). |
numCores | The number of cores used to run the function (Default 1; Windows systems can only use a single core). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
Input: select any file in a directory; all 'taxaAssign' files in that directory will be combined.
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
This function produces a 'YYYY_MM_DD_HHMM_taxaAssignCombined.tsv' and a 'YYYY_MM_DD_HHMM_taxaAssignCombined.txt' file in the selected target directory.
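The YYYY_MM_DD_HHMM prefix on output names is a date-time stamp. As a rough illustration only, the base-R sketch below reproduces that pattern for a given time; `dbtc_style_name()` is a hypothetical helper, not a DBTC function, and DBTC's own naming code may differ.

```r
# Hypothetical helper, not part of DBTC: prefix a file suffix with a
# YYYY_MM_DD_HHMM time stamp like the output names described above.
dbtc_style_name <- function(suffix, time = Sys.time()) {
  paste0(format(time, "%Y_%m_%d_%H%M"), "_", suffix)
}

# A fixed time (with an explicit time zone) makes the result reproducible
run_time <- as.POSIXct("2024-10-26 04:42:17", tz = "UTC")
dbtc_style_name("taxaAssignCombined.tsv", run_time)
# "2024_10_26_0442_taxaAssignCombined.tsv"
```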
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
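The rules above can be checked before calling any DBTC function. The sketch below is a hypothetical helper (`check_dbtc_path()` is not part of DBTC) that flags whitespace, the special characters named above, and the reserved strings in a user-chosen input path. It is meant for input paths only, since DBTC's own output names legitimately contain these strings.

```r
# Hypothetical pre-flight check (not a DBTC function) for user-chosen paths.
check_dbtc_path <- function(path) {
  reserved <- c("_BLAST", "_combinedDada", "_taxaAssign",
                "_taxaAssignCombined", "_taxaReduced", "_CombineTaxaReduced")
  problems <- character(0)
  if (grepl("[[:space:]]", path)) {
    problems <- c(problems, "contains whitespace")
  }
  if (grepl("[?#!]", path)) {
    problems <- c(problems, "contains ?, # or !")
  }
  # fixed = TRUE matches the reserved strings literally, not as regex
  found <- vapply(reserved, function(s) grepl(s, path, fixed = TRUE), logical(1))
  if (any(found)) {
    problems <- c(problems, paste("contains reserved string(s):",
                                  paste(reserved[found], collapse = ", ")))
  }
  problems
}

check_dbtc_path("C:/Data/My Run/sample_R1_001.fastq")  # "contains whitespace"
check_dbtc_path("C:/Data/run-1/sample_R1_001.fastq")   # character(0): no problems
```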
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), combine_dada_output(), make_BLAST_DB(), seq_BLAST(), taxon_assign(), reduce_taxa(), combine_reduced_output()
    ## Not run:
    combine_assign_output()
    combine_assign_output(fileLoc = NULL, numCores = 1)
    ## End(Not run)
This function takes DBTC dada_implement() ASV output files (YYYY_MM_DD_HH_MM_UserInputRunName_Merge, YYYY_MM_DD_HH_MM_UserInputRunName_MergeFwdRev, and/or YYYY_MM_DD_HH_MM_UserInputRunName_TotalTable) and combines them into a single ASV table with an accompanying fasta file. It also produces a file containing the processing information for the run. The main input argument is the location of a file in a folder containing all of the ASV tables to be combined. Output files are named using the convention YYYY_MM_DD_HH_MM_combinedDada.
combine_dada_output(fileLoc = NULL, minLen = 100, verbose = TRUE)
Argument | Description |
---|---|
fileLoc | Select a file in the folder containing the dada_implement() results to combine (YYYY_MM_DD_HHMM_FileName_MergeFwdRev or YYYY_MM_DD_HHMM_FileName_Merge; both .tsv and .fas files) (Default NULL). |
minLen | The minimum final desired length of the reads (Default 100). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
Two or more files are required as input for this function. These must be ASV files output by dada_implement() and can include Merge, MergeFwdRev, or TotalTable.tsv files. The user can also set the minimum length of sequences to retain in the combined output file.
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
The output from this function includes three files:
1. YYYY_MM_DD_HHMM_combinedDada.tsv - combined ASV table
2. YYYY_MM_DD_HHMM_combinedDada.fas - combined fasta file
3. YYYY_MM_DD_HHMM_combinedDada.txt - summary file from the combine_dada_output() run
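Conceptually, combining runs means joining ASV tables on the sequence, filling in zero reads where a sample never saw an ASV, and dropping sequences shorter than minLen. The sketch below shows that idea in base R under an assumed layout (a `seq` column plus one read-count column per sample, with sample names unique across runs); `combine_asv_tables()` is hypothetical and does not reflect the actual DBTC file format or code.

```r
# Sketch only: the table layout ('seq' column plus one read-count column per
# sample) is an assumption, not the documented DBTC file format.
combine_asv_tables <- function(tables, minLen = 100) {
  # Full outer join on the sequence across all tables
  combined <- Reduce(function(a, b) merge(a, b, by = "seq", all = TRUE), tables)
  counts <- setdiff(names(combined), "seq")
  # An ASV absent from a run has zero reads in that run's samples
  combined[counts][is.na(combined[counts])] <- 0
  # Drop sequences shorter than minLen, mirroring the minLen argument
  combined[nchar(as.character(combined$seq)) >= minLen, , drop = FALSE]
}

run1 <- data.frame(seq = c(strrep("A", 120), strrep("C", 50)), S1 = c(10, 3),
                   stringsAsFactors = FALSE)
run2 <- data.frame(seq = c(strrep("A", 120), strrep("G", 110)), S2 = c(4, 7),
                   stringsAsFactors = FALSE)
combine_asv_tables(list(run1, run2), minLen = 100)  # the 50 bp ASV is dropped
```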
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), make_BLAST_DB(), seq_BLAST(), taxon_assign(), combine_assign_output(), reduce_taxa(), combine_reduced_output()
    ## Not run:
    combine_dada_output()
    combine_dada_output(fileLoc = NULL, minLen = 100)
    ## End(Not run)
This function takes a file selection and combines all 'taxaReduced' files in that directory into a single taxa table file with presence/absence (or read count) results. The output file is named with the string _CombineTaxaReduced.tsv.
combine_reduced_output(fileLoc = NULL, presenceAbsence = TRUE, verbose = TRUE)
Argument | Description |
---|---|
fileLoc | The location of a file in a directory where all of the 'taxaReduced' files are located (Default NULL). |
presenceAbsence | TRUE or FALSE. If TRUE, results are reduced to 0/1 presence/absence values; if FALSE, read counts are retained (Default TRUE). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
Input: select any file in a directory; all 'taxaReduced' files in that directory will be combined. The output is a taxa table with all taxa from all files combined into a single table with presence/absence (0 or 1) results, or read counts when presenceAbsence is FALSE. The metrics supporting each taxon identification in each combined file are retained in a column with the parenthetical values from the 'taxaReduced' files ("Num_Rec", "Coverage", "Identity", "Max_eVal").
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
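The difference between presenceAbsence = TRUE and FALSE amounts to whether read counts are collapsed to 0/1. The base-R sketch below shows that conversion on a made-up taxa table; `to_presence_absence()` is a hypothetical helper, not DBTC code.

```r
# Made-up combined taxa table with read counts per input file
counts <- data.frame(
  taxon = c("Homo sapiens", "Danio rerio", "Mus musculus"),
  run1  = c(120, 0, 5),
  run2  = c(0, 33, 0),
  stringsAsFactors = FALSE
)

to_presence_absence <- function(tab, id_col = "taxon") {
  value_cols <- setdiff(names(tab), id_col)
  # Any read count above zero becomes 1; zero stays 0
  tab[value_cols] <- lapply(tab[value_cols], function(x) as.integer(x > 0))
  tab
}

to_presence_absence(counts)
```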
This function produces a single 'YYYY_MM_DD_HHMM_CombineTaxaReduced' file and an associated summary file in the target directory.
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), combine_dada_output(), make_BLAST_DB(), seq_BLAST(), taxon_assign(), combine_assign_output(), reduce_taxa()
    ## Not run:
    combine_reduced_output()
    combine_reduced_output(fileLoc = NULL, presenceAbsence = TRUE)
    ## End(Not run)
This function requires a main directory containing one or more folders, each representing a sequencing run and containing fastq files (the location of one of the fastq files in one of the run folders is used as an input argument). All run folders in the main directory must contain data from sequencing runs that used the same primers and protocols. Output from this function includes all processing files and the final main output files in the form of fasta files and amplicon sequence variant (ASV) tables.
dada_implement( runFolderLoc = NULL, primerFile = NULL, fwdIdent = "_R1_001", revIdent = "_R2_001", unidirectional = FALSE, bidirectional = TRUE, printQualityPdf = TRUE, maxPrimeMis = 2, fwdTrimLen = 0, revTrimLen = 0, maxEEVal = 2, truncQValue = 2, truncLenValueF = 0, truncLenValueR = 0, error = 0.1, nbases = 1e+80, maxMismatchValue = 0, minOverlapValue = 12, trimOverhang = FALSE, minFinalSeqLen = 100, verbose = TRUE )
Argument | Description |
---|---|
runFolderLoc | Select a file in one of the run folders containing the fastq files of interest (Default NULL). |
primerFile | Select a file with the primers for this analysis (Default NULL). |
fwdIdent | Forward identifier naming string (Default '_R1_001'). |
revIdent | Reverse identifier naming string (Default '_R2_001'). |
unidirectional | If TRUE, process the files independently, without pairing (Default FALSE). |
bidirectional | If TRUE, process paired forward and reverse sequences (Default TRUE). |
printQualityPdf | If TRUE, save image files showing quality metrics (Default TRUE). |
maxPrimeMis | Maximum number of mismatches allowed when pattern matching to trim the primers from the ends of the reads with the ShortRead trimLRPatterns() function (Default 2). |
fwdTrimLen | Forward trim length for the dada2 filterAndTrim() function (Default 0). |
revTrimLen | Reverse trim length for the dada2 filterAndTrim() function (Default 0). |
maxEEVal | Maximum number of expected errors allowed in a read for the dada2 filterAndTrim() function (Default 2). |
truncQValue | Truncation value used to trim the ends of reads: each read is truncated at the first nucleotide with a quality score less than or equal to this value, for the dada2 filterAndTrim() function (Default 2). |
truncLenValueF | Forward truncation length for the dada2 filterAndTrim() function. This value is set to 0 when the pattern matching trim function is enabled (Default 0). |
truncLenValueR | Reverse truncation length for the dada2 filterAndTrim() function. This value is set to 0 when the pattern matching trim function is enabled (Default 0). |
error | Proportion of fastq files used to assess error rates for the dada2 learnErrors() function (Default 0.1, i.e., 10% of the files). |
nbases | The total number of bases used to assess errors for the dada2 learnErrors() function (Default 1e80). NOTE: this value is set very high so that all nucleotides in the file subset selected by error are used. To assess errors using a set total amount of sequence data rather than specific fastq files, set error to 1 and set this value to the desired number of bases. |
maxMismatchValue | Maximum number of mismatches allowed when merging two reads for the dada2 mergePairs() function (Default 0). |
minOverlapValue | Minimum number of overlapping nucleotides between the forward and reverse reads for the dada2 mergePairs() function (Default 12). |
trimOverhang | Trim merged reads past the start of the complementary primer regions for the dada2 mergePairs() function (Default FALSE). |
minFinalSeqLen | The minimum final desired length of the reads (Default 100). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
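maxEEVal is passed to the dada2 expected-error filter. In dada2, the expected errors of a read are the sum of the per-base error probabilities implied by the Phred quality scores, 10^(-Q/10), and reads whose total exceeds the threshold are discarded. The sketch below computes that quantity for two made-up reads; `expected_errors()` is an illustrative helper, not a dada2 or DBTC function.

```r
# Expected errors (EE) of a read: sum over bases of the error probability
# implied by each Phred quality score, 10^(-Q/10).
expected_errors <- function(q) sum(10^(-q / 10))

q_good <- rep(30, 250)                  # 250 bases at Q30
q_poor <- c(rep(30, 200), rep(10, 50))  # quality drops to Q10 at the 3' end

maxEEVal <- 2
expected_errors(q_good) <= maxEEVal  # TRUE  (EE = 0.25)
expected_errors(q_poor) <= maxEEVal  # FALSE (EE = 5.2)
```

A few low-quality bases at the end of a read can dominate the total, which is one reason truncation settings such as truncQValue and truncLenValueF/R interact with this filter.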
Two file types are required as input for the dada_implement() function: the fastq files, arranged in the appropriate folder structure (see below), and a file containing the primers used to amplify the sequence reads.
Fastq File Folder Structure
    Parent Directory
    |-- Run1 Directory
    |   |-- Fastq
    |   |-- Fastq
    |   `-- ...
    `-- Run2 Directory
        |-- Fastq
        |-- Fastq
        `-- ...
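Within a run folder, fwdIdent and revIdent are the strings that distinguish forward from reverse files. The sketch below shows one way such pairing can work in base R; `pair_reads()` is a hypothetical helper, not DBTC's implementation. Forward files without a matching reverse file get NA, which is the kind of file that would only be usable unidirectionally.

```r
# Hypothetical helper: pair forward and reverse fastq files by swapping the
# forward identifier for the reverse identifier in each forward file name.
pair_reads <- function(files, fwdIdent = "_R1_001", revIdent = "_R2_001") {
  fwd <- files[grepl(fwdIdent, files, fixed = TRUE)]
  expected_rev <- sub(fwdIdent, revIdent, fwd, fixed = TRUE)
  data.frame(
    fwd = fwd,
    rev = ifelse(expected_rev %in% files, expected_rev, NA_character_),
    stringsAsFactors = FALSE
  )
}

# In practice the names could come from list.files() on a run folder
files <- c("S1_R1_001.fastq.gz", "S1_R2_001.fastq.gz", "S2_R1_001.fastq.gz")
pair_reads(files)  # S2 has no reverse file, so its rev entry is NA
```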
Format of the primer file
| Forward | Reverse |
|---|---|
| AGTGTGTAGTGATTG | CGCATCGCTCAGACTGACTGC |
| GAGCCCTCGATCGCT | GGTCGATAGCTACGCGCGCATACGACT |
|  | GGTTCACATCGCATTCAT |
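The Forward and Reverse columns can hold different numbers of primers, so some cells are empty. The sketch below reads such a table in base R, assuming (not confirmed by this manual) that the file is tab-delimited with 'Forward' and 'Reverse' headers; `read_primers()` is hypothetical. With the example primers above it returns two forward and three reverse primers.

```r
# Assumption: tab-delimited primer file with 'Forward' and 'Reverse' header
# columns; the real DBTC input format may differ.
read_primers <- function(...) {
  # colClasses = "character" keeps blank cells as "" rather than NA
  tab <- read.delim(..., colClasses = "character")
  keep <- function(x) x[!is.na(x) & nzchar(x)]
  list(forward = keep(tab$Forward), reverse = keep(tab$Reverse))
}

primer_text <- paste(
  "Forward\tReverse",
  "AGTGTGTAGTGATTG\tCGCATCGCTCAGACTGACTGC",
  "GAGCCCTCGATCGCT\tGGTCGATAGCTACGCGCGCATACGACT",
  "\tGGTTCACATCGCATTCAT",
  sep = "\n"
)
primers <- read_primers(text = primer_text)  # or read_primers("primers.tsv")
lengths(primers)
```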
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
The output from this function includes four folders:
- A_Qual - Contains quality pdf files for the input fastq files (if printQualityPdf is set to TRUE).
- B_Filt - Contains the dada2 filtered fastq files and a folder with the end-trimmed fastq files before quality filtering.
- C_FiltQual - Contains quality pdf files for the filtered fastq files (if printQualityPdf is set to TRUE).
- D_Output - Contains the output files, including an analysis summary, an analysis summary table of processing values, forward and reverse error assessments, and finally the output ASV and fasta files of the obtained sequences (including the TotalTable.tsv file).
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: combine_dada_output(), make_BLAST_DB(), seq_BLAST(), taxon_assign(), combine_assign_output(), reduce_taxa(), combine_reduced_output()
    ## Not run:
    dada_implement()
    dada_implement(runFolderLoc = NULL, primerFile = NULL,
                   fwdIdent = "_R1_001", revIdent = "_R2_001",
                   unidirectional = FALSE, bidirectional = TRUE,
                   printQualityPdf = TRUE, maxPrimeMis = 2,
                   fwdTrimLen = 0, revTrimLen = 0, maxEEVal = 2,
                   truncQValue = 2, truncLenValueF = 0, truncLenValueR = 0,
                   error = 0.1, nbases = 1e80, maxMismatchValue = 0,
                   minOverlapValue = 12, trimOverhang = FALSE,
                   minFinalSeqLen = 100)
    ## End(Not run)
This function takes a fasta file (in MACER format) and builds a database against which BLAST searches can be run.
make_BLAST_DB( fileLoc = NULL, makeblastdbPath = "makeblastdb", taxaDBLoc = NULL, dbName = NULL, minLen = 100, verbose = TRUE )
Argument | Description |
---|---|
fileLoc | The location of a file in a directory where all fasta files will be used to construct a BLASTable database (Default NULL). |
makeblastdbPath | The local path for the BLAST+ makeblastdb program (Default 'makeblastdb'). |
taxaDBLoc | The location of the NCBI taxonomic database (Default NULL; for accessionTaxa.sql see the main DBTC page for details). |
dbName | A short name of 6-8 alphabetic characters used when building the database (Default NULL). |
minLen | The minimum sequence length used to construct the BLAST database (Default 100). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
The user inputs the location of a file in a directory that contains a properly formatted fasta file, which is used to construct a BLASTable database. The NCBI BLAST+ program makeblastdb and the NCBI taxonomic database (accessionTaxa.sql) are required to run this function (see the README instructions for details).
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
The output from this function is a folder containing the BLAST database, named according to the submitted dbName.
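The minLen argument means short reference sequences are left out of the database. The base-R sketch below illustrates that kind of length filter on a small multi-line fasta record set and also checks whether makeblastdb is visible on the PATH via `Sys.which()`. `read_fasta_lines()` and `filter_by_length()` are hypothetical helpers, not DBTC's internal code.

```r
# Sketch: parse fasta lines (single- or multi-line records) into a named
# character vector of sequences.
read_fasta_lines <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]          # drop blank lines
  is_header <- startsWith(lines, ">")
  ids <- sub("^>", "", lines[is_header])
  record <- cumsum(is_header)                    # record index for each line
  seq_lines <- lines[!is_header]
  # factor levels keep records that have no sequence lines (they become "")
  pieces <- split(seq_lines, factor(record[!is_header], levels = seq_along(ids)))
  setNames(vapply(pieces, paste, character(1), collapse = ""), ids)
}

# Keep only sequences at least minLen long, mirroring the minLen argument
filter_by_length <- function(seqs, minLen = 100) seqs[nchar(seqs) >= minLen]

fasta <- c(">seqA", strrep("A", 60), strrep("A", 60), ">seqB", strrep("C", 80))
kept <- filter_by_length(read_fasta_lines(fasta), minLen = 100)
names(kept)  # only seqA (120 bp) passes

# Write the kept records back out as fasta (header line, then sequence)
writeLines(c(rbind(paste0(">", names(kept)), kept)), tempfile(fileext = ".fas"))

# "" means makeblastdb was not found on the PATH (pass makeblastdbPath instead)
Sys.which("makeblastdb")
```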
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), combine_dada_output(), seq_BLAST(), taxon_assign(), combine_assign_output(), reduce_taxa(), combine_reduced_output()
    ## Not run:
    make_BLAST_DB()
    make_BLAST_DB(fileLoc = NULL, makeblastdbPath = "makeblastdb", taxaDBLoc = NULL,
                  dbName = NULL, minLen = 100)
    ## End(Not run)
This function takes a file selection and, for each '_taxaAssign_YYYY_MM_DD_HHMM.tsv' and/or 'YYYY_MM_DD_HHMM_taxaAssignCombined.tsv' file in that directory, reduces all ASVs with the same taxonomic assignment into a single taxonomic result. The results are placed into a '_taxaReduced_YYYY_MM_DD_HHMM.tsv' file for each target file in the directory.
reduce_taxa(fileLoc = NULL, numCores = 1, verbose = TRUE)
Argument | Description |
---|---|
fileLoc | The location of a file in a directory where all of the 'taxaAssign' and/or 'taxaAssignCombined' files are located (Default NULL). |
numCores | The number of cores used to run the function (Default 1; Windows systems can only use a single core). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
This function requires a file in a directory; all '_taxaAssign_YYYY_MM_DD_HHMM.tsv' and/or 'YYYY_MM_DD_HHMM_taxaAssignCombined.tsv' files in that directory will be processed. Within each file, all records with the same taxonomic result are combined. The BLAST values in parentheses ("Num_Rec", "Coverage", "Identity", "Max_eVal") are combined using the mean number of records, the mean of the minimum coverage and identity values, and the mean of the maximum eValues.
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
This function produces a 'taxaReduced' file for every 'taxaAssign' or 'taxaAssignCombined' file present in the target directory.
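The averaging rule above can be illustrated with `aggregate()` in base R. The made-up table below stands in for per-ASV metrics already parsed out of the parentheses; that tabular layout is an assumption for illustration, and the sketch is not DBTC's reduce_taxa() code.

```r
# Made-up per-ASV metrics: two ASVs share the same taxonomic result
asv_hits <- data.frame(
  taxon    = c("Danio rerio", "Danio rerio", "Mus musculus"),
  Num_Rec  = c(10, 20, 5),
  Coverage = c(98, 100, 97),       # minimum coverage per ASV
  Identity = c(99, 97, 96),        # minimum identity per ASV
  Max_eVal = c(1e-50, 1e-40, 1e-30),
  stringsAsFactors = FALSE
)

# One row per taxon; every metric is averaged across the ASVs being merged
reduced <- aggregate(cbind(Num_Rec, Coverage, Identity, Max_eVal) ~ taxon,
                     data = asv_hits, FUN = mean)
reduced  # Danio rerio: Num_Rec 15, Coverage 99, Identity 98
```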
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), combine_dada_output(), make_BLAST_DB(), seq_BLAST(), taxon_assign(), combine_assign_output(), combine_reduced_output()
    ## Not run:
    reduce_taxa()
    reduce_taxa(fileLoc = NULL, numCores = 1)
    ## End(Not run)
This function takes fasta files as input, along with a user-selected NCBI-formatted database to BLAST the sequences against. The function produces two files: a BLAST run file and a single file containing all of the BLAST results in tab-delimited format. (Note: the results file has no headers; the columns are query sequence ID, search sequence ID, search taxonomic ID, query-to-sequence coverage, percent identity, search scientific name, search common name, query start, query end, search start, search end, and e-value.)
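Because the results file has no header row, column names have to be supplied when reading it into R. The sketch below does this with `read.delim()`; the column labels are hypothetical names chosen to follow the order listed above, and the single result line is made up.

```r
# Hypothetical column labels, in the column order listed above
blast_cols <- c("query_id", "subject_id", "subject_taxid", "query_coverage",
                "percent_identity", "scientific_name", "common_name",
                "query_start", "query_end", "subject_start", "subject_end",
                "evalue")

read_blast_results <- function(file) {
  # quote = "" and comment.char = "" stop quotes or '#' inside names from
  # being treated specially
  read.delim(file, header = FALSE, col.names = blast_cols,
             quote = "", comment.char = "", stringsAsFactors = FALSE)
}

# Example with a single made-up result line
tf <- tempfile(fileext = ".tsv")
writeLines(paste("ASV1", "seq123", "7955", "100", "99.2", "Danio rerio",
                 "zebrafish", "1", "250", "10", "259", "1e-120", sep = "\t"), tf)
hits <- read_blast_results(tf)
hits[, c("query_id", "scientific_name", "percent_identity", "evalue")]
```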
seq_BLAST( databasePath = NULL, querySeqPath = NULL, blastnPath = "blastn", minLen = 100, BLASTResults = 200, numCores = 1, verbose = TRUE )
Argument | Description |
---|---|
databasePath | The location of a file in the directory where the desired BLAST database is located (Default NULL). |
querySeqPath | The location of a file in a directory containing all of the fasta files to be BLASTed (Default NULL). |
blastnPath | The location of the NCBI BLAST+ blastn program (Default 'blastn'). |
minLen | The minimum length of the sequences that will be BLASTed (Default 100). |
BLASTResults | The number of results returned, i.e., the depth of the BLAST results saved (Default 200). |
numCores | The number of cores used to run the function (Default 1; Windows systems can only use a single core). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
The user provides the location of the BLAST database to use by selecting a file in the target directory, and the location of the query sequences by selecting a file in a directory that contains the fasta file(s) of interest. The user also provides the path to the BLAST+ blastn program, the minimum query sequence length to BLAST (Default 100), the depth of the returned BLAST results (Default 200), and the number of cores used to run the function (Default 1; on Windows this value must be 1).
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
For each fasta file in the target directory, this function produces two files: a BLAST run file and a BLAST results file.
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), combine_dada_output(), make_BLAST_DB(), taxon_assign(), combine_assign_output(), reduce_taxa(), combine_reduced_output()
    ## Not run:
    seq_BLAST()
    seq_BLAST(databasePath = NULL, querySeqPath = NULL, blastnPath = "blastn",
              minLen = 100, BLASTResults = 200, numCores = 1)
    ## End(Not run)
This function takes a BLAST result file and its associated fasta files (either on their own or with accompanying ASV files generated by the dada_implement() function) and collapses the multiple BLAST results into a single result for each query sequence. When an ASV table is present, the taxonomic results are combined with the ASV table.
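Combining taxonomic results with an ASV table is essentially a join on the ASV identifier, and the includeAllDada argument decides whether ASVs without an assignment are kept. The base-R sketch below shows that behaviour with `merge()` on made-up tables; the `id` key column and `join_taxa()` helper are hypothetical, not DBTC's file format or code.

```r
# Made-up ASV table (read counts) and taxonomic results keyed by ASV ID
asv <- data.frame(id = c("ASV1", "ASV2", "ASV3"), S1 = c(50, 12, 3),
                  stringsAsFactors = FALSE)
tax <- data.frame(id = c("ASV1", "ASV3"),
                  taxon = c("Danio rerio", "Mus musculus"),
                  stringsAsFactors = FALSE)

join_taxa <- function(asv, tax, includeAllDada = TRUE) {
  # all.x = TRUE keeps ASVs without an assignment (their taxon is NA);
  # all.x = FALSE keeps only ASVs that received a taxonomic result
  merge(asv, tax, by = "id", all.x = includeAllDada)
}

join_taxa(asv, tax, includeAllDada = TRUE)   # 3 rows, ASV2 has taxon NA
join_taxa(asv, tax, includeAllDada = FALSE)  # 2 rows
```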
taxon_assign( fileLoc = NULL, taxaDBLoc = NULL, numCores = 1, coverage = 95, ident = 95, propThres = 0.95, coverReportThresh = 0, identReportThresh = 0, includeAllDada = TRUE, verbose = TRUE )
Argument | Description |
---|---|
fileLoc | The location of a file in a directory where all of the paired fasta and BLAST (and optionally ASV) files are located (Default NULL). |
taxaDBLoc | The location of the NCBI taxonomic database (Default NULL; for accessionTaxa.sql see the main DBTC page for details). |
numCores | The number of cores used to run the function (Default 1; Windows systems can only use a single core). |
coverage | The percent coverage used for taxonomic assignment of above-threshold results (Default 95). |
ident | The percent identity used for taxonomic assignment of above-threshold results (Default 95). |
propThres | The proportional threshold flags the final result based on the preponderance of the data. For example, with a threshold of 0.95, a result is flagged if the taxon directly below the assigned taxon has fewer than 95% (a proportion of 0.95) of the records that caused the upward taxonomic placement (Default 0.95). |
coverReportThresh | The percent coverage threshold; results below this threshold are flagged in the report (Default 0). |
identReportThresh | The percent identity threshold; results below this threshold are flagged in the report (Default 0). |
includeAllDada | When paired dada ASV tables are present and this is set to FALSE, records without a taxonomic assignment are excluded (Default TRUE). |
verbose | If TRUE, output is printed to the R console; if FALSE, this reporting is suppressed (Default TRUE). |
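One possible reading of coverage, ident, and propThres for a single query is: keep only BLAST hits at or above both thresholds, then check whether the most common species among them holds at least propThres of those hits. The sketch below implements that reading on made-up hits. It is a simplified illustration, not DBTC's assignment algorithm; in particular, the inclusive (>=) comparisons and the `check_species_support()` helper are assumptions.

```r
# Made-up BLAST hits for one query sequence
hits <- data.frame(
  species  = c("Danio rerio", "Danio rerio", "Danio rerio", "Danio rerio",
               "Danio aequipinnatus"),
  coverage = c(100, 99, 98, 97, 96),
  identity = c(99, 98, 97, 96, 95),
  stringsAsFactors = FALSE
)

check_species_support <- function(hits, coverage = 95, ident = 95,
                                  propThres = 0.95) {
  # Keep only hits at or above both thresholds (inclusive is an assumption)
  kept <- hits[hits$coverage >= coverage & hits$identity >= ident, ]
  if (nrow(kept) == 0) return(NULL)
  counts <- table(kept$species)
  top_prop <- max(counts) / sum(counts)
  list(top_species = names(counts)[which.max(counts)],
       top_prop    = top_prop,
       flagged     = top_prop < propThres)
}

check_species_support(hits)              # top_prop 0.8, flagged TRUE
check_species_support(hits, ident = 96)  # only D. rerio hits remain, flagged FALSE
```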
This function requires a BLAST output file and its associated fasta file. If an ASV file is also present, it is used and combined with the taxonomic results. The BLAST results are reduced to a single result for each read. At each taxonomic level there may be one or more taxonomic assignments, and each assignment has quality metrics in parentheses after the name. These values ("Num_Rec", "Coverage", "Identity", "Max_eVal") represent the number of records with this taxonomic placement, the minimum coverage and identity, and the maximum eValue for the reported taxon.
The examples show the function syntax only. They are not run because the functions require input files, in some cases several, and some of these are large. For worked examples, see https://github.com/rgyoung6/DBTCShinyTutorial/blob/main/README.md
This function produces a '_taxaAssign_YYYY_MM_DD_HHMM.tsv' file for each submitted BLAST-fasta pair.
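The four parenthetical metrics are simple per-taxon summaries of the BLAST hits: a count, two minima, and a maximum. The base-R sketch below computes them from made-up hits. The "Name(n,cov,ident,evalue)" rendering is a hypothetical display format chosen for illustration; the exact string layout in DBTC output may differ.

```r
# Made-up BLAST hits for one read, already labelled with a taxon
hits <- data.frame(
  taxon    = c("Danio rerio", "Danio rerio", "Danio kyathit"),
  coverage = c(100, 97, 99),
  identity = c(99.5, 98, 97),
  evalue   = c(1e-100, 1e-80, 1e-90),
  stringsAsFactors = FALSE
)

summarise_taxa <- function(hits) {
  per_taxon <- split(hits, hits$taxon)
  unname(vapply(per_taxon, function(h) {
    # Num_Rec, minimum Coverage, minimum Identity, maximum eValue
    sprintf("%s(%d,%g,%g,%g)", h$taxon[1], nrow(h), min(h$coverage),
            min(h$identity), max(h$evalue))
  }, character(1)))
}

summarise_taxa(hits)
# "Danio kyathit(1,99,97,1e-90)" "Danio rerio(2,97,98,1e-80)"
```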
WARNING - NO WHITESPACE!
When running DBTC functions, the paths of the selected files cannot contain whitespace! Folder locations should be as short as possible (close to the root), as some functions do not process long path names.
Special characters (including the question mark, number sign, and exclamation mark) should also be avoided. Use dashes as separators in names and reserve underscores for use as information delimiters (this is how DBTC functions use underscores).
Several key character strings are used in the DBTC pipeline, and their presence in file or folder names will cause errors when running DBTC functions. Do not use the following strings in file or folder names:
- _BLAST
- _combinedDada
- _taxaAssign
- _taxaAssignCombined
- _taxaReduced
- _CombineTaxaReduced
Robert G. Young
<https://github.com/rgyoung6/DBTC> Young, R. G., Hanner, R. H. (Submitted October 2023). Dada-BLAST-Taxon Assign-Condense Shiny Application (DBTCShiny). Biodiversity Data Journal.
See also: dada_implement(), combine_dada_output(), make_BLAST_DB(), seq_BLAST(), combine_assign_output(), reduce_taxa(), combine_reduced_output()
    ## Not run:
    taxon_assign()
    taxon_assign(fileLoc = NULL, taxaDBLoc = NULL, numCores = 1, coverage = 95,
                 ident = 95, propThres = 0.95, coverReportThresh = 0,
                 identReportThresh = 0, includeAllDada = TRUE)
    ## End(Not run)