nf-core/rnavar
GATK4 RNA variant calling pipeline
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
A design file with information about the samples in your experiment. Use this parameter to specify the location of the input files. It has to be a comma-separated file with a header row. See usage docs.
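The checks described above (a .csv path matching the schema pattern, plus a header row) can be sketched as a small pre-flight validator. This is a minimal sketch, not the pipeline's own validation. The function name check_samplesheet and the column names in the example are hypothetical; see the usage docs for the real samplesheet columns.

```python
import csv
import io
import re

# Filename pattern copied from the schema entry above.
INPUT_PATTERN = r"^\S+\.csv$"


def check_samplesheet(path, text):
    """Return (header, sample_rows) for a samplesheet, or raise ValueError."""
    if re.fullmatch(INPUT_PATTERN, path) is None:
        raise ValueError(f"{path!r} does not match {INPUT_PATTERN}")
    # Parse as comma-separated values, skipping completely empty lines.
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2:
        raise ValueError("expected a header row plus at least one sample row")
    header, samples = rows[0], rows[1:]
    # Every sample row should have as many fields as the header.
    for i, row in enumerate(samples, start=1):
        if len(row) != len(header):
            raise ValueError(f"sample row {i}: expected {len(header)} fields, got {len(row)}")
    return header, samples


# Hypothetical column names, for illustration only.
example = "sample,fastq_1,fastq_2\nS1,S1_R1.fastq.gz,S1_R2.fastq.gz\n"
header, samples = check_samplesheet("samplesheet.csv", example)
print(header)  # ['sample', 'fastq_1', 'fastq_2']
```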
The output directory where the results will be saved. You must use absolute paths when saving to storage on cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config), you don't need to specify it on the command line for every run.
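The email pattern above can be tested locally before launching a run. This is a minimal sketch using Python's re module; is_valid_email is a hypothetical helper, and the pattern is copied verbatim from the schema entry. Note that it only accepts top-level domains of 2 to 5 letters.

```python
import re

# Pattern copied verbatim from the schema entry above.
EMAIL_PATTERN = r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"


def is_valid_email(address):
    """True if the whole address matches the schema pattern."""
    return re.fullmatch(EMAIL_PATTERN, address) is not None


print(is_valid_email("first.last@sub.example.org"))  # True
print(is_valid_email("me@example"))                  # False: no top-level domain
```

Because of the {2,5} quantifier, an address such as me@example.museum is rejected even though it is a valid address.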
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Save FastQ files after merging re-sequenced libraries in the results directory.
boolean
Reference genome related files and options required for the workflow.
Name of iGenomes reference.
string
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files, e.g. --genome GRCh38.
See the nf-core website docs for more details.
Path to FASTA genome file.
string
^\S+\.fn?a(sta)?(\.gz)?$
This parameter is mandatory if --genome is not specified. If you don't have a STAR index available, one will be generated for you automatically. Combine with --save_reference to save the STAR index for future runs.
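To see which FASTA file names the pattern above accepts, it can be tried against a few examples. This is a minimal sketch; is_accepted_fasta_name is a hypothetical helper and the file names are made up for illustration.

```python
import re

# FASTA filename pattern copied from the schema entry above.
FASTA_PATTERN = r"^\S+\.fn?a(sta)?(\.gz)?$"


def is_accepted_fasta_name(path):
    """True if the path matches the schema's FASTA filename pattern."""
    return re.fullmatch(FASTA_PATTERN, path) is not None


for name in ["GRCh38.fa", "genome.fasta.gz", "ref.fna", "genome.fas", "genome.fa.bgz"]:
    print(name, is_accepted_fasta_name(name))
```

The extensions .fa, .fna and .fasta are accepted, each optionally followed by .gz; other suffixes such as .fas or .bgz are rejected.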
Path to FASTA dictionary file.
string
If not provided, it will be generated automatically from the FASTA reference.
Path to FASTA reference index.
string
If not provided, it will be generated automatically from the FASTA reference.
Path to GTF annotation file.
string
This parameter is mandatory if --genome is not specified.
Path to GFF3 annotation file.
string
This parameter must be specified if neither --genome nor --gtf is specified.
Path to BED file containing exon intervals. This will be created from the GTF file if not specified.
string
Read length
number
150
Specify the read length for the STAR aligner.
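The STAR manual recommends setting --sjdbOverhang to the read (mate) length minus 1. Assuming the read length given here is used for that purpose (this is an assumption, not stated above), the arithmetic is a one-liner. star_sjdb_overhang is a hypothetical helper.

```python
def star_sjdb_overhang(read_length=150):
    """Assumed mapping: STAR's recommended --sjdbOverhang is read length minus 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1


args = ["--sjdbOverhang", str(star_sjdb_overhang(150))]
print(args)  # ['--sjdbOverhang', '149']
```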
If generated by the pipeline, save the STAR index in the results directory.
boolean
If the STAR index is generated by the pipeline, use this parameter to save it to your results folder. The index can then be reused in future pipeline runs, reducing processing time.
Path to known indels VCF file
string
Path to known indels index file
string
Path to dbSNP VCF file
string
Path to dbSNP VCF index file
string
snpEff DB version.
string
If you use AWS iGenomes, this has already been set for you appropriately.
This specifies the database to annotate with.
Alternatively, available database names can be listed with the snpEff databases command.
snpEff genome.
string
If you use AWS iGenomes, this has already been set for you appropriately.
This is used to specify the genome when using a container with a pre-downloaded cache.
VEP genome.
string
If you use AWS iGenomes, this has already been set for you appropriately.
This is used to specify the genome when using a container with a pre-downloaded cache.
VEP species.
string
If you use AWS iGenomes, this has already been set for you appropriately.
Alternatively, species listed in the Ensembl Genomes caches can be used.
VEP cache version.
string
If you use AWS iGenomes, this has already been set for you appropriately.
Alternatively, the cache version can be used to specify the correct Ensembl Genomes version number, as these differ from the concurrent Ensembl/VEP version numbers.
Type of feature to parse from annotation file
string
Valid values are exon, transcript or gene. Default: exon.
Download annotation cache.
boolean
Set this parameter if you wish to download the annotation cache.
Do not load the iGenomes reference config.
boolean
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.
The base path to the igenomes reference files
string
s3://ngi-igenomes/igenomes/
Define parameters related to read alignment
Specifies the alignment algorithm to use. Currently the only available option is 'star'.
string
star
This parameter defines which aligner is used to align the RNA reads to the reference genome. Currently only the STAR aligner is supported, so use 'star' as the value for this option.
Path to STAR index folder or compressed file (tar.gz)
string
This parameter can be used if a pre-built STAR index is available. You can give either the full path to the index directory or a compressed file in tar.gz format.
Enable STAR 2-pass mapping mode.
boolean
This parameter enables STAR to perform 2-pass mapping. Default true.
Do not use GTF file during STAR index building step
boolean
Do not use parameter --sjdbGTFfile <GTF file> during the STAR genomeGenerate process.
Option to limit the RAM used when sorting the BAM file. The value is specified in bytes. If 0, it is set to the genome index size.
integer
This parameter specifies the maximum available RAM (bytes) for sorting BAM during STAR alignment.
Specifies the number of genome bins for coordinate-sorting
integer
50
This parameter specifies the number of bins to be used for coordinate sorting during STAR alignment step.
Specifies the maximum number of collapsed junctions
integer
1000000
Sequencing center information to be added to read group of BAM files.
string
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Specify the sequencing platform used
string
illumina
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Where possible, save unaligned reads from aligner to the results directory.
boolean
This may either be in the form of FastQ or BAM files depending on the options available for that particular tool.
Save the intermediate BAM files from the alignment step.
boolean
By default, intermediate BAM files are not saved, to limit storage usage; only the final BAM files created after the appropriate filtering step are saved. Set this parameter to also save the intermediate BAM files.
Create a CSI index for BAM files instead of the traditional BAI index. This is required for genomes with chromosomes longer than 2^29 bp (about 512 Mbp), which exceeds the coordinate range of the BAI index.
boolean
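Whether a CSI index is needed can be decided from the reference's FASTA index (.fai), whose first two tab-separated columns are the sequence name and its length. This is a minimal sketch assuming the BAI limit of 2^29 bp; contigs_exceeding_bai_limit is a hypothetical helper and the example contig names and lengths are illustrative.

```python
# The BAI binning scheme covers coordinates up to 2**29 (about 512 Mbp).
BAI_MAX_REF_LEN = 2**29


def contigs_exceeding_bai_limit(fai_text):
    """Return names of sequences in a .fai file that are too long for BAI."""
    too_long = []
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH
        name, length = line.split("\t")[:2]
        if int(length) > BAI_MAX_REF_LEN:
            too_long.append(name)
    return too_long


fai = "chr1\t248956422\t112\t70\t71\nchrBig\t600000000\t252522000\t70\t71\n"
print(contigs_exceeding_bai_limit(fai))  # ['chrBig']
```

If the returned list is non-empty, set this parameter so that CSI indexes are produced.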
Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step.
boolean
Set to true to remove duplicates from the BAM file during the Picard MarkDuplicates step.
The minimum phred-scaled confidence threshold at which variants should be called.
number
20
Specify the minimum phred-scaled confidence threshold at which variants should be called.
Enable generation of per-sample GVCFs in addition to the VCFs.
boolean
This parameter enables GATK HaplotypeCaller to generate GVCFs. Default false.
Specify which tools RNAvar should use for annotating variants. Values can be 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation.
string
List of tools to be used for variant annotation.
This parameter must be a combination of the following values: snpeff, vep, merge.
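The rule above ('merge' runs both snpEff and VEP) can be expressed as a small expansion step. This is a minimal sketch that assumes the tools are passed as a comma-separated string (an assumption; the exact format is not stated here). annotators_to_run is a hypothetical helper.

```python
VALID_TOOLS = {"snpeff", "vep", "merge"}


def annotators_to_run(tools):
    """Expand a comma-separated tools string into the annotators to run."""
    requested = {t.strip() for t in tools.split(",") if t.strip()}
    unknown = requested - VALID_TOOLS
    if unknown:
        raise ValueError(f"unknown annotation tool(s): {sorted(unknown)}")
    run = requested - {"merge"}
    # 'merge' implies running both snpEff and VEP.
    if "merge" in requested:
        run |= {"snpeff", "vep"}
    return run


print(sorted(annotators_to_run("merge")))  # ['snpeff', 'vep']
```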
Path to VEP cache.
string
s3://annotation-cache/vep_cache/
Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}
Path to snpEff cache.
string
s3://annotation-cache/snpeff_cache/
Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}
Allow usage of fasta file for annotation with VEP
boolean
By pointing VEP to a FASTA file, it is possible to retrieve reference sequence locally. This enables VEP to retrieve HGVS notations (--hgvs), check the reference sequence given in input data, and construct transcript models from a GFF or GTF file without accessing a database.
For details, see the VEP documentation.
Path to dbNSFP processed file.
string
To be used with --vep_dbnsfp.
dbNSFP files and more information are available at https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp and https://sites.google.com/site/jpopgen/dbNSFP/
Path to dbNSFP tabix indexed file.
string
To be used with --vep_dbnsfp.
Consequence to annotate with
string
To be used with --vep_dbnsfp.
This parameter is used to limit the output to specific variant consequences.
The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found here: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html
To filter on several consequences, separate them with '&' (e.g. 'consequence=3_prime_UTR_variant&intron_variant').
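Building the '&'-separated consequence list programmatically avoids typos in long term lists. This is a minimal sketch; dbnsfp_consequence is a hypothetical helper that only joins Sequence Ontology terms and does not check them against the ontology. Whether the parameter value needs the 'consequence=' prefix shown in the example is not spelled out, so the prefix is left to the caller.

```python
def dbnsfp_consequence(terms):
    """Join consequence terms with '&', as described above."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one consequence term is required")
    if any("&" in t for t in cleaned):
        raise ValueError("terms must not contain '&' themselves")
    return "&".join(cleaned)


print(dbnsfp_consequence(["3_prime_UTR_variant", "intron_variant"]))
# 3_prime_UTR_variant&intron_variant
```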
Fields to annotate with
string
rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
To be used with --vep_dbnsfp.
This parameter can be used to retrieve individual values from the dbNSFP file. The values correspond to the names of the columns in the dbNSFP file and are separated by commas.
The column names might differ between dbNSFP versions. Please check the Readme.txt file provided with the dbNSFP file to obtain the correct column names. The Readme file also contains a short description of the provided values and the versions of the tools used to generate them.
The default values are explained below:
rs_dbSNP - rs number from dbSNP
HGVSc_VEP - HGVS coding variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_transcriptid
HGVSp_VEP - HGVS protein variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_proteinid
1000Gp3_EAS_AF - Alternative allele frequency in the 1000Gp3 East Asian descendant samples
1000Gp3_AMR_AF - Alternative allele frequency in the 1000Gp3 American descendant samples
LRT_score - Original LRT two-sided p-value (LRTori), ranges from 0 to 1
GERP++_RS - Conservation score. The larger the score, the more conserved the site, ranges from -12.3 to 6.17
gnomAD_exomes_AF - Alternative allele frequency in the whole gnomAD exome samples.
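Because column names differ between dbNSFP versions, it can help to check the requested fields against the header of the actual dbNSFP file before a run. This is a minimal sketch assuming a tab-separated header line that starts with '#'; missing_dbnsfp_fields is a hypothetical helper and the header in the example is shortened for illustration.

```python
def missing_dbnsfp_fields(fields, header_line):
    """Return requested comma-separated fields that are absent from the header."""
    requested = [f.strip() for f in fields.split(",") if f.strip()]
    columns = set(header_line.lstrip("#").rstrip("\n").split("\t"))
    return [f for f in requested if f not in columns]


default_fields = (
    "rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,"
    "LRT_score,GERP++_RS,gnomAD_exomes_AF"
)
# Shortened, illustrative header; read the real one from your dbNSFP file.
header = "#chr\tpos(1-based)\trs_dbSNP\tLRT_score\tGERP++_RS\tgnomAD_exomes_AF\n"
print(missing_dbnsfp_fields(default_fields, header))
# ['HGVSc_VEP', 'HGVSp_VEP', '1000Gp3_EAS_AF', '1000Gp3_AMR_AF']
```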
Path to spliceai raw scores snv file.
string
To be used with --vep_spliceai.
Path to spliceai raw scores snv tabix indexed file.
string
To be used with --vep_spliceai.
Path to spliceai raw scores indel file.
string
To be used with --vep_spliceai.
Path to spliceai raw scores indel tabix indexed file.
string
To be used with --vep_spliceai.
Add an extra custom argument to VEP.
string
--everything --filter_common --per_gene --total_length --offline --format vcf
Use this parameter to add custom arguments to VEP.
Use annotation cache keys for snpeff_cache and vep_cache.
Only applies when using annotation-cache or a similar directory structure.
See the annotation-cache documentation for more information.
boolean
The output directory where the cache will be saved. You must use absolute paths when saving to storage on cloud infrastructure.
string
VEP output-file format.
string
Sets the format of the output-file from VEP. Available formats: json, tab and vcf.
Define parameters that control the stages in the pipeline
Skip the process of base recalibration steps i.e., GATK BaseRecalibrator and GATK ApplyBQSR.
boolean
This parameter disables the base recalibration step, so an uncalibrated BAM file is used for variant calling.
Skip the process of preparing interval lists for the GATK variant calling step
boolean
This parameter disables the preparation of multiple interval lists for use with the GATK HaplotypeCaller module. Disabling this step is not recommended, as it is required to run variant calling correctly.
Skip variant filtering of GATK
boolean
Set this parameter if you don't want to filter any variants.
Skip variant annotation
boolean
Set this parameter if you don't want to run variant annotation.
Skip MultiQC reports
boolean
This parameter disables all QC reports.
Define parameters of the tools used in the pipeline
Number of parts into which the gene interval list is split, to run GATK HaplotypeCaller in parallel
integer
25
Set this parameter to decide the number of splits for the gene interval list file.
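The idea behind the split count can be illustrated with a simple scatter: group consecutive intervals into at most N chunks of roughly equal total length, so that each HaplotypeCaller job gets a similar workload. This is a simplified stand-in, not the GATK tooling the pipeline uses; scatter_intervals is hypothetical, intervals are whole (never subdivided), and end minus start is taken as the interval length.

```python
def scatter_intervals(intervals, n=25):
    """Group (contig, start, end) intervals into at most n contiguous chunks."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = sum(end - start for _, start, end in intervals)
    target = total / n  # ideal number of bases per chunk
    groups, current, current_len = [], [], 0
    for iv in intervals:
        current.append(iv)
        current_len += iv[2] - iv[1]
        # Close the chunk once it reaches the target, keeping room for the last one.
        if current_len >= target and len(groups) < n - 1:
            groups.append(current)
            current, current_len = [], 0
    if current:
        groups.append(current)
    return groups


intervals = [("chr1", 0, 100), ("chr1", 200, 300), ("chr2", 0, 100), ("chr2", 500, 600)]
groups = scatter_intervals(intervals, n=2)
print([len(g) for g in groups])  # [2, 2]
```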
Do not use gene interval file during variant calling
boolean
If set to true, the gene intervals are not used during the variant calling step, so variants are reported from all regions, including non-genic ones. Default: false.
The window size (in bases) in which to evaluate clustered SNPs.
integer
35
This parameter is used by the GATK variant filtering step. It defines the window size (in bases) in which to evaluate clustered SNPs. It has to be used together with the 'cluster' option.
The number of SNPs which make up a cluster. Must be at least 2.
integer
3
This parameter is used by the GATK variant filtering step. It defines the number of SNPs that make up a cluster within a window. Must be at least 2.
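How the window and cluster values interact can be sketched with a sliding window over sorted SNP positions on one contig: a SNP is flagged when it falls in a window of the given size that holds at least the given number of SNPs. This is an illustration of the concept, not GATK's exact implementation; flag_clustered_snps is hypothetical, and treating "within the window" as a position difference strictly below the window size is an assumption.

```python
def flag_clustered_snps(positions, window=35, cluster=3):
    """Return the set of SNP positions (one contig) that belong to a cluster."""
    if cluster < 2:
        raise ValueError("cluster must be at least 2")
    pos = sorted(positions)
    flagged = set()
    j = 0
    for i in range(len(pos)):
        j = max(j, i)
        # Extend j while the next SNP still lies within the window starting at pos[i].
        while j + 1 < len(pos) and pos[j + 1] - pos[i] < window:
            j += 1
        if j - i + 1 >= cluster:
            flagged.update(pos[i:j + 1])
    return flagged


print(sorted(flag_clustered_snps([100, 110, 120, 500])))  # [100, 110, 120]
```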
Value to be used for the FisherStrand (FS) filter
number
30
This parameter defines the value to use for the FisherStrand (FS) filter in the GATK variant-filtering step.
The value should be given as a floating-point number. Default is 30.0
Value to be used for the QualByDepth (QD) filter
number
2
This parameter defines the value to use for the QualByDepth (QD) filter in the GATK variant-filtering step.
The value should be given as a floating-point number. Default is 2.0
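The FS and QD thresholds act as hard filters on each variant's INFO annotations. This sketch assumes the usual GATK hard-filter directions (a variant fails when FS > 30.0 or when QD < 2.0); the exact filter expressions the pipeline passes to GATK are not given here. hard_filter is a hypothetical helper operating on a plain dict of INFO values.

```python
def hard_filter(info, fs_threshold=30.0, qd_threshold=2.0):
    """Return 'PASS' or a ';'-joined list of failed filter names."""
    failed = []
    fs = info.get("FS")
    qd = info.get("QD")
    # High FisherStrand indicates strand bias.
    if fs is not None and fs > fs_threshold:
        failed.append("FS")
    # Low QualByDepth indicates weak support relative to depth.
    if qd is not None and qd < qd_threshold:
        failed.append("QD")
    return ";".join(failed) if failed else "PASS"


print(hard_filter({"FS": 35.2, "QD": 10.0}))  # FS
print(hard_filter({"FS": 1.0, "QD": 12.3}))   # PASS
```

Missing annotations are treated as passing here, which is a design choice of this sketch.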
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
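The size pattern above accepts values such as 25.MB or 1.5 GB. This sketch parses such strings into bytes, assuming 1024-based units as Nextflow's memory units use. The regex is equivalent to the schema pattern, with capture groups added; size_to_bytes is a hypothetical helper.

```python
import re

# Equivalent to ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$, with capture groups.
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\.?\s*([KMGT])?B")
UNITS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(size):
    """Convert a size string matching the schema pattern into bytes."""
    m = SIZE_RE.fullmatch(size)
    if m is None:
        raise ValueError(f"{size!r} does not match the size pattern")
    number, unit = m.groups()
    return int(float(number) * UNITS[unit])


print(size_to_bytes("25.MB"))  # 26214400
```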
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime.
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/modules/data
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/modules/data
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string