Pipeline Inputs

This page documents all input parameters for the pipeline.

Input/output options

`--input`

Type: string | Required | Format: file-path

Path to comma-separated file containing information about the samples in the experiment.

A design file with information about the samples in your experiment. Use this parameter to specify the location of the input files. It has to be a tab or comma-separated file with a header row or a JSON/YAML file. See usage docs.

Pattern: ^\S+\.(csv|tsv|yaml|yml|json)$

`--outdir`

Type: string | Required | Format: directory-path

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

`--tools`

Type: string | Optional

Specify which additional tools RNAvar should use. Values can be 'seq2hla', 'bcfann', 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation.

List of tools to be used in addition to variant calling: currently hlatyping with Seq2HLA and a choice of annotation tools.

Pattern: ^((bcfann|seq2hla|snpeff|vep|merge)*(,)*)*$

`--save_merged_fastq`

Type: boolean | Optional

Save FastQ files after merging re-sequenced libraries in the results directory.

Preprocessing of alignment

`--extract_umi`

Type: boolean | Optional

Specify whether to remove UMIs from the reads with UMI-tools extract.

`--umitools_extract_method`

Type: string | Optional

UMI pattern to use. Can be either 'string' (default) or 'regex'.

More details can be found in the UMI-tools documentation.

Default: string

Allowed values:

string
regex

`--umitools_bc_pattern`

Type: string | Optional

The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI.

More details can be found in the UMI-tools documentation.

Pattern: ^[NXC]*$

`--umitools_bc_pattern2`

Type: string | Optional

The UMI barcode pattern to use if the UMI is located in read 2.

Pattern: ^[NXC]*$

`--umitools_umi_separator`

Type: string | Optional

The character that separates the UMI in the read name. Most likely a colon if you skipped the extraction with UMI-tools and used other software.

Alignment options

`--aligner`

Type: string | Required

Specifies the alignment algorithm to use.

This parameter define which aligner is to be used for aligning the RNA reads to the reference genome.

Default: star

Allowed values:

star

`--star_index`

Type: string | Optional | Format: path

Path to STAR index folder or compressed file (tar.gz)

This parameter can be used if there is an pre-defined STAR index available. You can either give the full path to the index directory or a compressed file in tar.gz format.

`--star_twopass`

Type: boolean | Optional

Enable STAR 2-pass mapping mode.

This parameter enables STAR to perform 2-pass mapping. Default true.

Default: True

`--star_ignore_sjdbgtf`

Type: boolean | Optional

Do not use GTF file during STAR index building step

Do not use parameter --sjdbGTFfile during the STAR genomeGenerate process.

`--star_max_memory_bamsort`

Type: integer | Optional

Option to limit RAM when sorting BAM file. Value to be specified in bytes. If 0, will be set to the genome index size.

This parameter specifies the maximum available RAM (bytes) for sorting BAM during STAR alignment.

Default: 0

`--star_bins_bamsort`

Type: integer | Optional

Specifies the number of genome bins for coordinate-sorting

This parameter specifies the number of bins to be used for coordinate sorting during STAR alignment step.

Default: 50

`--star_max_collapsed_junc`

Type: integer | Optional

Specifies the maximum number of collapsed junctions

Default: 1000000

`--star_max_intron_size`

Type: integer | Optional

Specifies the maximum intron size

This parameter specifies the maximum intron size for STAR alignment

`--seq_center`

Type: string | Optional

Sequencing center information to be added to read group of BAM files.

This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.

`--seq_platform`

Type: string | Required

Specify the sequencing platform used

This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.

Default: illumina

`--save_unaligned`

Type: boolean | Optional

Where possible, save unaligned reads from aligner to the results directory.

This may either be in the form of FastQ or BAM files depending on the options available for that particular tool.

`--save_align_intermeds`

Type: boolean | Optional

Save the intermediate BAM files from the alignment step.

By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set this parameter to also save other intermediate BAM files.

`--bam_csi_index`

Type: boolean | Optional

Create a CSI index for BAM files instead of the traditional BAI index. This will be required for genomes with larger chromosome sizes.

Postprocessing of alignment

`--remove_duplicates`

Type: boolean | Optional

Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step.

Specify true for removing duplicates from BAM file during Picard MarkDuplicates step.

Variant calling

`--gatk_hc_call_conf`

Type: integer | Optional

The minimum phred-scaled confidence threshold at which variants should be called.

Specify the minimum phred-scaled confidence threshold at which variants should be called.

Default: 20

`--generate_gvcf`

Type: boolean | Optional

Enable generation of GVCFs by sample additionnaly to the VCFs.

This parameter enables GATK HAPLOTYPECALLER to generate GVCFs. Default false.

`--gatk_interval_scatter_count`

Type: integer | Optional

Number of times the gene interval list to be split in order to run GATK haplotype caller in parallel

Set this parameter to decide the number of splits for the gene interval list file.

Default: 25

`--no_intervals`

Type: boolean | Optional

Do not use gene interval file during variant calling

This parameter, if set to True, does not use the gene intervals during the variant calling step, which then results in variants from all regions including non-genic. Default is False

Variant filtering

`--gatk_vf_qd_filter`

Type: number | Optional

Value to be used for the QualByDepth (QD) filter

This parameter defines the value to use for the QualByDepth (QD) filter in the GATK variant-filtering step. The value should given in a float number format.

Default: 2

`--gatk_vf_fs_filter`

Type: number | Optional

Value to be used for the FisherStrand (FS) filter

This parameter defines the value to use for the FisherStrand (FS) filter in the GATK variant-filtering step. The value should given in a float number format.

Default: 30

`--gatk_vf_window_size`

Type: integer | Optional

The window size (in bases) in which to evaluate clustered SNPs.

This parameter is used by GATK variant filteration step. It defines the window size (in bases) in which to evaluate clustered SNPs. It has to be used together with the other option 'cluster'.

Default: 35

`--gatk_vf_cluster_size`

Type: integer | Optional

The number of SNPs which make up a cluster. Must be at least 2.

This parameter is used by GATK variant filteration step. It defines the number of SNPs which make up a cluster within a window. Must be at least 2.

Default: 3

Variant Annotation

`--vep_cache`

Type: string | Optional | Format: directory-path

Path to VEP cache.

Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vepgenome}${vep_cache_version}

Default: s3://annotation-cache/vep_cache/

`--snpeff_cache`

Type: string | Optional | Format: directory-path

Path to snpEff cache.

Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}

Default: s3://annotation-cache/snpeff_cache/

`--vep_include_fasta`

Type: boolean | Optional

Allow usage of fasta file for annotation with VEP

By pointing VEP to a FASTA file, it is possible to retrieve reference sequence locally. This enables VEP to retrieve HGVS notations (--hgvs), check the reference sequence given in input data, and construct transcript models from a GFF or GTF file without accessing a database.

For details, see here.

`--vep_dbnsfp`

Type: boolean | Optional

Enable the use of the VEP dbNSFP plugin.

For details, see here.

`--dbnsfp`

Type: string | Optional | Format: file-path

Path to dbNSFP processed file.

To be used with --vep_dbnsfp. dbNSFP files and more information are available at https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp and https://sites.google.com/site/jpopgen/dbNSFP/

Pattern: ^\S+\.gz$

`--dbnsfp_tbi`

Type: string | Optional | Format: file-path

Path to dbNSFP tabix indexed file.

To be used with --vep_dbnsfp.

Pattern: ^\S+\.tbi$

`--dbnsfp_consequence`

Type: string | Optional

Consequence to annotate with

To be used with --vep_dbnsfp. This params is used to filter/limit outputs to a specific effect of the variant. The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found here: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html If one wants to filter using several consequences, then separate those by using '&' (i.e. 'consequence=3_prime_UTR_variant&intron_variant'.

`--dbnsfp_fields`

Type: string | Optional

Fields to annotate with

To be used with --vep_dbnsfp. This params can be used to retrieve individual values from the dbNSFP file. The values correspond to the name of the columns in the dbNSFP file and are separated by comma. The column names might differ between the different dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file contains also a short description of the provided values and the version of the tools used to generate them.

Default value are explained below:

rs_dbSNP - rs number from dbSNP HGVSc_VEP - HGVS coding variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_transcriptid HGVSp_VEP - HGVS protein variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_proteinid 1000Gp3_EAS_AF - Alternative allele frequency in the 1000Gp3 East Asian descendent samples 1000Gp3_AMR_AF - Alternative allele counts in the 1000Gp3 American descendent samples LRT_score - Original LRT two-sided p-value (LRTori), ranges from 0 to 1 GERP++_RS - Conservation score. The larger the score, the more conserved the site, ranges from -12.3 to 6.17 gnomAD_exomes_AF - Alternative allele frequency in the whole gnomAD exome samples.

Default: rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF

`--vep_loftee`

Type: boolean | Optional

Enable the use of the VEP LOFTEE plugin.

For details, see here.

`--vep_spliceai`

Type: boolean | Optional

Enable the use of the VEP SpliceAI plugin.

For details, see here.

`--spliceai_snv`

Type: string | Optional | Format: file-path

Path to spliceai raw scores snv file.

To be used with --vep_spliceai.

Pattern: ^\S+\.vcf\.gz$

`--spliceai_snv_tbi`

Type: string | Optional | Format: file-path

Path to spliceai raw scores snv tabix indexed file.

To be used with --vep_spliceai.

Pattern: ^\S+\.tbi$

`--spliceai_indel`

Type: string | Optional | Format: file-path

Path to spliceai raw scores indel file.

To be used with --vep_spliceai.

Pattern: ^\S+\.vcf\.gz$

`--spliceai_indel_tbi`

Type: string | Optional | Format: file-path

Path to spliceai raw scores indel tabix indexed file.

To be used with --vep_spliceai.

Pattern: ^\S+\.tbi$

`--vep_spliceregion`

Type: boolean | Optional

Enable the use of the VEP SpliceRegion plugin.

For details, see here and here.

`--vep_custom_args`

Type: string | Optional

Add an extra custom argument to VEP.

Using this parameter, you can add custom args to VEP.

Default: --everything --filter_common --per_gene --total_length --offline --format vcf

`--outdir_cache`

Type: string | Optional | Format: directory-path

The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.

`--vep_out_format`

Type: string | Optional

VEP output-file format.

Sets the format of the output-file from VEP.

Default: vcf

Allowed values:

json
tab
vcf

`--bcftools_annotations`

Type: string | Optional | Format: file-path

A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.

Pattern: ^\S+\.vcf\.gz$

`--bcftools_annotations_tbi`

Type: string | Optional | Format: file-path

Index file for bcftools_annotations

Pattern: ^\S+\.vcf\.gz\.tbi$

`--bcftools_columns`

Type: string | Optional

Optional text file with list of columns to use from bcftools_annotations, one name per row

`--bcftools_header_lines`

Type: string | Optional

Text file with the header lines of bcftools_annotations

Pipeline stage options

`--skip_baserecalibration`

Type: boolean | Optional

Skip the process of base recalibration steps i.e., GATK BaseRecalibrator and GATK ApplyBQSR.

This parameter disable the base recalibration step, thus using a un-calibrated BAM file for variant calling.

`--skip_intervallisttools`

Type: boolean | Optional

Skip the process of preparing interval lists for the GATK variant calling step

This parameter disable preparing multiple interval lists to use with HaplotypeCaller module of GATK. It is recommended not to disable the step as it is required to run the variant calling correctly.

`--skip_variantfiltration`

Type: boolean | Optional

Skip variant filtering of GATK

Set this parameter if you don't want to filter any variants.

`--skip_variantannotation`

Type: boolean | Optional

Skip variant annotation

Set this parameter if you don't want to run variant annotation.

`--skip_multiqc`

Type: boolean | Optional

Skip MultiQC reports

This parameter disable all QC reports

`--skip_exon_bed_check`

Type: boolean | Optional

Skip the check of the exon bed

Set this parameter if you don't want to the pipeline to check and filter unknown regions in the exon bed file.

General reference genome options

`--igenomes_base`

Type: string | Optional | Format: directory-path

The base path to the igenomes reference files

Default: s3://ngi-igenomes/igenomes/

`--igenomes_ignore`

Type: boolean | Optional

Do not load the iGenomes reference config.

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config. NB You can then run Sarek by specifying at least a FASTA genome file

`--save_reference`

Type: boolean | Optional

Save built references.

Set this parameter, if you wish to save all computed reference files. This is useful to avoid re-computation on future runs.

`--download_cache`

Type: boolean | Optional

Download annotation cache.

Set this parameter, if you wish to download annotation cache. Using this parameter will download cache even if --snpeff_cache and --vep_cache are provided.

Reference genome options

`--genome`

Type: string | Optional

Name of iGenomes reference.

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Default: GRCh38

`--fasta`

Type: string | Optional | Format: file-path

Path to FASTA genome file.

This parameter is mandatory if --genome is not specified.

If you use AWS iGenomes, this has already been set for you appropriately.

Pattern: ^\S+\.fn?a(sta)?(\.gz)?$

`--dict`

Type: string | Optional | Format: file-path

Path to FASTA dictionary file.

NB If none provided, will be generated automatically from the FASTA reference. Combine with --save_reference to save for future runs.

If you use AWS iGenomes, this has already been set for you appropriately.

Pattern: ^\S+\.dict$

`--fasta_fai`

Type: string | Optional | Format: file-path

Path to FASTA reference index.

NB If none provided, will be generated automatically from the FASTA reference. Combine with --save_reference to save for future runs.

If you use AWS iGenomes, this has already been set for you appropriately.

`--gtf`

Type: string | Optional | Format: file-path

Path to GTF annotation file.

This parameter is mandatory if --genome is not specified.

Pattern: ^\S+\.gtf$

`--gff`

Type: string | Optional | Format: file-path

Path to GFF3 annotation file.

This parameter must be specified if --genome or --gtf are not specified.

Pattern: ^\S+\.gff\d?$

`--exon_bed`

Type: string | Optional | Format: file-path

Path to BED file containing exon intervals. This will be created from the GTF file if not specified.

Pattern: ^\S+\.bed$

`--read_length`

Type: number | Optional

Read length

Specify the read length for the STAR aligner.

Default: 150

`--known_indels`

Type: string | Optional | Format: file-path-pattern

Path to known indels file.

If you use AWS iGenomes, this has already been set for you appropriately.

`--known_indels_tbi`

Type: string | Optional | Format: file-path-pattern

Path to known indels file index.

NB If none provided, will be generated automatically from the known index file, if provided. Combine with --save_reference to save for future runs.

If you use AWS iGenomes, this has already been set for you appropriately.

`--dbsnp`

Type: string | Optional | Format: file-path

Path to dbsnp file.

If you use AWS iGenomes, this has already been set for you appropriately.

Pattern: ^\S+\.vcf\.gz$

`--dbsnp_tbi`

Type: string | Optional | Format: file-path

Path to dbsnp index.

NB If none provided, will be generated automatically from the dbsnp file. Combine with --save_reference to save for future runs.

If you use AWS iGenomes, this has already been set for you appropriately.

Pattern: ^\S+\.vcf\.gz\.tbi$

`--snpeff_db`

Type: string | Optional

snpEff DB version.

This is used to specify the database to be use to annotate with. Alternatively databases' names can be listed with the snpEff databases.

If you use AWS iGenomes, this has already been set for you appropriately.

`--vep_genome`

Type: string | Optional

VEP genome.

This is used to specify the genome when looking for local cache, or cloud based cache.

If you use AWS iGenomes, this has already been set for you appropriately.

`--vep_species`

Type: string | Optional

VEP species.

Alternatively species listed in Ensembl Genomes caches can be used.

If you use AWS iGenomes, this has already been set for you appropriately.

`--vep_cache_version`

Type: integer | Optional

VEP cache version.

Alternative cache version can be used to specify the correct Ensembl Genomes version number as these differ from the concurrent Ensembl/VEP version numbers.

If you use AWS iGenomes, this has already been set for you appropriately.

`--feature_type`

Type: string | Optional

Type of feature to parse from annotation file

Default: exon

Allowed values:

exon
transcript
gene

Institutional config options

`--custom_config_version`

Type: string | Optional

Git commit id for Institutional configs.

Default: master

`--custom_config_base`

Type: string | Optional

Base directory for Institutional configs.

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Default: https://raw.githubusercontent.com/nf-core/configs/master

`--config_profile_name`

Type: string | Optional

Institutional config name.

`--config_profile_description`

Type: string | Optional

Institutional config description.

`--config_profile_contact`

Type: string | Optional

Institutional config contact information.

`--config_profile_url`

Type: string | Optional

Institutional config URL link.

Generic options

`--version`

Type: boolean | Optional

Display version and exit.

`--publish_dir_mode`

Type: string | Optional

Method used to save pipeline results to output directory.

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Default: copy

Allowed values:

symlink
rellink
link
copy
copyNoFollow
move

`--email`

Type: string | Optional

Email address for completion summary.

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.

Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

`--email_on_fail`

Type: string | Optional

Email address for completion summary, only when pipeline fails.

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

`--plaintext_email`

Type: boolean | Optional

Send plain-text email instead of HTML.

`--max_multiqc_email_size`

Type: string | Optional

File size limit when attaching MultiQC reports to summary emails.

Default: 25.MB

Pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

`--monochrome_logs`

Type: boolean | Optional

Do not use coloured log outputs.

`--hook_url`

Type: string | Optional

Incoming hook URL for messaging service

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

`--multiqc_config`

Type: string | Optional | Format: file-path

Custom config file to supply to MultiQC.

`--multiqc_logo`

Type: string | Optional

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

`--multiqc_methods_description`

Type: string | Optional

Custom MultiQC yaml file containing HTML including a methods description.

`--multiqc_title`

Type: string | Optional

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

`--validate_params`

Type: boolean | Optional

Boolean whether to validate parameters against the schema at runtime

Default: True

`--modules_testdata_base_path`

Type: string | Optional

Base URL or local path to location of pipeline test dataset files

Default: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/

`--pipelines_testdata_base_path`

Type: string | Optional

Base URL or local path to location of pipeline test dataset files

Default: https://raw.githubusercontent.com/nf-core/test-datasets/rnavar/data/

`--trace_report_suffix`

Type: string | Optional

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

`--help`

Type: boolean | Optional

Display the help message.

`--help_full`

Type: boolean | Optional

Display the full detailed help message.

`--show_hidden`

Type: boolean | Optional

Display hidden parameters in the help message (only works when --help or --help_full are provided).

This pipeline was built with Nextflow. Documentation generated by nf-docs v0.1.0 on 2026-01-23 17:23:12 UTC.