nf-core/rnavar
GATK4 RNA variant calling pipeline
Introduction¶
nf-core/rnavar is a bioinformatics pipeline for RNA variant calling analysis following GATK4 best practices.
Pipeline summary¶
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- (Optionally) Extract UMIs from FASTQ reads (UMI-tools)
- (Optionally) HLA typing from FASTQ reads (Seq2HLA)
- Align reads to reference genome (STAR)
- Sort and index alignments (SAMtools)
- Duplicate read marking (Picard MarkDuplicates)
- Scatter one interval list into many interval files (GATK4 IntervalListTools)
- Split reads that contain Ns in their CIGAR string (GATK4 SplitNCigarReads)
- Estimate and correct systematic bias using base quality score recalibration (GATK4 BaseRecalibrator, GATK4 ApplyBQSR)
- Convert a BED file to a Picard interval list (GATK4 BedToIntervalList)
- Call SNPs and indels (GATK4 HaplotypeCaller)
- Merge multiple VCF files into one VCF (GATK4 MergeVCFs)
- Index the VCF (Tabix)
- Filter variant calls based on certain criteria (GATK4 VariantFiltration)
- Annotate variants (BCFtools annotate, snpEff, Ensembl VEP)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
Summary of tools and versions used in the pipeline¶
| Tool | Version |
|---|---|
| BCFTools | 1.22 |
| BEDTools | 2.31.1 |
| cat | 9.5 |
| EnsemblVEP | 115.2 |
| FastQC | 0.12.1 |
| GATK | 4.6.2.0 |
| GffRead | 0.12.7 |
| HTSlib | 1.21 |
| Mosdepth | 0.3.10 |
| MultiQC | 1.33 |
| Picard | 3.4.0 |
| SAMtools | 1.22.1 |
| Seq2HLA | 2.3 |
| SnpEff | 5.3.0a |
| STAR | 2.7.11b |
| Tabix | 1.21 |
| UMI-tools | 1.1.6 |
Usage¶
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents a fastq file (single-end) or a pair of fastq files (paired-end).
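For example, a samplesheet mixing paired-end and single-end samples might look as follows (the sample names and file paths here are illustrative only; for single-end data the usual nf-core convention is to leave the fastq_2 column empty, but check the usage docs for this pipeline):

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
```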
Now, you can run the pipeline using:
nextflow run nf-core/rnavar -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
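As a sketch of the -params-file pattern, parameters can be collected in a YAML file instead of being passed on the CLI (the parameter names below appear on this page; the exact set you need depends on your run):

```yaml
# params.yaml — used as: nextflow run nf-core/rnavar -profile docker -params-file params.yaml
input: "samplesheet.csv"
outdir: "./results"
genome: "GRCh38"
tools: "vep"
```

Note that -params-file is a Nextflow option (single dash), while the values inside it are pipeline parameters (double dash on the CLI).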
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Pipeline output¶
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits¶
rnavar was originally written by Praveen Raj and Maxime U Garcia at The Swedish Childhood Tumor Biobank (Barntumörbanken), Karolinska Institutet. Nicolas Vannieuwkerke at CMGG later joined and helped with further development (1.1.0 and forward).
Maintenance is now led by Maxime U Garcia (previously at Seqera, now at NGI).
Main developers:
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support¶
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #rnavar channel (you can join with this invite).
Citations¶
If you use nf-core/rnavar for your analysis, please cite it using the following doi: 10.5281/zenodo.6669636
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Pipeline Inputs
This page documents all input parameters for the pipeline.
Input/output options ¶
Path to comma-separated file containing information about the samples in the experiment.
A design file with information about the samples in your experiment. Use this parameter to specify the location of the input files. It has to be a tab or comma-separated file with a header row or a JSON/YAML file. See usage docs.
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Specify which additional tools RNAvar should use. Values can be 'seq2hla', 'bcfann', 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation.
List of tools to be used in addition to variant calling: currently HLA typing with Seq2HLA and a choice of annotation tools.
Save FastQ files after merging re-sequenced libraries in the results directory.
Preprocessing of alignment ¶
Specify whether to remove UMIs from the reads with UMI-tools extract.
UMI pattern to use. Can be either 'string' (default) or 'regex'.
More details can be found in the UMI-tools documentation.
Default:
string
Allowed values:
string, regex
The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI.
More details can be found in the UMI-tools documentation.
The UMI barcode pattern to use if the UMI is located in read 2.
The character that separates the UMI in the read name. Most likely a colon if you skipped the extraction with UMI-tools and used other software.
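To illustrate what a 'string' UMI pattern such as 'NNNNNN' means, here is a simplified sketch (an illustration only, not UMI-tools' implementation; real extraction also appends the UMI to the read name after the separator and handles cell-barcode characters in the pattern):

```python
def extract_umi(seq, qual, pattern="NNNNNN"):
    """Simplified sketch of UMI-tools 'string' extraction: the leading
    'N' positions of the pattern mark UMI bases, which are cut from the
    start of the read (and its quality string)."""
    n = len(pattern)  # 'NNNNNN' -> the first 6 bases are the UMI
    umi = seq[:n]
    return umi, seq[n:], qual[n:]

umi, trimmed_seq, trimmed_qual = extract_umi("ACGTACTTGGCCAA", "IIIIIIIIIIIIII")
# umi == "ACGTAC"; the remaining read is "TTGGCCAA"
```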
Alignment options ¶
Specifies the alignment algorithm to use.
This parameter defines which aligner is to be used for aligning the RNA reads to the reference genome.
Default:
star
Allowed values:
star
Path to STAR index folder or compressed file (tar.gz)
This parameter can be used if a pre-built STAR index is available. You can either give the full path to the index directory or to a compressed file in tar.gz format.
Enable STAR 2-pass mapping mode.
This parameter enables STAR to perform 2-pass mapping. Default true.
Default:
True
Do not use the GTF file during the STAR index building step.
Disables the --sjdbGTFfile parameter.
Option to limit RAM when sorting BAM file. Value to be specified in bytes. If 0, will be set to the genome index size.
This parameter specifies the maximum available RAM (bytes) for sorting BAM during STAR alignment.
Default:
0
Specifies the number of genome bins for coordinate-sorting
This parameter specifies the number of bins to be used for coordinate sorting during STAR alignment step.
Default:
50
Specifies the maximum number of collapsed junctions
Default:
1000000
Specifies the maximum intron size
This parameter specifies the maximum intron size for STAR alignment
Sequencing center information to be added to read group of BAM files.
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Specify the sequencing platform used
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Default:
illumina
Where possible, save unaligned reads from aligner to the results directory.
This may either be in the form of FastQ or BAM files depending on the options available for that particular tool.
Save the intermediate BAM files from the alignment step.
By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set this parameter to also save other intermediate BAM files.
Create a CSI index for BAM files instead of the traditional BAI index. This will be required for genomes with larger chromosome sizes.
Postprocessing of alignment ¶
Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step.
Set to true to remove duplicates from the BAM file during the Picard MarkDuplicates step.
Variant calling ¶
The minimum phred-scaled confidence threshold at which variants should be called.
Specify the minimum phred-scaled confidence threshold at which variants should be called.
Default:
20
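The phred scale relates this threshold to an error probability; as a quick reference (this is the standard phred definition, not pipeline code):

```python
def phred_to_error_prob(q):
    """Convert a phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)

# The default threshold of 20 corresponds to a 1-in-100 chance the call
# is wrong, i.e. 99% confidence.
print(phred_to_error_prob(20))  # → 0.01
```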
Enable generation of per-sample GVCFs in addition to the VCFs.
This parameter enables GATK HaplotypeCaller to generate GVCFs. Default false.
Number of pieces the gene interval list is split into, so that GATK HaplotypeCaller can run in parallel
Set this parameter to decide the number of splits for the gene interval list file.
Default:
25
Do not use gene interval file during variant calling
If set to true, the gene intervals are not used during the variant calling step, so variants are called from all regions, including non-genic ones. Default is false.
Variant filtering ¶
Value to be used for the QualByDepth (QD) filter
This parameter defines the value to use for the QualByDepth (QD) filter in the GATK variant-filtering step. The value should be given as a float.
Default:
2
Value to be used for the FisherStrand (FS) filter
This parameter defines the value to use for the FisherStrand (FS) filter in the GATK variant-filtering step. The value should be given as a float.
Default:
30
The window size (in bases) in which to evaluate clustered SNPs.
This parameter is used by the GATK variant filtration step. It defines the window size (in bases) in which to evaluate clustered SNPs. It has to be used together with the 'cluster' option.
Default:
35
The number of SNPs which make up a cluster. Must be at least 2.
This parameter is used by the GATK variant filtration step. It defines the number of SNPs which make up a cluster within a window. Must be at least 2.
Default:
3
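To illustrate how the window and cluster options interact, here is a simplified sketch of the clustering rule (an illustration only, not GATK's implementation; GATK's exact boundary handling may differ):

```python
def clustered_snps(positions, window_size=35, cluster_size=3):
    """Flag SNPs where `cluster_size` or more calls fall within a
    `window_size`-base window (simplified sketch of the GATK
    VariantFiltration cluster filter)."""
    positions = sorted(positions)
    flagged = set()
    for i in range(len(positions) - cluster_size + 1):
        # span of cluster_size consecutive SNPs, inclusive of both ends
        if positions[i + cluster_size - 1] - positions[i] + 1 <= window_size:
            flagged.update(positions[i:i + cluster_size])
    return sorted(flagged)

# Three SNPs within 21 bases form a cluster; the isolated SNP at 500 does not.
print(clustered_snps([100, 110, 120, 500]))  # → [100, 110, 120]
```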
Variant Annotation ¶
Path to VEP cache.
Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}
Default:
s3://annotation-cache/vep_cache/
Path to snpEff cache.
Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}
Default:
s3://annotation-cache/snpeff_cache/
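Putting the two path conventions together, a local cache directory might look like this (the species, genome, and version values are examples only; substitute the ones matching your genome build):

```
vep_cache/
└── homo_sapiens/
    └── 113_GRCh38/        # ${vep_species}/${vep_genome}_${vep_cache_version}
snpeff_cache/
└── GRCh38.105/            # ${snpeff_species}.${snpeff_version}
```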
Allow usage of fasta file for annotation with VEP
By pointing VEP to a FASTA file, it is possible to retrieve reference sequence locally. This enables VEP to retrieve HGVS notations (--hgvs), check the reference sequence given in input data, and construct transcript models from a GFF or GTF file without accessing a database.
For details, see here.
Path to dbNSFP processed file.
To be used with --vep_dbnsfp.
dbNSFP files and more information are available at https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp and https://sites.google.com/site/jpopgen/dbNSFP/
Path to dbNSFP tabix indexed file.
To be used with --vep_dbnsfp.
Consequence to annotate with
To be used with --vep_dbnsfp.
This parameter is used to filter/limit outputs to a specific effect of the variant.
The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found here: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html
If one wants to filter using several consequences, then separate those by using '&' (e.g. 'consequence=3_prime_UTR_variant&intron_variant').
Fields to annotate with
To be used with --vep_dbnsfp.
This parameter can be used to retrieve individual values from the dbNSFP file. The values correspond to the names of the columns in the dbNSFP file and are separated by commas.
The column names might differ between the different dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file also contains a short description of the provided values and the versions of the tools used to generate them.
The default values are explained below:
- rs_dbSNP - rs number from dbSNP
- HGVSc_VEP - HGVS coding variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_transcriptid
- HGVSp_VEP - HGVS protein variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_proteinid
- 1000Gp3_EAS_AF - Alternative allele frequency in the 1000Gp3 East Asian descendent samples
- 1000Gp3_AMR_AF - Alternative allele frequency in the 1000Gp3 American descendent samples
- LRT_score - Original LRT two-sided p-value (LRTori), ranges from 0 to 1
- GERP++_RS - Conservation score. The larger the score, the more conserved the site; ranges from -12.3 to 6.17
- gnomAD_exomes_AF - Alternative allele frequency in the whole gnomAD exome samples.
Default:
rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Path to spliceai raw scores snv file.
To be used with --vep_spliceai.
Path to spliceai raw scores snv tabix indexed file.
To be used with --vep_spliceai.
Path to spliceai raw scores indel file.
To be used with --vep_spliceai.
Path to spliceai raw scores indel tabix indexed file.
To be used with --vep_spliceai.
Add an extra custom argument to VEP.
Using this parameter, you can add custom args to VEP.
Default:
--everything --filter_common --per_gene --total_length --offline --format vcf
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
VEP output-file format.
Sets the format of the output-file from VEP.
Default:
vcf
Allowed values:
json, tab, vcf
A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.
Optional text file with list of columns to use from bcftools_annotations, one name per row
Pipeline stage options ¶
Skip the base recalibration steps, i.e. GATK BaseRecalibrator and GATK ApplyBQSR.
This parameter disables the base recalibration step, so an uncalibrated BAM file is used for variant calling.
Skip the process of preparing interval lists for the GATK variant calling step
This parameter disables the preparation of multiple interval lists to use with the HaplotypeCaller module of GATK. It is recommended not to disable this step, as it is required to run variant calling correctly.
Skip variant filtering of GATK
Set this parameter if you don't want to filter any variants.
Skip variant annotation
Set this parameter if you don't want to run variant annotation.
Skip the check of the exon bed
Set this parameter if you don't want the pipeline to check and filter out unknown regions in the exon BED file.
General reference genome options ¶
The base path to the igenomes reference files
Default:
s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config. NB: You can then run the pipeline by specifying at least a FASTA genome file.
Save built references.
Set this parameter if you wish to save all computed reference files. This is useful to avoid re-computation in future runs.
Download annotation cache.
Set this parameter if you wish to download the annotation cache. Using this parameter will download the cache even if --snpeff_cache and --vep_cache are provided.
Reference genome options ¶
Name of iGenomes reference.
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.
See the nf-core website docs for more details.
Default:
GRCh38
Path to FASTA genome file.
This parameter is mandatory if --genome is not specified.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA dictionary file.
NB: If none is provided, it will be generated automatically from the FASTA reference. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA reference index.
NB: If none is provided, it will be generated automatically from the FASTA reference. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to GTF annotation file.
This parameter is mandatory if --genome is not specified.
Path to GFF3 annotation file.
This parameter must be specified if --genome or --gtf are not specified.
Path to BED file containing exon intervals. This will be created from the GTF file if not specified.
Read length
Specify the read length for the STAR aligner.
Default:
150
Path to known indels file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to known indels file index.
NB: If none is provided, it will be generated automatically from the known indels file, if provided. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dbsnp file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dbsnp index.
NB: If none is provided, it will be generated automatically from the dbsnp file. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
snpEff DB version.
This is used to specify the database to use for annotation.
Alternatively, available database names can be listed with the snpEff databases command.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP genome.
This is used to specify the genome when looking for local cache, or cloud based cache.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP species.
Alternatively species listed in Ensembl Genomes caches can be used.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP cache version.
An alternative cache version can be used to specify the correct Ensembl Genomes version number, as these differ from the corresponding Ensembl/VEP version numbers.
If you use AWS iGenomes, this has already been set for you appropriately.
Type of feature to parse from annotation file
Default:
exon
Allowed values:
exon, transcript, gene
Institutional config options ¶
Base directory for Institutional configs.
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Default:
https://raw.githubusercontent.com/nf-core/configs/master
Generic options ¶
Method used to save pipeline results to output directory.
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Default:
copy
Allowed values:
symlink, rellink, link, copy, copyNoFollow, move
Email address for completion summary.
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
Email address for completion summary, only when pipeline fails.
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
File size limit when attaching MultiQC reports to summary emails.
Default:
25.MB
Incoming hook URL for messaging service
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
Custom MultiQC yaml file containing HTML including a methods description.
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Boolean whether to validate parameters against the schema at runtime
Default:
True
Base URL or local path to location of pipeline test dataset files
Default:
https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/
Base URL or local path to location of pipeline test dataset files
Default:
https://raw.githubusercontent.com/nf-core/test-datasets/rnavar/data/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
Display hidden parameters in the help message (only works when --help or --help_full are provided).
Workflows
This page documents all workflows in the pipeline.
subworkflows/local/annotation_cache_initialisation/main.nf:11

Inputs (take)
| Name | Description |
|---|---|
| snpeff_enabled | - |
| snpeff_cache | - |
| snpeff_db | - |
| vep_enabled | - |
| vep_cache | - |
| vep_species | - |
| vep_cache_version | - |
| vep_genome | - |
| vep_custom_args | - |
| help_message | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/nf-core/bam_markduplicates_picard/main.nf:9

Picard MarkDuplicates, index BAM file and run samtools stats, flagstat and idxstats
Components
picard/markduplicates
samtools/index
samtools/stats
samtools/idxstats
samtools/flagstat
bam_stats_samtools
Inputs (take)
| Name | Description |
|---|---|
| ch_reads | Sequence reads in BAM/CRAM/SAM format. Structure: [ val(meta), path(reads) ] |
| ch_fasta | Reference genome fasta file required for CRAM input. Structure: [ path(fasta) ] |
| ch_fai | Index of the reference genome fasta file. Structure: [ path(fai) ] |
Outputs (emit)
| Name | Description |
|---|---|
| bam | Processed BAM/SAM file. Structure: [ val(meta), path(bam) ] |
| bai | BAM/SAM samtools index. Structure: [ val(meta), path(bai) ] |
| cram | Processed CRAM file. Structure: [ val(meta), path(cram) ] |
| crai | CRAM samtools index. Structure: [ val(meta), path(crai) ] |
| csi | CSI samtools index. Structure: [ val(meta), path(csi) ] |
| stats | File containing samtools stats output. Structure: [ val(meta), path(stats) ] |
| flagstat | File containing samtools flagstat output. Structure: [ val(meta), path(flagstat) ] |
| idxstats | File containing samtools idxstats output. Structure: [ val(meta), path(idxstats) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
subworkflows/nf-core/bam_sort_stats_samtools/main.nf:9

Sort SAM/BAM/CRAM file
Components
samtools/sort
samtools/index
samtools/stats
samtools/idxstats
samtools/flagstat
bam_stats_samtools
Inputs (take)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] |
| bam | BAM/CRAM/SAM file |
| fasta | Reference genome fasta file |
Outputs (emit)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] |
| bam | Sorted BAM/CRAM/SAM file |
| bai | BAM/CRAM/SAM index file |
| crai | BAM/CRAM/SAM index file |
| stats | File containing samtools stats output |
| flagstat | File containing samtools flagstat output |
| idxstats | File containing samtools idxstats output |
| versions | File containing software versions |
subworkflows/nf-core/bam_stats_samtools/main.nf:9

Produces comprehensive statistics from SAM/BAM/CRAM file
Components
samtools/stats
samtools/idxstats
samtools/flagstat
Inputs (take)
| Name | Description |
|---|---|
| ch_bam_bai | The input channel containing the BAM/CRAM and its index. Structure: [ val(meta), path(bam), path(bai) ] |
| ch_fasta | Reference genome fasta file. Structure: [ path(fasta) ] |
Outputs (emit)
| Name | Description |
|---|---|
| stats | File containing samtools stats output. Structure: [ val(meta), path(stats) ] |
| flagstat | File containing samtools flagstat output. Structure: [ val(meta), path(flagstat) ] |
| idxstats | File containing samtools idxstats output. Structure: [ val(meta), path(idxstats) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
subworkflows/local/download_cache_snpeff_vep/main.nf:14

Inputs (take)
| Name | Description |
|---|---|
| ensemblvep_info | - |
| snpeff_info | - |
Outputs (emit)
| Name | Description |
|---|---|
| ensemblvep_cache | - |
| snpeff_cache | - |
subworkflows/nf-core/fastq_align_star/main.nf:6

Align reads to a reference genome using STAR, then sort with samtools
Components
star/align
samtools/sort
samtools/index
samtools/stats
samtools/idxstats
samtools/flagstat
bam_sort_stats_samtools
Inputs (take)
| Name | Description |
|---|---|
| ch_reads | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. Structure: [ val(meta), [ path(reads) ] ] |
| ch_index | STAR genome index |
| ch_gtf | GTF file used to set the splice junctions with the --sjdbGTFfile flag |
| val_star_ignore_sjdbgtf | If true, the --sjdbGTFfile flag is not set |
| val_seq_platform | Sequencing platform to be added to the BAM header using the --outSAMattrRGline flag |
| val_seq_center | Sequencing center to be added to the BAM header using the --outSAMattrRGline flag |
| ch_fasta | Reference genome fasta file |
| ch_transcripts_fasta | Optional transcriptome fasta file |
Outputs (emit)
| Name | Description |
|---|---|
| orig_bam | Output BAM file containing read alignments. Structure: [ val(meta), path(bam) ] |
| log_final | STAR final log file. Structure: [ val(meta), path(log_final) ] |
| log_out | STAR log out file. Structure: [ val(meta), path(log_out) ] |
| log_progress | STAR log progress file. Structure: [ val(meta), path(log_progress) ] |
| bam_sorted | Sorted BAM file of read alignments (optional). Structure: [ val(meta), path(bam) ] |
| orig_bam_transcript | Output BAM file of transcriptome alignment (optional). Structure: [ val(meta), path(bam) ] |
| fastq | Unmapped FastQ files (optional). Structure: [ val(meta), path(fastq) ] |
| tab | STAR output tab file(s) (optional). Structure: [ val(meta), path(tab) ] |
| bam | BAM file ordered by samtools. Structure: [ val(meta), path(bam) ] |
| bai | BAI index of the ordered BAM file. Structure: [ val(meta), path(bai) ] |
| stats | File containing samtools stats output. Structure: [ val(meta), path(stats) ] |
| flagstat | File containing samtools flagstat output. Structure: [ val(meta), path(flagstat) ] |
| idxstats | File containing samtools idxstats output. Structure: [ val(meta), path(idxstats) ] |
| bam_transcript | Transcriptome-level BAM file ordered by samtools (optional). Structure: [ val(meta), path(bam) ] |
| bai_transcript | Transcriptome-level BAI index of the ordered BAM file (optional). Structure: [ val(meta), path(bai) ] |
| stats_transcript | Transcriptome-level file containing samtools stats output (optional). Structure: [ val(meta), path(stats) ] |
| flagstat_transcript | Transcriptome-level file containing samtools flagstat output (optional). Structure: [ val(meta), path(flagstat) ] |
| idxstats_transcript | Transcriptome-level file containing samtools idxstats output (optional). Structure: [ val(meta), path(idxstats) ] |
| versions | File containing software versions |
main.nf:63

Inputs (take)
| Name | Description |
|---|---|
| samplesheet | - |
| align | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:198

Handle pipeline completion tasks. Executes cleanup and notification tasks when the pipeline finishes:
- Send completion email with run summary
- Generate completion summary to stdout
- Send notifications to messaging platforms (Slack, Teams, etc.)
- Log error messages for failed runs
Inputs (take)
| Name | Description |
|---|---|
| email | - |
| email_on_fail | - |
| plaintext_email | - |
| outdir | - |
| monochrome_logs | - |
| hook_url | - |
| multiqc_report | - |
Outputs (emit)
| Name | Description |
|---|---|
| | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:51

Initialize the nf-core/rnavar pipeline. Performs all setup tasks required before running the main workflow:
- Display version information if requested
- Validate parameters against the schema
- Check Conda channel configuration
- Parse and validate the input samplesheet
- Generate parameter summary for logging
Inputs (take)
| Name | Description |
|---|---|
| version | - |
| validate_params | - |
| nextflow_cli_args | - |
| outdir | - |
| input | - |
| help | - |
| help_full | - |
| show_hidden | - |
Outputs (emit)
| Name | Description |
|---|---|
| samplesheet | - |
| align | - |
| versions | - |
subworkflows/local/prepare_alignment/main.nf:7

Inputs (take)
| Name | Description |
|---|---|
| cram | - |
| bam | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam | - |
| versions | - |
subworkflows/local/prepare_genome/main.nf:22

Inputs (take)
| Name | Description |
|---|---|
| bcftools_annotations | - |
| bcftools_annotations_tbi | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dict | - |
| exon_bed | - |
| fasta | - |
| fasta_fai | - |
| gff | - |
| gtf | - |
| known_indels | - |
| known_indels_tbi | - |
| star_index | - |
| feature_type | - |
| skip_exon_bed_check | - |
| align | - |
Outputs (emit)
| Name | Description |
|---|---|
| bcfann | - |
| bcfann_tbi | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dict | - |
| exon_bed | - |
| fasta | - |
| fasta_fai | - |
| gtf | - |
| known_indels | - |
| known_indels_tbi | - |
| known_sites | - |
| known_sites_tbi | - |
| star_index | - |
| versions | - |
subworkflows/local/recalibrate/main.nf:27

Apply base quality score recalibration (BQSR) to BAM files. This subworkflow applies the BQSR model generated by GATK BaseRecalibrator to adjust base quality scores in BAM files. Recalibrated quality scores improve the accuracy of variant calling by correcting systematic errors in the original quality scores assigned by the sequencing machine. Optionally generates alignment statistics using samtools stats for QC.
Inputs (take)
| Name | Description |
|---|---|
| skip_samtools | - |
| bam | - |
| dict | - |
| fai | - |
| fasta | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam | - |
| qc | - |
| versions | - |
workflows/rnavar.nf:83

Main workflow for RNA variant calling analysis. This workflow performs end-to-end RNA-seq variant calling including:
- Quality control with FastQC
- Read alignment with STAR
- Duplicate marking with Picard
- Split N CIGAR reads for RNA-seq data
- Base quality score recalibration (BQSR)
- Variant calling with GATK HaplotypeCaller
- Variant filtering
- Variant annotation with SnpEff and VEP
- HLA typing with seq2HLA (optional)
The workflow supports multiple input types including FASTQ, BAM, CRAM, and VCF files.
Inputs (take)
| Name | Description |
|---|---|
| input | - |
| bcftools_annotations | - |
| bcftools_annotations_tbi | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dict | - |
| exon_bed | - |
| fasta | - |
| fasta_fai | - |
| gtf | - |
| known_sites | - |
| known_sites_tbi | - |
| star_index | - |
| snpeff_cache | - |
| snpeff_db | - |
| vep_genome | - |
| vep_species | - |
| vep_cache_version | - |
| vep_include_fasta | - |
| vep_cache | - |
| vep_extra_files | - |
| seq_center | - |
| seq_platform | - |
| aligner | - |
| bam_csi_index | - |
| extract_umi | - |
| generate_gvcf | - |
| skip_multiqc | - |
| skip_baserecalibration | - |
| skip_intervallisttools | - |
| skip_variantannotation | - |
| skip_variantfiltration | - |
| star_ignore_sjdbgtf | - |
| tools | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/splitncigar/main.nf:25

Split reads that contain N CIGAR operations for RNA-seq variant calling. This subworkflow handles the GATK SplitNCigarReads step which is essential for RNA-seq variant calling. It splits reads that span introns (N in CIGAR) and reassigns mapping qualities to meet GATK requirements. The workflow processes BAM files in parallel across genomic intervals, then merges and indexes the results for efficient downstream processing.
Inputs (take)
| Name | Description |
|---|---|
| bam | - |
| fasta | - |
| fai | - |
| dict | - |
| intervals | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam_bai | - |
| versions | - |
subworkflows/local/vcf_annotate_all/main.nf:37Annotate variants using multiple annotation tools. This subworkflow provides flexible variant annotation using one or more tools:
- SnpEff: Functional annotation and effect prediction
- VEP (Ensembl Variant Effect Predictor): Comprehensive variant annotation
- BCFtools annotate: Add custom annotations from external files
- Merge: Combined SnpEff + VEP annotation
The tools to use are specified via the tools parameter as a comma-separated list (e.g., "snpeff,vep" or "merge").
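A comma-separated tools value can be parsed and validated along these lines; the helper name and the allowed set are illustrative, not the pipeline's actual code.

```python
VALID_TOOLS = {"snpeff", "vep", "bcftools", "merge"}  # illustrative set

def parse_tools(value):
    """Split a comma-separated tools string into a normalized list,
    rejecting anything outside the allowed set."""
    tools = [t.strip().lower() for t in value.split(",") if t.strip()]
    unknown = [t for t in tools if t not in VALID_TOOLS]
    if unknown:
        raise ValueError(f"Unknown annotation tool(s): {', '.join(unknown)}")
    return tools

print(parse_tools("snpeff,vep"))  # ['snpeff', 'vep']
```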
Inputs (take)
| Name | Description |
|---|---|
| vcf | - |
| fasta | - |
| tools | - |
| snpeff_db | - |
| snpeff_cache | - |
| vep_genome | - |
| vep_species | - |
| vep_cache_version | - |
| vep_cache | - |
| vep_extra_files | - |
| bcftools_annotations | - |
| bcftools_annotations_index | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
| ? | - |
| ? | - |
subworkflows/nf-core/vcf_annotate_ensemblvep/main.nf:8
Perform annotation with ensemblvep and bgzip + tabix index the resulting VCF file
Components
ensemblvep/vep
tabix/tabix
Inputs (take)
| Name | Description |
|---|---|
| ch_vcf | vcf file to annotate. Structure: [ val(meta), path(vcf), [path(custom_file1), path(custom_file2)... (optional)] ] |
| ch_fasta | Reference genome fasta file (optional). Structure: [ val(meta2), path(fasta) ] |
| val_genome | genome to use |
| val_species | species to use |
| val_cache_version | cache version to use |
| ch_cache | the root cache folder for ensemblvep (optional). Structure: [ val(meta3), path(cache) ] |
| ch_extra_files | any extra files needed by plugins for ensemblvep (optional). Structure: [ path(file1), path(file2)... ] |
Outputs (emit)
| Name | Description |
|---|---|
| vcf_tbi | Compressed vcf file + tabix index. Structure: [ val(meta), path(vcf), path(tbi) ] |
| json | json file. Structure: [ val(meta), path(json) ] |
| tab | tab file. Structure: [ val(meta), path(tab) ] |
| reports | html reports |
| versions | File containing software versions |
subworkflows/nf-core/vcf_annotate_snpeff/main.nf:8
Perform annotation with snpEff and bgzip + tabix index the resulting VCF file
Components
snpeff/snpeff
tabix/bgziptabix
Inputs (take)
| Name | Description |
|---|---|
| ch_vcf | vcf file. Structure: [ val(meta), path(vcf) ] |
| val_snpeff_db | db version to use |
| ch_snpeff_cache | path to root cache folder for snpEff (optional). Structure: [ path(cache) ] |
Outputs (emit)
| Name | Description |
|---|---|
| vcf_tbi | Compressed vcf file + tabix index. Structure: [ val(meta), path(vcf), path(tbi) ] |
| reports | html reports. Structure: [ path(html) ] |
| summary | summary report. Structure: [ path(csv) ] |
| genes_txt | genes list. Structure: [ path(txt) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
Processes
This page documents all processes in the pipeline.
modules/nf-core/bcftools/annotate/main.nf:1
Add or remove annotations.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | Query VCF or BCF file, can be either uncompressed or compressed |
| index | file | Index of the query VCF or BCF file |
| annotations | file | Bgzip-compressed file with annotations |
| annotations_index | file | Index of the annotations file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *{ | Compressed annotated VCF file |
| tbi | file | *.tbi | Alternative VCF file index |
| csi | file | *.csi | Default VCF file index |
modules/nf-core/bedtools/merge/main.nf:1
Combines overlapping or "book-ended" features in an interval file into a single feature which spans all of the combined features.
Tools
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bed | file | Input BED file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bed | file | *.{ | Overlapped bed file with combined features |
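The genome arithmetic that bedtools merge performs can be sketched in a few lines; this is an illustration of the merge operation on one chromosome, not bedtools itself.

```python
def merge_intervals(intervals, distance=0):
    """Merge overlapping or book-ended [start, end) intervals on one
    chromosome, as `bedtools merge` does (distance mimics its -d option)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + distance:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous feature
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

# Overlapping and book-ended features collapse into single spans:
print(merge_intervals([(100, 200), (150, 250), (250, 300), (400, 500)]))
# [(100, 300), (400, 500)]
```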
modules/nf-core/bedtools/sort/main.nf:1
Sorts a feature file by chromosome and other criteria.
Tools
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| intervals | file | BED/BEDGRAPH |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| sorted | file | *.${ | Sorted output file |
modules/nf-core/cat/fastq/main.nf:1
Concatenates fastq files
Tools
The cat utility reads files sequentially, writing them to the standard output.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files to be concatenated. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| reads | file | *.{ | Merged fastq file |
modules/nf-core/ensemblvep/download/main.nf:1
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
Tools
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| assembly | string | Genome assembly |
| species | string | Species |
| cache_version | string | cache version |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| cache | file | * | cache |
modules/nf-core/ensemblvep/vep/main.nf:1
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
Tools
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | vcf to annotate |
| custom_extra_files | file | extra sample-specific files to be used with the |
| meta2 | map | Groovy Map containing fasta reference information e.g. [ id:'test' ] |
| fasta | file | reference FASTA file (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | annotated vcf (optional) |
| tbi | file | *.vcf.gz.tbi | annotated vcf index (optional) |
| tab | file | *.ann.tab.gz | tab file with annotated variants (optional) |
| json | file | *.ann.json.gz | json file with annotated variants (optional) |
| report | string | *.html | VEP report file |
modules/nf-core/fastqc/main.nf:19
Run FastQC on sequenced reads
Code Documentation
Run FastQC quality control on sequencing reads. FastQC provides a comprehensive quality control report for high-throughput sequencing data. It generates an HTML report and a ZIP archive containing detailed metrics including:
- Basic statistics (total sequences, sequence length, GC content)
- Per-base sequence quality scores
- Per-sequence quality scores
- Per-base sequence content
- Sequence duplication levels
- Overrepresented sequences
- Adapter content
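Two of the metrics above are simple enough to sketch directly; the helpers below illustrate GC content and the per-base quality means behind FastQC's plots (they are toy re-implementations, not FastQC code).

```python
def gc_content(seq):
    """Fraction of G/C bases, one of FastQC's basic statistics."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def mean_per_base_quality(quals):
    """Mean Phred quality at each read position across a batch of reads
    (FastQC's per-base quality plot summarizes these per position).
    Qualities are Phred+33 encoded ASCII strings of equal length."""
    n = len(quals)
    return [sum(ord(q[i]) - 33 for q in quals) / n
            for i in range(len(quals[0]))]

print(gc_content("ACGT"))                   # 0.5
print(mean_per_base_quality(["II", "I5"]))  # [40.0, 30.0]  ('I' = Q40, '5' = Q20)
```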
Tools
FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T).
You get information about adapter contamination and other overrepresented sequences.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| html | file | *_{ | FastQC report |
| zip | file | *_{ | FastQC report archive |
modules/nf-core/gatk4/applybqsr/main.nf:1
Apply base quality score recalibration (BQSR) to a bam file
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| bqsr_table | file | Recalibration table from gatk4_baserecalibrator |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | ${ | Recalibrated BAM file |
| bai | file | ${ | Recalibrated BAM index file |
| cram | file | ${ | Recalibrated CRAM file |
modules/nf-core/gatk4/baserecalibrator/main.nf:1
Generate recalibration table for Base Quality Score Recalibration (BQSR)
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | GATK sequence dictionary |
| meta5 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| known_sites | file | VCF files with known sites for indels / snps |
| meta6 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| known_sites_tbi | file | Tabix index of the known_sites |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| table | file | *.{ | Recalibration table from BaseRecalibrator |
modules/nf-core/gatk4/bedtointervallist/main.nf:1
Creates an interval list from a bed file and a reference dict
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| bed | file | Input bed file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | Sequence dictionary |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| interval_list | file | *.interval_list | gatk interval list file |
modules/nf-core/gatk4/combinegvcfs/main.nf:1
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
Tools
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | file | Compressed VCF files |
| vcf_idx | file | VCF Index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| combined_gvcf | file | *.combined.g.vcf.gz | Compressed Combined GVCF file |
modules/nf-core/gatk4/createsequencedictionary/main.nf:1
Creates a sequence dictionary for a reference sequence
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Input fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| dict | file | *.{ | gatk dictionary file |
modules/nf-core/gatk4/haplotypecaller/main.nf:25
Call germline SNPs and indels via local re-assembly of haplotypes
Code Documentation
Call germline SNPs and indels using GATK HaplotypeCaller. HaplotypeCaller is GATK's flagship variant caller, performing local de-novo assembly of haplotypes in regions showing variation. It can produce either standard VCF output or GVCF output for joint calling. Key features:
- Local re-assembly for accurate indel calling
- Population-aware calling using dbSNP
- Support for GVCF output mode for cohort analysis
- DRAGstr model support for improved STR calling

For RNA-seq data, this should be run after SplitNCigarReads processing.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| dragstr_model | file | Text file containing the DragSTR model of the used BAM/CRAM file (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| dict | file | GATK sequence dictionary |
| meta5 | map | Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ] |
| dbsnp | file | VCF file containing known sites (optional) |
| meta6 | map | Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ] |
| dbsnp_tbi | file | VCF index of dbsnp (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | Compressed VCF file |
| tbi | file | *.vcf.gz.tbi | Index of VCF file |
| bam | file | *.realigned.bam | Assembled haplotypes and locally realigned reads |
modules/nf-core/gatk4/indexfeaturefile/main.nf:1
Creates an index for a feature file, e.g. VCF or BED file.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| feature_file | file | VCF/BED file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | file | *.{ | Index for VCF/BED file |
modules/nf-core/gatk4/intervallisttools/main.nf:1
Splits the interval list file into unique, equally-sized interval files and places them under a directory
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| intervals | file | Interval file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| interval_list | file | *.interval_list | Interval list files |
modules/nf-core/gatk4/mergevcfs/main.nf:1
Merges several vcf files
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | list | Two or more VCF files |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | Optional Sequence Dictionary as input |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | merged vcf file |
| tbi | file | *.tbi | index files for the merged vcf files |
modules/nf-core/gatk4/splitncigarreads/main.nf:1
Splits reads that contain Ns in their cigar string
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| bam | list | BAM/SAM/CRAM file containing reads |
| bai | list | BAI/SAI/CRAI index file (optional) |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'reference' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'reference' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'reference' ] |
| dict | file | GATK sequence dictionary |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | Output file with split reads (BAM/SAM/CRAM) |
modules/nf-core/gatk4/variantfiltration/main.nf:1
Filter variants
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | list | List of VCF(.gz) files |
| tbi | list | List of VCF file indexes |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Fasta file of reference genome |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | Sequence dictionary of fasta file |
| meta5 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| gzi | file | Genome index file, only needed when the genome file was compressed with the BGZF algorithm. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | Compressed VCF file |
| tbi | file | *.vcf.gz.tbi | Index of VCF file |
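VariantFiltration flags records that fail filter expressions rather than removing them. The sketch below mimics that flagging behaviour in plain Python; the filter names and thresholds are illustrative, not the pipeline's configured values.

```python
def apply_hard_filters(record, filters):
    """Flag a variant record with the names of any failing filters,
    as GATK VariantFiltration does (records are annotated, not removed).
    `record` maps INFO field names to values; `filters` maps a filter
    name to a predicate that returns True when the record FAILS."""
    failed = [name for name, fails in filters.items() if fails(record)]
    return ";".join(failed) if failed else "PASS"

# Example hard filters on strand bias and quality-by-depth
# (thresholds are illustrative only):
filters = {
    "FS": lambda r: r.get("FS", 0) > 30.0,   # Fisher strand bias
    "QD": lambda r: r.get("QD", 100) < 2.0,  # quality normalized by depth
}

print(apply_hard_filters({"FS": 45.2, "QD": 1.5}, filters))   # FS;QD
print(apply_hard_filters({"FS": 3.0, "QD": 20.0}, filters))   # PASS
```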
modules/nf-core/gffread/main.nf:1
Validate, filter, convert and perform various other operations on GFF files
Tools
GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing meta data e.g. [ id:'test' ] |
| gff | file | A reference file in either the GFF3, GFF2 or GTF format. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gtf | file | *.{ | GTF file resulting from the conversion of the GFF input file if '-T' argument is present |
| gffread_gff | file | *.gff3 | GFF3 file resulting from the conversion of the GFF input file if '-T' argument is absent |
| gffread_fasta | file | *.fasta | Fasta file produced when either of '-w', '-x', '-y' parameters is present |
modules/local/gtf2bed/main.nf:13
Convert GTF annotation file to BED format. Extracts genomic features (exons, transcripts, or genes) from a GTF file and outputs them in BED format for use with interval-based tools. The output BED file uses 0-based coordinates (BED standard) converted from the 1-based GTF coordinates.
Inputs
| Name | Type | Description |
|---|---|---|
| val(meta) | tuple | - |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bed | - |
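The 1-based to 0-based coordinate shift mentioned in the description amounts to subtracting one from the start field only, since GTF is 1-based inclusive and BED is 0-based half-open. A minimal sketch (field order follows the GTF specification; the helper and the example line are illustrative, not the module's script):

```python
def gtf_line_to_bed(line, feature="exon"):
    """Convert one GTF line to a BED interval, or None if it is a
    comment or a different feature type. GTF is 1-based inclusive;
    BED is 0-based half-open, so only the start coordinate shifts."""
    if line.startswith("#"):
        return None
    chrom, _source, ftype, start, end, _score, strand, *_ = line.split("\t")
    if ftype != feature:
        return None
    return (chrom, int(start) - 1, int(end), strand)

# Hypothetical GTF exon line (tab-separated fields):
gtf = "chr1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id \"G1\";"
print(gtf_line_to_bed(gtf))  # ('chr1', 11868, 12227, '+')
```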
modules/nf-core/gunzip/main.nf:16
Compresses and decompresses files.
Tools
gzip is a file format and a software application used for file compression and decompression.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Optional groovy Map containing meta information e.g. [ id:'test', single_end:false ] |
| archive | file | File to be compressed/uncompressed |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gunzip | file | *.* | Compressed/uncompressed file |
modules/nf-core/mosdepth/main.nf:1
Calculates genome-wide sequencing coverage.
Tools
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Input BAM/CRAM file |
| bai | file | Index for BAM/CRAM file |
| bed | file | BED file with intersected intervals |
| meta2 | map | Groovy Map containing bed information e.g. [ id:'test' ] |
| fasta | file | Reference genome FASTA file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| global_txt | file | *.{ | Text file with global cumulative coverage distribution |
| summary_txt | file | *.{ | Text file with summary mean depths per chromosome and regions |
| regions_txt | file | *.{ | Text file with region cumulative coverage distribution |
| per_base_d4 | file | *.{ | D4 file with per-base coverage |
| per_base_bed | file | *.{ | BED file with per-base coverage |
| per_base_csi | file | *.{ | Index file for BED file with per-base coverage |
| regions_bed | file | *.{ | BED file with per-region coverage |
| regions_csi | file | *.{ | Index file for BED file with per-region coverage |
| quantized_bed | file | *.{ | BED file with binned coverage |
| quantized_csi | file | *.{ | Index file for BED file with binned coverage |
| thresholds_bed | file | *.{ | BED file with the number of bases in each region that are covered at or above each threshold |
| thresholds_csi | file | *.{ | Index file for BED file with threshold coverage |
modules/nf-core/multiqc/main.nf:21
Aggregate results from bioinformatics analyses across many samples into a single report
Code Documentation
Aggregate results from multiple analysis tools into a single report. MultiQC searches a given directory for analysis logs and compiles them into a single HTML report. It supports output from many common bioinformatics tools including FastQC, STAR, Picard, GATK, and more. The report provides:
- Summary statistics across all samples
- Interactive plots for QC metrics
- Data tables for detailed metrics
- Export functionality for plots and data
Tools
MultiQC searches a given directory for analysis logs and compiles a HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| report | - | - | - |
| data | - | - | - |
| plots | - | - | - |
modules/nf-core/picard/markduplicates/main.nf:1
Locate and tag duplicate reads in a BAM file
Tools
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | Sequence reads file, can be SAM/BAM/CRAM format |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference genome fasta file, required for CRAM input |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Reference genome fasta index |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | BAM file with duplicate reads marked/removed |
| bai | file | *.{ | An optional BAM index file. If desired, --CREATE_INDEX must be passed as a flag |
| cram | file | *.{ | Output CRAM file |
| metrics | file | *.{ | Duplicate metrics file generated by picard |
modules/local/remove_unknown_regions/main.nf:1
Inputs
| Name | Type | Description |
|---|---|---|
| val(meta) | tuple | - |
| val(meta2) | tuple | - |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bed | - |
modules/nf-core/samtools/convert/main.nf:1
Convert and then index CRAM -> BAM or BAM -> CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file |
| index | file | BAM/CRAM index file |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference file to create the CRAM file |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fai | file | Reference index file to create the CRAM file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *{ | filtered/converted BAM file |
| cram | file | *{ | filtered/converted CRAM file |
| bai | file | *{ | filtered/converted BAM index |
| crai | file | *{ | filtered/converted CRAM index |
modules/nf-core/samtools/faidx/main.nf:1
Index FASTA file, and optionally generate a file of chromosome sizes
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta | file | FASTA file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fai | file | FASTA index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| fa | file | *.{ | FASTA file |
| sizes | file | *.{ | File containing chromosome lengths |
| fai | file | *.{ | FASTA index file |
| gzi | file | *.gzi | Optional gzip index file for compressed inputs |
modules/nf-core/samtools/flagstat/main.nf:1
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | BAM/CRAM/SAM file |
| bai | file | Index for BAM/CRAM/SAM file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| flagstat | file | *.{ | File containing samtools flagstat output |
modules/nf-core/samtools/idxstats/main.nf:1
Reports alignment summary statistics for a BAM/CRAM/SAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | BAM/CRAM/SAM file |
| bai | file | Index for BAM/CRAM/SAM file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| idxstats | file | *.{ | File containing samtools idxstats output |
modules/nf-core/samtools/index/main.nf:1
Index SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | input file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bai | file | *.{ | BAM/CRAM/SAM index file |
| csi | file | *.{ | CSI index file |
| crai | file | *.{ | BAM/CRAM/SAM index file |
modules/nf-core/samtools/merge/main.nf:1
Merge BAM or CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input_files | file | BAM/CRAM file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference file the CRAM was created with (optional) |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of the reference file the CRAM was created with (optional) |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| gzi | file | Index of the compressed reference file the CRAM was created with (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | BAM file |
| cram | file | *.{ | CRAM file |
| csi | file | *.csi | BAM index file (optional) |
| crai | file | *.crai | CRAM index file (optional) |
modules/nf-core/samtools/sort/main.nf:1
Sort SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | BAM/CRAM/SAM file(s) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference genome FASTA file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | Sorted BAM file |
| cram | file | *.{ | Sorted CRAM file |
| sam | file | *.{ | Sorted SAM file |
| crai | file | *.crai | CRAM index file (optional) |
| csi | file | *.csi | BAM index file (optional) |
| bai | file | *.bai | BAM index file (optional) |
modules/nf-core/samtools/stats/main.nf:1
Produces comprehensive statistics from SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
meta
|
map
|
Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
input
|
file
|
BAM/CRAM file from alignment |
input_index
|
file
|
BAI/CRAI file from alignment |
meta2
|
map
|
Groovy Map containing reference information e.g. [ id:'genome' ] |
fasta
|
file
|
Reference file the CRAM was created with (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| stats | file | `*.{` | File containing samtools stats output |
modules/nf-core/seq2hla/main.nf:20
Precision HLA typing and expression from RNA-seq data using seq2HLA
Code Documentation
Perform HLA typing from RNA-seq data using seq2HLA. seq2HLA determines HLA class I and class II genotypes from RNA-seq reads by mapping to a reference database of HLA alleles. It provides:
- 2-digit resolution typing (e.g., HLA-A*02)
- 4-digit resolution typing (e.g., HLA-A*02:01)
- Expression levels of HLA alleles
- Ambiguity reports when alleles cannot be distinguished

Supports both classical HLA genes (HLA-A, -B, -C, -DRB1, -DQB1, -DQA1) and non-classical genes. Requires paired-end RNA-seq reads as input.
Tools
Precision HLA typing and expression from next-generation RNA sequencing data
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information |
| reads | file | Paired-end FASTQ files for RNA-seq data |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| class1_genotype_2d | file | `*ClassI-` | HLA Class I 2-digit genotype results |
| class2_genotype_2d | file | `*ClassII.HLAgenotype2digits` | HLA Class II 2-digit genotype results |
| class1_genotype_4d | file | `*ClassI-` | HLA Class I 4-digit genotype results |
| class2_genotype_4d | file | `*ClassII.HLAgenotype4digits` | HLA Class II 4-digit genotype results |
| class1_bowtielog | file | `*ClassI-` | HLA Class I Bowtie alignment log |
| class2_bowtielog | file | `*ClassII.bowtielog` | HLA Class II Bowtie alignment log |
| class1_expression | file | `*ClassI-` | HLA Class I expression results |
| class2_expression | file | `*ClassII.expression` | HLA Class II expression results |
| class1_nonclass_genotype_2d | file | `*ClassI-` | HLA Class I non-classical 2-digit genotype results |
| ambiguity | file | `*.ambiguity` | HLA typing ambiguity results |
| class1_nonclass_genotype_4d | file | `*ClassI-` | HLA Class I non-classical 4-digit genotype results |
| class1_nonclass_bowtielog | file | `*ClassI-` | HLA Class I non-classical Bowtie alignment log |
| class1_nonclass_expression | file | `*ClassI-` | HLA Class I non-classical expression results |
modules/nf-core/snpeff/download/main.nf:1
Genetic variant annotation and functional effect prediction toolbox
Tools
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| snpeff_db | string | SnpEff database name |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| cache | file | - | snpEff cache |
modules/nf-core/snpeff/snpeff/main.nf:1
Genetic variant annotation and functional effect prediction toolbox
Tools
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | VCF to annotate |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cache | file | Path to snpEff cache (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | `*.ann.vcf` | Annotated VCF |
| report | string | `*.csv` | snpEff report CSV file |
| summary_html | string | `*.html` | snpEff summary statistics in HTML file |
| genes_txt | string | `*.genes.txt` | TXT (tab-separated) file with counts of the number of variants affecting each transcript and gene |
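Unlike most modules on this page, snpEff takes the database name as a plain string rather than a channel, alongside an optional cache tuple. A minimal sketch (the include path, database name, and file names are illustrative):

```nextflow
include { SNPEFF_SNPEFF } from './modules/nf-core/snpeff/snpeff/main'

workflow {
    // [ meta, vcf ]: the variants to annotate
    ch_vcf = Channel.of([ [ id:'test', single_end:false ], file('variants.vcf.gz') ])
    // Database name is a value input; 'GRCh38.105' is a placeholder example
    snpeff_db = 'GRCh38.105'
    // Optional pre-downloaded cache as a [ meta2, path ] tuple
    ch_cache = Channel.of([ [ id:'test' ], file('snpeff_cache') ])

    SNPEFF_SNPEFF ( ch_vcf, snpeff_db, ch_cache )

    SNPEFF_SNPEFF.out.vcf.view()   // annotated *.ann.vcf
}
```

The cache can be produced ahead of time with the `snpeff/download` module documented above, which avoids re-downloading the database on every run.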
modules/nf-core/star/align/main.nf:1
Align reads to a reference genome using STAR
Tools
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| index | directory | STAR genome index |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| gtf | file | Annotation GTF file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| log_final | file | `*Log.final.out` | STAR final log file |
| log_out | file | `*Log.out` | STAR log out file |
| log_progress | file | `*Log.progress.out` | STAR log progress file |
| bam | file | `*.{` | Output BAM file containing read alignments |
| bam_sorted | file | `*sortedByCoord.out.bam` | Sorted BAM file of read alignments (optional) |
| bam_sorted_aligned | file | `*.Aligned.sortedByCoord.out.bam` | Sorted BAM file of read alignments (optional) |
| bam_transcript | file | `*toTranscriptome.out.bam` | Output BAM file of transcriptome alignment (optional) |
| bam_unsorted | file | `*Aligned.unsort.out.bam` | Unsorted BAM file of read alignments (optional) |
| fastq | file | `*fastq.gz` | Unmapped FastQ files (optional) |
| tab | file | `*.tab` | STAR output tab file(s) (optional) |
| spl_junc_tab | file | `*.SJ.out.tab` | STAR output splice junction tab file |
| read_per_gene_tab | file | `*.ReadsPerGene.out.tab` | STAR output read per gene tab file |
| junction | file | `*.out.junction` | STAR chimeric junction output file (optional) |
| sam | file | `*.out.sam` | STAR output SAM file(s) (optional) |
| wig | file | `*.wig` | STAR output wiggle format file(s) (optional) |
| bedgraph | file | `*.bg` | STAR output bedGraph format file(s) (optional) |
modules/nf-core/star/genomegenerate/main.nf:1
Create index for STAR
Tools
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Fasta file of the reference genome |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| gtf | file | GTF file of the reference genome |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | directory | `star` | Folder containing the star index files |
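The two STAR modules above are naturally chained: the index directory emitted by `star/genomegenerate` feeds the `index` input of `star/align`. A minimal sketch (include paths and file names are illustrative; only the channel inputs documented in the tables above are shown — the installed module may also take additional value inputs):

```nextflow
include { STAR_GENOMEGENERATE } from './modules/nf-core/star/genomegenerate/main'
include { STAR_ALIGN          } from './modules/nf-core/star/align/main'

workflow {
    ch_fasta = Channel.of([ [ id:'genome' ], file('genome.fasta') ])
    ch_gtf   = Channel.of([ [ id:'genome' ], file('genome.gtf') ])

    // Build the index once, then reuse it for every sample
    STAR_GENOMEGENERATE ( ch_fasta, ch_gtf )

    // Paired-end reads: the reads element is a two-file list
    ch_reads = Channel.of([ [ id:'test', single_end:false ],
                            [ file('test_1.fastq.gz'), file('test_2.fastq.gz') ] ])

    STAR_ALIGN ( ch_reads, STAR_GENOMEGENERATE.out.index.collect(), ch_gtf.collect() )

    STAR_ALIGN.out.log_final.view()
}
```

`collect()` turns the single-element index and GTF channels into reusable value channels, so the alignment runs once per sample rather than consuming the index after the first sample.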
modules/nf-core/star/indexversion/main.nf:1
Get the minimal allowed index version from STAR
Tools
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index_version | - | - | - |
modules/nf-core/tabix/bgziptabix/main.nf:1
bgzip a sorted tab-delimited genome file and then create tabix index
Tools
Generic indexer for TAB-delimited genome position files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | Sorted tab-delimited genome file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gz_index | file | `*.gz,` | bgzipped tab-delimited genome file together with its Tabix index file (either tbi or csi) |
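Because `bgziptabix` compresses and indexes in one step, it takes a single input channel and emits the compressed file and its index together. A minimal sketch, using the output name listed above (include path and file name are illustrative):

```nextflow
include { TABIX_BGZIPTABIX } from './modules/nf-core/tabix/bgziptabix/main'

workflow {
    // A sorted, uncompressed tab-delimited file (e.g. a BED of target regions)
    ch_input = Channel.of([ [ id:'test', single_end:false ], file('regions.sorted.bed') ])

    TABIX_BGZIPTABIX ( ch_input )

    // gz_index emits the bgzipped file and its index in one tuple
    TABIX_BGZIPTABIX.out.gz_index.view()
}
```

If the file is already bgzipped, use the `tabix/tabix` module documented below instead, which only builds the index.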
modules/nf-core/tabix/tabix/main.nf:1
Create tabix index from a sorted bgzip tab-delimited genome file
Tools
Generic indexer for TAB-delimited genome position files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| tab | file | TAB-delimited genome position file compressed with bgzip |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | file | `*.{` | Tabix index file (either tbi or csi) |
modules/nf-core/umitools/extract/main.nf:1
Extracts UMI barcode from a read and adds it to the read name, leaving any sample barcode in place
Tools
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | list | List of input FASTQ files whose UMIs will be extracted. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| reads | file | `*.{` | Extracted FASTQ files. For single-end reads the pattern is `${prefix}.umi_extract.fastq.gz`; for paired-end reads it is `${prefix}.umi_extract_{1,2}.fastq.gz`. |
| log | file | `*.{` | Logfile for umi_tools |
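The `reads` input is a list, so single-end and paired-end samples share one channel shape. A minimal paired-end sketch (include path and file names are illustrative):

```nextflow
include { UMITOOLS_EXTRACT } from './modules/nf-core/umitools/extract/main'

workflow {
    // For paired-end data the reads element is a two-file list;
    // for single-end data it would be a single-file list
    ch_reads = Channel.of([ [ id:'test', single_end:false ],
                            [ file('test_1.fastq.gz'), file('test_2.fastq.gz') ] ])

    UMITOOLS_EXTRACT ( ch_reads )

    UMITOOLS_EXTRACT.out.reads.view()   // UMI-trimmed FASTQ files
    UMITOOLS_EXTRACT.out.log.view()
}
```

In this pipeline the extracted reads would then flow into alignment, with the UMI carried in the read name for downstream deduplication.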
modules/nf-core/untar/main.nf:1
Extract files from tar, tar.gz, tar.bz2, tar.xz archives
Tools
Inputs
| Name | Type | Description |
|---|---|---|
meta
|
map
|
Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
archive
|
file
|
File to be untarred |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| untar | map | `*/` | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
Functions
This page documents helper functions defined in the pipeline.
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:288
Validate samples after grouping by sample ID. Performs consistency checks on grouped sample data:
- Ensures only one BAM/CRAM file per sample
- Prevents mixing of FASTQ and BAM/CRAM inputs
- Validates consistent single-end/paired-end status
- Properly interleaves paired-end FASTQ files
Parameters
| Name | Description | Default |
|---|---|---|
| input | - | - |
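The consistency checks listed above can be sketched in Nextflow-flavoured Groovy. This is a hypothetical simplification for illustration — the function name, row keys, and structure are assumptions, not the pipeline's actual implementation:

```groovy
// Hypothetical sketch of the per-sample consistency checks
// (uses Nextflow's error() helper; names and keys are illustrative)
def validateGroupedSample(String sampleId, List<Map> rows) {
    def bams   = rows.findAll { it.bam || it.cram }
    def fastqs = rows.findAll { it.fastq_1 }

    // Only one BAM/CRAM file per sample
    if (bams.size() > 1)
        error "Sample ${sampleId}: only one BAM/CRAM file is allowed per sample"

    // No mixing of FASTQ and BAM/CRAM inputs
    if (bams && fastqs)
        error "Sample ${sampleId}: cannot mix FASTQ and BAM/CRAM inputs"

    // Consistent single-end/paired-end status across all runs of the sample
    def endedness = fastqs.collect { it.fastq_2 ? 'paired' : 'single' }.unique()
    if (endedness.size() > 1)
        error "Sample ${sampleId}: mixed single-end and paired-end runs"
}
```

Paired-end interleaving (the last bullet above) would then flatten each validated row's `fastq_1`/`fastq_2` pair into the ordered read list the aligner expects.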
subworkflows/local/prepare_genome/main.nf:299
Parameters
| Name | Description | Default |
|---|---|---|
| version | - | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:343Check if the specified genome exists in the configuration. Throws an error with a helpful message listing available genomes if the specified genome key is not found in the config.
subworkflows/local/annotation_cache_initialisation/main.nf:70
Parameters
| Name | Description | Default |
|---|---|---|
| cache_url | - | - |
subworkflows/local/prepare_genome/main.nf:263
Parameters
| Name | Description | Default |
|---|---|---|
| index_version | - | - |
| minimal_index_version | - | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:379
Parameters
| Name | Description | Default |
|---|---|---|
| mqc_methods_yaml | - | - |
main.nf:327
Parameters
| Name | Description | Default |
|---|---|---|
| summary_params | - | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:367
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:353
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:251
Validate pipeline input parameters. Checks that all required parameters are provided and valid. Currently validates that the specified genome exists in the config.
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:264
Validate and parse input samplesheet entries. Ensures that multiple runs of the same sample have consistent sequencing type (all single-end or all paired-end).
Parameters
| Name | Description | Default |
|---|---|---|
| input | - | - |