nf-core/rnavar
GATK4 RNA variant calling pipeline
Introduction¶
nf-core/rnavar is a bioinformatics pipeline for RNA variant calling analysis following GATK4 best practices.
Pipeline summary¶
- Merge re-sequenced FastQ files (cat)
- Read QC (FastQC)
- (Optionally) Extract UMIs from FASTQ reads (UMI-tools)
- (Optionally) HLA typing from FASTQ reads (Seq2HLA)
- Align reads to reference genome (STAR)
- Sort and index alignments (SAMtools)
- Duplicate read marking (Picard MarkDuplicates)
- Scatter one interval list into many interval files (GATK4 IntervalListTools)
- Split reads that contain Ns in their CIGAR string (GATK4 SplitNCigarReads)
- Estimate and correct systematic bias using base quality score recalibration (GATK4 BaseRecalibrator, GATK4 ApplyBQSR)
- Convert a BED file to a Picard interval list (GATK4 BedToIntervalList)
- Call SNPs and indels (GATK4 HaplotypeCaller)
- Merge multiple VCF files into one VCF (GATK4 MergeVCFs)
- Index the VCF (Tabix)
- Filter variant calls based on certain criteria (GATK4 VariantFiltration)
- Annotate variants (BCFtools annotate, snpEff, Ensembl VEP)
- Present QC for raw read, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
Summary of tools and versions used in the pipeline¶
| Tool | Version |
|---|---|
| BCFTools | 1.22 |
| BEDTools | 2.31.1 |
| cat | 9.5 |
| EnsemblVEP | 115.2 |
| FastQC | 0.12.1 |
| GATK | 4.6.2.0 |
| GffRead | 0.12.7 |
| HTSlib | 1.21 |
| Mosdepth | 0.3.10 |
| MultiQC | 1.33 |
| Picard | 3.4.0 |
| SAMtools | 1.22.1 |
| Seq2HLA | 2.3 |
| SnpEff | 5.3.0a |
| STAR | 2.7.11b |
| Tabix | 1.21 |
| UMI-tools | 1.1.6 |
Usage¶
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
Each row represents a fastq file (single-end) or a pair of fastq files (paired-end).
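For example, a samplesheet mixing paired-end and single-end samples might look as follows (the sample names and file paths here are illustrative only; for single-end data the usual nf-core convention is to leave the fastq_2 column empty, but check the usage docs for this pipeline):

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
```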
Now, you can run the pipeline using:
nextflow run nf-core/rnavar -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --input samplesheet.csv --outdir <OUTDIR> --genome GRCh38
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
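As a sketch of the -params-file pattern, parameters can be collected in a YAML file instead of being passed on the CLI (the parameter names below appear on this page; the exact set you need depends on your run):

```yaml
# params.yaml — used as: nextflow run nf-core/rnavar -profile docker -params-file params.yaml
input: "samplesheet.csv"
outdir: "./results"
genome: "GRCh38"
tools: "vep"
```

Note that -params-file is a Nextflow option (single dash), while the values inside it are pipeline parameters (double dash on the CLI).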
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Pipeline output¶
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits¶
rnavar was originally written by Praveen Raj and Maxime U Garcia at The Swedish Childhood Tumor Biobank (Barntumörbanken), Karolinska Institutet. Nicolas Vannieuwkerke at CMGG later joined and helped with further development (1.1.0 and forward).
Maintenance is now led by Maxime U Garcia (previously at Seqera, now at NGI).
Main developers:
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support¶
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #rnavar channel (you can join with this invite).
Citations¶
If you use nf-core/rnavar for your analysis, please cite it using the following doi: 10.5281/zenodo.6669636
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Pipeline Inputs
This page documents all input parameters for the pipeline.
Input/output options ¶
Path to comma-separated file containing information about the samples in the experiment.
A design file with information about the samples in your experiment. Use this parameter to specify the location of the input files. It has to be a tab or comma-separated file with a header row or a JSON/YAML file. See usage docs.
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Specify which additional tools RNAvar should use. Values can be 'seq2hla', 'bcfann', 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation.
List of tools to be used in addition to variant calling: currently HLA typing with Seq2HLA and a choice of annotation tools.
Save FastQ files after merging re-sequenced libraries in the results directory.
Preprocessing of alignment ¶
Specify whether to remove UMIs from the reads with UMI-tools extract.
UMI pattern to use. Can be either 'string' (default) or 'regex'.
More details can be found in the UMI-tools documentation.
Default:
string
Allowed values:
string, regex
The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI.
More details can be found in the UMI-tools documentation.
The UMI barcode pattern to use if the UMI is located in read 2.
The character that separates the UMI in the read name. Most likely a colon if you skipped the extraction with UMI-tools and used other software.
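To illustrate what a 'string' UMI pattern such as 'NNNNNN' means, here is a simplified sketch (an illustration only, not UMI-tools' implementation; real extraction also appends the UMI to the read name after the separator and handles cell-barcode characters in the pattern):

```python
def extract_umi(seq, qual, pattern="NNNNNN"):
    """Simplified sketch of UMI-tools 'string' extraction: the leading
    'N' positions of the pattern mark UMI bases, which are cut from the
    start of the read (and its quality string)."""
    n = len(pattern)  # 'NNNNNN' -> the first 6 bases are the UMI
    umi = seq[:n]
    return umi, seq[n:], qual[n:]

umi, trimmed_seq, trimmed_qual = extract_umi("ACGTACTTGGCCAA", "IIIIIIIIIIIIII")
# umi == "ACGTAC"; the remaining read is "TTGGCCAA"
```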
Alignment options ¶
Specifies the alignment algorithm to use.
This parameter defines which aligner is to be used for aligning the RNA reads to the reference genome.
Default:
star
Allowed values:
star
Path to STAR index folder or compressed file (tar.gz)
This parameter can be used if a pre-built STAR index is available. You can either give the full path to the index directory or to a compressed file in tar.gz format.
Enable STAR 2-pass mapping mode.
This parameter enables STAR to perform 2-pass mapping. Default true.
Default:
True
Do not use the GTF file during the STAR index building step.
Disables the --sjdbGTFfile parameter.
Option to limit RAM when sorting BAM file. Value to be specified in bytes. If 0, will be set to the genome index size.
This parameter specifies the maximum available RAM (bytes) for sorting BAM during STAR alignment.
Default:
0
Specifies the number of genome bins for coordinate-sorting
This parameter specifies the number of bins to be used for coordinate sorting during STAR alignment step.
Default:
50
Specifies the maximum number of collapsed junctions
Default:
1000000
Specifies the maximum intron size
This parameter specifies the maximum intron size for STAR alignment
Sequencing center information to be added to read group of BAM files.
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Specify the sequencing platform used
This parameter is required for creating a proper BAM header to use in the downstream analysis of GATK.
Default:
illumina
Where possible, save unaligned reads from aligner to the results directory.
This may either be in the form of FastQ or BAM files depending on the options available for that particular tool.
Save the intermediate BAM files from the alignment step.
By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set this parameter to also save other intermediate BAM files.
Create a CSI index for BAM files instead of the traditional BAI index. This will be required for genomes with larger chromosome sizes.
Postprocessing of alignment ¶
Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step.
Set to true to remove duplicates from the BAM file during the Picard MarkDuplicates step.
Variant calling ¶
The minimum phred-scaled confidence threshold at which variants should be called.
Specify the minimum phred-scaled confidence threshold at which variants should be called.
Default:
20
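The phred scale relates this threshold to an error probability; as a quick reference (this is the standard phred definition, not pipeline code):

```python
def phred_to_error_prob(q):
    """Convert a phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)

# The default threshold of 20 corresponds to a 1-in-100 chance the call
# is wrong, i.e. 99% confidence.
print(phred_to_error_prob(20))  # → 0.01
```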
Enable generation of per-sample GVCFs in addition to the VCFs.
This parameter enables GATK HaplotypeCaller to generate GVCFs. Default false.
Number of pieces the gene interval list is split into, so that GATK HaplotypeCaller can run in parallel
Set this parameter to decide the number of splits for the gene interval list file.
Default:
25
Do not use gene interval file during variant calling
If set to true, the gene intervals are not used during the variant calling step, so variants are called from all regions, including non-genic ones. Default is false.
Variant filtering ¶
Value to be used for the QualByDepth (QD) filter
This parameter defines the value to use for the QualByDepth (QD) filter in the GATK variant-filtering step. The value should be given as a float.
Default:
2
Value to be used for the FisherStrand (FS) filter
This parameter defines the value to use for the FisherStrand (FS) filter in the GATK variant-filtering step. The value should be given as a float.
Default:
30
The window size (in bases) in which to evaluate clustered SNPs.
This parameter is used by the GATK variant filtration step. It defines the window size (in bases) in which to evaluate clustered SNPs. It has to be used together with the 'cluster' option.
Default:
35
The number of SNPs which make up a cluster. Must be at least 2.
This parameter is used by the GATK variant filtration step. It defines the number of SNPs which make up a cluster within a window. Must be at least 2.
Default:
3
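To illustrate how the window and cluster options interact, here is a simplified sketch of the clustering rule (an illustration only, not GATK's implementation; GATK's exact boundary handling may differ):

```python
def clustered_snps(positions, window_size=35, cluster_size=3):
    """Flag SNPs where `cluster_size` or more calls fall within a
    `window_size`-base window (simplified sketch of the GATK
    VariantFiltration cluster filter)."""
    positions = sorted(positions)
    flagged = set()
    for i in range(len(positions) - cluster_size + 1):
        # span of cluster_size consecutive SNPs, inclusive of both ends
        if positions[i + cluster_size - 1] - positions[i] + 1 <= window_size:
            flagged.update(positions[i:i + cluster_size])
    return sorted(flagged)

# Three SNPs within 21 bases form a cluster; the isolated SNP at 500 does not.
print(clustered_snps([100, 110, 120, 500]))  # → [100, 110, 120]
```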
Variant Annotation ¶
Path to VEP cache.
Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}
Default:
s3://annotation-cache/vep_cache/
Path to snpEff cache.
Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}
Default:
s3://annotation-cache/snpeff_cache/
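Putting the two path conventions together, a local cache directory might look like this (the species, genome, and version values are examples only; substitute the ones matching your genome build):

```
vep_cache/
└── homo_sapiens/
    └── 113_GRCh38/        # ${vep_species}/${vep_genome}_${vep_cache_version}
snpeff_cache/
└── GRCh38.105/            # ${snpeff_species}.${snpeff_version}
```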
Allow usage of fasta file for annotation with VEP
By pointing VEP to a FASTA file, it is possible to retrieve reference sequence locally. This enables VEP to retrieve HGVS notations (--hgvs), check the reference sequence given in input data, and construct transcript models from a GFF or GTF file without accessing a database.
For details, see here.
Path to dbNSFP processed file.
To be used with --vep_dbnsfp.
dbNSFP files and more information are available at https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp and https://sites.google.com/site/jpopgen/dbNSFP/
Path to dbNSFP tabix indexed file.
To be used with --vep_dbnsfp.
Consequence to annotate with
To be used with --vep_dbnsfp.
This parameter is used to filter/limit outputs to a specific effect of the variant.
The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found here: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html
If one wants to filter using several consequences, then separate those by using '&' (e.g. 'consequence=3_prime_UTR_variant&intron_variant').
Fields to annotate with
To be used with --vep_dbnsfp.
This parameter can be used to retrieve individual values from the dbNSFP file. The values correspond to the names of the columns in the dbNSFP file and are separated by commas.
The column names might differ between the different dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file also contains a short description of the provided values and the versions of the tools used to generate them.
The default values are explained below:
- rs_dbSNP - rs number from dbSNP
- HGVSc_VEP - HGVS coding variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_transcriptid
- HGVSp_VEP - HGVS protein variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_proteinid
- 1000Gp3_EAS_AF - Alternative allele frequency in the 1000Gp3 East Asian descendent samples
- 1000Gp3_AMR_AF - Alternative allele frequency in the 1000Gp3 American descendent samples
- LRT_score - Original LRT two-sided p-value (LRTori), ranges from 0 to 1
- GERP++_RS - Conservation score. The larger the score, the more conserved the site; ranges from -12.3 to 6.17
- gnomAD_exomes_AF - Alternative allele frequency in the whole gnomAD exome samples.
Default:
rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Path to spliceai raw scores snv file.
To be used with --vep_spliceai.
Path to spliceai raw scores snv tabix indexed file.
To be used with --vep_spliceai.
Path to spliceai raw scores indel file.
To be used with --vep_spliceai.
Path to spliceai raw scores indel tabix indexed file.
To be used with --vep_spliceai.
Add an extra custom argument to VEP.
Using this parameter, you can add custom args to VEP.
Default:
--everything --filter_common --per_gene --total_length --offline --format vcf
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
VEP output-file format.
Sets the format of the output-file from VEP.
Default:
vcf
Allowed values:
json, tab, vcf
A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.
Optional text file with list of columns to use from bcftools_annotations, one name per row
Pipeline stage options ¶
Skip the base recalibration steps, i.e. GATK BaseRecalibrator and GATK ApplyBQSR.
This parameter disables the base recalibration step, so an uncalibrated BAM file is used for variant calling.
Skip the process of preparing interval lists for the GATK variant calling step
This parameter disables the preparation of multiple interval lists to use with the HaplotypeCaller module of GATK. It is recommended not to disable this step, as it is required to run variant calling correctly.
Skip variant filtering of GATK
Set this parameter if you don't want to filter any variants.
Skip variant annotation
Set this parameter if you don't want to run variant annotation.
Skip the check of the exon bed
Set this parameter if you don't want the pipeline to check and filter out unknown regions in the exon BED file.
General reference genome options ¶
The base path to the igenomes reference files
Default:
s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config. NB: You can then run the pipeline by specifying at least a FASTA genome file.
Save built references.
Set this parameter if you wish to save all computed reference files. This is useful to avoid re-computation in future runs.
Download annotation cache.
Set this parameter if you wish to download the annotation cache. Using this parameter will download the cache even if --snpeff_cache and --vep_cache are provided.
Reference genome options ¶
Name of iGenomes reference.
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.
See the nf-core website docs for more details.
Default:
GRCh38
Path to FASTA genome file.
This parameter is mandatory if --genome is not specified.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA dictionary file.
NB: If none is provided, it will be generated automatically from the FASTA reference. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA reference index.
NB: If none is provided, it will be generated automatically from the FASTA reference. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to GTF annotation file.
This parameter is mandatory if --genome is not specified.
Path to GFF3 annotation file.
This parameter must be specified if --genome or --gtf are not specified.
Path to BED file containing exon intervals. This will be created from the GTF file if not specified.
Read length
Specify the read length for the STAR aligner.
Default:
150
Path to known indels file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to known indels file index.
NB: If none is provided, it will be generated automatically from the known indels file, if provided. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dbsnp file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dbsnp index.
NB: If none is provided, it will be generated automatically from the dbsnp file. Combine with --save_reference to save it for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
snpEff DB version.
This is used to specify the database to use for annotation.
Alternatively, available database names can be listed with the snpEff databases command.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP genome.
This is used to specify the genome when looking for local cache, or cloud based cache.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP species.
Alternatively species listed in Ensembl Genomes caches can be used.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP cache version.
An alternative cache version can be used to specify the correct Ensembl Genomes version number, as these differ from the corresponding Ensembl/VEP version numbers.
If you use AWS iGenomes, this has already been set for you appropriately.
Type of feature to parse from annotation file
Default:
exon
Allowed values:
exon, transcript, gene
Institutional config options ¶
Base directory for Institutional configs.
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Default:
https://raw.githubusercontent.com/nf-core/configs/master
Generic options ¶
Method used to save pipeline results to output directory.
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Default:
copy
Allowed values:
symlink, rellink, link, copy, copyNoFollow, move
Email address for completion summary.
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
Email address for completion summary, only when pipeline fails.
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
File size limit when attaching MultiQC reports to summary emails.
Default:
25.MB
Incoming hook URL for messaging service
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
Custom MultiQC yaml file containing HTML including a methods description.
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Boolean whether to validate parameters against the schema at runtime
Default:
True
Base URL or local path to location of pipeline test dataset files
Default:
https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/
Base URL or local path to location of pipeline test dataset files
Default:
https://raw.githubusercontent.com/nf-core/test-datasets/rnavar/data/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
Display hidden parameters in the help message (only works when --help or --help_full are provided).
Workflows
This page documents all workflows in the pipeline.
subworkflows/local/annotation_cache_initialisation/main.nf:11

Inputs (take)
| Name | Description |
|---|---|
| snpeff_enabled | - |
| snpeff_cache | - |
| snpeff_db | - |
| vep_enabled | - |
| vep_cache | - |
| vep_species | - |
| vep_cache_version | - |
| vep_genome | - |
| vep_custom_args | - |
| help_message | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/nf-core/bam_markduplicates_picard/main.nf:9

Picard MarkDuplicates, index BAM file and run samtools stats, flagstat and idxstats
Components
picard/markduplicates
samtools/index
samtools/stats
samtools/idxstats
samtools/flagstat
bam_stats_samtools
Inputs (take)
| Name | Description |
|---|---|
| ch_reads | Sequence reads in BAM/CRAM/SAM format. Structure: [ val(meta), path(reads) ] |
| ch_fasta | Reference genome fasta file required for CRAM input. Structure: [ path(fasta) ] |
| ch_fai | Index of the reference genome fasta file. Structure: [ path(fai) ] |
Outputs (emit)
| Name | Description |
|---|---|
| bam | Processed BAM/SAM file. Structure: [ val(meta), path(bam) ] |
| bai | BAM/SAM samtools index. Structure: [ val(meta), path(bai) ] |
| cram | Processed CRAM file. Structure: [ val(meta), path(cram) ] |
| crai | CRAM samtools index. Structure: [ val(meta), path(crai) ] |
| csi | CSI samtools index. Structure: [ val(meta), path(csi) ] |
| stats | File containing samtools stats output. Structure: [ val(meta), path(stats) ] |
| flagstat | File containing samtools flagstat output. Structure: [ val(meta), path(flagstat) ] |
| idxstats | File containing samtools idxstats output. Structure: [ val(meta), path(idxstats) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
subworkflows/nf-core/bam_sort_stats_samtools/main.nf:9

Sort SAM/BAM/CRAM file
Components
samtools/sort
samtools/index
samtools/stats
samtools/idxstats
samtools/flagstat
bam_stats_samtools
Inputs (take)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] |
| bam | BAM/CRAM/SAM file |
| fasta | Reference genome fasta file |
Outputs (emit)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information, e.g. [ id:'test', single_end:false ] |
| bam | Sorted BAM/CRAM/SAM file |
| bai | BAM/CRAM/SAM index file |
| crai | BAM/CRAM/SAM index file |
| stats | File containing samtools stats output |
| flagstat | File containing samtools flagstat output |
| idxstats | File containing samtools idxstats output |
| versions | File containing software versions |
subworkflows/nf-core/bam_stats_samtools/main.nf:9

Produces comprehensive statistics from SAM/BAM/CRAM file
Components
samtools/stats
samtools/idxstats
samtools/flagstat
Inputs (take)
| Name | Description |
|---|---|
| ch_bam_bai | The input channel containing the BAM/CRAM and its index. Structure: [ val(meta), path(bam), path(bai) ] |
| ch_fasta | Reference genome fasta file. Structure: [ path(fasta) ] |
Outputs (emit)
| Name | Description |
|---|---|
| stats | File containing samtools stats output. Structure: [ val(meta), path(stats) ] |
| flagstat | File containing samtools flagstat output. Structure: [ val(meta), path(flagstat) ] |
| idxstats | File containing samtools idxstats output. Structure: [ val(meta), path(idxstats) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
subworkflows/local/download_cache_snpeff_vep/main.nf:14

Inputs (take)
| Name | Description |
|---|---|
| ensemblvep_info | - |
| snpeff_info | - |
Outputs (emit)
| Name | Description |
|---|---|
| ensemblvep_cache | - |
| snpeff_cache | - |
subworkflows/nf-core/fastq_align_star/main.nf:6

Align reads to a reference genome using STAR, then sort with samtools
Components
star/align
samtools/sort
samtools/index
samtools/stats
samtools/idxstats
samtools/flagstat
bam_sort_stats_samtools
Inputs (take)
| Name | Description |
|---|---|
| ch_reads | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. Structure: [ val(meta), [ path(reads) ] ] |
| ch_index | STAR genome index |
| ch_gtf | GTF file used to set the splice junctions with the --sjdbGTFfile flag |
| val_star_ignore_sjdbgtf | If true, the --sjdbGTFfile flag is not set |
| val_seq_platform | Sequencing platform to be added to the BAM header using the --outSAMattrRGline flag |
| val_seq_center | Sequencing center to be added to the BAM header using the --outSAMattrRGline flag |
| ch_fasta | Reference genome fasta file |
| ch_transcripts_fasta | Optional transcriptome fasta file |
Outputs (emit)
| Name | Description |
|---|---|
| orig_bam | Output BAM file containing read alignments. Structure: [ val(meta), path(bam) ] |
| log_final | STAR final log file. Structure: [ val(meta), path(log_final) ] |
| log_out | STAR log out file. Structure: [ val(meta), path(log_out) ] |
| log_progress | STAR log progress file. Structure: [ val(meta), path(log_progress) ] |
| bam_sorted | Sorted BAM file of read alignments (optional). Structure: [ val(meta), path(bam) ] |
| orig_bam_transcript | Output BAM file of transcriptome alignment (optional). Structure: [ val(meta), path(bam) ] |
| fastq | Unmapped FastQ files (optional). Structure: [ val(meta), path(fastq) ] |
| tab | STAR output tab file(s) (optional). Structure: [ val(meta), path(tab) ] |
| bam | BAM file ordered by samtools. Structure: [ val(meta), path(bam) ] |
| bai | BAI index of the ordered BAM file. Structure: [ val(meta), path(bai) ] |
| stats | File containing samtools stats output. Structure: [ val(meta), path(stats) ] |
| flagstat | File containing samtools flagstat output. Structure: [ val(meta), path(flagstat) ] |
| idxstats | File containing samtools idxstats output. Structure: [ val(meta), path(idxstats) ] |
| bam_transcript | Transcriptome-level BAM file ordered by samtools (optional). Structure: [ val(meta), path(bam) ] |
| bai_transcript | Transcriptome-level BAI index of the ordered BAM file (optional). Structure: [ val(meta), path(bai) ] |
| stats_transcript | Transcriptome-level file containing samtools stats output (optional). Structure: [ val(meta), path(stats) ] |
| flagstat_transcript | Transcriptome-level file containing samtools flagstat output (optional). Structure: [ val(meta), path(flagstat) ] |
| idxstats_transcript | Transcriptome-level file containing samtools idxstats output (optional). Structure: [ val(meta), path(idxstats) ] |
| versions | File containing software versions |
main.nf:63

Inputs (take)
| Name | Description |
|---|---|
| samplesheet | - |
| align | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:198

Handle pipeline completion tasks. Executes cleanup and notification tasks when the pipeline finishes:
- Send completion email with run summary
- Generate completion summary to stdout
- Send notifications to messaging platforms (Slack, Teams, etc.)
- Log error messages for failed runs
Inputs (take)
| Name | Description |
|---|---|
| email | - |
| email_on_fail | - |
| plaintext_email | - |
| outdir | - |
| monochrome_logs | - |
| hook_url | - |
| multiqc_report | - |
Outputs (emit)
| Name | Description |
|---|---|
| | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:51

Initialize the nf-core/rnavar pipeline. Performs all setup tasks required before running the main workflow:
- Display version information if requested
- Validate parameters against the schema
- Check Conda channel configuration
- Parse and validate the input samplesheet
- Generate parameter summary for logging
Inputs (take)
| Name | Description |
|---|---|
| version | - |
| validate_params | - |
| nextflow_cli_args | - |
| outdir | - |
| input | - |
| help | - |
| help_full | - |
| show_hidden | - |
Outputs (emit)
| Name | Description |
|---|---|
| samplesheet | - |
| align | - |
| versions | - |
subworkflows/local/prepare_alignment/main.nf:7

Inputs (take)
| Name | Description |
|---|---|
| cram | - |
| bam | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam | - |
| versions | - |
subworkflows/local/prepare_genome/main.nf:22

Inputs (take)
| Name | Description |
|---|---|
| bcftools_annotations | - |
| bcftools_annotations_tbi | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dict | - |
| exon_bed | - |
| fasta | - |
| fasta_fai | - |
| gff | - |
| gtf | - |
| known_indels | - |
| known_indels_tbi | - |
| star_index | - |
| feature_type | - |
| skip_exon_bed_check | - |
| align | - |
Outputs (emit)
| Name | Description |
|---|---|
| bcfann | - |
| bcfann_tbi | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dict | - |
| exon_bed | - |
| fasta | - |
| fasta_fai | - |
| gtf | - |
| known_indels | - |
| known_indels_tbi | - |
| known_sites | - |
| known_sites_tbi | - |
| star_index | - |
| versions | - |
subworkflows/local/recalibrate/main.nf:27

Apply base quality score recalibration (BQSR) to BAM files. This subworkflow applies the BQSR model generated by GATK BaseRecalibrator to adjust base quality scores in BAM files. Recalibrated quality scores improve the accuracy of variant calling by correcting systematic errors in the original quality scores assigned by the sequencing machine. Optionally generates alignment statistics using samtools stats for QC.
Inputs (take)
| Name | Description |
|---|---|
| skip_samtools | - |
| bam | - |
| dict | - |
| fai | - |
| fasta | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam | - |
| qc | - |
| versions | - |
workflows/rnavar.nf:83

Main workflow for RNA variant calling analysis. This workflow performs end-to-end RNA-seq variant calling including:
- Quality control with FastQC
- Read alignment with STAR
- Duplicate marking with Picard
- Split N CIGAR reads for RNA-seq data
- Base quality score recalibration (BQSR)
- Variant calling with GATK HaplotypeCaller
- Variant filtering
- Variant annotation with SnpEff and VEP
- HLA typing with seq2HLA (optional)
The workflow supports multiple input types including FASTQ, BAM, CRAM, and VCF files.
Inputs (take)
| Name | Description |
|---|---|
| input | - |
| bcftools_annotations | - |
| bcftools_annotations_tbi | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dict | - |
| exon_bed | - |
| fasta | - |
| fasta_fai | - |
| gtf | - |
| known_sites | - |
| known_sites_tbi | - |
| star_index | - |
| snpeff_cache | - |
| snpeff_db | - |
| vep_genome | - |
| vep_species | - |
| vep_cache_version | - |
| vep_include_fasta | - |
| vep_cache | - |
| vep_extra_files | - |
| seq_center | - |
| seq_platform | - |
| aligner | - |
| bam_csi_index | - |
| extract_umi | - |
| generate_gvcf | - |
| skip_multiqc | - |
| skip_baserecalibration | - |
| skip_intervallisttools | - |
| skip_variantannotation | - |
| skip_variantfiltration | - |
| star_ignore_sjdbgtf | - |
| tools | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/splitncigar/main.nf:25

Split reads that contain N CIGAR operations for RNA-seq variant calling. This subworkflow handles the GATK SplitNCigarReads step which is essential for RNA-seq variant calling. It splits reads that span introns (N in CIGAR) and reassigns mapping qualities to meet GATK requirements. The workflow processes BAM files in parallel across genomic intervals, then merges and indexes the results for efficient downstream processing.
Inputs (take)
| Name | Description |
|---|---|
| bam | - |
| fasta | - |
| fai | - |
| dict | - |
| intervals | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam_bai | - |
| versions | - |
subworkflows/local/vcf_annotate_all/main.nf:37Annotate variants using multiple annotation tools. This subworkflow provides flexible variant annotation using one or more tools:
- SnpEff: Functional annotation and effect prediction
- VEP (Ensembl Variant Effect Predictor): Comprehensive variant annotation
- BCFtools annotate: Add custom annotations from external files
- Merge: Combined SnpEff + VEP annotation
The tools to use are specified via the tools parameter as a comma-separated list (e.g., "snpeff,vep" or "merge").
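A comma-separated tools value can be parsed and validated along these lines; the helper name and the allowed set are illustrative, not the pipeline's actual code.

```python
VALID_TOOLS = {"snpeff", "vep", "bcftools", "merge"}  # illustrative set

def parse_tools(value):
    """Split a comma-separated tools string into a normalized list,
    rejecting anything outside the allowed set."""
    tools = [t.strip().lower() for t in value.split(",") if t.strip()]
    unknown = [t for t in tools if t not in VALID_TOOLS]
    if unknown:
        raise ValueError(f"Unknown annotation tool(s): {', '.join(unknown)}")
    return tools

print(parse_tools("snpeff,vep"))  # ['snpeff', 'vep']
```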
Inputs (take)
| Name | Description |
|---|---|
| vcf | - |
| fasta | - |
| tools | - |
| snpeff_db | - |
| snpeff_cache | - |
| vep_genome | - |
| vep_species | - |
| vep_cache_version | - |
| vep_cache | - |
| vep_extra_files | - |
| bcftools_annotations | - |
| bcftools_annotations_index | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
| ? | - |
| ? | - |
subworkflows/nf-core/vcf_annotate_ensemblvep/main.nf:8
Perform annotation with ensemblvep and bgzip + tabix index the resulting VCF file
Components
ensemblvep/vep
tabix/tabix
Inputs (take)
| Name | Description |
|---|---|
| ch_vcf | vcf file to annotate. Structure: [ val(meta), path(vcf), [path(custom_file1), path(custom_file2)... (optional)] ] |
| ch_fasta | Reference genome fasta file (optional). Structure: [ val(meta2), path(fasta) ] |
| val_genome | genome to use |
| val_species | species to use |
| val_cache_version | cache version to use |
| ch_cache | the root cache folder for ensemblvep (optional). Structure: [ val(meta3), path(cache) ] |
| ch_extra_files | any extra files needed by plugins for ensemblvep (optional). Structure: [ path(file1), path(file2)... ] |
Outputs (emit)
| Name | Description |
|---|---|
| vcf_tbi | Compressed vcf file + tabix index. Structure: [ val(meta), path(vcf), path(tbi) ] |
| json | json file. Structure: [ val(meta), path(json) ] |
| tab | tab file. Structure: [ val(meta), path(tab) ] |
| reports | html reports |
| versions | File containing software versions |
subworkflows/nf-core/vcf_annotate_snpeff/main.nf:8
Perform annotation with snpEff and bgzip + tabix index the resulting VCF file
Components
snpeff/snpeff
tabix/bgziptabix
Inputs (take)
| Name | Description |
|---|---|
| ch_vcf | vcf file. Structure: [ val(meta), path(vcf) ] |
| val_snpeff_db | db version to use |
| ch_snpeff_cache | path to root cache folder for snpEff (optional). Structure: [ path(cache) ] |
Outputs (emit)
| Name | Description |
|---|---|
| vcf_tbi | Compressed vcf file + tabix index. Structure: [ val(meta), path(vcf), path(tbi) ] |
| reports | html reports. Structure: [ path(html) ] |
| summary | summary report. Structure: [ path(csv) ] |
| genes_txt | genes list. Structure: [ path(txt) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
Processes
This page documents all processes in the pipeline.
modules/nf-core/bcftools/annotate/main.nf:1
Add or remove annotations.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | Query VCF or BCF file, can be either uncompressed or compressed |
| index | file | Index of the query VCF or BCF file |
| annotations | file | Bgzip-compressed file with annotations |
| annotations_index | file | Index of the annotations file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *{ | Compressed annotated VCF file |
| tbi | file | *.tbi | Alternative VCF file index |
| csi | file | *.csi | Default VCF file index |
modules/nf-core/bedtools/merge/main.nf:1
Combines overlapping or "book-ended" features in an interval file into a single feature which spans all of the combined features.
Tools
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bed | file | Input BED file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bed | file | *.{ | Overlapped bed file with combined features |
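The genome arithmetic that bedtools merge performs can be sketched in a few lines; this is an illustration of the merge operation on one chromosome, not bedtools itself.

```python
def merge_intervals(intervals, distance=0):
    """Merge overlapping or book-ended [start, end) intervals on one
    chromosome, as `bedtools merge` does (distance mimics its -d option)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + distance:
            merged[-1][1] = max(merged[-1][1], end)  # extend previous feature
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]

# Overlapping and book-ended features collapse into single spans:
print(merge_intervals([(100, 200), (150, 250), (250, 300), (400, 500)]))
# [(100, 300), (400, 500)]
```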
modules/nf-core/bedtools/sort/main.nf:1
Sorts a feature file by chromosome and other criteria.
Tools
A set of tools for genomic analysis tasks, specifically enabling genome arithmetic (merge, count, complement) on various file types.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| intervals | file | BED/BEDGRAPH |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| sorted | file | *.${ | Sorted output file |
modules/nf-core/cat/fastq/main.nf:1
Concatenates fastq files
Tools
The cat utility reads files sequentially, writing them to the standard output.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files to be concatenated. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| reads | file | *.{ | Merged fastq file |
modules/nf-core/ensemblvep/download/main.nf:1
Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
Tools
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| assembly | string | Genome assembly |
| species | string | Species |
| cache_version | string | cache version |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| cache | file | * | cache |
modules/nf-core/ensemblvep/vep/main.nf:1
Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
Tools
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | vcf to annotate |
| custom_extra_files | file | extra sample-specific files to be used with the |
| meta2 | map | Groovy Map containing fasta reference information e.g. [ id:'test' ] |
| fasta | file | reference FASTA file (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | annotated vcf (optional) |
| tbi | file | *.vcf.gz.tbi | annotated vcf index (optional) |
| tab | file | *.ann.tab.gz | tab file with annotated variants (optional) |
| json | file | *.ann.json.gz | json file with annotated variants (optional) |
| report | string | *.html | VEP report file |
modules/nf-core/fastqc/main.nf:19
Run FastQC on sequenced reads
Code Documentation
Run FastQC quality control on sequencing reads. FastQC provides a comprehensive quality control report for high-throughput sequencing data. It generates an HTML report and a ZIP archive containing detailed metrics including:
- Basic statistics (total sequences, sequence length, GC content)
- Per-base sequence quality scores
- Per-sequence quality scores
- Per-base sequence content
- Sequence duplication levels
- Overrepresented sequences
- Adapter content
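Two of the metrics above are simple enough to sketch directly; the helpers below illustrate GC content and the per-base quality means behind FastQC's plots (they are toy re-implementations, not FastQC code).

```python
def gc_content(seq):
    """Fraction of G/C bases, one of FastQC's basic statistics."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def mean_per_base_quality(quals):
    """Mean Phred quality at each read position across a batch of reads
    (FastQC's per-base quality plot summarizes these per position).
    Qualities are Phred+33 encoded ASCII strings of equal length."""
    n = len(quals)
    return [sum(ord(q[i]) - 33 for q in quals) / n
            for i in range(len(quals[0]))]

print(gc_content("ACGT"))                   # 0.5
print(mean_per_base_quality(["II", "I5"]))  # [40.0, 30.0]  ('I' = Q40, '5' = Q20)
```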
Tools
FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T).
You get information about adapter contamination and other overrepresented sequences.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| html | file | *_{ | FastQC report |
| zip | file | *_{ | FastQC report archive |
modules/nf-core/gatk4/applybqsr/main.nf:1
Apply base quality score recalibration (BQSR) to a bam file
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| bqsr_table | file | Recalibration table from gatk4_baserecalibrator |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | ${ | Recalibrated BAM file |
| bai | file | ${ | Recalibrated BAM index file |
| cram | file | ${ | Recalibrated CRAM file |
modules/nf-core/gatk4/baserecalibrator/main.nf:1
Generate recalibration table for Base Quality Score Recalibration (BQSR)
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | GATK sequence dictionary |
| meta5 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| known_sites | file | VCF files with known sites for indels / snps |
| meta6 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| known_sites_tbi | file | Tabix index of the known_sites |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| table | file | *.{ | Recalibration table from BaseRecalibrator |
modules/nf-core/gatk4/bedtointervallist/main.nf:1
Creates an interval list from a bed file and a reference dict
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| bed | file | Input bed file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | Sequence dictionary |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| interval_list | file | *.interval_list | gatk interval list file |
modules/nf-core/gatk4/combinegvcfs/main.nf:1
Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
Tools
Genome Analysis Toolkit (GATK4). Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | file | Compressed VCF files |
| vcf_idx | file | VCF Index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| combined_gvcf | file | *.combined.g.vcf.gz | Compressed Combined GVCF file |
modules/nf-core/gatk4/createsequencedictionary/main.nf:1
Creates a sequence dictionary for a reference sequence
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Input fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| dict | file | *.{ | gatk dictionary file |
modules/nf-core/gatk4/haplotypecaller/main.nf:25
Call germline SNPs and indels via local re-assembly of haplotypes
Code Documentation
Call germline SNPs and indels using GATK HaplotypeCaller. HaplotypeCaller is GATK's flagship variant caller, performing local de-novo assembly of haplotypes in regions showing variation. It can produce either standard VCF output or GVCF output for joint calling. Key features:
- Local re-assembly for accurate indel calling
- Population-aware calling using dbSNP
- Support for GVCF output mode for cohort analysis
- DRAGstr model support for improved STR calling

For RNA-seq data, this should be run after SplitNCigarReads processing.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| dragstr_model | file | Text file containing the DragSTR model of the used BAM/CRAM file (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| dict | file | GATK sequence dictionary |
| meta5 | map | Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ] |
| dbsnp | file | VCF file containing known sites (optional) |
| meta6 | map | Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ] |
| dbsnp_tbi | file | VCF index of dbsnp (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | Compressed VCF file |
| tbi | file | *.vcf.gz.tbi | Index of VCF file |
| bam | file | *.realigned.bam | Assembled haplotypes and locally realigned reads |
modules/nf-core/gatk4/indexfeaturefile/main.nf:1
Creates an index for a feature file, e.g. VCF or BED file.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| feature_file | file | VCF/BED file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | file | *.{ | Index for VCF/BED file |
modules/nf-core/gatk4/intervallisttools/main.nf:1
Splits the interval list file into unique, equally-sized interval files and places them under a directory
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| intervals | file | Interval file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| interval_list | file | *.interval_list | Interval list files |
modules/nf-core/gatk4/mergevcfs/main.nf:1
Merges several vcf files
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | list | Two or more VCF files |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | Optional Sequence Dictionary as input |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | merged vcf file |
| tbi | file | *.tbi | index files for the merged vcf files |
modules/nf-core/gatk4/splitncigarreads/main.nf:1
Splits reads that contain Ns in their cigar string
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| bam | list | BAM/SAM/CRAM file containing reads |
| bai | list | BAI/SAI/CRAI index file (optional) |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'reference' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'reference' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'reference' ] |
| dict | file | GATK sequence dictionary |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | Output file with split reads (BAM/SAM/CRAM) |
modules/nf-core/gatk4/variantfiltration/main.nf:1
Filter variants
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | list | List of VCF(.gz) files |
| tbi | list | List of VCF file indexes |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Fasta file of reference genome |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | Sequence dictionary of fasta file |
| meta5 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| gzi | file | Genome index file, only needed when the genome file was compressed with the BGZF algorithm. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | Compressed VCF file |
| tbi | file | *.vcf.gz.tbi | Index of VCF file |
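VariantFiltration flags records that fail filter expressions rather than removing them. The sketch below mimics that flagging behaviour in plain Python; the filter names and thresholds are illustrative, not the pipeline's configured values.

```python
def apply_hard_filters(record, filters):
    """Flag a variant record with the names of any failing filters,
    as GATK VariantFiltration does (records are annotated, not removed).
    `record` maps INFO field names to values; `filters` maps a filter
    name to a predicate that returns True when the record FAILS."""
    failed = [name for name, fails in filters.items() if fails(record)]
    return ";".join(failed) if failed else "PASS"

# Example hard filters on strand bias and quality-by-depth
# (thresholds are illustrative only):
filters = {
    "FS": lambda r: r.get("FS", 0) > 30.0,   # Fisher strand bias
    "QD": lambda r: r.get("QD", 100) < 2.0,  # quality normalized by depth
}

print(apply_hard_filters({"FS": 45.2, "QD": 1.5}, filters))   # FS;QD
print(apply_hard_filters({"FS": 3.0, "QD": 20.0}, filters))   # PASS
```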
modules/nf-core/gffread/main.nf:1
Validate, filter, convert and perform various other operations on GFF files
Tools
GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing meta data e.g. [ id:'test' ] |
| gff | file | A reference file in either the GFF3, GFF2 or GTF format. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gtf | file | *.{ | GTF file resulting from the conversion of the GFF input file if '-T' argument is present |
| gffread_gff | file | *.gff3 | GFF3 file resulting from the conversion of the GFF input file if '-T' argument is absent |
| gffread_fasta | file | *.fasta | Fasta file produced when either of '-w', '-x', '-y' parameters is present |
modules/local/gtf2bed/main.nf:13
Convert GTF annotation file to BED format. Extracts genomic features (exons, transcripts, or genes) from a GTF file and outputs them in BED format for use with interval-based tools. The output BED file uses 0-based coordinates (BED standard) converted from the 1-based GTF coordinates.
Inputs
| Name | Type | Description |
|---|---|---|
| val(meta) | tuple | - |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bed | - |
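The 1-based to 0-based coordinate shift mentioned in the description amounts to subtracting one from the start field only, since GTF is 1-based inclusive and BED is 0-based half-open. A minimal sketch (field order follows the GTF specification; the helper and the example line are illustrative, not the module's script):

```python
def gtf_line_to_bed(line, feature="exon"):
    """Convert one GTF line to a BED interval, or None if it is a
    comment or a different feature type. GTF is 1-based inclusive;
    BED is 0-based half-open, so only the start coordinate shifts."""
    if line.startswith("#"):
        return None
    chrom, _source, ftype, start, end, _score, strand, *_ = line.split("\t")
    if ftype != feature:
        return None
    return (chrom, int(start) - 1, int(end), strand)

# Hypothetical GTF exon line (tab-separated fields):
gtf = "chr1\thavana\texon\t11869\t12227\t.\t+\t.\tgene_id \"G1\";"
print(gtf_line_to_bed(gtf))  # ('chr1', 11868, 12227, '+')
```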
modules/nf-core/gunzip/main.nf:16
Compresses and decompresses files.
Tools
gzip is a file format and a software application used for file compression and decompression.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Optional groovy Map containing meta information e.g. [ id:'test', single_end:false ] |
| archive | file | File to be compressed/uncompressed |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gunzip | file | *.* | Compressed/uncompressed file |
modules/nf-core/mosdepth/main.nf:1
Calculates genome-wide sequencing coverage.
Tools
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Input BAM/CRAM file |
| bai | file | Index for BAM/CRAM file |
| bed | file | BED file with intersected intervals |
| meta2 | map | Groovy Map containing bed information e.g. [ id:'test' ] |
| fasta | file | Reference genome FASTA file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| global_txt | file | *.{ | Text file with global cumulative coverage distribution |
| summary_txt | file | *.{ | Text file with summary mean depths per chromosome and regions |
| regions_txt | file | *.{ | Text file with region cumulative coverage distribution |
| per_base_d4 | file | *.{ | D4 file with per-base coverage |
| per_base_bed | file | *.{ | BED file with per-base coverage |
| per_base_csi | file | *.{ | Index file for BED file with per-base coverage |
| regions_bed | file | *.{ | BED file with per-region coverage |
| regions_csi | file | *.{ | Index file for BED file with per-region coverage |
| quantized_bed | file | *.{ | BED file with binned coverage |
| quantized_csi | file | *.{ | Index file for BED file with binned coverage |
| thresholds_bed | file | *.{ | BED file with the number of bases in each region that are covered at or above each threshold |
| thresholds_csi | file | *.{ | Index file for BED file with threshold coverage |
modules/nf-core/multiqc/main.nf:21
Aggregate results from bioinformatics analyses across many samples into a single report
Code Documentation
Aggregate results from multiple analysis tools into a single report. MultiQC searches a given directory for analysis logs and compiles them into a single HTML report. It supports output from many common bioinformatics tools including FastQC, STAR, Picard, GATK, and more. The report provides:
- Summary statistics across all samples
- Interactive plots for QC metrics
- Data tables for detailed metrics
- Export functionality for plots and data
Tools
MultiQC searches a given directory for analysis logs and compiles a HTML report. It's a general use tool, perfect for summarising the output from numerous bioinformatics tools.
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| report | - | - | - |
| data | - | - | - |
| plots | - | - | - |
modules/nf-core/picard/markduplicates/main.nf:1
Locate and tag duplicate reads in a BAM file
Tools
A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | Sequence reads file, can be SAM/BAM/CRAM format |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference genome fasta file, required for CRAM input |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Reference genome fasta index |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | BAM file with duplicate reads marked/removed |
| bai | file | *.{ | An optional BAM index file. If desired, --CREATE_INDEX must be passed as a flag |
| cram | file | *.{ | Output CRAM file |
| metrics | file | *.{ | Duplicate metrics file generated by picard |
modules/local/remove_unknown_regions/main.nf:1
Inputs
| Name | Type | Description |
|---|---|---|
| val(meta) | tuple | - |
| val(meta2) | tuple | - |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bed | - |
modules/nf-core/samtools/convert/main.nf:1
Convert and then index CRAM -> BAM or BAM -> CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file |
| index | file | BAM/CRAM index file |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference file to create the CRAM file |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fai | file | Reference index file to create the CRAM file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *{ | filtered/converted BAM file |
| cram | file | *{ | filtered/converted CRAM file |
| bai | file | *{ | filtered/converted BAM index |
| crai | file | *{ | filtered/converted CRAM index |
modules/nf-core/samtools/faidx/main.nf:1
Index FASTA file, and optionally generate a file of chromosome sizes
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta | file | FASTA file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fai | file | FASTA index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| fa | file | *.{ | FASTA file |
| sizes | file | *.{ | File containing chromosome lengths |
| fai | file | *.{ | FASTA index file |
| gzi | file | *.gzi | Optional gzip index file for compressed inputs |
modules/nf-core/samtools/flagstat/main.nf:1
Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | BAM/CRAM/SAM file |
| bai | file | Index for BAM/CRAM/SAM file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| flagstat | file | *.{ | File containing samtools flagstat output |
modules/nf-core/samtools/idxstats/main.nf:1
Reports alignment summary statistics for a BAM/CRAM/SAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | BAM/CRAM/SAM file |
| bai | file | Index for BAM/CRAM/SAM file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| idxstats | file | *.{ | File containing samtools idxstats output |
modules/nf-core/samtools/index/main.nf:1
Index SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | input file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bai | file | *.{ | BAM/CRAM/SAM index file |
| csi | file | *.{ | CSI index file |
| crai | file | *.{ | BAM/CRAM/SAM index file |
modules/nf-core/samtools/merge/main.nf:1
Merge BAM or CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input_files | file | BAM/CRAM file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference file the CRAM was created with (optional) |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of the reference file the CRAM was created with (optional) |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| gzi | file | Index of the compressed reference file the CRAM was created with (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | BAM file |
| cram | file | *.{ | CRAM file |
| csi | file | *.csi | BAM index file (optional) |
| crai | file | *.crai | CRAM index file (optional) |
modules/nf-core/samtools/sort/main.nf:1
Sort SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | BAM/CRAM/SAM file(s) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference genome FASTA file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.{ | Sorted BAM file |
| cram | file | *.{ | Sorted CRAM file |
| sam | file | *.{ | Sorted SAM file |
| crai | file | *.crai | CRAM index file (optional) |
| csi | file | *.csi | BAM index file (optional) |
| bai | file | *.bai | BAM index file (optional) |
modules/nf-core/samtools/stats/main.nf:1
Produces comprehensive statistics from SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
meta
|
map
|
Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
input
|
file
|
BAM/CRAM file from alignment |
input_index
|
file
|
BAI/CRAI file from alignment |
meta2
|
map
|
Groovy Map containing reference information e.g. [ id:'genome' ] |
fasta
|
file
|
Reference file the CRAM was created with (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| stats | file | `*.{` | File containing samtools stats output |
modules/nf-core/seq2hla/main.nf:20
Precision HLA typing and expression from RNA-seq data using seq2HLA
Code Documentation
Perform HLA typing from RNA-seq data using seq2HLA. seq2HLA determines HLA class I and class II genotypes from RNA-seq reads by mapping to a reference database of HLA alleles. It provides:
- 2-digit resolution typing (e.g., HLA-A*02)
- 4-digit resolution typing (e.g., HLA-A*02:01)
- Expression levels of HLA alleles
- Ambiguity reports when alleles cannot be distinguished

Supports both classical HLA genes (HLA-A, -B, -C, -DRB1, -DQB1, -DQA1) and non-classical genes. Requires paired-end RNA-seq reads as input.
Tools
Precision HLA typing and expression from next-generation RNA sequencing data
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information |
| reads | file | Paired-end FASTQ files for RNA-seq data |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| class1_genotype_2d | file | `*ClassI-` | HLA Class I 2-digit genotype results |
| class2_genotype_2d | file | `*ClassII.HLAgenotype2digits` | HLA Class II 2-digit genotype results |
| class1_genotype_4d | file | `*ClassI-` | HLA Class I 4-digit genotype results |
| class2_genotype_4d | file | `*ClassII.HLAgenotype4digits` | HLA Class II 4-digit genotype results |
| class1_bowtielog | file | `*ClassI-` | HLA Class I Bowtie alignment log |
| class2_bowtielog | file | `*ClassII.bowtielog` | HLA Class II Bowtie alignment log |
| class1_expression | file | `*ClassI-` | HLA Class I expression results |
| class2_expression | file | `*ClassII.expression` | HLA Class II expression results |
| class1_nonclass_genotype_2d | file | `*ClassI-` | HLA Class I non-classical 2-digit genotype results |
| ambiguity | file | `*.ambiguity` | HLA typing ambiguity results |
| class1_nonclass_genotype_4d | file | `*ClassI-` | HLA Class I non-classical 4-digit genotype results |
| class1_nonclass_bowtielog | file | `*ClassI-` | HLA Class I non-classical Bowtie alignment log |
| class1_nonclass_expression | file | `*ClassI-` | HLA Class I non-classical expression results |
modules/nf-core/snpeff/download/main.nf:1
Genetic variant annotation and functional effect prediction toolbox
Tools
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| snpeff_db | string | SnpEff database name |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| cache | file | - | snpEff cache |
modules/nf-core/snpeff/snpeff/main.nf:1
Genetic variant annotation and functional effect prediction toolbox
Tools
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | VCF to annotate |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cache | file | Path to snpEff cache (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | `*.ann.vcf` | Annotated VCF |
| report | string | `*.csv` | snpEff report CSV file |
| summary_html | string | `*.html` | snpEff summary statistics in HTML file |
| genes_txt | string | `*.genes.txt` | TXT (tab-separated) file with counts of the number of variants affecting each transcript and gene |
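Unlike most modules on this page, snpEff takes the database name as a plain string rather than a channel, alongside an optional cache tuple. A minimal sketch (the include path, database name, and file names are illustrative):

```nextflow
include { SNPEFF_SNPEFF } from './modules/nf-core/snpeff/snpeff/main'

workflow {
    // [ meta, vcf ]: the variants to annotate
    ch_vcf = Channel.of([ [ id:'test', single_end:false ], file('variants.vcf.gz') ])
    // Database name is a value input; 'GRCh38.105' is a placeholder example
    snpeff_db = 'GRCh38.105'
    // Optional pre-downloaded cache as a [ meta2, path ] tuple
    ch_cache = Channel.of([ [ id:'test' ], file('snpeff_cache') ])

    SNPEFF_SNPEFF ( ch_vcf, snpeff_db, ch_cache )

    SNPEFF_SNPEFF.out.vcf.view()   // annotated *.ann.vcf
}
```

The cache can be produced ahead of time with the `snpeff/download` module documented above, which avoids re-downloading the database on every run.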
modules/nf-core/star/align/main.nf:1
Align reads to a reference genome using STAR
Tools
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| index | directory | STAR genome index |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| gtf | file | Annotation GTF file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| log_final | file | `*Log.final.out` | STAR final log file |
| log_out | file | `*Log.out` | STAR log out file |
| log_progress | file | `*Log.progress.out` | STAR log progress file |
| bam | file | `*.{` | Output BAM file containing read alignments |
| bam_sorted | file | `*sortedByCoord.out.bam` | Sorted BAM file of read alignments (optional) |
| bam_sorted_aligned | file | `*.Aligned.sortedByCoord.out.bam` | Sorted BAM file of read alignments (optional) |
| bam_transcript | file | `*toTranscriptome.out.bam` | Output BAM file of transcriptome alignment (optional) |
| bam_unsorted | file | `*Aligned.unsort.out.bam` | Unsorted BAM file of read alignments (optional) |
| fastq | file | `*fastq.gz` | Unmapped FastQ files (optional) |
| tab | file | `*.tab` | STAR output tab file(s) (optional) |
| spl_junc_tab | file | `*.SJ.out.tab` | STAR output splice junction tab file |
| read_per_gene_tab | file | `*.ReadsPerGene.out.tab` | STAR output read per gene tab file |
| junction | file | `*.out.junction` | STAR chimeric junction output file (optional) |
| sam | file | `*.out.sam` | STAR output SAM file(s) (optional) |
| wig | file | `*.wig` | STAR output wiggle format file(s) (optional) |
| bedgraph | file | `*.bg` | STAR output bedGraph format file(s) (optional) |
modules/nf-core/star/genomegenerate/main.nf:1
Create index for STAR
Tools
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Fasta file of the reference genome |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| gtf | file | GTF file of the reference genome |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | directory | `star` | Folder containing the star index files |
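The two STAR modules above are naturally chained: the index directory emitted by `star/genomegenerate` feeds the `index` input of `star/align`. A minimal sketch (include paths and file names are illustrative; only the channel inputs documented in the tables above are shown — the installed module may also take additional value inputs):

```nextflow
include { STAR_GENOMEGENERATE } from './modules/nf-core/star/genomegenerate/main'
include { STAR_ALIGN          } from './modules/nf-core/star/align/main'

workflow {
    ch_fasta = Channel.of([ [ id:'genome' ], file('genome.fasta') ])
    ch_gtf   = Channel.of([ [ id:'genome' ], file('genome.gtf') ])

    // Build the index once, then reuse it for every sample
    STAR_GENOMEGENERATE ( ch_fasta, ch_gtf )

    // Paired-end reads: the reads element is a two-file list
    ch_reads = Channel.of([ [ id:'test', single_end:false ],
                            [ file('test_1.fastq.gz'), file('test_2.fastq.gz') ] ])

    STAR_ALIGN ( ch_reads, STAR_GENOMEGENERATE.out.index.collect(), ch_gtf.collect() )

    STAR_ALIGN.out.log_final.view()
}
```

`collect()` turns the single-element index and GTF channels into reusable value channels, so the alignment runs once per sample rather than consuming the index after the first sample.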
modules/nf-core/star/indexversion/main.nf:1
Get the minimal allowed index version from STAR
Tools
STAR is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index_version | - | - | - |
modules/nf-core/tabix/bgziptabix/main.nf:1
bgzip a sorted tab-delimited genome file and then create tabix index
Tools
Generic indexer for TAB-delimited genome position files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | Sorted tab-delimited genome file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gz_index | file | `*.gz,` | bgzipped tab-delimited genome file together with its Tabix index file (either tbi or csi) |
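Because `bgziptabix` compresses and indexes in one step, it takes a single input channel and emits the compressed file and its index together. A minimal sketch, using the output name listed above (include path and file name are illustrative):

```nextflow
include { TABIX_BGZIPTABIX } from './modules/nf-core/tabix/bgziptabix/main'

workflow {
    // A sorted, uncompressed tab-delimited file (e.g. a BED of target regions)
    ch_input = Channel.of([ [ id:'test', single_end:false ], file('regions.sorted.bed') ])

    TABIX_BGZIPTABIX ( ch_input )

    // gz_index emits the bgzipped file and its index in one tuple
    TABIX_BGZIPTABIX.out.gz_index.view()
}
```

If the file is already bgzipped, use the `tabix/tabix` module documented below instead, which only builds the index.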
modules/nf-core/tabix/tabix/main.nf:1
Create tabix index from a sorted bgzip tab-delimited genome file
Tools
Generic indexer for TAB-delimited genome position files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| tab | file | TAB-delimited genome position file compressed with bgzip |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | file | `*.{` | Tabix index file (either tbi or csi) |
modules/nf-core/umitools/extract/main.nf:1
Extracts UMI barcode from a read and adds it to the read name, leaving any sample barcode in place
Tools
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | list | List of input FASTQ files whose UMIs will be extracted. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| reads | file | `*.{` | Extracted FASTQ files. For single-end reads the pattern is `${prefix}.umi_extract.fastq.gz`; for paired-end reads it is `${prefix}.umi_extract_{1,2}.fastq.gz`. |
| log | file | `*.{` | Logfile for umi_tools |
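The `reads` input is a list, so single-end and paired-end samples share one channel shape. A minimal paired-end sketch (include path and file names are illustrative):

```nextflow
include { UMITOOLS_EXTRACT } from './modules/nf-core/umitools/extract/main'

workflow {
    // For paired-end data the reads element is a two-file list;
    // for single-end data it would be a single-file list
    ch_reads = Channel.of([ [ id:'test', single_end:false ],
                            [ file('test_1.fastq.gz'), file('test_2.fastq.gz') ] ])

    UMITOOLS_EXTRACT ( ch_reads )

    UMITOOLS_EXTRACT.out.reads.view()   // UMI-trimmed FASTQ files
    UMITOOLS_EXTRACT.out.log.view()
}
```

In this pipeline the extracted reads would then flow into alignment, with the UMI carried in the read name for downstream deduplication.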
modules/nf-core/untar/main.nf:1
Extract files from tar, tar.gz, tar.bz2, tar.xz archives
Tools
Inputs
| Name | Type | Description |
|---|---|---|
meta
|
map
|
Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
archive
|
file
|
File to be untarred |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| untar | map | `*/` | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
Functions
This page documents helper functions defined in the pipeline.
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:288
Validate samples after grouping by sample ID. Performs consistency checks on grouped sample data:
- Ensures only one BAM/CRAM file per sample
- Prevents mixing of FASTQ and BAM/CRAM inputs
- Validates consistent single-end/paired-end status
- Properly interleaves paired-end FASTQ files
Parameters
| Name | Description | Default |
|---|---|---|
| input | - | - |
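The consistency checks listed above can be sketched in Nextflow-flavoured Groovy. This is a hypothetical simplification for illustration — the function name, row keys, and structure are assumptions, not the pipeline's actual implementation:

```groovy
// Hypothetical sketch of the per-sample consistency checks
// (uses Nextflow's error() helper; names and keys are illustrative)
def validateGroupedSample(String sampleId, List<Map> rows) {
    def bams   = rows.findAll { it.bam || it.cram }
    def fastqs = rows.findAll { it.fastq_1 }

    // Only one BAM/CRAM file per sample
    if (bams.size() > 1)
        error "Sample ${sampleId}: only one BAM/CRAM file is allowed per sample"

    // No mixing of FASTQ and BAM/CRAM inputs
    if (bams && fastqs)
        error "Sample ${sampleId}: cannot mix FASTQ and BAM/CRAM inputs"

    // Consistent single-end/paired-end status across all runs of the sample
    def endedness = fastqs.collect { it.fastq_2 ? 'paired' : 'single' }.unique()
    if (endedness.size() > 1)
        error "Sample ${sampleId}: mixed single-end and paired-end runs"
}
```

Paired-end interleaving (the last bullet above) would then flatten each validated row's `fastq_1`/`fastq_2` pair into the ordered read list the aligner expects.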
subworkflows/local/prepare_genome/main.nf:299
Parameters
| Name | Description | Default |
|---|---|---|
| version | - | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:343Check if the specified genome exists in the configuration. Throws an error with a helpful message listing available genomes if the specified genome key is not found in the config.
subworkflows/local/annotation_cache_initialisation/main.nf:70
Parameters
| Name | Description | Default |
|---|---|---|
| cache_url | - | - |
subworkflows/local/prepare_genome/main.nf:263
Parameters
| Name | Description | Default |
|---|---|---|
| index_version | - | - |
| minimal_index_version | - | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:379
Parameters
| Name | Description | Default |
|---|---|---|
| mqc_methods_yaml | - | - |
main.nf:327
Parameters
| Name | Description | Default |
|---|---|---|
| summary_params | - | - |
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:367
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:353
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:251
Validate pipeline input parameters. Checks that all required parameters are provided and valid. Currently validates that the specified genome exists in the config.
subworkflows/local/utils_nfcore_rnavar_pipeline/main.nf:264
Validate and parse input samplesheet entries. Ensures that multiple runs of the same sample have consistent sequencing type (all single-end or all paired-end).
Parameters
| Name | Description | Default |
|---|---|---|
| input | - | - |