nf-core/rnavar

Version: 1.3.0dev · GATK4 RNA variant calling pipeline

Inputs

Input/output options

Name Description Type Default Required
--input Path to comma-separated file containing information about the samples in the experiment. string n/a yes
--outdir The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure. string n/a yes
--tools Specify which additional tools RNAvar should use. Values can be 'seq2hla', 'bcfann', 'snpeff', 'vep' or 'merge'. If you specify 'merge', the pipeline runs both snpeff and VEP annotation. string n/a no
--save_merged_fastq Save FastQ files after merging re-sequenced libraries in the results directory. boolean n/a no
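
As a rough illustration of how the options in this table combine on the command line, the Python sketch below assembles a `nextflow run` invocation from a dictionary of parameters. The `build_command` helper is hypothetical and not part of the pipeline; the samplesheet path, the output directory and the `docker` profile are placeholder assumptions.

```python
import shlex


def build_command(params, pipeline="nf-core/rnavar", profile="docker"):
    """Assemble a `nextflow run` argument list from a dict of pipeline parameters.

    Boolean True values become bare flags, False/None values are skipped,
    and everything else is passed as `--name value`.
    """
    cmd = ["nextflow", "run", pipeline, "-profile", profile]
    for name, value in params.items():
        if value is None or value is False:
            continue
        cmd.append(f"--{name}")
        if value is not True:
            cmd.append(str(value))
    return cmd


# Placeholder values (assumptions); replace with real paths.
params = {
    "input": "samplesheet.csv",
    "outdir": "/data/results",
    "tools": "merge",
    "save_merged_fastq": True,
}

# Print a shell-quoted command line that could be pasted into a terminal.
print(shlex.join(build_command(params)))
```

Only `--input` and `--outdir` are marked as required in the table above; the other two entries are included to show how string and boolean options are rendered.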

Preprocessing of alignment

Name Description Type Default Required
--extract_umi Specify whether to remove UMIs from the reads with UMI-tools extract. boolean n/a no
--umitools_extract_method UMI pattern to use. Can be either 'string' (default) or 'regex'. string string no
--umitools_bc_pattern The UMI barcode pattern to use e.g. 'NNNNNN' indicates that the first 6 nucleotides of the read are from the UMI. string n/a no
--umitools_bc_pattern2 The UMI barcode pattern to use if the UMI is located in read 2. string n/a no
--umitools_umi_separator The character that separates the UMI in the read name. Most likely a colon if you skipped the extraction with UMI-tools and used other software. string n/a no
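
To make the barcode pattern concrete, here is a minimal Python sketch of what a 'string' pattern such as 'NNNNNN' implies: the leading bases are taken as the UMI and moved into the read name. It is an illustration only, not the UMI-tools implementation. The helper name `extract_umi`, the underscore separator and the example read are assumptions, and the sketch handles only patterns made entirely of `N`.

```python
def extract_umi(read_name, sequence, quality, bc_pattern="NNNNNN", separator="_"):
    """Move the leading UMI bases from the read into the read name.

    Only patterns made entirely of 'N' are handled in this sketch.
    """
    if set(bc_pattern) != {"N"}:
        raise ValueError("this sketch only supports patterns consisting of 'N'")
    umi_len = len(bc_pattern)
    if len(sequence) < umi_len:
        raise ValueError("read is shorter than the barcode pattern")
    umi = sequence[:umi_len]
    # Append the UMI to the first whitespace-delimited token of the read name.
    name, _, comment = read_name.partition(" ")
    new_name = f"{name}{separator}{umi}" + (f" {comment}" if comment else "")
    # Trim the UMI bases and their qualities from the read itself.
    return new_name, sequence[umi_len:], quality[umi_len:]


# Hypothetical read used purely for illustration.
name, seq, qual = extract_umi("read1 1:N:0", "ACGTTAGGCCTA", "IIIIIIFFFFFF")
print(name, seq, qual)
```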

Alignment options

Name Description Type Default Required
--aligner Specifies the alignment algorithm to use. string star yes
--star_index Path to STAR index folder or compressed file (tar.gz) string n/a no
--star_twopass Enable STAR 2-pass mapping mode. boolean True no
--star_ignore_sjdbgtf Do not use GTF file during STAR index building step boolean n/a no
--star_max_memory_bamsort Option to limit RAM when sorting BAM file. Value to be specified in bytes. If 0, will be set to the genome index size. integer 0 no
--star_bins_bamsort Specifies the number of genome bins for coordinate-sorting integer 50 no
--star_max_collapsed_junc Specifies the maximum number of collapsed junctions integer 1000000 no
--star_max_intron_size Specifies the maximum intron size integer n/a no
--seq_center Sequencing center information to be added to read group of BAM files. string n/a no
--seq_platform Specify the sequencing platform used string illumina yes
--save_unaligned Where possible, save unaligned reads from aligner to the results directory. boolean n/a no
--save_align_intermeds Save the intermediate BAM files from the alignment step. boolean n/a no
--bam_csi_index Create a CSI index for BAM files instead of the traditional BAI index. This will be required for genomes with larger chromosome sizes. boolean n/a no

Postprocessing of alignment

Name Description Type Default Required
--remove_duplicates Specify whether to remove duplicates from the BAM during Picard MarkDuplicates step. boolean n/a no

Variant calling

Name Description Type Default Required
--gatk_hc_call_conf The minimum phred-scaled confidence threshold at which variants should be called. integer 20 no
--generate_gvcf Enable generation of per-sample GVCFs in addition to the VCFs. boolean n/a no
--gatk_interval_scatter_count Number of parts into which the gene interval list is split so that GATK HaplotypeCaller can run in parallel. integer 25 no
--no_intervals Do not use gene interval file during variant calling boolean n/a no
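
For readers unfamiliar with phred scaling, the short Python sketch below converts between a phred-scaled value and the corresponding error probability, using the standard relation Q = -10 * log10(P). The function names are illustrative and not part of GATK or the pipeline.

```python
import math


def phred_to_error_prob(q):
    """Convert a phred-scaled value Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability P into a phred-scaled value Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# The default --gatk_hc_call_conf of 20 corresponds to this error probability.
print(phred_to_error_prob(20))
```

Under this relation, raising the threshold by 10 makes the accepted error probability ten times smaller.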

Variant filtering

Name Description Type Default Required
--gatk_vf_qd_filter Value to be used for the QualByDepth (QD) filter number 2 no
--gatk_vf_fs_filter Value to be used for the FisherStrand (FS) filter number 30 no
--gatk_vf_window_size The window size (in bases) in which to evaluate clustered SNPs. integer 35 no
--gatk_vf_cluster_size The number of SNPs which make up a cluster. Must be at least 2. integer 3 no
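
The four filter parameters can be pictured with the following Python sketch. It assumes the convention used in the GATK RNA-seq best practices, where variants with QD below the threshold or FS above the threshold are flagged, and it assumes that a cluster is any run of `cluster_size` consecutive SNPs on one chromosome whose positions span at most `window` bases. This is not GATK's implementation: the exact boundary handling, the label names and the `label_variants` helper are assumptions, and every input record is treated as a SNP.

```python
def label_variants(variants, qd_min=2.0, fs_max=30.0, window=35, cluster_size=3):
    """Return a list of filter-label sets, one per variant.

    `variants` is a list of dicts with keys 'chrom', 'pos', 'qd' and 'fs',
    sorted by chromosome and position. All records are treated as SNPs.
    """
    labels = [set() for _ in variants]
    for i, v in enumerate(variants):
        if v["qd"] < qd_min:
            labels[i].add("QD")
        if v["fs"] > fs_max:
            labels[i].add("FS")
    # Slide over every run of `cluster_size` consecutive variants.
    for i in range(len(variants) - cluster_size + 1):
        first, last = variants[i], variants[i + cluster_size - 1]
        same_chrom = first["chrom"] == last["chrom"]
        if same_chrom and last["pos"] - first["pos"] <= window:
            for j in range(i, i + cluster_size):
                labels[j].add("SnpCluster")
    return labels


# Made-up records for illustration only.
variants = [
    {"chrom": "chr1", "pos": 100, "qd": 10.0, "fs": 1.0},
    {"chrom": "chr1", "pos": 110, "qd": 1.5, "fs": 1.0},
    {"chrom": "chr1", "pos": 130, "qd": 12.0, "fs": 45.0},
    {"chrom": "chr1", "pos": 500, "qd": 12.0, "fs": 2.0},
]
for variant, label in zip(variants, label_variants(variants)):
    print(variant["pos"], sorted(label) or ["PASS"])
```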

Variant Annotation

Name Description Type Default Required
--vep_cache Path to VEP cache. string s3://annotation-cache/vep_cache/ no
--snpeff_cache Path to snpEff cache. string s3://annotation-cache/snpeff_cache/ no
--vep_include_fasta Allow usage of fasta file for annotation with VEP boolean n/a no
--vep_dbnsfp Enable the use of the VEP dbNSFP plugin. boolean n/a no
--dbnsfp Path to dbNSFP processed file. string n/a no
--dbnsfp_tbi Path to dbNSFP tabix indexed file. string n/a no
--dbnsfp_consequence Consequence to annotate with string n/a no
--dbnsfp_fields Fields to annotate with string rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF no
--vep_loftee Enable the use of the VEP LOFTEE plugin. boolean n/a no
--vep_spliceai Enable the use of the VEP SpliceAI plugin. boolean n/a no
--spliceai_snv Path to spliceai raw scores snv file. string n/a no
--spliceai_snv_tbi Path to spliceai raw scores snv tabix indexed file. string n/a no
--spliceai_indel Path to spliceai raw scores indel file. string n/a no
--spliceai_indel_tbi Path to spliceai raw scores indel tabix indexed file. string n/a no
--vep_spliceregion Enable the use of the VEP SpliceRegion plugin. boolean n/a no
--vep_custom_args Add an extra custom argument to VEP. string --everything --filter_common --per_gene --total_length --offline --format vcf no
--outdir_cache The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure. string n/a no
--vep_out_format VEP output-file format. string vcf no
--bcftools_annotations A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped. string n/a no
--bcftools_annotations_tbi Index file for bcftools_annotations string n/a no
--bcftools_columns Optional text file with list of columns to use from bcftools_annotations, one name per row string n/a no
--bcftools_header_lines Text file with the header lines of bcftools_annotations string n/a no

Pipeline stage options

Name Description Type Default Required
--skip_baserecalibration Skip the base recalibration steps, i.e. GATK BaseRecalibrator and GATK ApplyBQSR. boolean n/a no
--skip_intervallisttools Skip the process of preparing interval lists for the GATK variant calling step boolean n/a no
--skip_variantfiltration Skip GATK variant filtering boolean n/a no
--skip_variantannotation Skip variant annotation boolean n/a no
--skip_multiqc Skip MultiQC reports boolean n/a no
--skip_exon_bed_check Skip the check of the exon bed boolean n/a no

General reference genome options

Name Description Type Default Required
--igenomes_base The base path to the igenomes reference files string s3://ngi-igenomes/igenomes/ no
--igenomes_ignore Do not load the iGenomes reference config. boolean n/a no
--save_reference Save built references. boolean n/a no
--download_cache Download annotation cache. boolean n/a no

Reference genome options

Name Description Type Default Required
--genome Name of iGenomes reference. string GRCh38 no
--fasta Path to FASTA genome file. string n/a no
--dict Path to FASTA dictionary file. string n/a no
--fasta_fai Path to FASTA reference index. string n/a no
--gtf Path to GTF annotation file. string n/a no
--gff Path to GFF3 annotation file. string n/a no
--exon_bed Path to BED file containing exon intervals. This will be created from the GTF file if not specified. string n/a no
--read_length Read length number 150 no
--known_indels Path to known indels file. string n/a no
--known_indels_tbi Path to known indels file index. string n/a no
--dbsnp Path to dbsnp file. string n/a no
--dbsnp_tbi Path to dbsnp index. string n/a no
--snpeff_db snpEff DB version. string n/a no
--vep_genome VEP genome. string n/a no
--vep_species VEP species. string n/a no
--vep_cache_version VEP cache version. integer n/a no
--feature_type Type of feature to parse from annotation file string exon no

Institutional config options

Name Description Type Default Required
--custom_config_version Git commit id for Institutional configs. string master no
--custom_config_base Base directory for Institutional configs. string https://raw.githubusercontent.com/nf-core/configs/master no
--config_profile_name Institutional config name. string n/a no
--config_profile_description Institutional config description. string n/a no
--config_profile_contact Institutional config contact information. string n/a no
--config_profile_url Institutional config URL link. string n/a no

Generic options

Name Description Type Default Required
--version Display version and exit. boolean n/a no
--publish_dir_mode Method used to save pipeline results to output directory. string copy no
--email Email address for completion summary. string n/a no
--email_on_fail Email address for completion summary, only when pipeline fails. string n/a no
--plaintext_email Send plain-text email instead of HTML. boolean n/a no
--max_multiqc_email_size File size limit when attaching MultiQC reports to summary emails. string 25.MB no
--monochrome_logs Do not use coloured log outputs. boolean n/a no
--hook_url Incoming hook URL for messaging service string n/a no
--multiqc_config Custom config file to supply to MultiQC. string n/a no
--multiqc_logo Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file string n/a no
--multiqc_methods_description Custom MultiQC yaml file containing HTML including a methods description. string n/a no
--multiqc_title MultiQC report title. Printed as page header, used for filename if not otherwise specified. string n/a no
--validate_params Boolean whether to validate parameters against the schema at runtime boolean True no
--modules_testdata_base_path Base URL or local path to location of modules test dataset files string https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/ no
--pipelines_testdata_base_path Base URL or local path to location of pipeline test dataset files string https://raw.githubusercontent.com/nf-core/test-datasets/rnavar/data/ no
--trace_report_suffix Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss. string n/a no
--help Display the help message. boolean n/a no
--help_full Display the full detailed help message. boolean n/a no
--show_hidden Display hidden parameters in the help message (only works when --help or --help_full are provided). boolean n/a no

Workflows

Name Description Entry
NFCORE_RNAVAR n/a no
(entry) n/a yes
RNAVAR Main workflow for RNA variant calling analysis. This workflow performs end-to-end RNA-seq variant calling including: quality control with FastQC; read alignment with STAR; duplicate marking with Picard; splitting of N CIGAR reads for RNA-seq data; base quality score recalibration (BQSR); variant calling with GATK HaplotypeCaller; variant filtering; variant annotation with SnpEff and VEP; and optional HLA typing with seq2HLA. The workflow supports multiple input types including FASTQ, BAM, CRAM, and VCF files. no
BAM_STATS_SAMTOOLS Produces comprehensive statistics from SAM/BAM/CRAM file no
FASTQ_ALIGN_STAR Align reads to a reference genome using STAR then sort with samtools no
VCF_ANNOTATE_SNPEFF Perform annotation with snpEff and bgzip + tabix index the resulting VCF file no
VCF_ANNOTATE_ENSEMBLVEP Perform annotation with ensemblvep and bgzip + tabix index the resulting VCF file no
BAM_MARKDUPLICATES_PICARD Picard MarkDuplicates, index BAM file and run samtools stats, flagstat and idxstats no
BAM_SORT_STATS_SAMTOOLS Sort SAM/BAM/CRAM file no
PREPARE_ALIGNMENT n/a no
SPLITNCIGAR Split reads that contain N CIGAR operations for RNA-seq variant calling. This subworkflow handles the GATK SplitNCigarReads step which is essential for RNA-seq variant calling. It splits reads that span introns (N in CIGAR) and reassigns mapping qualities to meet GATK requirements. The workflow processes BAM files in parallel across genomic intervals, then merges and indexes the results for efficient downstream processing. no
RECALIBRATE Apply base quality score recalibration (BQSR) to BAM files. This subworkflow applies the BQSR model generated by GATK BaseRecalibrator to adjust base quality scores in BAM files. Recalibrated quality scores improve the accuracy of variant calling by correcting systematic errors in the original quality scores assigned by the sequencing machine. Optionally generates alignment statistics using samtools stats for QC. no
DOWNLOAD_CACHE_SNPEFF_VEP n/a no
PIPELINE_INITIALISATION Initialize the nf-core/rnavar pipeline. Performs all setup tasks required before running the main workflow: display version information if requested; validate parameters against the schema; check Conda channel configuration; parse and validate the input samplesheet; generate parameter summary for logging. no
PIPELINE_COMPLETION Handle pipeline completion tasks. Executes cleanup and notification tasks when the pipeline finishes: send completion email with run summary; generate completion summary to stdout; send notifications to messaging platforms (Slack, Teams, etc.); log error messages for failed runs. no
ANNOTATION_CACHE_INITIALISATION n/a no
PREPARE_GENOME n/a no
VCF_ANNOTATE_ALL Annotate variants using multiple annotation tools. This subworkflow provides flexible variant annotation using one or more tools: SnpEff (functional annotation and effect prediction); VEP, the Ensembl Variant Effect Predictor (comprehensive variant annotation); BCFtools annotate (add custom annotations from external files); Merge (combined SnpEff + VEP annotation). The tools to use are specified via the tools parameter as a comma-separated list (e.g., "snpeff,vep" or "merge"). no
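
The comma-separated tools value described for VCF_ANNOTATE_ALL can be pictured with the Python sketch below. It is not the pipeline's Groovy code: the helper names are hypothetical, the set of accepted values is taken from the `--tools` description in the Input/output options table, and 'merge' is treated as selecting both SnpEff and VEP, as that description states.

```python
# Values listed in the --tools description on this page.
KNOWN_TOOLS = {"seq2hla", "bcfann", "snpeff", "vep", "merge"}


def parse_tools(tools):
    """Split a comma-separated tools string into a validated set of names."""
    if not tools:
        return set()
    selected = {t.strip().lower() for t in tools.split(",") if t.strip()}
    unknown = selected - KNOWN_TOOLS
    if unknown:
        raise ValueError(f"unknown tool(s): {', '.join(sorted(unknown))}")
    return selected


def annotators_to_run(tools):
    """Return the annotators implied by a tools string ('merge' implies both)."""
    selected = parse_tools(tools)
    run = set()
    if selected & {"snpeff", "merge"}:
        run.add("snpeff")
    if selected & {"vep", "merge"}:
        run.add("vep")
    if "bcfann" in selected:
        run.add("bcfann")
    return run


print(sorted(annotators_to_run("merge")))
print(sorted(annotators_to_run("snpeff,bcfann")))
```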

NFCORE_RNAVAR Inputs

Name Description
samplesheet n/a
align n/a

NFCORE_RNAVAR Outputs

Name Description
? n/a
? n/a

RNAVAR Inputs

Name Description
input n/a
bcftools_annotations n/a
bcftools_annotations_tbi n/a
bcftools_columns n/a
bcftools_header_lines n/a
dbsnp n/a
dbsnp_tbi n/a
dict n/a
exon_bed n/a
fasta n/a
fasta_fai n/a
gtf n/a
known_sites n/a
known_sites_tbi n/a
star_index n/a
snpeff_cache n/a
snpeff_db n/a
vep_genome n/a
vep_species n/a
vep_cache_version n/a
vep_include_fasta n/a
vep_cache n/a
vep_extra_files n/a
seq_center n/a
seq_platform n/a
aligner n/a
bam_csi_index n/a
extract_umi n/a
generate_gvcf n/a
skip_multiqc n/a
skip_baserecalibration n/a
skip_intervallisttools n/a
skip_variantannotation n/a
skip_variantfiltration n/a
star_ignore_sjdbgtf n/a
tools n/a

RNAVAR Outputs

Name Description
? n/a
? n/a

BAM_STATS_SAMTOOLS Inputs

Name Description
ch_bam_bai The input channel containing the BAM/CRAM and its index Structure: [ val(meta), path(bam), path(bai) ]
ch_fasta Reference genome fasta file Structure: [ path(fasta) ]

BAM_STATS_SAMTOOLS Outputs

Name Description
stats File containing samtools stats output Structure: [ val(meta), path(stats) ]
flagstat File containing samtools flagstat output Structure: [ val(meta), path(flagstat) ]
idxstats File containing samtools idxstats output Structure: [ val(meta), path(idxstats)]
versions Files containing software versions Structure: [ path(versions.yml) ]

FASTQ_ALIGN_STAR Inputs

Name Description
ch_reads List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. Structure: [ val(meta), [ path(reads) ] ]
ch_index STAR genome index
ch_gtf GTF file used to set the splice junctions with the --sjdbGTFfile flag
val_star_ignore_sjdbgtf If true, the GTF file is ignored and the --sjdbGTFfile flag is not set
val_seq_platform Sequencing platform to be added to the bam header using the --outSAMattrRGline flag
val_seq_center Sequencing center to be added to the bam header using the --outSAMattrRGline flag
ch_fasta Reference genome fasta file
ch_transcripts_fasta Optional reference genome fasta file

FASTQ_ALIGN_STAR Outputs

Name Description
orig_bam Output BAM file containing read alignments Structure: [ val(meta), path(bam) ]
log_final STAR final log file Structure: [ val(meta), path(log_final) ]
log_out STAR log out file Structure: [ val(meta), path(log_out) ]
log_progress STAR log progress file Structure: [ val(meta), path(log_progress) ]
bam_sorted Sorted BAM file of read alignments (optional) Structure: [ val(meta), path(bam) ]
orig_bam_transcript Output BAM file of transcriptome alignment (optional) Structure: [ val(meta), path(bam) ]
fastq Unmapped FastQ files (optional) Structure: [ val(meta), path(fastq) ]
tab STAR output tab file(s) (optional) Structure: [ val(meta), path(tab) ]
bam BAM file ordered by samtools Structure: [ val(meta), path(bam) ]
bai BAI index of the ordered BAM file Structure: [ val(meta), path(bai) ]
stats File containing samtools stats output Structure: [ val(meta), path(stats) ]
flagstat File containing samtools flagstat output Structure: [ val(meta), path(flagstat) ]
idxstats File containing samtools idxstats output Structure: [ val(meta), path(idxstats) ]
bam_transcript Transcriptome-level BAM file ordered by samtools (optional) Structure: [ val(meta), path(bam) ]
bai_transcript Transcriptome-level BAI index of the ordered BAM file (optional) Structure: [ val(meta), path(bai) ]
stats_transcript Transcriptome-level file containing samtools stats output (optional) Structure: [ val(meta), path(stats) ]
flagstat_transcript Transcriptome-level file containing samtools flagstat output (optional) Structure: [ val(meta), path(flagstat) ]
idxstats_transcript Transcriptome-level file containing samtools idxstats output (optional) Structure: [ val(meta), path(idxstats) ]
versions File containing software versions

VCF_ANNOTATE_SNPEFF Inputs

Name Description
ch_vcf vcf file Structure: [ val(meta), path(vcf) ]
val_snpeff_db db version to use
ch_snpeff_cache path to root cache folder for snpEff (optional) Structure: [ path(cache) ]

VCF_ANNOTATE_SNPEFF Outputs

Name Description
vcf_tbi Compressed vcf file + tabix index Structure: [ val(meta), path(vcf), path(tbi) ]
reports html reports Structure: [ path(html) ]
summary CSV summary report Structure: [ path(csv) ]
genes_txt Per-gene text report Structure: [ path(txt) ]
versions Files containing software versions Structure: [ path(versions.yml) ]

VCF_ANNOTATE_ENSEMBLVEP Inputs

Name Description
ch_vcf vcf file to annotate Structure: [ val(meta), path(vcf), [path(custom_file1), path(custom_file2)... (optional)] ]
ch_fasta Reference genome fasta file (optional) Structure: [ val(meta2), path(fasta) ]
val_genome genome to use
val_species species to use
val_cache_version cache version to use
ch_cache the root cache folder for ensemblvep (optional) Structure: [ val(meta3), path(cache) ]
ch_extra_files any extra files needed by plugins for ensemblvep (optional) Structure: [ path(file1), path(file2)... ]

VCF_ANNOTATE_ENSEMBLVEP Outputs

Name Description
vcf_tbi Compressed vcf file + tabix index Structure: [ val(meta), path(vcf), path(tbi) ]
json json file Structure: [ val(meta), path(json) ]
tab tab file Structure: [ val(meta), path(tab) ]
reports html reports
versions File containing software versions

BAM_MARKDUPLICATES_PICARD Inputs

Name Description
ch_reads Sequence reads in BAM/CRAM/SAM format Structure: [ val(meta), path(reads) ]
ch_fasta Reference genome fasta file required for CRAM input Structure: [ path(fasta) ]
ch_fai Index of the reference genome fasta file Structure: [ path(fai) ]

BAM_MARKDUPLICATES_PICARD Outputs

Name Description
bam processed BAM/SAM file Structure: [ val(meta), path(bam) ]
bai BAM/SAM samtools index Structure: [ val(meta), path(bai) ]
cram processed CRAM file Structure: [ val(meta), path(cram) ]
crai CRAM samtools index Structure: [ val(meta), path(crai) ]
csi CSI samtools index Structure: [ val(meta), path(csi) ]
stats File containing samtools stats output Structure: [ val(meta), path(stats) ]
flagstat File containing samtools flagstat output Structure: [ val(meta), path(flagstat) ]
idxstats File containing samtools idxstats output Structure: [ val(meta), path(idxstats) ]
versions Files containing software versions Structure: [ path(versions.yml) ]

BAM_SORT_STATS_SAMTOOLS Inputs

Name Description
meta Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bam BAM/CRAM/SAM file
fasta Reference genome fasta file

BAM_SORT_STATS_SAMTOOLS Outputs

Name Description
meta Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bam Sorted BAM/CRAM/SAM file
bai BAM/CRAM/SAM index file
crai BAM/CRAM/SAM index file
stats File containing samtools stats output
flagstat File containing samtools flagstat output
idxstats File containing samtools idxstats output
versions File containing software versions

PREPARE_ALIGNMENT Inputs

Name Description
cram n/a
bam n/a

PREPARE_ALIGNMENT Outputs

Name Description
bam n/a
versions n/a

SPLITNCIGAR Inputs

Name Description
bam n/a
fasta n/a
fai n/a
dict n/a
intervals n/a

SPLITNCIGAR Outputs

Name Description
bam_bai n/a
versions n/a

RECALIBRATE Inputs

Name Description
skip_samtools n/a
bam n/a
dict n/a
fai n/a
fasta n/a

RECALIBRATE Outputs

Name Description
bam n/a
qc n/a
versions n/a

DOWNLOAD_CACHE_SNPEFF_VEP Inputs

Name Description
ensemblvep_info n/a
snpeff_info n/a

DOWNLOAD_CACHE_SNPEFF_VEP Outputs

Name Description
ensemblvep_cache n/a
snpeff_cache n/a

PIPELINE_INITIALISATION Inputs

Name Description
version n/a
validate_params n/a
nextflow_cli_args n/a
outdir n/a
input n/a
help n/a
help_full n/a
show_hidden n/a

PIPELINE_INITIALISATION Outputs

Name Description
samplesheet n/a
align n/a
versions n/a

PIPELINE_COMPLETION Inputs

Name Description
email n/a
email_on_fail n/a
plaintext_email n/a
outdir n/a
monochrome_logs n/a
hook_url n/a
multiqc_report n/a

PIPELINE_COMPLETION Outputs

Name Description
<none> n/a

ANNOTATION_CACHE_INITIALISATION Inputs

Name Description
snpeff_enabled n/a
snpeff_cache n/a
snpeff_db n/a
vep_enabled n/a
vep_cache n/a
vep_species n/a
vep_cache_version n/a
vep_genome n/a
vep_custom_args n/a
help_message n/a

ANNOTATION_CACHE_INITIALISATION Outputs

Name Description
? n/a
? n/a

PREPARE_GENOME Inputs

Name Description
bcftools_annotations n/a
bcftools_annotations_tbi n/a
dbsnp n/a
dbsnp_tbi n/a
dict n/a
exon_bed n/a
fasta n/a
fasta_fai n/a
gff n/a
gtf n/a
known_indels n/a
known_indels_tbi n/a
star_index n/a
feature_type n/a
skip_exon_bed_check n/a
align n/a

PREPARE_GENOME Outputs

Name Description
bcfann n/a
bcfann_tbi n/a
dbsnp n/a
dbsnp_tbi n/a
dict n/a
exon_bed n/a
fasta n/a
fasta_fai n/a
gtf n/a
known_indels n/a
known_indels_tbi n/a
known_sites n/a
known_sites_tbi n/a
star_index n/a
versions n/a

VCF_ANNOTATE_ALL Inputs

Name Description
vcf n/a
fasta n/a
tools n/a
snpeff_db n/a
snpeff_cache n/a
vep_genome n/a
vep_species n/a
vep_cache_version n/a
vep_cache n/a
vep_extra_files n/a
bcftools_annotations n/a
bcftools_annotations_index n/a
bcftools_columns n/a
bcftools_header_lines n/a

VCF_ANNOTATE_ALL Outputs

Name Description
? n/a
? n/a
? n/a
? n/a

Processes

Name Description
MULTIQC Aggregate results from bioinformatics analyses across many samples into a single report
UNTAR Extract files from tar, tar.gz, tar.bz2, tar.xz archives
MOSDEPTH Calculates genome-wide sequencing coverage.
SEQ2HLA Precision HLA typing and expression from RNA-seq data using seq2HLA
FASTQC Run FastQC on sequenced reads
GFFREAD Validate, filter, convert and perform various other operations on GFF files
GUNZIP Compresses and decompresses files.
GATK4_COMBINEGVCFS Combine per-sample gVCF files produced by HaplotypeCaller into a multi-sample gVCF file
GATK4_INDEXFEATUREFILE Creates an index for a feature file, e.g. VCF or BED file.
GATK4_VARIANTFILTRATION Filter variants
GATK4_CREATESEQUENCEDICTIONARY Creates a sequence dictionary for a reference sequence
GATK4_SPLITNCIGARREADS Splits reads that contain Ns in their cigar string
GATK4_HAPLOTYPECALLER Call germline SNPs and indels via local re-assembly of haplotypes
GATK4_INTERVALLISTTOOLS Splits the interval list file into unique, equally-sized interval files and places them under a directory
GATK4_BASERECALIBRATOR Generate recalibration table for Base Quality Score Recalibration (BQSR)
GATK4_APPLYBQSR Apply base quality score recalibration (BQSR) to a bam file
GATK4_BEDTOINTERVALLIST Creates an interval list from a bed file and a reference dict
GATK4_MERGEVCFS Merges several vcf files
UMITOOLS_EXTRACT Extracts the UMI barcode from a read and adds it to the read name, leaving any sample barcode in place
SAMTOOLS_SORT Sort SAM/BAM/CRAM file
SAMTOOLS_MERGE Merge BAM or CRAM file
SAMTOOLS_IDXSTATS Reports alignment summary statistics for a BAM/CRAM/SAM file
SAMTOOLS_FAIDX Index FASTA file, and optionally generate a file of chromosome sizes
SAMTOOLS_INDEX Index SAM/BAM/CRAM file
SAMTOOLS_FLAGSTAT Counts the number of alignments in a BAM/CRAM/SAM file for each FLAG type
SAMTOOLS_STATS Produces comprehensive statistics from SAM/BAM/CRAM file
SAMTOOLS_CONVERT Convert a file from CRAM to BAM or from BAM to CRAM, then index the result
BEDTOOLS_SORT Sorts a feature file by chromosome and other criteria.
BEDTOOLS_MERGE Combines overlapping or “book-ended” features in an interval file into a single feature which spans all of the combined features.
STAR_GENOMEGENERATE Create index for STAR
STAR_ALIGN Align reads to a reference genome using STAR
STAR_INDEXVERSION Get the minimal allowed index version from STAR
SNPEFF_SNPEFF Genetic variant annotation and functional effect prediction toolbox
SNPEFF_DOWNLOAD Genetic variant annotation and functional effect prediction toolbox
ENSEMBLVEP_VEP Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
ENSEMBLVEP_DOWNLOAD Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
BCFTOOLS_ANNOTATE Add or remove annotations.
PICARD_MARKDUPLICATES Locate and tag duplicate reads in a BAM file
TABIX_TABIX Create tabix index from a sorted bgzip tab-delimited genome file
TABIX_BGZIPTABIX Bgzip a sorted tab-delimited genome file and then create tabix index
CAT_FASTQ Concatenates fastq files
REMOVE_UNKNOWN_REGIONS n/a
GTF2BED Convert GTF annotation file to BED format. Extracts genomic features (exons, transcripts, or genes) from a GTF file and outputs them in BED format for use with interval-based tools. The output BED file uses 0-based coordinates (BED standard) converted from the 1-based GTF coordinates.
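
The coordinate shift mentioned for GTF2BED can be sketched in Python as follows. This is an illustration rather than the pipeline's own script: the `gtf_line_to_bed` helper and the example record are assumptions, and only the first three BED columns are produced.

```python
def gtf_line_to_bed(line, feature_type="exon"):
    """Convert one GTF record to a BED (chrom, start, end) tuple, or None if skipped."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != feature_type:
        return None
    chrom, start, end = fields[0], int(fields[3]), int(fields[4])
    # GTF is 1-based inclusive; BED is 0-based half-open, so only the start shifts.
    return chrom, start - 1, end


# Made-up GTF record for illustration only.
example = 'chr1\tsource\texon\t11869\t12227\t.\t+\t.\tgene_id "G1";'
print(gtf_line_to_bed(example))
```

Because BED intervals are half-open, the end coordinate is unchanged and the interval length stays `end - start + 1` in GTF terms.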

MULTIQC Outputs

Name Type Pattern Description
report - n/a n/a
data - n/a n/a
plots - n/a n/a

UNTAR Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
archive file File to be untarred

UNTAR Outputs

Name Type Pattern Description
untar map */ Groovy Map containing sample information e.g. [ id:'test', single_end:false ]

MOSDEPTH Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bam file Input BAM/CRAM file
bai file Index for BAM/CRAM file
bed file BED file with intersected intervals
meta2 map Groovy Map containing bed information e.g. [ id:'test' ]
fasta file Reference genome FASTA file

MOSDEPTH Outputs

Name Type Pattern Description
global_txt file *.{global.dist.txt} Text file with global cumulative coverage distribution
summary_txt file *.{summary.txt} Text file with summary mean depths per chromosome and regions
regions_txt file *.{region.dist.txt} Text file with region cumulative coverage distribution
per_base_d4 file *.{per-base.d4} D4 file with per-base coverage
per_base_bed file *.{per-base.bed.gz} BED file with per-base coverage
per_base_csi file *.{per-base.bed.gz.csi} Index file for BED file with per-base coverage
regions_bed file *.{regions.bed.gz} BED file with per-region coverage
regions_csi file *.{regions.bed.gz.csi} Index file for BED file with per-region coverage
quantized_bed file *.{quantized.bed.gz} BED file with binned coverage
quantized_csi file *.{quantized.bed.gz.csi} Index file for BED file with binned coverage
thresholds_bed file *.{thresholds.bed.gz} BED file with the number of bases in each region that are covered at or above each threshold
thresholds_csi file *.{thresholds.bed.gz.csi} Index file for BED file with threshold coverage

SEQ2HLA Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'sample1', single_end:false ]
reads file Paired-end FASTQ files for RNA-seq data

SEQ2HLA Outputs

Name Type Pattern Description
class1_genotype_2d file *ClassI-class.HLAgenotype2digits HLA Class I 2-digit genotype results
class2_genotype_2d file *ClassII.HLAgenotype2digits HLA Class II 2-digit genotype results
class1_genotype_4d file *ClassI-class.HLAgenotype4digits HLA Class I 4-digit genotype results
class2_genotype_4d file *ClassII.HLAgenotype4digits HLA Class II 4-digit genotype results
class1_bowtielog file *ClassI-class.bowtielog HLA Class I Bowtie alignment log
class2_bowtielog file *ClassII.bowtielog HLA Class II Bowtie alignment log
class1_expression file *ClassI-class.expression HLA Class I expression results
class2_expression file *ClassII.expression HLA Class II expression results
class1_nonclass_genotype_2d file *ClassI-nonclass.HLAgenotype2digits HLA Class I non-classical 2-digit genotype results
ambiguity file *.ambiguity HLA typing ambiguity results
class1_nonclass_genotype_4d file *ClassI-nonclass.HLAgenotype4digits HLA Class I non-classical 4-digit genotype results
class1_nonclass_bowtielog file *ClassI-nonclass.bowtielog HLA Class I non-classical Bowtie alignment log
class1_nonclass_expression file *ClassI-nonclass.expression HLA Class I non-classical expression results

FASTQC Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
reads file List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively.

FASTQC Outputs

Name Type Pattern Description
html file *_{fastqc.html} FastQC report
zip file *_{fastqc.zip} FastQC report archive

GFFREAD Inputs

Name Type Description
meta map Groovy Map containing meta data e.g. [ id:'test' ]
gff file A reference file in either the GFF3, GFF2 or GTF format.

GFFREAD Outputs

Name Type Pattern Description
gtf file *.{gtf} GTF file resulting from the conversion of the GFF input file if '-T' argument is present
gffread_gff file *.gff3 GFF3 file resulting from the conversion of the GFF input file if '-T' argument is absent
gffread_fasta file *.fasta Fasta file produced when either of '-w', '-x', '-y' parameters is present

GUNZIP Inputs

Name Type Description
meta map Optional groovy Map containing meta information e.g. [ id:'test', single_end:false ]
archive file File to be compressed/uncompressed

GUNZIP Outputs

Name Type Pattern Description
gunzip file *.* Compressed/uncompressed file

GATK4_COMBINEGVCFS Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test' ]
vcf file Compressed VCF files
vcf_idx file VCF Index file

GATK4_COMBINEGVCFS Outputs

Name Type Pattern Description
combined_gvcf file *.combined.g.vcf.gz Compressed Combined GVCF file

GATK4_INDEXFEATUREFILE Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
feature_file file VCF/BED file

GATK4_INDEXFEATUREFILE Outputs

Name Type Pattern Description
index file *.{tbi,idx} Index for VCF/BED file

GATK4_VARIANTFILTRATION Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test']
vcf list List of VCF(.gz) files
tbi list List of VCF file indexes
meta2 map Groovy Map containing reference information e.g. [ id:'genome' ]
fasta file Fasta file of reference genome
meta3 map Groovy Map containing reference information e.g. [ id:'genome' ]
fai file Index of fasta file
meta4 map Groovy Map containing reference information e.g. [ id:'genome' ]
dict file Sequence dictionary of fasta file
meta5 map Groovy Map containing reference information e.g. [ id:'genome' ]
gzi file Genome index file only needed when the genome file was compressed with the BGZF algorithm.

GATK4_VARIANTFILTRATION Outputs

Name Type Pattern Description
vcf file *.vcf.gz Compressed VCF file
tbi file *.vcf.gz.tbi Index of VCF file

GATK4_CREATESEQUENCEDICTIONARY Inputs

Name Type Description
meta map Groovy Map containing reference information e.g. [ id:'genome' ]
fasta file Input fasta file

GATK4_CREATESEQUENCEDICTIONARY Outputs

Name Type Pattern Description
dict file *.{dict} gatk dictionary file

GATK4_SPLITNCIGARREADS Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test']
bam list BAM/SAM/CRAM file containing reads
bai list BAI/SAI/CRAI index file (optional)
intervals file Bed file with the genomic regions included in the library (optional)
meta2 map Groovy Map containing reference information e.g. [ id:'reference' ]
fasta file The reference fasta file
meta3 map Groovy Map containing reference information e.g. [ id:'reference' ]
fai file Index of reference fasta file
meta4 map Groovy Map containing reference information e.g. [ id:'reference' ]
dict file GATK sequence dictionary

GATK4_SPLITNCIGARREADS Outputs

Name Type Pattern Description
bam file *.{bam,sam,cram} Output file with split reads (BAM/SAM/CRAM)

GATK4_HAPLOTYPECALLER Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file BAM/CRAM file from alignment
input_index file BAI/CRAI file from alignment
intervals file Bed file with the genomic regions included in the library (optional)
dragstr_model file Text file containing the DragSTR model of the used BAM/CRAM file (optional)
meta2 map Groovy Map containing reference information e.g. [ id:'test_reference' ]
fasta file The reference fasta file
meta3 map Groovy Map containing reference information e.g. [ id:'test_reference' ]
fai file Index of reference fasta file
meta4 map Groovy Map containing reference information e.g. [ id:'test_reference' ]
dict file GATK sequence dictionary
meta5 map Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ]
dbsnp file VCF file containing known sites (optional)
meta6 map Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ]
dbsnp_tbi file VCF index of dbsnp (optional)

GATK4_HAPLOTYPECALLER Outputs

Name Type Pattern Description
vcf file *.vcf.gz Compressed VCF file
tbi file *.vcf.gz.tbi Index of VCF file
bam file *.realigned.bam Assembled haplotypes and locally realigned reads

GATK4_INTERVALLISTTOOLS Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
intervals file Interval file

GATK4_INTERVALLISTTOOLS Outputs

Name Type Pattern Description
interval_list file *.interval_list Interval list files

GATK4_BASERECALIBRATOR Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file BAM/CRAM file from alignment
input_index file BAI/CRAI file from alignment
intervals file Bed file with the genomic regions included in the library (optional)
meta2 map Groovy Map containing reference information e.g. [ id:'genome']
fasta file The reference fasta file
meta3 map Groovy Map containing reference information e.g. [ id:'genome']
fai file Index of reference fasta file
meta4 map Groovy Map containing reference information e.g. [ id:'genome']
dict file GATK sequence dictionary
meta5 map Groovy Map containing reference information e.g. [ id:'genome']
known_sites file VCF files with known sites for indels / snps
meta6 map Groovy Map containing reference information e.g. [ id:'genome']
known_sites_tbi file Tabix index of the known_sites

GATK4_BASERECALIBRATOR Outputs

Name Type Pattern Description
table file *.{table} Recalibration table from BaseRecalibrator

GATK4_APPLYBQSR Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file BAM/CRAM file from alignment
input_index file BAI/CRAI file from alignment
bqsr_table file Recalibration table from gatk4_baserecalibrator
intervals file Bed file with the genomic regions included in the library (optional)

GATK4_APPLYBQSR Outputs

Name Type Pattern Description
bam file ${prefix}.bam Recalibrated BAM file
bai file ${prefix}*bai Recalibrated BAM index file
cram file ${prefix}.cram Recalibrated CRAM file

GATK4_BEDTOINTERVALLIST Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test']
bed file Input bed file
meta2 map Groovy Map containing reference information e.g. [ id:'genome' ]
dict file Sequence dictionary

GATK4_BEDTOINTERVALLIST Outputs

Name Type Pattern Description
interval_list file *.interval_list gatk interval list file

GATK4_MERGEVCFS Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test']
vcf list Two or more VCF files
meta2 map Groovy Map containing reference information e.g. [ id:'genome']
dict file Optional Sequence Dictionary as input

GATK4_MERGEVCFS Outputs

Name Type Pattern Description
vcf file *.vcf.gz merged vcf file
tbi file *.tbi index files for the merged vcf files

UMITOOLS_EXTRACT Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
reads list List of input FASTQ files whose UMIs will be extracted.

UMITOOLS_EXTRACT Outputs

Name Type Pattern Description
reads file *.{fastq.gz} Extracted FASTQ files. For single-end reads, pattern is ${prefix}.umi_extract.fastq.gz. For paired-end reads, pattern is ${prefix}.umi_extract_{1,2}.fastq.gz.
log file *.{log} Logfile for umi_tools

SAMTOOLS_SORT Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bam file BAM/CRAM/SAM file(s)
meta2 map Groovy Map containing reference information e.g. [ id:'genome' ]
fasta file Reference genome FASTA file

SAMTOOLS_SORT Outputs

Name Type Pattern Description
bam file *.{bam} Sorted BAM file
cram file *.{cram} Sorted CRAM file
sam file *.{sam} Sorted SAM file
crai file *.crai CRAM index file (optional)
csi file *.csi BAM index file (optional)
bai file *.bai BAM index file (optional)

SAMTOOLS_MERGE Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input_files file BAM/CRAM file
meta2 map Groovy Map containing reference information e.g. [ id:'genome' ]
fasta file Reference file the CRAM was created with (optional)
meta3 map Groovy Map containing reference information e.g. [ id:'genome' ]
fai file Index of the reference file the CRAM was created with (optional)
meta4 map Groovy Map containing reference information e.g. [ id:'genome' ]
gzi file Index of the compressed reference file the CRAM was created with (optional)

SAMTOOLS_MERGE Outputs

Name Type Pattern Description
bam file *.{bam} BAM file
cram file *.{cram} CRAM file
csi file *.csi BAM index file (optional)
crai file *.crai CRAM index file (optional)

SAMTOOLS_IDXSTATS Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bam file BAM/CRAM/SAM file
bai file Index for BAM/CRAM/SAM file

SAMTOOLS_IDXSTATS Outputs

Name Type Pattern Description
idxstats file *.{idxstats} File containing samtools idxstats output

SAMTOOLS_FAIDX Inputs

Name Type Description
meta map Groovy Map containing reference information e.g. [ id:'test' ]
fasta file FASTA file
meta2 map Groovy Map containing reference information e.g. [ id:'test' ]
fai file FASTA index file

SAMTOOLS_FAIDX Outputs

Name Type Pattern Description
fa file *.{fa} FASTA file
sizes file *.{sizes} File containing chromosome lengths
fai file *.{fai} FASTA index file
gzi file *.gzi Optional gzip index file for compressed inputs

SAMTOOLS_INDEX Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file input file

SAMTOOLS_INDEX Outputs

Name Type Pattern Description
bai file *.{bai,crai,sai} BAM/CRAM/SAM index file
csi file *.{csi} CSI index file
crai file *.{bai,crai,sai} BAM/CRAM/SAM index file

SAMTOOLS_FLAGSTAT Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bam file BAM/CRAM/SAM file
bai file Index for BAM/CRAM/SAM file

SAMTOOLS_FLAGSTAT Outputs

Name Type Pattern Description
flagstat file *.{flagstat} File containing samtools flagstat output

SAMTOOLS_STATS Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file BAM/CRAM file from alignment
input_index file BAI/CRAI file from alignment
meta2 map Groovy Map containing reference information e.g. [ id:'genome' ]
fasta file Reference file the CRAM was created with (optional)

SAMTOOLS_STATS Outputs

Name Type Pattern Description
stats file *.{stats} File containing samtools stats output

SAMTOOLS_CONVERT Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file BAM/CRAM file
index file BAM/CRAM index file
meta2 map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
fasta file Reference file to create the CRAM file
meta3 map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
fai file Reference index file to create the CRAM file

SAMTOOLS_CONVERT Outputs

Name Type Pattern Description
bam file *{.bam} filtered/converted BAM file
cram file *{.cram} filtered/converted CRAM file
bai file *{.bai} filtered/converted BAM index
crai file *{.crai} filtered/converted CRAM index

BEDTOOLS_SORT Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
intervals file BED/BEDGRAPH

BEDTOOLS_SORT Outputs

Name Type Pattern Description
sorted file *.${extension} Sorted output file

BEDTOOLS_MERGE Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
bed file Input BED file

BEDTOOLS_MERGE Outputs

Name Type Pattern Description
bed file *.{bed} Overlapped bed file with combined features

STAR_GENOMEGENERATE Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
fasta file Fasta file of the reference genome
meta2 map Groovy Map containing reference information e.g. [ id:'test' ]
gtf file GTF file of the reference genome

STAR_GENOMEGENERATE Outputs

Name Type Pattern Description
index directory star Folder containing the star index files

STAR_ALIGN Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
reads file List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively.
meta2 map Groovy Map containing reference information e.g. [ id:'test' ]
index directory STAR genome index
meta3 map Groovy Map containing reference information e.g. [ id:'test' ]
gtf file Annotation GTF file

STAR_ALIGN Outputs

Name Type Pattern Description
log_final file *Log.final.out STAR final log file
log_out file *Log.out STAR log out file
log_progress file *Log.progress.out STAR log progress file
bam file *.{bam} Output BAM file containing read alignments
bam_sorted file *sortedByCoord.out.bam Sorted BAM file of read alignments (optional)
bam_sorted_aligned file *.Aligned.sortedByCoord.out.bam Sorted BAM file of read alignments (optional)
bam_transcript file *toTranscriptome.out.bam Output BAM file of transcriptome alignment (optional)
bam_unsorted file *Aligned.unsort.out.bam Unsorted BAM file of read alignments (optional)
fastq file *fastq.gz Unmapped FastQ files (optional)
tab file *.tab STAR output tab file(s) (optional)
spl_junc_tab file *.SJ.out.tab STAR output splice junction tab file
read_per_gene_tab file *.ReadsPerGene.out.tab STAR output read per gene tab file
junction file *.out.junction STAR chimeric junction output file (optional)
sam file *.out.sam STAR output SAM file(s) (optional)
wig file *.wig STAR output wiggle format file(s) (optional)
bedgraph file *.bg STAR output bedGraph format file(s) (optional)

STAR_INDEXVERSION Outputs

Name Type Pattern Description
index_version - n/a n/a
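
The index version emitted here is presumably compared against a minimum supported version; the Functions table lists `isCompatibleStarIndex` (parameters `index_version`, `minimal_index_version`) and `convertVersionToList` (parameter `version`) without descriptions. The following is a minimal Python sketch of what such a comparison could look like, not the pipeline's Groovy code. The function bodies, the handling of letter suffixes such as `2.7.4a`, and the zero-padding rule are all assumptions made for illustration.

```python
def convert_version_to_list(version: str) -> list[int]:
    """Split a dotted version string such as '2.7.4a' into integer parts.

    Assumption: non-digit characters (e.g. the trailing 'a') are ignored.
    """
    parts = []
    for token in version.strip().split("."):
        digits = "".join(ch for ch in token if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return parts


def is_compatible_star_index(index_version: str, minimal_index_version: str) -> bool:
    """Return True when index_version is at least minimal_index_version."""
    have = convert_version_to_list(index_version)
    need = convert_version_to_list(minimal_index_version)
    # Pad the shorter list with zeros so '2.7' compares equal to '2.7.0'.
    width = max(len(have), len(need))
    have += [0] * (width - len(have))
    need += [0] * (width - len(need))
    # Python compares lists element by element, left to right.
    return have >= need
```

Comparing integer lists rather than raw strings avoids the pitfall where `"2.7.10"` sorts before `"2.7.4"` lexicographically.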

SNPEFF_SNPEFF Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
vcf file vcf to annotate
meta2 map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
cache file path to snpEff cache (optional)

SNPEFF_SNPEFF Outputs

Name Type Pattern Description
vcf file *.ann.vcf annotated vcf
report string *.csv snpEff report csv file
summary_html string *.html snpEff summary statistics in html file
genes_txt string *.genes.txt txt (tab separated) file having counts of the number of variants affecting each transcript and gene

SNPEFF_DOWNLOAD Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
snpeff_db string SnpEff database name

SNPEFF_DOWNLOAD Outputs

Name Type Pattern Description
cache file n/a snpEff cache

ENSEMBLVEP_VEP Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
vcf file vcf to annotate
custom_extra_files file extra sample-specific files to be used with the --custom flag to be configured with ext.args (optional)
meta2 map Groovy Map containing fasta reference information e.g. [ id:'test' ]
fasta file reference FASTA file (optional)

ENSEMBLVEP_VEP Outputs

Name Type Pattern Description
vcf file *.vcf.gz annotated vcf (optional)
tbi file *.vcf.gz.tbi annotated vcf index (optional)
tab file *.ann.tab.gz tab file with annotated variants (optional)
json file *.ann.json.gz json file with annotated variants (optional)
report string *.html VEP report file

ENSEMBLVEP_DOWNLOAD Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
assembly string Genome assembly
species string Species
cache_version string cache version

ENSEMBLVEP_DOWNLOAD Outputs

Name Type Pattern Description
cache file * cache

BCFTOOLS_ANNOTATE Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file Query VCF or BCF file, can be either uncompressed or compressed
index file Index of the query VCF or BCF file
annotations file Bgzip-compressed file with annotations
annotations_index file Index of the annotations file

BCFTOOLS_ANNOTATE Outputs

Name Type Pattern Description
vcf file *{vcf,vcf.gz,bcf,bcf.gz} Compressed annotated VCF file
tbi file *.tbi Alternative VCF file index
csi file *.csi Default VCF file index

PICARD_MARKDUPLICATES Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
reads file Sequence reads file, can be SAM/BAM/CRAM format
meta2 map Groovy Map containing reference information e.g. [ id:'genome' ]
fasta file Reference genome fasta file, required for CRAM input
meta3 map Groovy Map containing reference information e.g. [ id:'genome' ]
fai file Reference genome fasta index

PICARD_MARKDUPLICATES Outputs

Name Type Pattern Description
bam file *.{bam} BAM file with duplicate reads marked/removed
bai file *.{bai} An optional BAM index file. If desired, --CREATE_INDEX must be passed as a flag
cram file *.{cram} Output CRAM file
metrics file *.{metrics.txt} Duplicate metrics file generated by picard

TABIX_TABIX Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
tab file TAB-delimited genome position file compressed with bgzip

TABIX_TABIX Outputs

Name Type Pattern Description
index file *.{tbi,csi} Tabix index file (either tbi or csi)

TABIX_BGZIPTABIX Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
input file Sorted tab-delimited genome file

TABIX_BGZIPTABIX Outputs

Name Type Pattern Description
gz_index file *.gz, *.{tbi,csi} bgzipped tab-delimited genome file and its Tabix index file (either tbi or csi)

CAT_FASTQ Inputs

Name Type Description
meta map Groovy Map containing sample information e.g. [ id:'test', single_end:false ]
reads file List of input FastQ files to be concatenated.

CAT_FASTQ Outputs

Name Type Pattern Description
reads file *.{merged.fastq.gz} Merged fastq file

REMOVE_UNKNOWN_REGIONS Inputs

Name Type Description
val(meta), path(bed) tuple n/a
val(meta2), path(dict) tuple n/a

REMOVE_UNKNOWN_REGIONS Outputs

Name Type Emit Description
val(meta), path('*.bed') tuple bed n/a

GTF2BED Inputs

Name Type Description
val(meta), path(gtf) tuple n/a

GTF2BED Outputs

Name Type Emit Description
val(meta), path('*.bed') tuple bed n/a

Functions

Name Parameters Returns Description
paramsSummaryMultiqc summary_params n/a n/a
validateInputParameters n/a n/a Validate pipeline input parameters. Checks that all required parameters are provided and valid. Currently validates that the specified genome exists in the config.
validateInputSamplesheet input n/a Validate and parse input samplesheet entries. Ensures that multiple runs of the same sample have consistent sequencing type (all single-end or all paired-end).
checkSamplesAfterGrouping input n/a Validate samples after grouping by sample ID. Performs consistency checks on grouped sample data: ensures only one BAM/CRAM file per sample; prevents mixing of FASTQ and BAM/CRAM inputs; validates consistent single-end/paired-end status; properly interleaves paired-end FASTQ files.
genomeExistsError n/a n/a Check if the specified genome exists in the configuration. Throws an error with a helpful message listing available genomes if the specified genome key is not found in the config.
toolCitationText n/a n/a n/a
toolBibliographyText n/a n/a n/a
methodsDescriptionText mqc_methods_yaml n/a n/a
isCloudUrl cache_url n/a n/a
isCompatibleStarIndex index_version, minimal_index_version n/a n/a
convertVersionToList version n/a n/a
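
The checks described for `checkSamplesAfterGrouping` can be illustrated with a short Python sketch. This is not the pipeline's Groovy implementation: the entry layout (dicts with `single_end`, `fastqs` and `alignment` keys), the return shape and the error messages are hypothetical and chosen only to make the four documented checks concrete.

```python
def check_samples_after_grouping(sample_id, entries):
    """Validate the grouped entries of one sample.

    Each entry is a hypothetical dict with a 'single_end' bool and either
    'fastqs' (list of paths) or 'alignment' (a BAM/CRAM path).
    Returns (sample_id, list of FASTQs) or (sample_id, alignment path).
    """
    alignments = [e["alignment"] for e in entries if e.get("alignment")]
    fastq_entries = [e for e in entries if e.get("fastqs")]

    if alignments and fastq_entries:
        raise ValueError(f"{sample_id}: cannot mix FASTQ and BAM/CRAM inputs")
    if len(alignments) > 1:
        raise ValueError(f"{sample_id}: only one BAM/CRAM file is allowed per sample")
    if len({e["single_end"] for e in entries}) > 1:
        raise ValueError(f"{sample_id}: runs mix single-end and paired-end data")

    if alignments:
        return sample_id, alignments[0]
    # Flatten run by run so paired-end files stay interleaved: R1, R2, R1, R2, ...
    return sample_id, [fq for e in fastq_entries for fq in e["fastqs"]]
```

For example, two paired-end runs of the same sample would come back as a single list ordered `run1_R1, run1_R2, run2_R1, run2_R2`, while a sample that lists both a FASTQ pair and a BAM file would raise an error.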

This pipeline was built with Nextflow. Documentation generated by nf-docs v0.2.0 on 2026-03-03 22:40:54 UTC.