nf-core/sarek
An open-source analysis pipeline to detect germline or somatic variants from whole genome or targeted sequencing
Introduction¶
nf-core/sarek is a workflow designed to detect variants in whole genome or targeted sequencing data. Initially designed for human and mouse, it can work on any species with a reference genome. Sarek can also handle tumour / normal pairs and can include additional relapses.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.
It's listed on Elixir - Tools and Data Services Registry and Dockstore.
Pipeline summary¶
Depending on the options and samples provided, the pipeline can currently perform the following:
- Form consensus reads from UMI sequences (fgbio)
- Sequencing quality control and trimming (enabled by --trim_fastq) (FastQC, fastp)
- Contamination removal (BBSplit, enabled by --tools bbsplit)
- Map Reads to Reference (BWA-mem, BWA-mem2, dragmap or Sentieon BWA-mem)
- Process BAM file (GATK MarkDuplicates, GATK BaseRecalibrator and GATK ApplyBQSR or Sentieon LocusCollector and Sentieon Dedup)
- Experimental Feature: Use GPU-accelerated parabricks implementation as alternative to "Map Reads to Reference" + "Process BAM file" (--aligner parabricks)
- Summarise alignment statistics (samtools stats, mosdepth)
- Variant calling (enabled by --tools, see compatibility):
  - ASCAT
  - CNVkit
  - Control-FREEC
  - DeepVariant
  - freebayes
  - GATK HaplotypeCaller
  - GATK Mutect2
  - indexcov
  - Lofreq
  - Manta
  - mpileup
  - MSIsensor2
  - MSIsensor-pro
  - MuSE
  - Sentieon Haplotyper
  - Strelka
  - TIDDIT
- Post-variant calling options, one of:
  - Filtering (bcftools view (default: filter by PASS, .)), normalisation (bcftools norm) and consensus calling (bcftools isec, default: called by at least 2 tools -n+2) on all vcfs and/or bcftools concat for germline vcfs
  - Varlociraptor for all vcfs
- Variant filtering and annotation (SnpEff, Ensembl VEP, BCFtools annotate)
- Summarise and represent QC (MultiQC)
Usage¶
If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
patient,sample,lane,fastq_1,fastq_2
ID1,S1,L002,ID1_S1_L002_R1_001.fastq.gz,ID1_S1_L002_R2_001.fastq.gz
Each row represents a pair of fastq files (paired end).
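As a minimal sketch, a samplesheet with this header could be generated programmatically. The helper name write_samplesheet is hypothetical, and the FASTQ file names are the placeholders from the example above:

```python
import csv

# Column order taken from the example samplesheet above.
FIELDS = ["patient", "sample", "lane", "fastq_1", "fastq_2"]

# Placeholder rows: one entry per pair of FASTQ files (paired end).
rows = [
    {
        "patient": "ID1",
        "sample": "S1",
        "lane": "L002",
        "fastq_1": "ID1_S1_L002_R1_001.fastq.gz",
        "fastq_2": "ID1_S1_L002_R2_001.fastq.gz",
    },
]


def write_samplesheet(path, rows):
    """Write a comma-separated samplesheet with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


write_samplesheet("samplesheet.csv", rows)
```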
Now, you can run the pipeline using:
nextflow run nf-core/sarek \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR>
Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
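For illustration, a parameters file for the -params-file option could be written as JSON. This is a sketch only: the file name params.json and the output directory "results" are placeholders, and the keys are assumed to mirror the CLI flags without the leading dashes:

```python
import json

# Keys mirror the CLI flags shown above (--input, --outdir).
# "results" is a placeholder output directory.
params = {
    "input": "samplesheet.csv",
    "outdir": "results",
}

with open("params.json", "w") as handle:
    json.dump(params, handle, indent=2)

# The file could then be passed to Nextflow, for example:
#   nextflow run nf-core/sarek -profile docker -params-file params.json
```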
For more details and further functionality, please refer to the usage documentation and the parameter documentation.
Pipeline output¶
To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Benchmarking¶
On each release, the pipeline is run on 3 full size tests:
- test_full runs tumor-normal data for one patient from the SEQ2C consortium
- test_full_germline runs a WGS 30X Genome-in-a-Bottle (NA12878) dataset
- test_full_germline_ncbench_agilent runs two WES samples with 75M and 200M reads (data available here). The results are uploaded to Zenodo, evaluated against a truth dataset, and results are made available via the NCBench dashboard.
Credits¶
Sarek was originally written by Maxime U Garcia and Szilveszter Juhos at the National Genomics Infrastructure and National Bioinformatics Infrastructure Sweden, which are both platforms at SciLifeLab, with the support of The Swedish Childhood Tumor Biobank (Barntumörbanken). Friederike Hanssen and Gisela Gabernet at QBiC later joined and helped with further development.
The Nextflow DSL2 conversion of the pipeline was led by Friederike Hanssen and Maxime U Garcia.
Maintenance is now led by Friederike Hanssen and Maxime U Garcia (now at Seqera).
We thank the following people for their extensive assistance in the development of this pipeline:
- Abhinav Sharma
- Adam Talbot
- Adrian Lärkeryd
- Àitor Olivares
- Alexander Peltzer
- Alison Meynert
- Anders Sune Pedersen
- arontommi
- BarryDigby
- Bekir Ergüner
- bjornnystedt
- cgpu
- Chela James
- David Mas-Ponte
- Edmund Miller
- Famke Bäuerle
- Francesco Lescai
- Francisco Martínez
- Gavin Mackenzie
- Gisela Gabernet
- Grant Neilson
- gulfshores
- Harshil Patel
- Hongwei Ye
- James A. Fellows Yates
- Jesper Eisfeldt
- Johannes Alneberg
- Jonas Kjellin
- José Fernández Navarro
- Júlia Mir Pedrol
- Ken Brewer
- Lasse Westergaard Folkersen
- Lucia Conde
- Louis Le Nézet
- Malin Larsson
- Marcel Martin
- Nick Smith
- Nicolas Schcolnicov
- Nilesh Tawari
- Nils Homer
- Olga Botvinnik
- Oskar Wacker
- pallolason
- Paul Cantalupo
- Phil Ewels
- Pierre Lindenbaum
- Sabrina Krakau
- Sam Minot
- Sebastian-D
- Silvia Morini
- Simon Pearce
- Solenne Correard
- Susanne Jodoin
- Szilveszter Juhos
- Tobias Koch
- Winni Kretzschmar
- Patricie Skaláková
Acknowledgements¶
Contributions & Support¶
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #sarek channel (you can join with this invite), or contact us: Maxime U Garcia, Friederike Hanssen
Citations¶
If you use nf-core/sarek for your analysis, please cite the Sarek article as follows:
Friederike Hanssen, Maxime U Garcia, Lasse Folkersen, Anders Sune Pedersen, Francesco Lescai, Susanne Jodoin, Edmund Miller, Oskar Wacker, Nicholas Smith, nf-core community, Gisela Gabernet, Sven Nahnsen Scalable and efficient DNA sequencing analysis on different compute infrastructures aiding variant discovery NAR Genomics and Bioinformatics Volume 6, Issue 2, June 2024, lqae031, doi: 10.1093/nargab/lqae031.
Garcia M, Juhos S, Larsson M et al. Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 2; peer review: 2 approved] F1000Research 2020, 9:63 doi: 10.12688/f1000research.16665.2.
You can cite the sarek zenodo record for a specific version using the following doi: 10.5281/zenodo.3476425
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
CHANGELOG¶
Pipeline Inputs
This page documents all input parameters for the pipeline.
Input/output options ¶
Path to comma-separated file containing information about the samples in the experiment.
A design file with information about the samples in your experiment. Use this parameter to specify the location of the input files. It has to be a comma-separated file with a header row. See usage docs.
If no input file is specified, sarek will attempt to locate one in the {outdir} directory. If no input should be supplied, i.e. when --step is supplied or --build_only_index, then set --input false
Starting step
The pipeline starts from this step and then runs through the possible subsequent steps.
Default:
mapping
Allowed values:
mapping, markduplicates, prepare_recalibration, recalibrate, variant_calling, annotate
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
Main options ¶
Specify how many reads each split of a FastQ file contains. Set to 0 to turn off splitting entirely.
Use the tool FastP to split FASTQ files by number of reads. This parallelizes across FASTQ file shards, speeding up mapping. Note that although the minimum value is 250 reads, a single FASTQ shard will still be created if you have fewer than 250 reads.
Default:
50000000
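As a rough sketch of what the split size means for parallelism, the number of shards can be estimated by dividing the read count by the split size and rounding up. The function name fastq_shards is hypothetical and the calculation is an approximation, not FastP's actual splitting logic:

```python
import math


def fastq_shards(total_reads, reads_per_shard=50_000_000):
    """Estimate how many FASTQ shards a given split size produces."""
    if reads_per_shard == 0:
        # A value of 0 turns splitting off: the file is processed whole.
        return 1
    # At least one shard is created, even for very small inputs.
    return max(1, math.ceil(total_reads / reads_per_shard))


# Example: 120 million reads with the default split size.
shards = fastq_shards(120_000_000)
```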
Estimate interval size.
Intervals are parts of the chopped up genome used to speed up preprocessing and variant calling. See --intervals for more info.
Changing this parameter changes the number of intervals that are grouped and processed together. BED files from targeted sequencing can contain thousands of small intervals. Spinning up a new process for each can be quite resource intensive. Instead, it can be preferable to process small intervals together on larger nodes. In order to make use of this parameter, no runtime estimate can be present in the BED file (column 5).
Default:
200000
Path to target bed file in case of whole exome or targeted sequencing or intervals file.
To speed up preprocessing and variant calling processes, the execution is parallelized across a reference chopped into smaller pieces.
Parts of preprocessing and variant calling are done by these intervals, the different resulting files are then merged. This can parallelize processes, and push down wall clock time significantly.
We are aligning to the whole genome, and then run Base Quality Score Recalibration and Variant Calling on the supplied regions.
Whole Genome Sequencing:
The (provided) intervals are chromosomes cut at their centromeres (so each chromosome arm is processed separately), plus additional unassigned contigs.
We are ignoring the hs37d5 contig that contains concatenated decoy sequences.
The calling intervals can be defined using a .list or a BED file.
A .list file contains one interval per line in the format chromosome:start-end (1-based coordinates).
A BED file must be a tab-separated text file with one interval per line.
There must be at least three columns: chromosome, start, and end (0-based coordinates).
Additionally, the score column of the BED file can be used to provide an estimate of how many seconds it will take to call variants on that interval.
The fourth column remains unused.
|chr1|10000|207666|NA|47.3|
This indicates that variant calling on the interval chr1:10001-207666 takes approximately 47.3 seconds.
The runtime estimate is used in two different ways.
First, when there are multiple consecutive intervals in the file that take little time to compute, they are processed as a single job, thus reducing the number of processes that needs to be spawned.
Second, the jobs with largest processing time are started first, which reduces wall-clock time.
If no runtime is given, a time of 200000 nucleotides per second is assumed. See --nucleotides_per_second on how to customize this.
Actual figures vary from 2 nucleotides/second to 30000 nucleotides/second.
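The coordinate conversion and the runtime estimate described above can be sketched as follows. This is an illustration under stated assumptions, not Sarek's implementation: the function names are hypothetical, and the fallback simply divides the interval length by the nucleotides-per-second value:

```python
def parse_bed_line(line, nucleotides_per_second=200_000):
    """Return (interval, estimated_seconds) for one tab-separated BED line.

    BED starts are 0-based, so 1 is added to the start to obtain the
    1-based chromosome:start-end notation used in .list files.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])

    seconds = None
    if len(fields) >= 5:
        try:
            seconds = float(fields[4])  # runtime estimate from column 5
        except ValueError:
            seconds = None
    if seconds is None:
        # No usable estimate: fall back to interval length / throughput.
        seconds = (end - start) / nucleotides_per_second

    return f"{chrom}:{start + 1}-{end}", seconds


def longest_first(bed_lines):
    """Order intervals so that the longest estimated jobs come first."""
    parsed = [parse_bed_line(line) for line in bed_lines]
    return sorted(parsed, key=lambda item: item[1], reverse=True)


example = ["chr2\t0\t400000", "chr1\t10000\t207666\tNA\t47.3"]
ordered = longest_first(example)
```

Here the chr1 interval is listed first because its estimate of 47.3 seconds exceeds the 2 seconds computed for the chr2 interval.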
If you prefer, you can specify the full path to your reference genome when you run the pipeline:
NB If none provided, will be generated automatically from the FASTA reference.
NB Use --no_intervals to disable automatic generation.
Targeted Sequencing:
The recommended flow for targeted sequencing data is to use the workflow as it is, but also provide a BED file containing targets for all steps using the --intervals option. In addition, the parameter --wes should be set.
It is advised to pad the variant calling regions (exons or target) to some extent before submitting to the workflow.
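Padding can be done with any interval tool before the BED file is handed to the workflow. A minimal sketch of the idea, where the function name pad_interval and the padding of 100 bases are arbitrary illustrative choices:

```python
def pad_interval(chrom, start, end, padding=100, chrom_length=None):
    """Extend a 0-based BED interval by `padding` bases on both sides.

    The start is clamped at 0. The end is clamped at the chromosome
    length only if one is supplied.
    """
    new_start = max(0, start - padding)
    new_end = end + padding
    if chrom_length is not None:
        new_end = min(new_end, chrom_length)
    return chrom, new_start, new_end


padded = pad_interval("chr1", 50, 200)
```

Padded intervals can overlap their neighbours, so they may need to be merged afterwards; that step is not shown here.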
The procedure is similar to whole genome sequencing, except that only BED files are accepted. See above for formatting description.
Adding every exon as an interval in case of WES can generate >200K processes or jobs, many more forks, and a similar number of directories in the Nextflow work directory. These are appropriately grouped together to reduce the number of processes run in parallel (see above and --nucleotides_per_second for details).
Furthermore, primers and/or baits are not 100% specific (certainly not for MHC, KIR, etc.), so it is quite likely that there will be reads mapping to multiple locations.
If you are certain that the target is unique for your genome (all the reads will certainly map to only one location), and aligning to the whole genome is overkill, it is actually better to change the reference itself.
Disable usage of intervals.
Intervals are parts of the chopped up genome used to speed up preprocessing and variant calling. See --intervals for more info.
If --no_intervals is set no intervals will be taken into account for speed up or data processing.
Enable when exome or panel data is provided.
With this parameter flags in various tools are set for targeted sequencing data. It is recommended to enable for whole-exome and panel data analysis.
Tools to use for contamination removal, duplicate marking, variant calling and/or for annotation.
Multiple tools separated with commas.
Variant Calling:
Germline variant calling can currently be performed with the following variant callers:
- SNPs/Indels: DeepVariant, FreeBayes, GATK HaplotypeCaller, mpileup, Sentieon Haplotyper
- Structural Variants: indexcov, Manta, TIDDIT
- Copy-number: CNVkit
Tumor-only somatic variant calling can currently be performed with the following variant callers:
- SNPs/Indels: FreeBayes, Lofreq, mpileup, Mutect2, Sentieon TNScope, Strelka
- Structural Variants: Manta, Sentieon TNScope, TIDDIT
- Copy-number: CNVkit, Control-FREEC
Somatic variant calling can currently only be performed with the following variant callers:
- SNPs/Indels: FreeBayes, Mutect2, Sentieon TNScope, Strelka2
- Structural variants: Manta, TIDDIT
- Copy-number: ASCAT, CNVkit, Control-FREEC, Sentieon TNScope
- Microsatellite Instability: MSIsensor2, MSIsensor-pro
NB Mutect2 for somatic variant calling cannot be combined with --no_intervals.
Annotation:
- snpEff, VEP, merge (both consecutively), and bcftools annotate (needs --bcftools_annotation).
NB As Sarek will use bgzip and tabix to compress and index the annotated VCF files, it expects VCF files to be sorted when starting from --step annotate.
Disable specified tools.
Multiple tools can be specified, separated by commas.
NB --skip_tools baserecalibrator_report does not skip the tool itself; it only stops the reports from being saved.
NB --skip_tools markduplicates_report does not skip MarkDuplicates, but prevents the collection of duplicate metrics, which slows down performance.
FASTQ Preprocessing ¶
Run FastP for read trimming
Use this to perform adapter trimming. Adapters are detected automatically by using the FastP flag --detect_adapter_for_pe. For more info see FastP.
Remove bp from the 5' end of read 1
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end. Corresponds to the FastP flag --trim_front1.
Default:
0
Remove bp from the 5' end of read 2
This may be useful if the qualities were very poor, or if there is some sort of unwanted bias at the 5' end. Corresponds to the FastP flag --trim_front2.
Default:
0
Remove bp from the 3' end of read 1
This may remove some unwanted bias from the 3' end. Corresponds to the FastP flag --trim_tail1.
Default:
0
Remove bp from the 3' end of read 2
This may remove some unwanted bias from the 3' end. Corresponds to the FastP flag --trim_tail2.
Default:
0
Removing poly-G tails.
Detects polyG in read tails and trims them. Corresponds to the FastP flag --trim_poly_g.
Minimum length of reads to keep
This is the minimum length of reads to keep after trimming. Corresponds to the FastP flag --length_required (default in FastP is 15bp).
Default:
15
If set, publishes split FASTQ files. Intended for testing purposes.
Unique Molecular Identifiers ¶
Specify UMI read structure for fgbio UMI consensus read generation
One structure if the UMI is present on one end (e.g. '+T 2M11S+T'), or two structures separated by a blank space if UMIs are present on both ends (e.g. '2M11S+T 2M11S+T'); please note, this does not handle duplex-UMIs.
For more info on UMI usage in the pipeline, also check docs here.
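To make the read structure notation concrete, the sketch below splits a structure such as '2M11S+T' into its segments. It assumes the fgbio convention of a length (or '+' for "the rest of the read") followed by a one-letter type, and handles only the types T (template), B (sample barcode), M (molecular barcode, i.e. the UMI) and S (skip). The function name is hypothetical and this is not fgbio's own parser:

```python
import re

# One segment: a length (digits, or "+" for the remainder) and a type.
SEGMENT = re.compile(r"(\d+|\+)([TBMS])")


def parse_read_structure(structure):
    """Split a read structure into (length, type) tuples.

    A length of None stands for "+", i.e. all remaining bases.
    """
    segments = []
    position = 0
    for match in SEGMENT.finditer(structure):
        if match.start() != position:
            raise ValueError(f"Unexpected text in read structure: {structure!r}")
        length, kind = match.groups()
        segments.append((None if length == "+" else int(length), kind))
        position = match.end()
    if not segments or position != len(structure):
        raise ValueError(f"Invalid read structure: {structure!r}")
    return segments


# Two structures separated by a blank space, one per read.
per_read = [parse_read_structure(s) for s in "2M11S+T 2M11S+T".split()]
```

For '2M11S+T' this yields a 2-base UMI, 11 skipped bases, and the remainder of the read as template.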
Default strategy for fgbio UMI-based consensus read generation
Default:
Adjacency
Allowed values:
Identity, Edit, Adjacency, Paired
Move UMIs from fastq read headers to a tag prior to deduplication.
Set to true if UMIs are already present in the header of the read, for instance from using OverrideCycles in bclconvert or umi_tools/extract.
Location of the UMI(s) to be extracted with fastp.
Use if UMIs are not present in the read header, but in a specific location within the reads/fastq header index. This will be used to extract UMIs from reads or index in the fastq header and store them in the RX tag.
Allowed values:
read1, read2, per_read, index1, index2, per_index
Length of the UMI(s) in the read.
If UMIs are being extracted using fastp, specify the length of the UMI here. This will be used to extract UMIs from reads and store them in the RX tag.
Number of bases to skip after the UMI(s) in the read when extracting with fastp.
If UMIs are being extracted using fastp, specify the number of bases to skip after the UMI here. This will trim some bases after the UMI.
Tag detailing where UMIs are present inside the bam/cram file (e.g. RX).
If UMIs are already present in the cram/bam file, this details the tag which will be used in GATK MarkDuplicates and Sentieon dedup. This should be set to RX if restarting from bam files where the UMIs have been extracted by the umi_in_read_header or umi_length options. Note this is not compatible with MarkDuplicates Spark.
Path to comma-separated file containing a list of reference genomes to filter reads against with BBSplit. You have to also explicitly set --tools bbsplit if you want to use BBSplit.
The file should contain 2 columns: short name and full path to reference genome(s) e.g.
mm10,/path/to/mm10.fa
ecoli,/path/to/ecoli.fa
Path to directory or tar.gz archive for pre-built BBSplit index.
The BBSplit index will have to be built at least once with this pipeline (see --save_reference to save index). It can then be provided via --bbsplit_index for future runs.
If this option is specified, FastQ files split by reference will be saved in the results directory.
Preprocessing ¶
Specify aligner to be used to map reads to reference genome.
Sarek will build missing indices automatically if not provided. Set --bwa false if indices should be (re-)built.
If DragMap is selected as aligner, it is recommended to skip baserecalibration with --skip_tools baserecalibrator. For more info see here.
Default:
bwa-mem
Allowed values:
bwa-mem, bwa-mem2, dragmap, sentieon-bwamem, parabricks
Save mapped files.
If the parameter --split_fastq is used, the sharded BAM files are merged and converted to CRAM before saving them.
Saves output from mapping (if --save_mapped), Markduplicates & Baserecalibration as BAM file instead of CRAM
Enable usage of GATK Spark implementation for duplicate marking and/or base quality score recalibration
Multiple tools separated with commas.
The GATK4 Base Quality Score recalibration tools Baserecalibrator and ApplyBQSR are currently available as a Beta release. Please be aware that --use_gatk_spark is not compatible with --save_output_as_bam --save_mapped. Use with caution!
Generate consensus reads with Sentieon dedup rather than choosing one best read.
If set, the Sentieon dedup output will combine duplicate reads into single consensus read. This is only relevant if --tools contains sentieon_dedup.
Variant Calling ¶
If true, skips germline variant calling for matched normal to tumor sample. Normal samples without matched tumor will still be processed through germline variant calling tools.
This can speed up computation for somatic variant calling with matched normal samples. If false, all normal samples are also processed through the germline variant calling tools. If true, only somatic variant calling is done.
Overwrite Ascat min base quality required for a read to be counted.
For more details see here
Default:
20
Overwrite Ascat minimum depth required in the normal for a SNP to be considered.
For more details, see here.
Default:
10
Overwrite Ascat min mapping quality required for a read to be counted.
For more details, see here.
Default:
35
Overwrite ASCAT ploidy.
ASCAT: optional argument to override ASCAT optimization and supply psi parameter (expert parameter, do not adapt unless you know what you are doing). See here
Overwrite ASCAT purity.
Overwrites ASCAT's rho_manual parameter. Expert use only, see here for details.
Requires that --ascat_ploidy is set.
Specify a custom chromosome length file.
Control-FREEC requires a file containing all chromosome lengths. By default the fasta.fai is used. If the fasta.fai file contains chromosomes not present in the intervals, it fails (see: https://github.com/BoevaLab/FREEC/issues/106).
In this case, a custom chromosome length can be specified. It must be of the same format as the fai, but only contain the relevant chromosomes.
Overwrite Control-FREEC coefficientOfVariation
Details, see ControlFREEC manual.
Default:
0.05
Overwrite Control-FREEC contaminationAdjustment
Details, see ControlFREEC manual.
Design known contamination value for Control-FREEC
Details, see ControlFREEC manual.
Default:
0
Minimal sequencing quality for a position to be considered in BAF analysis.
Details, see ControlFREEC manual.
Default:
0
Minimal read coverage for a position to be considered in BAF analysis.
Details, see ControlFREEC manual.
Default:
0
Genome ploidy used by ControlFREEC
In case of doubt, you can set different values and Control-FREEC will select the one that explains the most observed CNAs. Example: ploidy=2, ploidy=2,3,4. For more details, see the manual.
Default:
2
Overwrite Control-FREEC window size.
Details, see ControlFREEC manual.
Copy-number reference for CNVkit
https://cnvkit.readthedocs.io/en/stable/pipeline.html?highlight=reference.cnn#batch
Filtering expression for vcflib/vcffilter
FreeBayes provides a QUAL score for each called variant. The QUAL estimate is the phred-scaled probability that the locus is not polymorphic, given the data and the model. This is reasonably well calibrated, so you can specify that you want things where we expect error rates of no more than 1/100 (QUAL > 20) or 1/1000 (QUAL > 30). The default setting for Sarek is QUAL > 30.
Default:
30
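The relationship between a phred-scaled QUAL value and the expected error rate quoted above follows the standard phred formula, p = 10^(-QUAL/10). A small sketch (the function name is illustrative only):

```python
def qual_to_error_probability(qual):
    """Convert a phred-scaled QUAL value to an error probability."""
    return 10 ** (-qual / 10)


# QUAL thresholds mentioned above and their expected error rates.
error_rates = {qual: qual_to_error_probability(qual) for qual in (20, 30)}
```

This gives approximately 0.01 for QUAL 20 and 0.001 for QUAL 30, matching the 1/100 and 1/1000 figures.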
Turn on the joint germline variant calling for GATK haplotypecaller
Uses all normal germline samples (as designated by status in the input csv) in the joint germline variant calling process.
Runs Mutect2 in joint (multi-sample) mode for better concordance among variant calls of tumor samples from the same patient. Mutect2 outputs will be stored in a subfolder named with patient ID under variant_calling/mutect2/ folder. Only a single normal sample per patient is allowed. Tumor-only mode is also supported.
Do not analyze soft clipped bases in the reads for GATK Mutect2.
Uses the --dont-use-soft-clipped-bases parameter with GATK Mutect2.
Panel-of-normals VCF (bgzipped) for GATK Mutect2
Without a PON, there will be no calls with PASS in the FILTER field; only an unfiltered VCF is written. It is highly recommended to make your own PON, as it depends on the sequencer and library preparation.
The pipeline is shipped with a panel-of-normals for --genome GATK.GRCh38 provided by GATK.
NB PON file should be bgzipped.
Index of PON panel-of-normals VCF.
If none provided, will be generated automatically from the PON bgzipped VCF file.
Option for selecting output and emit-mode of Sentieon's Haplotyper.
The option --sentieon_haplotyper_emit_mode can be set to the same string values as the Haplotyper's --emit_mode. To output both a vcf and a gvcf, specify both a vcf-option (currently, all, confident and variant) and gvcf. For example, to obtain a vcf and gvcf one could set --sentieon_haplotyper_emit_mode to variant, gvcf.
Default:
variant
Option for selecting output and emit-mode of Sentieon's Dnascope.
The option --sentieon_dnascope_emit_mode can be set to the same string values as the Dnascope's --emit_mode. To output both a vcf and a gvcf, specify both a vcf-option (currently, all, confident and variant) and gvcf. For example, to obtain a vcf and gvcf one could set --sentieon_dnascope_emit_mode to variant, gvcf.
Default:
variant
Option for selecting the PCR indel model used by Sentieon Dnascope.
PCR indel model used to weed out false positive indels more or less aggressively. The possible MODELs are: NONE (used for PCR free samples), and HOSTILE, AGGRESSIVE and CONSERVATIVE, in order of decreasing aggressiveness. The default value is CONSERVATIVE.
Default:
CONSERVATIVE
Option for selecting the PCR indel model used by GATK HaplotypeCaller.
Default:
CONSERVATIVE
Post variant calling ¶
Enable filtering of VCFs with bcftools view
Filtering of all vcf-files from each applied variant-caller using bcftools view, applying the filtering criteria specified in --bcftools_filter_criteria.
Filter criteria. Uses bcftools view filter options. To customize, follow instructions here: https://samtools.github.io/bcftools/bcftools.html#view
Default:
-f PASS,.
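To illustrate what the default criteria select, the sketch below keeps VCF records whose FILTER column contains PASS or is unset ("."). It is a simplified stand-in for the behaviour of the -f option, written for plain-text VCF lines, and the function name keep_record is hypothetical:

```python
def keep_record(vcf_line, allowed=("PASS", ".")):
    """Return True for header lines and for records with an allowed FILTER."""
    if vcf_line.startswith("#"):
        return True  # header lines are always kept
    # FILTER is the 7th tab-separated column of a VCF record.
    filter_field = vcf_line.rstrip("\n").split("\t")[6]
    # A record may carry several FILTER values separated by ";".
    return any(value in allowed for value in filter_field.split(";"))


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tT\t50\tPASS\t.",
    "chr1\t200\t.\tG\tC\t12\tLowQual\t.",
    "chr1\t300\t.\tC\tG\t40\t.\t.",
]
kept = [line for line in records if keep_record(line)]
```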
Option for normalization of vcf-files.
Normalization of all vcf-files from each applied variant-caller using bcftools norm.
Enable consensus calling of multiple VCF files from one sample
Intersects multiple VCF files from one sample using bcftools isec. As the consensus criterion, -n+${params.snv_consensus_calling} is used, meaning a variant is found in this many or more files. For details, visit: https://samtools.github.io/bcftools/bcftools.html#isec
Minimum number of variant callers calling a variant for consensus results
Determines the minimum number of variant callers a variant must be called by to be included in the consensus results. As the consensus criterion, -n+${params.consensus_min_count} is used, meaning a variant is found in this many or more files. For details, visit: https://samtools.github.io/bcftools/bcftools.html#isec
Default:
2
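The "found in this many or more files" criterion can be sketched with plain sets. This illustrates the counting rule only and is not how bcftools isec is implemented. The tool names and variant tuples are made-up example data:

```python
from collections import Counter


def consensus_variants(calls_by_tool, min_count=2):
    """Return variants reported by at least `min_count` callers.

    `calls_by_tool` maps a caller name to an iterable of variants, each
    identified here by a (chrom, pos, ref, alt) tuple.
    """
    counts = Counter()
    for variants in calls_by_tool.values():
        counts.update(set(variants))  # count each caller at most once
    return {variant for variant, n in counts.items() if n >= min_count}


# Made-up calls from three callers for one sample.
calls = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],
    "caller_b": [("chr1", 100, "A", "T")],
    "caller_c": [("chr1", 100, "A", "T"), ("chr2", 50, "T", "G")],
}
consensus = consensus_variants(calls, min_count=2)
```

With the default of 2, only the chr1:100 variant is retained because it is the only one reported by more than one caller.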
Option for concatenating germline vcf-files.
Enable concatenation of germline vcf-files from each applied variant-caller into one vcf-file using bcftools concat.
Number of chunks to split the vcf-files for varlociraptor. Minimum 1, indicates no splitting
Default:
15
Yte compatible scenario file for tumor only samples. Defaults to assets/varlociraptor_tumor_only.yte.yaml
Yte compatible scenario file for somatic samples. Defaults to assets/varlociraptor_somatic.yte.yaml
Yte compatible scenario file for germline samples. Defaults to assets/varlociraptor_germline.yte.yaml
Annotation ¶
Allow usage of fasta file for annotation with VEP
By pointing VEP to a FASTA file, it is possible to retrieve reference sequence locally. This enables VEP to retrieve HGVS notations (--hgvs), check the reference sequence given in input data, and construct transcript models from a GFF or GTF file without accessing a database.
For details, see here.
Path to dbNSFP processed file.
Will not work without a provided dbnsfp_tbi. To be used with --vep_dbnsfp.
dbNSFP files and more information are available at https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html#dbnsfp and https://sites.google.com/site/jpopgen/dbNSFP/
Path to dbNSFP tabix indexed file.
To be used with --vep_dbnsfp.
Consequence to annotate with
To be used with --vep_dbnsfp.
This parameter is used to filter/limit outputs to a specific effect of the variant.
The set of consequence terms is defined by the Sequence Ontology and an overview of those used in VEP can be found here: https://www.ensembl.org/info/genome/variation/prediction/predicted_data.html
If you want to filter using several consequences, separate them with '&' (e.g. 'consequence=3_prime_UTR_variant&intron_variant').
Fields to annotate with
To be used with --vep_dbnsfp.
This parameter can be used to retrieve individual values from the dbNSFP file. The values correspond to the names of the columns in the dbNSFP file and are separated by commas.
The column names might differ between the different dbNSFP versions. Please check the Readme.txt file, which is provided with the dbNSFP file, to obtain the correct column names. The Readme file also contains a short description of the provided values and the version of the tools used to generate them.
The default values are explained below:
- rs_dbSNP: rs number from dbSNP
- HGVSc_VEP: HGVS coding variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_transcriptid
- HGVSp_VEP: HGVS protein variant presentation from VEP. Multiple entries separated by ';', corresponds to Ensembl_proteinid
- 1000Gp3_EAS_AF: Alternative allele frequency in the 1000Gp3 East Asian descendent samples
- 1000Gp3_AMR_AF: Alternative allele frequency in the 1000Gp3 American descendent samples
- LRT_score: Original LRT two-sided p-value (LRTori), ranges from 0 to 1
- GERP++_RS: Conservation score. The larger the score, the more conserved the site, ranges from -12.3 to 6.17
- gnomAD_exomes_AF: Alternative allele frequency in the whole gnomAD exome samples
Default:
rs_dbSNP,HGVSc_VEP,HGVSp_VEP,1000Gp3_EAS_AF,1000Gp3_AMR_AF,LRT_score,GERP++_RS,gnomAD_exomes_AF
Path to spliceai raw scores snv file.
To be used with --vep_spliceai.
Path to spliceai raw scores snv tabix indexed file.
To be used with --vep_spliceai.
Path to spliceai raw scores indel file.
To be used with --vep_spliceai.
Path to spliceai raw scores indel tabix indexed file.
To be used with --vep_spliceai.
Add an extra custom argument to VEP.
Use this parameter to pass custom arguments to VEP.
Default:
--everything --filter_common --per_gene --total_length --offline --format vcf
Should reflect the VEP version used in the container.
Used by the loftee plugin, which needs the full path.
Default:
111.0-0
The output directory where the cache will be saved. You have to use absolute paths to storage on Cloud infrastructure.
VEP output-file format.
Sets the format of the output-file from VEP. Available formats: json, tab and vcf.
Default:
vcf
Allowed values:
json, tab, vcf
A vcf file containing custom annotations to be used with bcftools annotate. Needs to be bgzipped.
Optional text file with list of columns to use from bcftools_annotations, one name per row
General reference genome options ¶
The base path to the igenomes reference files
Default:
s3://ngi-igenomes/igenomes/
Do not load the iGenomes reference config.
Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config. NB You can then run Sarek by specifying at least a FASTA genome file
Save built references.
Set this parameter, if you wish to save all computed reference files. This is useful to avoid re-computation on future runs.
Only build references.
Set this parameter, if you wish to compute and save all computed reference files. No alignment or any other downstream steps will be performed.
Download annotation cache.
Set this parameter, if you wish to download annotation cache. Using this parameter will download cache even if --snpeff_cache and --vep_cache are provided.
Reference genome options ¶
Name of iGenomes reference.
If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.
See the nf-core website docs for more details.
Default:
GATK.GRCh38
ASCAT genome.
Must be set to run ASCAT, either hg19 or hg38.
If you use AWS iGenomes, this has already been set for you appropriately.
Allowed values: hg19, hg38
Path to ASCAT allele zip file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to ASCAT loci zip file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to ASCAT GC content correction file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to ASCAT RT (replication timing) correction file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to BWA mem indices.
If you wish to recompute indices available on igenomes, set --bwa false.
NB If none provided, will be generated automatically from the FASTA reference. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to bwa-mem2 mem indices.
If you use AWS iGenomes, this has already been set for you appropriately.
If you wish to recompute indices available on igenomes, set --bwamem2 false.
NB If none provided, will be generated automatically from the FASTA reference, if --aligner bwa-mem2 is specified. Combine with --save_reference to save for future runs.
Path to chromosomes folder used with ControLFREEC.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dbsnp file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dbsnp index.
NB If none provided, will be generated automatically from the dbsnp file. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Label string for VariantRecalibration (haplotypecaller joint variant calling).
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA dictionary file.
NB If none provided, will be generated automatically from the FASTA reference. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to dragmap indices.
If you wish to recompute indices available on igenomes, set --dragmap false.
NB If none provided, will be generated automatically from the FASTA reference, if --aligner dragmap is specified. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA genome file.
This parameter is mandatory if --genome is not specified.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to FASTA reference index.
NB If none provided, will be generated automatically from the FASTA reference. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to GATK Mutect2 Germline Resource File.
The germline resource VCF file (bgzipped and tabixed) needed by GATK4 Mutect2 is a collection of calls that are likely present in the sample, with allele frequencies. The AF info field must be present. You can find a smaller, stripped gnomAD VCF file (most of the annotation is removed and only calls signed by PASS are stored) in the AWS iGenomes Annotation/GermlineResource folder.
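The requirement that the AF INFO field be present can be checked before a run. The following is a minimal sketch, not part of the pipeline: it scans the header of a bgzipped VCF (bgzip output is readable as ordinary gzip) for an `##INFO=<ID=AF,...>` declaration. The function name is hypothetical, and the check covers only the header declaration, not whether every record actually carries an AF value.

```python
import gzip


def has_af_info(vcf_gz_path: str) -> bool:
    """Return True if the VCF header declares an INFO field with ID=AF."""
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta-information lines are over
            if line.startswith("##INFO=<ID=AF,"):
                return True
    return False
```

A file that fails this check would need an AF annotation added before being used as a germline resource.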
If you use AWS iGenomes, this has already been set for you appropriately.
Path to GATK Mutect2 Germline Resource Index.
NB If none provided, will be generated automatically from the Germline Resource file, if provided. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to known indels file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to known indels file index.
NB If none provided, will be generated automatically from the known indels file, if provided. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
Path to known snps file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to known snps file index.
NB If none provided, will be generated automatically from the known snps file, if provided. Combine with --save_reference to save for future runs.
If you use AWS iGenomes, this has already been set for you appropriately.
Label string for VariantRecalibration (haplotypecaller joint variant calling). If you use AWS iGenomes, this has already been set for you appropriately.
Path to Control-FREEC mappability file.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to models folder used with MSIsensor2.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to scan file used with MSIsensor2.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to scan file used with MSIsensorPro.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to SNP bed file for sample checking with NGSCheckMate
If you use AWS iGenomes, this has already been set for you appropriately.
Machine learning model for Sentieon Dnascope.
It is recommended to use DNAscope with a machine learning model to perform variant calling with higher accuracy by improving the candidate detection and filtering. Sentieon can provide you with a model trained using a subset of the data from the GiAB truth-set found in https://github.com/genome-in-a-bottle. In addition, Sentieon can assist you in the creation of models using your own data, which will calibrate the specifics of your sequencing and bio-informatics processing.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to snpEff cache.
Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}
If you use AWS iGenomes, this has already been set for you appropriately.
Default:
s3://annotation-cache/snpeff_cache/
snpEff DB version.
This is used to specify the database to be used for annotation.
Alternatively, database names can be listed with the snpEff databases command.
If you use AWS iGenomes, this has already been set for you appropriately.
Path to VEP cache.
Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}
If you use AWS iGenomes, this has already been set for you appropriately.
Default:
s3://annotation-cache/vep_cache/
VEP cache version.
Alternative cache version can be used to specify the correct Ensembl Genomes version number as these differ from the concurrent Ensembl/VEP version numbers.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP genome.
This is used to specify the genome when looking for local cache, or cloud based cache.
If you use AWS iGenomes, this has already been set for you appropriately.
VEP species.
Alternatively species listed in Ensembl Genomes caches can be used.
If you use AWS iGenomes, this has already been set for you appropriately.
Institutional config options ¶
Base directory for Institutional configs.
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Default:
https://raw.githubusercontent.com/nf-core/configs/master
Base path / URL for data used in the test profiles
Warning: The -profile test samplesheet file itself contains remote paths. Setting this parameter does not alter the contents of that file.
Default:
https://raw.githubusercontent.com/nf-core/test-datasets/sarek3
Sequencing platform information to be added to read group (PL field).
Used to create a proper read group header for downstream GATK4 analysis.
Default:
ILLUMINA
Generic options ¶
Method used to save pipeline results to output directory.
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Default:
copy
Allowed values: symlink, rellink, link, copy, copyNoFollow, move
Email address for completion summary.
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
Email address for completion summary, only when pipeline fails.
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
File size limit when attaching MultiQC reports to summary emails.
Default:
25.MB
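To make the size limit concrete, here is a small sketch of how a value written like `25.MB` could be turned into bytes and compared with a report's size. It assumes binary units (1 MB = 1024 × 1024 bytes), which is how Nextflow memory units are generally documented. The function names are hypothetical and this is not the pipeline's own check.

```python
import re

# Binary multipliers, assumed to match Nextflow-style memory units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Parse strings such as '25.MB' or '1 GB' into a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*\.?\s*(B|KB|MB|GB|TB)", text.strip())
    if match is None:
        raise ValueError(f"Unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def fits_in_email(report_bytes: int, limit: str = "25.MB") -> bool:
    """Return True if a report of the given size is within the limit."""
    return report_bytes <= parse_size(limit)
```

For example, `parse_size("25.MB")` gives 26214400 bytes under the binary-unit assumption.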
Incoming hook URL for messaging service
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
Custom MultiQC yaml file containing HTML including a methods description.
Boolean whether to validate parameters against the schema at runtime
Default:
True
Base URL or local path to location of pipeline test dataset files
Default:
https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
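The pattern `yyyy-MM-dd_HH-mm-ss` is written in Java-style date notation. As an illustration of what such a suffix looks like, the sketch below produces an equivalent string with Python's `strftime`. The function name is hypothetical and the pipeline itself does not use this code.

```python
from datetime import datetime
from typing import Optional


def trace_suffix(now: Optional[datetime] = None) -> str:
    """Format a timestamp as yyyy-MM-dd_HH-mm-ss (year-month-day_hour-minute-second)."""
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S")


if __name__ == "__main__":
    print(trace_suffix(datetime(2024, 3, 5, 14, 7, 9)))  # 2024-03-05_14-07-09
```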
Display hidden parameters in the help message (only works when --help or --help_full are provided).
Workflows
This page documents all workflows in the pipeline.
subworkflows/local/annotation_cache_initialisation/main.nf:11
Inputs (take)
| Name | Description |
|---|---|
| snpeff_enabled | - |
| snpeff_cache | - |
| snpeff_db | - |
| vep_enabled | - |
| vep_cache | - |
| vep_species | - |
| vep_cache_version | - |
| vep_genome | - |
| vep_custom_args | - |
| help_message | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/bam_applybqsr/main.nf:11
Inputs (take)
| Name | Description |
|---|---|
| cram | - |
| dict | - |
| fasta | - |
| fasta_fai | - |
| intervals | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam | - |
| cram | - |
| ? | - |
subworkflows/local/bam_applybqsr_spark/main.nf:11
Inputs (take)
| Name | Description |
|---|---|
| cram | - |
| dict | - |
| fasta | - |
| fasta_fai | - |
| intervals | - |
Outputs (emit)
| Name | Description |
|---|---|
| bam | - |
| cram | - |
| ? | - |
subworkflows/local/bam_baserecalibrator/main.nf:10
Inputs (take)
| Name | Description |
|---|---|
| cram | - |
| dict | - |
| fasta | - |
| fasta_fai | - |
| intervals | - |
| known_sites | - |
| known_sites_tbi | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/bam_baserecalibrator_spark/main.nf:10
Inputs (take)
| Name | Description |
|---|---|
| cram | - |
| dict | - |
| fasta | - |
| fasta_fai | - |
| intervals | - |
| known_sites | - |
| known_sites_tbi | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/bam_convert_samtools/main.nf:14
Inputs (take)
| Name | Description |
|---|---|
| input | - |
| fasta | - |
| fasta_fai | - |
| interleaved | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/local/bam_joint_calling_germline_gatk/main.nf:17Inputs (take)
| Name | Description |
|---|---|
input
|
- |
fasta
|
- |
fai
|
- |
dict
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dbsnp_vqsr
|
- |
resource_indels_vcf
|
- |
resource_indels_tbi
|
- |
known_indels_vqsr
|
- |
resource_snps_vcf
|
- |
resource_snps_tbi
|
- |
known_snps_vqsr
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_joint_calling_germline_sentieon/main.nf:15Inputs (take)
| Name | Description |
|---|---|
input
|
- |
fasta
|
- |
fai
|
- |
dict
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dbsnp_vqsr
|
- |
resource_indels_vcf
|
- |
resource_indels_tbi
|
- |
known_indels_vqsr
|
- |
resource_snps_vcf
|
- |
resource_snps_tbi
|
- |
known_snps_vqsr
|
- |
variant_caller
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_markduplicates/main.nf:10Inputs (take)
| Name | Description |
|---|---|
bam
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals_bed_combined
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_markduplicates_spark/main.nf:12Inputs (take)
| Name | Description |
|---|---|
bam
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals_bed_combined
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_merge_index_samtools/main.nf:10Inputs (take)
| Name | Description |
|---|---|
bam
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
subworkflows/nf-core/bam_ngscheckmate/main.nf:4
Take a set of BAM files and run NGSCheckMate to determine whether samples match with each other, using a set of SNPs.
Components
bcftools/mpileup
ngscheckmate/ncm
Inputs (take)
| Name | Description |
|---|---|
| meta1 | Groovy Map containing sample information e.g. [ id:'test' ] |
| bam | BAM files for each sample |
| meta2 | Groovy Map containing bed file information e.g. [ id:'sarscov2' ] |
| snp_bed | BED file containing the SNPs to analyse. NGSCheckMate provides some default ones for hg19/hg38. |
| meta3 | Groovy Map containing reference genome meta information e.g. [ id:'sarscov2' ] |
| fasta | FASTA file for the genome |
Outputs (emit)
| Name | Description |
|---|---|
| pdf | A PDF containing a dendrogram showing how the samples match up |
| corr_matrix | A text file containing the correlation matrix between each sample |
| matched | A txt file containing only the samples that match with each other |
| all | A txt file containing all the sample comparisons, whether they match or not |
| vcf | VCF files for each sample giving the SNP calls |
| versions | File containing software versions |
subworkflows/local/bam_sentieon_dedup/main.nf:7Inputs (take)
| Name | Description |
|---|---|
bam
|
- |
bai
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals_bed_combined
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_cnvkit/main.nf:12Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
targets
|
- |
reference
|
- |
Outputs (emit)
| Name | Description |
|---|---|
cnv_calls_raw
|
- |
cnv_calls_export
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_deepvariant/main.nf:12Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_freebayes/main.nf:14Inputs (take)
| Name | Description |
|---|---|
ch_cram
|
- |
ch_dict
|
- |
ch_fasta
|
- |
ch_fasta_fai
|
- |
ch_intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
vcf_unfiltered
|
- |
vcf
|
- |
tbi
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_germline_all/main.nf:22Inputs (take)
| Name | Description |
|---|---|
tools
|
- |
skip_tools
|
- |
bam
|
- |
cram
|
- |
bwa
|
- |
cnvkit_reference
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dbsnp_vqsr
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
intervals_bed_combined
|
- |
intervals_bed_gz_tbi_combined
|
- |
intervals_bed_combined_haplotypec
|
- |
intervals_bed_gz_tbi
|
- |
known_indels_vqsr
|
- |
known_sites_indels
|
- |
known_sites_indels_tbi
|
- |
known_sites_snps
|
- |
known_sites_snps_tbi
|
- |
known_snps_vqsr
|
- |
joint_germline
|
- |
skip_haplotypecaller_filter
|
- |
sentieon_haplotyper_emit_mode
|
- |
sentieon_dnascope_emit_mode
|
- |
sentieon_dnascope_pcr_indel_model
|
- |
sentieon_dnascope_model
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_germline_manta/main.nf:10Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_haplotypecaller/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
dict
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_indexcov/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
Outputs (emit)
| Name | Description |
|---|---|
out_indexcov
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_mpileup/main.nf:12Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
dict
|
- |
fasta
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_sentieon_dnascope/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
dict
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dbsnp_vqsr
|
- |
intervals
|
- |
joint_germline
|
- |
sentieon_dnascope_emit_mode
|
- |
sentieon_dnascope_pcr_indel_model
|
- |
sentieon_dnascope_model
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_sentieon_haplotyper/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
dict
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dbsnp_vqsr
|
- |
intervals
|
- |
joint_germline
|
- |
sentieon_haplotyper_emit_mode
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_single_strelka/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_single_tiddit/main.nf:10Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
bwa
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_somatic_all/main.nf:21Inputs (take)
| Name | Description |
|---|---|
tools
|
- |
bam
|
- |
cram
|
- |
bwa
|
- |
cf_chrom_len
|
- |
chr_files
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
germline_resource
|
- |
germline_resource_tbi
|
- |
intervals
|
- |
intervals_bed_gz_tbi
|
- |
intervals_bed_combined
|
- |
intervals_bed_gz_tbi_combined
|
- |
mappability
|
- |
msisensorpro_scan
|
- |
panel_of_normals
|
- |
panel_of_normals_tbi
|
- |
allele_files
|
- |
loci_files
|
- |
gc_file
|
- |
rt_file
|
- |
joint_mutect2
|
- |
wes
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_somatic_ascat/main.nf:9Inputs (take)
| Name | Description |
|---|---|
cram_pair
|
- |
allele_files
|
- |
loci_files
|
- |
intervals_bed
|
- |
fasta
|
- |
gc_file
|
- |
rt_file
|
- |
Outputs (emit)
| Name | Description |
|---|---|
versions
|
- |
subworkflows/local/bam_variant_calling_somatic_controlfreec/main.nf:13Inputs (take)
| Name | Description |
|---|---|
controlfreec_input
|
- |
fasta
|
- |
fasta_fai
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
chr_files
|
- |
mappability
|
- |
intervals_bed
|
- |
Outputs (emit)
| Name | Description |
|---|---|
versions
|
- |
subworkflows/local/bam_variant_calling_somatic_manta/main.nf:9Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_somatic_muse/main.nf:11Inputs (take)
| Name | Description |
|---|---|
bam_normal
|
- |
bam_tumor
|
- |
fasta
|
- |
dbsnp
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_somatic_mutect2/main.nf:17
Perform variant calling on a paired tumor-normal set of samples using Mutect2 tumor-normal mode. The f1r2 output of Mutect2 is run through learnreadorientationmodel to get the artifact priors. The input BAM files are run through getpileupsummaries and then calculatecontamination to get the contamination and segmentation tables. The Mutect2 output VCF is then filtered using filtermutectcalls, with the artifact priors and the contamination and segmentation tables applied as additional filters.
Components
gatk4/mutect2
gatk4/learnreadorientationmodel
gatk4/getpileupsummaries
gatk4/calculatecontamination
gatk4/filtermutectcalls
Inputs (take)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information e.g. [ id:'test' ] |
| input | List containing the tumor and normal BAM files, in that order; CRAM is also accepted as input |
| input_index | List containing the tumor and normal BAM file indexes, in that order; CRAM indexes are also accepted as input |
| which_norm | Optional list of sample headers contained in the normal sample input file. |
| fasta | The reference FASTA file |
| fai | Index of reference FASTA file |
| dict | GATK sequence dictionary |
| germline_resource | Population VCF of germline sequencing, containing allele fractions. |
| germline_resource_tbi | Index file for the germline resource. |
| panel_of_normals | VCF file to be used as a panel of normals. |
| panel_of_normals_tbi | Index for the panel of normals. |
| interval_file | File containing intervals. |
Outputs (emit)
| Name | Description |
|---|---|
| versions | File containing software versions |
| mutect2_vcf | Compressed VCF file to be used for variant_calling. |
| mutect2_tbi | Indexes of the mutect2_vcf file |
| mutect2_stats | Stats files for the mutect2 VCF |
| mutect2_f1r2 | File containing information to be passed to LearnReadOrientationModel. |
| artifact_priors | File containing artifact priors to be used by filtermutectcalls. |
| pileup_table_tumor | File containing the tumor pileup summary table, kept separate as calculatecontamination needs them individually specified. |
| pileup_table_normal | File containing the normal pileup summary table, kept separate as calculatecontamination needs them individually specified. |
| contamination_table | File containing the contamination table. |
| segmentation_table | Output table containing segmentation of tumor minor allele fractions. |
| filtered_vcf | File containing filtered mutect2 calls. |
| filtered_tbi | tbi file that pairs with filtered VCF. |
| filtered_stats | File containing statistics of the filtermutectcalls run. |
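The order of steps in the paired tumor-normal subworkflow can be sketched as a list of GATK command lines. This is only an illustration of how the five components chain together. The flags follow the upstream GATK tool documentation as I understand it, and all intermediate file names are hypothetical. The nf-core modules may pass different or additional arguments.

```python
from typing import List


def mutect2_tumor_normal_commands(
    tumor_bam: str,
    normal_bam: str,
    normal_sample: str,
    fasta: str,
    germline_resource: str,
    panel_of_normals: str,
    intervals: str,
) -> List[List[str]]:
    """Return the GATK calls for the paired workflow, in execution order."""
    return [
        # 1. Call variants and collect F1R2 counts for orientation-bias modelling.
        ["gatk", "Mutect2", "-R", fasta, "-I", tumor_bam, "-I", normal_bam,
         "-normal", normal_sample, "--germline-resource", germline_resource,
         "--panel-of-normals", panel_of_normals, "-L", intervals,
         "--f1r2-tar-gz", "f1r2.tar.gz", "-O", "unfiltered.vcf.gz"],
        # 2. Learn artifact priors from the F1R2 counts.
        ["gatk", "LearnReadOrientationModel", "-I", "f1r2.tar.gz",
         "-O", "artifact-priors.tar.gz"],
        # 3. Pileup summaries, one table per sample.
        ["gatk", "GetPileupSummaries", "-I", tumor_bam, "-V", germline_resource,
         "-L", intervals, "-O", "tumor.pileups.table"],
        ["gatk", "GetPileupSummaries", "-I", normal_bam, "-V", germline_resource,
         "-L", intervals, "-O", "normal.pileups.table"],
        # 4. Contamination and segmentation tables from the two pileup tables.
        ["gatk", "CalculateContamination", "-I", "tumor.pileups.table",
         "-matched", "normal.pileups.table",
         "--tumor-segmentation", "segments.table", "-O", "contamination.table"],
        # 5. Filter the raw calls using priors, contamination and segmentation.
        ["gatk", "FilterMutectCalls", "-R", fasta, "-V", "unfiltered.vcf.gz",
         "--ob-priors", "artifact-priors.tar.gz",
         "--contamination-table", "contamination.table",
         "--tumor-segmentation", "segments.table", "-O", "filtered.vcf.gz"],
    ]
```

The tumor and normal pileup tables are produced separately because calculatecontamination takes them as two distinct arguments, which matches the separate pileup_table_tumor and pileup_table_normal outputs.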
subworkflows/local/bam_variant_calling_somatic_strelka/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_somatic_tiddit/main.nf:11Inputs (take)
| Name | Description |
|---|---|
cram_normal
|
- |
cram_tumor
|
- |
fasta
|
- |
bwa
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_somatic_tnscope/main.nf:9Inputs (take)
| Name | Description |
|---|---|
input
|
- |
fasta
|
- |
fai
|
- |
dict
|
- |
germline_resource
|
- |
germline_resource_tbi
|
- |
panel_of_normals
|
- |
panel_of_normals_tbi
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_tumor_only_all/main.nf:17Inputs (take)
| Name | Description |
|---|---|
tools
|
- |
bam
|
- |
cram
|
- |
bwa
|
- |
cf_chrom_len
|
- |
chr_files
|
- |
cnvkit_reference
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
dict
|
- |
fasta
|
- |
fasta_fai
|
- |
germline_resource
|
- |
germline_resource_tbi
|
- |
intervals
|
- |
intervals_bed_gz_tbi
|
- |
intervals_bed_combined
|
- |
intervals_bed_gz_tbi_combined
|
- |
mappability
|
- |
msisensor2_models
|
- |
panel_of_normals
|
- |
panel_of_normals_tbi
|
- |
joint_mutect2
|
- |
wes
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_tumor_only_controlfreec/main.nf:13Inputs (take)
| Name | Description |
|---|---|
controlfreec_input
|
- |
fasta
|
- |
fasta_fai
|
- |
dbsnp
|
- |
dbsnp_tbi
|
- |
chr_files
|
- |
mappability
|
- |
intervals_bed
|
- |
Outputs (emit)
| Name | Description |
|---|---|
versions
|
- |
subworkflows/local/bam_variant_calling_tumor_only_lofreq/main.nf:4Inputs (take)
| Name | Description |
|---|---|
input
|
- |
fasta
|
- |
fai
|
- |
intervals
|
- |
dict
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_tumor_only_manta/main.nf:10Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/bam_variant_calling_tumor_only_mutect2/main.nf:16
Perform variant calling on a single tumor sample using Mutect2 tumor-only mode. The input BAM file is run through getpileupsummaries and then calculatecontamination to get the contamination and segmentation tables. The Mutect2 output VCF is then filtered using filtermutectcalls, with the contamination and segmentation tables applied as additional filters.
Components
gatk4/mutect2
gatk4/getpileupsummaries
gatk4/calculatecontamination
gatk4/filtermutectcalls
Inputs (take)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information e.g. [ id:'test' ] |
| input | List containing one BAM file; CRAM is also accepted as input |
| input_index | List containing one BAM file index; a CRAM index is also accepted as input |
| fasta | The reference FASTA file |
| fai | Index of reference FASTA file |
| dict | GATK sequence dictionary |
| germline_resource | Population VCF of germline sequencing, containing allele fractions. |
| germline_resource_tbi | Index file for the germline resource. |
| panel_of_normals | VCF file to be used as a panel of normals. |
| panel_of_normals_tbi | Index for the panel of normals. |
| interval_file | File containing intervals. |
Outputs (emit)
| Name | Description |
|---|---|
| versions | File containing software versions |
| mutect2_vcf | Compressed VCF file to be used for variant_calling. |
| mutect2_tbi | Indexes of the mutect2_vcf file |
| mutect2_stats | Stats files for the mutect2 VCF |
| pileup_table | File containing the pileup summary table. |
| contamination_table | File containing the contamination table. |
| segmentation_table | Output table containing segmentation of tumor minor allele fractions. |
| filtered_vcf | File containing filtered mutect2 calls. |
| filtered_tbi | tbi file that pairs with filtered VCF. |
| filtered_stats | File containing statistics of the filtermutectcalls run. |
subworkflows/local/bam_variant_calling_tumor_only_tnscope/main.nf:9
Perform variant calling on a single tumor sample using Mutect2 tumor-only mode. The input BAM file is run through getpileupsummaries and then calculatecontamination to get the contamination and segmentation tables. The Mutect2 output VCF is then filtered using filtermutectcalls, with the contamination and segmentation tables applied as additional filters.
Components
gatk4/mutect2
gatk4/getpileupsummaries
gatk4/calculatecontamination
gatk4/filtermutectcalls
Inputs (take)
| Name | Description |
|---|---|
| meta | Groovy Map containing sample information e.g. [ id:'test' ] |
| input | List containing one BAM file; CRAM is also accepted as input |
| input_index | List containing one BAM file index; a CRAM index is also accepted as input |
| fasta | The reference FASTA file |
| fai | Index of reference FASTA file |
| dict | GATK sequence dictionary |
| germline_resource | Population VCF of germline sequencing, containing allele fractions. |
| germline_resource_tbi | Index file for the germline resource. |
| panel_of_normals | VCF file to be used as a panel of normals. |
| panel_of_normals_tbi | Index for the panel of normals. |
| interval_file | File containing intervals. |
Outputs (emit)
| Name | Description |
|---|---|
| versions | File containing software versions |
| mutect2_vcf | Compressed VCF file to be used for variant_calling. |
| mutect2_tbi | Indexes of the mutect2_vcf file |
| mutect2_stats | Stats files for the mutect2 VCF |
| pileup_table | File containing the pileup summary table. |
| contamination_table | File containing the contamination table. |
| segmentation_table | Output table containing segmentation of tumor minor allele fractions. |
| filtered_vcf | File containing filtered mutect2 calls. |
| filtered_tbi | tbi file that pairs with filtered VCF. |
| filtered_stats | File containing statistics of the filtermutectcalls run. |
subworkflows/local/channel_align_create_csv/main.nf:5Inputs (take)
| Name | Description |
|---|---|
bam_indexed
|
- |
outdir
|
- |
save_output_as_bam
|
- |
Outputs (emit)
| Name | Description |
|---|---|
|
- |
subworkflows/local/channel_applybqsr_create_csv/main.nf:5Inputs (take)
| Name | Description |
|---|---|
cram_recalibrated_index
|
- |
outdir
|
- |
save_output_as_bam
|
- |
Outputs (emit)
| Name | Description |
|---|---|
|
- |
subworkflows/local/channel_baserecalibrator_create_csv/main.nf:5Inputs (take)
| Name | Description |
|---|---|
cram_table_bqsr
|
- |
tools
|
- |
skip_tools
|
- |
outdir
|
- |
save_output_as_bam
|
- |
Outputs (emit)
| Name | Description |
|---|---|
|
- |
subworkflows/local/channel_markduplicates_create_csv/main.nf:5Inputs (take)
| Name | Description |
|---|---|
cram_markduplicates
|
- |
csv_subfolder
|
- |
outdir
|
- |
save_output_as_bam
|
- |
Outputs (emit)
| Name | Description |
|---|---|
|
- |
subworkflows/local/channel_variant_calling_create_csv/main.nf:5Inputs (take)
| Name | Description |
|---|---|
vcf_to_annotate
|
- |
outdir
|
- |
Outputs (emit)
| Name | Description |
|---|---|
|
- |
subworkflows/local/vcf_concatenate_germline/main.nf:12Inputs (take)
| Name | Description |
|---|---|
vcfs
|
- |
Outputs (emit)
| Name | Description |
|---|---|
vcfs
|
- |
tbis
|
- |
?
|
- |
subworkflows/local/vcf_consensus/main.nf:8Inputs (take)
| Name | Description |
|---|---|
vcfs
|
- |
Outputs (emit)
| Name | Description |
|---|---|
versions
|
- |
vcfs
|
- |
tbis
|
- |
subworkflows/local/cram_merge_index_samtools/main.nf:10Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
fasta_fai
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
subworkflows/local/cram_qc_mosdepth_samtools/main.nf:10Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
fasta
|
- |
intervals
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
subworkflows/local/cram_sampleqc/main.nf:4Inputs (take)
| Name | Description |
|---|---|
cram
|
- |
ngscheckmate_bed
|
- |
fasta
|
- |
skip_baserecalibration
|
- |
intervals_for_preprocessing
|
- |
Outputs (emit)
| Name | Description |
|---|---|
corr_matrix
|
- |
matched
|
- |
all
|
- |
vcf
|
- |
pdf
|
- |
?
|
- |
?
|
- |
subworkflows/local/download_cache_snpeff_vep/main.nf:14Inputs (take)
| Name | Description |
|---|---|
ensemblvep_info
|
- |
snpeff_info
|
- |
Outputs (emit)
| Name | Description |
|---|---|
ensemblvep_cache
|
- |
snpeff_cache
|
- |
?
|
- |
subworkflows/local/fastq_align/main.nf:12Inputs (take)
| Name | Description |
|---|---|
reads
|
- |
index
|
- |
sort
|
- |
fasta
|
- |
fasta_fai
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/fastq_create_umi_consensus_fgbio/main.nf:16
Inputs (take)
| Name | Description |
|---|---|
| reads | - |
| fasta | - |
| fai | - |
| map_index | - |
| groupreadsbyumi_strategy | - |
Outputs (emit)
| Name | Description |
|---|---|
| umibam | - |
| groupbam | - |
| consensusbam | - |
| versions | - |
subworkflows/local/fastq_preprocess_gatk/main.nf:52
Inputs (take)
| Name | Description |
|---|---|
| input_fastq | - |
| input_sample | - |
| dict | - |
| fasta | - |
| fasta_fai | - |
| index_alignment | - |
| intervals_and_num_intervals | - |
| intervals_for_preprocessing | - |
| known_sites_indels | - |
| known_sites_indels_tbi | - |
| bbsplit_index | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
| ? | - |
subworkflows/local/fastq_preprocess_parabricks/main.nf:4
Inputs (take)
| Name | Description |
|---|---|
| ch_reads | - |
| ch_fasta | - |
| ch_index | - |
| ch_interval_file | - |
| ch_known_sites | - |
| val_output_fmt | - |
Outputs (emit)
| Name | Description |
|---|---|
| cram | - |
| versions | - |
| reports | - |
main.nf:86
Inputs (take)
| Name | Description |
|---|---|
| samplesheet | - |
Outputs (emit)
| Name | Description |
|---|---|
| multiqc_report | - |
subworkflows/local/vcf_normalization/main.nf:10
Inputs (take)
| Name | Description |
|---|---|
| vcfs | - |
| fasta | - |
Outputs (emit)
| Name | Description |
|---|---|
| vcfs | - |
| tbis | - |
| ? | - |
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:203
Inputs (take)
| Name | Description |
|---|---|
| email | - |
| email_on_fail | - |
| plaintext_email | - |
| outdir | - |
| monochrome_logs | - |
| hook_url | - |
| multiqc_report | - |
Outputs (emit)
| Name | Description |
|---|---|
|  | - |
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:26
Inputs (take)
| Name | Description |
|---|---|
| version | - |
| validate_params | - |
| nextflow_cli_args | - |
| outdir | - |
| input | - |
| help | - |
| help_full | - |
| show_hidden | - |
Outputs (emit)
| Name | Description |
|---|---|
| samplesheet | - |
| ? | - |
subworkflows/local/post_variantcalling/main.nf:12
Inputs (take)
| Name | Description |
|---|---|
| tools | - |
| cram_germline | - |
| germline_vcfs | - |
| germline_tbis | - |
| cram_tumor_only | - |
| tumor_only_vcfs | - |
| tumor_only_tbis | - |
| cram_somatic | - |
| somatic_vcfs | - |
| somatic_tbis | - |
| fasta | - |
| fai | - |
| concatenate_vcfs | - |
| filter_vcfs | - |
| snv_consensus_calling | - |
| normalize_vcfs | - |
| varlociraptor_chunk_size | - |
| varlociraptor_scenario_germline | - |
| varlociraptor_scenario_somatic | - |
| varlociraptor_scenario_tumor_only | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
| ? | - |
subworkflows/local/prepare_genome/main.nf:22Inputs (take)
| Name | Description |
|---|---|
ascat_alleles_in
|
- |
ascat_loci_in
|
- |
ascat_loci_gc_in
|
- |
ascat_loci_rt_in
|
- |
bbsplit_fasta_list_in
|
- |
bbsplit_index_in
|
- |
bcftools_annotations_in
|
- |
bcftools_annotations_tbi_in
|
- |
bwa_in
|
- |
bwamem2_in
|
- |
chr_dir_in
|
- |
dbsnp_in
|
- |
dbsnp_tbi_in
|
- |
dict_in
|
- |
dragmap_in
|
- |
fasta_in
|
- |
fasta_fai_in
|
- |
germline_resource_in
|
- |
germline_resource_tbi_in
|
- |
known_indels_in
|
- |
known_indels_tbi_in
|
- |
known_snps_in
|
- |
known_snps_tbi_in
|
- |
msisensor2_models_in
|
- |
msisensorpro_scan_in
|
- |
pon_in
|
- |
pon_tbi_in
|
- |
aligner
|
- |
step
|
- |
tools
|
- |
vep_include_fasta
|
- |
Outputs (emit)
| Name | Description |
|---|---|
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
?
|
- |
subworkflows/local/prepare_reference_cnvkit/main.nf:4
Inputs (take)
| Name | Description |
|---|---|
| fasta | - |
| intervals_bed_combined | - |
Outputs (emit)
| Name | Description |
|---|---|
| cnvkit_reference | - |
| ? | - |
subworkflows/local/samplesheet_to_channel/main.nf:3
Inputs (take)
| Name | Description |
|---|---|
| ch_from_samplesheet | - |
| aligner | - |
| ascat_alleles | - |
| ascat_loci | - |
| ascat_loci_gc | - |
| ascat_loci_rt | - |
| bcftools_annotations | - |
| bcftools_annotations_tbi | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
| build_only_index | - |
| dbsnp | - |
| fasta | - |
| germline_resource | - |
| intervals | - |
| joint_germline | - |
| joint_mutect2 | - |
| known_indels | - |
| known_snps | - |
| no_intervals | - |
| pon | - |
| sentieon_dnascope_emit_mode | - |
| sentieon_haplotyper_emit_mode | - |
| seq_center | - |
| seq_platform | - |
| skip_tools | - |
| snpeff_cache | - |
| snpeff_db | - |
| step | - |
| tools | - |
| umi_length | - |
| umi_location | - |
| umi_in_read_header | - |
| umi_read_structure | - |
| wes | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
workflows/sarek/main.nf:63
Inputs (take)
| Name | Description |
|---|---|
| input_sample | - |
| aligner | - |
| skip_tools | - |
| step | - |
| tools | - |
| ascat_alleles | - |
| ascat_loci | - |
| ascat_loci_gc | - |
| ascat_loci_rt | - |
| bbsplit_index | - |
| bcftools_annotations | - |
| bcftools_annotations_tbi | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
| cf_chrom_len | - |
| chr_files | - |
| cnvkit_reference | - |
| dbsnp | - |
| dbsnp_tbi | - |
| dbsnp_vqsr | - |
| dict | - |
| fasta | - |
| fasta_fai | - |
| germline_resource | - |
| germline_resource_tbi | - |
| index_alignment | - |
| intervals_and_num_intervals | - |
| intervals_bed_combined | - |
| intervals_bed_combined_for_variant_calling | - |
| intervals_bed_gz_tbi_and_num_intervals | - |
| intervals_bed_gz_tbi_combined | - |
| intervals_for_preprocessing | - |
| known_indels_vqsr | - |
| known_sites_indels | - |
| known_sites_indels_tbi | - |
| known_sites_snps | - |
| known_sites_snps_tbi | - |
| known_snps_vqsr | - |
| mappability | - |
| msisensor2_models | - |
| msisensorpro_scan | - |
| ngscheckmate_bed | - |
| pon | - |
| pon_tbi | - |
| sentieon_dnascope_model | - |
| varlociraptor_scenario_germline | - |
| varlociraptor_scenario_somatic | - |
| varlociraptor_scenario_tumor_only | - |
| snpeff_cache | - |
| snpeff_db | - |
| vep_cache | - |
| vep_cache_version | - |
| vep_extra_files | - |
| vep_fasta | - |
| vep_genome | - |
| vep_species | - |
| versions | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
subworkflows/nf-core/utils_nextflow_pipeline/main.nf:11
Subworkflow with functionality that may be useful for any Nextflow pipeline
Inputs (take)
| Name | Description |
|---|---|
| print_version | Print the version of the pipeline and exit |
| dump_parameters | Dump the parameters of the pipeline to a JSON file |
| output_directory | Path to output dir to write JSON file to. |
| check_conda_channel | Check if the conda channel priority is correct. |
Outputs (emit)
| Name | Description |
|---|---|
| dummy_emit | Dummy emit to make nf-core subworkflows lint happy |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:11
Subworkflow with utility functions specific to the nf-core pipeline template
Inputs (take)
| Name | Description |
|---|---|
| nextflow_cli_args | Nextflow CLI positional arguments |
Outputs (emit)
| Name | Description |
|---|---|
| success | Dummy output to indicate success |
subworkflows/nf-core/utils_nfschema_plugin/main.nf:9
Run nf-schema to validate parameters and create a summary of changed parameters
Inputs (take)
| Name | Description |
|---|---|
| input_workflow | The workflow object of the used pipeline. This object contains meta data used to create the params summary log |
| validate_params | Validate the parameters and error if invalid. |
| parameters_schema | Path to the parameters JSON schema. This has to be the same as the schema given to the |
Outputs (emit)
| Name | Description |
|---|---|
| dummy_emit | Dummy emit to make nf-core subworkflows lint happy |
subworkflows/local/vcf_annotate_all/main.nf:10
Inputs (take)
| Name | Description |
|---|---|
| vcf | - |
| fasta | - |
| tools | - |
| snpeff_db | - |
| snpeff_cache | - |
| vep_genome | - |
| vep_species | - |
| vep_cache_version | - |
| vep_cache | - |
| vep_extra_files | - |
| bcftools_annotations | - |
| bcftools_annotations_index | - |
| bcftools_columns | - |
| bcftools_header_lines | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
| ? | - |
| ? | - |
| ? | - |
subworkflows/nf-core/vcf_annotate_ensemblvep/main.nf:8
Perform annotation with ensemblvep and bgzip + tabix index the resulting VCF file
Components
ensemblvep/vep
tabix/tabix
Inputs (take)
| Name | Description |
|---|---|
| ch_vcf | vcf file to annotate. Structure: [ val(meta), path(vcf), [path(custom_file1), path(custom_file2)... (optional)] ] |
| ch_fasta | Reference genome fasta file (optional). Structure: [ val(meta2), path(fasta) ] |
| val_genome | genome to use |
| val_species | species to use |
| val_cache_version | cache version to use |
| ch_cache | the root cache folder for ensemblvep (optional). Structure: [ val(meta3), path(cache) ] |
| ch_extra_files | any extra files needed by plugins for ensemblvep (optional). Structure: [ path(file1), path(file2)... ] |
Outputs (emit)
| Name | Description |
|---|---|
| vcf_tbi | Compressed vcf file + tabix index. Structure: [ val(meta), path(vcf), path(tbi) ] |
| json | json file. Structure: [ val(meta), path(json) ] |
| tab | tab file. Structure: [ val(meta), path(tab) ] |
| reports | html reports |
| versions | File containing software versions |
subworkflows/nf-core/vcf_annotate_snpeff/main.nf:8
Perform annotation with snpEff and bgzip + tabix index the resulting VCF file
Components
snpeff
snpeff/snpeff
tabix/bgziptabix
Inputs (take)
| Name | Description |
|---|---|
| ch_vcf | vcf file. Structure: [ val(meta), path(vcf) ] |
| val_snpeff_db | db version to use |
| ch_snpeff_cache | path to root cache folder for snpEff (optional). Structure: [ path(cache) ] |
Outputs (emit)
| Name | Description |
|---|---|
| vcf_tbi | Compressed vcf file + tabix index. Structure: [ val(meta), path(vcf), path(tbi) ] |
| reports | html reports. Structure: [ path(html) ] |
| summary | csv summary reports. Structure: [ path(csv) ] |
| genes_txt | txt reports. Structure: [ path(txt) ] |
| versions | Files containing software versions. Structure: [ path(versions.yml) ] |
subworkflows/local/vcf_qc_bcftools_vcftools/main.nf:6
Inputs (take)
| Name | Description |
|---|---|
| vcf | - |
| target_bed | - |
Outputs (emit)
| Name | Description |
|---|---|
| bcftools_stats | - |
| vcftools_tstv_counts | - |
| vcftools_tstv_qual | - |
| vcftools_filter_summary | - |
| ? | - |
subworkflows/local/vcf_variant_filtering_gatk/main.nf:4
Inputs (take)
| Name | Description |
|---|---|
| vcf | - |
| fasta | - |
| fasta_fai | - |
| dict | - |
| intervals_bed_combined | - |
| known_sites | - |
| known_sites_tbi | - |
Outputs (emit)
| Name | Description |
|---|---|
| ? | - |
| ? | - |
| ? | - |
subworkflows/local/vcf_varlociraptor_single/main.nf:9
Inputs (take)
| Name | Description |
|---|---|
| ch_cram | - |
| ch_fasta | - |
| ch_fasta_fai | - |
| ch_scenario | - |
| ch_vcf | - |
| val_num_chunks | - |
| val_sampletype | - |
Outputs (emit)
| Name | Description |
|---|---|
| vcf | - |
| tbi | - |
| versions | - |
subworkflows/local/vcf_varlociraptor_somatic/main.nf:15
Inputs (take)
| Name | Description |
|---|---|
| ch_cram | - |
| ch_fasta | - |
| ch_fasta_fai | - |
| ch_scenario | - |
| ch_somatic_vcf | - |
| ch_germline_vcf | - |
| val_num_chunks | - |
Outputs (emit)
| Name | Description |
|---|---|
| vcf | - |
| tbi | - |
| versions | - |
Processes
This page documents all processes in the pipeline.
modules/local/add_info_to_vcf/main.nf:1
Inputs
| Name | Type | Description |
|---|---|---|
| val(meta) | tuple | - |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| versions.yml | path | versions | - |
modules/nf-core/ascat/main.nf:1
Copy number profiles of tumour cells.
Tools
ASCAT is a method to derive copy number profiles of tumour cells, accounting for normal cell admixture and tumour aneuploidy. ASCAT infers tumour purity (the fraction of tumour cells) and ploidy (the amount of DNA per tumour cell), expressed as multiples of haploid genomes from SNP array or massively parallel sequencing data, and calculates whole-genome allele-specific copy number profiles (the number of copies of both parental alleles for all SNP loci across the genome).
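To make the roles of purity and ploidy concrete, the Python sketch below computes the logR and B-allele frequency that one locus would be expected to show for a given allele-specific copy number state. The two equations follow the mixture model as it is usually stated in the ASCAT publication; that is an assumption brought in here and not something this page specifies, and the function name `expected_logr_baf` and the default `gamma=1.0` are hypothetical.

```python
import math


def expected_logr_baf(n_a, n_b, purity, tumour_ploidy, gamma=1.0):
    """Expected (logR, BAF) at one locus under an ASCAT-style mixture model.

    n_a, n_b       -- copies of the two parental alleles in tumour cells
    purity         -- fraction of tumour cells in the sample, in (0, 1]
    tumour_ploidy  -- average copy number per tumour cell
    gamma          -- platform-dependent scaling (assumed 1.0 here)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Average copy number of the whole sample; normal cells are taken as diploid.
    sample_ploidy = 2 * (1 - purity) + purity * tumour_ploidy
    # Copies present at this locus across the normal/tumour mixture.
    locus_copies = 2 * (1 - purity) + purity * (n_a + n_b)
    if locus_copies == 0:
        raise ValueError("no copies at this locus: logR and BAF are undefined")
    logr = gamma * math.log2(locus_copies / sample_ploidy)
    baf = (1 - purity + purity * n_b) / locus_copies
    return logr, baf
```

For a balanced diploid locus (`n_a = n_b = 1`) in a diploid tumour, the sketch returns a logR of 0 and a BAF of 0.5 regardless of purity, which is the baseline that copy number changes are measured against.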
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input_normal | file | BAM/CRAM file, must adhere to chr1, chr2, ...chrX notation. For modifying chromosome notation in bam files please follow https://josephcckuo.wordpress.com/2016/11/17/modify-chromosome-notation-in-bam-file/. |
| index_normal | file | index for normal_bam/cram |
| input_tumor | file | BAM/CRAM file, must adhere to chr1, chr2, ...chrX notation |
| index_tumor | file | index for tumor_bam/cram |
| allele_files | file | allele files for ASCAT WGS. Can be downloaded here https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS |
| loci_files | file | loci files for ASCAT WGS. Loci files without chromosome notation can be downloaded here https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS Make sure the chromosome notation matches the bam/cram input files. To add the chromosome notation to loci files (hg19/hg38) if necessary, you can run this command |
| bed_file | file | Bed file for ASCAT WES (optional, but recommended for WES) |
| fasta | file | Reference fasta file (optional) |
| gc_file | file | GC correction file (optional) - Used to do logR correction of the tumour sample(s) with genomic GC content |
| rt_file | file | replication timing correction file (optional, provide only in combination with gc_file) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | allelefreqs | - |
| val(meta) | tuple | bafs | - |
| val(meta) | tuple | cnvs | - |
| val(meta) | tuple | logrs | - |
| val(meta) | tuple | metrics | - |
| val(meta) | tuple | png | - |
| val(meta) | tuple | purityploidy | - |
| val(meta) | tuple | segments | - |
| versions.yml | path | versions | - |
modules/nf-core/bbmap/bbsplit/main.nf:1
Split sequencing reads by mapping them to multiple references simultaneously
Tools
BBMap is a short read aligner, as well as various other bioinformatic tools.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
| other_ref_names | list | List of other reference ids apart from the primary |
| other_ref_paths | list | Paths to the other references, corresponding to "other_ref_names" |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | - | - | - |
| primary_fastq | file | *primary*fastq.gz | Output reads that map to the primary reference |
| all_fastq | file | *fastq.gz | All reads mapping to any of the references |
| stats | file | *.txt | Tab-delimited text file containing mapping statistics |
| log | file | *.log | Log file |
modules/nf-core/bcftools/annotate/main.nf:1
Add or remove annotations.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | Query VCF or BCF file, can be either uncompressed or compressed |
| index | file | Index of the query VCF or BCF file |
| annotations | file | Bgzip-compressed file with annotations |
| annotations_index | file | Index of the annotations file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *{ | Compressed annotated VCF file |
| tbi | file | *.tbi | Alternative VCF file index |
| csi | file | *.csi | Default VCF file index |
modules/nf-core/bcftools/concat/main.nf:1
Concatenate VCF files
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcfs | list | List containing 2 or more vcf files e.g. [ 'file1.vcf', 'file2.vcf' ] |
| tbi | list | List containing 2 or more index files (optional) e.g. [ 'file1.tbi', 'file2.tbi' ] |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/bcftools/isec/main.nf:1
Apply set operations to VCF files
Tools
Computes intersections, unions and complements of VCF files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcfs | list | List containing 2 or more vcf/bcf files. These must be compressed and have an associated index. e.g. [ 'file1.vcf.gz', 'file2.vcf' ] |
| tbis | list | List containing the tbi index files corresponding to the vcf/bcf input files e.g. [ 'file1.vcf.tbi', 'file2.vcf.tbi' ] |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| results | directory | ${ | Folder containing the results of the set operations performed on the vcf files |
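As an illustration of what "intersections, unions and complements" means for variant files, the Python sketch below treats each VCF record as a `(CHROM, POS, REF, ALT)` key and applies ordinary set operations. It is a simplified model, not the bcftools implementation: it assumes exact allele matching and plain tab-separated text lines, and the helper names `variant_keys` and `set_operations` are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from VCF text lines, skipping headers."""
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank line, meta-information line, or column header
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def set_operations(first, second):
    """Return the intersection, union and both complements of two record lists."""
    a, b = variant_keys(first), variant_keys(second)
    return {
        "intersection": a & b,  # records present in both files
        "union": a | b,         # records present in either file
        "only_first": a - b,    # complement: private to the first file
        "only_second": b - a,   # complement: private to the second file
    }
```

The real tool works on compressed, indexed files, which is why the inputs above require an index for every VCF/BCF.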
modules/nf-core/bcftools/merge/main.nf:1
Merge VCF files
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcfs | file | List containing 2 or more vcf files e.g. [ 'file1.vcf', 'file2.vcf' ] |
| tbis | file | List containing the tbi index files corresponding to the vcfs input files e.g. [ 'file1.vcf.tbi', 'file2.vcf.tbi' ] |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | (Optional) The fasta reference file (only necessary for the |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | (Optional) The fasta reference file index (only necessary for the |
| meta4 | map | Groovy Map containing bed information e.g. [ id:'genome' ] |
| bed | file | (Optional) The bed regions to merge on |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.{ | merged output file |
| index | file | *.{ | index of merged output |
modules/nf-core/bcftools/mpileup/main.nf:1
Generates a VCF of genotype likelihoods from an input BAM file
Tools
Generates genotype likelihoods at each genomic position with coverage.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Input BAM file |
| intervals | file | Input intervals file. A file (commonly '.bed') containing regions to subset |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | FASTA reference file |
| save_mpileup | boolean | Save mpileup file generated by bcftools mpileup |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | tbi | - |
| val(meta) | tuple | stats | - |
| val(meta) | tuple | mpileup | - |
| versions.yml | path | versions | - |
modules/nf-core/bcftools/norm/main.nf:1
Normalize VCF file
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | The vcf file to be normalized e.g. 'file1.vcf' |
| tbi | file | An optional index of the VCF file (for when the VCF is compressed) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | FASTA reference file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
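The FASTA reference is an input here because normalizing an indel generally means shifting it as far left as the reference allows and trimming redundant bases. The Python sketch below shows that idea for a single biallelic record. It is a textbook-style illustration under stated assumptions, not the bcftools code: `normalize_variant` is a hypothetical name, `chrom_seq` is assumed to be the full chromosome sequence held as a string, and positions are 1-based as in VCF.

```python
def normalize_variant(pos, ref, alt, chrom_seq):
    """Left-align and trim one biallelic variant (1-based pos) against chrom_seq."""
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    # Step 1: drop shared trailing bases; whenever an allele becomes empty,
    # pad both alleles on the left with the preceding reference base.
    while True:
        changed = False
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        if not ref or not alt:
            if pos <= 1:
                # Sketch limitation: no base is available to the left.
                raise ValueError("cannot extend past the start of the sequence")
            pos -= 1
            left_base = chrom_seq[pos - 1]
            ref, alt = left_base + ref, left_base + alt
            changed = True
        if not changed:
            break
    # Step 2: drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

With the made-up sequence `GGGCACACACAGGG`, a `CA` deletion written at position 8 as `CAC` to `C` is moved to the start of the repeat and comes back as position 3, `GCA` to `G`.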
modules/nf-core/bcftools/sort/main.nf:1
Sorts VCF files
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | The VCF/BCF file to be sorted |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/bcftools/stats/main.nf:1
Generates stats from VCF files
Tools
Parses VCF or BCF and produces text file stats which is suitable for machine processing and can be plotted using plot-vcfstats.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | VCF input file |
| tbi | file | The tab index for the VCF file to be inspected. Optional: only required when parameter regions is chosen. |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| regions | file | Optionally, restrict the operation to regions listed in this file. (VCF, BED or tab-delimited) |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| targets | file | Optionally, restrict the operation to regions listed in this file (doesn't rely upon tbi index files) |
| meta4 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| samples | file | Optional, file of sample names to be included or excluded. e.g. 'file.tsv' |
| meta5 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| exons | file | Tab-delimited file with exons for indel frameshifts (chr,beg,end; 1-based, inclusive, optionally bgzip compressed). e.g. 'exons.tsv.gz' |
| meta6 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Faidx indexed reference sequence file to determine INDEL context. e.g. 'reference.fa' |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | stats | - |
| versions.yml | path | versions | - |
modules/nf-core/bcftools/view/main.nf:1
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Tools
View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | The vcf file to be inspected. e.g. 'file.vcf' |
| index | file | The tab index for the VCF file to be inspected. e.g. 'file.tbi' |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.{ | VCF or BCF output file |
| tbi | file | *.tbi | Alternative VCF file index |
| csi | file | *.csi | Default VCF file index |
modules/nf-core/bwa/index/main.nf:1
Create BWA index for reference genome
Tools
BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| fasta | file | Input genome fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | map | *.{ | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
modules/nf-core/bwa/mem/main.nf:1
Performs fastq alignment to a fasta reference using BWA
Tools
BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
| meta2 | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| index | file | BWA genome index files |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference genome in FASTA format |
| sort_bam | boolean | use samtools sort (true) or samtools view (false) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bam | - |
| val(meta) | tuple | cram | - |
| val(meta) | tuple | csi | - |
| val(meta) | tuple | crai | - |
| versions.yml | path | versions | - |
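The `meta` and `reads` inputs that recur throughout these tables travel together: `meta.single_end` says how many FastQ files the `reads` list should hold (1 for single-end, 2 for paired-end). The Python sketch below models that relationship with a plain dict standing in for the Groovy Map. The function names and the file names in the usage note are hypothetical, and treating a missing `single_end` key as paired-end is an assumption of this sketch.

```python
def expected_read_files(meta):
    """Number of FastQ files implied by the meta map: 1 single-end, 2 paired-end."""
    # Assumption: a missing 'single_end' key is treated as paired-end.
    return 1 if meta.get("single_end", False) else 2


def check_reads(meta, reads):
    """Return reads unchanged if its length matches meta, else raise ValueError."""
    expected = expected_read_files(meta)
    if len(reads) != expected:
        raise ValueError(
            f"sample {meta.get('id')!r}: expected {expected} FastQ file(s), got {len(reads)}"
        )
    return reads
```

For example, `check_reads({"id": "test", "single_end": False}, ["test_1.fastq.gz", "test_2.fastq.gz"])` passes, while the same call with a single file raises an error.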
modules/nf-core/bwamem2/index/main.nf:1
Create BWA-mem2 index for reference genome
Tools
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Input genome fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| index | file | *.{ | BWA genome index files |
modules/nf-core/bwamem2/mem/main.nf:1
Performs fastq alignment to a fasta reference using BWA
Tools
BWA-mem2 is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
| meta2 | map | Groovy Map containing reference/index information e.g. [ id:'test' ] |
| index | file | BWA genome index files |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference genome in FASTA format |
| sort_bam | boolean | use samtools sort (true) or samtools view (false) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | sam | - |
| val(meta) | tuple | bam | - |
| val(meta) | tuple | cram | - |
| val(meta) | tuple | crai | - |
| val(meta) | tuple | csi | - |
| versions.yml | path | versions | - |
modules/nf-core/cat/cat/main.nf:1
A module for concatenation of gzipped or uncompressed files
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| files_in | file | List of compressed / uncompressed files |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/cat/fastq/main.nf:1
Concatenates fastq files
Tools
The cat utility reads files sequentially, writing them to the standard output.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | List of input FastQ files to be concatenated. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | reads | - |
| versions.yml | path | versions | - |
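Plain `cat` is enough for gzipped FastQ files because a gzip stream may consist of several members placed back to back, and decompressing the joined stream yields the joined contents. The Python sketch below demonstrates this property with the standard `gzip` module; the two tiny FastQ records and the name `concatenate_gzip` are made up for the example.

```python
import gzip


def concatenate_gzip(parts):
    """Join gzip-compressed byte strings the way `cat a.gz b.gz` would."""
    # No recompression: the result is a multi-member gzip stream.
    return b"".join(parts)


# Two hypothetical lanes of the same sample, each compressed on its own.
lane1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
lane2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")
merged = concatenate_gzip([lane1, lane2])

# Decompressing the joined bytes gives both records, in order.
records = gzip.decompress(merged).decode()
```

The order of the input list determines the order of the reads in the result, so paired-end files need to be concatenated in the same lane order for both mates.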
modules/nf-core/cnvkit/antitarget/main.nf:1
Derive off-target (“antitarget”) bins from target regions.
Tools
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| targets | file | File containing genomic regions |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bed | - |
| versions.yml | path | versions | - |
modules/nf-core/cnvkit/batch/main.nf:1
Copy number variant detection from high-throughput sequencing data
Tools
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| tumor | file | Input tumour sample bam file (or cram) |
| normal | file | Input normal sample bam file (or cram) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta | file | Input reference genome fasta file (only needed for cram_input and/or when normal_samples are provided) |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta_fai | file | Input reference genome fasta index (optional, but recommended for cram_input) |
| meta4 | map | Groovy Map containing information about target file e.g. [ id:'test' ] |
| targets | file | Input target bed file |
| meta5 | map | Groovy Map containing information about reference file e.g. [ id:'test' ] |
| reference | file | Input reference cnn-file (only for germline and tumor-only running) |
| panel_of_normals | file | Input panel of normals file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bed | - |
| val(meta) | tuple | cnn | - |
| val(meta) | tuple | cnr | - |
| val(meta) | tuple | cns | - |
| val(meta) | tuple | pdf | - |
| val(meta) | tuple | png | - |
| versions.yml | path | versions | - |
modules/nf-core/cnvkit/call/main.nf:1
Given segmented log2 ratio estimates (.cns), derive each segment’s absolute integer copy number
Tools
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cns | file | CNVKit CNS file. |
| vcf | file | Germline VCF file for BAF. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | cns | - |
| versions.yml | path | versions | - |
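The step from a log2 ratio to an absolute integer copy number can be sketched with simple arithmetic: a log2 ratio of 0 corresponds to the reference ploidy, and each unit of log2 doubles or halves it. The Python function below shows only this plain rounding idea, assuming a pure sample and a diploid reference. CNVkit's own calling methods, including any purity or BAF adjustments, are not reproduced here, and the name `integer_copy_number` is hypothetical.

```python
def integer_copy_number(log2_ratio, ploidy=2):
    """Round a segment's log2 ratio to the nearest non-negative integer copy number."""
    # Undo the log2 transform, scale by the reference ploidy, then round.
    return max(0, round(ploidy * 2 ** log2_ratio))
```

Under these assumptions a segment at log2 ratio 0 is called as 2 copies, a segment at -1 as a single-copy loss, and a segment at +1 as 4 copies.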
modules/nf-core/cnvkit/export/main.nf:1
Convert copy number ratio tables (.cnr files) or segments (.cns) to another format.
Tools
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cns | file | CNVKit CNS file. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/cnvkit/genemetrics/main.nf:1
Copy number variant detection from high-throughput sequencing data
Tools
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cnr | file | CNR file |
| cns | file | CNS file [Optional] |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | tsv | - |
| versions.yml | path | versions | - |
modules/nf-core/cnvkit/reference/main.nf:1
Compile a coverage reference from the given files (normal samples).
Tools
CNVkit is a Python library and command-line software toolkit to infer and visualize copy number from high-throughput DNA sequencing data. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina and Ion Torrent.
Inputs
| Name | Type | Description |
|---|---|---|
| fasta | file | File containing reference genome |
| targets | file | File containing genomic regions |
| antitargets | file | File containing off-target genomic regions |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| *.cnn | path | cnn | - |
| versions.yml | path | versions | - |
modules/nf-core/controlfreec/assesssignificance/main.nf:1
Add both Wilcoxon test and Kolmogorov-Smirnov test p-values to each CNV output of FREEC
Tools
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cnvs | file | _CNVs file generated by FREEC |
| ratio | file | ratio file generated by FREEC |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| p_value_txt | file | *.p.value.txt | CNV file containing p_values for each call |
modules/nf-core/controlfreec/freec/main.nf:1Copy number and genotype annotation from whole genome and whole exome sequencing data
Tools
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `mpileup_normal` | file | miniPileup file |
| `mpileup_tumor` | file | miniPileup file |
| `cpn_normal` | file | Raw copy number profiles (optional) |
| `cpn_tumor` | file | Raw copy number profiles (optional) |
| `minipileup_normal` | file | miniPileup file from previous run (optional) |
| `minipileup_tumor` | file | miniPileup file from previous run (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `bedgraph` | file | `.bedgraph` | Bedgraph format for the UCSC genome browser |
| `control_cpn` | file | `*_control.cpn` | Files with raw copy number profiles |
| `sample_cpn` | file | `*_sample.cpn` | Files with raw copy number profiles |
| `gcprofile_cpn` | file | `GC_profile.*.cpn` | File with GC-content profile |
| `BAF` | file | `*_BAF.txt` | File with B-allele frequencies for each possibly heterozygous SNP position |
| `CNV` | file | `*_CNVs` | File with coordinates of predicted copy number alterations |
| `info` | file | `*_info.txt` | Parsable file with information about the FREEC run |
| `ratio` | file | `*_ratio.txt` | File with ratios and predicted copy number alterations for each window |
| `config` | file | `config.txt` | Config file used to run Control-FREEC |
modules/nf-core/controlfreec/freec2bed/main.nf:1Convert Freec ratio output to BED format
Tools
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `ratio` | file | ratio file generated by FREEC |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `bed` | file | `*.bed` | Bed file |
modules/nf-core/controlfreec/freec2circos/main.nf:1Format Freec output to circos input format
Tools
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `ratio` | file | ratio file generated by FREEC |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `circos` | file | `*.circos.txt` | Txt file |
modules/nf-core/controlfreec/makegraph2/main.nf:1Plot Freec output
Tools
Copy number and genotype annotation from whole genome and whole exome sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `ratio` | file | ratio file generated by FREEC |
| `baf` | file | .BAF file generated by FREEC |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `png_baf` | file | `*_BAF.png` | Image of BAF plot |
| `png_ratio_log2` | file | `*_ratio.log2.png` | Image of ratio log2 plot |
| `png_ratio` | file | `*_ratio.png` | Image of ratio plot |
modules/local/create_intervals_bed/main.nf:1
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `*.bed` | path | `bed` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/deepvariant/rundeepvariant/main.nf:1DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Tools
DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `input` | file | BAM/CRAM file |
| `index` | file | Index of BAM/CRAM file |
| `intervals` | file | file containing intervals |
| `meta2` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fasta` | file | The reference fasta file |
| `meta3` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fai` | file | Index of reference fasta file |
| `meta4` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `gzi` | file | GZI index of reference fasta file |
| `meta5` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `par_bed` | file | BED file containing PAR regions |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `vcf` | file | `*.vcf.gz` | Compressed VCF file |
| `vcf_index` | file | `*.vcf.gz.{` | Tabix index file of compressed VCF |
| `gvcf` | file | `*.g.vcf.gz` | Compressed GVCF file |
| `gvcf_index` | file | `*.g.vcf.gz.{` | Tabix index file of compressed GVCF |
modules/nf-core/dragmap/align/main.nf:1Performs fastq alignment to a reference using DRAGMAP
Tools
Dragmap is the Dragen mapper/aligner Open Source Software.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `reads` | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
| `meta2` | map | Groovy Map containing reference information e.g. [ id:'test', single_end:false ] |
| `hashmap` | file | DRAGMAP hash table |
| `meta3` | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| `fasta` | file | Genome fasta reference files |
| `sort_bam` | boolean | Sort the BAM file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `sam` | - |
| `val(meta)` | tuple | `bam` | - |
| `val(meta)` | tuple | `cram` | - |
| `val(meta)` | tuple | `crai` | - |
| `val(meta)` | tuple | `csi` | - |
| `val(meta)` | tuple | `log` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/dragmap/hashtable/main.nf:1Create DRAGEN hashtable for reference genome
Tools
Dragmap is the Dragen mapper/aligner Open Source Software.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing reference information e.g. [ id:'test', single_end:false ] |
| `fasta` | file | Input genome fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `hashmap` | file | `*.{` | DRAGMAP hash table |
modules/nf-core/ensemblvep/download/main.nf:1Ensembl Variant Effect Predictor (VEP). The cache downloading options are controlled through task.ext.args.
Tools
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `assembly` | string | Genome assembly |
| `species` | string | Species |
| `cache_version` | string | Cache version |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `cache` | file | `*` | Cache |
modules/nf-core/ensemblvep/vep/main.nf:1Ensembl Variant Effect Predictor (VEP). The output-file-format is controlled through task.ext.args.
Tools
VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `vcf` | file | vcf to annotate |
| `custom_extra_files` | file | extra sample-specific files to be used with the |
| `meta2` | map | Groovy Map containing fasta reference information e.g. [ id:'test' ] |
| `fasta` | file | reference FASTA file (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `vcf` | file | `*.vcf.gz` | annotated vcf (optional) |
| `tbi` | file | `*.vcf.gz.tbi` | annotated vcf index (optional) |
| `tab` | file | `*.ann.tab.gz` | tab file with annotated variants (optional) |
| `json` | file | `*.ann.json.gz` | json file with annotated variants (optional) |
| `report` | - | - | - |
modules/nf-core/fastp/main.nf:1Perform adapter/quality trimming on sequencing reads
Tools
A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information. Use 'single_end: true' to specify single ended or interleaved FASTQs. Use 'single_end: false' for paired-end reads. e.g. [ id:'test', single_end:false ] |
| `reads` | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. If you wish to run interleaved paired-end data, supply as single-end data but with |
| `adapter_fasta` | file | File in FASTA format containing possible adapters to remove. |
| `discard_trimmed_pass` | boolean | Specify true to not write any reads that pass trimming thresholds. This can be used to use fastp for the output report only. |
| `save_trimmed_fail` | boolean | Specify true to save files that failed to pass trimming thresholds ending in |
| `save_merged` | boolean | Specify true to save all merged reads to a file ending in |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `reads` | - |
| `val(meta)` | tuple | `json` | - |
| `val(meta)` | tuple | `html` | - |
| `val(meta)` | tuple | `log` | - |
| `val(meta)` | tuple | `reads_fail` | - |
| `val(meta)` | tuple | `reads_merged` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/fastqc/main.nf:1Run FastQC on sequenced reads
Tools
FastQC gives general quality metrics about your reads. It provides information about the quality score distribution across your reads, the per base sequence content (%A/C/G/T).
You get information about adapter contamination and other overrepresented sequences.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `reads` | file | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `html` | file | `*_{` | FastQC report |
| `zip` | file | `*_{` | FastQC report archive |
modules/nf-core/fgbio/callmolecularconsensusreads/main.nf:1Calls consensus sequences from reads with the same unique molecular tag.
Tools
Tools for working with genomic and high throughput sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false, collapse:false ] |
| `grouped_bam` | file | The input SAM or BAM file, grouped by UMIs |
| `min_reads` | integer | Minimum number of original reads to build each consensus read. |
| `min_baseq` | integer | Ignore bases in raw reads that have Q below this value. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `bam` | - |
| `versions.yml` | path | `versions` | - |
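
The roles of `min_reads` and `min_baseq` can be illustrated with a small sketch. This is a simplified per-position majority vote written for illustration only; it is not fgbio's consensus model, and the function name `consensus` and the `(sequence, qualities)` input shape are assumptions.

```python
from collections import Counter


def consensus(reads, min_reads=1, min_baseq=10):
    """Build a toy consensus from reads sharing one UMI group.

    reads: list of (sequence, qualities) pairs of equal length (assumed).
    Returns None when fewer than min_reads reads are available.
    """
    if len(reads) < min_reads:
        return None
    length = len(reads[0][0])
    bases = []
    for i in range(length):
        # Bases with quality below min_baseq do not get a vote.
        votes = Counter(seq[i] for seq, quals in reads if quals[i] >= min_baseq)
        bases.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(bases)
```

In this sketch a group with too few reads yields no consensus at all, while a position where every base is below `min_baseq` is reported as `N`.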
modules/nf-core/fgbio/copyumifromreadname/main.nf:1Copies the UMI at the end of a BAM file's read name to the RX tag.
Tools
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. |
| `bam` | file | Sorted BAM/CRAM/SAM file |
| `bai` | file | Index for bam file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `bam` | file | `*.{` | Sorted BAM file |
| `bai` | file | `*.{` | Index for bam file |
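
As a rough illustration of what copying a UMI from the read name to the RX tag means, the sketch below treats the last colon-separated field of the read name as the UMI. The colon delimiter, the function names, and the plain-dictionary tag representation are assumptions made for this example, not the module's implementation.

```python
def umi_from_read_name(read_name, delimiter=":"):
    """Return the last delimiter-separated field of a read name as the UMI."""
    fields = read_name.split(delimiter)
    if len(fields) < 2 or not fields[-1]:
        raise ValueError(f"no UMI found at the end of read name: {read_name!r}")
    return fields[-1]


def with_rx_tag(read_name, tags):
    """Return a copy of a read's tag dictionary with the UMI stored under RX."""
    updated = dict(tags)
    updated["RX"] = umi_from_read_name(read_name)
    return updated
```

For a hypothetical read name such as `A00123:45:HXXXX:1:1101:1000:2000:ACGTACGT`, the sketch stores `ACGTACGT` under `RX` and leaves the other tags unchanged.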
modules/nf-core/fgbio/fastqtobam/main.nf:1Uses fgbio to convert FASTQ files into unaligned BAM or CRAM files, optionally moving the UMI barcode into the RX field of the reads
Tools
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `reads` | file | pair of reads to be converted into BAM file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `bam` | - |
| `val(meta)` | tuple | `cram` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/fgbio/groupreadsbyumi/main.nf:1Groups reads together that appear to have come from the same original molecule. Reads are grouped by template, and then templates are sorted by the 5’ mapping positions of the reads from the template, used from earliest mapping position to latest. Reads that have the same end positions are then sub-grouped by UMI sequence. Note: the MQ tag is required on reads with mapped mates. It can be added using samblaster with the optional argument --addMateTags.
Tools
A set of tools for working with genomic and high throughput sequencing data, including UMIs
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `bam` | file | BAM file. Note: the MQ tag is required on reads with mapped mates (!) |
| `strategy` | string | Required argument: defines the UMI assignment strategy. Must be chosen among: Identity, Edit, Adjacency, Paired. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `bam` | - |
| `val(meta)` | tuple | `histogram` | - |
| `versions.yml` | path | `versions` | - |
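
The grouping described for this module can be sketched in a few lines. The example mimics only the simplest case, where templates sharing the same mapping positions are sub-grouped by exact UMI match (in the spirit of the Identity strategy). The tuple layout `(name, start, end, umi)` and the function name are assumptions, and the Edit, Adjacency and Paired strategies are not modelled.

```python
from collections import defaultdict


def group_by_umi_identity(templates):
    """Group templates by mapping positions, then by exact UMI sequence.

    templates: iterable of (name, start, end, umi) tuples (assumed layout).
    Returns a list of groups ordered from earliest to latest position.
    """
    groups = defaultdict(list)
    for name, start, end, umi in templates:
        groups[(start, end, umi)].append(name)
    # Sorting the keys orders groups by start, then end, then UMI.
    return [groups[key] for key in sorted(groups)]
```

Two templates end up in the same group only when both their positions and their UMIs are identical; a template with the same positions but a different UMI forms its own group.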
modules/nf-core/freebayes/main.nf:1A haplotype-based variant detector
Tools
Bayesian haplotype-based polymorphism discovery and genotyping
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `input_1` | file | BAM/CRAM/SAM file |
| `input_1_index` | file | BAM/CRAM/SAM index file |
| `input_2` | file | BAM/CRAM/SAM file |
| `input_2_index` | file | BAM/CRAM/SAM index file |
| `target_bed` | file | Optional - Limit analysis to targets listed in this BED-format FILE. |
| `meta2` | map | Groovy Map containing reference information. e.g. [ id:'test_reference' ] |
| `fasta` | file | reference fasta file |
| `meta3` | map | Groovy Map containing reference information. e.g. [ id:'test_reference' ] |
| `fasta_fai` | file | reference fasta file index |
| `meta4` | map | Groovy Map containing meta information for the samples file. e.g. [ id:'test_samples' ] |
| `samples` | file | Optional - Limit analysis to samples listed (one per line) in the FILE. |
| `meta5` | map | Groovy Map containing meta information for the populations file. e.g. [ id:'test_populations' ] |
| `populations` | file | Optional - Each line of FILE should list a sample and a population which it is part of. |
| `meta6` | map | Groovy Map containing meta information for the cnv file. e.g. [ id:'test_cnv' ] |
| `cnv` | file | A copy number map BED file, which has either a sample-level ploidy: `sample_name copy_number` or a region-specific format: `seq_name start end sample_name copy_number` |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `vcf` | file | `*.vcf.gz` | Compressed VCF file |
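
The two line layouts accepted by the `cnv` input can be told apart by their field count. The parser below is a minimal sketch assuming whitespace-separated fields and integer copy numbers; the function name and the returned dictionary keys are illustrative only.

```python
def parse_cnv_map_line(line):
    """Parse one line of a copy number map in either documented layout."""
    fields = line.split()
    if len(fields) == 2:
        # Sample-level ploidy: sample_name copy_number
        sample, copies = fields
        return {"sample": sample, "copy_number": int(copies)}
    if len(fields) == 5:
        # Region-specific: seq_name start end sample_name copy_number
        seq, start, end, sample, copies = fields
        return {
            "seq": seq,
            "start": int(start),
            "end": int(end),
            "sample": sample,
            "copy_number": int(copies),
        }
    raise ValueError(f"expected 2 or 5 fields, got {len(fields)}: {line!r}")
```

A line with any other number of fields is rejected, which catches malformed maps before they reach the variant caller.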
modules/nf-core/gatk4/applybqsr/main.nf:1Apply base quality score recalibration (BQSR) to a bam file
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `input` | file | BAM/CRAM file from alignment |
| `input_index` | file | BAI/CRAI file from alignment |
| `bqsr_table` | file | Recalibration table from gatk4_baserecalibrator |
| `intervals` | file | Bed file with the genomic regions included in the library (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `bam` | file | `${` | Recalibrated BAM file |
| `bai` | file | `${` | Recalibrated BAM index file |
| `cram` | file | `${` | Recalibrated CRAM file |
modules/nf-core/gatk4/applyvqsr/main.nf:1Apply a score cutoff to filter variants based on a recalibration table. ApplyVQSR performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it applies filtering to the input variants based on the recalibration table produced in the first step by VariantRecalibrator and a target sensitivity value.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test'] |
| `vcf` | file | VCF file to be recalibrated, this should be the same file as used for the first stage VariantRecalibrator. |
| `vcf_tbi` | file | tabix index for the input vcf file. |
| `recal` | file | Recalibration file produced when the input vcf was run through VariantRecalibrator in stage 1. |
| `recal_index` | file | Index file for the recalibration file. |
| `tranches` | file | Tranches file produced when the input vcf was run through VariantRecalibrator in stage 1. |
| `fasta` | file | The reference fasta file |
| `fai` | file | Index of reference fasta file |
| `dict` | file | GATK sequence dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `vcf` | - |
| `val(meta)` | tuple | `tbi` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/baserecalibrator/main.nf:1Generate recalibration table for Base Quality Score Recalibration (BQSR)
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `input` | file | BAM/CRAM file from alignment |
| `input_index` | file | BAI/CRAI file from alignment |
| `intervals` | file | Bed file with the genomic regions included in the library (optional) |
| `meta2` | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| `fasta` | file | The reference fasta file |
| `meta3` | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| `fai` | file | Index of reference fasta file |
| `meta4` | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| `dict` | file | GATK sequence dictionary |
| `meta5` | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| `known_sites` | file | VCF files with known sites for indels / snps (optional) |
| `meta6` | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| `known_sites_tbi` | file | Tabix index of the known_sites (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `table` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/calculatecontamination/main.nf:1Calculates the fraction of reads from cross-sample contamination based on summary tables from getpileupsummaries. Output to be used with filtermutectcalls.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| `pileup` | file | File containing the pileups summary table of a tumor sample to be used to calculate contamination. |
| `matched` | file | File containing the pileups summary table of a normal sample that matches with the tumor sample specified in pileup argument. This is an optional input. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `contamination` | - |
| `val(meta)` | tuple | `segmentation` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/cnnscorevariants/main.nf:1Apply a Convolutional Neural Net to filter annotated variants
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `vcf` | file | VCF file |
| `tbi` | file | VCF index file |
| `aligned_input` | file | BAM/CRAM file from alignment (optional) |
| `intervals` | file | Bed file with the genomic regions included in the library (optional) |
| `fasta` | file | The reference fasta file |
| `fai` | file | Index of reference fasta file |
| `dict` | file | GATK sequence dictionary |
| `architecture` | file | Neural Net architecture configuration json file (optional) |
| `weights` | file | Keras model HD5 file with neural net weights. (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `vcf` | - |
| `val(meta)` | tuple | `tbi` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/createsequencedictionary/main.nf:1Creates a sequence dictionary for a reference sequence
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fasta` | file | Input fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| `dict` | file | `*.{` | gatk dictionary file |
modules/nf-core/gatk4/estimatelibrarycomplexity/main.nf:1Estimates the numbers of unique molecules in a sequencing library.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `input` | file | BAM/CRAM/SAM file |
| `fasta` | file | The reference fasta file |
| `fai` | file | Index of reference fasta file |
| `dict` | file | GATK sequence dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `metrics` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/filtermutectcalls/main.nf:1Filters the raw output of mutect2, can optionally use outputs of calculatecontamination and learnreadorientationmodel to improve filtering.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| `vcf` | file | compressed vcf file of mutect2calls |
| `vcf_tbi` | file | Tabix index of vcf file |
| `stats` | file | Stats file that pairs with output vcf file |
| `orientationbias` | file | files containing artifact priors for input vcf. Optional input. |
| `segmentation` | file | tables containing segmentation information for input vcf. Optional input. |
| `table` | file | table(s) containing contamination data for input vcf. Optional input, takes priority over estimate. |
| `estimate` | float | estimation of contamination value as a double. Optional input, will only be used if table is not specified. |
| `meta2` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fasta` | file | The reference fasta file |
| `meta3` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fai` | file | Index of reference fasta file |
| `meta4` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `dict` | file | GATK sequence dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `vcf` | - |
| `val(meta)` | tuple | `tbi` | - |
| `val(meta)` | tuple | `stats` | - |
| `versions.yml` | path | `versions` | - |
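
The precedence between the `table` and `estimate` inputs can be written down as a small helper. This is a sketch of the documented rule only, not the module's code: the function name is hypothetical, and it assumes the GATK FilterMutectCalls options `--contamination-table` and `--contamination-estimate`.

```python
def contamination_args(table=None, estimate=None):
    """Choose contamination arguments, letting tables win over an estimate."""
    if table:
        # Accept a single path or a list of paths.
        tables = [table] if isinstance(table, str) else list(table)
        args = []
        for path in tables:
            args += ["--contamination-table", path]
        return args
    if estimate is not None:
        # Only used when no contamination table is given.
        return ["--contamination-estimate", str(estimate)]
    return []
```

When both inputs are supplied the estimate is silently ignored, so it is worth checking which one a run actually used.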
modules/nf-core/gatk4/filtervarianttranches/main.nf:1Apply tranche filtering
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `vcf` | file | A VCF file containing variants, must have info key:CNN_2D |
| `tbi` | file | tbi file matching with -vcf |
| `intervals` | file | Intervals |
| `resources` | list | Resource VCF containing known SNP and/or INDEL sites. Can be supplied as many times as necessary |
| `resources_index` | list | Index of resource VCF containing known SNP and/or INDEL sites. Can be supplied as many times as necessary |
| `fasta` | file | The reference fasta file |
| `fai` | file | Index of reference fasta file |
| `dict` | file | GATK sequence dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `vcf` | - |
| `val(meta)` | tuple | `tbi` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/gatherbqsrreports/main.nf:1Gathers scattered BQSR recalibration reports into a single file
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `table` | file | File(s) containing BQSR table(s) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `table` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/gatherpileupsummaries/main.nf:1Gathers pileup summary tables from gatk4/getpileupsummaries into a single table
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `pileup` | file | Pileup files from gatk4/getpileupsummaries |
| `dict` | file | dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `table` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/genomicsdbimport/main.nf:1Merge GVCFs from multiple samples. For use in joint genotyping or somatic panel of normals creation.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test'] |
| `vcf` | list | either a list of vcf files to be used to create or update a genomicsdb, or a file that contains a map to vcf files to be used. |
| `tbi` | list | list of tbi files that match with the input vcf files |
| `interval_file` | file | file containing the intervals to be used when creating the genomicsdb |
| `interval_value` | string | if an intervals file has not been specified, the value entered here will be used as an interval via the "-L" argument |
| `wspace` | file | path to an existing genomicsdb to be used in update db mode or get intervals mode. This WILL NOT specify name of a new genomicsdb in create db mode. |
| `run_intlist` | boolean | Specify whether to run get interval list mode, this option cannot be specified at the same time as run_updatewspace. |
| `run_updatewspace` | boolean | Specify whether to run update genomicsdb mode, this option takes priority over run_intlist. |
| `input_map` | boolean | Specify whether the vcf input is providing a list of vcf file(s) or a single file containing a map of paths to vcf files to be used to create or update a genomicsdb. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | - | - |
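
The interaction of the `run_intlist`, `run_updatewspace` and `input_map` flags is easier to follow as code. The helper below is a sketch of the behaviour described in the inputs table, with hypothetical function names and mode labels; it is not taken from the module itself.

```python
def select_mode(run_intlist=False, run_updatewspace=False):
    """Pick the GenomicsDBImport mode implied by the two boolean inputs."""
    if run_updatewspace:
        # Documented as taking priority over run_intlist.
        return "update"
    if run_intlist:
        return "get_intervals"
    return "create"


def vcf_input_kind(input_map=False):
    """Describe how the vcf input is interpreted."""
    return "sample_map_file" if input_map else "vcf_list"
```

Both the update and get-intervals modes rely on `wspace` pointing at an existing genomicsdb, whereas create mode does not take the new database name from `wspace`.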
modules/nf-core/gatk4/genotypegvcfs/main.nf:1Perform joint genotyping on one or more samples pre-called with HaplotypeCaller.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| `input` | file | gVCF(.gz) file or a GenomicsDB |
| `gvcf_index` | file | index of gvcf file, or empty when providing GenomicsDB |
| `intervals` | file | Interval file with the genomic regions included in the library (optional) |
| `intervals_index` | file | Interval index file (optional) |
| `meta2` | map | Groovy Map containing fasta information e.g. [ id:'test' ] |
| `fasta` | file | Reference fasta file |
| `meta3` | map | Groovy Map containing fai information e.g. [ id:'test' ] |
| `fai` | file | Reference fasta index file |
| `meta4` | map | Groovy Map containing dict information e.g. [ id:'test' ] |
| `dict` | file | Reference fasta sequence dict file |
| `meta5` | map | Groovy Map containing dbsnp information e.g. [ id:'test' ] |
| `dbsnp` | file | dbSNP VCF file |
| `meta6` | map | Groovy Map containing dbsnp tbi information e.g. [ id:'test' ] |
| `dbsnp_tbi` | file | dbSNP VCF index file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `vcf` | - |
| `val(meta)` | tuple | `tbi` | - |
| `versions.yml` | path | `versions` | - |
modules/nf-core/gatk4/getpileupsummaries/main.nf:1Summarizes counts of reads that support reference, alternate and other alleles for given sites. Results can be used with CalculateContamination. Requires a common germline variant sites file, such as from gnomAD.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| `meta` | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| `input` | file | BAM/CRAM file to be summarised. |
| `index` | file | Index file for the input BAM/CRAM file. |
| `intervals` | file | File containing specified sites to be used for the summary. If this option is not specified, variants file is used instead automatically. |
| `meta2` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fasta` | file | The reference fasta file |
| `meta3` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `fai` | file | Index of reference fasta file |
| `meta4` | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| `dict` | file | GATK sequence dictionary |
| `variants` | file | Population vcf of germline sequencing, containing allele fractions. Is also used as sites file if no separate sites file is specified. |
| `variants_tbi` | file | Index file for the germline resource. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| `val(meta)` | tuple | `table` | - |
| `versions.yml` | path | `versions` | - |
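
The fallback from `intervals` to `variants` can be sketched as follows. The function name and argument layout are assumptions for illustration; the sketch uses the GATK GetPileupSummaries options `-I`, `-V`, `-L` and `-O`, and the real module command may contain further arguments.

```python
def pileup_summary_args(input_bam, variants, output, intervals=None):
    """Build GetPileupSummaries arguments with the documented sites fallback."""
    # Without a separate sites file, the variants file doubles as the sites file.
    sites = intervals if intervals else variants
    return [
        "GetPileupSummaries",
        "-I", input_bam,
        "-V", variants,
        "-L", sites,
        "-O", output,
    ]
```

Passing a dedicated intervals file restricts the summary to those sites, while the allele fractions are still read from the variants file.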
modules/nf-core/gatk4/haplotypecaller/main.nf:1Call germline SNPs and indels via local re-assembly of haplotypes
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| dragstr_model | file | Text file containing the DragSTR model of the used BAM/CRAM file (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'test_reference' ] |
| dict | file | GATK sequence dictionary |
| meta5 | map | Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ] |
| dbsnp | file | VCF file containing known sites (optional) |
| meta6 | map | Groovy Map containing dbsnp information e.g. [ id:'test_dbsnp' ] |
| dbsnp_tbi | file | VCF index of dbsnp (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | tbi | - |
| val(meta) | tuple | bam | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4/intervallisttobed/main.nf:1Converts a Picard IntervalList file to a BED file.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| intervals | file | IntervalList file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
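The coordinate change behind an IntervalList-to-BED conversion can be sketched in a few lines. This is an illustrative sketch only, not the GATK implementation: the function name `interval_list_to_bed` and the tuple layout are assumptions made for the example. It relies on the interval list format being 1-based and closed, while BED is 0-based and half-open.

```python
def interval_list_to_bed(lines):
    """Convert Picard-style interval list lines to BED-like tuples.

    Hypothetical helper for illustration; returns
    (chrom, start, end, name, strand) tuples.
    """
    bed_rows = []
    for line in lines:
        line = line.rstrip("\n")
        # Interval lists begin with a SAM-style header whose lines start with '@'.
        if not line or line.startswith("@"):
            continue
        chrom, start, end, strand, name = line.split("\t")[:5]
        # 1-based closed -> 0-based half-open: only the start shifts by one.
        bed_rows.append((chrom, int(start) - 1, int(end), name, strand))
    return bed_rows


example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "chr1\t100\t200\t+\ttarget_1",
]
print(interval_list_to_bed(example))
```

The end coordinate is unchanged because a closed 1-based end and a half-open 0-based end refer to the same base.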
modules/nf-core/gatk4/learnreadorientationmodel/main.nf:1Uses f1r2 counts collected during mutect2 to learn the prior probability of read orientation artifacts
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| f1r2 | list | list of f1r2 files to be used as input. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | artifactprior | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4/markduplicates/main.nf:1This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Sorted BAM file |
| fasta | file | Fasta file |
| fasta_fai | file | Fasta index file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | cram | - |
| val(meta) | tuple | bam | - |
| val(meta) | tuple | crai | - |
| val(meta) | tuple | bai | - |
| val(meta) | tuple | metrics | - |
| versions.yml | path | versions | - |
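The idea of tagging reads that originate from a single fragment can be illustrated with a deliberately simplified sketch. It is not the GATK MarkDuplicates algorithm: it assumes single-end reads, groups them only by reference name, alignment start and strand, and keeps the read with the highest score. The record layout (dicts with `chrom`, `pos`, `strand`, `score`, `flag`) is hypothetical. The only detail taken from the SAM format is the duplicate flag bit, 0x400.

```python
from collections import defaultdict

DUPLICATE_FLAG = 0x400  # SAM flag bit marking a PCR or optical duplicate


def mark_duplicates(reads):
    """Set the duplicate flag on all but the best read per position group.

    Simplified illustration: real tools also consider mate positions,
    clipping and library information.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read["chrom"], read["pos"], read["strand"])].append(read)

    for group in groups.values():
        # Keep the highest-scoring read; ties resolve to the first one seen.
        best = max(group, key=lambda r: r["score"])
        for read in group:
            if read is not best:
                read["flag"] |= DUPLICATE_FLAG
    return reads


reads = [
    {"name": "a", "chrom": "chr1", "pos": 100, "strand": "+", "score": 30, "flag": 0},
    {"name": "b", "chrom": "chr1", "pos": 100, "strand": "+", "score": 40, "flag": 0},
    {"name": "c", "chrom": "chr1", "pos": 500, "strand": "+", "score": 20, "flag": 0},
]
mark_duplicates(reads)
print([(r["name"], bool(r["flag"] & DUPLICATE_FLAG)) for r in reads])
```

Reads are tagged rather than removed, which matches the description above of locating and tagging duplicates.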
modules/nf-core/gatk4/mergemutectstats/main.nf:1Merges mutect2 stats generated on different intervals/regions
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| stats | file | Stats file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | stats | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4/mergevcfs/main.nf:1Merges several vcf files
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test'] |
| vcf | list | Two or more VCF files |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome'] |
| dict | file | Optional Sequence Dictionary as input |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | tbi | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4/mutect2/main.nf:1Call somatic SNVs and indels via local assembly of haplotypes.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test'] |
| input | list | list of BAM files, also able to take CRAM as an input |
| input_index | list | list of BAM file indexes, also able to take CRAM indexes as an input |
| intervals | file | Specify the region the tool is run on. |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of reference fasta file |
| meta4 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| dict | file | GATK sequence dictionary |
| germline_resource | file | Population vcf of germline sequencing, containing allele fractions. |
| germline_resource_tbi | file | Index file for the germline resource. |
| panel_of_normals | file | vcf file to be used as a panel of normals. |
| panel_of_normals_tbi | file | Index for the panel of normals. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | tbi | - |
| val(meta) | tuple | stats | - |
| val(meta) | tuple | f1r2 | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4/variantrecalibrator/main.nf:1Build a recalibration model to score variant quality for filtering purposes. It is highly recommended to follow GATK best practices when using this module: the Gaussian mixture model requires a large number of samples to produce optimal results, for example 30 samples for exome data. For more details see https://gatk.broadinstitute.org/hc/en-us/articles/4402736812443-Which-training-sets-arguments-should-I-use-for-running-VQSR-
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | file | input vcf file containing the variants to be recalibrated |
| tbi | file | tbi file matching with -vcf |
| resource_vcf | file | all resource vcf files that are used with the corresponding '--resource' label |
| resource_tbi | file | all resource tbi files that are used with the corresponding '--resource' label |
| labels | string | necessary arguments for GATK VariantRecalibrator. Specified to directly match the resources provided. More information can be found at https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator |
| fasta | file | The reference fasta file |
| fai | file | Index of reference fasta file |
| dict | file | GATK sequence dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | recal | - |
| val(meta) | tuple | idx | - |
| val(meta) | tuple | tranches | - |
| val(meta) | tuple | plots | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4spark/applybqsr/main.nf:1Apply base quality score recalibration (BQSR) to a bam file
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| bqsr_table | file | Recalibration table from gatk4_baserecalibrator |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | ${ | Recalibrated BAM file |
| bai | file | ${ | Recalibrated BAM index file |
| cram | file | ${ | Recalibrated CRAM file |
modules/nf-core/gatk4spark/baserecalibrator/main.nf:1Generate recalibration table for Base Quality Score Recalibration (BQSR)
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| fasta | file | The reference fasta file |
| fai | file | Index of reference fasta file |
| dict | file | GATK sequence dictionary |
| known_sites | file | VCF files with known sites for indels / snps (optional) |
| known_sites_tbi | file | Tabix index of the known_sites (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | table | - |
| versions.yml | path | versions | - |
modules/nf-core/gatk4spark/markduplicates/main.nf:1This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
Tools
Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Sorted BAM file |
| fasta | file | The reference fasta file |
| fasta_fai | file | Index of reference fasta file |
| dict | file | GATK sequence dictionary |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/gawk/main.nf:1If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. The job is easy with awk, especially the GNU implementation gawk.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | The input file - Specify the logic that needs to be executed on this file on the |
| program_file | file | Optional file containing logic for awk to execute. If you don't wish to use a file, you can use |
| disable_redirect_output | boolean | Disable the redirection of awk output to a given file. This is useful if you want to use awk's built-in redirect to write files instead of the shell's redirect. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/goleft/indexcov/main.nf:1Quickly estimate coverage from a whole-genome bam or cram index. A bam index has 16KB resolution so that's what this gives, but it provides what appears to be a high-quality coverage estimate in seconds per genome.
Tools
goleft is a collection of bioinformatics tools distributed under MIT license in a single static binary
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false] |
| bams | file | Sorted BAM/CRAM/SAM files |
| indexes | file | BAI/CRAI files |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false] |
| fai | file | FASTA index |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
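To make the 16KB resolution mentioned above concrete, the sketch below shows how genomic positions map onto fixed-size tiles. It assumes "16KB" means 16,384 bases (2^14, the window size of the BAM linear index); the function names are hypothetical and the code does not read an index or reproduce indexcov's coverage estimate.

```python
import math

TILE_SIZE = 16 * 1024  # 16,384 bases; assumed meaning of "16KB" in the description


def tile_count(chrom_length):
    """Number of tiles needed to cover a chromosome of the given length."""
    return math.ceil(chrom_length / TILE_SIZE)


def tile_index(position):
    """Tile that a 0-based position falls into."""
    return position // TILE_SIZE


# A 1 Mb region is summarised by a few dozen tiles rather than a million bases.
print(tile_count(1_000_000))
print(tile_index(16_383), tile_index(16_384))
```

Because each tile yields a single value, features much smaller than one tile cannot be resolved from the index alone.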
modules/nf-core/gunzip/main.nf:1Compresses and decompresses files.
Tools
gzip is a file format and a software application used for file compression and decompression.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Optional groovy Map containing meta information e.g. [ id:'test', single_end:false ] |
| archive | file | File to be compressed/uncompressed |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/lofreq/callparallel/main.nf:1It predicts variants using multiple processors
Tools
Lofreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. Its call-parallel programme predicts variants using multiple processors.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| bam | file | Tumor sample sorted BAM file |
| bai | file | BAM index file |
| intervals | file | BED file containing target regions for variant calling |
| meta2 | map | Groovy Map containing sample information about the reference fasta e.g. [ id:'reference' ] |
| fasta | file | Reference genome FASTA file |
| meta3 | map | Groovy Map containing sample information about the reference fasta fai e.g. [ id:'reference' ] |
| fai | file | Reference genome FASTA index file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | tbi | - |
| versions.yml | path | versions | - |
modules/nf-core/manta/germline/main.nf:1Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
Tools
Structural variant and indel caller for mapped sequencing data
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM/SAM file. For joint calling use a list of files. |
| index | file | BAM/CRAM/SAM index file. For joint calling use a list of files. |
| target_bed | file | BED file containing target regions for variant calling |
| target_bed_tbi | file | Index for BED file containing target regions for variant calling |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Genome reference FASTA file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Genome reference FASTA index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| candidate_small_indels_vcf | file | *.{ | Gzipped VCF file containing variants |
| candidate_small_indels_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| candidate_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| candidate_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| diploid_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| diploid_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
modules/nf-core/manta/somatic/main.nf:1Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
Tools
Structural variant and indel caller for mapped sequencing data
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input_normal | file | BAM/CRAM/SAM file |
| input_index_normal | file | BAM/CRAM/SAM index file |
| input_tumor | file | BAM/CRAM/SAM file |
| input_index_tumor | file | BAM/CRAM/SAM index file |
| target_bed | file | BED file containing target regions for variant calling |
| target_bed_tbi | file | Index for BED file containing target regions for variant calling |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Genome reference FASTA file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Genome reference FASTA index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| candidate_small_indels_vcf | file | *.{ | Gzipped VCF file containing variants |
| candidate_small_indels_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| candidate_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| candidate_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| diploid_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| diploid_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| somatic_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| somatic_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
modules/nf-core/manta/tumoronly/main.nf:1Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs.
Tools
Structural variant and indel caller for mapped sequencing data
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM/SAM file |
| input_index | file | BAM/CRAM/SAM index file |
| target_bed | file | BED file containing target regions for variant calling |
| target_bed_tbi | file | Index for BED file containing target regions for variant calling |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Genome reference FASTA file |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Genome reference FASTA index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| candidate_small_indels_vcf | file | *.{ | Gzipped VCF file containing variants |
| candidate_small_indels_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| candidate_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| candidate_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| tumor_sv_vcf | file | *.{ | Gzipped VCF file containing variants |
| tumor_sv_vcf_tbi | file | *.{ | Index for gzipped VCF file containing variants |
modules/nf-core/mosdepth/main.nf:1Calculates genome-wide sequencing coverage.
Tools
Fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Input BAM/CRAM file |
| bai | file | Index for BAM/CRAM file |
| bed | file | BED file with intersected intervals |
| meta2 | map | Groovy Map containing bed information e.g. [ id:'test' ] |
| fasta | file | Reference genome FASTA file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | global_txt | - |
| val(meta) | tuple | summary_txt | - |
| val(meta) | tuple | regions_txt | - |
| val(meta) | tuple | per_base_d4 | - |
| val(meta) | tuple | per_base_bed | - |
| val(meta) | tuple | per_base_csi | - |
| val(meta) | tuple | regions_bed | - |
| val(meta) | tuple | regions_csi | - |
| val(meta) | tuple | quantized_bed | - |
| val(meta) | tuple | quantized_csi | - |
| val(meta) | tuple | thresholds_bed | - |
| val(meta) | tuple | thresholds_csi | - |
| versions.yml | path | versions | - |
modules/nf-core/msisensor2/msi/main.nf:1msisensor2 detection of MSI regions.
Tools
MSIsensor2 is a novel algorithm based on machine learning, featuring a large upgrade in microsatellite instability (MSI) detection for tumor-only sequencing data, including Cell-Free DNA (cfDNA), Formalin-Fixed Paraffin-Embedded (FFPE) and other sample types. The original MSIsensor is specially designed for tumor/normal paired sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| tumor_bam | file | BAM/CRAM/SAM file |
| tumor_bam_index | file | BAM/CRAM/SAM index file |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| models | file | Folder of MSISensor2 models (available from Github or as a product of msisensor2/scan) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| msi | file | - | MSI classifications as a text file |
| distribution | file | - | Read count distributions of MSI regions |
| somatic | file | - | Somatic MSI regions detected. |
modules/nf-core/msisensorpro/msisomatic/main.nf:1MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
Tools
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| normal | file | BAM/CRAM/SAM file |
| normal_index | file | BAM/CRAM/SAM index file |
| tumor | file | BAM/CRAM/SAM file |
| tumor_index | file | BAM/CRAM/SAM index file |
| intervals | file | bed file containing interval information, optional |
| meta2 | map | Groovy Map containing genome information e.g. [ id:'genome' ] |
| fasta | file | Reference genome |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| output_report | file | - | File containing final report with all detected microsatellites, unstable somatic microsatellites, msi score |
| output_dis | file | - | File containing distribution results |
| output_germline | file | - | File containing germline results |
| output_somatic | file | - | File containing somatic results |
modules/nf-core/msisensorpro/scan/main.nf:1MSIsensor-pro evaluates Microsatellite Instability (MSI) for cancer patients with next generation sequencing data. It accepts the whole genome sequencing, whole exome sequencing and target region (panel) sequencing data as input
Tools
Microsatellite Instability (MSI) detection using high-throughput sequencing data.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference genome |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| list | file | *.{ | File containing microsatellite list |
modules/nf-core/multiqc/main.nf:1Aggregate results from bioinformatics analyses across many samples into a single report
Tools
MultiQC searches a given directory for analysis logs and compiles an HTML report. It's a general-use tool, perfect for summarising the output from numerous bioinformatics tools.
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| report | - | - | - |
| data | - | - | - |
| plots | - | - | - |
modules/nf-core/muse/call/main.nf:1pre-filtering and calculating position-specific summary statistics using the Markov substitution model
Tools
Somatic point mutation caller based on Markov substitution model for molecular evolution
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. |
| tumor_bam | file | Sorted tumor BAM file |
| tumor_bai | file | Index file for the tumor BAM file |
| normal_bam | file | Sorted matched normal BAM file |
| normal_bai | file | Index file for the normal BAM file |
| meta2 | map | Groovy Map containing reference information. e.g. |
| reference | file | reference genome file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| txt | file | *.MuSE.txt | position-specific summary statistics |
modules/nf-core/muse/sump/main.nf:1Computes tier-based cutoffs from a sample-specific error model which is generated by muse/call and reports the finalized variants
Tools
Somatic point mutation caller based on Markov substitution model for molecular evolution
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. |
| muse_call_txt | file | single input file generated by 'MuSE call' |
| meta2 | map | Groovy Map containing reference information. e.g. |
| ref_vcf | file | dbSNP vcf file that should be bgzip compressed, tabix indexed and based on the same reference genome used in 'MuSE call' |
| ref_vcf_tbi | file | Tabix index for the dbSNP vcf file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | map | *.vcf.gz | bgzipped vcf file with called variants |
| tbi | map | *.vcf.gz.tbi | tabix index of bgzipped vcf file with called variants |
modules/nf-core/ngscheckmate/ncm/main.nf:1Determining whether sequencing data comes from the same individual by using SNP matching. Designed for humans on vcf or bam files.
Tools
NGSCheckMate is a software package for identifying next generation sequencing (NGS) data files from the same individual, including matching between DNA and RNA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test'] |
| files | file | VCF or BAM files for each sample, in a merged channel (possibly gzipped). BAM files require an index too. |
| meta2 | map | Groovy Map containing SNP information e.g. [ id:'test' ] |
| snp_bed | file | BED file containing the SNPs to analyse |
| meta3 | map | Groovy Map containing reference fasta index information e.g. [ id:'test' ] |
| fasta | file | fasta file for the genome, only used in the bam mode |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | corr_matrix | - |
| val(meta) | tuple | matched | - |
| val(meta) | tuple | all | - |
| val(meta) | tuple | pdf | - |
| val(meta) | tuple | vcf | - |
| versions.yml | path | versions | - |
modules/nf-core/parabricks/fq2bam/main.nf:1NVIDIA Clara Parabricks GPU-accelerated alignment, sorting, BQSR calculation, and duplicate marking. Note this nf-core module requires files to be copied into the working directory and not symlinked.
Tools
NVIDIA Clara Parabricks GPU-accelerated genomics tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| reads | file | fastq.gz files |
| meta2 | map | Groovy Map containing fasta information |
| fasta | file | reference fasta file - must be unzipped |
| meta3 | map | Groovy Map containing index information |
| index | file | reference BWA index |
| meta4 | map | Groovy Map containing index information |
| interval_file | file | (optional) file(s) containing genomic intervals for use in base quality score recalibration (BQSR) |
| meta5 | map | Groovy Map containing known sites information |
| known_sites | file | (optional) known sites file(s) for calculating BQSR. markdups must be true to perform BQSR. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam | file | *.bam | Sorted BAM file |
| bai | file | *.bai | index corresponding to sorted BAM file |
| cram | file | *.cram | Sorted CRAM file |
| crai | file | *.crai | index corresponding to sorted CRAM file |
| bqsr_table | file | *.table | (optional) table from base quality score recalibration calculation, to be used with parabricks/applybqsr |
| qc_metrics | directory | *_qc_metrics | (optional) directory of QC metrics |
| duplicate_metrics | file | *.duplicate- | (optional) metrics calculated from marking duplicates in the bam file |
| compatible_versions | - | - | - |
modules/nf-core/rbt/vcfsplit/main.nf:1A tool for splitting VCF/BCF files into N equal chunks, including BND support
Tools
A growing collection of fast and secure command line utilities for dealing with NGS data implemented on top of Rust-Bio.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. |
| vcf | file | VCF file with variants to be split |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bcfchunks | file | *.bcf | Chunks of the input VCF file, split into |
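Splitting records into N near-equal chunks can be sketched as follows. This only illustrates the even-partitioning idea from the module description: it is not the rbt implementation, it works on an in-memory list rather than a VCF/BCF file, and it does not attempt the BND (breakend) handling the tool advertises. The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(records, n_chunks):
    """Partition records into n_chunks consecutive, near-equal slices.

    Illustrative only: sizes differ by at most one, with the larger
    chunks placed first.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    base, extra = divmod(len(records), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


print(split_into_chunks(list(range(7)), 3))
```

A real splitter would additionally need to keep paired breakend records together, which is why a purely positional split like this one is not sufficient for structural variant calls.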
modules/nf-core/samtools/bam2fq/main.nf:1The module uses bam2fq method from samtools to convert a SAM, BAM or CRAM file to FASTQ format
Tools
Tools for dealing with SAM, BAM and CRAM files
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| inputbam | file | BAM/CRAM/SAM file |
| split | boolean | TRUE/FALSE value to indicate if reads should be separated into /1, /2 and if present other, or singleton. Note: choosing TRUE will generate 4 different files. Choosing FALSE will produce a single file, which will be interleaved in case the input contains paired reads. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | reads | - |
| versions.yml | path | versions | - |
modules/nf-core/samtools/collatefastq/main.nf:1The module uses collate and then fastq methods from samtools to convert a SAM, BAM or CRAM file to FASTQ format
Tools
Tools for dealing with SAM, BAM and CRAM files
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM/SAM file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta | file | Reference genome fasta file |
| interleave | boolean | If true, the output is a single interleaved paired-end FASTQ. If false, the output is split paired-end FASTQ. |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/samtools/convert/main.nf:1convert and then index CRAM -> BAM or BAM -> CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file |
| index | file | BAM/CRAM index file |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference file to create the CRAM file |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fai | file | Reference index file to create the CRAM file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bam | - |
| val(meta) | tuple | cram | - |
| val(meta) | tuple | bai | - |
| val(meta) | tuple | crai | - |
| versions.yml | path | versions | - |
modules/nf-core/samtools/faidx/main.nf:1Index FASTA file, and optionally generate a file of chromosome sizes
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta | file | FASTA file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fai | file | FASTA index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| fa | file | *.{ | FASTA file |
| sizes | file | *.{ | File containing chromosome lengths |
| fai | file | *.{ | FASTA index file |
| gzi | file | *.gzi | Optional gzip index file for compressed inputs |
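The relationship between the FASTA index and the chromosome sizes file can be sketched as below. It assumes the sizes file is simply the sequence name and length, which are the first two tab-separated columns of a `.fai` index (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH); how the module itself produces the file is not shown here, and `chrom_sizes_from_fai` is a hypothetical helper.

```python
def chrom_sizes_from_fai(fai_lines):
    """Extract (sequence name, length) pairs from .fai index lines."""
    sizes = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes.append((fields[0], int(fields[1])))
    return sizes


example_fai = [
    "chr1\t248956422\t112\t70\t71\n",
    "chrM\t16569\t253105708\t70\t71\n",
]
for name, length in chrom_sizes_from_fai(example_fai):
    print(f"{name}\t{length}")
```

The offset and line-width columns are only needed for random access into the FASTA file, so they are dropped here.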
modules/nf-core/samtools/index/main.nf:1Index SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | input file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | bai | - |
| val(meta) | tuple | csi | - |
| val(meta) | tuple | crai | - |
| versions.yml | path | versions | - |
modules/nf-core/samtools/merge/main.nf:1
Merge BAM or CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input_files | file | BAM/CRAM file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference file the CRAM was created with (optional) |
| meta3 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fai | file | Index of the reference file the CRAM was created with (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/samtools/mpileup/main.nf:1
BAM
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| input | file | BAM/CRAM/SAM file |
| intervals | file | Interval file |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| fasta | file | FASTA reference file |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | mpileup | - |
| versions.yml | path | versions | - |
modules/local/samtools/reindex_bam/main.nf:5
The aim of this process is to re-index the BAM file without duplicate, supplementary, unmapped, etc. reads for goleft/indexcov. It creates a BAM containing only a header (so indexcov can get the sample name) and a BAM index from which low-quality reads, supplementary alignments, etc. have been removed.
Inputs
| Name | Type | Description |
|---|---|---|
| val(meta) | tuple | - |
| val(meta2) | tuple | - |
| val(meta3) | tuple | - |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
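The read filtering described for reindex_bam can be pictured with SAM flag bits. The sketch below, in Python, builds an exclusion mask from the standard SAM flag values and tests whether a read would be kept. The exact set of flags the process excludes is an assumption here (the description only says "duplicate, supplementary, unmapped etc"), and the names `EXCLUDE_MASK` and `keep_read` are hypothetical.

```python
# Standard SAM flag bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Assumed exclusion set; the real process may use a different combination.
EXCLUDE_MASK = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY


def keep_read(flag: int, mask: int = EXCLUDE_MASK) -> bool:
    """Return True when none of the excluded bits are set in the read's flag."""
    return (flag & mask) == 0


print(EXCLUDE_MASK)
print(keep_read(99))    # properly paired, first in pair
print(keep_read(1123))  # same read, marked as duplicate
```

A mask like this corresponds to the value one would pass to an "exclude reads with any of these flags" option of an alignment filtering tool.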
modules/nf-core/samtools/stats/main.nf:1
Produces comprehensive statistics from SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'genome' ] |
| fasta | file | Reference file the CRAM was created with (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | stats | - |
| versions.yml | path | versions | - |
modules/nf-core/samtools/view/main.nf:1
Filter/convert SAM/BAM/CRAM file
Tools
SAMtools is a set of utilities for interacting with and post-processing short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. These files are generated as output by short read aligners like BWA.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM/SAM file |
| index | file | BAM.BAI/BAM.CSI/CRAM.CRAI file (optional) |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test' ] |
| fasta | file | Reference file the CRAM was created with (optional) |
| qname | file | Optional file with read names to output only select alignments |
| index_format | string | Index format, used together with ext.args = '--write-index' |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/sentieon/applyvarcal/main.nf:1
Apply a score cutoff to filter variants based on a recalibration table. Sentieon's ApplyVarCal performs the second pass in a two-stage process called Variant Quality Score Recalibration (VQSR). Specifically, it filters the input variants based on the recalibration table produced in the previous step, VarCal, and a target sensitivity value. https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test'] |
| vcf | file | VCF file to be recalibrated, this should be the same file as used for the first stage VariantRecalibrator. |
| vcf_tbi | file | tabix index for the input vcf file. |
| recal | file | Recalibration file produced when the input vcf was run through VariantRecalibrator in stage 1. |
| recal_index | file | Index file for the recalibration file. |
| tranches | file | Tranches file produced when the input vcf was run through VariantRecalibrator in stage 1. |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test'] |
| fasta | file | The reference fasta file |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test'] |
| fai | file | Index of reference fasta file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.vcf.gz | compressed vcf file containing the recalibrated variants. |
| tbi | file | *vcf.gz.tbi | Index of recalibrated vcf file. |
modules/nf-core/sentieon/bwamem/main.nf:1
Performs fastq alignment to a fasta reference using Sentieon's BWA MEM
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| reads | file | Genome fastq files (single-end or paired-end) |
| meta2 | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| index | file | BWA genome index files |
| meta3 | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| fasta | file | Genome fasta file |
| meta4 | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| fasta_fai | file | The index of the FASTA reference. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bam_and_bai | file | *.{ | BAM file with corresponding index. |
modules/nf-core/sentieon/dedup/main.nf:1
Runs the Sentieon tool LocusCollector followed by Dedup. LocusCollector collects read information that is used by Dedup, which in turn marks or removes duplicate reads.
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| bam | file | BAM file. |
| bai | file | BAI file |
| meta2 | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| fasta | file | Genome fasta file |
| meta3 | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| fasta_fai | file | The index of the FASTA reference. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| cram | file | *.cram | CRAM file |
| crai | file | *.crai | CRAM index file |
| bam | file | *.bam | BAM file. |
| bai | file | *.bai | BAI file |
| score | file | *.score | The score file indicates which reads LocusCollector finds are likely duplicates. |
| metrics | file | *.metrics | Output file containing Dedup metrics incl. histogram data. |
| metrics_multiqc_tsv | file | *.metrics.multiqc.tsv | Output tsv-file containing Dedup metrics excl. histogram data. |
modules/nf-core/sentieon/dnamodelapply/main.nf:1
Modifies the input VCF file by adding the MLrejected FILTER to the variants
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. |
| vcf | file | INPUT VCF file |
| idx | file | Index of the input VCF file |
| meta2 | map | Groovy Map containing reference information e.g. |
| fasta | file | Genome fasta file |
| meta3 | map | Groovy Map containing reference information e.g. |
| fai | file | Index of the genome fasta file |
| meta4 | map | Groovy Map containing reference information e.g. |
| ml_model | file | machine learning model file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.{ | INPUT VCF file |
| tbi | file | *.{ | Index of the input VCF file |
modules/nf-core/sentieon/dnascope/main.nf:1
DNAscope algorithm performs an improved version of Haplotype variant calling.
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information. e.g. [ id:'test', single_end:false ] |
| bam | file | BAM file. |
| bai | file | BAI file |
| intervals | file | bed or interval_list file containing interval in the reference that will be used in the analysis |
| meta2 | map | Groovy Map containing meta information for fasta. |
| fasta | file | Genome fasta file |
| meta3 | map | Groovy Map containing meta information for fasta index. |
| fai | file | Index of the genome fasta file |
| meta4 | map | Groovy Map containing meta information for dbsnp. |
| dbsnp | file | Single Nucleotide Polymorphism database (dbSNP) file |
| meta5 | map | Groovy Map containing meta information for dbsnp_tbi. |
| dbsnp_tbi | file | Index of the Single Nucleotide Polymorphism database (dbSNP) file |
| meta6 | map | Groovy Map containing meta information for machine learning model for Dnascope. |
| ml_model | file | machine learning model file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.unfiltered.vcf.gz | Compressed VCF file |
| vcf_tbi | file | *.unfiltered.vcf.gz.tbi | Index of VCF file |
| gvcf | file | *.g.vcf.gz | Compressed GVCF file |
| gvcf_tbi | file | *.g.vcf.gz.tbi | Index of GVCF file |
modules/nf-core/sentieon/gvcftyper/main.nf:1
Perform joint genotyping on one or more samples pre-called with Sentieon's Haplotyper.
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| gvcfs | file | gVCF(.gz) file |
| tbis | file | index of gvcf file |
| intervals | file | Interval file with the genomic regions included in the library (optional) |
| meta1 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference fasta file |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fai | file | Reference fasta index file |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| dbsnp | file | dbSNP VCF file |
| meta4 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| dbsnp_tbi | file | dbSNP VCF index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf_gz | file | *.vcf.gz | VCF file |
| vcf_gz_tbi | file | *.vcf.gz.tbi | VCF index file |
modules/nf-core/sentieon/haplotyper/main.nf:1
Runs Sentieon's haplotyper for germline variant calling.
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing reference information. e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file from alignment |
| input_index | file | BAI/CRAI file from alignment |
| intervals | file | Bed file with the genomic regions included in the library (optional) |
| recal_table | file | Recalibration table from sentieon/qualcal (optional) |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fasta | file | Genome fasta file |
| meta3 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| fai | file | The index of the FASTA reference. |
| meta4 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| dbsnp | file | VCF file containing known sites (optional) |
| meta5 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| dbsnp_tbi | file | VCF index of dbsnp (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.unfiltered.vcf.gz | Compressed VCF file |
| vcf_tbi | file | *.unfiltered.vcf.gz.tbi | Index of VCF file |
| gvcf | file | *.g.vcf.gz | Compressed GVCF file |
| gvcf_tbi | file | *.g.vcf.gz.tbi | Index of GVCF file |
modules/nf-core/sentieon/tnscope/main.nf:1
TNscope algorithm performs somatic variant calling on the tumor-normal matched pair or the tumor only data, using a Haplotyper algorithm.
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information. e.g. [ id:'test' ] |
| input | file | One or more BAM or CRAM files. |
| input_index | file | Indices for the input files |
| intervals | file | bed or interval_list file containing interval in the reference that will be used in the analysis. Only recommended for large WGS data, else the overhead may not be worth the additional parallelisation. |
| meta2 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| fasta | file | Genome fasta file |
| meta3 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| fai | file | Index of the genome fasta file |
| meta4 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| dbsnp | file | Single Nucleotide Polymorphism database (dbSNP) file |
| meta5 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| dbsnp_tbi | file | Index of the Single Nucleotide Polymorphism database (dbSNP) file |
| meta6 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| pon | file | Panel of normals (PoN) VCF file |
| meta7 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| pon_tbi | file | Index of the panel of normals (PoN) VCF file |
| meta8 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| cosmic | file | Catalogue of Somatic Mutations in Cancer (COSMIC) VCF file. |
| meta9 | map | Groovy Map containing reference information. e.g. [ id:'test' ] |
| cosmic_tbi | file | Index of the Catalogue of Somatic Mutations in Cancer (COSMIC) VCF file. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.{ | VCF file |
| index | file | *.vcf.gz.tbi | Index of the VCF file |
modules/nf-core/sentieon/varcal/main.nf:1
Module for Sentieon's VarCal. The VarCal algorithm calculates the Variant Quality Score Recalibration (VQSR). VarCal builds a recalibration model for scoring variant quality. https://support.sentieon.com/manual/usages/general/#varcal-algorithm
Tools
Sentieon® provides complete solutions for secondary DNA/RNA analysis for a variety of sequencing platforms, including short and long reads. Our software improves upon BWA, STAR, Minimap2, GATK, HaplotypeCaller, Mutect, and Mutect2 based pipelines and is deployable on any generic-CPU-based computing system.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcf | file | input vcf file containing the variants to be recalibrated |
| tbi | file | tbi file matching with -vcf |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| recal | file | *.recal | Output recal file used by ApplyVQSR |
| idx | file | *.idx | Index file for the recal output file |
| tranches | file | *.tranches | Output tranches file used by ApplyVQSR |
| plots | file | *plots.R | Optional output rscript file to aid in visualization of the input data and learned model. |
modules/nf-core/snpeff/download/main.nf:1
Genetic variant annotation and functional effect prediction toolbox
Tools
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| snpeff_db | string | SnpEff database name |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| cache | file | - | snpEff cache |
modules/nf-core/snpeff/snpeff/main.nf:1
Genetic variant annotation and functional effect prediction toolbox
Tools
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants on genes and proteins (such as amino acid changes).
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcf | file | vcf to annotate |
| meta2 | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| cache | file | path to snpEff cache (optional) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.ann.vcf | annotated vcf |
| report | file | *.csv | snpEff report csv file |
| summary_html | file | *.html | snpEff summary statistics in html file |
| genes_txt | file | *.genes.txt | txt (tab separated) file having counts of the number of variants affecting each transcript and gene |
modules/nf-core/spring/decompress/main.nf:1
Fast, efficient, lossless decompression of FASTQ files.
Tools
SPRING is a compression tool for Fastq files (containing up to 4.29 Billion reads)
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| spring | file | Spring file to decompress. |
| write_one_fastq_gz | boolean | Controls whether spring should write one fastq.gz file with reads from both directions or two fastq.gz files with reads from distinct directions |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | fastq | - |
| versions.yml | path | versions | - |
modules/nf-core/strelka/germline/main.nf:1
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation
Tools
Strelka calls somatic and germline small variants from mapped sequencing reads
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test'] |
| input | file | BAM/CRAM file |
| input_index | file | BAI/CRAI index file |
| target_bed | file | BED file containing target regions for variant calling |
| target_bed_index | file | Index for BED file containing target regions for variant calling |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.{ | gzipped germline variant file |
| vcf_tbi | file | *.vcf.gz.tbi | index file for the vcf file |
| genome_vcf | file | *_genome.vcf.gz | variant records and compressed non-variant blocks |
| genome_vcf_tbi | file | *_genome.vcf.gz.tbi | index file for the genome_vcf file |
modules/nf-core/strelka/somatic/main.nf:1
Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs
Tools
Strelka calls somatic and germline small variants from mapped sequencing reads
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input_normal | file | BAM/CRAM/SAM file |
| input_index_normal | file | BAM/CRAM/SAM index file |
| input_tumor | file | BAM/CRAM/SAM file |
| input_index_tumor | file | BAM/CRAM/SAM index file |
| manta_candidate_small_indels | file | VCF.gz file |
| manta_candidate_small_indels_tbi | file | VCF.gz index file |
| target_bed | file | BED file containing target regions for variant calling |
| target_bed_index | file | Index for BED file containing target regions for variant calling |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf_indels | file | *.{ | Gzipped VCF file containing variants |
| vcf_indels_tbi | file | *.{ | Index for gzipped VCF file containing variants |
| vcf_snvs | file | *.{ | Gzipped VCF file containing variants |
| vcf_snvs_tbi | file | *.{ | Index for gzipped VCF file containing variants |
modules/nf-core/svdb/merge/main.nf:1
The merge module merges structural variants within one or more vcf files.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test' ] |
| vcfs | list | One or more VCF files. The order and number of files should correspond to the order and number of tags in the |
| input_priority | list | Prioritize the input VCF files according to this list, e.g. ['tiddit','cnvnator']. The order and number of tags should correspond to the order and number of VCFs in the |
| sort_inputs | boolean | Whether the input files should be sorted by name. Each priority tag is sorted together with its corresponding VCF file. |
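The correspondence between `vcfs` and `input_priority`, and the effect of `sort_inputs`, can be sketched in Python as follows. This is only an illustration of the pairing rule stated in the table, not the module's implementation; the function name `pair_and_sort` and the file names are hypothetical.

```python
def pair_and_sort(vcfs, input_priority, sort_inputs=False):
    """Pair each VCF with its priority tag, optionally sorting pairs by file name."""
    if len(vcfs) != len(input_priority):
        raise ValueError("number of VCFs and priority tags must match")
    # Pair by position: the first tag belongs to the first VCF, and so on.
    pairs = list(zip(vcfs, input_priority))
    if sort_inputs:
        # Sorting the pairs keeps every tag attached to its own VCF.
        pairs.sort(key=lambda pair: pair[0])
    return pairs


pairs = pair_and_sort(
    ["tiddit.vcf", "cnvnator.vcf"],
    ["tiddit", "cnvnator"],
    sort_inputs=True,
)
print(pairs)
```

Sorting the (file, tag) pairs rather than the two lists separately is what prevents a tag from ending up next to the wrong VCF.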
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | - | - |
modules/nf-core/tabix/bgziptabix/main.nf:1
bgzip a sorted tab-delimited genome file and then create tabix index
Tools
Generic indexer for TAB-delimited genome position files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | Sorted tab-delimited genome file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| gz_tbi | file | *.gz, | bgzipped tab-delimited genome file tabix index file |
| gz_csi | file | *.gz, | bgzipped tab-delimited genome file csi index file |
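The input to this module must already be sorted. A quick pre-check can be sketched in Python as below, under the assumption that "sorted" means records are grouped by sequence name and ordered by start position within each sequence. The function name `is_position_sorted` and the default column indices are hypothetical and would need adjusting to the actual file format.

```python
def is_position_sorted(lines, seq_col=0, start_col=1):
    """Check that records are grouped by sequence and ordered by start position."""
    seen = set()      # sequence names already encountered
    current = None    # sequence name of the previous record
    last_start = -1
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header/comment lines
        fields = line.rstrip("\n").split("\t")
        seq, start = fields[seq_col], int(fields[start_col])
        if seq != current:
            if seq in seen:
                return False  # the sequence reappears after another one
            seen.add(seq)
            current, last_start = seq, start
        elif start < last_start:
            return False      # positions go backwards within a sequence
        else:
            last_start = start
    return True


print(is_position_sorted(["chr1\t10\t20", "chr1\t15\t30", "chr2\t5\t9"]))
```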
modules/nf-core/tabix/tabix/main.nf:1
Create tabix index from a sorted bgzip tab-delimited genome file
Tools
Generic indexer for TAB-delimited genome position files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| tab | file | TAB-delimited genome position file compressed with bgzip |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| tbi | file | *.{ | tabix index file |
| csi | file | *.{ | coordinate sorted index file |
modules/nf-core/tiddit/sv/main.nf:1
Identify chromosomal rearrangements.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| input | file | BAM/CRAM file |
| input_index | file | BAM/CRAM index file |
| meta2 | map | Groovy Map containing sample information e.g. |
| fasta | file | Input FASTA file |
| meta3 | map | Groovy Map containing sample information from bwa index e.g. |
| bwa_index | file | BWA genome index files |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | ploidy | - |
| versions.yml | path | versions | - |
modules/nf-core/untar/main.nf:1
Extract files.
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| archive | file | File to be untarred |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| untar | map | */ | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
modules/nf-core/unzip/main.nf:1
Unzip ZIP archive files
Tools
p7zip is a quick port of 7z.exe and 7za.exe (command line version of 7zip, see www.7-zip.org) for Unix.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| archive | file | ZIP file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| unzipped_archive | directory | ${ | Directory contents of the unzipped archive |
modules/nf-core/varlociraptor/callvariants/main.nf:1
Call variants for a given scenario specified with the varlociraptor calling grammar, preprocessed by varlociraptor preprocessing
Tools
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| vcfs | file | Sorted VCF/BCF file containing sample observations. Can also be a list of files |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bcf | file | *.bcf | BCF file containing sample observations |
modules/nf-core/varlociraptor/estimatealignmentproperties/main.nf:1
To assess candidate indels and structural variants, Varlociraptor needs to know certain properties of the underlying sequencing experiment in combination with the read aligner used.
Tools
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Sorted BAM/CRAM/SAM file |
| bai | file | Index of sorted BAM/CRAM/SAM file |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference fasta file |
| meta3 | map | Groovy Map containing reference index information e.g. [ id:'test', single_end:false ] |
| fai | file | Index for reference fasta file (must be created with samtools faidx) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| alignment_properties_json | file | *.alignment- | File containing alignment properties |
modules/nf-core/varlociraptor/preprocess/main.nf:1
Obtains per-sample observations for the actual calling process with varlociraptor calls
Tools
Flexible, uncertainty-aware variant calling with parameter free filtration via FDR control.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| bam | file | Sorted BAM/CRAM/SAM file |
| bai | file | Index of the BAM/CRAM/SAM file |
| candidates | file | Sorted BCF/VCF file |
| alignment_json | file | File containing alignment properties obtained with varlociraptor/estimatealignmentproperties |
| meta2 | map | Groovy Map containing reference information e.g. [ id:'test', single_end:false ] |
| fasta | file | Reference fasta file |
| meta3 | map | Groovy Map containing reference index information e.g. [ id:'test', single_end:false ] |
| fai | file | Index for reference fasta file (must be created with samtools faidx) |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| bcf | file | *.bcf | BCF file containing sample observations |
modules/nf-core/vcflib/vcffilter/main.nf:1
Command line tools for parsing and manipulating VCF files.
Tools
Command line tools for parsing and manipulating VCF files.
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test_sample_1' ] |
| vcf | file | VCF file |
| tbi | file | Index file |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| vcf | file | *.{ | Filtered VCF file |
modules/nf-core/vcftools/main.nf:1
A set of tools written in Perl and C++ for working with VCF files
Tools
A set of tools written in Perl and C++ for working with VCF files. This package only contains the C++ libraries whereas the package perl-vcftools-vcf contains the perl libraries
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] |
| variant_file | file | variant input file which can be vcf, vcf.gz, or bcf format. |
| bed | file | bed file which can be used with different arguments in vcftools (optional) |
| diff_variant_file | file | secondary variant file which can be used with the 'diff' suite of tools (optional) |
Outputs
| Name | Type | Emit | Description |
|---|---|---|---|
| val(meta) | tuple | vcf | - |
| val(meta) | tuple | bcf | - |
| val(meta) | tuple | frq | - |
| val(meta) | tuple | frq_count | - |
| val(meta) | tuple | idepth | - |
| val(meta) | tuple | ldepth | - |
| val(meta) | tuple | ldepth_mean | - |
| val(meta) | tuple | gdepth | - |
| val(meta) | tuple | hap_ld | - |
| val(meta) | tuple | geno_ld | - |
| val(meta) | tuple | geno_chisq | - |
| val(meta) | tuple | list_hap_ld | - |
| val(meta) | tuple | list_geno_ld | - |
| val(meta) | tuple | interchrom_hap_ld | - |
| val(meta) | tuple | interchrom_geno_ld | - |
| val(meta) | tuple | tstv | - |
| val(meta) | tuple | tstv_summary | - |
| val(meta) | tuple | tstv_count | - |
| val(meta) | tuple | tstv_qual | - |
| val(meta) | tuple | filter_summary | - |
| val(meta) | tuple | sites_pi | - |
| val(meta) | tuple | windowed_pi | - |
| val(meta) | tuple | weir_fst | - |
| val(meta) | tuple | heterozygosity | - |
| val(meta) | tuple | hwe | - |
| val(meta) | tuple | tajima_d | - |
| val(meta) | tuple | freq_burden | - |
| val(meta) | tuple | lroh | - |
| val(meta) | tuple | relatedness | - |
| val(meta) | tuple | relatedness2 | - |
| val(meta) | tuple | lqual | - |
| val(meta) | tuple | missing_individual | - |
| val(meta) | tuple | missing_site | - |
| val(meta) | tuple | snp_density | - |
| val(meta) | tuple | kept_sites | - |
| val(meta) | tuple | removed_sites | - |
| val(meta) | tuple | singeltons | - |
| val(meta) | tuple | indel_hist | - |
| val(meta) | tuple | hapcount | - |
| val(meta) | tuple | mendel | - |
| val(meta) | tuple | format | - |
| val(meta) | tuple | info | - |
| val(meta) | tuple | genotypes_matrix | - |
| val(meta) | tuple | genotypes_matrix_individual | - |
| val(meta) | tuple | genotypes_matrix_position | - |
| val(meta) | tuple | impute_hap | - |
| val(meta) | tuple | impute_hap_legend | - |
| val(meta) | tuple | impute_hap_indv | - |
| val(meta) | tuple | ldhat_sites | - |
| val(meta) | tuple | ldhat_locs | - |
| val(meta) | tuple | beagle_gl | - |
| val(meta) | tuple | beagle_pl | - |
| val(meta) | tuple | ped | - |
| val(meta) | tuple | map_ | - |
| val(meta) | tuple | tped | - |
| val(meta) | tuple | tfam | - |
| val(meta) | tuple | diff_sites_in_files | - |
| val(meta) | tuple | diff_indv_in_files | - |
| val(meta) | tuple | diff_sites | - |
| val(meta) | tuple | diff_indv | - |
| val(meta) | tuple | diff_discd_matrix | - |
| val(meta) | tuple | diff_switch_error | - |
| versions.yml | path | versions | - |
modules/nf-core/yte/main.nf:1
A YAML template engine with Python expressions
Tools
Inputs
| Name | Type | Description |
|---|---|---|
| meta | map | Groovy Map containing sample information e.g. |
| template | file | YTE template |
| map_file | file | YAML file containing a map to be used in the template |
| map | map | Groovy Map containing mapping information to be used in the template e.g. |
Outputs
| Name | Type | Pattern | Description |
|---|---|---|---|
| rendered | file | *.yaml | Rendered YAML file |
Functions
This page documents helper functions defined in the pipeline.
workflows/sarek/main.nf:631
Parameters
| Name | Description | Default |
|---|---|---|
| meta | - | - |
| files | - | - |
subworkflows/nf-core/utils_nextflow_pipeline/main.nf:87
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:32
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:46
Parameters
| Name | Description | Default |
|---|---|---|
| nextflow_cli_args | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:229
Parameters
| Name | Description | Default |
|---|---|---|
| summary_params | - | - |
| email | - | - |
| email_on_fail | - | - |
| plaintext_email | - | - |
| outdir | - | - |
| monochrome_logs | - | - |
| multiqc_report | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:342
Parameters
| Name | Description | Default |
|---|---|---|
| monochrome_logs | - | - |
subworkflows/nf-core/utils_nextflow_pipeline/main.nf:73
Parameters
| Name | Description | Default |
|---|---|---|
| outdir | - | - |
workflows/sarek/main.nf:653
Parameters
| Name | Description | Default |
|---|---|---|
| path | - | - |
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:254
modules/nf-core/cat/cat/main.nf:75
Parameters
| Name | Description | Default |
|---|---|---|
| filename | - | - |
main.nf:342
Parameters
| Name | Description | Default |
|---|---|---|
| attribute | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:208
Parameters
| Name | Description | Default |
|---|---|---|
| multiqc_reports | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:62
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:360
Parameters
| Name | Description | Default |
|---|---|---|
| summary_params | - | - |
| hook_url | - | - |
subworkflows/local/annotation_cache_initialisation/main.nf:70
Parameters
| Name | Description | Default |
|---|---|---|
| cache_url | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:141
Parameters
| Name | Description | Default |
|---|---|---|
| monochrome_logs | - | - |
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:297
Parameters
| Name | Description | Default |
|---|---|---|
| mqc_methods_yaml | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:107
Parameters
| Name | Description | Default |
|---|---|---|
| summary_params | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:80
Parameters
| Name | Description | Default |
|---|---|---|
| yaml_file | - | - |
workflows/sarek/main.nf:681
Parameters
| Name | Description | Default |
|---|---|---|
| path | - | - |
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:338
Parameters
| Name | Description | Default |
|---|---|---|
| need_input | - | - |
| step | - | - |
| outdir | - | - |
subworkflows/nf-core/utils_nfcore_pipeline/main.nf:100
Parameters
| Name | Description | Default |
|---|---|---|
| ch_versions | - | - |
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:285
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:271
subworkflows/local/utils_nfcore_sarek_pipeline/main.nf:248