Workflow
VIP consists of four workflows depending on the type of input data: fastq, bam/cram, gvcf or vcf. The fastq workflow is an extension of the cram workflow. The cram and gvcf workflows are extensions of the vcf workflow. The vcf workflow produces the pipeline outputs as described here. The following sections provide an overview of the steps of each of these workflows.
FASTQ
The fastq workflow consists of the following steps:

- Parallelize sample sheet per sample and for each sample
  - Quality reporting and preprocessing using fastp
  - Alignment using minimap2 producing a cram file per sample (sketched below)
  - In case of multiple fastq files per sample, concatenate the cram output files
- Continue with step 3. of the cram workflow
For details, see here.
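As an illustration of the per-sample QC and alignment steps above, the sketch below chains fastp and minimap2 into a sorted, indexed CRAM for one paired-end Illumina sample. This is a minimal sketch: the file names, the sr preset, thread counts and the piping through samtools are assumptions for the example, not VIP's actual invocation.

```python
# Minimal per-sample sketch: fastp QC/trimming followed by minimap2 alignment
# to a sorted, indexed CRAM. Paths, presets and thread counts are illustrative.
import subprocess

ref = "GRCh38.fa"  # reference genome (illustrative path)
r1, r2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"

# Quality reporting and preprocessing with fastp.
subprocess.run(
    ["fastp",
     "-i", r1, "-I", r2,
     "-o", "trimmed_R1.fastq.gz", "-O", "trimmed_R2.fastq.gz",
     "--json", "fastp.json", "--html", "fastp.html"],
    check=True,
)

# Alignment with the minimap2 short-read preset, sorted and written as CRAM.
align = (
    f"minimap2 -t 4 -ax sr {ref} trimmed_R1.fastq.gz trimmed_R2.fastq.gz"
    f" | samtools sort -@ 4 --reference {ref} -O cram -o sample.cram -"
    " && samtools index sample.cram"
)
subprocess.run(align, shell=True, check=True)
```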
CRAM
The cram workflow consists of the following steps:

- Parallelize sample sheet per sample and for each sample
  - Create validated, indexed .bam file from bam/cram/sam input
  - If a bed file was provided via the sample sheet: generate coverage metrics using MosDepth (sketched below)
  - Discover short tandem repeats and publish as intermediate result
    - Using ExpansionHunter for Illumina short read data
    - Using a fork of Straglr for PacBio and Nanopore long read data; this is a fork of https://github.com/philres/straglr and is chosen over the original Straglr because of its VCF output, which enables VIP to combine the repeat calls with the SV and SNV data in the vcf workflow
  - Discover copy number variants for PacBio and Nanopore long read data using Spectre and publish as intermediate result
- Parallelize cram in chunks consisting of one or more contigs and for each chunk
  - Perform short variant calling with DeepVariant producing a gvcf file per chunk per sample; the gvcfs of the samples in a project are then merged to one vcf per project using GLnexus (sketched below)
  - Perform structural variant calling with Manta or cuteSV producing a vcf file per chunk per project
  - Concatenate short variant calling and structural variant calling vcf files per chunk per sample
- Continue with step 3. of the vcf workflow
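For the per-sample coverage and short-read repeat steps above, a minimal sketch could look as follows. It assumes mosdepth and ExpansionHunter with illustrative file names and a hypothetical regions.bed and variant_catalog.json; it is not VIP's actual configuration.

```python
# Sketch of two per-sample steps: coverage metrics with mosdepth over a BED
# file and short tandem repeat genotyping with ExpansionHunter (Illumina
# short reads). File names, catalog and thread counts are illustrative.
import subprocess

ref = "GRCh38.fa"
cram = "sample.cram"

# Coverage metrics restricted to the regions of the provided BED file.
subprocess.run(
    ["mosdepth", "-t", "4", "--by", "regions.bed", "--fasta", ref,
     "sample", cram],
    check=True,
)

# Short tandem repeat genotyping against a repeat catalog.
subprocess.run(
    ["ExpansionHunter",
     "--reads", cram,
     "--reference", ref,
     "--variant-catalog", "variant_catalog.json",
     "--output-prefix", "sample_str"],
    check=True,
)
```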
For details, see here.
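To make the per-chunk short variant calling step concrete, the sketch below runs DeepVariant per sample on one chunk and merges the resulting gVCFs of a project with GLnexus. It assumes DeepVariant's run_deepvariant entry point (as shipped in its container), the DeepVariantWGS GLnexus config and illustrative region and file names; it is not VIP's actual parameterization.

```python
# Sketch of per-chunk small-variant calling and per-project merging:
# DeepVariant emits a gVCF per chunk per sample, GLnexus joint-genotypes the
# project. Chunk handling, file names and configs are assumptions.
import subprocess

chunk_region = "chr20"               # a chunk of one or more contigs
samples = ["sampleA", "sampleB"]     # samples of one project
ref = "GRCh38.fa"

gvcfs = []
for sample in samples:
    gvcf = f"{sample}.{chunk_region}.g.vcf.gz"
    # run_deepvariant is the entry point shipped in the DeepVariant container.
    subprocess.run(
        ["run_deepvariant",
         "--model_type=WGS",
         f"--ref={ref}",
         f"--reads={sample}.cram",
         f"--regions={chunk_region}",
         f"--output_vcf={sample}.{chunk_region}.vcf.gz",
         f"--output_gvcf={gvcf}"],
        check=True,
    )
    gvcfs.append(gvcf)

# GLnexus merges the per-sample gVCFs into one project VCF for this chunk;
# its BCF stdout is converted to bgzipped VCF with bcftools.
merge = (
    "glnexus_cli --config DeepVariantWGS " + " ".join(gvcfs)
    + f" | bcftools view -Oz -o project.{chunk_region}.vcf.gz -"
)
subprocess.run(merge, shell=True, check=True)
```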
gVCF
The gvcf workflow consists of the following steps:

- For each project in the sample sheet
  - Create validated, indexed .g.vcf.gz file from bcf/bcf.gz/bcf.bgz/gvcf/gvcf.gz/gvcf.bgz/vcf/vcf.gz/vcf.bgz inputs
  - Merge .g.vcf.gz files using GLnexus resulting in one vcf.gz per project (sketched below)
- Continue with step 3. of the vcf workflow
For details, see here.
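A minimal sketch of the two steps above, assuming bcftools is used to produce the validated, indexed .g.vcf.gz files and glnexus_cli plus bcftools for the per-project merge; the file names and the GLnexus config are illustrative only.

```python
# Sketch of the gvcf workflow for one project: normalize each input to a
# bgzipped, indexed .g.vcf.gz with bcftools, then merge with GLnexus into a
# single project VCF. File names and the GLnexus config are assumptions.
import subprocess

inputs = ["child.gvcf.gz", "father.bcf", "mother.vcf"]  # mixed input formats

gvcfs = []
for i, path in enumerate(inputs):
    gvcf = f"input_{i}.g.vcf.gz"
    # bcftools view rewrites bcf/vcf/gvcf input as bgzipped VCF; indexing
    # doubles as a basic validation of the file.
    subprocess.run(["bcftools", "view", "-Oz", "-o", gvcf, path], check=True)
    subprocess.run(["bcftools", "index", "--tbi", gvcf], check=True)
    gvcfs.append(gvcf)

# Joint-genotype the project with GLnexus and convert its BCF stream to vcf.gz.
subprocess.run(
    "glnexus_cli --config DeepVariantWGS " + " ".join(gvcfs)
    + " | bcftools view -Oz -o project.vcf.gz -",
    shell=True, check=True,
)
```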
VCF
The vcf workflow consists of the following steps:

- For each project in the sample sheet
  - Create validated, indexed .vcf.gz file from bcf|bcf.gz|bcf.bgz|vcf|vcf.gz|vcf.bgz input
  - Chunk vcf.gz files and for each chunk
    - Normalize
    - Annotate
    - Classify
    - Filter
    - Perform inheritance matching
    - Classify in the context of samples
    - Filter in the context of samples
  - Concatenate chunks resulting in one vcf.gz file per project (sketched below)
  - If cram data is available, slice the cram files to only keep relevant reads
  - Create report
For details, see here.
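The sketch below illustrates three of the more generic steps above for one project: per-chunk normalization, concatenation of the chunks, and slicing of a cram file to relevant regions. It assumes bcftools and samtools with illustrative file names (regions.bed standing in for the regions derived from the final VCF); the annotation, classification, filtering, inheritance matching and report steps are tool-specific and not shown.

```python
# Sketch of per-chunk normalization, per-project concatenation and CRAM
# slicing. File names, regions and parameters are illustrative.
import subprocess

ref = "GRCh38.fa"
chunks = ["chunk0.vcf.gz", "chunk1.vcf.gz"]

normalized = []
for chunk in chunks:
    out = chunk.replace(".vcf.gz", ".norm.vcf.gz")
    # Split multiallelic records and left-align indels against the reference.
    subprocess.run(
        ["bcftools", "norm", "-m-both", "-f", ref, "-Oz", "-o", out, chunk],
        check=True,
    )
    normalized.append(out)

# Concatenate the processed chunks into one VCF per project and index it.
subprocess.run(
    ["bcftools", "concat", "-Oz", "-o", "project.vcf.gz", *normalized],
    check=True,
)
subprocess.run(["bcftools", "index", "--tbi", "project.vcf.gz"], check=True)

# Keep only reads overlapping the relevant regions (regions.bed is assumed to
# be derived from the filtered project VCF).
subprocess.run(
    ["samtools", "view", "-C", "-T", ref, "-L", "regions.bed",
     "-o", "sample.sliced.cram", "sample.cram"],
    check=True,
)
```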