Variant Calling using Strelka

1. Install and Activate Strelka

conda create -n strelka -c bioconda strelka
conda activate strelka

This creates a new Conda environment named strelka and installs Strelka2 from the bioconda channel.

2. Index the Reference Genome

samtools faidx GCA_021130815.1_PanTigT.MC.v3_genomic.fna

samtools faidx builds a FASTA index (a .fai file) next to the reference, which Strelka needs in order to read the genome by region. Skip this step if the .fai index file already exists.
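The "skip if the index exists" rule can be scripted. The following is a minimal sketch rather than part of Strelka or samtools: needs_faidx is a hypothetical helper name, and the check only looks for a non-empty .fai file beside the FASTA.

```shell
# Hypothetical helper: succeed (exit 0) when the FASTA has no non-empty
# .fai index sitting next to it.
needs_faidx() {
  [ ! -s "$1.fai" ]
}

ref="GCA_021130815.1_PanTigT.MC.v3_genomic.fna"

# Index only when the reference is present and the index is missing.
if [ -f "$ref" ] && needs_faidx "$ref"; then
  samtools faidx "$ref"
fi
```

The check does not detect an index that is older than a modified FASTA; if the reference was edited, rerun samtools faidx by hand.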

3. Run Strelka (Two-Step Process)

Step 1: Configuration

Check if Strelka is installed properly:

which configureStrelkaGermlineWorkflow.py

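Configure the workflow. If the which command above printed a path, the script is already on your PATH and can be called by name; otherwise use its full path as shown here, replacing your_username with your own user name: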

/home/your_username/miniconda3/envs/strelka/bin/configureStrelkaGermlineWorkflow.py \
  --bam name_of_the_file_aligned_reads_sorted_deduplicated.bam \
  --referenceFasta reference_genome_filename.fna \
  --runDir strelka_germline

Explanation:
configureStrelkaGermlineWorkflow.py prepares the pipeline and generates necessary files.
--bam specifies the input BAM file.
--referenceFasta provides the reference genome.
--runDir specifies the output directory for the configured workflow.
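Configuration tends to fail when an input or its index is missing, so it can help to check the files first. The sketch below is an illustration, not a Strelka feature: check_strelka_inputs is a hypothetical helper, and it assumes the BAM index is named either sample.bam.bai (the samtools index default) or sample.bai.

```shell
# Hypothetical helper: report every missing input on stderr and return
# non-zero if anything is missing.
check_strelka_inputs() {
  in_bam="$1"
  in_ref="$2"
  rc=0
  # The BAM, the reference, and the reference index must all be non-empty files.
  for f in "$in_bam" "$in_ref" "$in_ref.fai"; do
    if [ ! -s "$f" ]; then
      echo "missing: $f" >&2
      rc=1
    fi
  done
  # Assumed index names: sample.bam.bai or sample.bai.
  if [ ! -s "$in_bam.bai" ] && [ ! -s "${in_bam%.bam}.bai" ]; then
    echo "missing BAM index for $in_bam (try: samtools index $in_bam)" >&2
    rc=1
  fi
  return "$rc"
}

# Placeholder file names from the configuration command above.
check_strelka_inputs \
  name_of_the_file_aligned_reads_sorted_deduplicated.bam \
  reference_genome_filename.fna ||
  echo "fix the inputs listed above before configuring" >&2
```

Running the check before configureStrelkaGermlineWorkflow.py gives one list of everything that is missing instead of one error at a time.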

Step 2: Workflow Execution

strelka_germline/runWorkflow.py -m local -j 8

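This runs the variant calling workflow that Step 1 configured. When it finishes, the calls are written under strelka_germline/results/variants/.
-m local runs the workflow on the local machine.
-j 8 allows up to 8 jobs to run in parallel; set it to the number of CPU cores you can spare.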
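Rather than hard-coding 8, the job count can be taken from the machine. This is a sketch under the assumption that using every visible core is acceptable; on a shared server a smaller number is usually the better choice. The variable names cores and run_dir are illustrative.

```shell
# Number of CPU cores visible to this shell; falls back to 1 if neither
# nproc nor getconf can report it.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

run_dir="strelka_germline"

# Run only if Step 1 has already produced the workflow script.
if [ -x "$run_dir/runWorkflow.py" ]; then
  "$run_dir/runWorkflow.py" -m local -j "$cores"
else
  echo "no configured workflow found in $run_dir; run Step 1 first" >&2
fi
```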

4. Joint Variant Calling for Multiple Samples

To call variants using multiple BAM files together:

configureStrelkaGermlineWorkflow.py \
  --bam indv1.bam \
  --bam indv2.bam \
  --bam indv3.bam \
  --referenceFasta GCA_021130815.1_PanTigT.MC.v3_genomic.fna \
  --runDir output_folder

Then run the workflow:

output_folder/runWorkflow.py -m local -j 8

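Note: The results are written to output_folder/results/variants/. The file variants.vcf.gz holds the variant calls for all samples jointly, and a separate genome.S<N>.vcf.gz file is written for each input BAM, numbered in the order the --bam options were given.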
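With many samples, typing one --bam option per file becomes error-prone. The sketch below assembles the configuration command from a list of BAM files and prints it for review instead of running it. build_config_cmd is a hypothetical helper, and it assumes none of the paths contain spaces.

```shell
# Hypothetical helper: print a configureStrelkaGermlineWorkflow.py command
# with one --bam option per BAM file.
# Usage: build_config_cmd <reference.fna> <run_dir> <bam> [<bam> ...]
build_config_cmd() {
  cfg_ref="$1"
  cfg_dir="$2"
  shift 2
  cmd="configureStrelkaGermlineWorkflow.py"
  # Assumes paths contain no whitespace.
  for b in "$@"; do
    cmd="$cmd --bam $b"
  done
  printf '%s\n' "$cmd --referenceFasta $cfg_ref --runDir $cfg_dir"
}

# Same files as in the example above.
build_config_cmd GCA_021130815.1_PanTigT.MC.v3_genomic.fna output_folder \
  indv1.bam indv2.bam indv3.bam
```

To cover every BAM in the current directory, pass *.bam as the last argument; check the printed command before pasting it into the terminal.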