Mapping Reads with BWA MEM

The datasets from Zenodo (14258052) contain the reference genome file and its index files.

The trimmed reads (after adapter and low-quality base removal) need to be mapped to a reference genome.

Task 1: Map Reads

Map the reads to the GCA_021130815.1_PanTigT.MC.v3_genomic.fna reference genome.
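A minimal sketch of the mapping step, assuming bwa is on the PATH and the index files sit next to the reference. The read file names in the example call are hypothetical, and the helper names (bwa_mem_command, map_reads) are for illustration only. Leave the second read file out for single-end data.

```python
import subprocess

# Reference name taken from the task above.
REFERENCE = "GCA_021130815.1_PanTigT.MC.v3_genomic.fna"


def bwa_mem_command(reference, reads_1, reads_2=None, threads=4):
    """Return the argument list for a `bwa mem` run (paired-end if reads_2 is given)."""
    cmd = ["bwa", "mem", "-t", str(threads), reference, reads_1]
    if reads_2 is not None:
        cmd.append(reads_2)
    return cmd


def map_reads(reference, reads_1, out_sam, reads_2=None, threads=4):
    """Run bwa mem and write the alignments to out_sam."""
    # bwa mem prints SAM records to standard output, so redirect them to a file.
    with open(out_sam, "w") as sam:
        subprocess.run(
            bwa_mem_command(reference, reads_1, reads_2, threads),
            stdout=sam,
            check=True,  # raise if bwa exits with an error
        )


# Example call (hypothetical file names):
# map_reads(REFERENCE, "sample_R1.trimmed.fastq.gz", "sample.sam",
#           reads_2="sample_R2.trimmed.fastq.gz")
```

The thread count of 4 is an arbitrary default; adjust it to the machine at hand.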

Task 2: Convert SAM to BAM

The mapping step produces a .sam file, a plain-text format that takes up a lot of disk space. Convert it into a .bam file, the compressed binary equivalent.
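One way to do the conversion is with samtools view. The sketch below assumes samtools is installed; the file names and the helper name are hypothetical.

```python
import subprocess


def sam_to_bam_command(in_sam, out_bam, threads=4):
    """Return the argument list for converting SAM to BAM with samtools view."""
    # -b selects BAM output, -o names the output file, -@ adds compression threads.
    return ["samtools", "view", "-@", str(threads), "-b", "-o", out_bam, in_sam]


def sam_to_bam(in_sam, out_bam, threads=4):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.run(sam_to_bam_command(in_sam, out_bam, threads), check=True)


# Example call (hypothetical file names):
# sam_to_bam("sample.sam", "sample.bam")
```

Once the BAM file has been written and checked, the SAM file can be deleted to free disk space.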

Task 3: Sort BAM Files

Reads in the SAM/BAM output follow the order of the input reads, not their position on the genome. Sorting by genomic coordinates is required for indexing and improves efficiency for downstream analysis like variant calling.
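A sketch of coordinate sorting with samtools sort, plus a small check on the BAM header (as printed by samtools view -H) to confirm the sort order. samtools is an assumed tool choice, and the function and file names are hypothetical.

```python
import subprocess


def sort_bam_command(in_bam, out_bam, threads=4):
    """Return the argument list for a coordinate sort (the samtools sort default)."""
    return ["samtools", "sort", "-@", str(threads), "-o", out_bam, in_bam]


def is_coordinate_sorted(header_text):
    """Check the @HD line of a SAM header for the SO:coordinate tag."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            return "SO:coordinate" in line.split("\t")
    return False  # no @HD line, so the sort order is not declared


def sort_bam(in_bam, out_bam, threads=4):
    """Sort in_bam, then verify the header of the result."""
    subprocess.run(sort_bam_command(in_bam, out_bam, threads), check=True)
    header = subprocess.run(
        ["samtools", "view", "-H", out_bam],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_coordinate_sorted(header)


# Example call (hypothetical file names):
# sort_bam("sample.bam", "sample.sorted.bam")
```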

Task 4: Remove or Mark Duplicate Reads

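Duplicate reads, which arise mainly from PCR amplification during library preparation, can inflate the apparent support for a variant, so they are marked or removed before variant calling.

Note: For ddRAD, RADseq, and amplicon sequencing, skip this step, since most reads are duplicates by design.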
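The task does not name a tool for this step. The sketch below assumes Picard MarkDuplicates, invoked through a picard wrapper script on the PATH (as provided by some package managers; otherwise the call would go through java -jar), and a coordinate-sorted input BAM. File names and the helper name are hypothetical.

```python
import subprocess


def mark_duplicates_command(in_bam, out_bam, metrics, remove=False):
    """Return the argument list for Picard MarkDuplicates.

    By default duplicates are only flagged; remove=True drops them from the output.
    """
    cmd = ["picard", "MarkDuplicates", f"I={in_bam}", f"O={out_bam}", f"M={metrics}"]
    if remove:
        cmd.append("REMOVE_DUPLICATES=true")
    return cmd


def mark_duplicates(in_bam, out_bam, metrics, remove=False):
    """Run MarkDuplicates; the metrics file summarises the duplication rate."""
    subprocess.run(mark_duplicates_command(in_bam, out_bam, metrics, remove), check=True)


# Example call (hypothetical file names):
# mark_duplicates("sample.sorted.bam", "sample.dedup.bam", "sample.dup_metrics.txt")
```

Marking rather than removing keeps every read in the file, and most variant callers skip reads that carry the duplicate flag.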

Task 5: Index BAM File

Indexing the sorted, duplicate-marked BAM file allows quick access to specific genomic regions for downstream SNP calling.
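A sketch of the indexing step with samtools index, again assuming samtools is available. By default the index is written next to the BAM file with a .bai suffix; the file and helper names are hypothetical.

```python
import subprocess


def index_bam_command(bam):
    """Return the argument list for building a BAM index."""
    return ["samtools", "index", bam]


def expected_index_path(bam):
    """Default location of the index that samtools index writes."""
    return bam + ".bai"


def index_bam(bam):
    """Index a coordinate-sorted BAM file and return the index path."""
    subprocess.run(index_bam_command(bam), check=True)
    return expected_index_path(bam)


# Example call (hypothetical file name):
# index_bam("sample.dedup.bam")
```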

Task 6: Estimate Statistics

Estimate the sequencing depth and the percentage of reads that map to the reference genome. In the resulting report, look for a section titled "Chromosome-wise coverage" or similar.
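The task does not say which tool produces the report. As one option, samtools flagstat prints the percentage of mapped reads and samtools coverage prints a per-chromosome table that includes a meandepth column. The sketch below builds both commands and computes a length-weighted mean depth from the coverage table. The helper names are hypothetical.

```python
import csv
import io


def flagstat_command(bam):
    """Mapping summary, including the percentage of mapped reads."""
    return ["samtools", "flagstat", bam]


def coverage_command(bam):
    """Per-chromosome coverage table (tab-separated, with a header line)."""
    return ["samtools", "coverage", bam]


def mean_depth(coverage_table):
    """Length-weighted mean depth across all sequences in the table.

    Expects the tabular output of samtools coverage, whose header includes
    the columns startpos, endpos, and meandepth.
    """
    rows = csv.DictReader(io.StringIO(coverage_table), delimiter="\t")
    total_bases = 0
    total_depth = 0.0
    for row in rows:
        length = int(row["endpos"]) - int(row["startpos"]) + 1
        total_bases += length
        total_depth += float(row["meandepth"]) * length
    if total_bases == 0:
        raise ValueError("coverage table contains no sequences")
    return total_depth / total_bases
```

Weighting by sequence length keeps short scaffolds from skewing the genome-wide figure.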