Mapping Reads with BWA

Mapping Reads with BWA MEM

The datasets from Zenodo (14258052) contain the reference genome file and its index files.
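
Before mapping, it can help to confirm that the index files sit next to the reference. The sketch below assumes the index was built with classic `bwa index`, which writes five files with the suffixes `.amb`, `.ann`, `.bwt`, `.pac`, and `.sa`. The function name `check_bwa_index` is hypothetical.

```shell
# Report any BWA index file that is missing or empty next to the reference FASTA.
# Assumes an index built with classic `bwa index` (suffixes listed below).
check_bwa_index() {
    ref=$1
    missing=0
    for ext in amb ann bwt pac sa; do
        if [ ! -s "$ref.$ext" ]; then
            echo "missing index file: $ref.$ext" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, run from the directory holding the downloaded files:
# check_bwa_index GCA_021130815.1_PanTigT.MC.v3_genomic.fna \
#     || bwa index GCA_021130815.1_PanTigT.MC.v3_genomic.fna
```

If any file is reported missing, rebuilding the index with `bwa index` is one way to recover, though it may take a while for a genome of this size.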

The trimmed reads (after adapter and low-quality base removal) need to be mapped to a reference genome.

Map the reads to the GCA_021130815.1_PanTigT.MC.v3_genomic.fna reference genome.
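
A minimal sketch of the mapping command, assuming paired-end reads. The function name `map_reads`, the `THREADS` setting, and the read file names in the usage comment are assumptions for illustration, not part of the dataset.

```shell
# Map paired-end trimmed reads with BWA-MEM and write SAM output.
# Arguments: reference FASTA, R1 FASTQ, R2 FASTQ, output SAM.
# THREADS is an assumed setting; adjust it to the cores available.
THREADS=${THREADS:-4}

map_reads() {
    bwa mem -t "$THREADS" "$1" "$2" "$3" > "$4"
}

# Usage with hypothetical read file names:
# map_reads GCA_021130815.1_PanTigT.MC.v3_genomic.fna \
#     sample_R1.trimmed.fastq.gz sample_R2.trimmed.fastq.gz sample.sam
```

For single-end data, the same `bwa mem` call takes only one FASTQ file after the reference.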

Mapping produces a .sam file, a plain-text format that takes up a lot of disk space. Convert it into a .bam file, the compressed binary equivalent.
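
One way to do the conversion is with `samtools view`, sketched below. The choice of samtools, the function name `sam_to_bam`, and the file names in the usage comment are assumptions.

```shell
# Convert a SAM file to BAM.
# Arguments: input SAM, output BAM.
sam_to_bam() {
    # -b selects BAM output; -o names the output file.
    samtools view -b -o "$2" "$1"
}

# Usage with hypothetical file names:
# sam_to_bam sample.sam sample.bam
```

Once the BAM file has been written successfully, the SAM file can be deleted to free disk space.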

Reads in the SAM/BAM output follow the order of the input FASTQ files, not their position in the genome. Sorting by genomic coordinates is required for indexing and improves efficiency for downstream analysis like variant calling.
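
Coordinate sorting could look like the sketch below, assuming samtools is used. The function name `sort_bam`, the `THREADS` default, and the file names are hypothetical.

```shell
# Sort a BAM file by genomic coordinate (the default order for samtools sort).
# Arguments: input BAM, output sorted BAM.
sort_bam() {
    samtools sort -@ "${THREADS:-4}" -o "$2" "$1"
}

# Usage with hypothetical file names:
# sort_bam sample.bam sample.sorted.bam
```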

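Duplicate reads should be marked before variant calling so that they are not counted as independent evidence. There are two kinds:
PCR duplicates: arise from over-amplification during PCR, so they are copies of one fragment rather than independent biological fragments.
Optical duplicates: arise when the sequencer reads a single cluster as two or more separate clusters.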

Note: For ddRAD, RADseq, and amplicon sequencing, skip duplicate marking. Reads from the same locus start and end at the same positions by design, so most of them would be flagged as duplicates.
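
The sketch below marks duplicates with `samtools markdup`, which is an assumed choice of tool (Picard MarkDuplicates is a common alternative). `markdup` needs mate tags added by `samtools fixmate -m`, so the coordinate-sorted BAM is first collated by read name, fixed, and sorted again. The function name `mark_duplicates` and all file names are hypothetical.

```shell
# Mark duplicates in a paired-end BAM using samtools (assumed tool).
# Arguments: input BAM, output duplicate-marked BAM.
mark_duplicates() {
    in_bam=$1
    out_bam=$2
    prefix=${out_bam%.bam}

    # Group reads by name so that fixmate sees both mates of a pair together.
    samtools collate -o "$prefix.collate.bam" "$in_bam" || return 1
    # Add the mate score and mate CIGAR tags that markdup relies on.
    samtools fixmate -m "$prefix.collate.bam" "$prefix.fixmate.bam" || return 1
    # Return to coordinate order.
    samtools sort -o "$prefix.possort.bam" "$prefix.fixmate.bam" || return 1
    # Flag duplicates; adding -r would remove them instead of marking them.
    samtools markdup "$prefix.possort.bam" "$out_bam" || return 1
    # Clean up the intermediate files.
    rm -f "$prefix.collate.bam" "$prefix.fixmate.bam" "$prefix.possort.bam"
}

# Usage with hypothetical file names:
# mark_duplicates sample.sorted.bam sample.sorted.markdup.bam
```

The intermediate files make each stage easy to inspect. The same four commands can also be chained with pipes to avoid writing them to disk.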

Indexing the sorted, duplicate-marked BAM file allows quick access to specific genomic regions for downstream SNP calling.
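
Indexing is a single command if samtools is used, as in this sketch. The function name `index_bam`, the file name, and the contig placeholder in the usage comments are hypothetical.

```shell
# Build a .bai index next to a coordinate-sorted BAM file.
# Argument: the sorted, duplicate-marked BAM.
index_bam() {
    samtools index "$1"
}

# Usage with a hypothetical file name:
# index_bam sample.sorted.markdup.bam    # writes sample.sorted.markdup.bam.bai
#
# With the index in place, a region can be queried directly, for example
# counting reads in the first 100 kb of a contig (CONTIG_NAME is a placeholder):
# samtools view -c sample.sorted.markdup.bam CONTIG_NAME:1-100000
```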

Estimate the sequencing depth and the percentage of reads that mapped to the reference genome. In the output, look for a section titled "Chromosome-wise coverage" or similar.
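
The text does not name the tool for this step. As a stand-in, the sketch below uses `samtools flagstat` for the mapping percentage and `samtools coverage` for a per-sequence coverage table, then derives a length-weighted mean depth from that table. The function names and output file names are hypothetical.

```shell
# Summarise mapping rate and per-sequence coverage with samtools (assumed tool).
# Argument: sorted, indexed BAM. Writes two files next to it.
mapping_summary() {
    bam=$1
    prefix=${bam%.bam}
    # Read counts, including the line reporting the percentage of mapped reads.
    samtools flagstat "$bam" > "$prefix.flagstat.txt" || return 1
    # One row per reference sequence: covered bases, percent covered, mean depth.
    samtools coverage -o "$prefix.coverage.tsv" "$bam"
}

# Length-weighted mean depth over all sequences in a `samtools coverage` table.
# Columns used: 2 = startpos, 3 = endpos, 7 = meandepth.
mean_depth() {
    awk '!/^#/ { len = $3 - $2 + 1; sum += $7 * len; total += len }
         END   { if (total > 0) printf "%.2f\n", sum / total }' "$1"
}

# Usage with a hypothetical file name:
# mapping_summary sample.sorted.markdup.bam
# mean_depth sample.sorted.markdup.coverage.tsv
```

Weighting by sequence length keeps short scaffolds from distorting the genome-wide figure. Note that `samtools coverage` skips reads flagged as duplicates by default, so the depth reflects de-duplicated data.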