The datasets from Zenodo record 14258052 contain the reference genome file and its index files.
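Before mapping, it is worth checking that the index files sit next to the FASTA file. The text does not say which tools built them, so this Python sketch assumes a BWA index (.amb, .ann, .bwt, .pac, .sa) plus a samtools .fai index. INDEX_SUFFIXES and missing_index_files are hypothetical names; change the suffix list to match what the Zenodo record actually contains.

```python
import os
from typing import Callable, List

# ASSUMPTION: a BWA index (.amb .ann .bwt .pac .sa) plus a samtools faidx index (.fai).
# The source only says "index files"; adjust this list to the actual download.
INDEX_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".sa", ".fai"]


def missing_index_files(reference: str,
                        exists: Callable[[str], bool] = os.path.exists) -> List[str]:
    """Return the expected index files for `reference` that are not present."""
    return [reference + suffix for suffix in INDEX_SUFFIXES
            if not exists(reference + suffix)]
```

For example, missing_index_files("GCA_021130815.1_PanTigT.MC.v3_genomic.fna") returns an empty list when every assumed index file is in place. The exists parameter exists only so the check can be tested without touching the file system.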
The trimmed reads (after adapter and low-quality base removal) need to be mapped to a reference genome.
Map the reads to the GCA_021130815.1_PanTigT.MC.v3_genomic.fna reference genome.
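The text does not name a mapper. This sketch assumes BWA-MEM (bwa mem), a common choice for short reads, and only builds and runs the command. The function names and the file names in the usage note are hypothetical.

```python
import subprocess
from typing import List, Optional


def bwa_mem_command(reference: str, read1: str, read2: Optional[str] = None,
                    threads: int = 4) -> List[str]:
    """Build a BWA-MEM command for single- or paired-end trimmed reads.

    ASSUMPTION: BWA-MEM is the mapper; the source does not name one.
    """
    cmd = ["bwa", "mem", "-t", str(threads), reference, read1]
    if read2 is not None:
        cmd.append(read2)  # second mate file for paired-end data
    return cmd


def run_mapping(cmd: List[str], sam_out: str) -> None:
    """Run the mapper. bwa mem writes SAM to stdout, so redirect it to a file."""
    with open(sam_out, "w") as handle:
        subprocess.run(cmd, stdout=handle, check=True)
```

Usage, with hypothetical read files: run_mapping(bwa_mem_command("GCA_021130815.1_PanTigT.MC.v3_genomic.fna", "sample_R1.trimmed.fastq.gz", "sample_R2.trimmed.fastq.gz"), "sample.sam").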
Mapping produces a SAM file, which is large because it is uncompressed text. Convert it to a BAM file, the compressed binary form of the same alignments.
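This sketch performs the conversion with samtools view and assumes samtools is on the PATH. The -b option requests BAM output, -@ adds compression threads, and -o names the output file. The helper names and the .sam to .bam naming convention are hypothetical.

```python
import subprocess
from typing import List


def sam_to_bam_command(sam: str, bam: str, threads: int = 4) -> List[str]:
    """Build a samtools view command that converts SAM to BAM.

    -b writes BAM, -@ adds extra compression threads, -o names the output file.
    """
    return ["samtools", "view", "-b", "-@", str(threads), "-o", bam, sam]


def bam_name_for(sam: str) -> str:
    """Derive an output name: sample.sam -> sample.bam (hypothetical convention)."""
    return sam[:-4] + ".bam" if sam.endswith(".sam") else sam + ".bam"


def convert(sam: str, threads: int = 4) -> str:
    """Convert `sam` to BAM and return the BAM file name."""
    bam = bam_name_for(sam)
    subprocess.run(sam_to_bam_command(sam, bam, threads), check=True)
    return bam
```

Many pipelines pipe the mapper's output directly into samtools, so the large SAM file is never written to disk.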
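Reads in the SAM/BAM file are not ordered by genomic position; they appear in the order the mapper processed them. Sort them by genomic coordinate. Indexing requires coordinate-sorted input, and downstream analyses such as variant calling run efficiently on it.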
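The pure-Python sketch below illustrates coordinate sorting on SAM text records. It orders reads by reference sequence first and then by 1-based POS. The reference order comes from the @SQ header lines, not from alphabetical order, which is also how samtools orders a coordinate-sorted BAM. Unmapped reads (RNAME "*") go last. The function names are hypothetical, and this is for illustration only; use samtools sort on real data.

```python
from typing import Dict, List, Tuple


def reference_order(header_lines: List[str]) -> Dict[str, int]:
    """Map each @SQ sequence name (SN) to its position in the header."""
    order: Dict[str, int] = {}
    for line in header_lines:
        if line.startswith("@SQ"):
            fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            order[fields["SN"]] = len(order)
    return order


def coordinate_sort(header_lines: List[str], records: List[str]) -> List[str]:
    """Sort SAM alignment lines by (reference index in header, POS)."""
    order = reference_order(header_lines)
    unmapped_rank = len(order)  # RNAME '*' sorts after every reference

    def key(line: str) -> Tuple[int, int]:
        cols = line.split("\t")
        rname, pos = cols[2], int(cols[3])
        rank = unmapped_rank if rname == "*" else order[rname]
        return rank, pos

    return sorted(records, key=key)
```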
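Mark duplicate reads, which are copies of the same original DNA fragment (for example, from PCR amplification), so they are not counted as independent evidence during variant calling.
Note: For ddRAD, RADseq, and amplicon sequencing, skip this step. In these libraries, reads start at the same positions (restriction sites or primer sites) by design, so position-based duplicate marking would flag most genuinely independent reads as duplicates.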
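Duplicate markers generally group reads that align to the same reference, 5' position and strand, keep one representative, and set the SAM 0x400 flag on the rest. The sketch below is a deliberately simplified, single-end version of that idea. It assumes the representative is the read with the highest summed base quality. It ignores mate pairs, soft clipping, and optical duplicates, all of which dedicated tools such as Picard MarkDuplicates or samtools markdup handle. All names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_REVERSE, FLAG_DUPLICATE = 0x10, 0x400
SKIP_FLAGS = 0x4 | 0x100 | 0x800  # unmapped, secondary, supplementary


def reference_span(cigar: str) -> int:
    """Reference bases covered by the alignment (CIGAR M, D, N, = and X)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def duplicate_key(cols: List[str]) -> Tuple[str, int, bool]:
    """(reference, 5' position, is_reverse) of one SAM record.

    Simplification: soft clips are ignored; real tools use the unclipped 5' end.
    """
    flag, rname, pos = int(cols[1]), cols[2], int(cols[3])
    reverse = bool(flag & FLAG_REVERSE)
    # On the reverse strand the 5' end is the rightmost aligned reference base.
    five_prime = pos + reference_span(cols[5]) - 1 if reverse else pos
    return rname, five_prime, reverse


def quality_sum(qual: str) -> int:
    """Sum of Phred+33 base qualities; '*' means qualities are absent."""
    return 0 if qual == "*" else sum(ord(c) - 33 for c in qual)


def mark_duplicates(records: List[str]) -> List[str]:
    """Set 0x400 on every read except the best one in each duplicate group."""
    rows = [line.split("\t") for line in records]
    best: Dict[Tuple[str, int, bool], Tuple[int, int]] = {}  # key -> (score, row)
    for i, cols in enumerate(rows):
        if int(cols[1]) & SKIP_FLAGS:
            continue
        key, score = duplicate_key(cols), quality_sum(cols[10])
        if key not in best or score > best[key][0]:
            best[key] = (score, i)

    keep = {row for _, row in best.values()}
    for i, cols in enumerate(rows):
        flag = int(cols[1])
        if not flag & SKIP_FLAGS and i not in keep:
            cols[1] = str(flag | FLAG_DUPLICATE)
    return ["\t".join(cols) for cols in rows]
```

Because the key uses only position and strand, this sketch also shows why the step is skipped for ddRAD, RADseq, and amplicon data: independent reads that share a restriction or primer site would receive the same key.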
Index the sorted BAM file (duplicate-marked, unless that step was skipped). The index allows quick access to specific genomic regions during downstream SNP calling.
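Sorting is what makes fast region access possible: the reads for a region are contiguous, so they can be found by binary search instead of a full scan. The toy index below illustrates the idea on in-memory start positions. For simplicity, it only counts reads that start inside the region, so it misses reads that start earlier and extend into it. The class and method names are hypothetical. The real BAI index written by samtools index works differently, using a binning scheme plus a linear index of file offsets. Once that index exists, a region string of the form contig:start-end can be passed to samtools view.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


class PositionIndex:
    """Toy per-reference index over coordinate-sorted alignment start positions."""

    def __init__(self, sorted_records: List[Tuple[str, int]]):
        self.starts: Dict[str, List[int]] = {}
        for rname, pos in sorted_records:
            self.starts.setdefault(rname, []).append(pos)
        for rname, positions in self.starts.items():
            # Binary search is only valid on sorted positions.
            if positions != sorted(positions):
                raise ValueError(f"records for {rname} are not coordinate-sorted")

    def count_starting_in(self, rname: str, start: int, end: int) -> int:
        """Number of alignments whose start lies in [start, end] (1-based, inclusive)."""
        positions = self.starts.get(rname, [])
        return bisect_right(positions, end) - bisect_left(positions, start)
```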
Estimate the sequencing depth and the percentage of reads mapped to the reference genome. In the resulting report, look for a section called "Chromosome-wise coverage" or similar.
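Both numbers can be illustrated with a small Python sketch. The mapping percentage counts each read once by skipping secondary and supplementary alignments. Mean depth per chromosome is approximated as the number of aligned bases (CIGAR M, = and X operations) divided by the @SQ length. The sketch does not filter duplicates or low-quality alignments, and its names are hypothetical. On real BAM files, tools such as samtools flagstat and samtools coverage report these figures.

```python
import re
from typing import Dict, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
SKIP_FLAGS = 0x100 | 0x800  # secondary, supplementary
FLAG_UNMAPPED = 0x4


def aligned_bases(cigar: str) -> int:
    """Read bases aligned to the reference (CIGAR M, = and X operations)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def mapping_summary(header_lines: List[str],
                    records: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return (percent of reads mapped, mean depth per reference sequence)."""
    lengths: Dict[str, int] = {}
    for line in header_lines:
        if line.startswith("@SQ"):
            fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            lengths[fields["SN"]] = int(fields["LN"])

    bases = {name: 0 for name in lengths}
    total = mapped = 0
    for line in records:
        cols = line.split("\t")
        flag = int(cols[1])
        if flag & SKIP_FLAGS:  # count each read once, via its primary record
            continue
        total += 1
        if flag & FLAG_UNMAPPED:
            continue
        mapped += 1
        bases[cols[2]] += aligned_bases(cols[5])

    percent_mapped = 100.0 * mapped / total if total else 0.0
    depth = {name: bases[name] / lengths[name] for name in lengths}
    return percent_mapped, depth
```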