Download the datasets from Zenodo: https://zenodo.org/records/14258052
FASTQ reads are gzip-compressed (.gz). The original genome files are in FASTA format, with the .fna extension.
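To make the FASTA layout concrete, here is a minimal reader sketch, assuming a standard FASTA file: each record is a ">" header line followed by one or more sequence lines. The helper names read_fasta and open_text are hypothetical, chosen for illustration; in practice a library such as Biopython (Bio.SeqIO) is the usual choice. It also accepts gzip-compressed input, in case a reference is shipped as .fna.gz.

```python
import gzip
from typing import Iterator, TextIO, Tuple


def open_text(path: str) -> TextIO:
    """Hypothetical helper: open a plain or gzip-compressed file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA/.fna file.

    The header is the text after '>'; wrapped sequence lines belonging to
    one record are concatenated into a single string.
    """
    header = None
    chunks = []
    with open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
        if header is not None:
            yield header, "".join(chunks)  # emit the last record
```

For example, `for name, seq in read_fasta("genome.fna"): print(name.split()[0], len(seq))` prints each sequence ID with its length (the file name is a placeholder).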
Reference genome index files (BWA index format):
.fna.amb: records the positions of ambiguous bases (e.g., N) in the reference genome.
.fna.ann: stores sequence names, lengths, and other metadata.
.fna.bwt: contains the Burrows–Wheeler transformed sequence.
.fna.pac: contains the packed (2 bits per base) reference sequence.
.fna.sa: suffix array index, used to locate sequence positions.

FASTQ files are produced by demultiplexing Illumina sequencing data. Sequencing providers usually deliver them gzip-compressed, as .fastq.gz. Each record has four lines:
@SRR15369215.126490887    # sequence identifier
GGACCTTCTGTCATTTCACTCCTTCTGAAGTAAGGAGTGAAGTAAACACGAAGTAAACACGACAGGTTAGTCCTATTCCTTCAAGCAGGAGTACAGAAAAGAATGCAAATTCTGGGTTCTAGCCCAGCTTTTACTCCTATGGTTCTATTT    # sequence
+    # separator
AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJFJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFAFJJJJJJJJJJJJJJJJFJJJJJJJJJFJJJJJJJJJFFFJJJJJJJJJJJ    # base call quality scores (ASCII, one character per base)
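The four-line record above can be read with a short parser. This is a sketch under two assumptions: records are not line-wrapped (typical for Illumina output), and qualities use the Phred+33 encoding of modern Illumina data, so a character's score is its ASCII code minus 33 and a score Q means an error probability of 10^(-Q/10). The names read_fastq, phred_scores and error_probability are illustrative, not from any particular library.

```python
import gzip
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    identifier: str  # line 1 without the leading '@'
    sequence: str    # line 2
    quality: str     # line 4, one ASCII character per base


def read_fastq(path: str) -> Iterator[FastqRecord]:
    """Yield records from a plain or gzip-compressed (.fastq.gz) FASTQ file.

    Assumes the four-line layout with no wrapped sequence or quality lines.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\r\n")
            if not header:
                return  # end of file
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline().rstrip("\r\n")
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record near {header!r}")
            if len(seq) != len(qual):
                raise ValueError(f"sequence/quality length mismatch in {header!r}")
            yield FastqRecord(header[1:], seq, qual)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode ASCII quality characters to Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)
```

Under Phred+33, the characters in the example record decode as 'A' = 32, 'F' = 37 and 'J' = 41, so this read is almost entirely high-quality base calls.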
Adapter sequence: CTGTCTCTTATACACATCT
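The adapter sequence is what gets trimmed from the 3' end of reads when the sequenced fragment is shorter than the read length. The sketch below shows the idea using exact matching only. It looks for a full adapter occurrence anywhere in the read, then for a partial adapter prefix of at least min_overlap bases hanging off the 3' end. The function names and the min_overlap default of 3 are assumptions for illustration. Dedicated trimmers such as cutadapt or fastp also tolerate sequencing errors inside the adapter, which this sketch does not.

```python
ADAPTER = "CTGTCTCTTATACACATCT"  # adapter sequence given above


def find_adapter_start(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> int:
    """Return the index where the adapter starts in `read`, or len(read) if absent."""
    full = read.find(adapter)
    if full != -1:
        return full  # complete adapter present (read-through)
    # Longest partial overlap first: a read suffix equal to an adapter prefix.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return len(read)


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER, min_overlap: int = 3):
    """Cut the adapter and everything after it from a read and its quality string."""
    cut = find_adapter_start(seq, adapter, min_overlap)
    return seq[:cut], qual[:cut]
```

Trimming the sequence and quality strings at the same index keeps the two lines the same length, as the FASTQ format requires.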