Raw FASTQ reads can contain adapter sequences and low-quality bases at their ends, which should be removed before downstream analysis. FastQC is used first to assess read quality, and Trim Galore is then used to trim the reads.
Install FastQC:
conda create -n fastqc -c bioconda fastqc
Activate the environment:
conda activate fastqc
Run FastQC on all FASTQ files:
fastqc *.fq.gz
Organize output files:
mkdir zip_files html_files
mv *.zip zip_files/
mv *.html html_files/
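As an alternative to moving files afterwards, FastQC can write its reports straight into a directory given with --outdir, which must already exist. A minimal sketch, assuming a hypothetical wrapper name run_fastqc and a hypothetical directory name fastqc_out (neither is part of the workflow above):

```shell
# Hypothetical helper: run FastQC and collect all reports in one directory.
# Usage: run_fastqc OUTDIR FILE...
run_fastqc() {
    outdir=$1
    shift
    # FastQC does not create the output directory itself.
    mkdir -p "$outdir" || return 1
    fastqc --outdir "$outdir" "$@"
}

# Example call (the directory name is an assumption):
# run_fastqc fastqc_out *.fq.gz
```

With this approach the .zip and .html reports end up together in one directory, so the separate mkdir and mv steps are only needed if the two file types should be kept apart.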
To view the HTML reports, log out of the remote server and copy them to your local machine:
exit
scp -r name@IP_address:/path/to/html_files ~/
Deactivate the FastQC environment (conda deactivate takes no environment name):
conda deactivate
Install Trim Galore:
conda create -n trim-galore -c bioconda trim-galore
Activate the environment:
conda activate trim-galore
Trim paired-end reads with Illumina adapters:
trim_galore --paired --illumina BEN_CI16_sub_1.fq.gz BEN_CI16_sub_2.fq.gz
View trimming report:
less BEN_CI16_sub_2.fq.gz_trimming_report.txt
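The paired-end command above relies on Trim Galore's defaults. The sketch below spells out the quality cutoff (--quality 20) and the minimum read length (--length 20), both of which match the documented defaults, and sends the output to a separate directory with --output_dir. The wrapper name trim_pair and the directory name trimmed are assumptions for illustration only.

```shell
# Hypothetical helper: trim one paired-end sample into a chosen directory.
# Usage: trim_pair OUTDIR READ1 READ2
trim_pair() {
    # Create the output directory up front so the run cannot fail on it.
    mkdir -p "$1" || return 1
    # --quality and --length are set to their default values (20 and 20)
    # to make the trimming thresholds explicit.
    trim_galore --paired --illumina \
        --quality 20 \
        --length 20 \
        --output_dir "$1" \
        "$2" "$3"
}

# Example call (the output directory name is an assumption):
# trim_pair trimmed BEN_CI16_sub_1.fq.gz BEN_CI16_sub_2.fq.gz
```

Writing the thresholds out makes it easy to see what to change if a stricter quality cutoff or a longer minimum length is wanted.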
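To trim several samples in one go, loop over the read 1 files and derive each read 2 file name from it.
Option 1: List file names manually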
for file in BEN_CI18_sub_1.fq.gz BEN_NW13_sub_1.fq.gz BEN_SI9_sub_1.fq.gz \
            BEN_NW10_sub_1.fq.gz BEN_SI18_sub_1.fq.gz LGS1_sub_1.fq.gz \
            BEN_NW12_sub_1.fq.gz BEN_SI19_sub_1.fq.gz; do
    paired_file=${file/_1.fq.gz/_2.fq.gz}
    trim_galore --paired "$file" "$paired_file"
done
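The loop derives each read 2 file name from the read 1 name with shell parameter expansion. The short sketch below shows an equivalent suffix-stripping form (%), which only ever matches at the end of the name and also works in plain POSIX shells. The variable name sample is illustrative.

```shell
file=BEN_CI18_sub_1.fq.gz

# Strip the read 1 suffix from the end of the name, then append the read 2 suffix.
# For this file name the result is the same as ${file/_1.fq.gz/_2.fq.gz} in bash.
sample=${file%_1.fq.gz}
paired_file=${sample}_2.fq.gz

echo "$sample"        # BEN_CI18_sub
echo "$paired_file"   # BEN_CI18_sub_2.fq.gz
```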
Option 2: Use wildcard to automatically detect file pairs
for file in *_1.fq.gz; do
    trim_galore --paired "$file" "${file/_1.fq.gz/_2.fq.gz}"
done
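The wildcard loop assumes that every *_1.fq.gz file has a matching *_2.fq.gz file. A minimal sketch of a pre-check that reports any read 1 file without a mate, so the problem surfaces before trimming starts. The function name check_pairs is hypothetical.

```shell
# Hypothetical helper: verify that every read 1 file in DIR has a read 2 mate.
# Prints each missing mate to stderr; returns non-zero if any are missing.
# Usage: check_pairs DIR
check_pairs() {
    missing=0
    for file in "$1"/*_1.fq.gz; do
        # With no matches the glob stays literal, so skip non-existent entries.
        [ -e "$file" ] || continue
        mate=${file%_1.fq.gz}_2.fq.gz
        if [ ! -e "$mate" ]; then
            echo "missing mate: $mate" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: only trim when all pairs are complete.
# check_pairs . && for file in *_1.fq.gz; do
#     trim_galore --paired "$file" "${file%_1.fq.gz}_2.fq.gz"
# done
```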
Output files:
*.fq.gz_trimming_report.txt - Summary report.
*_val_1.fq.gz - Trimmed read 1.
*_val_2.fq.gz - Trimmed read 2.
Deactivate the Trim Galore environment:
conda deactivate