The raw FASTQ reads sometimes contain adapter sequences and low-quality bases at the ends. These need to be removed before downstream analysis.
FastQC (quality reports) and Trim Galore (trimming) are both installed with conda here. Follow the instructions to install Miniconda on a Linux system:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
Activate conda in the current terminal session:
source ~/miniconda3/bin/activate
Initialize conda in all available shells:
~/miniconda3/bin/conda init --all
Install FastQC in its own environment (Bioconda packages also rely on the conda-forge channel):
conda create -n fastqc -c conda-forge -c bioconda fastqc
Activate the environment:
conda activate fastqc
Run FastQC on all FASTQ files:
fastqc fastq_data/*.fastq.gz -o .
Organize output files:
mkdir zip_files html_files
mv *.zip zip_files/
mv *.html html_files/
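The organizing step can also be wrapped in a small function so that it is safe to rerun. This is only a minimal sketch, not part of FastQC: the function name organize_fastqc_output is hypothetical, and it assumes the reports follow FastQC's usual *_fastqc.zip and *_fastqc.html naming and sit in the directory given as the first argument.

```shell
# Hypothetical helper: sort FastQC reports found in a directory
# into zip_files/ and html_files/ subfolders of that directory.
organize_fastqc_output() {
    dir="${1:-.}"
    # -p avoids an error if the folders already exist from an earlier run
    mkdir -p "$dir/zip_files" "$dir/html_files"
    for f in "$dir"/*_fastqc.zip; do
        # When nothing matches, the loop sees the literal pattern; skip it
        [ -e "$f" ] && mv "$f" "$dir/zip_files/"
    done
    for f in "$dir"/*_fastqc.html; do
        [ -e "$f" ] && mv "$f" "$dir/html_files/"
    done
    return 0
}
```

Calling organize_fastqc_output . after the FastQC run should have the same effect as the three commands above, without failing when the folders already exist or when no reports are present.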
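To view the HTML reports in a browser, log out of the remote server and copy them to your local machine (replace name, IP_address and the path with your own values):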
exit
scp -r name@IP_address:/path/to/html_files ~/
Deactivate the FastQC environment (conda deactivate takes no environment name):
conda deactivate
Install Trim Galore in its own environment:
conda create -n trim-galore -c conda-forge -c bioconda trim-galore
Activate the environment:
conda activate trim-galore
Trim paired-end reads with Illumina adapters:
trim_galore --paired --illumina fastq_data/SRR836370_1_subset.fastq.gz fastq_data/SRR836370_2_subset.fastq.gz
View trimming report:
less SRR836370_1_subset.fastq.gz_trimming_report.txt
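Instead of paging through the whole report, the headline counts can be pulled out with grep. The sketch below is an assumption-laden convenience, not a Trim Galore feature: the function name report_summary is hypothetical, and the patterns assume the report contains Cutadapt-style summary lines beginning with "Total reads processed", "Reads with adapters" and "Reads written". Adjust the patterns if your report is worded differently.

```shell
# Hypothetical helper: print the headline counts from a trimming report.
# Assumes Cutadapt-style summary lines at the start of a line.
report_summary() {
    grep -E '^(Total reads processed|Reads with adapters|Reads written)' "$1"
}
```

For example, report_summary SRR836370_1_subset.fastq.gz_trimming_report.txt would print only the matching lines, which makes it easier to compare samples side by side.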
To trim several samples, loop over the read 1 files and derive each read 2 file name from it.
Option 1: List file names manually
for file in fastq_data/SRR836370_1_subset.fastq.gz fastq_data/SRR10764408_1_subset.fastq.gz fastq_data/SRR10764412_1_subset.fastq.gz ; do
paired_file=${file/_1_subset.fastq.gz/_2_subset.fastq.gz}
trim_galore --paired "$file" "$paired_file"
done
Option 2: Use wildcard to automatically detect file pairs
for file in fastq_data/*_1_subset.fastq.gz
do
trim_galore --paired "$file" "${file/_1_subset.fastq.gz/_2_subset.fastq.gz}"
done
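Both loops assume that every read 1 file has a matching read 2 file. A more defensive variant is sketched below under stated assumptions: the helper names mate_for and trim_pairs and the DRY_RUN variable are hypothetical, and the _1_subset.fastq.gz / _2_subset.fastq.gz naming is taken from the examples above.

```shell
# Hypothetical helper: derive the read 2 file name from a read 1 file name.
# Assumes the naming <sample>_1_subset.fastq.gz / <sample>_2_subset.fastq.gz.
mate_for() {
    # Strip the read 1 suffix from the end of the name, then append the read 2 suffix
    printf '%s\n' "${1%_1_subset.fastq.gz}_2_subset.fastq.gz"
}

# Trim every pair found in a directory, skipping samples whose mate is missing.
# With DRY_RUN=1 the trim_galore command is printed instead of executed.
trim_pairs() {
    for r1 in "$1"/*_1_subset.fastq.gz; do
        [ -e "$r1" ] || continue    # no read 1 files matched the pattern
        r2=$(mate_for "$r1")
        if [ ! -e "$r2" ]; then
            echo "Skipping $r1: mate $r2 not found" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "trim_galore --paired $r1 $r2"
        else
            trim_galore --paired "$r1" "$r2"
        fi
    done
}
```

Running DRY_RUN=1 trim_pairs fastq_data first prints the commands for review; running trim_pairs fastq_data then performs the trimming. Removing the suffix with ${1%...} only touches the end of the name, so a sample ID that happens to contain "_1" elsewhere is left alone.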
Output files:
*.fastq.gz_trimming_report.txt - Summary report.
*_val_1.fq.gz - Trimmed read 1.
*_val_2.fq.gz - Trimmed read 2.
Deactivate the Trim Galore environment:
conda deactivate