VCF Filtering and PCA
Download VCF File
Download the VCF file to be used from this link:
Zenodo Dataset
Filtering and Analysis Steps
- Apply base quality, genotype quality, and Hardy-Weinberg Equilibrium (HWE) filters
- Remove Indels
- Apply the individual missingness filter
- Remove individuals whose proportion of missing genotypes exceeds 60%
- Apply a site missingness filter to keep variants genotyped in at least a given proportion of individuals
- Extract the number of variants remaining after applying missingness filters from 0.1 to 0.9 and save the counts to a .txt file
- Plot the data in variant_counts.txt using ggplot2
- Apply a max-missing filter of 0.6 (retain variants genotyped in at least 60% of individuals)
- Compute the site mean depth to obtain the mean depth at each variant site
- Find the mean depth values at the 0.05 and 0.975 quantiles (approximately the middle 95% of sites)
- Keep only variants whose mean depth lies within this range
- Perform PCA (Principal Component Analysis)
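
The data-driven thresholds in the steps above can be sketched in Python. This is a minimal illustration, not the workflow's actual scripts. It assumes the summary tables come from vcftools (`--missing-indv`, `--missing-site`, `--site-mean-depth`) with the column names `INDV`, `F_MISS`, and `MEAN_DEPTH`. The function names and the tiny example tables are hypothetical, and counting variants from a site missingness table is a substitute for re-running the filter at each threshold.

```python
import csv
import io


def read_table(handle):
    """Parse a tab-separated summary table into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def individuals_to_remove(imiss_rows, max_missing=0.6):
    """Return IDs whose fraction of missing genotypes exceeds max_missing."""
    return [r["INDV"] for r in imiss_rows if float(r["F_MISS"]) > max_missing]


def variants_kept(lmiss_rows, thresholds):
    """Count sites whose call rate (1 - F_MISS) is at least each threshold."""
    # Rounding guards against floating-point noise such as 1 - 0.9 != 0.1.
    call_rates = [round(1.0 - float(r["F_MISS"]), 9) for r in lmiss_rows]
    return {t: sum(rate >= t for rate in call_rates) for t in thresholds}


def quantile(values, q):
    """Quantile by linear interpolation between the two nearest ranks."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def depth_bounds(ldepth_rows, lower=0.05, upper=0.975):
    """Return the (lower, upper) mean-depth cutoffs at the given quantiles."""
    depths = [float(r["MEAN_DEPTH"]) for r in ldepth_rows]
    return quantile(depths, lower), quantile(depths, upper)


# Hypothetical example tables standing in for real summary files.
IMISS = "INDV\tF_MISS\nind1\t0.05\nind2\t0.75\nind3\t0.6\n"
LMISS = "CHR\tPOS\tF_MISS\n1\t100\t0.0\n1\t200\t0.25\n1\t300\t0.5\n1\t400\t0.85\n"
LDEPTH = (
    "CHROM\tPOS\tMEAN_DEPTH\n"
    "1\t100\t4.0\n1\t200\t10.0\n1\t300\t12.0\n1\t400\t14.0\n1\t500\t40.0\n"
)

if __name__ == "__main__":
    thresholds = [i / 10 for i in range(1, 10)]  # 0.1 to 0.9
    print(individuals_to_remove(read_table(io.StringIO(IMISS))))
    print(variants_kept(read_table(io.StringIO(LMISS)), thresholds))
    print(depth_bounds(read_table(io.StringIO(LDEPTH))))
```

With real data, the list of individuals could be written to a file for vcftools' `--remove` option, and the two depth cutoffs passed to `--min-meanDP` and `--max-meanDP`.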