2. Check MAF (SNP-level)
Minor allele frequency (MAF). A large degree of homogeneity at a given SNP across study participants generally results in inadequate power to infer a statistically significant relationship between the SNP and the trait under study. This can occur when we have a very small MAF so that the large majority of individuals have two copies of the major allele.
runPLINK("--ped file.ped --map file.map --freq --out file_maf")
system("cat file_maf.frq | tr -s ' ' '\t' > file_maf_frq.tabs")
Note
Threshold for MAF can vary between 1% and 5% (for small sample settings)
Plot MAF histogram
MAF_check <- read.delim("file_maf_frq.tabs") %>%
select(-X)
maf_01 = sum(MAF_check$MAF<0.01) # threshold at 1%
maf_05 = sum(MAF_check$MAF<0.05) # threshold at 5%
ggplot(MAF_check, aes(MAF)) +
geom_histogram() +
theme_bw() +
ylab ("# of occurences") +
xlab ("Minor allele frequency") +
geom_vline(xintercept = 0.01, color = 'grey', linetype="dashed") +
annotate(geom="text", x=-0.02, y=100000, label=paste0("maf = 0.01 ",maf_01, ' SNPs'),
angle = 90, color="red", size = 3) +
geom_vline(xintercept = 0.05, color = 'grey', linetype="dashed") +
annotate(geom="text", x=0.043, y=100000, label=paste0("maf= 0.05 ", maf_05, ' SNPs'),
angle = 90, color="red", size = 3)
Note
Advice For small sample size it is recommended to exclude SNPs with MAF < 0.05.