前往小程序,Get更优阅读体验!
立即前往
首页
学习
活动
专区
圈层
工具
发布
首页
学习
活动
专区
圈层
工具
MCP广场
社区首页 >专栏 >如何计算群体中的单倍型频率

如何计算群体中的单倍型频率

作者头像
邓飞
发布2025-04-04 16:27:03
发布2025-04-04 16:27:03
27800
代码可运行
举报
运行总次数:0
代码可运行

大家好,我是邓飞。

昨天写了一篇(单倍型的显著性分析)的博文,里面介绍了为什么GWAS分析后,要进行单倍型的显著性分析,简而言之,如果显著性位点在block中,以block为代表进行利用,可以进行PRS(多基因评分)或者MAS(分子标记辅助选择。

1,数据

数据是plink格式的map和ped格式的基因型数据。

测试数据下载:https://t.zsxq.com/ML3ox

有54个位点,310个个体。

2,计算单倍型

使用plink1.9,通过参数--blocks计算单倍型

代码语言:javascript
代码运行次数:0
运行
复制
$ plink --file a19 --blocks no-pheno-req
PLINK v1.90b4.6 64-bit (15 Aug 2017)           www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --blocks no-pheno-req
  --file a19

15488 MB RAM detected; reserving 7744 MB for main workspace.
.ped scan complete (for binary autoconversion).
Performing single-pass .bed write (54 variants, 310 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
54 variants loaded from .bim file.
310 people (0 males, 0 females, 310 ambiguous) loaded from .fam.
Ambiguous sex IDs written to plink.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 310 founders and 0 nonfounders present.
Calculating allele frequencies... done.
54 variants and 310 people pass filters and QC.
Note: No phenotypes present.
--blocks: 8 haploblocks written to plink.blocks .
Extra block details written to plink.blocks.det .
Longest span: 102.68kb.

从日志可以看出,共有8个单倍型,打印结果:

代码语言:javascript
代码运行次数:0
运行
复制
]$ cat plink.blocks.det 
 CHR          BP1          BP2           KB  NSNPS SNPS
  19     26609728     26619327          9.6      4 QTLSaanen19.135_O|QTLSaanen19.137_O|QTLSaanen19.26_O|QTLSaanen19.28_O
  19     26647834     26659230       11.397      2 QTLSaanen19.32|GoatD01.005522
  19     26818809     26824298         5.49      2 QTLSaanen19.45_O|QTLSaanen19.46
  19     27170279     27214437       44.159      5 snp24006-scaffold244048787|QTLSaanen19.49|QTLSaanen19.50_O|QTLSaanen19.51|QTLSaanen19.53_O
  19     27220123     27228780        8.658      2 QTLSaanen19.54_O|QTLSaanen19.55
  19     27480793     27482976        2.184      2 snp24012-scaffold244-1259949|19_27482976_AF-PAKI
  19     27502643     27605322       102.68     15 QTLSaanen19.63_O|QTLSaanen19.64|QTLSaanen19.65_O|snp24013-scaffold244-1309587|QTLSaanen19.67|QTLSaanen19.68|QTLSaanen19.69_O|snp24014-scaffold244-1338180|QTLSaanen19.70_O|QTLSaanen19.71|QTLSaanen19.74|QTLSaanen19.76_O|QTLSaanen19.77_O|QTLSaanen19.78_O|snp24015-scaffold244-1384770
  19     27618016     27623534        5.519      3 QTLSaanen19.79|QTLSaanen19.79_O|QTLSaanen19.80

3,计算每个block的频率

使用plink1.07,参数:--hap-freq

代码语言:javascript
代码运行次数:0
运行
复制
$ plink1.07 --file a19 --hap plink.blocks --noweb  --hap-freq

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Skipping web check... [ --noweb ] 
Writing this text to log file [ plink.log ]
Analysis started: Wed Apr  2 07:56:49 2025

Options in effect:
	--file a19
	--hap plink.blocks
	--noweb
	--hap-freq

54 (of 54) markers to be included from [ a19.map ]
Warning, found 310 individuals with ambiguous sex codes
These individuals will be set to missing ( or use --allow-no-sex )
Writing list of these individuals to [ plink.nosex ]
310 individuals read from [ a19.ped ] 
0 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
0 cases, 0 controls and 310 missing
0 males, 0 females, and 310 of unspecified sex
Before frequency and genotyping pruning, there are 54 SNPs
310 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 1
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 54 SNPs
After filtering, 0 cases, 0 controls and 310 missing
After filtering, 0 males, 0 females, and 310 of unspecified sex

Read 8 haplotypes from [ plink.blocks ]
Estimating haplotype frequencies/phases ( MHF >= 0.01 )
Considering phases P(H|G) >= 0.01
Requiring per individual per haplotype missingness < 0.5 
Writing haplotype frequencies to [ plink.frq.hap ]
8 out of 8 haplotypes phased       
Analysis finished: Wed Apr  2 07:56:49 2025

查看结果文件:

代码语言:javascript
代码运行次数:0
运行
复制
$ cat plink.frq.hap 
     LOCUS    HAPLOTYPE          F
        H1         ACCC    0.05323
        H1         GATC       0.35
        H1         GATT     0.5968
        H2           GC     0.4821
        H2           AT     0.4885
        H2           GT    0.01956
        H3           CA     0.1339
        H3           AG     0.8661
        H4        CGGTT     0.1758
        H4        TAGCC    0.01867
        H4        CAACC     0.0477
        H4        TAACC     0.7442
        H5           GT     0.1774
        H5           AC     0.4613
        H5           GC     0.3613
        H6           AT     0.1806
        H6           GT     0.1435
        H6           GC     0.6758
        H7 TGCCAATCTTACAAT     0.1871
        H7 ATTTGGTTCCCAGGC       0.25
        H7 ATTTGGCTCCCAGGC     0.1581
        H7 ATTTGGTCTCACAAT    0.04355
        H7 ATTTGGTCTTACAAT     0.3323
        H7 ATTTGGTTCCACAAT    0.01774
        H8          AAG     0.1371
        H8          GGT     0.1242
        H8          AGT     0.3323
        H8          AAT     0.4065

可以看到,H1~H8是八个block,每个block里面包括不同的单倍型以及对应的频率,比如第一个block,包括3个类型:

- ACCC

- GATC

- GATT

对应的频率分别是:0.05323, 0.35和0.5968

把对应的基因型提取出来,确定每个样本的单倍型类型,结合表型数据,就可以进行单倍型间的显著性分析了,进而绘制箱线图、小提琴图等图了。

这种图:

这种图:

或者这种图:

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2025-04-02,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 育种数据分析之放飞自我 微信公众号,前往查看

如有侵权,请联系 cloudcommunity@tencent.com 删除。

本文参与 腾讯云自媒体同步曝光计划  ,欢迎热爱写作的你一起参与!

评论
登录后参与评论
0 条评论
热度
最新
推荐阅读
领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档