问题: 如何使用asreml进行固定因子的wald检验和随机因子的LRT检验?
代码如下:
library(learnasreml)
data("repeatmodel.dat")
data("repeatmodel.ped")
dat = repeatmodel.dat
ped = repeatmodel.ped
head(ped)
head(dat)
str(dat)
# 固定因子wald检验: 不考虑随机因子
library(asreml)
m1 = asreml(LAYDATE ~ BYEAR + AGE + YEAR, data=dat)
wald(m1)
# 固定因子wald检验: 加性效应作为随机因子
ainv = asreml.Ainverse(ped)$ginv
m2 = asreml(LAYDATE ~ BYEAR + AGE + YEAR, random = ~ ped(ANIMAL), ginverse = list(ANIMAL = ainv), data=dat)
wald(m2)
# 判断随机因子:LRT检验--加性效应 + BYEAR
m3 = asreml(LAYDATE ~ AGE + YEAR, random = ~ ped(ANIMAL) + BYEAR, ginverse = list(ANIMAL = ainv), data=dat)
wald(m3)
# 判断随机因子:LRT检验--加性效应
m4 = asreml(LAYDATE ~ AGE + YEAR, random = ~ ped(ANIMAL), ginverse = list(ANIMAL = ainv), data=dat)
wald(m4)
# LRT检验: BYEAR显著性
model.comp(m3,m4)
lrt.asreml(m3,m4)
summary(m3)$varcomp
结果如下:
> library(learnasreml)
> data("repeatmodel.dat")
> data("repeatmodel.ped")
> dat = repeatmodel.dat
> ped = repeatmodel.ped
>
> head(ped)
ID FATHER MOTHER
1 1306 0 0
2 1304 0 0
3 1298 0 0
4 1293 0 0
5 1290 0 0
6 1288 0 0
> head(dat)
ANIMAL BYEAR AGE YEAR LAYDATE
1 1 990 2 992 19
2 1 990 3 993 23
3 1 990 4 994 24
4 1 990 5 995 23
5 1 990 6 996 29
6 2 990 2 992 21
>
> str(dat)
'data.frame': 1607 obs. of 5 variables:
$ ANIMAL : Factor w/ 469 levels "1","2","3","8",..: 1 1 1 1 1 2 2 2 3 3 ...
$ BYEAR : Factor w/ 34 levels "968","970","971",..: 22 22 22 22 22 22 22 22 22 22 ...
$ AGE : Factor w/ 5 levels "2","3","4","5",..: 1 2 3 4 5 1 2 3 1 2 ...
$ YEAR : Factor w/ 39 levels "970","971","972",..: 23 24 25 26 27 23 24 25 23 24 ...
$ LAYDATE: int 19 23 24 23 29 21 17 21 20 20 ...
>
> # 固定因子wald检验: 不考虑随机因子
> library(asreml)
> m1 = asreml(LAYDATE ~ BYEAR + AGE + YEAR, data=dat)
ASReml: Tue May 28 17:54:30 2019
LogLik S2 DF wall cpu
-3204.9852 20.4837 1532 17:54:30 0.0
-3204.9852 20.4837 1532 17:54:30 0.0
Finished on: Tue May 28 17:54:30 2019
LogLikelihood Converged
> wald(m1)
Wald tests for fixed effects
Response: LAYDATE
Terms added sequentially; adjusted for those above
Df Sum of Sq Wald statistic Pr(Chisq)
(Intercept) 1 890547 43476 < 2.2e-16 ***
BYEAR 33 4127 201 < 2.2e-16 ***
AGE 4 5967 291 < 2.2e-16 ***
YEAR 37 10478 512 < 2.2e-16 ***
residual (MS) 20
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> # 固定因子wald检验: 加性效应作为随机因子
> ainv = asreml.Ainverse(ped)$ginv
> m2 = asreml(LAYDATE ~ BYEAR + AGE + YEAR, random = ~ ped(ANIMAL), ginverse = list(ANIMAL = ainv), data=dat)
ASReml: Tue May 28 17:54:31 2019
LogLik S2 DF wall cpu
-3098.9187 16.5248 1532 17:54:31 0.0
-3037.0822 14.2991 1532 17:54:31 0.0
-2958.8975 11.5365 1532 17:54:31 0.0
-2898.4341 9.1074 1532 17:54:31 0.0
-2886.2639 8.2103 1532 17:54:31 0.0
-2885.3686 7.9751 1532 17:54:31 0.0
-2885.3529 7.9436 1532 17:54:31 0.0
-2885.3528 7.9413 1532 17:54:31 0.0
-2885.3528 7.9411 1532 17:54:31 0.0
Finished on: Tue May 28 17:54:31 2019
LogLikelihood Converged
> wald(m2)
Wald tests for fixed effects
Response: LAYDATE
Terms added sequentially; adjusted for those above
Df Sum of Sq Wald statistic Pr(Chisq)
(Intercept) 1 40784 5135.8 < 2.2e-16 ***
BYEAR 33 741 93.3 1.156e-07 ***
AGE 4 5710 719.0 < 2.2e-16 ***
YEAR 37 10067 1267.7 < 2.2e-16 ***
residual (MS) 8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> # 判断随机因子:LRT检验--加性效应 + BYEAR
> m3 = asreml(LAYDATE ~ AGE + YEAR, random = ~ ped(ANIMAL) + BYEAR, ginverse = list(ANIMAL = ainv), data=dat)
ASReml: Tue May 28 17:54:31 2019
LogLik S2 DF wall cpu
-3129.1357 16.3796 1564 17:54:31 0.0
-3065.6565 14.2386 1564 17:54:31 0.0
-2986.7479 11.5460 1564 17:54:31 0.0
-2925.8431 9.1373 1564 17:54:31 0.0
-2913.2394 8.2336 1564 17:54:31 0.0 (1 restrained)
-2912.3615 8.0474 1564 17:54:31 0.0 (1 restrained)
-2912.2567 7.9822 1564 17:54:31 0.0 (1 restrained)
-2912.2462 7.9615 1564 17:54:31 0.0 (1 restrained)
-2912.2452 7.9552 1564 17:54:31 0.0
-2912.2451 7.9526 1564 17:54:31 0.0
-2912.2451 7.9524 1564 17:54:31 0.0
Finished on: Tue May 28 17:54:31 2019
LogLikelihood Converged
> wald(m3)
Wald tests for fixed effects
Response: LAYDATE
Terms added sequentially; adjusted for those above
Df Sum of Sq Wald statistic Pr(Chisq)
(Intercept) 1 40918 5145.4 < 2.2e-16 ***
AGE 4 5760 724.3 < 2.2e-16 ***
YEAR 38 10499 1320.2 < 2.2e-16 ***
residual (MS) 8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> # 判断随机因子:LRT检验--加性效应
> m4 = asreml(LAYDATE ~ AGE + YEAR, random = ~ ped(ANIMAL), ginverse = list(ANIMAL = ainv), data=dat)
ASReml: Tue May 28 17:54:31 2019
LogLik S2 DF wall cpu
-3128.7714 16.7962 1564 17:54:31 0.0
-3066.5923 14.5295 1564 17:54:31 0.0
-2987.7729 11.7012 1564 17:54:31 0.0
-2925.9143 9.1801 1564 17:54:31 0.0
-2913.2145 8.2381 1564 17:54:31 0.0
-2912.2628 7.9895 1564 17:54:31 0.0
-2912.2452 7.9553 1564 17:54:31 0.0
-2912.2451 7.9526 1564 17:54:31 0.0
-2912.2451 7.9524 1564 17:54:31 0.0
Finished on: Tue May 28 17:54:31 2019
LogLikelihood Converged
> wald(m4)
Wald tests for fixed effects
Response: LAYDATE
Terms added sequentially; adjusted for those above
Df Sum of Sq Wald statistic Pr(Chisq)
(Intercept) 1 40918 5145.4 < 2.2e-16 ***
AGE 4 5760 724.3 < 2.2e-16 ***
YEAR 38 10499 1320.2 < 2.2e-16 ***
residual (MS) 8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> # LRT检验: BYEAR显著性
> model.comp(m3,m4)
model AIC BIC AIC.stat BIC.stat
1 m3 5830.49 5846.555
2 m4 5828.49 5839.200 better better
> lrt.asreml(m3,m4)
Likelihood ratio test(s) assuming nested random models.
Chisq probability adjusted using Stram & Lee, 1994.
Df LR statistic Pr(Chisq)
m4/m3 0 1.0339e-06 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
划重点:
1, wald和anova对于asreml是一样的, 因为asreml中有anova这个类, 可以调用aonva.asreml 进行wald检验
2, learnasreml是我编写的包, 安装方法:
devtools::install_github("dengfei2013/learnasreml")
3, LRT检验, 可以手动进行, 也可以使用lrt.asreml函数进行检验.
祝你成功.
下面是使用lme4的解决方案:
很多朋友写信问我, 像要知道固定因子的显著性和随机因子的显著性如何计算,他们使用的是lme4
这个R包, 但是这个包使用anova时没有P值,还要手动计算, 随机因子也需要自己计算loglikelihood值, 然后使用LRT的卡方检验进行显著性检验, 其实lme4包有扩展的包可以非常友好的做这件事情.
###载入软件包和数据
library(lme4)
library(lmerTest)
library(sjstats)
library(learnasreml)
data(fm)
lme4
lmerTest
lmerTest::anova.lmerModLmerTest
用于检测固定因子的显著性, 方差分析表采用III平方和的形式.lmerTest::ranova
用于检测随机因子的显著性, 使用的是LRT检验, 给出的是卡方结果.install.packages("lmerTest")
sjstats
install.packages("lmerTest")
模型介绍
建模
固定因子: Spacing+Rep, 随机因子: Fam
fm1 <- lmer(h1 ~Spacing + Rep + (1|Fam), fm)
固定因子检验
anova(fm1) # 固定因子显著性检验
可以看到Spacing 和Rep都达到极显著
随机因子显著性检验
ranova(fm1) # 随机因子显著性检验,LRT
可以看到Fam达到极显著
计算R2
r2(fm1) # 计算R2
R-Squared for Generalized Linear Mixed Model
[34mFamily : gaussian (identity)
Formula: h1 ~ Spacing + Rep + (1 | Fam)
[39m Marginal R2: 0.116
Conditional R2: 0.277
计算固定因子每个水平的P值
p_value(fm1) # 计算每个水平的显著性
term | p.value | std.error |
---|---|---|
(Intercept) | 1.535094e-127 | 0.7915991 |
Spacing3 | 4.957481e-09 | 0.5463546 |
Rep2 | 2.886600e-01 | 0.8082299 |
Rep3 | 7.443430e-08 | 0.8218056 |
Rep4 | 1.720753e-10 | 0.7995633 |
Rep5 | 4.635631e-01 | 0.7663026 |
提取方差组分
re_var(fm1) # 计算方差组分
_sigma_2
50.7633345854535
Fam_tau.00
11.3168413429073
建模
library(asreml)
fm2 = asreml(h1 ~ Spacing + Rep, random = ~ Fam, data=fm,trace=F)
固定因子检验
anova(fm2) # 固定因子显著性检验, 这里anova 是anova.asreml
随机因子显著性检验
这里首先构建一个空模型, 然后使用LRT检验
fm_Null = asreml(h1 ~ Spacing + Rep, data=fm,trace=F)
lrt.asreml(fm2,fm_Null) # 随机因子显著性检验LRT
summary(fm2)$varcomp[,1:2] # 方差组分
5. 关于混合线性模型计算R2
还有一个包叫MuMIn
,也可以计算R2
library(MuMIn)
r.squaredLR(fm1)#计算R2
0.217233511687581
# 混合线性模型, 如何检测固定因子和随机因子
###载入数据
library(lme4)
library(lmerTest)
library(sjstats)
library(learnasreml)
data(fm)
str(fm)
### 固定因子: Spacing+Rep, 随机因子: Fam
fm1 <- lmer(h1 ~Spacing + Rep + (1|Fam), fm)
summary(fm1)
anova(fm1) # 固定因子显著性检验
ranova(fm1) # 随机因子显著性检验,LRT
r2(fm1) # 计算R2
p_value(fm1) # 计算每个水平的显著性
re_var(fm1) # 计算方差组分
### 对比asreml
fm2 = asreml(h1 ~ Spacing + Rep, random = ~ Fam, data=fm)
anova(fm2) # 固定因子显著性检验, 这里anova 是anova.asreml
fm_Null = asreml(h1 ~ Spacing + Rep, data=fm)
lrt.asreml(fm2,fm_Null) # 随机因子显著性检验LRT
summary(fm2)$varcomp[,1:2] # 方差组分
library(MuMIn)
r.squaredLR(fm1)#计算R2