本篇推文来自于公众号读者的投稿,编辑排版由小明完成
library(ggpubr)
library(rstatix)
library(tidyverse)
这里用到的是R语言的内置数据集sample_n_by()
函数很有用,能够分组随机抽样%>%
是管道符 是将前面的结果传输给后面的函数
data("PlantGrowth")
set.seed(1234)
PlantGrowth %>% sample_n_by(group, size = 1)
函数sample_n_by()
加载和检查数据,按组显示随机的一行
levels(PlantGrowth$group)
单因素方差分析可以用来确定在三种条件下植物的平均生长是否显著不同。
PlantGrowth %>%
group_by(group) %>%
get_summary_stats(weight, type = "mean_sd")
model <- lm(weight ~ group, data = PlantGrowth)
ggqqplot(residuals(model))
image.png
shapiro_test(residuals(model))
在QQ图中,由于所有的点都近似地落在参考线上,我们可以假设正态分布。这个结论得到了Shapiro-Wilk test的支持。p值不显著(p=0.13>0.05),因此我们可以假设正态性。
PlantGrowth %>%
group_by(group) %>%
shapiro_test(weight)
p > 0.05 假设成立
ggqqplot(PlantGrowth, "weight", facet.by = "group")
image.png
plot(model, 1)
image.png
在上图中,残差与拟合值(每组的均值)之间没有明显的关系。我们可以假设方差齐性。
PlantGrowth %>% levene_test(weight ~ group)
p>0.05, 没有显著性差异,假设通过。
res.aov <- PlantGrowth %>% anova_test(weight ~ group)
res.aov
p<0.05,表明各组之间有显著性差异
pwc <- PlantGrowth %>% tukey_hsd(weight ~ group)
pwc
pwc <- pwc %>% add_xy_position(x = "group")
ggboxplot(PlantGrowth, x = "group", y = "weight",color = "#488dcb") +
stat_pvalue_manual(pwc, hide.ns = TRUE) +
labs(
subtitle = get_test_label(res.aov, detailed = TRUE),
caption = get_pwc_label(pwc)
)
image.png
library(ggpubr)
library(rstatix)
library(tidyverse)
set.seed(123)
library(datarium)
data("jobsatisfaction", package = "datarium")
jobsatisfaction %>% sample_n_by(gender, education_level, size = 1)
jobsatisfaction %>%
group_by(gender, education_level) %>%
get_summary_stats(score, type = "mean_sd")
bxp <- ggboxplot(
jobsatisfaction, x = "gender", y = "score",
color = "education_level", palette = "jco"
)
bxp
image.png
model1 <- lm(score ~ gender*education_level,
data = jobsatisfaction)
ggqqplot(residuals(model1))
image.png
shapiro_test(residuals(model1))
假设通过
jobsatisfaction %>%
group_by(gender, education_level) %>%
shapiro_test(score)
p>0.05,假设通过
ggqqplot(jobsatisfaction, "score", ggtheme = theme_bw()) +
facet_grid(gender ~ education_level)
image.png
jobsatisfaction %>% levene_test(score ~ gender*education_level)
p > 0.05假设通过
res.aov1 <- jobsatisfaction %>% anova_test(score ~ gender * education_level)
res.aov1
性别与教育程度对工作满意度得分的交互作用有统计学意义
model3 <- lm(score ~ gender * education_level, data = jobsatisfaction)
jobsatisfaction %>%
group_by(gender) %>%
anova_test(score ~ education_level, error = model3)
受教育程度”对工作满意度的简单主效应在男性和女性中均有统计学意义。
install.packages("emmeans")
library(emmeans)
pwc1 <- jobsatisfaction %>%
group_by(gender) %>%
emmeans_test(score ~ education_level, p.adjust.method = "bonferroni")
pwc1
各组男性和女性的工作满意度得分均有显著性差异(p < 0.05)
pwc1 <- pwc1 %>% add_xy_position(x = "gender")
bxp +
stat_pvalue_manual(pwc1) +
labs(
subtitle = get_test_label(res.aov1, detailed = TRUE),
caption = get_pwc_label(pwc1)
)
image.png
原文链接:https://www.datanovia.com/en/checkout/order-received/
欢迎大家关注我的公众号
小明的数据分析笔记本
小明的数据分析笔记本 公众号 主要分享:1、R语言和python做数据分析和数据可视化的简单小例子;2、园艺植物相关转录组学、基因组学、群体遗传学文献阅读笔记;3、生物信息学入门学习资料及自己的学习笔记!
本文的完整代码可以在公众号后台回复 20210817
获得