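Suppose I have a dataset named iris. I want to create an indicator variable named sepal_length_group in this dataset. The indicator takes the values p25, p50, p75, and p100. For example, for an observation whose species is "setosa", if its Sepal.Length is at or below the 25th percentile of Sepal.Length among all observations classified as "setosa", I want sepal_length_group to equal "p25". I wrote the following code, but it produces all NAs: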
library(dplyr)
library(skimr)
sepal_length_distribution <- iris %>% group_by(Species) %>% skim(Sepal.Length) %>% select(3, 9:12)
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),2], "p25", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),2] &
    Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),3], "p50", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),3] &
    Sepal.Length <= sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),4], "p75", NA))
iris_2 <- iris %>% mutate(sepal_length_group = ifelse(Sepal.Length > sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),4] &
    Sepal.Length < sepal_length_distribution[which(sepal_length_distribution$Species == "setosa"),5], "p100", NA))
Any help would be greatly appreciated!
Posted on 2021-05-19 23:46:35
This can be done simply with the function cut, as @Camille noted in a comment:
library(tidyverse)
iris %>%
  group_by(Species) %>%
  mutate(cat = cut(Sepal.Length,
                   quantile(Sepal.Length, c(0, .25, .5, .75, 1)),
                   paste0('p', c(25, 50, 75, 100)), include.lowest = TRUE))
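For reference, here is a slightly expanded sketch of the same approach. It stores the result under the names used in the question (iris_2 and sepal_length_group) and spells out the breaks and labels arguments. The count() call at the end is an added sanity check, not part of the original answer. Because cut() uses right-closed intervals by default, a value exactly equal to a species' 25th percentile should land in "p25", which matches the "equal to or less than" rule in the question.

```r
library(dplyr)

# Bin Sepal.Length into quartile groups computed separately per species.
iris_2 <- iris %>%
  group_by(Species) %>%
  mutate(sepal_length_group = cut(
    Sepal.Length,
    # Break points: min, 25th, 50th, 75th percentile, max of this species
    breaks = quantile(Sepal.Length, probs = c(0, 0.25, 0.5, 0.75, 1)),
    labels = c("p25", "p50", "p75", "p100"),
    # Without include.lowest = TRUE the group minimum would become NA
    include.lowest = TRUE
  )) %>%
  ungroup()

# Sanity check (an addition for illustration): observations per species and group
count(iris_2, Species, sepal_length_group)
```

Dropping the grouping with ungroup() at the end is a design choice so that later operations on iris_2 are not silently computed per species.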
https://stackoverflow.com/questions/67612049