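I am trying to create a bar plot in R that compares the frequencies within one category to the frequencies in the entire dataset. I have created some mock data, similar to my real data, along with a description of the expected output. The mock data covers three fruits (apple, orange, banana), each recorded with an eating frequency (1-2 times, 3-4 times, > 4 times). Mock data: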
ID Fruit frequency
1 apple 1-2 times
2 apple 3-4 times
3 apple 1-2 times
4 apple 3-4 times
5 apple 1-2 times
6 apple > 4 times
7 orange 3-4 times
8 orange 3-4 times
9 orange 1-2 times
10 orange 1-2 times
11 orange 1-2 times
12 banana 1-2 times
13 banana 3-4 times
14 banana > 4 times
15 banana > 4 times
16 banana 1-2 times
17 banana 3-4 times
18 banana > 4 times
19 banana 1-2 times
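The expected output is a bar plot with three groups of eating frequency (1-2 times, 3-4 times, > 4 times). Each group would have two bars, one for "apple" and one for the "entire dataset".
I can create a frequency bar plot for a single category (such as apple), but I don't know how to add the data for the entire dataset for comparison.
Any suggestions on which code to use or which approach to take (perhaps subsetting "apple"?) would be much appreciated!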
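Posted on 2018-04-02 23:22:25
First, I calculated the two percentages (i.e. within each fruit and across the total), then reshaped the data into a plot-friendly long format.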
library(ggplot2)
library(dplyr)
library(tidyr)
df %>%
  group_by(fruit) %>%
  mutate(countF = n()) %>%
  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
  group_by(freq, .add = TRUE) %>%
  # frequency percentage within fruit
  mutate(freq_perc_within_fruit = round(n() / countF * 100)) %>%
  group_by(freq) %>%
  # frequency percentage in total
  mutate(freq_perc_in_total = round(n() / nrow(.) * 100)) %>%
  ungroup() %>%
  select(fruit, freq, freq_perc_within_fruit, freq_perc_in_total) %>%
  # keep one row per fruit/freq pair so bars and labels are not drawn repeatedly
  distinct() %>%
  gather(Percentage, value, -fruit, -freq) %>%
  # plot
  ggplot(aes(x = freq, y = value, fill = Percentage)) +
  geom_bar(position = "dodge", stat = "identity") +
  facet_grid(fruit ~ .) +
  geom_text(aes(label = paste0(value, "%")), position = position_dodge(.9), vjust = 0)
This produces a dodged bar plot faceted by fruit, with one panel per fruit.
Sample data:
df<- structure(list(ID = 1:19, fruit = c("apple", "apple", "apple",
"apple", "apple", "apple", "orange", "orange", "orange", "orange",
"orange", "banana", "banana", "banana", "banana", "banana", "banana",
"banana", "banana"), freq = c("1-2 times", "3-4 times", "1-2 times",
"3-4 times", "1-2 times", "> 4 times", "3-4 times", "3-4 times",
"1-2 times", "1-2 times", "1-2 times", "1-2 times", "3-4 times",
"> 4 times", "> 4 times", "1-2 times", "3-4 times", "> 4 times",
"1-2 times")), .Names = c("ID", "fruit", "freq"), class = "data.frame", row.names = c(NA,
-19L))
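To check the numbers behind the plot, the same two percentages can also be computed as a small summary table. The following is only a sketch, assuming dplyr >= 0.8 and tidyr >= 1.0; the names `total`, `n_total` and `summary_long` are made up for illustration, and the data frame is rebuilt from the per-fruit counts of the sample data (so the row order differs, and there is no ID column).

```r
library(dplyr)
library(tidyr)

# Same per-fruit counts as the sample data (row order differs, no ID column)
freq_levels <- c("1-2 times", "3-4 times", "> 4 times")
df <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  freq = c(rep(freq_levels, times = c(3, 2, 1)),   # apple
           rep(freq_levels, times = c(3, 2, 0)),   # orange
           rep(freq_levels, times = c(3, 2, 3))),  # banana
  stringsAsFactors = FALSE
)

# Share of each frequency across the whole dataset
total <- df %>%
  count(freq, name = "n_total") %>%
  mutate(freq_perc_in_total = round(n_total / sum(n_total) * 100))

# Share of each frequency within each fruit, joined to the overall share
summary_long <- df %>%
  count(fruit, freq) %>%
  group_by(fruit) %>%
  mutate(freq_perc_within_fruit = round(n / sum(n) * 100)) %>%
  ungroup() %>%
  left_join(select(total, freq, freq_perc_in_total), by = "freq") %>%
  select(-n) %>%
  pivot_longer(c(freq_perc_within_fruit, freq_perc_in_total),
               names_to = "Percentage", values_to = "value")

print(summary_long)
```

Because `count()` already collapses the data to one row per fruit and frequency, no `distinct()` step is needed, and `pivot_longer()` plays the role that `gather()` plays in the pipeline above.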
Posted on 2018-04-02 21:21:11
Here is a simple solution:
# simulate 20 observations; the factor keeps the frequency levels in a sensible order
freq.levels <- c("1-2 times", "3-4 times", "> 4 times")
data <- data.frame(
  fruit = sample(c("apple", "orange", "banana"), size = 20, replace = TRUE),
  frequency = factor(sample(freq.levels, size = 20, replace = TRUE), levels = freq.levels)
)
# proportion of each frequency among apples, and in the whole dataset
apple.freq <- with(subset(data, fruit == "apple"), prop.table(table(frequency)))
overall.freq <- with(data, prop.table(table(frequency)))
# one row per series; beside = TRUE draws the rows as side-by-side bars
freq.mat <- rbind(apple.freq, overall.freq)
barplot(freq.mat, beside = TRUE, col = c("red", "blue"))
You will need to add a legend, axis labels and so on, but this should get you started.
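As one possible way to add the legend and axis labels, here is a sketch that uses the question's mock counts instead of random data. The label texts, the percentage scale and the `ylim` value are assumptions chosen for illustration; `legend.text` and `args.legend` are standard `barplot()` arguments.

```r
# Same per-fruit counts as the question's mock data (row order differs)
freq.levels <- c("1-2 times", "3-4 times", "> 4 times")
data <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    c(rep(freq.levels, times = c(3, 2, 1)),   # apple
      rep(freq.levels, times = c(3, 2, 0)),   # orange
      rep(freq.levels, times = c(3, 2, 3))),  # banana
    levels = freq.levels
  )
)

# Proportions among apples and in the whole dataset
apple.freq <- prop.table(table(data$frequency[data$fruit == "apple"]))
overall.freq <- prop.table(table(data$frequency))

# Row names double as legend labels
freq.mat <- rbind(apple = apple.freq, "entire dataset" = overall.freq)

barplot(
  freq.mat * 100,                      # show percentages rather than proportions
  beside = TRUE,
  col = c("red", "blue"),
  legend.text = rownames(freq.mat),
  args.legend = list(x = "topright", bty = "n"),
  xlab = "Eating frequency",
  ylab = "Percentage of observations",
  ylim = c(0, 100)
)
```

Naming the rows in `rbind()` keeps the legend in sync with the data, so a third series could be added without touching the legend code.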
You could get fancier with ggplot2 (e.g. a variant of "wrap in ggplot2?"), but this is a simple solution.
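For reference, a ggplot2 version of the same apple versus entire-dataset comparison might look like the sketch below. It is not taken from the linked variant; the helper `share()`, the data frame `plot.df` and the axis labels are hypothetical names introduced here, and the data again reproduces the question's mock counts.

```r
library(ggplot2)

# Same per-fruit counts as the question's mock data (row order differs)
freq.levels <- c("1-2 times", "3-4 times", "> 4 times")
data <- data.frame(
  fruit = rep(c("apple", "orange", "banana"), times = c(6, 5, 8)),
  frequency = factor(
    c(rep(freq.levels, times = c(3, 2, 1)),   # apple
      rep(freq.levels, times = c(3, 2, 0)),   # orange
      rep(freq.levels, times = c(3, 2, 3))),  # banana
    levels = freq.levels
  )
)

# Hypothetical helper: percentage of each frequency level in a factor x
share <- function(x, label) {
  p <- prop.table(table(x))
  data.frame(
    group = label,
    frequency = factor(names(p), levels = levels(x)),
    percent = as.numeric(p) * 100
  )
}

# One block of rows per series, stacked into long format
plot.df <- rbind(
  share(data$frequency[data$fruit == "apple"], "apple"),
  share(data$frequency, "entire dataset")
)

p <- ggplot(plot.df, aes(x = frequency, y = percent, fill = group)) +
  geom_col(position = "dodge") +
  labs(x = "Eating frequency", y = "Percentage of observations", fill = NULL)
print(p)
```

Changing `"apple"` in the first `share()` call to another fruit compares that fruit against the whole dataset instead.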
https://stackoverflow.com/questions/49617700