How to group column values by creating a list of column values in an R data frame
My data:
CustNumber  Queue  CustID           ProNo#
1           Start  1                ESC
2           Start  1                Check
1           Start  1,1,1            hjju623,hjju623
1           Start  1,2,1,1          First44,ESC
2           Start  1,etc,ex         rere43
3           Start  1, 5597595494    151ss5151, 4949we49
I am using the code below to create a list of column values, grouped by CustNumber and Queue.
val <- df %>%
  gather(key, Value, -c(Queue, CustNumber)) %>%
  group_by(Queue, CustNumber, key, Value) %>%
  summarise(Count = n()) %>%
  nest(key, Value, Count, .key = "listofvalues")
It gives me:
Queue  CustNumber  Key     listofvalues
Start  1           CustID  list(Value = c("1", "1,1,1", "1,2,1,1"), Count = c(1, 1, 1))
Start  1           ProNo#  list(Value = c("ESC", "First44,ESC", "hjju623,hjju623"), Count = c(1, 1, 1))
Start  2           CustID  list(Value = c("1", "1,etc,ex"), Count = c(1, 1))
Start  2           ProNo#  list(Value = c("Check", "rere43"), Count = c(1, 1))
Start  3           CustID  list(Value = "1, 5597595494", Count = 1)
Start  3           ProNo#  list(Value = "151ss5151, 4949we49", Count = 1)
But the data frame I expect is:
Queue  CustNumber  Key     listofvalues
Start  1           CustID  list(Value = c("1", "2"), Count = c(7, 1))
Start  1           ProNo#  list(Value = c("ESC", "First44", "hjju623"), Count = c(2, 1, 2))
Start  2           CustID  list(Value = c("1", "etc", "ex"), Count = c(2, 1, 1))
Start  2           ProNo#  list(Value = c("Check", "rere43"), Count = c(1, 1))
Start  3           CustID  list(Value = c("1", "5597595494"), Count = c(1, 1))
Start  3           ProNo#  list(Value = c("151ss5151", "4949we49"), Count = c(1, 1))
Please help me with this.
dput of the data frame:
df<-structure(list(CustNumber = c("1", "2", "1",
"1", "2", "3"), Queue = c("Start", "Start",
"Start", "Start", "Start", "Start"), CustID = c("1", "1", "1,1,1",
"1,2,1,1", "1,etc,ex", "1, 5597595494"), `ProNo#` = c("ESC", "Check", "hjju623,hjju623",
"First44,ESC", "rere43", "151ss5151, 4949we49")), .Names = c("CustNumber",
"Queue", "CustID", "ProNo#"), row.names = c(NA, 6L), class = "data.frame")
Posted on 2018-01-03 22:54:50
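We need to split the string values. With separate_rows we can convert the data to 'long' format, one row per comma-separated item, and then in summarise get the unique values and their frequencies with table. Using ",\\s*" as the separator also drops the space that follows the comma in values such as "1, 5597595494".
library(dplyr)
library(tidyr)

res <- df %>%
  # reshape CustID and ProNo# into key/Value pairs
  gather(key, Value, -c(Queue, CustNumber)) %>%
  # one row per item; the regex also removes whitespace after each comma
  separate_rows(Value, sep = ",\\s*") %>%
  group_by(CustNumber, Queue, key) %>%
  # unique values and their frequencies, in order of first appearance
  summarise(Count = list(list(Value = unique(Value),
                              Count = table(factor(Value, levels = unique(Value))))))

res$Count[[1]]
#$Value
#[1] "1" "2"
#
#$Count
#
#1 2
#7 1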
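To see what the split-and-count step does for a single group, here is a minimal base R sketch. It uses only the three CustID strings that belong to CustNumber 1 in the question's data; the variable names x, parts, values and counts are illustrative and not part of the answer above.

```r
# CustID strings for CustNumber 1, copied from the question's data
x <- c("1", "1,1,1", "1,2,1,1")

# split on commas and flatten to one item per element
parts <- unlist(strsplit(x, ",", fixed = TRUE))

# strip stray whitespace (only matters for inputs such as "1, 5597595494")
parts <- trimws(parts)

# distinct values in order of first appearance
values <- unique(parts)

# fixing the factor levels to unique(parts) makes table() report
# the counts in that same order instead of sorted order
counts <- table(factor(parts, levels = values))
```

Here values holds "1" and "2", and counts holds 7 and 1, which is the first row of the expected output.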
Posted on 2018-01-03 23:05:00
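This gives the desired output:
library(tidyr)
library(dplyr)

df %>%
  gather(key, Value, -c(Queue, CustNumber)) %>%
  # split each string into a character vector, giving a list column
  mutate(Value = strsplit(Value, split = ",\\s*")) %>%
  # expand the list column to one row per item
  unnest(Value) %>%
  group_by(Queue, CustNumber, key, Value) %>%
  summarise(Count = n()) %>%
  ungroup() %>%
  # tidyr >= 1.0 syntax; older versions use nest(Value, Count, .key = "listofvalues")
  nest(listofvalues = c(Value, Count))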
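If the tidyverse is not available, the same grouping can be sketched in base R. This is only an illustration under a few assumptions: count_values is a hypothetical helper, the result is a plain named list rather than a nested data frame, and table() returns the values in sorted order rather than in order of first appearance.

```r
# the question's data, rebuilt with data.frame()
df <- data.frame(
  CustNumber = c("1", "2", "1", "1", "2", "3"),
  Queue      = rep("Start", 6),
  CustID     = c("1", "1", "1,1,1", "1,2,1,1", "1,etc,ex", "1, 5597595494"),
  `ProNo#`   = c("ESC", "Check", "hjju623,hjju623", "First44,ESC",
                 "rere43", "151ss5151, 4949we49"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# hypothetical helper: split on commas, trim whitespace, tabulate
count_values <- function(x) {
  parts <- trimws(unlist(strsplit(x, ",", fixed = TRUE)))
  as.data.frame(table(Value = parts), responseName = "Count",
                stringsAsFactors = FALSE)
}

id_cols    <- c("Queue", "CustNumber")
value_cols <- setdiff(names(df), id_cols)

# stack the value columns into long form (the role gather() plays above)
long <- do.call(rbind, lapply(value_cols, function(k) {
  data.frame(df[id_cols], key = k, Value = df[[k]], stringsAsFactors = FALSE)
}))

# one element per Queue/CustNumber/key combination, named like "Start.1.CustID"
groups <- split(long$Value, long[c(id_cols, "key")], drop = TRUE)
listofvalues <- lapply(groups, count_values)
```

Each element of listofvalues is a small data frame with Value and Count columns; for example, listofvalues[["Start.1.CustID"]] holds the counts for CustID within Queue "Start" and CustNumber "1".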
https://stackoverflow.com/questions/48089970