How to group column values by creating a list of column values in an R data frame

Stack Overflow user

Asked 2018-01-03 22:41:23

2 answers, 420 views, 0 followers, 1 vote

How can I group column values by creating a list of column values in an R data frame?

My data:

CustNumber    Queue    CustID           ProNo#
1             Start    1                ESC
2             Start    1                Check
1             Start    1,1,1            hjju623,hjju623
1             Start    1,2,1,1          First44,ESC
2             Start    1,etc,ex         rere43
3             Start    1, 5597595494    151ss5151, 4949we49

I am using the code below to create a list of column values, grouped by CustNumber and Queue.

val<- df %>%
  gather(key,Value, -c(Queue,CustNumber)) %>%
  group_by(Queue,CustNumber, key,Value) %>%
  summarise(Count = n())%>%
  nest(key,Value,Count,.key = "listofvalues")

It gives me:

Queue     CustNumber    Key     listofvalues
Start       1          CustID   list(Value = c("1", "1,1,1", "1,2,1,1"), Count = c(1, 1, 1))
Start       1          ProNo#   list(Value = c("ESC", "First44,ESC", "hjju623,hjju623"), Count = c(1, 1, 1))
Start       2          CustID   list(Value = c("1", "1,etc,ex"), Count = c(1, 1))
Start       2          ProNo#   list(Value = c("Check", "rere43"), Count = c(1, 1))
Start       3          CustID   list(Value = "1, 5597595494", Count = 1)
Start       3          ProNo#   list(Value = "151ss5151, 4949we49", Count = 1)

But the data frame I expect is:

Queue     CustNumber    Key     listofvalues
Start       1          CustID   list(Value = c("1", "2"), Count = c(7, 1))
Start       1          ProNo#   list(Value = c("ESC", "First44", "hjju623"), Count = c(2, 1, 2))
Start       2          CustID   list(Value = c("1", "etc", "ex"), Count = c(2, 1, 1))
Start       2          ProNo#   list(Value = c("Check", "rere43"), Count = c(1, 1))
Start       3          CustID   list(Value = c("1", "5597595494"), Count = c(1, 1))
Start       3          ProNo#   list(Value = c("151ss5151", "4949we49"), Count = c(1, 1))

Please help me with this.

dput of the data frame:

df<-structure(list(CustNumber = c("1", "2", "1", 
"1", "2", "3"), Queue = c("Start", "Start", 
"Start", "Start", "Start", "Start"), CustID = c("1", "1", "1,1,1", 
"1,2,1,1", "1,etc,ex", "1, 5597595494"), `ProNo#` = c("ESC", "Check", "hjju623,hjju623", 
"First44,ESC", "rere43", "151ss5151, 4949we49")), .Names = c("CustNumber", 
"Queue", "CustID", "ProNo#"), row.names = c(NA, 6L), class = "data.frame")

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-01-03 22:54:50

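We need to split the string values. With separate_rows we can convert the data to 'long' format, then in summarise take the unique 'Value' entries and get their frequencies with table.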

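library(dplyr)
library(tidyr)
res <- df %>% 
         # reshape CustID and ProNo# into key/Value pairs (long format)
         gather(key, Value, -c(Queue, CustNumber)) %>% 
         # one row per comma-separated item; ",\\s*" also drops a space after the comma
         separate_rows(Value, sep = ",\\s*") %>% 
         group_by(CustNumber, Queue, key) %>% 
         # per group: the distinct values and how often each one occurs
         summarise(Count = list(list(Value = unique(Value),
                            Count = table(factor(Value, levels = unique(Value))))))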

res$Count[[1]]
#$Value
#[1] "1" "2"

#$Count

#1 2 
#7 1
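
For readers on a current tidyverse, the same idea can be sketched with newer verbs. This is not part of the original answer: it assumes tidyr 1.0 or later (for pivot_longer and the named form of nest) and dplyr 0.8 or later (for the name argument of count). The names counts and res_nested are placeholders chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(
  CustNumber = c("1", "2", "1", "1", "2", "3"),
  Queue = rep("Start", 6),
  CustID = c("1", "1", "1,1,1", "1,2,1,1", "1,etc,ex", "1, 5597595494"),
  `ProNo#` = c("ESC", "Check", "hjju623,hjju623", "First44,ESC",
               "rere43", "151ss5151, 4949we49"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Long format, one row per comma-separated item, then count each item per group
counts <- df %>%
  pivot_longer(-c(Queue, CustNumber), names_to = "key", values_to = "Value") %>%
  separate_rows(Value, sep = ",\\s*") %>%
  count(Queue, CustNumber, key, Value, name = "Count")

# Pack Value and Count into one list-column per (Queue, CustNumber, key)
res_nested <- counts %>%
  nest(listofvalues = c(Value, Count))
```

Here each element of res_nested$listofvalues is a small tibble with Value and Count columns rather than a plain list, which tends to be easier to unnest again later.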

Votes: 0

Stack Overflow user

Posted 2018-01-03 23:05:00

This will give the desired output:

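library(tidyr)
library(dplyr)
df %>%
  # reshape CustID and ProNo# into key/Value pairs (long format)
  gather(key, Value, -c(Queue,CustNumber)) %>% 
  rowwise() %>% 
  # overwrite Value with its comma-separated pieces (a list column),
  # so that the grouping below sees the split items, not the raw strings
  mutate(Value = strsplit(Value, split = ",")) %>% 
  unnest() %>% 
  group_by(Queue, CustNumber, key, Value) %>%
  summarise(Count = n()) %>% 
  nest(key, Value, Count, .key = "listofvalues")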
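
Both answers rest on the same two steps: split each comma-separated string into its pieces, then count the pieces. A minimal base R sketch of just that core, using the CustID values for CustNumber 1 from the question (the variable names values, pieces, and counts are illustrative only):

```r
# CustID strings recorded for CustNumber 1 in the question's data
values <- c("1", "1,1,1", "1,2,1,1")

# Split on commas (tolerating a space after the comma) and flatten to one vector
pieces <- unlist(strsplit(values, ",\\s*"))

# Frequency of each distinct piece
counts <- table(pieces)
print(counts)
```

The table shows "1" occurring 7 times and "2" once, which matches the expected listofvalues entry for that group.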