首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在R中将长列表划分为指定长度的短列表

在R中将长列表划分为指定长度的短列表
EN

Stack Overflow用户
提问于 2018-02-22 21:15:15
回答 3查看 1.2K关注 0票数 2

这与前面的问题here密切相关。然而,我需要一些稍微不同的东西。

我有一个很长的对象列表,我需要将它们划分为较小的列表,每个列表都有一定数量的条目。我需要能够为不同的任务改变列表的长度。问题是每个对象在单个列表中只能出现一次。

代码语言:javascript
复制
# Create some example data... 
# Make a list of objects.
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')

# Generate a longer list, with a random sequence and number of repetitions for each entry.
set.seed(123)

LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))

print(LONG.LIST)

Name
1         Cup
2    Distance
3        Roof
4      Pencil
5       Lunch
6       Toast
7       Watch
8      Bottle
9         Car
10       Roof
11      Lunch
12    Forever
13     Cheese
14    Oranges
15      Ocean
16  Chocolate
17      Socks
18     Leaves
19    Oranges
20   Distance
21      Green
22      Paper
23        Red
24      Paper
25      Trees
26  Chocolate
27     Bottle
28        Dog
29       Wind
30     Parrot
etc....

对于参数,假设我想创建一系列包含20个项目的列表。使用上面生成的示例,'Distance'出现在位置'2‘和位置'20','Lunch'出现在'5’和'11‘,'Oranges'出现在'14’和19',因此没有重复的第一个列表需要扩展以包括'Green''Paper''Red'。然后,第二个列表将从位置24的'Paper'开始。然而,我不想被限制在20个长度内,有时我可能希望它是10或25。

在下面的@LAP中添加注释,这有助于描述我的问题;“检查向量,直到找到20个唯一项,将它们放在一起,丢弃重复项,然后继续向量,直到找到下20个唯一项,依此类推,直到向量的末尾,用NA填充最后一部分。

“单独的列表只需要自身是唯一的。两个或多个列表之间可能存在重复项。”

最后一个列表可能是不完整的,所以最好用'NA's填充它。理想情况下,每个列表中的条目应该是按字母顺序排列的。

最有用的输出是数据帧中的每列一个列表。

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2018-02-22 22:41:03

好的,这是一个部分的答案,因为我想我已经得到了你需要的大部分。

请注意,对于大数据,这可能会很慢。

首先,您可以初始化一个列表,其中的空向量数量与您想要的分组数量一样多。在这个例子中,我们希望从一个200个项目的向量中创建10个20个的组。

首先,我们创建可复制的数据:

代码语言:javascript
复制
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 
          'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 
          'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 
          'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 
          'Chocolate', 'Car', 'Distance')

set.seed(123)

LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)), stringsAsFactors = F)

test <- vector("list", 10)

然后初始化两个计数器:

代码语言:javascript
复制
i <- 1
j <- 1

现在我们使用一个while循环,该循环一直运行到i大于我们的向量中要拆分的项数(因此当i > 200时它停止)。在这个循环中,我们检查列表中的当前子向量j是否小于20。如果是,我们添加一个项目并进行重复数据删除,如果不是,我们向j添加1以跳转到下一个子向量。

代码语言:javascript
复制
while(i <= nrow(LONG.LIST)){
  if(length(test[[j]]) < 20){
      test[[j]] <- c(test[[j]], LONG.LIST$Name[i])
      test[[j]] <- unique(test[[j]])
      i <- i+1
  }else{
      j <- j+1
    }
}

这是我们的结果:

代码语言:javascript
复制
> test
[[1]]
 [1] "Lunch"     "Cheese"    "Truck"     "Roof"      "Hope"      "Mint"      "Lemons"    "Pencil"    "Hippo"     "Moon"     
[11] "Car"       "Chocolate" "Trees"     "Distance"  "Dog"       "Bag"       "Paper"     "Peanuts"   "Ocean"     "Wind"     

[[2]]
 [1] "Hippo"     "Wind"      "Mint"      "Plane"     "Trees"     "Truck"     "Lemons"    "Watch"     "Chocolate" "Train"    
[11] "Dog"       "Lunch"     "Green"     "Horse"     "Toast"     "Distance"  "Cloud"     "Hammock"   "Fork"      "Paper"    

[[3]]
 [1] "Watch"     "Hope"      "Paper"     "Socks"     "Bag"       "Plane"     "Bottle"    "Green"     "Lunch"     "Fork"     
[11] "Mint"      "Hippo"     "Chocolate" "Car"       "Trees"     "Toast"     "Forever"   "Red"       "Wind"      "Ocean"    

[[4]]
 [1] "Car"      "Lunch"    "Toast"    "Lemons"   "Moon"     "Socks"    "Hippo"    "Pencil"   "Blue"     "Fork"     "Paper"   
[12] "Distance" "Cloud"    "Train"    "Wind"     "Watch"    "Bottle"   "Forever"  "Green"    "Bag"     

[[5]]
 [1] "Train"   "Cheese"  "Bottle"  "Fork"    "Paper"   "Green"   "Leaves"  "Blue"    "Toast"   "Parrot"  "Lemons"  "Dog"    
[13] "Hammock" "Ocean"   "Red"     "Peanuts" "Pencil"  "Bag"     "Horse"   "Hope"   

[[6]]
 [1] "Oranges"   "Truck"     "Hippo"     "Trees"     "Parrot"    "Red"       "Hope"      "Cloud"     "Tin"       "Bag"      
[11] "Pencil"    "Cup"       "Dog"       "Leaves"    "Chocolate" "Mint"      "Plane"     "Moon"      "Fork"      "Green"    

[[7]]
 [1] "Tin"       "Mint"      "Book"      "Bag"       "Roof"      "Hope"      "Socks"     "Watch"     "Paper"     "Peanuts"  
[11] "Cup"       "Distance"  "Leaves"    "Bottle"    "Cloud"     "Horse"     "Trees"     "Oranges"   "Chocolate" "Toast"    

[[8]]
[1] "Horse"     "Watch"     "Chocolate" "Tin"       "Red"       "Train"    

[[9]]
NULL

[[10]]
NULL

现在我们只需要用NA填充最后的向量。这可能会以不同的方式完成,但它可以完成工作:

代码语言:javascript
复制
for(i in 1:length(test)){
  if(length(test[[i]]) < 20){
    test[[i]] <- c(test[[i]], rep(NA, 20 - length(test[[i]])))
  }
}
票数 1
EN

Stack Overflow用户

发布于 2018-02-22 22:39:30

这是一个潜在的答案,它不是很漂亮,但我认为这是你想要的:

首先是数据:

代码语言:javascript
复制
LIST <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 
         'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 
         'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 
         'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 
         'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 
         'Green', 'Chocolate', 'Car', 'Distance')

set.seed(123)
LONG.LIST <- data.frame(Name = (sample(LIST, size = 200, replace = TRUE)))

创建一个函数,该函数将从数据帧的顶部找到20个唯一元素,并根据该元素将数据帧拆分为两个列表元素:

代码语言:javascript
复制
library(tidyverse)

spliter <- function(df){
  df  %>%
  as.tibble()%>%
    mutate(Name = as.character(Name),
           dup = !duplicated(Name),
           cum = cumsum(dup),
           splt = ifelse(cum <= 20, 0, 1)) %>%
    {split(df, .$splt)} 
}

现在,将此函数应用于结果列表的第二个元素,直到没有可拆分的内容为止,删除每个列表元素中的重复项:

代码语言:javascript
复制
b <- spliter(LONG.LIST)
c1 <- list(b[[1]] %>%
            filter(!duplicated(Name)))

i <- 1
while(length(b) != 1){
  i <- i+1
  b <- spliter(b[[2]])

  c1[[i]] <- b[[1]] %>%
    filter(!duplicated(Name))
}

如果需要,用NA填充最后一个元素:

代码语言:javascript
复制
c1 <- lapply(c1, function(x){
  if(nrow(x) <  20){
    data.frame(Name = c(as.character(x$Name), rep(NA_character_, (20-length(x$Name)))))
  } else( x)
})

合并到数据帧:

代码语言:javascript
复制
do.call(cbind, c1)

        Name    Name     Name      Name     Name    Name      Name
1        Cup   Green     Wind      Mint     Book Hammock    Parrot
2       Blue     Tin    Paper    Bottle   Pencil   Trees   Hammock
3      Cloud    Blue   Cheese    Cheese      Red     Dog    Pencil
4       Wind Oranges      Dog     Lunch    Paper   Socks       Bag
5  Chocolate   Train  Peanuts    Pencil Distance   Train     Watch
6      Toast  Lemons    Watch      Blue     Hope Peanuts     Train
7       Moon     Red    Plane       Dog      Dog   Hippo     Horse
8      Horse  Pencil  Forever     Ocean   Bottle   Horse     Green
9      Ocean   Trees     Blue      Fork      Tin     Red  Distance
10       Car  Bottle   Lemons    Parrot   Leaves Forever    Leaves
11       Tin   Cloud     Book     Train     Wind    Fork Chocolate
12     Hippo   Paper      Bag       Car   Cheese   Paper     Ocean
13     Trees    Hope  Oranges      Wind    Socks    Book     Cloud
14     Lunch   Ocean    Train     Green     Fork    Moon    Cheese
15      Book   Watch      Red    Leaves    Plane   Cloud      Hope
16  Distance    Roof   Leaves     Cloud     Blue   Watch      <NA>
17    Cheese   Toast    Hippo Chocolate  Forever    Mint      <NA>
18       Bag Forever    Trees     Truck    Cloud    Roof      <NA>
19    Parrot   Hippo    Cloud       Bag  Oranges  Cheese      <NA>
20    Bottle   Horse Distance      Moon     Mint  Leaves      <NA>

下面是ngm的答案中的一个函数:

代码语言:javascript
复制
miss <- function(y, split){
  require(tidyverse)
  spliter <- function(df){
    df  %>%
      as.tibble()%>%
      mutate(Name = as.character(Name),
             dup = !duplicated(Name),
             cum = cumsum(dup),
             splt = ifelse(cum <= split, 0, 1)) %>%
             {split(df, .$splt)} 
    }
  b <- spliter(y)
  c1 <- list(b[[1]] %>%
            filter(!duplicated(Name)))
  i <- 1
  while(length(b) != 1){
    i <- i+1
    b <- spliter(b[[2]])

      c1[[i]] <- b[[1]] %>%
        filter(!duplicated(Name))
      }
  c1 <- lapply(c1, function(x){
    if(nrow(x) <  20){
      data.frame(Name = c(as.character(x$Name), rep(NA_character_, (20-length(x$Name)))))
      } else( x)
    })
  return(do.call(cbind, c1))
}

用法:

代码语言:javascript
复制
miss(LONG.LIST, 20 )
票数 1
EN

Stack Overflow用户

发布于 2018-02-22 22:58:41

此函数svu (“唯一拆分向量”)接受一个向量,并根据您的规范生成一个数据帧。

我不明白为什么输入会是列表或数据框。将输入设为向量似乎更自然。

代码语言:javascript
复制
words <- c('Oranges', 'Toast', 'Truck', 'Dog', 'Hippo', 'Bottle', 'Hope', 'Mint', 'Red', 'Trees', 'Watch', 'Cup', 'Pencil', 'Lunch', 'Paper', 'Peanuts', 'Cloud', 'Forever', 'Ocean', 'Train', 'Fork', 'Moon', 'Horse', 'Parrot', 'Leaves', 'Book', 'Cheese', 'Tin', 'Bag', 'Socks', 'Lemons', 'Blue', 'Plane', 'Hammock', 'Roof', 'Wind', 'Green', 'Chocolate', 'Car', 'Distance')
set.seed(123)
more_words <- sample(words, size = 200, replace = TRUE)

# x is the original vector and n is the desired number of 
# words in each column of the resulting data frame.
svu <- function(x, n) {
  # How many eventual columns?
  n_cols <- trunc(length(x)/n)
  # That many eventual columns all filled with NA for now.
  vec_list <- lapply(1:n_cols, function(x) rep(NA, n))

  # For each word...
  for(string in x) {
    for(i in 1:n_cols) {
      if(!(string %in% vec_list[[i]]) && sum(is.na(vec_list[[i]])) > 0) {
        # ...add it to a non-full column not containing that word.
        vec_list[[i]][min(which(is.na(vec_list[[i]])))] <- string
        break
      }
    }
  }
  # Make it a data frame
  data.frame(do.call(cbind, vec_list), stringsAsFactors = FALSE)
}

试试看:

代码语言:javascript
复制
svu(more_words, 20)                                                                                                                                                                                                                                                                                                                                                                
#>           X1      X2      X3       X4        X5        X6       X7
#> 1        Cup    Wind    Wind     Wind      Wind     Plane     Wind
#> 2       Blue   Ocean     Car   Bottle     Plane   Forever   Bottle
#> 3      Cloud   Horse     Tin    Watch   Forever      Wind   Pencil
#> 4       Wind   Toast   Cloud    Plane      Blue    Cheese Distance
#> 5  Chocolate     Car  Bottle  Forever     Hippo      Mint     Hope
#> 6      Toast     Tin   Trees     Blue      Mint      Blue      Dog
#> 7       Moon    Moon   Ocean   Lemons    Bottle     Lunch      Tin
#> 8      Horse     Cup   Watch     Book    Cheese     Train   Leaves
#> 9      Ocean   Green    Roof      Bag     Lunch    Bottle   Cheese
#> 10       Car    Blue   Toast  Oranges    Pencil    Pencil    Socks
#> 11       Tin Oranges Forever    Train       Dog     Truck     Fork
#> 12     Hippo   Train    Blue      Red     Ocean Chocolate    Plane
#> 13     Trees  Lemons   Hippo  Peanuts      Fork       Bag     Blue
#> 14     Lunch     Red   Horse   Leaves    Parrot      Moon  Forever
#> 15      Book  Pencil     Red    Paper     Train       Car    Cloud
#> 16  Distance   Trees  Lemons    Hippo       Car    Parrot  Oranges
#> 17    Cheese  Bottle   Paper    Trees     Green     Cloud     Mint
#> 18       Bag   Cloud  Cheese   Cheese    Leaves      Book  Hammock
#> 19    Parrot   Paper     Dog    Cloud     Cloud       Red    Trees
#> 20    Bottle    Hope Peanuts Distance Chocolate     Paper    Train
#>          X8      X9       X10
#> 1      Wind    Wind     Trees
#> 2    Pencil  Pencil     Paper
#> 3    Bottle   Trees       Red
#> 4    Cheese Peanuts     Socks
#> 5  Distance     Red      Roof
#> 6     Trees   Paper    Pencil
#> 7       Dog   Socks    Parrot
#> 8     Socks    Book     Watch
#> 9   Hammock    Mint     Green
#> 10  Peanuts    Roof  Distance
#> 11    Hippo  Cheese    Leaves
#> 12    Horse  Leaves Chocolate
#> 13      Red    Moon     Ocean
#> 14  Forever  Parrot     Cloud
#> 15     Fork Hammock    Cheese
#> 16    Paper     Bag      Hope
#> 17     Book   Watch     Horse
#> 18     Moon   Train      <NA>
#> 19    Cloud   Horse      <NA>
#> 20    Watch   Green      <NA>
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/48928383

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档