Q: Adding columns that do not exist to a data frame in dplyr
Stack Overflow user

Asked 2022-08-15 22:05:09

Answers: 3, Views: 78, Followers: 0, Votes: 2

I have a vector of column names, as follows:

cols <- c('grade', 'score', 'status')

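If the data frame is missing any of the columns named in the cols vector, I want to add that column (filled with NA) to the data frame using mutate and across. How can I do this?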

dplyr

dataframe

3 Answers

Stack Overflow user

Accepted answer

Posted 2022-08-15 22:40:37

A base R solution:

df[setdiff(cols, names(df))] <- NA

This command also works in a pipe:

library(dplyr)  # provides %>%

df %>%
  `[<-`(, setdiff(cols, names(.)), NA)

#   id score grade status
# 1  1    94    NA     NA
# 2  2    98    NA     NA
# 3  3    93    NA     NA
# 4  4    82    NA     NA
# 5  5    89    NA     NA

Data

set.seed(123)
df <- data.frame(id = 1:5, score = sample(80:100, 5))
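The setdiff() idiom above can be wrapped in a small reusable function. This is only a minimal sketch: the name add_missing_cols and its fill argument are hypothetical and do not appear in the original answer.

```r
# Hypothetical helper wrapping the setdiff() idiom; not part of the original answer
add_missing_cols <- function(df, cols, fill = NA) {
  missing_cols <- setdiff(cols, names(df))  # names in cols that df lacks
  if (length(missing_cols) > 0) {
    df[missing_cols] <- fill                # new columns are appended on the right
  }
  df
}

set.seed(123)
df <- data.frame(id = 1:5, score = sample(80:100, 5))
cols <- c('grade', 'score', 'status')

out <- add_missing_cols(df, cols)
names(out)
```

Since only the missing names are assigned, calling the helper again on its own result should leave the data frame unchanged.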

Votes: 4

Stack Overflow user

Posted 2022-08-15 22:49:25

A solution using dplyr::mutate()

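Suppose your data frame is diamonds. You then add to the original data frame (here, diamonds) a tibble that has as many columns as there are column names (three in this MWE).

Automatically creating a tibble filled with NA

(Thanks to the comment by Darren Tsai)

To create a tibble with as many columns as there are column names, first create a matrix with that many columns using matrix(ncol = length(cols)), then convert it to a tibble with as_tibble(), setting the column names through .name_repair = ~ cols inside as_tibble().

Because the matrix is created without data, every column of the resulting tibble holds logical NA. If you want the newly added columns to be integer, numeric, complex (e.g. 1 + 5i), or character columns, you may prefer to use NA_integer_, NA_real_, NA_complex_, or NA_character_ respectively instead of NA. Alternatively, you can use mutate to convert the column types afterwards.

You can create such a tibble inside mutate.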

library(dplyr)    # mutate(), as_tibble(), select()
library(ggplot2)  # provides the diamonds data set

cols <- c('grade', 'score', 'status')

diamonds |>
  mutate(
    matrix(
      ncol = length(cols)
    ) |>
      as_tibble(
        .name_repair = ~ cols
      ) |>
      ## if you want to interpret the grade as `factor` type...
      mutate(
        grade = as.factor(grade)
      )
  )

## # A tibble: 53,940 × 13
##    carat cut       color clarity depth table price     x     y     z grade score
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <lgl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43 NA    NA
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31 NA    NA   
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31 NA    NA
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63 NA    NA
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75 NA    NA
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48 NA    NA
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47 NA    NA   
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53 NA    NA
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49 NA    NA
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39 NA    NA
## # … with 53,930 more rows, and 1 more variable: status <lgl>
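To illustrate the earlier note about typed missing values, here is a minimal sketch that assigns explicitly typed NA constants directly in mutate(). The toy tibble df is a hypothetical stand-in and is not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data, not part of the original answer
df <- tibble(id = 1:3)

# A matrix created without data is filled with logical NA
stopifnot(is.logical(matrix(ncol = 3)))

# Typed NA constants give each new column the intended type up front
out <- df |>
  mutate(
    grade  = NA_character_,
    score  = NA_real_,
    status = NA_integer_
  )

sapply(out, class)
```

Setting the type at creation avoids a later conversion step, which can matter when the columns are subsequently bound or joined with typed data.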

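Creating an NA tibble that contains only the columns absent from the original data frame

(Thanks to the comment by Julian)

To make sure a column is added only when the original data frame does not already have it, you must keep only those columns of the NA tibble that are absent from the original data frame. You can do this with select(!matches(colnames(diamonds))). In the example below, price already exists in diamonds, so only grade and status are added.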

cols <- c("grade", "price", "status")

matrix(ncol = length(cols)) |>
  as_tibble(
    .name_repair = ~ cols
  ) |>
  mutate(
    grade = as.factor(grade)
  )

diamonds |>
  mutate(
    matrix(
      ncol = length(cols)
    ) |>
      as_tibble(
        .name_repair = ~cols
      ) |>
      ## if you want to interpret the grade as `factor` type...
      mutate(
        grade = as.factor(grade)
      ) |>
      ## select columns that are not present in the original data frame 
      dplyr::select(
        !matches(colnames(diamonds))
      )
  )

## # A tibble: 53,940 × 12
##    carat cut      color clarity depth table price     x     y     z grade status
##    <dbl> <ord>    <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl> <fct> <lgl> 
##  1  0.23 Ideal    E     SI2      61.5    55   326  3.95  3.98  2.43 NA    NA
##  2  0.21 Premium  E     SI1      59.8    61   326  3.89  3.84  2.31 NA    NA
##  3  0.23 Good     E     VS1      56.9    65   327  4.05  4.07  2.31 NA    NA    
##  4  0.29 Premium  I     VS2      62.4    58   334  4.2   4.23  2.63 NA    NA
##  5  0.31 Good     J     SI2      63.3    58   335  4.34  4.35  2.75 NA    NA
##  6  0.24 Very Go… J     VVS2     62.8    57   336  3.94  3.96  2.48 NA    NA
##  7  0.24 Very Go… I     VVS1     62.3    57   336  3.95  3.98  2.47 NA    NA
##  8  0.26 Very Go… H     SI1      61.9    55   337  4.07  4.11  2.53 NA    NA    
##  9  0.22 Fair     E     VS2      65.1    61   337  3.87  3.78  2.49 NA    NA
## 10  0.23 Very Go… H     VS1      59.4    61   338  4     4.05  2.39 NA    NA
## # … with 53,930 more rows
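One caveat about the selection step: matches() treats each name as a regular expression, so a short existing name such as x can also exclude a new column whose name merely contains it. A possible alternative, offered here as an assumption rather than as the answerer's method, is any_of(), which compares names exactly. The toy data and the column name tax are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with a one-letter column name, as in diamonds (x, y, z)
df <- tibble(id = 1:3, x = c(0.1, 0.2, 0.3))
cols <- c("x", "tax", "status")

# One-row NA tibble named after cols, built as in the answer above
na_tbl <- matrix(ncol = length(cols)) |>
  as_tibble(.name_repair = ~ cols)

# matches() is regex-based: the pattern "x" also matches "tax"
with_matches <- select(na_tbl, !matches(names(df)))

# any_of() compares whole names, so only the existing "x" is dropped
with_any_of <- select(na_tbl, !any_of(names(df)))

names(with_matches)
names(with_any_of)

out <- df |> mutate(with_any_of)
```

For the diamonds example above, both selectors give the same result, because none of grade, price, or status contains another column's name as a substring.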

Votes: 2

Stack Overflow user

Posted 2022-08-15 22:27:31

df <- data.frame(grade = c("A", "B", "C"),
                 score = c(1, 2, 3))

cols <- c('grade', 'score', 'status')

# Add every name in cols that df does not already have, filled with NA
for (i in cols) {
    if (!(i %in% colnames(df))) {
        df[i] <- NA
    }
}

> df
  grade score status
1     A     1     NA
2     B     2     NA
3     C     3     NA