Row-wise operations in dplyr

医学和生信笔记

Published 2022-11-15 03:18:08

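In the tidyverse, tidy data generally means that each row is an observation and each column is a variable. Almost every operation assumes tidy data and works on columns. Sometimes, though, you need to operate on rows, and dplyr provides rowwise() to do that quickly.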

Introduction

library(dplyr, warn.conflicts = FALSE)

rowwise() is much like group_by(): on its own it does nothing visible, but once it has been applied, verbs such as mutate() operate row by row.

df <- tibble(x = 1:2, y = 3:4, z = 5:6)
df %>% rowwise()
## # A tibble: 2 × 3
## # Rowwise: 
##       x     y     z
##   <int> <int> <int>
## 1     1     3     5
## 2     2     4     6

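Suppose you want the mean of each row (just an example). Without rowwise(), mean() sees the whole columns, so the result is the mean of all the values, repeated on every row, which is clearly not what we want: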

df %>% mutate(m = mean(c(x, y, z)))
## # A tibble: 2 × 4
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     3     5   3.5
## 2     2     4     6   3.5

With rowwise(), the same expression is evaluated once per row:

df %>% rowwise() %>% mutate(m = mean(c(x, y, z)))
## # A tibble: 2 × 4
## # Rowwise: 
##       x     y     z     m
##   <int> <int> <int> <dbl>
## 1     1     3     5     3
## 2     2     4     6     4
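
For a simple summary like the mean, base R also has a vectorised row function. The sketch below is an addition for illustration, not part of the original walk-through: it assumes the same df as above and compares the rowwise() result with rowMeans() applied to a matrix built with cbind().

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:2, y = 3:4, z = 5:6)

# Row-wise mean via rowwise(): mean() is evaluated once per row
by_row <- df %>%
  rowwise() %>%
  mutate(m = mean(c(x, y, z))) %>%
  ungroup()

# Same quantity with base R's vectorised rowMeans();
# cbind() turns the three columns into a 2 x 3 matrix
vectorised <- df %>%
  mutate(m = rowMeans(cbind(x, y, z)))

# Both approaches should agree
all.equal(by_row$m, vectorised$m)
```

rowwise() remains the more general tool, since it works with any function, not only those that have a ready-made row version.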

df <- tibble(name = c("Mara", "Hadley"), x = 1:2, y = 3:4, z = 5:6)
df
## # A tibble: 2 × 4
##   name       x     y     z
##   <chr>  <int> <int> <int>
## 1 Mara       1     3     5
## 2 Hadley     2     4     6

Compute the mean of each row:

df %>% 
  rowwise() %>% 
  summarise(m = mean(c(x, y, z)))
## # A tibble: 2 × 1
##       m
##   <dbl>
## 1     3
## 2     4

Compute the mean of each row while keeping the name column as an identifier:

df %>% 
  rowwise(name) %>% # similar to grouping by name: name is kept in the output
  summarise(m = mean(c(x, y, z)))
## `summarise()` has grouped output by 'name'. You can override using
## the `.groups` argument.
## # A tibble: 2 × 2
## # Groups:   name [2]
##   name       m
##   <chr>  <dbl>
## 1 Mara       3
## 2 Hadley     4

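rowwise() can be seen as a special form of group_by() in which the data is first grouped so that each row is its own group. To leave row-wise mode you therefore also call ungroup().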
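
A minimal sketch of that behaviour, using the small three-column tibble from the introduction. The variable names rw, back and res are only illustrative. After ungroup() the data frame is no longer row-wise, so mean() goes back to seeing whole columns.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:2, y = 3:4, z = 5:6)

# rowwise() adds the "rowwise_df" class to the tibble
rw <- df %>% rowwise()
inherits(rw, "rowwise_df")    # TRUE

# ungroup() removes it again
back <- rw %>% ungroup()
inherits(back, "rowwise_df")  # FALSE

# Without row-wise mode, mean() is computed over all six values
res <- back %>% mutate(m = mean(c(x, y, z)))
res$m                         # 3.5 3.5
```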

Row-wise summary statistics

df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)
df
## # A tibble: 6 × 5
##      id     w     x     y     z
##   <int> <int> <int> <int> <int>
## 1     1    10    20    30    40
## 2     2    11    21    31    41
## 3     3    12    22    32    42
## 4     4    13    23    33    43
## 5     5    14    24    34    44
## 6     6    15    25    35    45

Now switch to row-wise mode:

rf <- df %>% rowwise(id)

Compute the row sums:

rf %>% mutate(total = sum(c(w, x, y, z)))
## # A tibble: 6 × 6
## # Rowwise:  id
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <int>
## 1     1    10    20    30    40   100
## 2     2    11    21    31    41   104
## 3     3    12    22    32    42   108
## 4     4    13    23    33    43   112
## 5     5    14    24    34    44   116
## 6     6    15    25    35    45   120

rf %>% summarise(total = sum(c(w, x, y, z)))
## `summarise()` has grouped output by 'id'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 2
## # Groups:   id [6]
##      id total
##   <int> <int>
## 1     1   100
## 2     2   104
## 3     3   108
## 4     4   112
## 5     5   116
## 6     6   120

across() has a row-wise counterpart, c_across(), which lets you select several columns quickly:

rf %>% mutate(total = sum(c_across(w:z)))
## # A tibble: 6 × 6
## # Rowwise:  id
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <int>
## 1     1    10    20    30    40   100
## 2     2    11    21    31    41   104
## 3     3    12    22    32    42   108
## 4     4    13    23    33    43   112
## 5     5    14    24    34    44   116
## 6     6    15    25    35    45   120

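It can also be combined with where(). Because id was passed to rowwise(), it is left out of the selection even though it is numeric: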

rf %>% mutate(total = sum(c_across(where(is.numeric))))
## # A tibble: 6 × 6
## # Rowwise:  id
##      id     w     x     y     z total
##   <int> <int> <int> <int> <int> <int>
## 1     1    10    20    30    40   100
## 2     2    11    21    31    41   104
## 3     3    12    22    32    42   108
## 4     4    13    23    33    43   112
## 5     5    14    24    34    44   116
## 6     6    15    25    35    45   120
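
The sketch below illustrates why the example declares id in rowwise(id). It is an added comparison, assuming the same df as above. When no identifier is declared, id is just another numeric column and where(is.numeric) picks it up too.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)

# id declared as an identifier: c_across() skips it
with_id <- df %>%
  rowwise(id) %>%
  mutate(total = sum(c_across(where(is.numeric)))) %>%
  ungroup()

# No identifier declared: id is numeric, so it is summed as well
without_id <- df %>%
  rowwise() %>%
  mutate(total = sum(c_across(where(is.numeric)))) %>%
  ungroup()

with_id$total     # 100 104 108 112 116 120
without_id$total  # 101 106 111 116 121 126
```

Selecting by name, as in c_across(w:z), avoids the question entirely when the columns of interest are contiguous.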

Row-wise and column-wise operations can be combined:

rf %>% 
  mutate(total = sum(c_across(w:z))) %>% 
  ungroup() %>% # leave row-wise mode first
  mutate(across(w:z, ~ . / total))
## # A tibble: 6 × 6
##      id     w     x     y     z total
##   <int> <dbl> <dbl> <dbl> <dbl> <int>
## 1     1 0.1   0.2   0.3   0.4     100
## 2     2 0.106 0.202 0.298 0.394   104
## 3     3 0.111 0.204 0.296 0.389   108
## 4     4 0.116 0.205 0.295 0.384   112
## 5     5 0.121 0.207 0.293 0.379   116
## 6     6 0.125 0.208 0.292 0.375   120
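
As a quick sanity check on the result above, the four proportions in each row should add up to 1. A minimal sketch, assuming the same df. The names props and check are only illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(id = 1:6, w = 10:15, x = 20:25, y = 30:35, z = 40:45)

props <- df %>%
  rowwise(id) %>%
  mutate(total = sum(c_across(w:z))) %>%  # row-wise step: one total per row
  ungroup() %>%                           # back to ordinary column-wise mode
  mutate(across(w:z, ~ .x / total))       # column-wise step: divide each column by total

# The proportions in every row should sum to 1 (up to floating-point error)
check <- props$w + props$x + props$y + props$z
all.equal(check, rep(1, 6))
```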