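Q: How to split data in R by category

Stack Overflow user

Asked 2020-08-20 01:38:04

1 answer · 140 views · 0 followers · 0 votes

Suppose I have a data frame (pictured below) and want to split it into two new categories based on region, one for BC and one for NZ. How can I do this in R?

[image: data]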

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-08-20 02:05:42

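Here is an example with the mtcars data, where we use the transmission variable am to plot different groups in a scatterplot with the ggplot2 package.

We'll create a scatterplot with the displacement variable disp on the x axis and miles per gallon (mpg) on the y axis. Since cars with larger engine displacement typically consume more gasoline than cars with smaller displacement, we expect a negative relationship in the chart (higher mpg at lower displacement values).

First, we convert am to a factor variable so that the legend shows two categories rather than a continuum between 0 and 1. Then we use ggplot() and geom_point() to set the point color according to the value of am.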

library(ggplot2)
mtcars$am <- factor(mtcars$am,labels = c("automatic","manual"))
ggplot(mtcars,aes(disp,mpg,group = am)) +
     geom_point(aes(color = am))

...and the output: [plot: mpg against disp, points colored by am]
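To confirm how factor() assigned the labels, one quick check is to cross-tabulate the original 0/1 codes against the new labels. This is a minimal sketch that works on a copy (the names cars and am_label are chosen here for illustration) so that the built-in mtcars is left untouched:

```r
# Work on a copy so the built-in dataset is left unchanged
cars <- datasets::mtcars
cars$am_label <- factor(cars$am, labels = c("automatic", "manual"))

# Rows: original 0/1 codes; columns: the new labels
label_check <- table(code = cars$am, label = cars$am_label)
print(label_check)
```

Because factor() sorts the levels before applying the labels, code 0 maps to "automatic" and code 1 to "manual".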

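Separating the charts with facets

We can use ggplot2 directly to generate separate charts by a grouping variable. In ggplot2 this is called faceting. We use facet_wrap() to split the data by the value of am, as shown below.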

ggplot(mtcars,aes(disp,mpg,group = am)) +
     geom_point() +
     facet_wrap(~ am,ncol = 2)

...and the output: [plot: mpg against disp, one panel per value of am]
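As a numeric counterpart to the two panels, the following sketch counts the cars in each group and computes the correlation between disp and mpg within each group. It uses base R only; the names cars, by_am and panel_summary are illustrative and not part of the original answer.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# One data frame per transmission type, matching the two facets
by_am <- split(cars, cars$am)

# Number of cars and the disp/mpg correlation within each group
panel_summary <- sapply(by_am, function(d) {
  c(n = nrow(d), r = cor(d$disp, d$mpg))
})
print(round(panel_summary, 3))
```

Both correlations come out negative, in line with the expected relationship between displacement and fuel economy.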

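Adding regression lines and confidence intervals

Given the comments on the original question, we use the geom_smooth() function to add a regression line to the chart. By default it fits a loess smoother.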

ggplot(mtcars,aes(disp,mpg,group = am)) +
     geom_point() +
     facet_wrap(~ am,ncol = 2) +
     geom_smooth(span = 1)

...and the output: [plot: faceted scatterplot with a loess curve and confidence band in each panel]
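For readers who want the smoothed values rather than the picture, a rough base R analogue is to fit stats::loess() separately for each group with the same span. This is a sketch under the assumption that the per-panel curves correspond to independent loess fits; ggplot2's other defaults may differ in detail, and the object names are illustrative.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# Fit one loess curve per transmission group, mirroring span = 1
loess_fits <- lapply(split(cars, cars$am), function(d) {
  loess(mpg ~ disp, data = d, span = 1)
})

# Smoothed mpg at each observed displacement in the manual group
smoothed_manual <- predict(loess_fits$manual)
print(head(smoothed_manual))
```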

To use a simple linear regression instead of the loess smoother, we use the method argument of geom_smooth() and set it to "lm".

ggplot(mtcars,aes(disp,mpg,group = am)) +
     geom_point() +
     facet_wrap(~ am,ncol = 2) +
     geom_smooth(method = "lm")

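...and the output: [plot: faceted scatterplot with a linear regression line and confidence band in each panel]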
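The line and shaded band in one panel can be approximated numerically with lm() and predict(). The sketch below does this for the manual group on a small grid of displacement values; the 95% level matches geom_smooth()'s documented default, while the grid size and the names manual, grid and band are arbitrary choices made here.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))
manual <- cars[cars$am == "manual", ]

fit <- lm(mpg ~ disp, data = manual)

# Evenly spaced displacement values across the observed range
grid <- data.frame(disp = seq(min(manual$disp), max(manual$disp), length.out = 5))

# Columns: fit (the line), lwr and upr (95% confidence limits)
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
print(cbind(grid, band))
```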

Generating regression models by group

Here we split the data frame by the value of am and use lapply() to fit a regression model for each group.

carsList <- split(mtcars,mtcars$am)
lapply(carsList,function(x){
     summary(lm(mpg ~ disp,data = x))
})

...and the output:

$automatic

Call:
lm(formula = mpg ~ disp, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.7341 -1.6546 -0.8855  1.6032  5.0764 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 25.157064   1.592922   15.79 1.36e-11 ***
disp        -0.027584   0.005146   -5.36 5.19e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.405 on 17 degrees of freedom
Multiple R-squared:  0.6283,    Adjusted R-squared:  0.6064 
F-statistic: 28.73 on 1 and 17 DF,  p-value: 5.194e-05


$manual

Call:
lm(formula = mpg ~ disp, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.6056 -2.4200 -0.0956  3.1484  5.2315 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 32.86614    1.95033  16.852 3.33e-09 ***
disp        -0.05904    0.01174  -5.031 0.000383 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.545 on 11 degrees of freedom
Multiple R-squared:  0.6971,    Adjusted R-squared:  0.6695 
F-statistic: 25.31 on 1 and 11 DF,  p-value: 0.0003834
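When only the fitted lines are of interest, the two summaries can be condensed into a small coefficient table. This is an optional sketch (coef_table is a name chosen here), with one row per group:

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# One row per group: intercept and slope of mpg ~ disp
coef_table <- t(sapply(split(cars, cars$am), function(x) {
  coef(lm(mpg ~ disp, data = x))
}))
print(round(coef_table, 4))
```

The values should agree with the Estimate columns in the summaries above.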

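Note: since this is an example illustrating the code needed to run regression analyses by a split variable, we won't go into detail on whether the data here meet the modeling assumptions of ordinary least squares regression.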
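The same split() call carries over to the situation in the question. The data frame below is hypothetical, since the question's data was only posted as an image; the column names region and value and all the numbers are made up for illustration.

```r
# Hypothetical data: the question's real data frame was not shown as text
df <- data.frame(
  region = c("BC", "NZ", "BC", "NZ", "BC"),
  value  = c(10, 20, 30, 40, 50)
)

# split() returns a named list with one data frame per region
by_region <- split(df, df$region)

bc <- by_region$BC
nz <- by_region$NZ
print(bc)
print(nz)
```

Keeping the pieces in the list by_region, rather than as separate objects, makes it easy to apply lapply() to every region at once.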

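Modeling the groups in one regression model

As I noted in the comments, if we specify an effect for am as well as an interaction effect between am and disp, we can account for the differences between automatic and manual transmissions in a single regression model.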

summary(lm(mpg ~ disp + am + am * disp,data=mtcars))
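To see why one model can stand in for two, the manual group's line can be rebuilt from the combined coefficients: its intercept is the baseline intercept plus the am effect, and its slope is the disp slope plus the interaction. The sketch below uses the shorthand disp * am, which expands to the same terms as the formula above; the names combined, b and manual_line are illustrative.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# disp * am expands to disp + am + their interaction
combined <- lm(mpg ~ disp * am, data = cars)
b <- coef(combined)

# Locate the interaction coefficient by the ":" in its name
interaction_name <- grep(":", names(b), value = TRUE)

manual_line <- c(
  intercept = unname(b["(Intercept)"] + b["ammanual"]),
  slope     = unname(b["disp"] + b[interaction_name])
)
print(manual_line)
```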

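By generating predictions from each model, we can demonstrate that the combined model produces the same predictions for manual transmissions as the split model, as shown below.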

data <- data.frame(am = c(1,1,0),
                   disp = c(157,248,300))
data$am <- factor(data$am,labels = c("automatic","manual"))
mod1 <- lm(mpg ~ disp + am + am * disp,data=mtcars)
predict(mod1,data)
mod2 <- lm(mpg ~ disp,data = mtcars[mtcars$am == "manual",])
predict(mod2,data[data$am == "manual",])

...and the output:

> data <- data.frame(am = c(1,1,0),
+                    disp = c(157,248,300))
> data$am <- factor(data$am,labels = c("automatic","manual"))
> mod1 <- lm(mpg ~ disp + am + am * disp,data=mtcars)
> predict(mod1,data)
       1        2        3 
23.59711 18.22461 16.88199 
> mod2 <- lm(mpg ~ disp,data = mtcars[mtcars$am == "manual",])
> predict(mod2,data[data$am == "manual",])
       1        2 
23.59711 18.22461
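Rather than comparing the printed numbers by eye, the agreement can be checked programmatically for all three cars, including the automatic one. This sketch is self-contained and uses illustrative names (new_cars, split_fits, p_combined, p_split); it predicts each row from the split model that matches its own group.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("automatic", "manual"))

# The same three cars as above, built directly with labelled factor levels
new_cars <- data.frame(
  am   = factor(c("manual", "manual", "automatic"), levels = levels(cars$am)),
  disp = c(157, 248, 300)
)

combined   <- lm(mpg ~ disp * am, data = cars)
split_fits <- lapply(split(cars, cars$am), function(x) lm(mpg ~ disp, data = x))

p_combined <- unname(predict(combined, new_cars))

# Predict each row from the split model for that row's group
p_split <- unname(mapply(function(g, d) {
  unname(predict(split_fits[[g]], data.frame(disp = d)))
}, as.character(new_cars$am), new_cars$disp))

print(isTRUE(all.equal(p_combined, p_split)))
```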