[767] MongoDB Aggregation Operations

周小董

Published 2020-02-29 08:02:45

Included in the column: python前行者

Data

/* 1 */
{
    "_id" : "2020-02-01",
    "website_clf" : [ 
        {
            "source" : "猎云网",
            "sum_num" : 3880,
            "day_num" : 11,
            "time_clf" : {
                "1" : 0,
                "2" : 0,
                "3" : 0,
                "4" : 0,
                "5" : 0,
                "6" : 0,
                "7" : 0,
                "8" : 1,
                "9" : 0,
                "10" : 0,
                "11" : 3,
                "12" : 0,
                "13" : 0,
                "14" : 0,
                "15" : 2,
                "16" : 2,
                "17" : 1,
                "18" : 1,
                "19" : 0,
                "20" : 0,
                "21" : 0,
                "22" : 0,
                "23" : 1,
                "24" : 0
            }
        }, 
        {
            "source" : "钛媒体",
            "sum_num" : 1086,
            "day_num" : 14,
            "time_clf" : {
                "1" : 0,
                "2" : 0,
                "3" : 0,
                "4" : 0,
                "5" : 0,
                "6" : 0,
                "7" : 0,
                "8" : 1,
                "9" : 2,
                "10" : 1,
                "11" : 0,
                "12" : 1,
                "13" : 0,
                "14" : 1,
                "15" : 1,
                "16" : 0,
                "17" : 0,
                "18" : 1,
                "19" : 1,
                "20" : 4,
                "21" : 0,
                "22" : 1,
                "23" : 0,
                "24" : 0
            }
        }
    ]
}

/* 2 */
{
    "_id" : "2020-02-02",
    "website_clf" : [ 
        {
            "source" : "猎云网",
            "sum_num" : 3895,
            "day_num" : 15,
            "time_clf" : {
                "1" : 0,
                "2" : 0,
                "3" : 0,
                "4" : 0,
                "5" : 0,
                "6" : 0,
                "7" : 0,
                "8" : 0,
                "9" : 1,
                "10" : 0,
                "11" : 0,
                "12" : 3,
                "13" : 2,
                "14" : 1,
                "15" : 4,
                "16" : 0,
                "17" : 2,
                "18" : 0,
                "19" : 0,
                "20" : 1,
                "21" : 0,
                "22" : 0,
                "23" : 0,
                "24" : 1
            }
        }, 
        {
            "source" : "钛媒体",
            "sum_num" : 1101,
            "day_num" : 15,
            "time_clf" : {
                "1" : 0,
                "2" : 0,
                "3" : 0,
                "4" : 0,
                "5" : 0,
                "6" : 0,
                "7" : 0,
                "8" : 0,
                "9" : 1,
                "10" : 1,
                "11" : 2,
                "12" : 0,
                "13" : 2,
                "14" : 1,
                "15" : 0,
                "16" : 0,
                "17" : 1,
                "18" : 4,
                "19" : 0,
                "20" : 1,
                "21" : 2,
                "22" : 0,
                "23" : 0,
                "24" : 0
            }
        }
    ]
}

Group by source and sum day_num for each source

db.getCollection('news_clf').aggregate(    
    [{'$match':{"_id":{'$gte':"2020-02-01",'$lte':"2020-02-02"}}},
    {'$unwind':'$website_clf'},
    {'$group':{"_id":"$website_clf.source",'sumFaceAmnt':{'$sum':'$website_clf.day_num'}}}]
    )

Result

/* 1 */
{
    "_id" : "猎云网",
    "sumFaceAmnt" : 26
}
/* 2 */
{
    "_id" : "钛媒体",
    "sumFaceAmnt" : 29
}
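
To make the three stages concrete, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved) that mimics what $match, $unwind and $group do to the sample data. Only the fields the pipeline actually reads are reproduced, and the variable names are illustrative rather than part of any MongoDB API.

```javascript
// Trimmed copy of the two sample documents: only the fields the pipeline uses.
const newsClf = [
  { _id: "2020-02-01", website_clf: [
    { source: "猎云网", day_num: 11 },
    { source: "钛媒体", day_num: 14 } ] },
  { _id: "2020-02-02", website_clf: [
    { source: "猎云网", day_num: 15 },
    { source: "钛媒体", day_num: 15 } ] },
];

// $match: keep documents whose _id lies in the date range (string comparison).
const matched = newsClf.filter(d => d._id >= "2020-02-01" && d._id <= "2020-02-02");

// $unwind: emit one document per element of the website_clf array.
const unwound = matched.flatMap(d =>
  d.website_clf.map(w => ({ _id: d._id, website_clf: w })));

// $group with $sum: add up day_num per source.
const totals = new Map();
for (const d of unwound) {
  const key = d.website_clf.source;
  totals.set(key, (totals.get(key) || 0) + d.website_clf.day_num);
}

const result = [...totals].map(([_id, sumFaceAmnt]) => ({ _id, sumFaceAmnt }));
console.log(result);
```

After $unwind there are four documents (two dates times two sources), which is why each source ends up summing two day_num values.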

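Aggregation in MongoDB (aggregate) is mainly used to process data (for example, computing averages or sums) and return the computed result. It is roughly comparable to aggregate functions such as count(*) in SQL.

Aggregation in MongoDB is performed with the aggregate() method.

Syntax

The basic syntax of the aggregate() method is:

db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Examples

Case1

The collection contains the following data: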

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by_user: 'runoob.com',
   url: 'http://www.runoob.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview', 
   description: 'No sql database is very fast',
   by_user: 'runoob.com',
   url: 'http://www.runoob.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview', 
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

Using the collection above, we now count the articles written by each author with aggregate():

db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])

The query returns:

/* 1 */
{
    "_id" : "Neo4j",
    "num_tutorial" : 1
},

/* 2 */
{
    "_id" : "runoob.com",
    "num_tutorial" : 2
}

The example above is analogous to the SQL statement:

select by_user, count(*) from mycol group by by_user

To total the likes each author has received, use:

db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])

The query returns:

/* 1 */
{
    "_id" : "Neo4j",
    "num_tutorial" : 750
},

/* 2 */
{
    "_id" : "runoob.com",
    "num_tutorial" : 110
}
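
The difference between {$sum : 1} and {$sum : "$likes"} can be sketched in plain JavaScript. The helper groupSum below is hypothetical (it is not a MongoDB function); it only illustrates that the first form adds 1 per document, while the second adds the value of a field.

```javascript
// The three sample documents, reduced to the fields used for grouping.
const mycol = [
  { title: "MongoDB Overview", by_user: "runoob.com", likes: 100 },
  { title: "NoSQL Overview", by_user: "runoob.com", likes: 10 },
  { title: "Neo4j Overview", by_user: "Neo4j", likes: 750 },
];

// Hypothetical helper: group docs by keyOf(doc) and sum valueOf(doc) per group.
function groupSum(docs, keyOf, valueOf) {
  const out = {};
  for (const doc of docs) {
    const key = keyOf(doc);
    out[key] = (out[key] || 0) + valueOf(doc);
  }
  return out;
}

// {$sum: 1} adds 1 for every document, so it counts the documents per group.
const articleCounts = groupSum(mycol, d => d.by_user, () => 1);

// {$sum: "$likes"} adds the value of the likes field instead.
const likeTotals = groupSum(mycol, d => d.by_user, d => d.likes);

console.log(articleCounts); // runoob.com: 2, Neo4j: 1
console.log(likeTotals);    // runoob.com: 110, Neo4j: 750
```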

Case2

The example above is fairly simple, so let us try something richer. The test collection sales contains the following data:

{ "_id" : 1, "item" : "abc", "price" : 10, "quantity" : 2, "date" : ISODate("2014-01-01T08:00:00Z") }
{ "_id" : 2, "item" : "jkl", "price" : 20, "quantity" : 1, "date" : ISODate("2014-02-03T09:00:00Z") }
{ "_id" : 3, "item" : "xyz", "price" : 5, "quantity" : 5, "date" : ISODate("2014-02-03T09:05:00Z") }
{ "_id" : 4, "item" : "abc", "price" : 10, "quantity" : 10, "date" : ISODate("2014-02-15T08:00:00Z") }
{ "_id" : 5, "item" : "xyz", "price" : 5, "quantity" : 10, "date" : ISODate("2014-02-15T09:05:00Z") }

The goal is to group by date and compute the sales amount for each day. The aggregation is:

db.sales.aggregate(
   [
     {
       $group:
         {
           _id: { day: { $dayOfYear: "$date"}, year: { $year: "$date" } },
           totalAmount: { $sum: { $multiply: [ "$price", "$quantity" ] } },
           count: { $sum: 1 }
         }
     }
   ]
)

The query returns:

{ "_id" : { "day" : 46, "year" : 2014 }, "totalAmount" : 150, "count" : 2 }
{ "_id" : { "day" : 34, "year" : 2014 }, "totalAmount" : 45, "count" : 2 }
{ "_id" : { "day" : 1, "year" : 2014 }, "totalAmount" : 20, "count" : 1 }
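
As a cross-check of these numbers, the following plain-JavaScript sketch recomputes the grouping in memory. It assumes dates are read in UTC, which is how $dayOfYear and $year behave when no timezone is given; the function dayOfYearUTC is a hypothetical helper written for this illustration.

```javascript
const sales = [
  { _id: 1, item: "abc", price: 10, quantity: 2,  date: new Date("2014-01-01T08:00:00Z") },
  { _id: 2, item: "jkl", price: 20, quantity: 1,  date: new Date("2014-02-03T09:00:00Z") },
  { _id: 3, item: "xyz", price: 5,  quantity: 5,  date: new Date("2014-02-03T09:05:00Z") },
  { _id: 4, item: "abc", price: 10, quantity: 10, date: new Date("2014-02-15T08:00:00Z") },
  { _id: 5, item: "xyz", price: 5,  quantity: 10, date: new Date("2014-02-15T09:05:00Z") },
];

// Day of the year (1 = January 1st), computed in UTC.
function dayOfYearUTC(date) {
  const startOfYear = Date.UTC(date.getUTCFullYear(), 0, 1);
  const startOfDay = Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
  return (startOfDay - startOfYear) / 86400000 + 1;
}

const byDay = new Map();
for (const s of sales) {
  const id = { day: dayOfYearUTC(s.date), year: s.date.getUTCFullYear() };
  const key = `${id.year}-${id.day}`; // a Map needs a primitive key for the compound _id
  const group = byDay.get(key) || { _id: id, totalAmount: 0, count: 0 };
  group.totalAmount += s.price * s.quantity; // $sum of $multiply
  group.count += 1;                          // $sum: 1
  byDay.set(key, group);
}

const dailyTotals = [...byDay.values()];
console.log(dailyTotals);
```

Day 34 is February 3rd and day 46 is February 15th, so the two sales on each of those dates fall into the same group.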

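Every $group stage above specifies an _id, that is, a grouping key. What if the requirement involves no grouping at all?

For example, suppose we want the total number of items sold across the whole sales collection.

If we simply drop _id from the $group stage, like this: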

db.sales.aggregate(
   [
     {
       $group:
         {
          
           totalAmount: { $sum: "$quantity" }
         }
     }
   ]
)

the command fails with an error:

{
    "message" : "a group specification must include an _id",
    "ok" : 0,
    "code" : 15955,
    "codeName" : "Location15955",
    "name" : "MongoError"
}

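We still have to supply _id, but it can be a constant, so that every document falls into the same single group. Any constant will do: _id : "0", _id : "a", _id : "b", _id : "x", _id : "y", and so on (_id : null works as well).

For example: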

db.sales.aggregate(
   [
     {
       $group:
         {
           _id : "Total",
           totalAmount: { $sum: "$quantity" }
         }
     }
   ]
)

The query returns:

{
    "_id" : "Total",
    "totalAmount" : 28
}
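
Why any constant works can be seen in a small plain-JavaScript sketch: when the grouping key is the same for every document, there is only one bucket, and its label is the only thing that changes. The function groupByConstant is a hypothetical helper, not a MongoDB API.

```javascript
// quantity values of the five sales documents shown earlier
const quantities = [2, 1, 5, 10, 10];

// Grouping by a constant key puts every value into the same bucket.
function groupByConstant(id, values) {
  const groups = {};
  for (const v of values) {
    groups[id] = (groups[id] || 0) + v;
  }
  return Object.entries(groups).map(([_id, totalAmount]) => ({ _id, totalAmount }));
}

const total = groupByConstant("Total", quantities);
const sameTotalOtherLabel = groupByConstant("x", quantities);

console.log(total);               // one group labelled "Total", totalAmount 28
console.log(sameTotalOtherLabel); // one group labelled "x", same totalAmount
```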

The $project stage

Suppose there is a students collection with the following data:

{ "_id": 1, "quizzes": [ 10, 6, 7 ], "labs": [ 5, 8 ], "final": 80, "midterm": 75 }
{ "_id": 2, "quizzes": [ 9, 10 ], "labs": [ 8, 8 ], "final": 95, "midterm": 80 }
{ "_id": 3, "quizzes": [ 4, 5, 5 ], "labs": [ 6, 5 ], "final": 78, "midterm": 70 }

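The goal is to compute, for each student, the total of the regular quiz scores, the total of the lab scores, and the combined final and midterm exam score.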

db.students.aggregate([
   {
     $project: {
       quizTotal: { $sum: "$quizzes"},
       labTotal: { $sum: "$labs" },
       examTotal: { $sum: [ "$final", "$midterm" ] }
     }
   }
])

The query output is:

{ "_id" : 1, "quizTotal" : 23, "labTotal" : 13, "examTotal" : 155 }
{ "_id" : 2, "quizTotal" : 19, "labTotal" : 16, "examTotal" : 175 }
{ "_id" : 3, "quizTotal" : 14, "labTotal" : 11, "examTotal" : 148 }
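
Unlike $group, $project produces one output document per input document. A rough plain-JavaScript equivalent (an in-memory sketch, with sum as a hypothetical helper) is a simple map over the collection:

```javascript
const students = [
  { _id: 1, quizzes: [10, 6, 7], labs: [5, 8], final: 80, midterm: 75 },
  { _id: 2, quizzes: [9, 10],    labs: [8, 8], final: 95, midterm: 80 },
  { _id: 3, quizzes: [4, 5, 5],  labs: [6, 5], final: 78, midterm: 70 },
];

// Sum of a list of numbers.
const sum = xs => xs.reduce((a, b) => a + b, 0);

// One output document per student; nothing is merged across documents.
const totalsPerStudent = students.map(s => ({
  _id: s._id,
  quizTotal: sum(s.quizzes),            // $sum over an array field
  labTotal: sum(s.labs),
  examTotal: sum([s.final, s.midterm]), // $sum over a list of expressions
}));

console.log(totalsPerStudent);
```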

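Aggregation expressions

The table below lists some of the accumulator expressions that can be used in $group:

Expression	Description	Example
$sum	Computes the sum.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avg	Computes the average.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$min	Returns the minimum of the corresponding values across the documents in each group.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$max	Returns the maximum of the corresponding values across the documents in each group.	db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$push	Appends each value to an array in the result document.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSet	Appends each value to an array in the result document, without creating duplicates.	db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$first	Returns the value from the first document in each group, following the order of the input documents (only meaningful after a $sort).	db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$last	Returns the value from the last document in each group, following the order of the input documents (only meaningful after a $sort).	db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])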
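
To show how these accumulators differ, the sketch below applies a plain-JavaScript counterpart of each one to the two runoob.com documents from Case1. It is an in-memory illustration only; the property names are chosen to mirror the operators and are not MongoDB calls.

```javascript
// The group for by_user = "runoob.com", in input order.
const runoobGroup = [
  { url: "http://www.runoob.com", likes: 100 },
  { url: "http://www.runoob.com", likes: 10 },
];

const likes = runoobGroup.map(d => d.likes);
const urls = runoobGroup.map(d => d.url);
const sumLikes = likes.reduce((a, b) => a + b, 0);

const accumulated = {
  sum: sumLikes,                  // $sum
  avg: sumLikes / likes.length,   // $avg
  min: Math.min(...likes),        // $min
  max: Math.max(...likes),        // $max
  push: urls,                     // $push keeps every value, duplicates included
  addToSet: [...new Set(urls)],   // $addToSet keeps each distinct value once
  first: urls[0],                 // $first: value from the first document
  last: urls[urls.length - 1],    // $last: value from the last document
};

console.log(accumulated);
```

Both documents share the same url, so push yields two entries while addToSet yields one.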

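The pipeline concept

In Unix and Linux, a pipe is generally used to pass the output of one command to the next command as its input.

MongoDB's aggregation pipeline works the same way: documents are processed by one stage, and the result is handed to the next stage. The same kind of stage may appear more than once in a pipeline.

Expressions: an expression takes the input document and produces an output. Expressions are stateless: they can only operate on the document currently passing through the pipeline and cannot refer to other documents.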
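
The pipeline idea can be sketched in plain JavaScript by treating each stage as a function from an array of documents to a new array, and running the stages in order. All names below (match, sortBy, skip, limit, runPipeline) are hypothetical helpers that loosely imitate the corresponding stages; they are not MongoDB APIs.

```javascript
// A stage is a function: array of documents in, array of documents out.
const match = pred => docs => docs.filter(pred);
const sortBy = key => docs => [...docs].sort((a, b) => a[key] - b[key]);
const skip = n => docs => docs.slice(n);
const limit = n => docs => docs.slice(0, n);

// Running a pipeline feeds the output of each stage into the next one.
const runPipeline = (input, stages) =>
  stages.reduce((current, stage) => stage(current), input);

const numbers = [{ n: 5 }, { n: 1 }, { n: 4 }, { n: 2 }, { n: 3 }];

const out = runPipeline(numbers, [
  match(d => d.n > 1), // like $match: drops { n: 1 }
  sortBy("n"),         // like $sort:  2, 3, 4, 5
  skip(1),             // like $skip:  3, 4, 5
  limit(2),            // like $limit: 3, 4
]);

console.log(out); // the documents with n = 3 and n = 4
```

Because every stage only sees the output of the previous one, reordering stages changes the result: placing limit(2) first would restrict the pipeline to the first two input documents before any filtering.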

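Here are some of the most commonly used stages of the aggregation framework:

Stage	Meaning
$project	Reshapes the input documents. It can rename, add, or remove fields, and can also create computed values and nested documents.
$match	Filters the data, passing on only the documents that satisfy the condition. $match uses MongoDB's standard query operators.
$limit	Limits the number of documents passed on by the pipeline.
$skip	Skips the specified number of documents in the pipeline and passes on the rest.
$unwind	Splits an array field of a document into multiple documents, each containing one element of the array.
$group	Groups the documents in the collection; can be used to compute statistics.
$sort	Sorts the input documents and outputs them in order.
$geoNear	Outputs documents ordered by proximity to a geographic location.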

1. $project example

db.article.aggregate([
    { $project : {
        title : 1 ,
        author : 1
    }}
]);

The result then contains only three fields: _id, title, and author. The _id field is included by default; to leave it out, write:

db.article.aggregate([
    { $project : {
        _id : 0 ,
        title : 1 ,
        author : 1
    }}
]);
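
The inclusion rule for _id can be illustrated with a small plain-JavaScript sketch. The document and the project helper are hypothetical (the source does not show the contents of the article collection); the helper only imitates the include-fields form of $project.

```javascript
// Hypothetical article document; the field values are made up for illustration.
const article = { _id: 1, title: "Aggregation", author: "bob", pageViews: 5 };

// Keep only the listed fields. _id is kept as well unless includeId is false,
// mirroring how $project includes _id by default.
function project(doc, fields, includeId = true) {
  const out = includeId ? { _id: doc._id } : {};
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

const withId = project(article, ["title", "author"]);           // like { title: 1, author: 1 }
const withoutId = project(article, ["title", "author"], false); // like { _id: 0, title: 1, author: 1 }

console.log(withId);
console.log(withoutId);
```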

2. $match example

db.articles.aggregate( [
                        { $match : { score : { $gt : 70, $lte : 90 } } },
                        { $group: { _id: null, count: { $sum: 1 } } }
                       ] );

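$match selects the records whose score is greater than 70 and less than or equal to 90, then passes the matching records to the next stage, $group, for processing.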
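
The boundary behaviour of $gt and $lte is easy to get wrong, so here is a plain-JavaScript sketch of the same filter followed by the count. The scores are hypothetical, since the source does not show the data in the articles collection.

```javascript
// Hypothetical scores, chosen to include both boundary values 70 and 90.
const articles = [50, 70, 71, 90, 95].map((score, i) => ({ _id: i + 1, score }));

// $match: { score: { $gt: 70, $lte: 90 } } keeps documents with 70 < score <= 90.
const inRange = articles.filter(a => a.score > 70 && a.score <= 90);

// $group with _id: null and {$sum: 1} collapses the matches into a single count.
// With no matching documents, $group emits nothing, hence the empty array.
const counted = inRange.length > 0 ? [{ _id: null, count: inRange.length }] : [];

console.log(counted);
```

With these scores, 70 is excluded (strictly greater than) and 90 is included (less than or equal), so two documents are counted.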

3. $skip example

db.article.aggregate([
    { $skip : 5 }
]);

After the $skip stage, the first five documents are "filtered out" and only the remaining documents are passed on.

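String and arithmetic operators in aggregation expressions

Operator	Description
$add	Adds numbers together. Example: valuePlus5:{$add:["$value",5]}
$divide	Divides the first number by the second. Example: valueDividedBy5:{$divide:["$value",5]}
$mod	Returns the remainder of dividing the first number by the second. Example: {$mod:["$value",5]}
$multiply	Multiplies the numbers in the array. Example: {$multiply:["$value",5]}
$subtract	Subtracts the second number from the first. Example: {$subtract:["$value",5]}
$concat	Concatenates strings. Example: {$concat:["str1","str2"]}
$strcasecmp	Compares two strings case-insensitively and returns an integer reflecting the result (1, 0, or -1). Example: {$strcasecmp:["$value","$value"]}
$substr	Returns part of a string, given a start index and a length. Example: {$substr:["$value",0,4]}. Deprecated since MongoDB 3.4 in favor of $substrBytes.
$toLower	Converts a string to lowercase.
$toUpper	Converts a string to uppercase.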
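
For readers more familiar with JavaScript than with aggregation expressions, the sketch below shows rough plain-JavaScript counterparts of these operators. The sample values and the strcasecmp helper are assumptions made for illustration, and the equivalence is only approximate: for example, $substr counts bytes while JavaScript strings count UTF-16 code units, which coincide only for ASCII text.

```javascript
const value = 17; // hypothetical numeric field value

const arithmetic = {
  add: value + 5,      // {$add: ["$value", 5]}
  divide: value / 5,   // {$divide: ["$value", 5]}
  mod: value % 5,      // {$mod: ["$value", 5]}
  multiply: value * 5, // {$multiply: ["$value", 5]}
  subtract: value - 5, // {$subtract: ["$value", 5]}
};

// Case-insensitive comparison returning 1, 0 or -1, in the spirit of $strcasecmp.
function strcasecmp(a, b) {
  const x = a.toUpperCase();
  const y = b.toUpperCase();
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

const text = "MongoDB"; // hypothetical string field value

const strings = {
  concat: "str1" + "str2",                 // {$concat: ["str1", "str2"]}
  strcasecmp: strcasecmp("mongodb", text), // equal when case is ignored
  substr: text.substring(0, 5),            // start index 0, length 5
  toLower: text.toLowerCase(),             // $toLower
  toUpper: text.toUpperCase(),             // $toUpper
};

console.log(arithmetic);
console.log(strings);
```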

References:
https://www.runoob.com/mongodb/mongodb-aggregate.html
https://www.cnblogs.com/xuliuzai/p/11400546.html
https://www.jianshu.com/p/72fc4409936c
https://blog.csdn.net/qq_39263663/article/details/80459833
http://www.zuidaima.com/question/3635270363581440.htm

Originally published: 2020/02/26
