MongoDB 聚合管道(Aggregation Pipeline)
-
一、概述
我们传统的SQL会用select count(*) from table group by test 这样来分组和统计,那MongoDB呢?
就是用aggregate()聚合管道来统计和分组的,取最大和最小值,计算平均值或者求和等等,接下来我们一起看看
MongoDB聚合管道几种常用的操作
二、aggregate()
1、基本语法
MongoDB Enterprise > db.test.aggregate(pipeline, options)
2、分组 $group
a. &group和$push
MongoDB Enterprise > db.test1.find() { "_id" : ObjectId("5b0760e7d169ee90d1d75588"), "no" : "F1", "name" : "Jack" } { "_id" : ObjectId("5b076103d169ee90d1d75589"), "no" : "F2", "name" : "Lucy" } { "_id" : ObjectId("5b076145d169ee90d1d7558a"), "no" : "F3", "name" : "Jack" } //按名字分组,名字相同的号码合在一个数组里,注意这个值不要超过16M,不然会出现错误 MongoDB Enterprise > db.test1.aggregate([{$group:{_id:"$name",no:{$push:"$no"}}}]) { "_id" : "Lucy", "no" : [ "F2" ] } { "_id" : "Jack", "no" : [ "F1", "F3" ] }
b.$group和$sum
MongoDB Enterprise > db.test.find() { "_id" : ObjectId("5acb2efb885dcae07c4cc6c2"), "empno" : "F01", "empname" : "jack" } { "_id" : ObjectId("5acc2d6cc38047008818c007"), "empno" : "F02", "empname" : "Lucy" } { "_id" : ObjectId("5acc5db5f7307d3a80959a2a"), "empno" : "F03", "empname" : "Mary" } { "_id" : ObjectId("5ad01112ad12eb55b80c70b5"), "no" : "01", "name" : "Lily" } { "_id" : ObjectId("5ad0161ab27cf616a85fac7d"), "no" : "02", "name" : "Lucy" } //两种办法都可以统计文档的数量 MongoDB Enterprise > db.test.count() 5 MongoDB Enterprise > db.test.aggregate([{$group:{_id:null,count:{$sum:1}}}]) //_id:null表示集合中所有的文档 { "_id" : null, "count" : 5 }
3、计算平均值 $avg
MongoDB Enterprise > db.test4.find() { "_id" : ObjectId("5b1b1c29637751cda1184313"), "no" : "F1", "name" : "Jack", "age" : 30 } { "_id" : ObjectId("5b1b1c3d637751cda1184314"), "no" : "F2", "name" : "Jack", "age" : 20 } { "_id" : ObjectId("5b1b1c59637751cda1184315"), "no" : "F3", "name" : "Lucy", "age" : 22 } { "_id" : ObjectId("5b1b1c73637751cda1184316"), "no" : "F4", "name" : "Lucy", "age" : 24 } { "_id" : ObjectId("5b1b1c9c637751cda1184317"), "no" : "F5", "name" : "lily", "age" : 18 } { "_id" : ObjectId("5b1b1d23637751cda1184318"), "no" : "F6", "name" : "lily", "age" : 20 } //计算相同名字的平均年龄 MongoDB Enterprise > db.test4.aggregate([{$group:{_id:"$name",avgage:{$avg:"$age"}}}]) { "_id" : "lily", "avgage" : 19 } { "_id" : "Lucy", "avgage" : 23 } { "_id" : "Jack", "avgage" : 25 }
我们再来看看当我们集合中年龄不是数字是字符串的时候会是什么情况
MongoDB Enterprise > db.test.find() { "_id" : ObjectId("5b1a3e82637751cda118430d"), "no" : "F1", "name" : "Jack", "age" : "30" } { "_id" : ObjectId("5b1a3e94637751cda118430e"), "no" : "F2", "name" : "Lucy", "age" : "20" } { "_id" : ObjectId("5b1a3eab637751cda118430f"), "no" : "F3", "name" : "Lily", "age" : "22" } { "_id" : ObjectId("5b1a3f74637751cda1184310"), "no" : "F4", "name" : "Lily", "age" : "26" } { "_id" : ObjectId("5b1a3f88637751cda1184311"), "no" : "F5", "name" : "Lucy", "age" : "28" } { "_id" : ObjectId("5b1a3fe0637751cda1184312"), "no" : "F6", "name" : "Jack", "age" : "24" } MongoDB Enterprise > db.test.aggregate([{$group:{_id:"$name",age:{$avg:"$age"}}}]) { "_id" : "Lily", "age" : null } { "_id" : "Lucy", "age" : null } { "_id" : "Jack", "age" : null }
我们看到年龄的平均结果是null,这时因为我我们存入集合中的年龄是字符串,不是数字,也就是说
$avg计算平均值只会计算数字,不会计算字符串的。
4、求最大和最小值 $max和$min
MongoDB Enterprise > db.test4.find() { "_id" : ObjectId("5b1b1c29637751cda1184313"), "no" : "F1", "name" : "Jack", "age" : 30 } { "_id" : ObjectId("5b1b1c3d637751cda1184314"), "no" : "F2", "name" : "Jack", "age" : 20 } { "_id" : ObjectId("5b1b1c59637751cda1184315"), "no" : "F3", "name" : "Lucy", "age" : 22 } { "_id" : ObjectId("5b1b1c73637751cda1184316"), "no" : "F4", "name" : "Lucy", "age" : 24 } //按姓名分组,取相同名字中最年龄最大的 MongoDB Enterprise > db.test4.aggregate([{$group:{_id:"$name",maxage:{$max:"$age"}}}]) { "_id" : "Lucy", "maxage" : 24 } { "_id" : "Jack", "maxage" : 30 } //取相同名字中最年龄最小的 MongoDB Enterprise > db.test4.aggregate([{$group:{_id:"$name",minage:{$min:"$age"}}}]) { "_id" : "Lucy", "minage" : 22 } { "_id" : "Jack", "minage" : 20 }
5、取第一或最后一笔 $first或$last
MongoDB Enterprise > db.test4.find() { "_id" : ObjectId("5b1b1c59637751cda1184315"), "no" : "F3", "name" : "Lucy", "age" : 22 } { "_id" : ObjectId("5b1b1c73637751cda1184316"), "no" : "F4", "name" : "Lucy", "age" : 24 } { "_id" : ObjectId("5b1b1c9c637751cda1184317"), "no" : "F5", "name" : "lily", "age" : 18 } { "_id" : ObjectId("5b1b1d23637751cda1184318"), "no" : "F6", "name" : "lily", "age" : 20 } //名字相同的,取最先存入集合的号码 MongoDB Enterprise > db.test4.aggregate([{$group:{_id:"$name",firstno:{$first:"$no"}}}]) { "_id" : "lily", "firstno" : "F5" } { "_id" : "Lucy", "firstno" : "F3" } //取最后写入集合的号码 MongoDB Enterprise > db.test4.aggregate([{$group:{_id:"$name",lastno:{$last:"$no"}}}]) { "_id" : "lily", "lastno" : "F6" } { "_id" : "Lucy", "lastno" : "F4" }
6、条件判断 $cond 和 if then语句
MongoDB Enterprise > db.test4.find() { "_id" : ObjectId("5b1b1c29637751cda1184313"), "no" : "F1", "name" : "Jack", "age" : 30 } { "_id" : ObjectId("5b1b1c3d637751cda1184314"), "no" : "F2", "name" : "Jack", "age" : 20 } { "_id" : ObjectId("5b1b1c59637751cda1184315"), "no" : "F3", "name" : "Lucy", "age" : 22 } { "_id" : ObjectId("5b1b1c73637751cda1184316"), "no" : "F4", "name" : "Lucy", "age" : 24 } //判断年龄如果大于等于21的,改为10,否则改为12 MongoDB Enterprise > db.test4.aggregate([{$project:{no:1,name:1,changeage:{$cond:{if:{$gte:["$age",21]},then:10,else:12}}}}]) { "_id" : ObjectId("5b1b1c29637751cda1184313"), "no" : "F1", "name" : "Jack", "changeage" : 10 } { "_id" : ObjectId("5b1b1c3d637751cda1184314"), "no" : "F2", "name" : "Jack", "changeage" : 12 } { "_id" : ObjectId("5b1b1c59637751cda1184315"), "no" : "F3", "name" : "Lucy", "changeage" : 10 } { "_id" : ObjectId("5b1b1c73637751cda1184316"), "no" : "F4", "name" : "Lucy", "changeage" : 10 } //注意mongodb条件操作符: $gt:大于>; $lt:小于<; $gte:大于等于>=; $lte:小于等于<=
7、拆解 $unwind
MongoDB Enterprise > db.test1.find() { "_id" : ObjectId("5b1b7faf21ba435d0e6384e9"), "no" : "1", "author" : "Jack", "books" : [ "book1", "book2", "book3" ] } MongoDB Enterprise > db.test1.findOne() { "_id" : ObjectId("5b1b7faf21ba435d0e6384e9"), "no" : "1", "author" : "Jack", "books" : [ "book1", "book2", "book3" ] } //将数组里的值拆解到每个文档中 MongoDB Enterprise > db.test1.aggregate([{$unwind:"$books"}]) { "_id" : ObjectId("5b1b7faf21ba435d0e6384e9"), "no" : "1", "author" : "Jack", "books" : "book1" } { "_id" : ObjectId("5b1b7faf21ba435d0e6384e9"), "no" : "1", "author" : "Jack", "books" : "book2" } { "_id" : ObjectId("5b1b7faf21ba435d0e6384e9"), "no" : "1", "author" : "Jack", "books" : "book3" }
三、总结
MongoDB聚合管道操作是可以重复的,在当前的管道处理后传递给下个管道,每个阶段的管道限制100MB,超过的则会报错,
还有aggregate()可以用在分片集合,但结果不能再分片集合输出,而MongoDB的MapReduce可以用在分片集合结果也能在分片集合输出,
下次文章分享我们再一起来学习MongoDB的MapReduce用法。