Since entities.user_mentions is an array, you want to emit a value for each screen_name in map():
var map = function() {
    // Emit one { count: 1 } value per mentioned screen_name
    this.entities.user_mentions.forEach(function(mention) {
        emit(mention.screen_name, { count: 1 });
    });
};
Then count the values for each unique screen_name in reduce():
var reduce = function(key, values) {
    // NB: reduce() uses same format as results emitted by map()
    var result = { count: 0 };
    values.forEach(function(value) {
        result.count += value.count;
    });
    return result;
};
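To see why reduce() has to return the same shape that map() emits, here is a minimal sketch that runs the two functions above in plain JavaScript, outside MongoDB. The sample documents, the emit() stand-in, and the driver loop are assumptions made for illustration. The real server decides how values are batched, may skip reduce() for keys with a single value, and may feed earlier reduce() results back in as input.

```javascript
// Hypothetical sample documents shaped like the tweets in the question.
var tweets = [
  { entities: { user_mentions: [{ screen_name: "alice" }, { screen_name: "bob" }] } },
  { entities: { user_mentions: [{ screen_name: "alice" }] } },
  { entities: { user_mentions: [] } }
];

// Same map() and reduce() as in the answer.
var map = function() {
  this.entities.user_mentions.forEach(function(mention) {
    emit(mention.screen_name, { count: 1 });
  });
};

var reduce = function(key, values) {
  var result = { count: 0 };
  values.forEach(function(value) {
    result.count += value.count;
  });
  return result;
};

// Stand-in for the server-side emit(): collect emitted values per key.
var emitted = {};
function emit(key, value) {
  (emitted[key] = emitted[key] || []).push(value);
}

// Run map() once per document, with the document bound to `this`.
tweets.forEach(function(doc) {
  map.call(doc);
});

// Run reduce() once per key (a simplification of what the server does).
var counts = {};
Object.keys(emitted).forEach(function(key) {
  counts[key] = reduce(key, emitted[key]);
});

// Re-reduce: a partial result mixed with a fresh emitted value still sums correctly,
// because both have the { count: N } shape.
var partial = reduce("alice", [{ count: 1 }]);
var rereduced = reduce("alice", [partial, { count: 1 }]);

console.log(counts);    // alice has count 2, bob has count 1
console.log(rereduced); // count is 2
```

The third sample document has an empty user_mentions array, so it emits nothing and does not show up in the result.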
Note: to debug your map/reduce JavaScript functions, you can use the print() and printjson() commands. The output will appear in your mongod log.
EDIT: For comparison, here is an example using the new Aggregation Framework (http://docs.mongodb.org/manual/reference/aggregation/) in MongoDB 2.2:
db.twitter_sample.aggregate(
    // Project to limit the document fields included
    { $project: {
        _id: 0,
        "entities.user_mentions" : 1
    }},

    // Split user_mentions array into a stream of documents
    { $unwind: "$entities.user_mentions" },

    // Group and count the unique mentions by screen_name
    { $group : {
        _id: "$entities.user_mentions.screen_name",
        count: { $sum : 1 }
    }},

    // Optional: sort by count, descending
    { $sort : {
        "count" : -1
    }}
)
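If the pipeline stages are unfamiliar, the following rough sketch mimics what each one does using ordinary JavaScript arrays. It is only an illustration of the data flow, not how the server implements aggregation, and the sample documents and variable names are assumptions.

```javascript
// Hypothetical sample documents; only the fields the pipeline touches.
var sampleTweets = [
  { _id: 1, entities: { user_mentions: [{ screen_name: "alice" }, { screen_name: "bob" }] } },
  { _id: 2, entities: { user_mentions: [{ screen_name: "alice" }] } },
  { _id: 3, entities: { user_mentions: [] } }
];

// $project: keep only entities.user_mentions and drop _id.
var projected = sampleTweets.map(function(doc) {
  return { entities: { user_mentions: doc.entities.user_mentions } };
});

// $unwind: one output document per array element; empty arrays produce none.
var unwound = [];
projected.forEach(function(doc) {
  doc.entities.user_mentions.forEach(function(mention) {
    unwound.push({ entities: { user_mentions: mention } });
  });
});

// $group with { $sum: 1 }: count documents per screen_name.
var totals = {};
unwound.forEach(function(doc) {
  var name = doc.entities.user_mentions.screen_name;
  totals[name] = (totals[name] || 0) + 1;
});

// $sort: order the grouped results by count, descending.
var grouped = Object.keys(totals).map(function(name) {
  return { _id: name, count: totals[name] };
}).sort(function(a, b) {
  return b.count - a.count;
});

console.log(grouped); // alice (count 2) first, then bob (count 1)
```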
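The original Map/Reduce approach is best suited to a large data set, as Twitter data implies. For a comparison of the limitations of Map/Reduce and the Aggregation Framework, see the related discussion on the StackOverflow question "MongoDB group(), $group and MapReduce" (https://stackoverflow.com/questions/12337319/mongodb-group-group-and-mapreduce/12340283#12340283).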