From Data to Insights: Understanding MongoDB's Aggregate Framework

7 min read

When it comes to storing data in MongoDB or any other place, one of the biggest challenges is figuring out how that data will be accessed in the future. What kinds of reports will you need to run? What valuable insights are hidden in your data that could help your business grow? After carefully designing your data schema to fit your application’s needs, the next step is retrieving that data effectively.

In MongoDB, there are two primary ways to access your data: through the find() command and the aggregate() command.

Using find() command

The find() command lets you search for documents based on specific conditions. With it, you can:

Filter results to get exactly what you need.
Make basic changes to the documents you retrieve.
Sort your results for better organization.
Limit the number of documents returned to keep your output manageable.

Entering the World of Aggregation

On the other hand, the aggregate() command opens up a whole new level of data analysis through the aggregation framework. But you might wonder, why bother with aggregation in MongoDB at all?

Why Aggregate with MongoDB?

Aggregation operations in MongoDB allow you to process and analyze your data in powerful ways. They enable you to group values from multiple documents and perform calculations on this grouped data, ultimately returning meaningful results. This feature is incredibly useful when you need to perform analytics across a cluster of servers.

By using the built-in aggregation operators in MongoDB, you can gain valuable insights from your data that help drive business decisions.

db.collection.aggregate(pipeline: [], options: {}) // in this tutorial we only focus on the pipeline stages

There are many aggregation stages available in the mongoDB. We will use most common and frequently used stages. Like: $lookup, $unwind, $group, $sort, $project, $limit and $skip

So, now let’s show an easy example to understand how does aggregate pipeline works.

Example

Suppose, we have 2 collections inside our mongoDB database. First one is authors and second one is todos. Like this:

// authors collection
[
    { _id: 101, name: "Ron", email: "ron@example.com", age: 39 },
    { _id: 102, name: "Taslima", email: "taslima@example.com", age: 35 }
    { _id: 103, name: "Andres", email: "andres@example.com", age: 37 }
]

// todos collection
[
    { _id: 1, title: "Scroll Instagram feed", isComplete: false, authorId: 101 },
    { _id: 2, title: "Use sister's Instagram account", isComplete: true, authorId: 101 },
    { _id: 3, title: "Cook dinner", isComplete: true, authorId: 102 },
    { _id: 4, title: "Wash dishes", isComplete: true, authorId: 102 },
    { _id: 5, title: "Play football", isComplete: true, authorId: 103 }
    { _id: 6, title: "Destroy hater's peace", isComplete: true, authorId: 103 }
    { _id: 7, title: "Sleep", isComplete: false, authorId: 103 }
]

Now, we want to get the author details like: name, email, age totalTodoCompleted and todoList of those todos which are completed by authors

// expected result:
[
	{
		_id: 101, // author id
		name: "Ron", // author name
		email: "ron@gmail.com", // author email
		age: 39, // age is just a number
		totalTodoCompleted: 1, // total todo completed by author
		todos: [
			// all completed todos
			{ _id: 2, title: "Use sister's Instagram account" },
		],
	},
	// other author's items ...
];

How can we achieve that? Of course we can get that using find() command if we uses any mongoDB driver or ODM like mongoose but it will required more than one query and finally we bind all the data together and use it, the performance will drop and in mongoshell it is not possible either. So aggregate() will be our most efficient way to get this transformed data without loosing control over data and performance.

Step 1: $match to filter out completed todos

db.todos.aggregate([
	{
		$match: {
			isComplete: true, // condition
		},
	},
]);

This will filter out those data which are not matched with the condition.

Step 2: $lookup to populate author’s data from authors collection

db.todos.aggregate([
    {
        $match: {
            isComplete: true,
        },
        {
			$lookup: {
				from: "authors", // collection name
				localField: "authorId", // the field from the todos collection
				foreignField: "_id", // the field from the authors collection
				as: "author", // output array field name
			},
		},
    },
])

// expected output
[
	{
		_id: 2,
		title: "Use sister's Instagram account",
		isComplete: true,
		authorId: 101,
		author: [{ _id: 101, name: "Ron", email: "ron@example.com", age: 39 }],
	},
	{
		_id: 3,
		title: "Cook dinner",
		isComplete: true,
		authorId: 102,
		author: [{ _id: 3, title: "Cook dinner", isComplete: true, authorId: 102 }],
	},
	// .......
];

Step 3: $unwind to flat the author array

db.todos.aggregate([
    {
        $match: {
            isComplete: true,
        },
        {
			$lookup: {
				from: "authors",
				localField: "authorId",
				foreignField: "_id",
				as: "author",
			},
		},
        {
			$unwind: "$author",
		},
    },
])

// expected output
[
	{
		_id: 2,
		title: "Use sister's Instagram account",
		isComplete: true,
		authorId: 101,
		author: { _id: 101, name: "Ron", email: "ron@example.com", age: 39 }, // array to object
	},
	{
		_id: 3,
		title: "Cook dinner",
		isComplete: true,
		authorId: 102,
		author: { _id: 3, title: "Cook dinner", isComplete: true, authorId: 102 }, // array to object
	},
	// .......
];

Step 4: $group by author

db.todos.aggregate([
    {
        $match: {
            isComplete: true,
        },
        {
			$lookup: {
				from: "authors",
				localField: "authorId",
				foreignField: "_id",
				as: "author",
			},
		},
        {
			$unwind: "$author",
		},
        {
            $group: {
                _id: "$authorId", // group by author id
                name: { $first: "$author.name" }, // as we creating a group with todos, so we can choose the first name, email and age from author from the sub-group. as the second name email and age will be the same cause it's under the same sub-group
                email: { $first: "$author.email" },
                age: { $first: "$author.age" },
                totalTodoCompleted: { $sum: 1 }, // increment (+1) by 1 for each todo
                todos: { $push: { _id: "$_id", title: "$title" } } // push all the todos with specific fields
            }
        }
    },
])

// expected output
[
	{
		_id: 101, // author id
		name: "Ron", // author name
		email: "ron@gmail.com", // author email
		age: 39, // age is just a number
		totalTodoCompleted: 1, // total todo completed by author
		todos: [
			// all completed todos
			{ _id: 2, title: "Use sister's Instagram account" },
		],
	},
	// other author's items ...
];

Step 5: $project (Optional) (select fields to show)

db.todos.aggregate([
    {
        $match: {
            isComplete: true,
        },
        {
			$lookup: {
				from: "authors",
				localField: "authorId",
				foreignField: "_id",
				as: "author",
			},
		},
        {
			$unwind: "$author",
		},
        {
            $group: {
                _id: "$authorId",
                name: { $first: "$author.name" },
                email: { $first: "$author.email" },
                age: { $first: "$author.age" },
                totalTodoCompleted: { $sum: 1 },
                todos: { $push: { _id: "$_id", title: "$title" } }
            }
        },
        {
            $project: {
                // show only those fields in the final result
				_id: 1,
				name: 1,
				email: 1,
				totalTodoCreated: 1,
				todos: 1,
			},
        }
    },
])

// expected output without age field
[
	{
		_id: 101, // author id
		name: "Ron", // author name
		email: "ron@gmail.com", // author email
		totalTodoCompleted: 1, // total todo completed by author
		todos: [
			// all completed todos
			{ _id: 2, title: "Use sister's Instagram account" },
		],
	},
	// other author's items ...
];

Step 6: $limit and $skip (Final code)

db.todos.aggregate([
    {
        $match: {
            isComplete: true,
        },
        {
			$lookup: {
				from: "authors",
				localField: "authorId",
				foreignField: "_id",
				as: "author",
			},
		},
        {
			$unwind: "$author",
		},
        {
            $group: {
                _id: "$authorId",
                name: { $first: "$author.name" },
                email: { $first: "$author.email" },
                age: { $first: "$author.age" },
                totalTodoCompleted: { $sum: 1 },
                todos: { $push: { _id: "$_id", title: "$title" } }
            }
        },
        {
            $project: {
				_id: 1,
				name: 1,
				email: 1,
				totalTodoCreated: 1,
				todos: 1,
			},
        },
        {
            // Step to skip documents for pagination
            $skip: (1 - 1) * 5 // Skip the number of documents based on the page number
            // (page - 1) * limit
        },
        {
            // Step to limit the number of documents returned
            $limit: 5 // Limit the results to the specified page size
        }
    },
])

$skip and $limit will truncate the data based on their data

Short summary of every steps

$lookup: Joins todos with authors based on authorId.
$unwind: Flattens the author array to have one author per todo.
$group: Groups todos by authorId, counts the total todos, and collects todos into an array.
$project: (Optional) Restructures the output to show only the necessary fields.
$skip and $limit: Limit and skip the results to the specified values

This complete pipeline allows you to get a summarized view of todos grouped by authors along with their details.