Aggregation In MongoDB

In this article, we will discuss the aggregation framework commands in MongoDB. In the previous article, we have discussed related to the index in MongoDB. Now, in this article, we will discuss the aggregation framework in MongoDB and also how we can use this in our data searching query. So, Data Aggregation is one kind of process where all type of information is collected and then represent that information in a summary format. The main purpose of the data aggregation is the analysis work like statistical analysis, comparison analysis, computations etc.

Welcome back to the MongoDB article series - Part 7. In the previous article, we discussed the advanced index concept in MongoDB. We have also discussed capped collections, TTL index, Full-Text Index etc. Now, in this article, we will discuss how to implement the aggregation framework concepts in a MongoDB Database. If you want to read the previous articles of this series, then follow the links.

In the previous article, we have discussed the index in MongoDB. Now, in this article, we will discuss the aggregation framework in MongoDB and also, we will see how we can use this in our data searching query. So, Data Aggregation is one kind of process where all type of information is collected and then represented in a summary format. The main purpose of the data aggregation is doing the analysis work like statistical analysis, comparison analysis, computations etc.

Aggregation Frameworks

In MongoDB, aggregation framework is mainly used for displaying results for the statistical and analytical analysis of data. This framework always provides us with the transformation of a document within a collection. Basically, the aggregation framework builds a pipeline which processes the data through several blocks of aggregation framework like – filtering, projecting, grouping, sorting, limiting and skipping. This operation always processes the data from the documents and returns the projected results. This framework groups all the data from the different multiple documents together and then performs a variety of operations on this grouped data to return a single result.

The aggregate framework always returns an array as a result of the documents. This makes aggregation framework as an alternative knowledge flow pipeline. It actually acts as a process flow. This framework first performs the transformation of documents and then combines the documents within a collection and returns the output. Every operator in an aggregation framework receives a stream of documents, then performs some kind of transformation on these documents, and then passes on the results of the transformation to the users. If it's the last pipeline operator, then the result returns to the user directly. Otherwise, the result is passed to the next transformation process as an input. Operators may be combined in any order and recur again and again as necessary. For instance, you'll perform "$match", "$group", and then again "$match" once more with totally different criteria.

The MongoDB aggregation pipeline always consists of stages. Every stage transforms the documents as they submit it to the pipeline. Pipeline stages don't get to turn out one output document for each input document, e.g. some stages might generate new documents or filtered documents as a result. Pipeline stages will execute multiple times within the pipeline. MongoDB provides the db.collection.aggregate() methodology within the data collections and therefore the combination command for aggregation pipeline.

$Project

Projection is one of the additional powerful syntax within the aggregation pipeline. It is mainly used in the aggregation pipeline rather than using it as a “normal” command syntax. "$project" permits you to extract fields from subdocuments, rename fields, and perform attention-grabbing operations on them. The simplest operation "$project" will perform is solely choosing fields from your incoming documents to incorporate or exclude a field, use identical syntax you have used in the second argument of a command. In the below example, the syntax will return only those documents as a result which contains the field "author" within the collection.

By default, "_id" is often came if it exists within the incoming document (some pipeline operators take away the "_id" or it are often removed in an exceedingly former projection). You can exclude it as shown in the below example. Inclusion and exclusion rules generally work identically as same as “normal” queries. You can also additionally rename the projected field. For instance, if you needed to come back the "_id" of each user as you can do,
  1. db.articles.aggregate({"$project" : {"author" : 1, "_id" : 0}})  

In the above syntax, “author” is used to represent the fieldname value within the aggregation framework. For example, "$age" would get replaced with the contents of the age field (and most likely be variety, not a string) and "$tags.3" would get replaced with the fourth component of the tags array. Thus, "$_id" is replaced by the "_id" field of every document coming back through the pipeline. Note that we just need to specifically exclude "_id" field so that the it will not return as part of the result to the user. We'll be able to use this syntax to make multiple copies of a field for later use in an exceedingly "$group", say.

  1. db.users.aggregate({"$project" : {"userId" : "$_id""_id" : 0}})  
The simplest "$project" expressions are unit inclusion, exclusion, and rename field names ("$fieldname"). However, there are also several options available under $project syntax including the different types of expressions. There are several expressions available with aggregation which you can combine and nest to any depth to create more complex expressions.
  1. Mathematical Expression 
    Arithmetic expressions allow you to manipulate numeric values. You typically use these expressions by specifying an associate array of numbers to control on. For example, the subsequent expression will calculate the total value of the "salary" and "bonus" fields. These are the syntax of each expression.

    • $add (expr1[, expr2, ...]) - Accepts one or more expressions and adds them together
    • $subtract (expr1, expr2) - Accepts two expressions and subtracts the second from the first.
    • $multiply (expr1[, expr2, ...]) - Takes one or more expressions and multiplies them together.
    • $divide (expr1, expr2) - Takes two expressions and divides the first by the second.
    • $mod (expr1, expr2) - Takes two expressions and returns the remainder of dividing the first by the second.

  2. Date Expression
    Sometimes, we need to performs aggregations command on the time-based unit value like - What was happening last week? Or Last month? Or Over the last year? Therefore, aggregation features a set of expressions that may be used to extract date info in some additional helpful ways like "$year", "$month", "$week", "$dayOfMonth", "$dayOfWeek", "$dayOfYear", "$hour", "$minute", and "$second". So we can solely use date operations on fields which basically contains date type value. 

  3. String Expression
    There are a couple of basic string operations which we need to perform some times. Their signatures are as follows,
    • $substr (expr, startOffset, numToReturn)
      This returns a substring of the primary argument, beginning at the startOffset-th computer memory unit and together with consequent numToReturn bytes (note that this is often measured in bytes, not characters, therefore multibytes encodings can ought to watch out of this). expr must be value to a string.

    • $concat (expr1[, expr2, ..., exprN])
      Concatenates each string expression (or string) given.

    • $toLower (expr)
      Returns the string in lower case. expr must evaluate to a string.

    • $toUpper (expr)
      Returns the string in upper case. expr must evaluate to a string

Sample Example of $add
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "totalPay": {  
  4.             "$add": ["$salary""$allowances"]  
  5.         }  
  6.     }  
  7. })  
Sample Example of $subtract
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "totalPay": {  
  4.             "$subtract": [{  
  5.                 "$add": ["$salary""$bonus"]  
  6.             }, "$401k"]  
  7.         }  
  8.     }  
  9. })  
Sample Example of Date expression
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "hiredIn": {  
  4.             "$month""$joinDate"  
  5.         }  
  6.     }  
  7. })  
Sample Example of $substr
  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "email": {  
  4.             "$concat": [{  
  5.                 "$substr": ["$firstName", 0, 1]  
  6.             }, ".""$lastName""@example.com"]  
  7.         }  
  8.     }  
  9. }) 

$Match

$match syntax basically filters the documents so that we'll be able to run associate aggregation commands on a set of documents. For example, if you simply need to search out the documents concerning users of a specific State, you may add a "$match" expression like,

  1. {$match : {"state" : "ORL"}}  
"$match" will use all of the usual question operators ("$gt", "$lt", "$in", etc.). One notable exception is that we cannot use geospatial operators during a "$match". Generally, smart observe is to place "$match" expressions as early as potential within the pipeline. This has two benefits,
  1. It permits you to separate out needless documents quickly to provide the faster performance in the pipeline.
  2. It will use indexes if it's run before any projections or groupings.

$group

Grouping is the process mechanism which basically give us the opportunity to group or bundle documents on the basis of some fields and also can combine the values of those documents. Some samples of groupings in the database level are -

  • If we have a tendency to have per-minute measurements and we wish to search out the common humidity per day, we might group by the "day" field.
  • If we have a tendency to have a set of scholars and we wish to prepare students into teams based on grade, we have a tendency to group by their "grade" field.
  • If we have a tendency to have a set of users and we wish to check what percentage of users we have by town, we may cluster by each of the "state" and "city" fields, making one cluster per city/ state pair. 

When we opt for a field or fields to group by, then we need to pass it to the "$group" syntax which perform as the group’s "_id" field. Thus, for the examples higher than, we have,

  1. {"$group" : {"_id" : "$day"}}  
  2. {"$group" : {"_id" : {"state" : "$state""city" : "$city"}}}  

In MongoDB, $group aggregation function contains some group operator which will actually perform the basic aggregation operations. There are two types of operators in the $group – one is Arithmetic Operator and another is Extreme Operator. The Arithmetic operator contains $sum and $avg operators. Whereas the Extreme operators contains $max, $min, $first, $last operators. 

  1. $sum
    This operator adds value for each document and returns the result. Also, although the example above used a literal (1), this can also take more complex values and returns the result.

  2. $avg
    Returns the average value of all input documents during the group.

  3. $max
    Returns the maximum values within the documents.

  4. $min
    Returns the minimum values within the documents

  5. $first
    Returns the first value within the group. 

  6. $last
    Returns the last value within the group.
Sample Example of $sum
  1. db.sales.aggregate({  
  2.     "$group": {  
  3.         "_id""$country",  
  4.         "totalRevenue": {  
  5.             "$sum""$revenue"  
  6.         }  
  7.     }  
  8. })  
Sample Example of $avg
  1. db.sales.aggregate({  
  2.     "$group": {  
  3.         "_id""$country",  
  4.         "totalRevenue": {  
  5.             "$average""$revenue"  
  6.         },  
  7.         "numSales": {  
  8.             "$sum": 1  
  9.         }  
  10.     }  
  11. })  
Sample Example of $last
  1. db.scores.aggregate({  
  2.     "$group": {  
  3.         "_id""$grade",  
  4.         "lowestScore": {  
  5.             "$min""$score"  
  6.         },  
  7.         "highestScore": {  
  8.             "$max""$score"  
  9.         }  
  10.     }  
  11. }) 

$sort

We can perform any type of sort operation either on a particular field or on multiple fields of a document by using the $sort syntax. If we need to sort a large volume of documents, then it is always highly recommended that first we will sort the data in required order at the beginning of the pipeline and apply an index on it. Otherwise, it will take much more time and in that way, the process response will be slow or delayed. $sort is another example of roadblock operations just like a $group. Since, in this process, it needs to arrange all the records accordingly to sort the records in a specified process, then it needs to sort all the records and then only will it send the individual shards to the mongo results for further processing.

  1. db.employees.aggregate({  
  2.     "$project": {  
  3.         "compensation": {  
  4.             "$add": ["$salary""$bonus"]  
  5.         },  
  6.         "name": 1  
  7.     }  
  8. }, {  
  9.     "$sort": {  
  10.         "compensation": -1,  
  11.         "name": 1  
  12.     }  
  13. })  
Conclusion

In this article, we discussed some aggregation framework syntax like a $match, $group, $sort, $project etc. These syntaxes are basically used for performing many types of mathemtical or analysis operations. In the next article, we will discuss some more syntaxes of the aggregation framework in MongoDB. I hope this article will help you. Any feedback or queries related to this article are most welcome.