Understanding Projections In LINQ With Select & SelectMany Enumerable Methods

In this article I will try to explain how to use projections in LINQ with a collection of objects.

In Language Integrated Query (LINQ) we use the Select and SelectMany methods to project the data in a collection. As the name suggests, projection methods shape the data in much the same way as a SELECT clause in SQL fetches specific columns of a table. We use Select to transform each element of a collection, and SelectMany when each element itself contains a collection that we want to flatten into a single sequence.
I will try to explain both these methods with examples.
Part 1: Select  

OverLoad 1: Enumerable.Select<TSource, TResult> Method (IEnumerable<TSource>, Func<TSource, TResult>)
This is the first overload of the Select method. Don't be confused or scared by the method definition; lambda expressions make it much easier to use than it looks :) Here is the definition of this method:
    public static IEnumerable<TResult> Select<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TResult> selector
    )
If we look at the method closely, the first parameter (marked with this) simply means that the method can be called on any object that implements the IEnumerable<T> interface (if this point is not clear, please check my article on Extension methods). The second parameter is a Func delegate that takes an input of type TSource, the element type of the collection on which we invoke the method, and returns a value of type TResult, which may be the same type or a different one. For more information about the Func delegate, do read Sachin's article here.
I know many of us find all this intimidating when reading it for the first time. But trust me, once we have a good command of lambda expressions, the syntax of a Func delegate won't scare us. :-)
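To make the Func parameter less abstract, here is a minimal sketch that passes the selector once as an explicit Func<int, string> variable and once as an inline lambda. The class name SelectorDemo and the helper Describe are made up for this illustration; the point is only that TSource (int) and TResult (string) do not have to be the same type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorDemo
{
    // Hypothetical helper: the selector is stored in an explicit Func<int, string>.
    public static IEnumerable<string> Describe(IEnumerable<int> source)
    {
        Func<int, string> selector = n => "Number " + n;   // TSource = int, TResult = string
        return source.Select(selector);
    }

    public static void Main()
    {
        // The same projection with the lambda written inline.
        IEnumerable<string> inline = new[] { 1, 2, 3 }.Select(n => "Number " + n);

        Console.WriteLine(string.Join(", ", inline));                       // Number 1, Number 2, Number 3
        Console.WriteLine(string.Join(", ", Describe(new[] { 1, 2, 3 })));  // Number 1, Number 2, Number 3
    }
}
```

Both calls bind to the same overload; the lambda is just a shorter way of writing the delegate.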
So, I will start with examples since that will help us understand these extension methods easily. Please note I will not display the syntax of each overloaded version, because that is easily accessible on MSDN.
Example 1: Suppose I have an array of integers like this:
    int[] numbers = { 2, 55, 67, 82, 99, 13 };
What I want is to divide each number in this array by 2 and return the resulting integer array. Here is the code for this:
    int[] result = numbers.Select(x => x / 2).ToArray();
Yes, that's it! We are using the LINQ projection method Select here. Because we are applying the method to an array of type int, the input parameter of the Func is automatically inferred as int, and Visual Studio IntelliSense shows the selector as Func<int, TResult>. Here x is each integer element in the array, so we can think of this as looping through the int array with a lambda expression and dividing each number by 2. Since both operands are integers, the division discards any remainder. We will get the following output:
    1, 27, 33, 41, 49, 6
 Example 2: Suppose we have a list of strings as follows:
    List<string> fruits = new List<string> { "Apple", "Banana", "Mango", "Grapes", "Lemon" };
Now, I want to display each fruit (string) in upper case. We will use the LINQ Select method again, and this time "x" will be of type string because we are applying the method to a List of strings. Since x is a string (each fruit in the List), we can use all the built-in methods available on a string. Here is the query:
    List<string> upperCaseFruits = fruits.Select(x => x.ToUpper()).ToList();
We will get the following output:
    APPLE, BANANA, MANGO, GRAPES, LEMON

 OverLoad 2: Enumerable.Select<TSource, TResult> Method (IEnumerable<TSource>, Func<TSource, Int32, TResult>)
Again, don't be confused by the Func. We are just passing a second parameter of type int, which is simply the zero-based index of the element in the source.
Example 3: Suppose we have the same integer array that we used in Example 1. Now we want to create an integer array in which each number is multiplied by its index in the array, in other words 2 * 0, 55 * 1, 67 * 2 and so on. Here is the LINQ query to do this:
    int[] result = numbers.Select((x, i) => x * i).ToArray();
Here, x denotes the number and i denotes its index. We will get the following output:
    0, 55, 134, 246, 396, 65
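As a slightly more everyday use of the index overload, the sketch below numbers the items of a list. The class IndexedSelectDemo and the helper Number are hypothetical names chosen for this illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedSelectDemo
{
    // Hypothetical helper: prefixes each item with its one-based position.
    public static List<string> Number(IEnumerable<string> items)
    {
        // index is zero-based, so add 1 for display purposes.
        return items.Select((item, index) => (index + 1) + ". " + item).ToList();
    }

    public static void Main()
    {
        var fruits = new List<string> { "Apple", "Banana", "Mango" };
        foreach (string line in Number(fruits))
        {
            Console.WriteLine(line);   // 1. Apple, then 2. Banana, then 3. Mango
        }
    }
}
```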
Example 4: We can manipulate the index any way we want. If Example 3 doesn't seem practical, check my article "Sum-Up Values at same Index", a classic example where index manipulation is very important.
Part 2: SelectMany

OverLoad 1: Enumerable.SelectMany<TSource, TResult> Method (IEnumerable<TSource>, Func<TSource, IEnumerable<TResult>>)
SelectMany projects each element of a sequence to an IEnumerable<T> and then flattens the resulting sequences into one sequence. If this statement is not clear, don't worry; it will become clearer with the examples. The main purpose of SelectMany is to flatten a collection of collections into a single collection.
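One way to picture the flattening is as two nested foreach loops. The sketch below is not how the framework implements SelectMany; FlattenByHand is a hypothetical helper written only to show the idea, next to a comparison of what Select and SelectMany return for the same nested list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Hand-written equivalent of source.SelectMany(inner => inner).
    public static IEnumerable<T> FlattenByHand<T>(IEnumerable<IEnumerable<T>> source)
    {
        foreach (IEnumerable<T> inner in source)
        {
            foreach (T item in inner)
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        var nested = new List<List<int>>
        {
            new List<int> { 1, 2 },
            new List<int> { 3 }
        };

        // Select keeps the nesting: two elements, each of them a list.
        Console.WriteLine(nested.Select(inner => inner).Count());          // 2
        // SelectMany removes one level of nesting: three plain integers.
        Console.WriteLine(nested.SelectMany(inner => inner).Count());      // 3
        Console.WriteLine(string.Join(", ", FlattenByHand<int>(nested)));  // 1, 2, 3
    }
}
```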
Example 5: Suppose we have a List of Lists of strings that holds some products like this:
    List<List<string>> products = new List<List<string>>
    {
        new List<string> { "Apple", "Banana", "Grapes" },
        new List<string> { "Coke", "Milk", "Fanta" },
        new List<string> { "Mobile", "TV", "Tablet" }
    };
Now, what we want is to extract a single List<string> that holds all the products. We will use SelectMany to flatten the list of lists into one list.
Please note that since we are applying the SelectMany method to products, which is of type List<List<string>>, the Func expects a List<string> as input (IntelliSense shows this too), and that makes sense because here our "x" is a List<string>. SelectMany projects each element to an IEnumerable<string> and then flattens the results, in other words combines the inner lists into a single list. Here is the LINQ query for it:
    List<string> allProducts = products.SelectMany(x => x).ToList();
    // Or, if we want, we can project the inner list too; both give the same result
    List<string> allProductsByProjection = products.SelectMany(x => x.Select(z => z)).ToList();
We will get the following output:
    Apple, Banana, Grapes, Coke, Milk, Fanta, Mobile, TV, Tablet
Example 6: Again, it's tough to come up with a scenario where we would actually use it, but in practice we may encounter many problems where this method is really handy. You can check the use of SelectMany in the same article that I pointed to in Example 4.

OverLoad 2: Enumerable.SelectMany<TSource, TResult> Method (IEnumerable<TSource>, Func<TSource, Int32, IEnumerable<TResult>>)
This overload is similar to the second overload of Select, since the Func takes one more parameter, an integer that denotes the index of the source element. Here the index is that of each inner collection within the outer collection, before the flattening takes place.
Example 7: Suppose we have a List of Lists of integers:
    List<List<int>> numbers = new List<List<int>>
    {
        new List<int> { 23, 45, 66 },  // Index 0
        new List<int> { 12, 88, 32 }   // Index 1
    };
Now, when we use the second overload, "x" will be a List<int>, and the index passed along with it is the zero-based position of that List<int> in the outer list, as denoted in the comments above.

Now, what we want is to generate a List of integers in which each number is increased by the index of the inner list it belongs to. Here is the LINQ query for this:
    List<int> result = numbers.SelectMany((x, i) => x.Select(z => z + i)).ToList();
This will generate the following output:
    23, 45, 66, 13, 89, 33
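To see where the index comes from, the query from Example 7 can be compared with plain loops. This is only a sketch; AddOuterIndex is a hypothetical helper that mimics the query by hand, with the outer loop counter playing the role of i.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexedFlattenDemo
{
    // Hand-written equivalent of source.SelectMany((x, i) => x.Select(z => z + i)).
    public static List<int> AddOuterIndex(List<List<int>> source)
    {
        var result = new List<int>();
        for (int i = 0; i < source.Count; i++)   // i is the index of the inner list
        {
            foreach (int z in source[i])
            {
                result.Add(z + i);
            }
        }
        return result;
    }

    public static void Main()
    {
        var numbers = new List<List<int>>
        {
            new List<int> { 23, 45, 66 },  // Index 0
            new List<int> { 12, 88, 32 }   // Index 1
        };

        List<int> viaLinq = numbers.SelectMany((x, i) => x.Select(z => z + i)).ToList();
        List<int> viaLoops = AddOuterIndex(numbers);

        Console.WriteLine(string.Join(", ", viaLinq));       // 23, 45, 66, 13, 89, 33
        Console.WriteLine(viaLinq.SequenceEqual(viaLoops));  // True
    }
}
```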

OverLoad 3: Enumerable.SelectMany<TSource, TCollection, TResult> Method (IEnumerable<TSource>, Func<TSource, IEnumerable<TCollection>>, Func<TSource, TCollection, TResult>)
In my opinion this is the best overload of all, because when we use SelectMany to flatten a collection property, its second selector lets us project each flattened item together with the original object it came from. I know this statement may not be clear, so let's go ahead and see the example.
Example 8: Suppose I have the following type:
    public class MonthlySales
    {
        public string Month { get; set; }
        public Dictionary<string, double> Sales { get; set; }
    }
Here, Sales will be a Dictionary collection with the key ProductName (String) and Value SalesAmount(Double).
I have some sample data as follows:
Month      Sales
June       Apple, 20; Grapes, 40
June       Banana, 10; Apple, 30
December   Mango, 10; Banana, 20
December   Banana, 40
Now, what I want is to calculate the total sales amount of each product for every month, so my final output should look like:
Month      Sales
June       Apple, 50
June       Grapes, 40
June       Banana, 10
December   Mango, 10
December   Banana, 60
So, basically we need to group the data by Month and ProductName, and then we can calculate the sum of SalesAmount. But remember that Sales is a Dictionary, in other words a collection of key-value pairs, inside each MonthlySales object. To flatten it we will use SelectMany, but we somehow need each flattened item to stay attached to the original object it belonged to. This is where the third overload is useful. Check the following query and its corresponding output:
    var result = monthlySales
        .SelectMany(x => x.Sales, (MonthlySalesObj, SalesObj) => new
        {
            MonthlySalesObj.Month,
            SalesObj
        });
Here, the first parameter flattens the Sales collection of each object. To attach each flattened item to the object it came from, we use the second parameter, which receives the original object (MonthlySalesObj) and one flattened item (SalesObj, a KeyValuePair retrieved by the first parameter) and projects them together. We will get the following output:
    Month      SalesObj
    June       [Apple, 20]
    June       [Grapes, 40]
    June       [Banana, 10]
    June       [Apple, 30]
    December   [Mango, 10]
    December   [Banana, 20]
    December   [Banana, 40]
Look at the beauty of this method: it has flattened the inner collection while each item still carries the data of its original object. Now, all that is left is to group by Month and ProductName (which is nothing but the Key in SalesObj) and sum the sales amount. Here is the complete query:
    List<MonthlySales> result = monthlySales
        .SelectMany(x => x.Sales, (MonthlySalesObj, SalesObj) => new
        {
            MonthlySalesObj.Month,
            SalesObj
        })
        .GroupBy(x => new { x.Month, ProductName = x.SalesObj.Key })
        .Select(x => new MonthlySales
        {
            Month = x.Key.Month,
            Sales = new Dictionary<string, double> { { x.Key.ProductName, x.Sum(z => z.SalesObj.Value) } }
        }).ToList();
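For anyone who wants to run Example 8 end to end, here is a self-contained sketch built from the sample table. It is a variation, not the exact query above: the hypothetical helper Totals returns the sums in a dictionary keyed by "Month/Product" (a key format invented for this illustration), which makes individual totals easy to look up and check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MonthlySales
{
    public string Month { get; set; }
    public Dictionary<string, double> Sales { get; set; }
}

public static class SalesTotalsDemo
{
    // Sample data taken from the table in Example 8.
    public static List<MonthlySales> SampleData()
    {
        return new List<MonthlySales>
        {
            new MonthlySales { Month = "June", Sales = new Dictionary<string, double> { { "Apple", 20 }, { "Grapes", 40 } } },
            new MonthlySales { Month = "June", Sales = new Dictionary<string, double> { { "Banana", 10 }, { "Apple", 30 } } },
            new MonthlySales { Month = "December", Sales = new Dictionary<string, double> { { "Mango", 10 }, { "Banana", 20 } } },
            new MonthlySales { Month = "December", Sales = new Dictionary<string, double> { { "Banana", 40 } } }
        };
    }

    // Hypothetical helper: flatten, group by (Month, Product), then sum the amounts.
    public static Dictionary<string, double> Totals(IEnumerable<MonthlySales> data)
    {
        return data
            .SelectMany(m => m.Sales, (m, sale) => new { m.Month, Product = sale.Key, Amount = sale.Value })
            .GroupBy(x => new { x.Month, x.Product })
            .ToDictionary(g => g.Key.Month + "/" + g.Key.Product, g => g.Sum(x => x.Amount));
    }

    public static void Main()
    {
        Dictionary<string, double> totals = Totals(SampleData());
        Console.WriteLine(totals["June/Apple"]);       // 50
        Console.WriteLine(totals["December/Banana"]);  // 60
    }
}
```

Grouping by an anonymous type works here because anonymous types compare by the values of their properties, so two rows with the same Month and Product end up in the same group.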

OverLoad 4: Enumerable.SelectMany<TSource, TCollection, TResult> Method (IEnumerable<TSource>, Func<TSource, Int32, IEnumerable<TCollection>>, Func<TSource, TCollection, TResult>)
This overload is similar to Overload 3, except that the first selector also receives the index of the source element.
Example 9: Suppose we have the same object that we used in Example 8, and we want to add the index of the source element to each sales amount and return the result. Here is the object, with the index that each element will have when we use this method:
    var monthlySales = new List<MonthlySales>
    {
        new MonthlySales { Month = "June", Sales = new Dictionary<string, double>
            {
                { "Apple", 20 },
                { "Grapes", 40 }
            }
        },  // Index 0
        new MonthlySales { Month = "June", Sales = new Dictionary<string, double>
            {
                { "Banana", 10 },
                { "Apple", 30 }
            }
        },  // Index 1
        new MonthlySales { Month = "December", Sales = new Dictionary<string, double>
            {
                { "Mango", 10 },
                { "Banana", 20 }
            }
        },  // Index 2
        new MonthlySales { Month = "December", Sales = new Dictionary<string, double>
            {
                { "Banana", 40 }
            }
        }   // Index 3
    };
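Here is the LINQ query to do this. Note that, because only one selector is passed, this particular call is resolved to the index-aware overload without a result selector (Overload 2); to use Overload 4 we would pass a second lambda that combines the source object with each flattened value:
    var result = monthlySales.SelectMany((x, i) => x.Sales.Select(z => z.Value + i));
We will get the following output:
    20, 40, 11, 31, 12, 22, 43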
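For completeness, here is a minimal sketch of a call that passes both selectors and therefore uses the fourth overload: the first lambda receives the element and its index, the second combines the source object with each flattened value. The class name, the helper Describe, and the trimmed-down sample data (one product per entry) are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MonthlySales
{
    public string Month { get; set; }
    public Dictionary<string, double> Sales { get; set; }
}

public static class IndexedResultSelectorDemo
{
    // Hypothetical helper: adds the source index to each amount (first lambda)
    // and keeps the month of the source object next to it (second lambda).
    public static List<string> Describe(IEnumerable<MonthlySales> data)
    {
        return data
            .SelectMany(
                (m, i) => m.Sales.Select(s => s.Value + i),
                (m, adjusted) => m.Month + ": " + adjusted)
            .ToList();
    }

    public static void Main()
    {
        // Trimmed-down sample data, not the full table from Example 8.
        var data = new List<MonthlySales>
        {
            new MonthlySales { Month = "June", Sales = new Dictionary<string, double> { { "Apple", 20 } } },       // Index 0
            new MonthlySales { Month = "December", Sales = new Dictionary<string, double> { { "Banana", 40 } } }   // Index 1
        };

        foreach (string line in Describe(data))
        {
            Console.WriteLine(line);   // June: 20, then December: 41
        }
    }
}
```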
Summary: In this article we saw all the overloads of the projection operators Select and SelectMany. I have tried to explain these methods with whatever examples came to my mind :) Please let me know in case of any issues. I have attached all the source code for reference.
Happy Coding :)