In this section, let us take a closer look at LINQ. To really understand LINQ, you also need to understand the new language features of C# 3.0. Specifically, it helps to discuss LINQ in the context of the following C# 3.0 features:
- Type Inference
- Lambda Expressions
- Extension Methods
- Anonymous Types
The next few sections discuss these features in the context of LINQ. To get started with LINQ, all you need to do is import the System.Query namespace (renamed System.Linq in the released .NET Framework 3.5). This namespace contains the classes that implement the query operators used throughout this section.
Type Inference
To understand type inference, let us look at a couple of lines of code.
var count = 1;
var output = "This is a string";
var employees = new EmployeesCollection();
In the above lines of code, the compiler sees the var keyword, looks at the value assigned to count, and determines that count should be an Int32. When it sees a string assigned to output, it determines that output should be of type System.String. The same goes for employees, which is typed as EmployeesCollection. As you would have guessed by now, var is a new keyword introduced in C# 3.0. It signals the compiler that you are using the new Local Variable Type Inference feature: the type is still fixed at compile time, but the compiler works it out from the initializer instead of reading it from the declaration.
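Because the inference happens at compile time, a var variable is as strongly typed as one declared explicitly. The following is a minimal sketch of that point. The class name TypeInferenceDemo and the helper StaticTypeOf are illustrative, and a List<string> stands in for EmployeesCollection, which is not defined in this section.

```csharp
using System;
using System.Collections.Generic;

public static class TypeInferenceDemo
{
    // Hypothetical helper: reports the compile-time type inferred for a value.
    public static Type StaticTypeOf<T>(T value)
    {
        return typeof(T);
    }

    public static void Main()
    {
        var count = 1;                       // inferred as System.Int32
        var output = "This is a string";     // inferred as System.String
        var employees = new List<string>();  // stand-in for EmployeesCollection

        Console.WriteLine(StaticTypeOf(count));
        Console.WriteLine(StaticTypeOf(output));
        Console.WriteLine(StaticTypeOf(employees));

        // count = "two";  // would not compile: count is an int, not a variant
    }
}
```

The commented-out assignment is the important line: once the type has been inferred, it cannot change.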
As an example, let us modify our string query example to use the var keyword.
private void btnLoopThroughStrings_Click(object sender, EventArgs e)
{
    string[] names = {"John", "Peter", "Joe", "Patrick", "Donald", "Eric"};
    // The compiler infers IEnumerable<string> from the query expression.
    var namesWithFiveCharacters =
        from name in names
        where name.Length < 5
        select name;
    lstResults.Items.Clear();
    foreach (var name in namesWithFiveCharacters)
        lstResults.Items.Add(name);
}
As the above code shows, the variable namesWithFiveCharacters is now declared with var instead of IEnumerable<string>. Using var is more convenient because it tells the compiler to infer the type from the assignment, so the declaration does not have to change when the shape of the query result changes. In this case, the result of the query is IEnumerable<string>, so the compiler types the variable as IEnumerable<string>.
If you run the code, it still produces the same output.
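To see that only the declaration changed, the same query can be written once with var and once with the explicit type. This is a console-based sketch rather than the form code above. It assumes the System.Linq namespace of the released framework instead of the preview's System.Query, and the class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VarQueryDemo
{
    static readonly string[] names = { "John", "Peter", "Joe", "Patrick", "Donald", "Eric" };

    // Declared with var: the compiler infers IEnumerable<string>.
    public static List<string> WithVar()
    {
        var shortNames = from name in names
                         where name.Length < 5
                         select name;
        return shortNames.ToList();
    }

    // Declared explicitly: same query, same static type.
    public static List<string> WithExplicitType()
    {
        IEnumerable<string> shortNames = from name in names
                                         where name.Length < 5
                                         select name;
        return shortNames.ToList();
    }

    public static void Main()
    {
        foreach (var name in WithVar())
            Console.WriteLine(name);   // John, Joe, Eric (one per line)
    }
}
```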
Lambda Expressions
C# 2.0 introduced a new feature, anonymous methods, that allows you to declare a delegate's code inline instead of in a separate named method. Lambda expressions, a new feature in C# 3.0, offer a more concise syntax for the same goal. Before discussing lambda expressions, take a closer look at anonymous methods. Suppose you want to create a button that displays a message box when you click it. In C# 2.0, you would do it as follows:
public SimpleForm()
{
    addButton = new Button(...);
    // Anonymous method: the handler body is declared inline.
    addButton.Click += delegate
    {
        MessageBox.Show("Button clicked");
    };
}
As the above code shows, you can use anonymous methods to declare the function logic inline. However, C# 3.0 introduces an even simpler syntax, lambda expressions, which you write as a parameter list followed by the "=>" token, followed by an expression or a statement block. Lambda expressions are the natural evolution of C# 2.0's anonymous methods. Essentially, a lambda expression is a convenient syntax for assigning a chunk of code (the anonymous method) to a variable (the delegate). As an example,
employee => employee.StartsWith("D");
The delegate types that such a lambda expression can be assigned to are defined in the System.Query namespace as follows:
public delegate T Func<T>();
public delegate T Func<A0, T>(A0 arg0);
So the lambda expression above could also be written as an anonymous method:
Func<string, bool> person = delegate (string s)
{
    return s.StartsWith("D");
};
As you can see, the lambda expression is a lot more compact than the anonymous method. Lambda expressions are basically a compact form of anonymous methods, and you can use either of them, or even regular named methods, when creating filters for the query operators. Lambda expressions, though, have the additional benefit that the compiler can turn them into either IL or an expression tree, depending on whether they are assigned to a delegate type or to an expression type.
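The three ways of supplying the same predicate, and the expression-tree form of the lambda, can be put side by side. The sketch below assumes the Func and Expression types as they ship in the released framework (System and System.Linq.Expressions). The class and member names are illustrative.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaDemo
{
    // A regular named method with a matching signature.
    static bool StartsWithD(string s)
    {
        return s.StartsWith("D");
    }

    // The same predicate supplied three ways.
    public static readonly Func<string, bool> Named = StartsWithD;
    public static readonly Func<string, bool> Anonymous =
        delegate(string s) { return s.StartsWith("D"); };
    public static readonly Func<string, bool> Lambda = s => s.StartsWith("D");

    // Assigned to Expression<...>, the lambda is kept as data (an expression
    // tree) instead of being compiled straight to IL.
    public static readonly Expression<Func<string, bool>> Tree = s => s.StartsWith("D");

    public static void Main()
    {
        Console.WriteLine(Named("Donald"));
        Console.WriteLine(Anonymous("Donald"));
        Console.WriteLine(Lambda("Eric"));

        // The tree can be inspected, or compiled into a delegate on demand.
        Console.WriteLine(Tree.Body);
        Console.WriteLine(Tree.Compile()("Donald"));
    }
}
```

Only lambda expressions can be assigned to an Expression type, which is what lets a query provider inspect the predicate instead of merely executing it.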
Note that the parameters of a lambda expression can be explicitly or implicitly typed. In an explicitly typed parameter list, the type of each parameter is stated. In an implicitly typed parameter list, the types are inferred from the context in which the lambda expression occurs:
(int count) => count + 1 //explicitly typed parameter
(y, z) => y * z //implicitly typed parameters
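A lambda body is either a single expression or a statement block in braces; the return keyword is valid only in the block form. A short sketch of the three variants follows. It assumes the Func delegate types of the released framework, and the class and field names are illustrative.

```csharp
using System;

public static class LambdaParameterDemo
{
    // Explicitly typed parameter, expression body.
    public static readonly Func<int, int> Increment = (int count) => count + 1;

    // Implicitly typed parameters: int is inferred from Func<int, int, int>.
    public static readonly Func<int, int, int> Multiply = (y, z) => y * z;

    // Statement-block body: braces and an explicit return are required.
    public static readonly Func<int, int, int> MultiplyBlock = (y, z) => { return y * z; };

    public static void Main()
    {
        Console.WriteLine(Increment(41));        // 42
        Console.WriteLine(Multiply(6, 7));       // 42
        Console.WriteLine(MultiplyBlock(6, 7));  // 42
    }
}
```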
Extension Methods
Previously you saw how query operators such as Where can be called with dot notation. You might wonder where these methods come from. They reside in the System.Query.Sequence class and are made available through a new feature in C# 3.0 called extension methods.
An extension method is a new way of extending existing types. Basically, it works by adding a "this" modifier to the first parameter of a static method. For example, the Sequence class defines the Where operator as follows:
public static IEnumerable<T> Where<T>(
    this IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (T element in source)
    {
        // Yield only the elements that satisfy the predicate.
        if (predicate(element)) yield return element;
    }
}
Note the use of the "this" modifier on the first parameter. The compiler sees it and lets you call the method as if it were an instance method of the specified type, so IEnumerable<T> appears to gain a Where() method. It is important to remember that methods defined on the type itself take priority. If you call Where() on an object, the compiler first looks for Where() on the object's own type. Only if no suitable instance method is found does it look for an extension method that matches the call. While this feature is powerful, an extension method shows up on every instance of the extended type wherever its namespace is imported, so extension methods should be used sparingly.
Extension methods also make it possible to add methods to existing classes that are defined in other assemblies. Every extension method must be declared static, and can be declared only in a static class; apart from the modifier on its first parameter, it is an ordinary static method. To declare an extension method, you put the keyword "this" before the type of the first parameter, for example:
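The lookup order described above can be checked with a small experiment. In this sketch, Greeter, GreeterExtensions, and the method names are all hypothetical; the only point is which Describe() the compiler binds to.

```csharp
using System;

public class Greeter
{
    public string Describe() { return "instance method"; }
}

public static class GreeterExtensions
{
    // Same name as the instance method: instance-call syntax never picks this one.
    public static string Describe(this Greeter g) { return "extension method"; }

    // No instance method has this name, so the extension method is used.
    public static string Shout(this Greeter g) { return "extension method"; }
}

public static class PrecedenceDemo
{
    public static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Describe());                   // instance method
        Console.WriteLine(g.Shout());                      // extension method
        Console.WriteLine(GreeterExtensions.Describe(g));  // extension method
    }
}
```

The shadowed extension method is still reachable, but only through ordinary static-call syntax.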
public static class StringExtension
{
    // "this string s" makes Echo callable on any string instance.
    public static void Echo(this string s)
    {
        Console.WriteLine("Supplied string : " + s);
    }
}
The above code shows the extension method named Echo declared in the StringExtension class. Now you can invoke the Echo method on a string as if it were an instance method. The string is passed as the first parameter of the method.
string s = "Hello world";
s.Echo();
Based on the above code, here are the key characteristics of extension methods.
- Extension methods have the keyword this before the first parameter
- When extension methods are consumed, the argument for the parameter declared with the keyword this is not passed explicitly. In the above code, note the invocation of the Echo() method without any arguments
- Extension methods can be defined only in a static class
- With instance syntax, extension methods can be called only on instances. Trying to call them on the type name itself (for example, string.Echo()) results in a compilation error. The type whose instances they can be called on is determined by the first parameter in the declaration, the one having the keyword this.
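These characteristics follow from how the compiler treats the call: instance syntax is rewritten into a normal static call with the instance as the first argument. The sketch below uses a hypothetical variant of Echo, named Echoed, that returns the text instead of printing it so the two call forms can be compared.

```csharp
using System;

public static class StringExtension
{
    // Hypothetical variant of Echo that returns the text instead of printing it.
    public static string Echoed(this string s)
    {
        return "Supplied string : " + s;
    }
}

public static class ExtensionCallDemo
{
    public static void Main()
    {
        string s = "Hello world";

        // Instance syntax: s is passed implicitly as the "this" parameter.
        Console.WriteLine(s.Echoed());

        // Equivalent static call, which is what the compiler generates.
        Console.WriteLine(StringExtension.Echoed(s));

        // string.Echoed();  // would not compile: no instance to extend
    }
}
```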
Using Collections in LINQ
Now that you have seen the basics of LINQ and C# 3.0, let us look at a slightly more interesting example. First, let us define a new class named Person:
public class Person
{
    private string _firstName;
    private string _lastName;
    private string _address;

    public Person() {}

    public string FirstName
    {
        get { return _firstName; }
        set { _firstName = value; }
    }

    public string LastName
    {
        get { return _lastName; }
        set { _lastName = value; }
    }

    public string Address
    {
        get { return _address; }
        set { _address = value; }
    }
}
Now let us create a Person collection and query it using LINQ. Add a button to the form, name it btnLoopThroughObjects, and write its Click event handler as follows:
private void btnLoopThroughObjects_Click(object sender, EventArgs e)
{
    // Build the collection with collection and object initializers.
    List<Person> persons = new List<Person>
    {
        new Person{FirstName = "Joe", LastName = "Adams", Address = "Chandler"},
        new Person{FirstName = "Don", LastName = "Alexander", Address = "Washington"},
        new Person{FirstName = "Dave", LastName = "Ashton", Address = "Seattle"},
        new Person{FirstName = "Bill", LastName = "Pierce", Address = "Sacramento"},
        new Person{FirstName = "Bill", LastName = "Giard", Address = "Camphill"}
    };
    // Keep everyone outside Seattle, sorted by first name.
    var personsNotInSeattle = from person in persons
                              where person.Address != "Seattle"
                              orderby person.FirstName
                              select person;
    lstResults.Items.Clear();
    foreach (var person in personsNotInSeattle)
    {
        lstResults.Items.Add(person.FirstName + " " + person.LastName +
            " - " + person.Address);
    }
}
The above code shows off a few useful features. The first is the new C# 3.0 object initializer syntax, which creates a class instance and sets its properties in a single, terser expression:
new Person{FirstName = "Dave", LastName = "Ashton", Address = "Seattle"}
This is very useful when instantiating objects and adding them to a collection as above (or when creating an anonymous type). Note that this example uses a generic List collection of type Person. LINQ supports executing queries against any IEnumerable<T> collection, so it can be used against the generic collections you already have. Non-generic collections can be queried as well once the element type is stated, for example with the Cast<T>() operator.
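An object initializer is shorthand for calling the constructor and then assigning each property in turn. The sketch below shows that equivalence and also queries a non-generic ArrayList by giving the range variable an explicit type. It assumes the released System.Linq namespace, uses auto-implemented properties for brevity in place of the explicit fields above, and the class InitializerDemo and its method names are illustrative.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    // Auto-implemented properties, used here only to keep the sketch short.
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
}

public static class InitializerDemo
{
    public static Person WithInitializer()
    {
        return new Person { FirstName = "Dave", LastName = "Ashton", Address = "Seattle" };
    }

    // What the initializer expands to: construct, then assign each property.
    public static Person WithAssignments()
    {
        Person p = new Person();
        p.FirstName = "Dave";
        p.LastName = "Ashton";
        p.Address = "Seattle";
        return p;
    }

    // ArrayList is non-generic; typing the range variable makes the compiler
    // insert a Cast<Person>() call so the query operators can be applied.
    public static List<string> FirstNames(ArrayList people)
    {
        var query = from Person p in people
                    select p.FirstName;
        return query.ToList();
    }

    public static void Main()
    {
        ArrayList people = new ArrayList();
        people.Add(WithInitializer());
        people.Add(WithAssignments());
        foreach (string name in FirstNames(people))
            Console.WriteLine(name);
    }
}
```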
After creating the collection, you select the persons who are not in Seattle and order the results by first name, using the query below.
var personsNotInSeattle = from person in persons
                          where person.Address != "Seattle"
                          orderby person.FirstName
                          select person;
The concept in this example is rather simple. The query examines all persons using a from clause. If the address of a person is not equal to Seattle, the query includes that person in the resulting sequence, which is then sorted by first name. The list box shows the name and address of each person in the result.
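A query expression like this one is shorthand for calls to the query operators, with each clause becoming a lambda expression. The sketch below writes the query both ways and runs it against the same sample data in a console program. It assumes the released System.Linq namespace, uses auto-implemented properties for brevity, and the class QueryTranslationDemo and its method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
}

public static class QueryTranslationDemo
{
    public static List<Person> SamplePersons()
    {
        return new List<Person>
        {
            new Person { FirstName = "Joe",  LastName = "Adams",     Address = "Chandler" },
            new Person { FirstName = "Don",  LastName = "Alexander", Address = "Washington" },
            new Person { FirstName = "Dave", LastName = "Ashton",    Address = "Seattle" },
            new Person { FirstName = "Bill", LastName = "Pierce",    Address = "Sacramento" },
            new Person { FirstName = "Bill", LastName = "Giard",     Address = "Camphill" }
        };
    }

    // Query-expression form, as used in the Click handler.
    public static IEnumerable<Person> QuerySyntax(List<Person> persons)
    {
        return from person in persons
               where person.Address != "Seattle"
               orderby person.FirstName
               select person;
    }

    // The same query written as extension-method calls with lambdas.
    public static IEnumerable<Person> MethodSyntax(List<Person> persons)
    {
        return persons
            .Where(person => person.Address != "Seattle")
            .OrderBy(person => person.FirstName);
    }

    public static void Main()
    {
        foreach (var person in MethodSyntax(SamplePersons()))
            Console.WriteLine(person.FirstName + " " + person.LastName +
                " - " + person.Address);
    }
}
```

The sort is stable, so the two persons named Bill keep their original relative order in both forms.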
Anonymous Types
In the previous section, the output from the query is a sequence of persons. In the query, you specify that you only want those persons who are not in Seattle. The query therefore returned a sequence of Person objects, each containing the FirstName, LastName, and Address properties.
Suppose you want all the persons that meet the criteria, but only with the FirstName and Address properties. This means that you need to be able to create a class with just these two properties on the fly, without declaring it yourself. This is exactly what anonymous types in C# 3.0 allow you to accomplish. Although these types are called anonymous types, the compiler does generate a name for each of them; it is just not available to your code.
For example, the snippet below is the Click event handler of a command button; its LINQ query returns a sequence of such a new type.
private void btnLoopThroughAnonymous_Click(object sender, EventArgs e)
{
    List<Person> persons = new List<Person>
    {
        new Person{FirstName = "Joe", LastName = "Adams", Address = "Chandler"},
        new Person{FirstName = "Don", LastName = "Alexander", Address = "Washington"},
        new Person{FirstName = "Dave", LastName = "Ashton", Address = "Seattle"},
        new Person{FirstName = "Bill", LastName = "Pierce", Address = "Sacramento"},
        new Person{FirstName = "Bill", LastName = "Giard", Address = "Camphill"}
    };
    // Project each match into an anonymous type with two properties.
    var personsNotInSeattle = from person in persons
                              where person.Address != "Seattle"
                              orderby person.FirstName
                              select new {person.FirstName,
                                          person.Address};
    lstResults.Items.Clear();
    foreach (var person in personsNotInSeattle)
    {
        lstResults.Items.Add(person.FirstName + " --- " + person.Address);
    }
}
When run, the above code lists the first name and address of each person who is not in Seattle. The snippet is very similar to the previous example in that it examines all persons using a from clause and keeps only those who are not in Seattle. The key difference is that it does not return the entire person. It returns a new type that contains two public properties, FirstName and Address. This new type is created by the compiler. Conceptually, the definition the compiler creates looks like this:
public class ?????
{
    private string firstName;
    private string address;

    public string FirstName
    {
        get { return firstName; }
        set { firstName = value; }
    }

    public string Address
    {
        get { return address; }
        set { address = value; }
    }
}
As you can see, the above class is very similar to the Person class except that it has no name you can refer to, which is what makes it an anonymous type. (In the released version of C# 3.0, the generated properties are read-only.) So the return value from the query is IEnumerable<?????>, and what replaces the question marks is determined by the compiler. To capture this sequence of anonymous objects, you need a way to declare a variable without writing the name of its type. This is exactly why you need the var keyword: the variable is still strongly typed, but the compiler fills in the type name it generated. Variables declared with var must be initialized at the point they are declared, because the initializer is the only way the compiler knows what type they should be.
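The following console sketch shows an anonymous type flowing through a query and a foreach loop without its name ever being written. It assumes the released System.Linq namespace and released C# 3.0 semantics, and the class AnonymousTypeDemo and its method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnonymousTypeDemo
{
    public static List<string> Describe()
    {
        // An array whose element type is itself anonymous.
        var people = new[]
        {
            new { FirstName = "Joe",  Address = "Chandler" },
            new { FirstName = "Dave", Address = "Seattle" }
        };

        // var is required here: the element type has no usable name.
        var query = from p in people
                    where p.Address != "Seattle"
                    select new { p.FirstName, p.Address };

        var lines = new List<string>();
        foreach (var item in query)
        {
            // Properties are strongly typed and checked at compile time.
            // item.FirstName = "x";  // would not compile: the property is read-only
            lines.Add(item.FirstName + " --- " + item.Address);
        }
        return lines;
    }

    // The compiler-generated name exists at run time but cannot be written in source.
    public static string GeneratedTypeName()
    {
        var item = new { FirstName = "Joe", Address = "Chandler" };
        return item.GetType().Name;
    }

    public static void Main()
    {
        foreach (string line in Describe())
            Console.WriteLine(line);   // Joe --- Chandler
        Console.WriteLine(GeneratedTypeName());
    }
}
```

Because the type name cannot be written, an anonymous type cannot appear as a method's declared return type, which is why Describe() converts the results to strings before returning them.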