Improving Code Readability with Linq (and MoreLinq)

This post is part of the third annual C# Advent. Check out the home page for up to 50 C# blog posts in December 2019! Thanks to Matthew D. Groves for organizing it.

My friends and coworkers have accused me of falling in love with Linq. This may or may not be true... The truth is, ever since Linq came out with .NET 3.5, along with LINQ to SQL, I have invested heavily in using it, to the point where I now find myself writing 50+ line Linq queries. However, when reading other people's code, I find that many developers still do not appreciate the value of Linq and what it means for clean, readable code.

Now, 50 lines is most definitely long (probably too long, according to most), but I have found that when working with datasets, whether from a database via an ORM or in memory via IEnumerable<T>, Linq has helped me write code that is explicit about why I am writing it rather than about the details of how. For example, which of the following is easier to grok?

var list = new List<Class>();  
for (int i = 0; i < oldList.Count; i++)  
   if (oldList[i].FieldA == 1 && 
       oldList[i].FieldB == "Filter")
      list.Add(oldList[i]);

or

var list = oldList  
  .Where(o => o.FieldA == 1)
  .Where(o => o.FieldB == "Filter")
  .ToList();

These two code snippets do essentially the same thing, but in the first, you have to parse the for loop, verify that the start and end bounds are correct, and pick out the condition from the if statement. The second snippet reads much more like the way the rule would be written in a business specification: "the new list should contain all items where FieldA has a value of 1 and FieldB has a value of 'Filter'".
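To check that the two snippets really do the same thing, here is a small self-contained sketch. The placeholder Class type is swapped for a hypothetical tuple (int FieldA, string FieldB), and the sample data is made up purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for oldList; the tuple replaces the placeholder Class type.
var oldList = new List<(int FieldA, string FieldB)>
{
    (1, "Filter"),
    (2, "Filter"),
    (1, "Other"),
    (1, "Filter"),
};

// Imperative version: index through the list and copy matching items by hand.
var loopList = new List<(int FieldA, string FieldB)>();
for (int i = 0; i < oldList.Count; i++)
{
    if (oldList[i].FieldA == 1 && oldList[i].FieldB == "Filter")
        loopList.Add(oldList[i]);
}

// Linq version: each business rule is its own Where clause.
var linqList = oldList
    .Where(o => o.FieldA == 1)
    .Where(o => o.FieldB == "Filter")
    .ToList();

// Both approaches keep the same items, in the same order.
Console.WriteLine(loopList.SequenceEqual(linqList)); // True
Console.WriteLine(linqList.Count);                   // 2
```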

Side Note

Many of the examples I will show you here come from my puzzle answers to the annual Advent of Code programming event. It is a blast to do, and I encourage anyone who wants to improve their programming and problem-solving skills to work on these puzzles. My examples come from these solutions because they are readily available in my GitHub repository.

Tools

Before we go much further, let's look at two of the simplest tools used in Linq queries.

.Select()

Examples: 1 2 3

The first and most common tool is the .Select() function. Here, we are simply converting each item from one type to another. In one of my examples above, I take a list of strings and convert all of them to ints with a single line of code (var numbers = input.GetLines().Select(s => Convert.ToInt32(s)).ToList()). Otherwise, I would have had to write a loop like this:

var numbers = new List<int>();  
foreach (var s in input.GetLines())  
  numbers.Add(Convert.ToInt32(s));
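A runnable version of that comparison is sketched below. Since input.GetLines() is a helper from the puzzle solutions and not part of the standard library, a hypothetical string array named lines stands in for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input standing in for input.GetLines().
string[] lines = { "12", "14", "1969" };

// Select projects every string to an int in a single expression.
var numbers = lines.Select(s => Convert.ToInt32(s)).ToList();

// The equivalent loop has to create and fill the list by hand.
var loopNumbers = new List<int>();
foreach (var s in lines)
    loopNumbers.Add(Convert.ToInt32(s));

Console.WriteLine(string.Join(", ", numbers)); // 12, 14, 1969
```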

.Where()

Examples: 1 2

The second tool, just as common, is the .Where() function. This one can be filed under "just what it says on the box": it takes a sequence of objects and returns a sequence containing only the objects that match the provided condition.
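Here is a minimal sketch of .Where() on a hypothetical array of ints. It also shows that chaining two .Where() calls, as in the opening example, keeps the same items as a single call that joins both conditions with &&.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
int[] values = { 3, 8, 10, 15, 22, 27 };

// Keep only the even numbers.
var evens = values.Where(v => v % 2 == 0).ToList(); // 8, 10, 22

// Two chained filters...
var chained = values.Where(v => v % 2 == 0).Where(v => v > 9).ToList();
// ...keep the same items as one filter with both conditions joined by &&.
var combined = values.Where(v => v % 2 == 0 && v > 9).ToList();

Console.WriteLine(string.Join(", ", chained));      // 10, 22
Console.WriteLine(chained.SequenceEqual(combined)); // True
```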

Truth be told, I would guess that my professional code calls .Select() and .Where() more than any other functions in the standard library. Most of the time, I am working with lists of data (small and large), and these two functions allow me to build complex transformations with relative ease.

Where am I going with this?

Let's pick a relatively straightforward piece of code as an example. Reviewing the problem statement for day 4 of this year's Advent of Code, we find that the goal is to enumerate all of the numbers between a provided minimum and maximum, and identify which ones match certain criteria.

There are a variety of ways people have solved this problem in C# (1, 2, 3, etc.); most of them use for loops to iterate through the candidate passwords and multiple functions to split the code into simple chunks. Both of these are good things.

However, these solutions take up a lot of screen space and can be difficult to take in all at once, especially when reading them for the first time. A Linq version of the same solution, by contrast, is only 11 lines long:

var range = input.GetString().Split('-');  
var min = Convert.ToInt32(range[0]);  
var max = Convert.ToInt32(range[1]);

PartA = Enumerable.Range(min, max - min + 1)  
    .Where(i => i.ToString().Window(2).All(x => x[0] <= x[1]))
    .Where(i => i.ToString().GroupAdjacent(c => c).Any(g => g.Count() >= 2))
    .Count();

PartB = Enumerable.Range(min, max - min + 1)  
    .Where(i => i.ToString().Window(2).All(x => x[0] <= x[1]))
    .Where(i => i.ToString().GroupAdjacent(c => c).Any(g => g.Count() == 2))
    .Count();

Let's walk through it quickly and see why it can be more readable. We'll skip the first three lines, which simply parse the min and max values from the input.

Looking at the PartA = statement, we start with an auto-generated enumeration of the numbers from min to max (Enumerable.Range() expects the number of items to return, not the maximum value, so we calculate that count as max - min + 1). Then we apply two filters (.Where()) and count the number of items that remain (.Count()). We do not have to keep track of a counting variable and remember to increment it; both criteria are applied immediately and clearly, so it should be relatively evident what we are doing at the top level.
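Both points in the paragraph above can be sketched with hypothetical bounds and a stand-in filter (odd numbers): Enumerable.Range(start, count) with a count of max - min + 1 includes both ends, and .Where().Count() gives the same answer as a hand-maintained counter.

```csharp
using System;
using System.Linq;

// Hypothetical bounds; both ends should be included.
int min = 3, max = 7;

// Range(start, count): start at min and produce max - min + 1 values.
var range = Enumerable.Range(min, max - min + 1).ToList(); // 3, 4, 5, 6, 7

// Manual counting: a variable to declare, increment, and keep correct.
int manualCount = 0;
for (int i = min; i <= max; i++)
    if (i % 2 == 1)
        manualCount++;

// Linq counting: the filter and the count are stated directly.
int linqCount = Enumerable.Range(min, max - min + 1)
    .Where(i => i % 2 == 1)
    .Count();

Console.WriteLine(manualCount == linqCount); // True (both are 3)
```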

Even the criteria themselves use Linq to express how they are evaluated. For the first criterion, we can see the following steps:
1. We take a number and convert it to a string. (.ToString())
2. We collect each pair of neighboring characters (.Window(2)).
3. We process each pair by evaluating if the first character is less than or equal to the second character (x => x[0] <= x[1])
4. We determine if all such pairs pass this condition (.All())

The net result of this criterion is that a number passes if and only if its digits never decrease (each digit is equal to or greater than the previous digit).
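For readers who would rather not add MoreLinq, the same check can be sketched with only System.Linq. This swaps in a different technique: Zip pairs each character with the one after it, which matches the pairs .Window(2) produces here, but it is not MoreLinq's implementation. The helper name DigitsNeverDecrease is hypothetical. Comparing the char values directly works because the characters '0' through '9' are consecutive and in order.

```csharp
using System;
using System.Linq;

// Hypothetical helper: true when no digit is smaller than the digit before it.
static bool DigitsNeverDecrease(int n)
{
    var s = n.ToString();
    // Pair each character with the next one: "1234" -> ('1','2'), ('2','3'), ('3','4').
    return s.Zip(s.Skip(1), (a, b) => a <= b).All(ok => ok);
}

Console.WriteLine(DigitsNeverDecrease(111123)); // True
Console.WriteLine(DigitsNeverDecrease(223450)); // False: the 5 is followed by a 0
```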

Side note: .Window() and .GroupAdjacent() come from the MoreLinq library (NuGet, Homepage). .Window(), .Segment(), and .Batch() are my most commonly used functions from this library, and I have used all of them in my AoC puzzle solving this year.

The second criterion is similarly straightforward:
1. We take a number and convert it to a string. (.ToString())
2. We group adjacent digits together whenever they are equal (.GroupAdjacent()).
3. We count the number of items in each group (g.Count())
4. We determine whether any group has at least 2 items (.Any(g => g.Count() >= 2)).
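The grouping step can be sketched without MoreLinq as well. The hypothetical local function RunLengths walks the string once and yields the length of each run of equal adjacent characters, which corresponds to g.Count() for each group from .GroupAdjacent(c => c). It is an illustrative stand-in, not MoreLinq's implementation; the sample numbers 123789 and 122345 come from the day 4 puzzle statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lengths of consecutive runs of equal characters, e.g. "112333" -> 2, 1, 3.
static IEnumerable<int> RunLengths(string s)
{
    int i = 0;
    while (i < s.Length)
    {
        int start = i;
        // Advance past every character equal to the one that started this run.
        while (i < s.Length && s[i] == s[start])
            i++;
        yield return i - start;
    }
}

// Second criterion: at least one run of two or more identical adjacent digits.
static bool HasAdjacentPair(int n) => RunLengths(n.ToString()).Any(len => len >= 2);

Console.WriteLine(string.Join(", ", RunLengths("112333"))); // 2, 1, 3
Console.WriteLine(HasAdjacentPair(123789));                 // False
Console.WriteLine(HasAdjacentPair(122345));                 // True
```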

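The implementation of Part B, and how it differs from Part A, should be obvious from comparing the code for each part: only the second filter changes. Part A accepts any run of two or more identical adjacent digits (g.Count() >= 2), while Part B requires at least one run of exactly two (g.Count() == 2).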
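Putting both parts together, here is a sketch that runs with only System.Linq. It relies on one observation: once a number passes the first filter, its digits never decrease, so equal digits always sit next to each other and the built-in .GroupBy(c => c) yields the same groups as .GroupAdjacent(c => c) for those numbers. That substitution is a swap made for this sketch and is only valid after the first filter; Zip stands in for .Window(2), the helper names are hypothetical, and the hard-coded range replaces the real puzzle input. The sample numbers checked in the tests come from the examples in the day 4 puzzle statement.

```csharp
using System;
using System.Linq;

// First filter: digits never decrease from left to right (Zip stands in for Window(2)).
static bool NeverDecreases(string s) => s.Zip(s.Skip(1), (a, b) => a <= b).All(ok => ok);

// GroupBy is only a safe stand-in for GroupAdjacent because NeverDecreases
// has already guaranteed that equal digits are adjacent.
static bool MeetsPartA(int n)
{
    var s = n.ToString();
    return NeverDecreases(s) && s.GroupBy(c => c).Any(g => g.Count() >= 2);
}

static bool MeetsPartB(int n)
{
    var s = n.ToString();
    return NeverDecreases(s) && s.GroupBy(c => c).Any(g => g.Count() == 2);
}

// Hypothetical range in place of the real puzzle input.
int min = 111110, max = 111125;
int partA = Enumerable.Range(min, max - min + 1).Where(MeetsPartA).Count();
int partB = Enumerable.Range(min, max - min + 1).Where(MeetsPartB).Count();

Console.WriteLine($"Part A: {partA}, Part B: {partB}"); // Part A: 13, Part B: 1
```

Because && short-circuits, the GroupBy step only runs for numbers that already passed the first filter, which is exactly the condition the GroupBy substitution needs.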

The net result of using Linq for this code is that all of it fits on one screen, it clearly expresses what we are trying to accomplish, and it removes the need to dig into secondary functions to understand their behavior.

Conclusion

In general, Linq functions are well-named and have obvious intent, they provide a common framework of behavior with easy specification of how that behavior should be applied, and they reduce the overall amount of code that a developer needs to read or write. Collectively, this improves the overall readability of code written with Linq.