score:8

Accepted answer

"This should remove the 10k as a multiple by, and just do it once?"

What it means is that instead of iterating personList 100k times, performing the where and select operations on each of those iterations, you'll be iterating the resulting List 100k times, and the where and select operations will have been performed on the underlying data source only once.
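The difference can be made visible by counting how often the filter actually runs. The following is only a minimal sketch: the `organisationIds` array is a hypothetical stand-in for personList, and `CountPredicateCalls` is a helper invented for this illustration, not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns how many times the where-predicate ran after enumerating
    // the query `passes` times, with or without materializing it first.
    public static int CountPredicateCalls(bool materialize, int passes)
    {
        // Hypothetical stand-in for personList: five organisation IDs.
        int[] organisationIds = { 123, 7, 123, 42, 123 };
        int predicateCalls = 0;

        IEnumerable<int> query = organisationIds.Where(id =>
        {
            predicateCalls++;          // count every evaluation of the filter
            return id == 123;
        });

        if (materialize)
        {
            query = query.ToList();    // run the filter once and keep the results
        }

        for (int i = 0; i < passes; i++)
        {
            foreach (int id in query) { }   // each pass enumerates `query` in full
        }

        return predicateCalls;
    }

    public static void Main()
    {
        Console.WriteLine(CountPredicateCalls(materialize: false, passes: 3)); // 15
        Console.WriteLine(CountPredicateCalls(materialize: true, passes: 3));  // 5
    }
}
```

Without ToList() the predicate runs again for every element on every pass (5 x 3 = 15 calls); with ToList() it runs once per element and the later passes only walk the list.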

"The question is that this seems too easy, and I'm sure the LINQ is doing something clever somewhere that should mean that this doesn't happen."

Nope. Your first query is simply something that you shouldn't be doing with LINQ. If you plan to iterate over the results of a query many times, you should place them into a data structure first (which is what you changed).

You can improve this query even more by using the appropriate data structure. Searching a List is rather inefficient, because it has to do a linear search. It would be preferable to store the results of the query in a HashSet, which has O(1) lookup time in the average case, as opposed to the O(n) lookup time of a List.

var dates = new HashSet<DateTime>(from Person p in personList
                                  where p.OrganisationID == 123
                                  select p.Birthday);

foreach (DateTime d in dateList.Where(date => dates.Contains(date)))
{
    Console.WriteLine(string.Format("Date: {0} has a Birthday", d.ToShortDateString()));
}

score:0

I'm assuming that you are referring to LINQ-to-Objects, as each LINQ provider has its own implementation (LINQ-to-SQL, LINQ-to-Entities, LINQ-to-XML, LINQ-to-anything).

Taking your example of personBirthdays, it is not a foregone conclusion that the expression was created for the purpose of iterating through the full result set, so LINQ cannot automatically materialize the results to an array or list.

These operations are very different:

personBirthdays.Distinct()
personBirthdays.FirstOrDefault(b => b.Month == 7)
personBirthdays.Select(b => b.Year).Distinct()

What LINQ as a technology does that is "clever" is to allow the construction of an expression tree (or, in LINQ-to-Objects, a chain of iterators) and to defer execution. This is what prevents, in the third example above, one 100k-item pass to get the birthdays, then another 100k-item pass to pick out the year, then a final, costly pass to assemble the distinct values.
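To make the deferred, item-by-item behaviour concrete, here is a small LINQ-to-Objects sketch. The three sample dates and the `pulled` counter are assumptions made up for the illustration; only the two query shapes come from the examples above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingDemo
{
    // Hypothetical sample data standing in for the birthdays query.
    private static readonly DateTime[] Birthdays =
    {
        new DateTime(1990, 7, 4),
        new DateTime(1985, 1, 15),
        new DateTime(2001, 7, 30),
    };

    // Counts how many source items FirstOrDefault pulls before it stops.
    public static int ItemsPulledByFirstOrDefault()
    {
        int pulled = 0;
        IEnumerable<DateTime> personBirthdays =
            Birthdays.Select(b => { pulled++; return b; });

        // The first item is already in July, so enumeration stops there.
        _ = personBirthdays.FirstOrDefault(b => b.Month == 7);
        return pulled;
    }

    // Counts how many source items Select(...).Distinct() pulls in total.
    public static int ItemsPulledByDistinctYears()
    {
        int pulled = 0;
        IEnumerable<DateTime> personBirthdays =
            Birthdays.Select(b => { pulled++; return b; });

        // Each birthday flows through Select and Distinct in a single pass.
        _ = personBirthdays.Select(b => b.Year).Distinct().ToList();
        return pulled;
    }

    public static void Main()
    {
        Console.WriteLine(ItemsPulledByFirstOrDefault());  // 1
        Console.WriteLine(ItemsPulledByDistinctYears());   // 3
    }
}
```

Because nothing runs until the query is enumerated, FirstOrDefault can stop after a single item, and the chained Select and Distinct read each source item exactly once instead of building an intermediate collection per operator.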

The LINQ consumer (you) has to own the destiny of the expression. If you know that the result set will be iterated over multiple times, the onus is on you to materialize it to an array or list.

score:3

This is a typical select n+1 problem, and by applying .ToList() you have partially solved it. The next step could be the following: you're constantly searching the personBirthdays list, so replace it with a HashSet. Contains(d) will then be much faster, and duplicates are removed as well:

var personBirthdays = new HashSet<DateTime>((from Person p in personList
    where p.OrganisationID == 123
    select p.Birthday).ToArray());
