score:64

Accepted answer

The key to answering this is deferred execution. When you do this

items = items.Select(item => items.First(i => i == item));

you do not iterate the items array passed into the method. Instead, you assign it a new IEnumerable<int>, which references itself back, and starts iterating only when the caller starts enumerating the results.

That is why all your other fixes have dealt with the problem: all you needed to do is to stop feeding IEnumerable<int> back to itself:

  • Using var foo breaks self-reference by using a different variable,
  • Using return items.Select... breaks self-reference by not using intermediate variables at all,
  • Using ToList() breaks self-reference by avoiding deferred execution: by the time items is re-assigned, old items has been iterated over, so you end up with a plain in-memory List<int>.

But if it's feeding on itself, how does it get anything at all?

That's right, it does not get anything! The moment you try iterating items and ask it for the first item, the deferred sequence asks the sequence fed to it for the first item to process, which means that the sequence is asking itself for the first item to process. At this point, it's turtles all the way down, because in order to return the first item to process the sequence must first get the first item to process from itself.

score:5

After studying the two answers given and poking around a bit, I came up with a little program that better illustrates the problem.

    private int GetFirst(IEnumerable<int> items, int foo)
    {
        Console.WriteLine("GetFirst {0}", foo);
        var rslt = items.First(i => i == foo);
        Console.WriteLine("GetFirst returns {0}", rslt);
        return rslt;
    }

    private IEnumerable<int> GoNuts(IEnumerable<int> items)
    {
        items = items.Select(item =>
        {
            Console.WriteLine("Select item = {0}", item);
            return GetFirst(items, item);
        });
        return items;
    }

If you call that with:

var newList = GoNuts(new[]{1, 2, 3, 4, 5, 6});

You'll get this output repeatedly until you finally get StackOverflowException.

Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
Select item = 1
GetFirst 1
...

What this shows is exactly what dasblinkenlight made clear in his updated answer: the query goes into an infinite loop trying to get the first item.

Let's write GoNuts a slightly different way:

    private IEnumerable<int> GoNuts(IEnumerable<int> items)
    {
        var originalItems = items;
        items = items.Select(item =>
        {
            Console.WriteLine("Select item = {0}", item);
            return GetFirst(originalItems, item);
        });
        return items;
    }

If you run that, it succeeds. Why? Because in this case it's clear that the call to GetFirst is passing a reference to the original items that were passed to the method. In the first case, GetFirst is passing a reference to the new items collection, which hasn't yet been realized. In turn, GetFirst says, "Hey, I need to enumerate this collection." And thus begins the first recursive call that eventually leads to StackOverflowException.

Interestingly, I was right and wrong when I said that it was consuming its own output. The Select is consuming the original input, as I would expect. The First is trying to consume the output.

Lots of lessons to be learned here. To me, the most important is "don't modify the value of input parameters."

Thanks to dasblinkenlight, D Stanley, and Lucas Trzesniewski for their help.

score:20

It looks like the Select is iterating over its own output

You are correct. You are returning a query that iterates over itself.

The key is that you reference items within the lambda. The items reference is not resolved ("closed over") until the query iterates, at which point items now references the query instead of the source collection. That's where the self-reference occurs.

Picture a deck of cards with a sign in front of it labelled items. Now picture a man standing beside the deck of cards whose assignment is to iterate the collection called items. But then you move the sign from the deck to the man. When you ask the man for the first "item" - he looks for the collection marked "items" - which is now him! So he asks himself for the first item, which is where the circular reference occurs.

When you assign the result to a new variable, you then have a query that iterates over a different collection, and so does not result in an infinite loop.

When you call ToList, you hydrate the query to a new collection and also do not get an infinite loop.

Other things that would break the circular reference:

  • Hydrating items within the lambda by calling ToList
  • Assigning items to another variable and referencing that within the lambda.

Related Query

More Query from same tag