score:3

Accepted answer

darren kopp's answer:

var excluded = items1.except(items2);

is the best solution from a performance perspective.

(nb: this true for at least regular linq, perhaps linq to sql changes things as per marco russo's blog post. however, i'd imagine that in the "worst case" darren kopp's method will return at least the speed of russo's method even in a linq to sql environment).

as a quick example try this in linqpad:

void main()
{
   random rand = new random();
   int n = 100000;
   var randomseq = enumerable.repeat(0, n).select(i => rand.next());
   var randomfilter = enumerable.repeat(0, n).select(i => rand.next());

   /* method 1: bramha ghosh's/marco russo's method */
   (from el1 in randomseq where !(from el2 in randomfilter select el2).contains(el1) select el1).dump("result");

   /* method 2: darren kopp's method */
   randomseq.except(randomfilter).dump("result");
}

try commenting one of the two methods out at a time and try out the performance for different values of n.

my experience (on my core 2 duo laptop) seems to suggest:

n = 100. method 1 takes about 0.05 seconds, method 2 takes about 0.05 seconds
n = 1,000. method 1 takes about 0.6 seconds, method 2 takes about 0.4 seconds
n = 10,000. method 1 takes about 2.5 seconds, method 2 takes about 0.425 seconds
n = 100,000. method 1 takes about 20 seconds, method 2 takes about 0.45 seconds
n = 1,000,000. method 1 takes about 3 minutes 25 seconds, method 2 takes about 1.3 seconds

method 2 (darren kopp's answer) is clearly faster.

the speed decrease for method 2 for larger n is most likely due to the creation of the random data (feel free to put in a datetime diff to confirm this) whereas method 1 clearly has algorithmic complexity issues (and just by looking you can see it is at least o(n^2) as for each number in the first collection it is comparing against the entire second collection).

conclusion: use darren kopp's answer of linq's 'except' method

score:0

here's a more simple version of the same thing, you don't need to nest the query:

list<string> items1 = new list<string>();
items1.add("cake");
items1.add("cookie");
items1.add("pizza");

list<string> items2 = new list<string>();
items2.add("pasta");
items2.add("pizza");

var results = from item in items1
          where items2.contains(item)
          select item;

foreach (var item in results)
    console.writeline(item); //prints 'pizza'

score:1

another totally different way of looking at it would be to pass a lambda expression (condition for populating the second collection) as a predicate to the first collection.

i know this is not the exact answer to the question. i think other users already gave the correct answer.

score:4

from marco russo

northwinddatacontext dc = new northwinddatacontext();
dc.log = console.out;
var query =
    from c in dc.customers
    where !(from o in dc.orders
            select o.customerid)
           .contains(c.customerid)
    select c;
foreach (var c in query) console.writeline( c );

score:4

use the except extension method.

var items1 = new list<string> { "apple","orange","banana" };
var items2 = new list<string> { "grapes","apple","kiwi" };

var excluded = items1.except(items2);

Related Query

More Query from same tag