score:1
Perhaps this is not the best theoretical approach for working with millions of records. However, it works and can serve as a starting point for further improvements.
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var startingRecords = new List<Record>()
        {
            new Record(1001930, "a", "in"),
            new Record(1004901, "b", "in"),
            new Record(1005192, "a", "out"),
            new Record(1012933, "a", "in"),
            new Record(1014495, "b", "out"),
            new Record(1017891, "a", "out"),
        };
        // Sort so that each badge's punches are adjacent and in time order.
        var records = startingRecords.OrderBy(x => x.BadgeId).ThenBy(x => x.Time).ToList();
        // Zip the list with itself shifted by one to get every pair of neighbours,
        // then keep only the pairs that are an "in" followed by an "out" for the same badge.
        var pairs = records.Skip(1).Zip(records, (second, first) => Tuple.Create(first, second)).
            Where(x => x.Item1.BadgeId == x.Item2.BadgeId &&
                       x.Item1.Direction == "in" && x.Item2.Direction == "out").
            Select(x => new Pair(x.Item1.BadgeId, x.Item1.Time, x.Item2.Time)).ToList();
        foreach (var pair in pairs)
            Console.WriteLine(pair.BadgeId + "\t" + pair.TimeIn + "\t" + pair.TimeOut);
        Console.Read();
    }
}
class Record
{
    public long Time { get; set; }
    public string BadgeId { get; set; }
    public string Direction { get; set; }
    public Record(long time, string badgeId, string direction)
    {
        Time = time;
        BadgeId = badgeId;
        Direction = direction;
    }
}
class Pair
{
    public string BadgeId { get; set; }
    public long TimeIn { get; set; }
    public long TimeOut { get; set; }
    public Pair(string badgeId, long timeIn, long timeOut)
    {
        BadgeId = badgeId;
        TimeIn = timeIn;
        TimeOut = timeOut;
    }
}
Output:
a 1001930 1005192
a 1012933 1017891
b 1004901 1014495
score:1
I'm not sure how efficient or performant this would be, but I think it can be translated by LINQ into SQL, so if you are using a database, it may push more of the calculation to the server.
First, group the records by badge:
var p1 = from p in punches
         group p by p.badge into pg
         select new {
             badge = pg.Key,
             punches = pg.OrderBy(p => p.time)
         };
Then, for each badge's group of records, go through all the "in" records and match each one with the first later "out" record, if it exists:
var p2 = p1.SelectMany(pg => pg.punches.Where(p => p.dir == "in")
                         .Select(p => new {
                             pg.badge,
                             timein = p.time,
                             timeout = pg.punches.Where(po => po.dir == "out" && po.time > p.time)
                                         .FirstOrDefault().time
                         }));
Finally, order the result:
var ans = p2.OrderBy(bio => bio.badge).ThenBy(bio => bio.timein);
Since LINQ to SQL propagates nulls automatically, I think this will handle a missing "out" punch for an "in", but not orphan "out" punches.
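The null propagation mentioned here comes from the SQL translation. If the same query ran over in-memory objects and the punch type were a class, `FirstOrDefault()` would return null for an unmatched "in" and the `.time` access would throw a `NullReferenceException`. Below is a minimal sketch of an in-memory variant that uses `?.` instead. The `Punch` class and the `PunchPairing.Pair` helper are hypothetical names introduced for illustration; the answer does not show the real record type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type; the member names mirror the ones used in the queries above.
class Punch
{
    public string badge { get; set; }
    public long time { get; set; }
    public string dir { get; set; }
}

static class PunchPairing
{
    // Pairs every "in" with the first later "out" of the same badge.
    // timeout is null when no later "out" exists.
    public static List<(string badge, long timein, long? timeout)> Pair(IEnumerable<Punch> punches)
    {
        return punches
            .GroupBy(p => p.badge)
            .SelectMany(pg =>
            {
                // Materialize the ordered group once so it is not re-sorted per "in".
                var ordered = pg.OrderBy(p => p.time).ToList();
                return ordered
                    .Where(p => p.dir == "in")
                    .Select(p => (
                        badge: pg.Key,
                        timein: p.time,
                        // ?. yields null instead of throwing when no "out" is found
                        timeout: ordered.FirstOrDefault(po => po.dir == "out" && po.time > p.time)?.time));
            })
            .OrderBy(r => r.badge)
            .ThenBy(r => r.timein)
            .ToList();
    }
}
```

As in the query it mirrors, two consecutive "in" punches for one badge would both be matched to the same later "out", and the inner search makes the work per badge quadratic in the number of punches.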
Another possibility is to use the Select overload whose selector takes two parameters (the element and its index) to group the punch records in pairs, but that only works with LINQ to Objects, so unless you are filtering the data before processing, the millions of records would all be pulled into memory.
For completeness, here is an attempt at it:
var p2 = p1.AsEnumerable()
           .SelectMany(pg => pg.punches.Select((p, i) => (p, i))
                               .GroupBy(pi => pi.i / 2, pi => pi.p)
                               .Select(pp => new {
                                   pg.badge,
                                   timein = pp.Where(p => p.dir == "in").FirstOrDefault()?.time,
                                   timeout = pp.Where(p => p.dir == "out").FirstOrDefault()?.time
                               }));
None of this will work very well if your punches aren't well ordered, e.g. if you are missing an initial "in".
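One way to cope with badly ordered punches is to give up the pairwise LINQ operators and walk each badge's punches in time order, remembering the "in" that is still waiting for its "out". The following is only a rough sketch of that idea: it runs in LINQ to Objects, so it would not be translated to SQL, and the tuple shape of the input and the `TolerantPairing.PairTolerant` name are assumptions made for illustration. Orphan punches are reported with null on the missing side.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TolerantPairing
{
    // Input tuples are (badge, time, dir); this shape is assumed for the sketch.
    public static IEnumerable<(string badge, long? timein, long? timeout)> PairTolerant(
        IEnumerable<(string badge, long time, string dir)> punches)
    {
        foreach (var group in punches.GroupBy(p => p.badge).OrderBy(g => g.Key))
        {
            long? pendingIn = null; // the "in" still waiting for its "out"
            foreach (var p in group.OrderBy(p => p.time))
            {
                if (p.dir == "in")
                {
                    // A second "in" in a row: report the earlier one as unmatched.
                    if (pendingIn.HasValue)
                        yield return (group.Key, pendingIn, null);
                    pendingIn = p.time;
                }
                else if (p.dir == "out")
                {
                    // pendingIn is null here when the "out" is an orphan.
                    yield return (group.Key, pendingIn, p.time);
                    pendingIn = null;
                }
            }
            // A trailing "in" with no "out" after it.
            if (pendingIn.HasValue)
                yield return (group.Key, pendingIn, null);
        }
    }
}
```

For a badge whose punches are "out" at 5, "in" at 10, "out" at 20 and "in" at 30, this yields (null, 5), (10, 20) and (30, null). Whether an unmatched punch should be reported or dropped depends on the application, so treat that choice as a placeholder.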
Source: stackoverflow.com