Problem
I have a data set of played football games and I want to look more closely at the games each team played at home. To do that, I am trying to create a column for every team with a counter that tracks the consecutive wins. The following logic should apply:
- If win: increase counter += 1
- If tie: keep counter
- If loss: decrease counter -= 1
df[['Date', 'Home', 'Home result']].head()

|      | Date     | Home           | Home result |
| ---- | -------- | -------------- | ----------- |
| 0    | 1/1/2018 | Stoke City     | Defeat      |
| 1    | 1/1/2018 | Brighton       | Tie         |
| 2    | 1/1/2018 | Burnley        | Defeat      |
| 3    | 1/1/2018 | Leicester City | Victory     |
I created a function that outputs a dictionary that can be mapped to the DataFrame (sorted by Date) to create the new counter column.
def result_counter(df):
    d = {}
    for index, row in df.iterrows():
        counter = 0
        for second_index, second_row in df.iloc[:index+1].iterrows():
            if second_row['Home result'] == 'Victory':
                counter += 1
            elif second_row['Home result'] == 'Tie':
                counter += 0
            else:
                counter -= 1
        if second_row['Home'] not in d:
            d[second_row['Home']] = [counter]
        else:
            d[second_row['Home']].append(counter)
    return d
Expected Result
A new column should be created that indicates the consecutive score of the home team. The expected result for a subset should look like this:
|      | Date       | Home      | Home result | Counter |
| ---- | ---------- | --------- | ----------- | ------- |
| 0    | 2017-08-19 | Liverpool | Victory     | 1       |
| 1    | 2017-08-27 | Liverpool | Victory     | 2       |
| 2    | 2017-09-16 | Liverpool | Tie         | 2       |
| 3    | 2017-10-14 | Liverpool | Defeat      | 1       |
What I tried
I tried to map the dict to the new DataFrame. When doing this I either get a map object displayed or the whole list that corresponds to the key in the dictionary.
# with subset and map
sorted_liverpool['Counter'] = map(sorted_liverpool, d)
sorted_liverpool[['Date', 'Home', 'Home result', 'Counter']].head()

|      | Date       | Home      | Home result | Counter                        |
| ---- | ---------- | --------- | ----------- | ------------------------------ |
| 0    | 2017-08-19 | Liverpool | Victory     | <map object at 0x7fc0650fe550> |
| 1    | 2017-08-27 | Liverpool | Victory     | <map object at 0x7fc0650fe550> |
| 2    | 2017-09-16 | Liverpool | Tie         | <map object at 0x7fc0650fe550> |
| 3    | 2017-10-14 | Liverpool | Defeat      | <map object at 0x7fc0650fe550> |
# with subset and map directly to "home" column
sorted_liverpool['Counter'] = sorted_liverpool['Home'].map(d)
sorted_liverpool[['Date', 'Home', 'Home result', 'Counter']].head()

|      | Date       | Home      | Home result | Counter                                           |
| ---- | ---------- | --------- | ----------- | ------------------------------------------------- |
| 0    | 2017-08-19 | Liverpool | Victory     | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 1    | 2017-08-27 | Liverpool | Victory     | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 2    | 2017-09-16 | Liverpool | Tie         | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
| 3    | 2017-10-14 | Liverpool | Defeat      | [1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 8, 9, ... |
How can I map the created dictionary to the DataFrame so it displays only the corresponding value in the list and not the whole list?
score:1
Avoid loops, especially iterrows, when you can.
You need to map each Home result to its respective value, then take the cumulative sum with cumsum:
df['Counter'] = df['Home result'].map({
    'Victory': 1,
    'Tie': 0,
    'Defeat': -1
}).cumsum()
Also, you might need to groupby on Home so that each team gets its own running total:
df['Counter'] = df['Home result'].map({
    'Victory': 1,
    'Tie': 0,
    'Defeat': -1
}).groupby(df['Home']).cumsum()
Output:
Date Home Home result Counter
0 2017-08-19 Liverpool Victory 1
1 2017-08-27 Liverpool Victory 2
2 2017-09-16 Liverpool Tie 2
3 2017-10-14 Liverpool Defeat 1
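Because cumsum depends on row order, the data should be sorted by date before the counter is computed. The following is a minimal, self-contained sketch of the approach above; the sample rows, the `points` name, and the `reset_index` call are assumptions made for illustration and are not part of the original data set.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of date order.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2017-09-16", "2017-08-19", "2017-08-20", "2017-08-27", "2017-08-28"]
    ),
    "Home": ["Liverpool", "Liverpool", "Stoke City", "Liverpool", "Stoke City"],
    "Home result": ["Tie", "Victory", "Defeat", "Victory", "Tie"],
})

# Numeric value of each result: win +1, tie 0, loss -1.
points = {"Victory": 1, "Tie": 0, "Defeat": -1}

# Sort chronologically first, since the running total follows row order.
df = df.sort_values("Date").reset_index(drop=True)

# Running total per home team; the result aligns with df by index.
df["Counter"] = df["Home result"].map(points).groupby(df["Home"]).cumsum()

print(df)
```

Any result label missing from `points` would map to NaN, so it may be worth checking the column for unexpected values first.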
score:0
Try grouping by Home and taking the cumulative sum (cumsum) of a numeric outcome computed with apply:
import pandas as pd
from io import StringIO

# Dates are written in ISO format throughout so pd.to_datetime parses them consistently.
txt = """Date,Home,Home Result
2018-01-01,Stoke City,Defeat
2018-01-01,Brighton,Tie
2018-01-01,Burnley,Defeat
2018-01-01,Leicester City,Victory
2017-08-19,Liverpool,Victory
2017-08-27,Liverpool,Victory
2017-09-16,Liverpool,Tie
2017-10-14,Liverpool,Defeat"""
df = pd.read_csv(StringIO(txt))
df['Date'] = pd.to_datetime(df['Date'])
# Convert each result to a numeric outcome: win +1, tie 0, loss -1.
df['Outcome'] = df['Home Result'].apply(lambda x: -1 if x == 'Defeat' else 0 if x == 'Tie' else 1 if x == 'Victory' else 0)
df = df.sort_values(by=['Home', 'Date'], ascending=True)
# Running total of outcomes within each home team.
df['Date_Total'] = df.groupby(['Home'])['Outcome'].cumsum()
print(df)
Output:
Date Home Home Result Outcome Date_Total
1 2018-01-01 Brighton Tie 0 0
2 2018-01-01 Burnley Defeat -1 -1
3 2018-01-01 Leicester City Victory 1 1
4 2017-08-19 Liverpool Victory 1 1
5 2017-08-27 Liverpool Victory 1 2
6 2017-09-16 Liverpool Tie 0 2
7 2017-10-14 Liverpool Defeat -1 1
0 2018-01-01 Stoke City Defeat -1 -1
score:1
print(df)
Date Home Home result
0 2017-08-19 Liverpool Victory
1 2017-08-27 Liverpool Victory
2 2017-09-16 Liverpool Tie
3 2017-10-14 Liverpool Defeat
4 2017-08-20 Stoke City Defeat
5 2017-08-28 Stoke City Tie
6 2017-09-17 Stoke City Defeat
7 2017-10-15 Stoke City Victory
df['new column'] = df['Home result'].map({'Victory': 1, 'Tie': 0, 'Defeat': -1})
df['new column'] = df.groupby('Home')['new column'].transform(pd.Series.cumsum)
print(df)
Date Home Home result new column
0 2017-08-19 Liverpool Victory 1
1 2017-08-27 Liverpool Victory 2
2 2017-09-16 Liverpool Tie 2
3 2017-10-14 Liverpool Defeat 1
4 2017-08-20 Stoke City Defeat -1
5 2017-08-28 Stoke City Tie -1
6 2017-09-17 Stoke City Defeat -2
7 2017-10-15 Stoke City Victory -1
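As for why mapping the dictionary directly shows the whole list: Series.map performs a plain key lookup, so every row of a team receives that team's entire list. If the dictionary approach is kept, one possible way to pick a single element per row is to use each row's position within its team, obtained from groupby(...).cumcount(). The sketch below is only an illustration; the sample rows and the contents of `d` are hypothetical and assume one list entry per home game, in row order.

```python
import pandas as pd

# Hypothetical sample data, already in chronological order.
df = pd.DataFrame({
    "Home": ["Liverpool", "Stoke City", "Liverpool", "Stoke City", "Liverpool"],
    "Home result": ["Victory", "Defeat", "Tie", "Victory", "Defeat"],
})

# Hypothetical dictionary: one running-total entry per home game of each team.
d = {"Liverpool": [1, 1, 0], "Stoke City": [-1, 0]}

# A plain lookup returns the full list for every row of the team.
whole_lists = df["Home"].map(d)

# Position of each row among its team's games: 0, 1, 2, ...
game_number = df.groupby("Home").cumcount()

# Select the list element that corresponds to each row.
df["Counter"] = [d[team][n] for team, n in zip(df["Home"], game_number)]

print(df)
```

The groupby and cumsum answers above avoid building the dictionary at all, which is usually simpler.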
Credit To: stackoverflow.com