I am web scraping data from a few websites and using pandas to modify it.
It worked well on the first few chunks of data, but later I get this error message:
Traceback (most recent call last):
  File "data.py", line 394, in <module>
    df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2326, in __setitem__
    self._setitem_array(key,value)
  File "/home/web/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2350, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key
My code is here:
df2 = pd.DataFrame(datatable, columns=cols)
df2['FLIGHT_ID_1'] = df2['FLIGHT'].str[:3]
df2['FLIGHT_ID_2'] = df2['FLIGHT'].str[3:].str.zfill(4)
df2[['STATUS_ID_1','STATUS_ID_2']] = df2['STATUS'].str.split(n=1, expand=True)
EDIT (for jezrael): I used your code and printed the result of the split. I hope this helps us find the problem, because the script seems to fail on this split at random.
     0          1
2    Landed     8:33 AM
3    Landed     9:37 AM
4    Landed     9:10 AM
5    Landed     9:57 AM
6    Landed     9:36 AM
8    Landed     8:51 AM
9    Landed     9:18 AM
11   Landed     8:53 AM
12   Landed     7:59 AM
13   Landed     7:52 AM
14   Landed     8:56 AM
15   Landed     8:09 AM
18   Landed     8:42 AM
19   Landed     9:39 AM
20   Landed     9:45 AM
21   Landed     7:44 AM
23   Landed     8:36 AM
27   Landed     9:53 AM
29   Landed     9:26 AM
30   Landed     8:23 AM
35   Landed     9:59 AM
36   Landed     8:38 AM
37   Landed     9:38 AM
38   Landed     9:37 AM
40   Landed     9:27 AM
43   Landed     9:14 AM
44   Landed     9:22 AM
45   Landed     8:18 AM
46   Landed     10:01 AM
47   Landed     10:21 AM
..   ...        ...
316  Delayed    5:00 PM
317  Delayed    4:34 PM
319  Estimated  2:58 PM
320  Estimated  3:02 PM
321  Delayed    4:47 PM
323  Estimated  3:08 PM
325  Delayed    3:52 PM
326  Estimated  3:09 PM
327  Estimated  2:37 PM
328  Estimated  3:17 PM
329  Estimated  3:20 PM
330  Estimated  2:39 PM
331  Delayed    4:04 PM
332  Delayed    4:36 PM
337  Estimated  3:47 PM
339  Estimated  3:37 PM
341  Delayed    4:32 PM
345  Estimated  3:34 PM
349  Estimated  3:24 PM
356  Delayed    4:56 PM
358  Estimated  3:45 PM
367  Estimated  4:09 PM
370  Estimated  4:04 PM
371  Estimated  4:11 PM
373  Delayed    5:21 PM
382  Estimated  3:56 PM
384  Delayed    4:28 PM
389  Delayed    4:41 PM
393  Estimated  4:02 PM
397  Delayed    5:23 PM

[240 rows x 2 columns]
score:15
Accepted answer
The solution needs a small modification, because the split sometimes returns two columns and sometimes only one:
df2 = pd.DataFrame({'STATUS':['Estimated 3:17 PM','Delayed 3:00 PM']})
df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
print (df3)
  STATUS_ID1 STATUS_ID2
0  Estimated    3:17 PM
1    Delayed    3:00 PM
df2 = df2.join(df3)
print (df2)
              STATUS STATUS_ID1 STATUS_ID2
0  Estimated 3:17 PM  Estimated    3:17 PM
1    Delayed 3:00 PM    Delayed    3:00 PM
Another possible case is that none of the values contain whitespace; the solution works here too:
df2 = pd.DataFrame({'STATUS':['Canceled','Canceled']})
and the solution returns:
print (df2)
     STATUS STATUS_ID1
0  Canceled   Canceled
1  Canceled   Canceled
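The no-whitespace case is probably also what triggers the original error. A minimal sketch of that, assuming made-up sample values rather than the scraped data: when no row in a chunk contains whitespace, `str.split(n=1, expand=True)` produces a single column, and a single column cannot be assigned to two column names.

```python
import pandas as pd

# Hypothetical sample chunks, not taken from the scraped data.
mixed = pd.DataFrame({'STATUS': ['Estimated 3:17 PM', 'Canceled']})
no_space = pd.DataFrame({'STATUS': ['Canceled', 'Canceled']})

# The number of resulting columns depends on the values in the chunk.
mixed_cols = mixed['STATUS'].str.split(n=1, expand=True).shape[1]
no_space_cols = no_space['STATUS'].str.split(n=1, expand=True).shape[1]
print(mixed_cols, no_space_cols)

# Assigning a one-column result to two column names raises ValueError.
try:
    no_space[['STATUS_ID_1', 'STATUS_ID_2']] = no_space['STATUS'].str.split(n=1, expand=True)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

This would explain why the failure looks random: it depends on whether a given chunk happens to contain any status with a time attached.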
All together:
df3 = df2['STATUS'].str.split(n=1, expand=True)
df3.columns = ['STATUS_ID{}'.format(x+1) for x in df3.columns]
df2 = df2.join(df3)
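If later code expects both status columns to exist in every chunk, one possible variation (not part of the answer above) is to pad the split result to exactly two columns with `reindex` before naming them. This is only a sketch: the helper name `split_status` is hypothetical, and it reuses the `STATUS_ID_1`/`STATUS_ID_2` names from the question.

```python
import pandas as pd

def split_status(df, col='STATUS'):
    """Return df joined with two status columns, padded with NaN if a part is missing."""
    parts = df[col].str.split(n=1, expand=True)
    # Force exactly two columns (labels 0 and 1); a missing one is filled with NaN.
    parts = parts.reindex(columns=[0, 1])
    parts.columns = ['STATUS_ID_1', 'STATUS_ID_2']
    return df.join(parts)

# Chunk without any whitespace: the second column exists but is all NaN.
out = split_status(pd.DataFrame({'STATUS': ['Canceled', 'Canceled']}))
print(out)

# Chunk with a time part: both columns are filled.
out2 = split_status(pd.DataFrame({'STATUS': ['Delayed 3:00 PM']}))
print(out2)
```

The trade-off is that the second column may be entirely NaN for some chunks, whereas the answer's approach simply omits it.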
Credit To: stackoverflow.com