Sort by index then column in pandas

This is the place for queries that don't fit in any of the other categories.

Sort by index then column in pandas

Postby tnknepp » Mon Jun 03, 2013 5:04 pm

I have a pandas.DataFrame that has datetime indexes, e.g.

Code: Select all
>>>a
                      cf    empt  dist   g2_water   lat    lon
2007-01-10 14:52:00  0.000  NaN  14.4    4.2801  37.019 -76.507
2007-01-09 15:13:00  0.000  NaN  21.0    9.5359  37.168 -76.159
2007-01-09 15:13:00  0.041  NaN  23.4    7.3543  37.018 -76.622
2007-01-06 14:35:00  0.000  NaN  25.0   25.1860  37.156 -76.653
2007-01-09 15:13:00  0.056  NaN  30.6   -1.0000  37.362 -76.483
2007-01-09 15:13:00  0.000  NaN  31.8    7.9639  36.822 -76.294
2007-01-07 14:14:00  0.144  NaN  42.1   -1.0000  37.180 -75.916
2007-01-07 14:14:00  0.176  NaN  45.5   -1.0000  37.138 -75.869
2007-01-06 14:35:00  0.000  NaN  45.6   23.0670  36.804 -76.735
2007-01-06 14:35:00  0.000  NaN  48.1   26.6850  37.507 -76.564



As you can see, there exist rows with the same index (must be confusing to pandas!), and my data are currently sorted by index. It is common in satellite data to have multiple data points with the same time stamp since your instrument has multiple pixels. I know that in MatLab this issue can be handled quite easily by saying: sort by column one, and then by column two if necessary. Is there any way of doing this in pandas? I suspect there is, but I am not finding it. Everything I have found so far sorts the data by index, then by "dist", so I might as well just sort by "dist".

After this sorting I want to unique-ify my dataframe so I have only unique datetime values (i.e. grab the pixel nearest our ground site (where "dist" is min) for the specified datetime). So my list above will look like:

Code: Select all
>>>a
                      cf    empt  dist   g2_water   lat    lon
2007-01-06 14:35:00  0.000  NaN  45.6   23.0670  36.804 -76.735
2007-01-07 14:14:00  0.144  NaN  42.1   -1.0000  37.180 -75.916
2007-01-09 15:13:00  0.000  NaN  21.0    9.5359  37.168 -76.159
2007-01-10 14:52:00  0.000  NaN  14.4    4.2801  37.019 -76.507


Any ideas on this?
Python: 2.7 via Anaconda
Numpy: 1.7
Pandas: 0.11
OS: Windows 7
IDE: Spyder/IPython
User avatar
tnknepp
 
Posts: 134
Joined: Mon Mar 11, 2013 7:41 pm

Re: Sort by index then column in pandas

Postby setrofim » Mon Jun 03, 2013 6:15 pm

You can specify multiple columns when sorting. To sort by index as well, create an extra column with the index. Something like
Code: Select all
a['idx'] = a.index
sorted_a = a.sort(columns=['idx', 'dist'])
setrofim
 
Posts: 288
Joined: Mon Mar 04, 2013 7:52 pm


Return to General Coding Help

Who is online

Users browsing this forum: Google [Bot], Mikhailov and 4 guests