pandas DataFrame row comparison: performance issue with my code

Post by narendra_mohan » Wed Sep 28, 2016 10:23 am

I have a dataframe with around 2000 records that looks like the following:
Code: Select all
[
col1 col2 col3 col4 col5 col6 col7 col8  col9 colid
a     b    c    d1   e     f    g   h     1    aaa
a1    b1   c1   d2   e1    f1   g1  h2    2    bbb
a2    b2   c2   d3   e2    f2   g2  h3    3    ccc
a3    b1   c1   d2   e1    f1   g1  h2    4    ddd
a1    b3   c2   d1   e     f1   g1  h2    5    eee
.....
]
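
For anyone who wants to reproduce the problem, the visible rows above can be built roughly like this. This is only a sketch: it covers the five rows shown (the real frame has around 2000), and the variable names `df` and `keyattr` are the ones used in the snippets in this post.

```python
import pandas as pd

# Only the five rows visible in the table above; the elided rows are not guessed.
df = pd.DataFrame({
    "col1": ["a", "a1", "a2", "a3", "a1"],
    "col2": ["b", "b1", "b2", "b1", "b3"],
    "col3": ["c", "c1", "c2", "c1", "c2"],
    "col4": ["d1", "d2", "d3", "d2", "d1"],
    "col5": ["e", "e1", "e2", "e1", "e"],
    "col6": ["f", "f1", "f2", "f1", "f1"],
    "col7": ["g", "g1", "g2", "g1", "g1"],
    "col8": ["h", "h2", "h3", "h2", "h2"],
    "col9": [1, 2, 3, 4, 5],
    "colid": ["aaa", "bbb", "ccc", "ddd", "eee"],
})

# The four key attributes used for grouping.
keyattr = ["col2", "col3", "col4", "col5"]
```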

Let's say col2, col3, col4 and col5 are the key attributes. I generate every unique combination of 3 key attributes with itertools.combinations(keyattr, 4-1), group the dataframe by each combination, and within each group compute the difference between consecutive rows, a difference label, and the percentage difference.
Code: Select all
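import itertools
import numpy as np
import pandas as pd

# keyattr = ['col2', 'col3', 'col4', 'col5']
for combs in itertools.combinations(keyattr, 4-1):
    grpdf = df.groupby(list(combs))
    for name, group in grpdf:
        # sort_values returns a new frame, so the result has to be assigned
        group = group.sort_values('col9')
        # value of the next row within the same group (NaN for the last row)
        group['col9next'] = group['col9'].shift(-1)
        group['colidnext'] = group['colid'].shift(-1)
        group['diff'] = group['col9next'] - group['col9']
        group['difftext'] = np.where(group['diff'] == 0, '=', '<')
        group['prcntgdiff'] = group['diff'] / group['col9']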

and some more data. Then I add each group's rows to a list like this:
Code: Select all
rows.extend(group.to_dict('records'))  # rows is a plain list created before the loops

Finally I return
Code: Select all
pd.DataFrame(rows, columns=columns)

This code works fine but it takes more than 10 seconds to complete. Please help me to fix this performance issue.
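
One direction that may help, sketched under assumptions rather than measured on the real data: the inner Python loop over groups and the round trip through a list of dicts are usually the slow parts, and both can be replaced by sorting once per combination and calling `shift(-1)` on the groupby object. The function name `compare_rows` and its parameter `r` are hypothetical; the column names are taken from the post. It also assumes the key columns contain no missing values, since `groupby` skips rows with NaN keys.

```python
import itertools

import numpy as np
import pandas as pd


def compare_rows(df, keyattr, r=3):
    """For every r-column combination of keyattr, compare each row with the
    next row (ordered by col9) inside the same group, with no loop per group."""
    parts = []
    for combs in itertools.combinations(keyattr, r):
        keys = list(combs)
        # Sort once so each group's rows are contiguous and ordered by col9.
        out = df.sort_values(keys + ["col9"]).copy()
        grouped = out.groupby(keys, sort=False)
        # shift(-1) on the groupby pulls the next row's value within each group.
        out["col9next"] = grouped["col9"].shift(-1)
        out["colidnext"] = grouped["colid"].shift(-1)
        out["diff"] = out["col9next"] - out["col9"]
        out["difftext"] = np.where(out["diff"] == 0, "=", "<")
        out["prcntgdiff"] = out["diff"] / out["col9"]
        parts.append(out)
    # One concat at the end instead of extending a list of dicts row by row.
    return pd.concat(parts, ignore_index=True)
```

Calling `compare_rows(df, ['col2', 'col3', 'col4', 'col5'])` would return one frame with the rows of all four combinations stacked; any extra per-group columns ("some more data") would need to be added in the same vectorized style for the speed-up to hold.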
Last edited by Yoriz on Wed Sep 28, 2016 12:03 pm, edited 1 time in total.
Reason: First post lock. Added code tags.