pandas DataFrame row comparison: performance issue with my code

Wed Sep 28, 2016 10:23 am

I have a dataframe like the following, with around 2000 records:
Code:
col1 col2 col3 col4 col5 col6 col7 col8  col9 colid
a     b    c    d1   e     f    g   h     1    aaa
a1    b1   c1   d2   e1    f1   g1  h2    2    bbb
a2    b2   c2   d3   e2    f2   g2  h3    3    ccc
a3    b1   c1   d2   e1    f1   g1  h2    4    ddd
a1    b3   c2   d1   e     f1   g1  h2    5    eee
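
For anyone who wants to reproduce this, the sample rows above can be built roughly like this. The variable name `df` is only for illustration; my real dataframe has around 2000 rows.

```python
import pandas as pd

# Sample rows copied from the table above; `df` is an illustrative name.
df = pd.DataFrame(
    [
        ["a",  "b",  "c",  "d1", "e",  "f",  "g",  "h",  1, "aaa"],
        ["a1", "b1", "c1", "d2", "e1", "f1", "g1", "h2", 2, "bbb"],
        ["a2", "b2", "c2", "d3", "e2", "f2", "g2", "h3", 3, "ccc"],
        ["a3", "b1", "c1", "d2", "e1", "f1", "g1", "h2", 4, "ddd"],
        ["a1", "b3", "c2", "d1", "e",  "f1", "g1", "h2", 5, "eee"],
    ],
    columns=["col1", "col2", "col3", "col4", "col5",
             "col6", "col7", "col8", "col9", "colid"],
)
print(df)
```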

Let's say col2, col3, col4, and col5 are the key attributes. I get every unique combination of 3 of these key attributes using itertools.combinations(keyattr, 4-1), then group the above dataframe by each combination and compute the diff, diff text, and diff percentage.
Code:
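import itertools
import numpy as np
for combs in itertools.combinations(keyattr, 4-1):
  grpdf = df.groupby(list(combs))  # df: the dataframe above (name assumed), grouped by the current 3 key columns
  for name, group in grpdf:
    group['difftext'] = np.where(group['diff']==0, '=', '<')  # '=' when diff is 0, otherwise '<'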
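
To make the grouping keys concrete, here is a small sketch of which combinations the outer loop produces and how many groups each one yields on the key columns of the sample rows. The names `keyattr`, `keys_df`, `combos` and `group_counts` are only assumptions for this illustration, not my actual code.

```python
import itertools
import pandas as pd

# The four key attributes described above.
keyattr = ["col2", "col3", "col4", "col5"]

# Only the key columns of the five sample rows (illustrative frame).
keys_df = pd.DataFrame({
    "col2": ["b", "b1", "b2", "b1", "b3"],
    "col3": ["c", "c1", "c2", "c1", "c2"],
    "col4": ["d1", "d2", "d3", "d2", "d1"],
    "col5": ["e", "e1", "e2", "e1", "e"],
})

# Every way of choosing 3 of the 4 key attributes.
combos = list(itertools.combinations(keyattr, len(keyattr) - 1))

# Number of groups produced when grouping by each combination.
group_counts = {c: keys_df.groupby(list(c)).ngroups for c in combos}

for combo, count in group_counts.items():
    print(combo, count)
```

With 4 key attributes there are only 4 combinations, so the outer loop is small; the inner loop over the groups is where most of the iterations happen.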

Inside the loop I also compute some more data. Then I add it to a list like this:
Code:

Then I return the result:
Code:
pd.DataFrame(rows, columns=columns)  # rows: the list built in the loop; columns: the list of column names

This code works fine, but it takes more than 10 seconds to complete. Please help me fix this performance issue.
