hstack strings & nums reduces precision


Post by tnknepp » Tue Jun 25, 2013 2:45 pm

I'm analyzing data and would like to do so in pandas. However, the file format is not structured to be easily read into a pandas.DataFrame, so I do the initial processing in numpy. I get to the point where I have six arrays/matrices (each of shape (1, 1)):

routine = array([['MO']], dtype='|S2')

dates = array([[datetime.datetime(2013, 2, 27, 13, 1, 42)]])

counts = matrix([[ 25528.]])

sza = array([[ 126.77926586]])

error = matrix([[ 76.]])

point_sza = array([[ 0.]])

From here I put the data into a DataFrame:

Code:
import pandas as pd
from numpy import *

tmp = pd.DataFrame(hstack((routine, counts, error, point_sza, sza)),
                   columns=['routine', 'counts', 'count_error', 'point_sza', 'sza'],
                   index=dates)


The problem is caused by the string array (routine), which has dtype='S2': when I "hstack" the data, every value ends up as a two-character string:
Code:
>>> tmp
                    routine counts count_error point_sza sza
2013-02-27 13:01:42      MO     25          76        0.  12


Counts should be 25528, but it was cut to 25 and converted to a string because of "routine". Likewise, sza was cut from 126.77926586 to 12.
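For anyone wanting to see the truncation in isolation, here is a minimal sketch. It assumes the stacked array took on the 'S2' dtype of routine, which is what the output above suggests, and it imitates that with an explicit astype('S2') cast rather than with hstack itself, because newer NumPy releases may choose a wider string type when stacking. The variable name truncated is only illustrative.

```python
import numpy as np

# Same numeric values as in the post, as plain (1, 1) float arrays.
counts = np.array([[25528.0]])
error = np.array([[76.0]])
point_sza = np.array([[0.0]])
sza = np.array([[126.77926586]])

# Cast each float array to a 2-byte string dtype (an explicit stand-in for
# what the stacked array's dtype did) and read back the single element.
truncated = [a.astype('S2')[0, 0].decode()
             for a in (counts, error, point_sza, sza)]

print(truncated)  # ['25', '76', '0.', '12']
```

The four strings match the counts, count_error, point_sza and sza columns shown above, which is consistent with the numbers being converted to text and then cut to two characters.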

Is there a way to create the DataFrame while maintaining the original type/length of my numerical values?
Python: 2.7 via Anaconda
Numpy: 1.7
Pandas: 0.11
OS: Windows 7
IDE: Spyder/IPython

Re: hstack strings & nums reduces precision

Post by tnknepp » Thu Jun 27, 2013 12:34 pm

Well, I figured something out that works...just stuff the data into a dictionary.
My data are still of different types (e.g. string, double). Unlike in my initial post, all of the values have now been pulled out of the arrays, but the dictionary doesn't care about the mix of types either way.

i.e.
Code:
t = {'routine': routine,
     'counts': counts,
     'count_error': error,
     'point_sza': point_sza,
     'sza': sza,
     'filt1': filt1,
     'filt2': filt2}  # Added in filt1/2 since initial writing

tmp = pd.DataFrame(t, index=dates)
data = data.append(tmp)  # I eventually use this to append data into a DataFrame that is returned by my function


Pretty simple solution really...
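For readers on a newer setup, the dictionary approach can be sketched as below. This is not the exact code from the post: it assumes plain (1, 1) NumPy arrays instead of matrices, flattens them with ravel() so each column is one-dimensional, leaves out filt1/filt2 because their values are not shown in the thread, and uses pd.concat because DataFrame.append was removed in pandas 2.0. The names columns and frames are only illustrative.

```python
import datetime

import numpy as np
import pandas as pd

# Inputs shaped like those in the first post (plain arrays assumed here).
routine = np.array([['MO']], dtype='S2')
dates = np.array([[datetime.datetime(2013, 2, 27, 13, 1, 42)]])
counts = np.array([[25528.0]])
error = np.array([[76.0]])
point_sza = np.array([[0.0]])
sza = np.array([[126.77926586]])

# One 1-D array per column, so each column keeps its own dtype.
columns = {
    'routine': routine.ravel().astype(str),  # decode the byte strings to text
    'counts': counts.ravel(),
    'count_error': error.ravel(),
    'point_sza': point_sza.ravel(),
    'sza': sza.ravel(),
}

tmp = pd.DataFrame(columns, index=pd.DatetimeIndex(dates.ravel()))

# Collect the per-record frames in a list and concatenate them once.
frames = [tmp]
data = pd.concat(frames)

print(data.dtypes)
print(data)
```

Because the columns never pass through a single shared array, counts stays a float and keeps its full value of 25528.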