encoding problems

Postby lmr1405 » Sat Nov 09, 2013 5:34 pm

What is the best way to even begin solving this problem when dealing with very large input files (>5 GB)?

Code:
File "script.py", line 19, in <module>
    for line in oIndexFile:
  File "/usr/lib64/python-2.7/lib/python2.7/codecs.py", line 684, in next
    return self.reader.next()
  File "/usr/lib64/python-2.7/lib/python2.7/codecs.py", line 615, in next
    line = self.readline()
  File "/usr/lib64/python-2.7/lib/python2.7/codecs.py", line 530, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib64/python-2.7/lib/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 0: invalid continuation byte


It is difficult to write tests for this: the script has no problems with the smaller, manually created files I tested it on, but it fails on the larger file, and it is nearly impossible for me to find exactly where the problem comes from.

For instance, I got this error after the program had been running for more than 3 hours...
lmr1405
 
Posts: 22
Joined: Fri Oct 25, 2013 9:49 am

Re: encoding problems

Postby ochichinyezaboombwa » Wed Nov 13, 2013 11:09 pm

From your trace, it's clear that the problem is not the file's size but its content: it contains at least one byte sequence (starting with 0xe0) that the utf8 codec cannot decode. Since you don't give any details, it is hard to say more.
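
One way to narrow it down without waiting hours for the real script to crash is to scan the raw bytes and try to decode each line on its own. The sketch below is only an illustration: the function name find_undecodable_lines and its parameters are made up for this post, and it assumes the file is meant to be UTF-8 (or another ASCII-compatible encoding where a newline byte never appears inside a multi-byte character).

```python
import io


def find_undecodable_lines(path, encoding="utf-8", limit=10):
    """Return up to `limit` tuples (line_number, byte_offset, bad_byte).

    Hypothetical helper: reads the file in binary mode, so nothing is
    decoded until we explicitly try, one line at a time.
    """
    problems = []
    offset = 0  # byte offset of the start of the current line
    with io.open(path, "rb") as f:
        for lineno, raw in enumerate(f, 1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start is the index of the offending byte within `raw`
                bad = raw[exc.start:exc.start + 1]
                problems.append((lineno, offset + exc.start, bad))
                if len(problems) >= limit:
                    break
            offset += len(raw)
    return problems
```

Calling it on the big input file should give the line numbers and byte offsets of the first few bad bytes, which you can then inspect with a hex viewer to guess what encoding the file really uses.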

As for reading a 5 GB file on any modern system, that shouldn't be an issue. What sort of problem are you trying to overcome?
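
If the goal is just to keep a long run from dying on one stray byte, a possible approach is to stream the file line by line with an explicit error handler. This is a rough sketch, not a recommendation for every case: iter_lines_lenient is a made-up name, and errors="replace" silently turns each undecodable byte into U+FFFD, so it hides the underlying data problem rather than fixing it.

```python
import io


def iter_lines_lenient(path, encoding="utf-8"):
    """Yield decoded lines one at a time (hypothetical helper).

    Undecodable bytes are replaced with U+FFFD instead of raising
    UnicodeDecodeError, and only one line is held in memory at a time.
    """
    with io.open(path, "r", encoding=encoding, errors="replace") as f:
        for line in f:
            yield line
```

Counting the U+FFFD characters in the output afterwards gives a quick idea of how much of the input was damaged.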
ochichinyezaboombwa
 
Posts: 200
Joined: Tue Jun 04, 2013 7:53 pm

Re: encoding problems

Postby Mekire » Wed Nov 13, 2013 11:53 pm

Give this a read through:
http://nedbatchelder.com/text/unipain.html

Might help.

-Mek
Mekire
 
Posts: 820
Joined: Thu Feb 07, 2013 11:33 pm
Location: Amakusa, Japan

