Python CSV Bult Insert with MySQL

Python CSV Bult Insert with MySQL

Postby shankar » Mon Aug 22, 2016 6:04 am

I Start new Project in Python with MySQL.

I just try to insert millions of record from CSV to MySQL through MySQLdb package.

My Code:
Code: Select all
 import pandas as pd
 import MySQLdb

 #Connect with MySQL
 db = MySQLdb.connect('localhost','root','****','MY_DB')
 cur = db.cursor()

 #Reading CSV
 df = pd.read_csv('/home/shankar/LAB/Python/Rough/******.csv')

 for i in df.COMPANY_NAME:
    i = i.replace("'","")
    i = i.replace("\\","")
   #i = i.encode('latin-1', 'ignore')
    cur.execute("INSERT INTO polls_company (name) VALUES ('" + i + "')")

This code working fine in some sort of CSV files, but having issues in few CSV files.

Error :

Code: Select all
 UnicodeEncodeError                        Traceback (most recent call last)
 <ipython-input-7-aac849862588> in <module>()
 13     i = i.replace("\\","")
 14     #i = i.encode('latin-1', 'ignore')
 ---> 15     cur.execute("INSERT INTO polls_company (name) VALUES ('" + i + "')")
 16 db.commit()

 /home/shankar/.local/lib/python3.5/site-packages/MySQLdb/ in execute(self, query, args)
 212         if isinstance(query, unicode):
 --> 213             query = query.encode(db.unicode_literal.charset, 'surrogateescape')
 215         res = None

 UnicodeEncodeError: 'latin-1' codec can't encode character '\ufffd' in position 49: ordinal not in range(256)

Here, this "Character Encoding" issue is occurred in some CSV files only, but i want automatic Insertion with common encoding techniques.

Because CSV Files encoded works with "utf-8", "latin-1" and more...

If i use utf-8 : then i got error in latin-1 and vise versa

So, Is there any ways to operate all kind of CSV file with common encoding


any other ways to solve this ?

[Thanks in Advance...]
Last edited by micseydel on Mon Aug 22, 2016 7:38 am, edited 1 time in total.
Reason: Initial post lock.
Posts: 1
Joined: Mon Aug 22, 2016 5:56 am

Re: Python CSV Bult Insert with MySQL

Postby Ofnuts » Mon Aug 22, 2016 10:37 am

1) The FFFD Unicode character is already a sign of trouble. It seems to be used when a character has no unicode representation... So maybe you should sanitize your input. Maybe you should also ascertain the scope of your input.

2) Normally, UTF-8 should handle any character... what kind of problems have you got with UTF-8 conversion?

Btw, if you really insert millions of lines, inserting them row by row is going to take ages, and if you do one single commit at the end, the DB will have to hold off all data for a potential rollback... Better do queries covering several thousands rows at a time, each followed by a commit.
This forum has been moved to See you there.
User avatar
Posts: 2659
Joined: Thu May 14, 2015 9:46 am
Location: Paris, France, EU, Earth, Solar system, Milky Way, Local Cluster, Universe #32987440940987

Return to Web Development

Who is online

Users browsing this forum: No registered users and 1 guest