我已经使用 pyodbc 和 sqlalchemy 创建了与 MySQL db 的连接。但是当我使用pd.to_sql()它时,它给了我一个错误。错误似乎是 pandas 试图进行某种编码。要转换的参数是字符串数据类型,DATABASE 编码是latin-1。但是当我使用connection.execute(insert query,params)相同的连接时,它工作正常。此外,当我使用pd.to_sql()建立的连接时sqlalchemy,mysqlconnector它工作高效。
pd.to_sql()
latin-1
connection.execute(insert query,params)
sqlalchemy
mysqlconnector
params = urllib3.parse.quote_plus("DRIVER={MySQL ODBC 8.0 ANSI Driver};" f"SERVER={host}:{port};" f"DATABASE={db};" f"UID={username};" f"PWD={password};" f"charset=utf8") db_engine = create_engine(f"mysql+pyodbc:///?odbc_connect={params}") connection = db_engine.connect() # main_df is a pd.DataFrame(). It contains a long text field which is most of the time getting affected. # the error mostly come from this column. maindf = pd.DataFrame() maindf['transcript'] = ['This is a sample 1', 'This is sample2'] maindf.to_sql("mytable", connection, if_exists="append", index=False, chunksize=1000)
错误如下所示:
File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1685, in _execute_context self.dialect.do_executemany( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\default.py", line 713, in do_executemany cursor.executemany(statement, parameters) pyodbc.ProgrammingError: ('42000', "[42000] [MySQL][ODBC 8.0(a) Driver][mysqld-5.7.31-log]You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '4' at line 1 (1064) (SQLParamData)") The above exception was the direct cause of the following exception: Traceback (most recent call last): File "E:/myusername/cx-index-score/nice_rpa/pipeline.py", line 17, in <module> uploader.split_upload(os.path.abspath(Path('./datasets'))) File "E:\myusername\cx-index-score\nice_rpa\processandupload.py", line 163, in split_upload self.writetosandbox() File "E:\myusername\cx-index-score\nice_rpa\processandupload.py", line 216, in writetosandbox self.maindf.to_sql("nice_daily_update", self.connection, if_exists="append", File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\core\generic.py", line 2779, in to_sql sql.to_sql( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 590, in to_sql pandas_sql.to_sql( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 1405, in to_sql raise err File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 1397, in to_sql table.insert(chunksize, method=method) File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 831, in insert exec_insert(conn, keys, chunk_iter) File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\pandas\io\sql.py", line 748, in _execute_insert conn.execute(self.table.insert(), data) File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1200, in execute return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS) File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\sql\elements.py", line 313, in _execute_on_connection return connection._execute_clauseelement( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1389, in _execute_clauseelement ret = self._execute_context( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1748, in _execute_context self._handle_dbapi_exception( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1929, in _handle_dbapi_exception util.raise_( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\util\compat.py", line 211, in raise_ raise exception File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\base.py", line 1685, in _execute_context self.dialect.do_executemany( File "C:\ProgramData\Anaconda3\envs\nice_rpa\lib\site-packages\sqlalchemy\engine\default.py", line 713, in do_executemany cursor.executemany(statement, parameters) sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [MySQL][ODBC 8.0(a) Driver][mysqld-5.7.31-log]You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '33\x03@333333\x03@333333\x03@' at line 1 (1064) (SQLParamData)")
如果您看到语法错误部分,它实际上应该是一个简单的字符串,即utf-8。数据库编码为“latin1”。有趣的是,尽管每次数据和错误都相同,但“错误语法”部分总是会变化。一次是“4”,然后是“33\x03@333333\x03@333333\x03@”,每次运行时都会发生变化,但输入数据始终相同。
utf-8
您知道如何在将参数发送到数据库之前阻止 pandas 预处理参数吗?如果这不可能,您能否建议一种替代方法,以便有效地写入多列(范围为 1000)?
我使用了错误的驱动程序。服务器中的驱动程序是MySQL ODBC 5.1 Driver,而我在 中使用MySQL ODBC 8.0 ANSI Driver。这解释了奇怪的编码。
MySQL ODBC 5.1 Driver
MySQL ODBC 8.0 ANSI Driver
params = urllib3.parse.quote_plus("DRIVER={MySQL ODBC 5.1 Driver};" f"SERVER={host}:{port};" f"DATABASE={db};" f"UID={username};" f"PWD={password};" f"charset=utf8") db_engine = create_engine(f"mysql+pyodbc:///?odbc_connect={params}") connection = db_engine.connect()