Maison > Article > base de données > Customized MySQL LOAD DATA LOCAL INFILE handlers with libmys_MySQL
One of the lesser known (and used) MySQL Client API calls seems to bemysql_set_local_infile_handler()
which allows to override the defaultLOAD DATA LOCAL INFILE
behavior on the client side, allowing to import data in text form, e.g. in CSV format, from other sources than actual local files. I searched for some more detailed info than the reference page in the manual for this, or some example code using this mechanism, but couldn't really find any, so this post is going to try to close this gap.
This allows for things similar to PostgreSQL'sCOPY FROM STDIN
mechanism in a way, there is a substantial difference between the two DMBS's approaches here: withCOPY FROM STDIN
an application enters a sort of push mode where text data can be sent using specific library functions the mysql client library API implements a pull model instead where the client library takes control and requests application data via specific callbacks that need to be registered up front withmysql_set_local_infile_handler()
before executing aLOAD DATA INFILE
query.
mysql_set_local_infile_handler()
expects four callback function pointers, one for initialization, one for fetching a single line of text, one for cleaning up, and an error handler. It also expects a mysql connection handle as first parameter, and a pointer to user defined per-connection data that will be passed to the init function later.
This data pointer may point to data common to all invocations of the custom handler. In the example code I'm just using it to point to a descriptive name.
The call flow for the various infile callbacks triggered while processing aLOAD DATA INFILE
query looks like this, with the upper part being the regular case:init()
being called once, thenread()
repeatedly until it can't provide any more data, and finally theend()
callback. The lower paths show the error handling case that is triggered byinit()
returning a non-zero value, or byread()
returning a negative lenght to indicate an error. In this case theerror()
callback is called, followed by a call toend()
.
+------ 0 |+--------+|+--------+ |+-------+| init() |-+- == 0 -+->| read() |-+- == 0 ---------->| end() |+--------+ | +--------+ | ^+-------+ |!= 0 ---------+-->| error() |-+ +---------+
init()
int local_infile_init(void **instance_data, const char *filename, void *handler_data)
The init function is called first whenever aLOAD DATA LOCAL
query is issued. It receives a pointer-pointer where it can store the pointer to local state data to be used for thisLOAD
operation, the filename used in theLOAD
statement, and the per-connection user data pointer that was passed intomysql_set_local_infile_handler()
earlier.
In the example code below I'm keeping track of line numbers via the user data pointer.
read()
int local_infile_read(void *instance_data, char *buf, unsigned int buf_len)
Theread()
function gets called repeatedly until no more data is available. It receives the instance data pointer you stored ininit()
, a pointer to a buffer to store
You don't have to pass a complete single line of data at a time, you can pass multiple input lines, or just part of a line, you just have to make sure not to exceed the buffer size. Data passed from theread()
handler will simply be transfered over to the server, and all parsing, including splitting it into lines, happens on that side.
Theread()
handler shall return the number of bytes that have been put into the buffer. A value of zero indicates that all data has been read and thatread()
should not be called again. A negative value indicates an error and will also terminate reading. There is no way to generate warnings here, just errors.
error()
int local_infile_error(void *instance_data, char *error_msg, unsigned int error_msg_len)
Theerror()
handler is called after returning a non-zero value frominit()
or a negative number fromread()
. It receives the data pointer you've set up ininit()
and a buffer pointer plus length to write an error message to. A numeric error code can be passed as the return value.
There's no direct way to set error number and error code when hitting an error ininit()
orread()
right away. If you need to pass on something descriptive from where the error happened to theerror()
handler you have to take care of doing so using the data pointer (or via global variables).
void local_infile_end(void *instance_data)
Theend()
handler is called afterread()
has returned a zero length to indicate "end of data", or right after the error handler. Its sole purpose is to free any resources that you may have allocated ininit()
.
While local infile handlers allow for some interesting alternatives to simple client side local file imports the current implementation feels a bit complicated and not like a good fit for applications that want to avoid the SQL parsing overhead on bulk imports, but want / need to drive the process instead of passing control to the client library.
To summarize this in an itemized list:
LOAD DATA
statement or the related table / column meta data apart of the file name parameternet_read_timeout
Protocol wise the PostgreSQL and MySQL implementations are not that different though, so maybe an alternative call interface similar to thePQputCopyData()
/PQputCopyEnd()
approach inlibpq
tolibmysqlclient
could be a nice addition to support both the push and pull approaches. ( ... to be continued ...)
The following example code can be compiled using
gcc `mysql_config --cflags` infile_handler.c -o infile_handler `mysql_config --libs`