See Configuring Logging.
SQLAlchemy performs application-level connection pooling automatically
in most cases. With the exception of SQLite, a Engine
object
refers to a QueuePool
as a source of connectivity.
For more detail, see Engine Configuration and Connection Pooling.
The create_engine()
call accepts additional arguments either
directly via the connect_args
keyword argument:
e = create_engine("mysql://scott:tiger@localhost/test",
connect_args={"encoding": "utf8"})
Or for basic string and integer arguments, they can usually be specified in the query string of the URL:
e = create_engine("mysql://scott:tiger@localhost/test?encoding=utf8")
See also
The primary cause of this error is that the MySQL connection has timed out
and has been closed by the server. The MySQL server closes connections
which have been idle a period of time which defaults to eight hours.
To accommodate this, the immediate setting is to enable the
create_engine.pool_recycle
setting, which will ensure that a
connection which is older than a set amount of seconds will be discarded
and replaced with a new connection when it is next checked out.
For the more general case of accommodating database restarts and other temporary loss of connectivity due to network issues, connections that are in the pool may be recycled in response to more generalized disconnect detection techniques. The section Dealing with Disconnects provides background on both “pessimistic” (e.g. pre-ping) and “optimistic” (e.g. graceful recovery) techniques. Modern SQLAlchemy tends to favor the “pessimistic” approach.
See also
The MySQL drivers have a fairly wide class of failure modes whereby the state of the connection to the server is in an invalid state. Typically, when the connection is used again, one of these two error messages will occur. The reason is because the state of the server has been changed to one in which the client library does not expect, such that when the client library emits a new statement on the connection, the server does not respond as expected.
In SQLAlchemy, because database connections are pooled, the issue of the messaging
being out of sync on a connection becomes more important, since when an operation
fails, if the connection itself is in an unusable state, if it goes back into the
connection pool, it will malfunction when checked out again. The mitigation
for this issue is that the connection is invalidated when such a failure
mode occurs so that the underlying database connection to MySQL is discarded.
This invalidation occurs automatically for many known failure modes and can
also be called explicitly via the Connection.invalidate()
method.
There is also a second class of failure modes within this category where a context manager
such as with session.begin_nested():
wants to “roll back” the transaction
when an error occurs; however within some failure modes of the connection, the
rollback itself (which can also be a RELEASE SAVEPOINT operation) also
fails, causing misleading stack traces.
Originally, the cause of this error used to be fairly simple, it meant that a multithreaded program was invoking commands on a single connection from more than one thread. This applied to the original “MySQLdb” native-C driver that was pretty much the only driver in use. However, with the introduction of pure Python drivers like PyMySQL and MySQL-connector-Python, as well as increased use of tools such as gevent/eventlet, multiprocessing (often with Celery), and others, there is a whole series of factors that has been known to cause this problem, some of which have been improved across SQLAlchemy versions but others which are unavoidable:
Sharing a connection among threads - This is the original reason these kinds of errors occurred. A program used the same connection in two or more threads at the same time, meaning multiple sets of messages got mixed up on the connection, putting the server-side session into a state that the client no longer knows how to interpret. However, other causes are usually more likely today.
Sharing the filehandle for the connection among processes - This usually occurs
when a program uses os.fork()
to spawn a new process, and a TCP connection
that is present in th parent process gets shared into one or more child processes.
As multiple processes are now emitting messages to essentially the same filehandle,
the server receives interleaved messages and breaks the state of the connection.
This scenario can occur very easily if a program uses Python’s “multiprocessing”
module and makes use of an Engine
that was created in the parent
process. It’s common that “multiprocessing” is in use when using tools like
Celery. The correct approach should be either that a new Engine
is produced when a child process first starts, discarding any Engine
that came down from the parent process; or, the Engine
that’s inherited
from the parent process can have it’s internal pool of connections disposed by
calling Engine.dispose()
.
Greenlet Monkeypatching w/ Exits - When using a library like gevent or eventlet
that monkeypatches the Python networking API, libraries like PyMySQL are now
working in an asynchronous mode of operation, even though they are not developed
explicitly against this model. A common issue is that a greenthread is interrupted,
often due to timeout logic in the application. This results in the GreenletExit
exception being raised, and the pure-Python MySQL driver is interrupted from
its work, which may have been that it was receiving a response from the server
or preparing to otherwise reset the state of the connection. When the exception
cuts all that work short, the conversation between client and server is now
out of sync and subsequent usage of the connection may fail. SQLAlchemy
as of version 1.1.0 knows how to guard against this, as if a database operation
is interrupted by a so-called “exit exception”, which includes GreenletExit
and any other subclass of Python BaseException
that is not also a subclass
of Exception
, the connection is invalidated.
Rollbacks / SAVEPOINT releases failing - Some classes of error cause the connection to be unusable within the context of a transaction, as well as when operating in a “SAVEPOINT” block. In these cases, the failure on the connection has rendered any SAVEPOINT as no longer existing, yet when SQLAlchemy, or the application, attempts to “roll back” this savepoint, the “RELEASE SAVEPOINT” operation fails, typically with a message like “savepoint does not exist”. In this case, under Python 3 there will be a chain of exceptions output, where the ultimate “cause” of the error will be displayed as well. Under Python 2, there are no “chained” exceptions, however recent versions of SQLAlchemy will attempt to emit a warning illustrating the original failure cause, while still throwing the immediate error which is the failure of the ROLLBACK.
SQLAlchemy currently assumes DBAPI connections are in “non-autocommit” mode -
this is the default behavior of the Python database API, meaning it
must be assumed that a transaction is always in progress. The
connection pool issues connection.rollback()
when a connection is returned.
This is so that any transactional resources remaining on the connection are
released. On a database like PostgreSQL or MSSQL where table resources are
aggressively locked, this is critical so that rows and tables don’t remain
locked within connections that are no longer in use. An application can
otherwise hang. It’s not just for locks, however, and is equally critical on
any database that has any kind of transaction isolation, including MySQL with
InnoDB. Any connection that is still inside an old transaction will return
stale data, if that data was already queried on that connection within
isolation. For background on why you might see stale data even on MySQL, see
http://dev.mysql.com/doc/refman/5.1/en/innodb-transaction-model.html
The behavior of the connection pool’s connection return behavior can be
configured using reset_on_return
:
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine('mysql://scott:tiger@localhost/myisam_database', pool=QueuePool(reset_on_return=False))
reset_on_return
accepts the values commit
, rollback
in addition
to True
, False
, and None
. Setting to commit
will cause
a COMMIT as any connection is returned to the pool:
engine = create_engine('mssql://scott:tiger@mydsn', pool=QueuePool(reset_on_return='commit'))
If using a SQLite :memory:
database, or a version of SQLAlchemy prior
to version 0.7, the default connection pool is the SingletonThreadPool
,
which maintains exactly one SQLite connection per thread. So two
connections in use in the same thread will actually be the same SQLite
connection. Make sure you’re not using a :memory: database and
use NullPool
, which is the default for non-memory databases in
current SQLAlchemy versions.
See also
Threading/Pooling Behavior - info on PySQLite’s behavior.
With a regular SA engine-level Connection, you can get at a pool-proxied
version of the DBAPI connection via the Connection.connection
attribute on
Connection
, and for the really-real DBAPI connection you can call the
ConnectionFairy.connection
attribute on that - but there should never be any need to access
the non-pool-proxied DBAPI connection, as all methods are proxied through:
engine = create_engine(...)
conn = engine.connect()
conn.connection.<do DBAPI things>
cursor = conn.connection.cursor(<DBAPI specific arguments..>)
You must ensure that you revert any isolation level settings or other operation-specific settings on the connection back to normal before returning it to the pool.
As an alternative to reverting settings, you can call the Connection.detach()
method on
either Connection
or the proxied connection, which will de-associate
the connection from the pool such that it will be closed and discarded
when Connection.close()
is called:
conn = engine.connect()
conn.detach() # detaches the DBAPI connection from the connection pool
conn.connection.<go nuts>
conn.close() # connection is closed for real, the pool replaces it with a new connection
The key goal with multiple python processes is to prevent any database connections from being shared across processes. Depending on specifics of the driver and OS, the issues that arise here range from non-working connections to socket connections that are used by multiple processes concurrently, leading to broken messaging (the latter case is typically the most common).
The SQLAlchemy Engine
object refers to a connection pool of existing
database connections. So when this object is replicated to a child process,
the goal is to ensure that no database connections are carried over. There
are three general approaches to this:
Disable pooling using NullPool
. This is the most simplistic,
one shot system that prevents the Engine
from using any connection
more than once.
Call Engine.dispose()
on any given Engine
as soon one is
within the new process. In Python multiprocessing, constructs such as
multiprocessing.Pool
include “initializer” hooks which are a place
that this can be performed; otherwise at the top of where os.fork()
or where the Process
object begins the child fork, a single call
to Engine.dispose()
will ensure any remaining connections are flushed.
An event handler can be applied to the connection pool that tests for connections being shared across process boundaries, and invalidates them. This looks like the following:
import os
import warnings
from sqlalchemy import event
from sqlalchemy import exc
def add_engine_pidguard(engine):
"""Add multiprocessing guards.
Forces a connection to be reconnected if it is detected
as having been shared to a sub-process.
"""
@event.listens_for(engine, "connect")
def connect(dbapi_connection, connection_record):
connection_record.info['pid'] = os.getpid()
@event.listens_for(engine, "checkout")
def checkout(dbapi_connection, connection_record, connection_proxy):
pid = os.getpid()
if connection_record.info['pid'] != pid:
# substitute log.debug() or similar here as desired
warnings.warn(
"Parent process %(orig)s forked (%(newproc)s) with an open "
"database connection, "
"which is being discarded and recreated." %
{"newproc": pid, "orig": connection_record.info['pid']})
connection_record.connection = connection_proxy.connection = None
raise exc.DisconnectionError(
"Connection record belongs to pid %s, "
"attempting to check out in pid %s" %
(connection_record.info['pid'], pid)
)
These events are applied to an Engine
as soon as its created:
engine = create_engine("...")
add_engine_pidguard(engine)
The above strategies will accommodate the case of an Engine
being shared among processes. However, for the case of a transaction-active
Session
or Connection
being shared, there’s no automatic
fix for this; an application needs to ensure a new child process only
initiate new Connection
objects and transactions, as well as ORM
Session
objects. For a Session
object, technically
this is only needed if the session is currently transaction-bound, however
the scope of a single Session
is in any case intended to be
kept within a single call stack in any case (e.g. not a global object, not
shared between processes or threads).