Connection pools

Connection pooling is a technique where several long-lived connections are re-used for independent logical operations. When compared to establishing individual connections, it has the following benefits:

  • It provides better performance. See the benchmarks section below for details.
  • It simplifies connection management. The connection pool will establish sessions, perform retries and apply timeouts out of the box.

This is how you can create a pool of connections:

// pool_params contains configuration for the pool.
// You must specify enough information to establish a connection,
// including the server address and credentials.
// You can configure a lot of other things, like pool limits
mysql::pool_params params;
params.server_address.emplace_host_and_port(server_hostname);
params.username = mysql_username;
params.password = mysql_password;
params.database = "boost_mysql_examples";

// The I/O context, required by all I/O operations
asio::io_context ctx;

// Construct a pool of connections. The context will be used internally
// to create the connections and other I/O objects
mysql::connection_pool pool(ctx, std::move(params));

// You need to call async_run on the pool before doing anything useful with it.
// async_run creates connections and keeps them healthy. It must be called
// only once per pool.
// The detached completion token means that we don't want to be notified when
// the operation ends. It's similar to a no-op callback.
pool.async_run(asio::detached);

connection_pool is an I/O object that manages connections. It can be constructed from an executor or execution context (like all I/O objects) and a pool_params object.

connection_pool::async_run must be called exactly once per pool. This function takes care of actually keeping connections healthy.

We're now ready to obtain connections using connection_pool::async_get_connection. We will use C++20 coroutines to make async code simpler:

// Use connection pools for functions that will be called
// repeatedly during the application lifetime.
// An HTTP server handler function is a good candidate.
asio::awaitable<std::int64_t> get_num_employees(mysql::connection_pool& pool)
{
    // Get a fresh connection from the pool.
    // pooled_connection is a proxy to an any_connection object.
    mysql::pooled_connection conn = co_await pool.async_get_connection();

    // Use pooled_connection::operator-> to access the underlying any_connection.
    // Let's use the connection
    mysql::results result;
    co_await conn->async_execute("SELECT COUNT(*) FROM employee", result);
    co_return result.rows().at(0).at(0).as_int64();

    // When conn is destroyed, the connection is returned to the pool
}

By default, connection_pool::async_run will run forever. When your application exits, you will want to stop it using connection_pool::cancel. This is typically done from a signal handler, to guarantee a clean shutdown.
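For instance, here's a minimal sketch of a signal handler that stops the pool. It assumes the ctx and pool objects created in the snippets above, and that ctx is being run somewhere:

// Stop the pool when the process receives SIGINT or SIGTERM.
// cancel() makes async_run complete and closes all pooled connections.
asio::signal_set signals(ctx, SIGINT, SIGTERM);
signals.async_wait([&pool](boost::system::error_code, int) { pool.cancel(); });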

Note that pooling works only with any_connection.

[Note] Note

connection_pool exposes async functions only. This has to do with efficiency and with subtleties of the Boost.Asio executor model. If you need to use it from sync code, please visit this section.
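As a rough sketch of that approach, you can run the pool on a background thread and turn async operations into blocking calls using asio::use_future. This assumes a pool_params object configured as shown above, with params.thread_safe = true set (see the thread-safety section below):

// Run all I/O on a single background thread
asio::thread_pool ctx(1);

// thread_safe must be true, since we initiate operations from the calling
// thread while the background thread runs the pool
mysql::connection_pool pool(ctx, std::move(params));
pool.async_run(asio::detached);

// Block until a connection is available
mysql::pooled_connection conn = pool.async_get_connection(asio::use_future).get();

// Run queries the same way, blocking on the returned future
mysql::results result;
conn->async_execute("SELECT COUNT(*) FROM employee", result, asio::use_future).get();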

Pool size

Pools start with a fixed initial size, and will be dynamically resized up to an upper limit if required. You can configure these sizes using pool_params::initial_size and pool_params::max_size.

The resizing algorithm works like this:

  • When the pool is created, pool_params::initial_size connections are created and connected (by default, initial_size is 1).
  • If a connection is requested but all existing connections are in use, a new one is created, as long as the total number of connections is below max_size.
  • If a connection is requested, and there are max_size connections in use, connection_pool::async_get_connection waits for a connection to become available.
  • Once created, connections never get deallocated.

By default, pool_params::max_size is 151, which is MySQL's default value for the max_connections system variable, controlling the maximum number of concurrent connections allowed by the server.

[Note] Note

Before increasing pool_params::max_size, make sure to also increase the value of max_connections in the server. Otherwise, connections beyond the server's limit will be rejected.

This is how you configure pool sizes:

mysql::pool_params params;

// Set the usual params
params.server_address.emplace_host_and_port(server_hostname);
params.username = mysql_username;
params.password = mysql_password;
params.database = "boost_mysql_examples";

// Create 10 connections at startup, and allow up to 1000 connections
params.initial_size = 10;
params.max_size = 1000;

mysql::connection_pool pool(ctx, std::move(params));

Applying a timeout to async_get_connection

By default, connection_pool::async_get_connection waits until a connection is available. This means that, if the server is unavailable, async_get_connection may wait forever.

For this reason, you may consider setting a timeout on async_get_connection. You can do this using asio::cancel_after, which uses Asio's per-operation cancellation mechanism:

// Get a connection from the pool, but don't wait more than 5 seconds
auto conn = co_await pool.async_get_connection(asio::cancel_after(std::chrono::seconds(5)));

You might consider setting the timeout at a higher level, instead. For instance, if you're handling an HTTP request, you can use cancel_after to set a timeout for the entire request. The connection pool example takes this approach.
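For example, here's a minimal sketch that applies a single timeout to an entire operation, re-using the get_num_employees coroutine from above (the 30 second value is arbitrary):

// Getting the connection and running the query must complete within 30 seconds
asio::co_spawn(
    ctx,
    [&pool] { return get_num_employees(pool); },
    asio::cancel_after(std::chrono::seconds(30), asio::detached)
);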

Session state

MySQL connections hold state. You change session state when you prepare statements, create temporary tables, start transactions, or set session variables. When using pooled connections, session state can be problematic: if not reset properly, state from a previous operation may affect subsequent ones.

After you return a connection to the pool, the equivalent of any_connection::async_reset_connection and async_set_character_set are used to wipe session state before the connection can be obtained again. This will deallocate prepared statements, roll back uncommitted transactions, clear session variables and restore the connection's character set to utf8mb4. In particular, you don't need to call any_connection::async_close_statement to deallocate statements.
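For example, here's a sketch of a coroutine that mutates session state by preparing a statement (the query and column names are illustrative). The pool's reset logic takes care of the cleanup:

// Get a connection from the pool
mysql::pooled_connection conn = co_await pool.async_get_connection();

// Preparing a statement mutates session state
mysql::statement stmt = co_await conn->async_prepare_statement(
    "SELECT first_name FROM employee WHERE id = ?"
);

mysql::results result;
co_await conn->async_execute(stmt.bind(42), result);

// When conn is destroyed, the connection is returned to the pool and reset in
// the background. The reset deallocates the statement server-side, so there is
// no need to call async_close_statement here.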

Resetting a connection is cheap, but not free: it entails a round-trip to the server. If you've used a connection and you know that you didn't mutate session state, you can use pooled_connection::return_without_reset to skip resetting. For instance:

// Get a connection from the pool
mysql::pooled_connection conn = co_await pool.async_get_connection();

// Use the connection in a way that doesn't mutate session state.
// We're not setting variables, preparing statements or starting transactions,
// so it's safe to skip reset
mysql::results result;
co_await conn->async_execute("SELECT COUNT(*) FROM employee", result);

// Explicitly return the connection to the pool, skipping reset
conn.return_without_reset();

Connection reset happens in the background, after the connection has been returned, so it does not affect latency. If you're not sure whether an operation affects session state, assume it does.

Character set

Pooled connections always use utf8mb4 as their character set. When connections are reset, the equivalent of any_connection::async_set_character_set is used to restore the character set to utf8mb4 (recall that a raw async_reset_connection wipes character set information).

Pooled connections always know the character set they're using. This means that any_connection::format_opts and current_character_set always succeed.

We recommend always sticking to utf8mb4. If you really need to use another character set, call async_set_character_set on your connection after it's been retrieved from the pool.
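For instance, here's a minimal sketch that switches a pooled connection to ASCII using the library-provided ascii_charset constant:

// Get a connection from the pool
mysql::pooled_connection conn = co_await pool.async_get_connection();

// Change the connection's character set
co_await conn->async_set_character_set(mysql::ascii_charset);

// ... use the connection ...

// Don't call return_without_reset() here: the character set was changed,
// so let the destructor return the connection and have the pool reset it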

Connection lifecycle

The behavior described above can be summarized with a state model.

[Figure: connection lifecycle state diagram]

Thread-safety

By default, connection_pool is not thread-safe, but it can be easily made thread-safe by setting pool_params::thread_safe:

// The I/O context, required by all I/O operations.
// This is like an io_context, but with 5 threads running it.
asio::thread_pool ctx(5);

// The usual pool configuration params
mysql::pool_params params;
params.server_address.emplace_host_and_port(server_hostname);
params.username = mysql_username;
params.password = mysql_password;
params.database = "boost_mysql_examples";
params.thread_safe = true;  // enable thread safety

// Construct a thread-safe pool
mysql::connection_pool pool(ctx, std::move(params));
pool.async_run(asio::detached);

// We can now pass a reference to pool to other threads,
// and call async_get_connection concurrently without problem.
// Individual connections are still not thread-safe.

To correctly understand what is protected by pool_params::thread_safe and what is not, we need a grasp of how pools are implemented. Both connection_pool and individual pooled_connection objects hold pointers to a shared state object containing all the data required by the pool.

Thread-safe connection pools internally create an asio::strand that protects the pool's shared state. Operations like async_get_connection, async_run and pooled_connection's destructor run through this strand, and are safe to call from any thread. Operations that mutate state handles (the internal std::shared_ptr), like assignment operators, are not thread-safe.

Data outside the pool's state is not protected. In particular, asio::cancel_after creates an internal timer that can cause inadvertent race conditions. For example:

// A function that handles a user session in a server
asio::awaitable<void> handle_session(mysql::connection_pool& pool)
{
    // CAUTION: asio::cancel_after creates a timer that is *not* part of the pool's state.
    // The timer is not protected by the pool's strand.
    // This coroutine must be run within a strand for this to be safe
    using namespace std::chrono_literals;
    co_await pool.async_get_connection(asio::cancel_after(30s));

    // Use the connection
}

This coroutine must be run within a strand:

// OK: the entire coroutine runs within a strand.
// In a typical server setup, each request usually gets its own strand,
// so it can run in parallel with other requests.
asio::co_spawn(
    asio::make_strand(ctx),  // If we removed this make_strand, we would have a race condition
    [&pool] { return handle_session(pool); },
    asio::detached
);

If we don't use asio::make_strand, we have the following race condition:

  • The thread calling async_get_connection sets up the timer required by asio::cancel_after.
  • In parallel, the thread running the execution context sees that there is a healthy connection and completes the async_get_connection operation. As a result, the timer is cancelled. Thus, the timer is accessed concurrently from both threads without protection.

If you're using callbacks, the code gets slightly more convoluted. The above coroutine can be rewritten as:

// Holds per-session state
class session_handler : public std::enable_shared_from_this<session_handler>
{
    // The connection pool
    mysql::connection_pool& pool_;

    // A strand object, unique to this session
    asio::strand<asio::any_io_executor> strand_;

public:
    // pool.get_executor() points to the execution context that was used
    // to create the pool, and never to the pool's internal strand
    session_handler(mysql::connection_pool& pool)
        : pool_(pool), strand_(asio::make_strand(pool.get_executor()))
    {
    }

    void start()
    {
        // Enters the strand. The passed function will be executed through the strand.
        // If the initiation is run outside the strand, a race condition will occur.
        asio::dispatch(asio::bind_executor(strand_, [self = shared_from_this()] { self->get_connection(); }));
    }

    void get_connection()
    {
        // This function will run within the strand. Binding the passed callback to
        // the strand will make async_get_connection run it within the strand, too.
        pool_.async_get_connection(asio::cancel_after(
            std::chrono::seconds(30),
            asio::bind_executor(
                strand_,
                [self = shared_from_this()](boost::system::error_code, mysql::pooled_connection) {
                    // Use the connection as required
                }
            )
        ));
    }
};

void handle_session_v2(mysql::connection_pool& pool)
{
    // Start the callback chain
    std::make_shared<session_handler>(pool)->start();
}

Thread-safety is disabled by default because strands impose a performance penalty that is avoidable in single-threaded programs.

Transport types and TLS

You can use the same set of transports as when working with any_connection: plaintext TCP, TLS over TCP or UNIX sockets. You can configure them using pool_params::server_address and pool_params::ssl. By default, TLS over TCP will be used if the server supports it, falling back to plaintext TCP if it does not.

You can use pool_params::ssl_ctx to configure TLS options for connections created by the pool. If no context is provided, one will be created for you internally.
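For example, here's a minimal sketch that requires TLS and supplies a custom TLS context with certificate verification enabled (the CA file name is hypothetical):

mysql::pool_params params;
params.server_address.emplace_host_and_port(server_hostname);
params.username = mysql_username;
params.password = mysql_password;

// Fail to connect if the server doesn't support TLS
params.ssl = mysql::ssl_mode::require;

// Supply a TLS context that verifies the server's certificate
params.ssl_ctx.emplace(asio::ssl::context::tls_client);
params.ssl_ctx->set_verify_mode(asio::ssl::verify_peer);
params.ssl_ctx->load_verify_file("ca.pem");

mysql::connection_pool pool(ctx, std::move(params));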

Benchmarks

A throughput benchmark has been conducted to assess the performance gain provided by connection_pool. Benchmark code is under bench/connection_pool.cpp. The test goes as follows:

  • The test consists of N = 10000 logically independent sessions. In an application like a webserver, this would map to handling N HTTP requests.
  • Every logical session prepares a SELECT statement and executes it. The statement matches a single row by primary key and retrieves a single, short string field (a lightweight query).
  • num_parallel = 100 async agents are run in parallel. This means that, at any given point in time, no more than 100 parallel connections to MySQL are made.
  • The test measures the time elapsed between launching the first async agent and receiving the response for the last query (elapsed_time).
  • The test is repeated 10 times for each configuration, and results are averaged. Throughput is then computed, in "connections/s", as N/elapsed_time.
  • Connection pool scenarios use pooled_connection::~pooled_connection, which causes a connection reset to be issued. Raw connection scenarios use any_connection::async_connect and any_connection::async_close for every session. All tests are single-threaded.
  • The server runs MySQL v8.0.33 in a Docker container, in the same machine as the benchmarks.
  • Benchmarks have been compiled with clang-18, using CMake's Release build type and C++20. They've been run on an Ubuntu 22.04 machine with an 8-core i7-10510U and 32GB of RAM.

The results show that pooling significantly increases throughput. This is especially true when communication with the server is expensive (as is the case when using TLS over TCP). The performance gain is likely to increase over high-latency networks, and to decrease for heavyweight queries, since connection establishment accounts for a smaller share of the total time.

[Tip] Tip

Pooling helps most when using TLS or when running small, frequent queries.

