TL;DR: ActiveRecord is more than just an ORM. It also provides a convenient common interface for writing direct SQL queries, for those times when you need to access your database’s advanced features. This article provides an introduction to the low-level ActiveRecord Connection API, which you can use to bypass ActiveRecord “models” and work directly with your database.
My story
When I first started working with Rails back in 2006, I followed the common practice of abstracting all my data and business logic into ActiveRecord “models.” This worked out for a short time, but soon I realized that it didn’t scale all that well to complex applications. There were a number of reasons, some of which are now finally being discussed and addressed in the community.
In any case, what followed was the Big Refactor of my Rails development process and architecture. On the one hand, I rediscovered Ruby’s object-oriented roots, separating behavior from persistence, and building higher-level business objects focused on actually modeling my system. And on the other hand, I rediscovered the database, writing more and more custom SQL to leverage its capabilities.
I still use ActiveRecord where it suits the task at hand. In many cases, it makes sense to treat rows of a table as data objects principally identified by a numeric ID, and in those cases ActiveRecord still shines. However, in other cases I’m finding the ORM more of a hindrance than a help. Many of my model implementations have begun bypassing ActiveRecord-based persistence altogether, instead creating plain old tables and querying with custom SQL.
The good news is that ActiveRecord provides some amount of support for low-level queries. You can share ActiveRecord’s database connection, and you don’t need to know the API of the underlying database driver, making it convenient to add custom SQL incrementally to your Rails application. In this article, I’ll provide an overview of ActiveRecord’s connection API, a convenient low-level interface for writing raw SQL and interacting with your database on its own terms.
Obtaining a Low-Level Connection
Hidden underneath your ActiveRecord class is a useful low-level object called the ActiveRecord connection adapter. It wraps and abstracts away the underlying database-specific driver, and provides a common interface for database tasks such as creating and destroying databases, creating and modifying tables, inserting, updating, and deleting data, running queries, and managing transactions. Normally, the connection adapter is used internally by ActiveRecord, but you can access it yourself if you want to talk to the database directly without using ActiveRecord “models.”
To obtain a connection adapter object, simply call the connection method on your ActiveRecord class or any ActiveRecord object:
connection = User.connection obj = User.find(1) connection = obj.connection
Which ActiveRecord class should you use? In most Rails applications, you talk to just one database, as defined in your database.yml file. For such an application, every ActiveRecord class, including ActiveRecord::Base, will give you the same connection. So in those cases, it doesn’t matter which class you use. However, if your Rails application connects to a secondary database for some ActiveRecord classes (using the establish_connection method) then those classes will yield a connection object pointing at the secondary database. In those cases, you will need to pay attention to which database you need to talk to, and ask for a connection from the right class.
Managing connections and the connection pool
Each connection adapter object represents a single connection to a database. Rails generally opens several connections at once and manages them in a connection pool. When a task needs a database connection, it checks one out of the pool; when it finishes, it checks the connection back in so that the next task can use it. Connections can run only one SQL statement at a time, so generally one connection is opened per thread.
In most Rails applications, this takes place transparently. When you’re processing an HTTP request, ActiveRecord automatically checks out a connection and memoizes it for the duration of the request. Whether you use the standard ActiveRecord APIs, or obtain a raw connection by calling the connection method described above, you get the request’s memoized connection. At the end of the request, ActiveRecord’s Rack middleware automatically checks the connection back in.
However, if you access ActiveRecord outside the context of handling a request via your controller, then you need to do connection management yourself. ActiveRecord will still automatically check out and memoize a connection when you ask for one. However, if you are not using a normal Rails controller, or are otherwise not running inside ActiveRecord’s ConnectionManager middleware, you should check in the connection manually when you are done. You can do so using the clear_active_connections! method on any of your ActiveRecord classes:
User.clear_active_connections!
Now that we know how to get a connection, let’s see what we can do with one.
Running Low-Level Queries
Most calls in the standard ActiveRecord API return ActiveRecord “model” objects. However, there might be cases when you want to bypass the overhead of creating these full ActiveRecord objects, or maybe you want to query data that doesn’t have a corresponding ActiveRecord class. The connection adapter’s low-level query methods let you write your own SQL, and return “plain old data” as raw result tables.
In this first example, we get the “name” value from a single row in our “users” table. If we only need the name, and don’t need the full ActiveRecord object with the rest of its data and capabilities, we can grab the connection object and use the select_value method:
connection = User.connection
name = connection.select_value("SELECT name FROM users WHERE id=1")
# => "Dave"
Use select_value when you need a single value from a single row. The result will be either a string, or nil if no rows match. Note that the returned value is a string even if the column has a different type such as a number or timestamp. You will need to perform your own conversion:
time_str = connection.select_value(
"SELECT created_at FROM users WHERE id=1")
# => "2011-12-24 11:35:02"
time = Time.parse("#{time_str} UTC")
(This often gets me when I ask for a numeric value from the database. The select_value method returns it as a string, so be sure to call to_i on it if you want to do any math.)
If you want a single column from multiple rows, use select_values, which returns an array of strings.
names = connection.select_values( "SELECT name FROM users WHERE created_at>'2012-01-01 00:00:00'") # => ['Bob', 'Kathy']
To obtain multiple columns, you can use select_rows. This method returns an array of arrays, each representing a row in the order of the selected columns. For example:
rows = connection.select_rows( "SELECT id,name FROM users WHERE created_at>'2012-01-01 00:00:00'") # => [['2', 'Bob'], ['4', 'Kathy']]
This returns an array of two-element arrays. Remember that all values are returned as strings. In the example above, the “id” column is numeric, so you may need to call to_i if you want to treat the IDs as integers.
There is no select_row method. To select a single row, just use select_rows and take the first element.
Alternately, you can return rows as hashes (of column name to value) using select_one (for a single row) or select_all (for multiple rows).
records = connection.select_all(
"SELECT id,name FROM users WHERE created_at>'2012-01-01 00:00:00'")
# => [{'id'=>'2', 'name'=>'Bob'}, {'id'=>'4', 'name'=>'Kathy'}]
Low-Level Data Updates
Updating data is accomplished using similar calls. Insert rows into the database using the insert method:
connection.insert( "INSERT INTO users SET name='Sally', created_at='now'")
Update rows in the database using the update method. This method returns the number of rows affected by the update.
row_count = connection.update( "UPDATE users SET name='Robert' WHERE id=2") # => 1
You can delete rows using delete, which again returns the number of rows deleted.
row_count = connection.delete( "DELETE FROM users WHERE created_at>'2012-01-01 00:00:00'") # => 3
Normally, however, you won’t be hard-coding values when you insert and update. You’ll be injecting data obtained from the user or some other source. Therefore, to avoid SQL injection attacks and other issues, you need to quote these values when you construct your SQL statement. Use the quote method for this purpose. It takes a Ruby object and represents it, properly quoted, in the syntax expected by the database. Here are some examples from the PostgreSQL connection adapter. (Connection objects for other databases will yield slightly different results depending on the database.)
connection.quote("Hello") # => "'Hello'"
connection.quote("Joe's") # => "'Joe''s'"
connection.quote(2) # => "2"
connection.quote(true) # => "'t'"
connection.quote(Time.now) # => "'2012-01-23 07:46:41.540033'"
Use quote when constructing SQL statements that update the database:
new_name = "Robert"
row_id = 2
row_count = connection.update(
"UPDATE users SET name=#{connection.quote(new_name)}"+
" WHERE id=#{connection.quote(row_id)}")
Using Prepared Statements
Prepared statements are a common database optimization technique. The idea is that often many of the queries run by your application will have a common form. For example, you might find yourself running hundreds of queries of the form “SELECT name FROM users WHERE id=[something]“. Instead of reparsing the entire statement and rerunning the query planner on every query, you can often instruct the database to parse and plan only once, and re-use that information on subsequent queries. The way to do that is with a prepared statement.
As of Rails 3.1, ActiveRecord includes support for prepared statements. The standard ActiveRecord API will utilize prepared statements transparently. If you are using the low-level connection API, you can set up prepared statements manually.
For running queries, the only one of the methods we’ve covered that supports prepared statements is select_all. Instead of the full statement, you must send a statement template and a list of values to substitute in. The list of values should be an array of two-element arrays; the second element of each array is the value. The first element is generally set to the ActiveRecord column type governing the type of the value, but you can set it to nil if you’re confident of the type you’re sending in. The values must be sent in as the third argument. (The second is a “name” for the query, used for annotating the logs. You can set it to nil.) Here’s the earlier select_all example, rewritten using a prepared statement.
records = connection.select_all(
"SELECT id,name FROM users WHERE created_at>$1",
nil, [[nil, ::Time.utc(2012,1,1)]])
# => [{'id'=>'2', 'name'=>'Bob'}, {'id'=>'4', 'name'=>'Kathy'}]
Now, if you have additional requests of the same form (using the same template), they should run faster because the prepared statement (with its prepared SQL and predetermined query plan) is being reused.
records2 = connection.select_all( "SELECT id,name FROM users WHERE created_at>$1", nil, [[nil, ::Time.utc(2012,1,2)]]) records3 = connection.select_all( "SELECT id,name FROM users WHERE created_at>$1", nil, [[nil, ::Time.utc(2012,1,3)]])
The insert, update, and delete methods provide similar access to prepared statements. For the update and delete methods, use the same arguments as select_all: the template first, followed by a query name (which can be nil) and then an array of values to inject into the statement:
row_count = connection.update( "UPDATE users SET name=$1 WHERE id=$2", nil, [[nil, "Robert"], [nil, 2]])
The insert method is a little more complex because it takes a series of three more arguments before the values list. (Those extra arguments are for supporting databases that make you manage primary key sequences manually. In most cases, you can set those to nil.) So the values array should be passed as the sixth argument to insert. It’s a little messy, but this is the way the API is set up as of Rails 3.1.
connection.insert( "INSERT INTO users SET name=$1, created_at=$2", nil, nil, nil, nil, [[nil, "Sally"], [nil, "now"]])
Prepared statement support requires ActiveRecord 3.1 or later. On older versions, you will need to construct the entire SQL statement using quote to inject values.
Low-Level Migrations and Other Features
Migrations are a very common case for dropping to low-level SQL statements, since there may be many cases when you’ll want more control over your schema than ActiveRecord gives you.
When you write an ActiveRecord migration, you actually are already using the connection adapter API. The create_table and similar methods are connection adapter methods; the migration simply delegates them to the adapter. This means you can use all the methods we’ve been discussing, such as insert and update, directly in your migration if you wanted to inject data during that process. (With the caveat, of course, that many consider it bad practice to alter data during a migration.)
To perform schema changes using raw SQL, you should generally use the execute method. This method returns the underlying database driver’s result object; it doesn’t try to do any postprocessing on the result, and so it generally may be faster than the calls we’ve been looking at so far. It’s useful for cases when you want the very low-level driver-specific result, or (as in most migrations) for cases when you don’t care about the result. Here’s an example using PostgreSQL’s flavor of SQL:
class MyMigration < ActiveRecord::Migration
def up
execute <<-SQL
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name CHARACTER VARYING NOT NULL,
created_at TIMESTAMP NOT NULL)
SQL
end
def down
execute 'DROP TABLE users'
end
end
Nowadays when I write migrations, I use execute for almost everything because I generally want to fine-tune the database schema. It’s not strictly necessary in the above very simple case, but in more complex applications, you may want to set up constraints, triggers, and other data management features in your database, and you’ll need it in those cases.
Of course, if you use execute, you will have to write out both the forward and reverse migrations. You can’t use Rails 3.1′s “change” feature which attempts to auto-create the backward migration from the forward migration—Rails isn’t quite smart enough to figure out how to reverse a change made by arbitrary SQL. But you may find it an acceptable trade-off, as I have.
Another common use for the connection object is to delimit transactions. If you have a set of statements that should be wrapped in a transaction, use the transaction method:
connection.transaction do
connection.update("UPDATE users SET name='Robert' WHERE id=2")
connection.update("UPDATE users SET name='Catherine' WHERE id=4")
end
Because the connection adapter is shared with ActiveRecord, you can also include high-level ActiveRecord calls in your transaction block, interspersed with your low-level calls. For example, you could write the above code like this:
connection.transaction do
connection.update("UPDATE users SET name='Robert' WHERE id=2")
obj = User.find(4)
obj.name = "Catherine"
obj.save
end
I prefer the first version, however. For starters, it’s more performant—it doesn’t require an extra select to get the data, and it doesn’t require building the ActiveRecord object simply to update a single field. But in my opinion, it’s also cleaner and more succinct. I know some don’t like seeing SQL in your web app, but in many cases it’s a very expressive and readable language for tasks like this. I believe is simply a case of the right tool for the right job.
You can also obtain various information about the capabilities of the database and the database driver. Here are just a few examples:
connection.supports_savepoints? # Database supports savepoints? connection.supports_statement_cache? # Supports prepared statements? connection.table_name_length # Maximum table name length connection.columns_per_table # Maximum columns per table
Finally, the connection adapter provides access to the underlying driver-specific connection for when you need to access really advanced or database-specific features. Here’s an example (assuming you’re using the postgresql database adapter).
raw_connection = connection.raw_connection # Returns a PGconn object pg_server_version = raw_connection.server_version.to_s
For More Information
This has been an overview of the tools that ActiveRecord provides for bypassing the ActiveRecord “models” and accessing the advanced features of your database directly. Most of us won’t write entire Rails applications using only this low-level API. However, it is a good tool to have in the toolbox. For those cases when the ActiveRecord ORM doesn’t quite do what you need, or does so clumsily or inefficiently, ActiveRecord lets you drop down to SQL quite easily.
The good news is that all these calls are well-documented on http://api.rubyonrails.org/:
- The
ActiveRecord::ConnectionAdapters::DatabaseStatementsmodule describes most of the low-level methods for querying and updating data that we covered in this article. - The
ActiveRecord::ConnectionAdapters::Quotingmodule describes various methods for quoting data for injection into SQL. - The
ActiveRecord::ConnectionAdapters::SchemaStatementsmodule describes the schema-manipulation methods you are probably familiar with using for migrations. - The
ActiveRecord::ConnectionAdapters::TableDefinitionclass includes the methods you can call insidecreate_table. - The
ActiveRecord::ConnectionAdapters::DatabaseLimitsmodule describes a miscellaneous set of informational methods you can call.
About the Author
Daniel Azuma is a Ruby developer specializing in geospatial technologies, computational geometry, graphics, and related fields. He is the author of rgeo and related gems for geospatial analysis in Ruby and Rails applications. He currently works as Chief Software Architect at Pirq.
Edits (Fri 27 Jan)— I made some corrections to the prepared statement examples so they actually work. Note to self: test examples before publishing.
Using just #quote for numerical fields is not enough, and you’ll be vulnerable to SQL injections unless you also force the ruby type to numbers beforehand. You’re just back to the world of old PHP scripts where people always forget to properly escape their data.
Use prepared statements indiscriminately and you can rest well; you lose a tiny bit of performance on one-off statements in MySQL, but it’s negligible when compared to the otherwise potential security risk.
Kim,
You’re right in that I’ve kind of glossed over the security issues in this article. I’m assuming that, generally, you may not trust the *data* that you’re injecting, but at least you’re trusting the *Ruby objects* (i.e. their types, to_s behavior, etc.), since chances are the objects themselves are being created by you, or at least by Rails. In that case, correct me if I’m wrong, but I believe quoting should be sufficient. If the objects are being handed to you by an untrusted source, then yes, you’ll need to sanitize them before indiscriminately creating custom SQL.
Using prepared statements really isn’t a panacea for this, as far as I know. I believe you can still create a malicious object that will bypass quoting and allow SQL injection. Once you can modify code, all bets are off.
Daniel