Many developers nowadays are working on a daily basis with some ORM (which stands for Object Relation Mapping) library, that is handling the interaction with the database for them, along with making their code more readable, maintainable and closer to the programming language of their choosing. Plus most of ORMs are database agnostic, meaning they support writing queries for multiple database adapters.
ActiveRecord is one of the most popular ones, a Ruby gem that inspired a lot of other ORMs throughout various languages. For the rest of this article I will assume that you are familiar with it, and with Ruby of course. If not, bare with me, Ruby is widely known among programming languages for its readability, so I doubt that you will have troubles understanding it.
Demo database
The subject of this article, like any subject associated with software development, includes abstractions, theory, but without using practical examples I feel like most readers will lose interest or have a tougher time understanding it. So we are going to work with a demo database, that for the sake of simplicity we will have a really simple design and couple of records, so please don’t complain about subjects that aren’t focus (for instance normalization, schema, constraints). The database includes a table called people
that looks like this
id | first_name | middle_name | last_name |
---|---|---|---|
1 | Michael | Jeffrey | Jordan |
2 | Drazen | NULL | Petrovic |
3 | Larry | Joe | Bird |
4 | Earvin | NULL | Johnson |
The people
table has four columns, id as a primary key, first_name, middle_name and last_name. I think the content of the table is simple enough and self explanatory, so I won’t dig into the details of the SCHEMA, I would just mention that every record in the table, like almost every person, has a first and last name and an optional middle name. Because of that, some of the records in the table under the column middle_name are marked with the special reserved word in SQL called NULL
which indicates non-existance of a value.
Its ActiveRecord counterpart, following its grammar convention, will be a Ruby Class (popularly known as a Model) named in a singular form of the table name. So it will look something like
class Person < ActiveRecord::Base
# some ruby code
end
Having said that, if some of you asks me about the identity of these people, meaning you don’t know who they are, then you should stop reading immediately and Google them before you continue. If you think that I am joking, you are wrong. I am being serious. Very serious.
The translator
We know that it is one of ActiveRecord’s jobs to translate its queries from Ruby to SQL, in order to execute them on the database level. So for instance the query to retrieve all the records from the ‘people’ table
Person.all
will be translated to
SELECT people.* FROM people
Another simple example will be to query for the records where the first name equals to ‘Michael’
Person.where(first_name: 'Michael')
The translated SQL
SELECT people.* FROM people WHERE people.first_name = 'Michael'
A visual representation of the database result will be
id | first_name | middle_name | last_name |
---|---|---|---|
1 | Michael | Jeffrey | Jordan |
Maybe we want all the people where first name is not ‘Michael’
Person.where.not(first_name: 'Michael')
the SQL, something similar to
SELECT people.* FROM people WHERE people.first_name != 'Michael'
The visual representation
id | first_name | middle_name | last_name |
---|---|---|---|
2 | Drazen | NULL | Petrovic |
3 | Larry | Joe | Bird |
4 | Earvin | NULL | Johnson |
Simple right? What is the complexity in this? Performing simple comparisons in the queries? In fact, the truth is, it is really simple, but sometimes living (well coding actually) in the Ruby world, we tend to forget that ActiveRecord even though it looks like Ruby, under the hood, is still performing SQL logic and that logic is different from the one in Ruby. That is where the trouble begins.
NULL == nil right?
SQL’s NULL
, in ActiveRecord is represented with Ruby’s nil
. That is why when using ‘nil’ along with ActiveRecord we need to remember that it may look like the singleton object from the class NilClass
, but under the hood this will have the SQL’s ‘NULL’ characteristics.
Nothing will be more clear without an example. Lets say we want to get all the records that don’t have a middle name ‘Jeffrey’. Again with this simple nonesence right? We could do something like
Person.where.not(middle_name: 'Jeffrey')
ActiveRecord will translate this to
SELECT people.* FROM people WHERE people.middle_name != 'Jeffrey'
And voila
id | first_name | middle_name | last_name |
---|---|---|---|
3 | Larry | Joe | Bird |
Wait…but we wanted all the records that don’t have a middle name ‘Jeffrey’. Where are the other two? Hmmm…lets try this with plain Ruby without ActiveRecord
Person.all.select { |person| person.middle_name != 'Jeffrey' }
The visual representation
id | first_name | middle_name | last_name |
---|---|---|---|
2 | Drazen | NULL | Petrovic |
3 | Larry | Joe | Bird |
4 | Earvin | NULL | Johnson |
There it is! Well this will be ok here. The table only has 4 records. What happens if the table has 1000 or 10000 or even more? We know that for most of the times a general rule is that whenever we can query with SQL (ActiveRecord) we should not use Ruby, cause SQL is more optimal. But why does our query work with plain Ruby and not when using ActiveRecord?
Well the answer is again simple. We should write Ruby syntax with SQL logic. To further explain the case when we are performing the query without ActiveRecord, with the select
enumerable method, lets disect what actually happens.
First we are instantiating objects from the Person class for all records in the people table. Because they are ActiveRecord objects, that are representations of the database records, the data types for the columns from the table will be casted accordingly with their counterparts in ActiveRecord as values for the objects’ accessors. So as we said previously NULL is represented with nil. The objects will look something like
[
<Person:0x0000005184458 id: 1, first_name: "Michael", middle_name: "Jeffrey", last_name: "Jordan">,
<Person:0x0000005184458 id: 2, first_name: "Drazen", middle_name: nil, last_name: "Petrovic">,
<Person:0x0000005184458 id: 3, first_name: "Larry", middle_name: "Joe", last_name: "Bird">,
<Person:0x0000005184458 id: 4, first_name: "Earvin", middle_name: nil, last_name: "Johnson">,
]
Next we are iterrating and comparing the values for the middle_name accessor accross all of the objects with the string ‘Jeffrey’. Becase we are comparing Ruby values, we are leaning on the boolean system that is used by Ruby. The results from the comparisons will look like
"Jeffrey" != "Jeffrey"
=> false
nil != "Jeffrey"
=> true
"Joe" != "Jeffrey"
=> true
nil != "Jeffrey"
=> true
Lastly the select method will retrieve the objects for which the comparison in the code block returns a truthy value, stored in an Array
. I’ve highlighted the truthy word because it is one of the things where Ruby’s nil differs from SQL’s NULL.
Conclussion NULL != nil
I owe you an explanation for the ActiveRecord case of the query above. We will explain it in this chapter more thoroughly.
Repetition is the mother of learning. So I’ve mentioned it before and I’m going to mention it again NULL is not a value in SQL. It is not a member of any data domain, but just a marker, a reserved word indicating the absence of value. Therefore the comparisons with NULL (except in two cases) can never be either True
or False
, but always a third logical result, known as Unknown
. So SQL implements a three-valued logical system. A detailed visual representation of the three-valued logic looks like
p | q | p OR q | p AND q | p = q | p != q |
---|---|---|---|---|---|
True | True | True | True | True | False |
True | False | True | False | False | True |
True | NULL | True | Unknown | Unknown | Unknown |
False | True | True | False | False | True |
False | False | False | False | True | False |
False | NULL | Unknown | False | Unknown | Unknown |
NULL | True | True | Unknown | Unknown | Unknown |
NULL | False | Unknown | False | Unknown | Unknown |
NULL | NULL | Unknown | Unknown | Unknown | Unknown |
p | NOT p |
---|---|
True | False |
False | True |
NULL | Unknown |
You may be confused with NULL and UNKNOWN, but to make it more clear when using NULL in BOOLEAN logic it also may be referred to UNKNOWN. So wherever we populated the table with Unknown you can use NULL and vice-versa.
Back to our example from the previous chapter, where we are performing the ActiveRecord statement
Person.where.not(middle_name: 'Jeffrey')
Which as we said translated to SQL
SELECT people.* FROM people WHERE people.middle_name != 'Jeffrey'
The WHERE
clause just like the select
method in Ruby will only return the records for which the comparison returns a truthy value. So the comparisons will look something like
SELECT 'Jeffrey' != 'Jeffrey';
=> False
SELECT NULL != 'Jeffrey';
=> Unknown
SELECT 'Joe' != 'Jeffrey';
=> True
SELECT NULL != 'Jeffrey';
=> Unknown
And that is why we only get
id | first_name | middle_name | last_name |
---|---|---|---|
3 | Larry | Joe | Bird |
Because of this SQL provides two special comparison predicates called IS NULL
and IS NOT NULL
to test whether there is an absence of a value in the field or not.
So depending on our version of ActiveRecord, our statement should be
For version below 5 we can’t use the hash conditions since there is no or
method, but we can use placeholder condition which is a close call
Person.where('middle_name != :name OR middle_name IS NULL', name: 'Jeffrey')
For version at least 5
Person.where.not(middle_name: 'Jeffrey').or(Person.where(middle_name: nil))
It will translate to SQL
SELECT people.* FROM people WHERE people.middle_name != 'Jeffrey' OR people.middle_name IS NULL
and now the real voila!
id | first_name | middle_name | last_name |
---|---|---|---|
2 | Drazen | NULL | Petrovic |
3 | Larry | Joe | Bird |
4 | Earvin | NULL | Johnson |
See ActiveRecord was smart enough to translate the query when using the hash condition
Person.where(middle_name: nil)
to the wanted SQL predicate
SELECT people.* FROM people WHERE people.middle_name IS NULL
Again be careful!
Since almost everything in Ruby is an object, ActiveRecord does not have much choice, so it chooses nil (which is an object) as the closest representation of NULL. Because of that we have to keep in mind that whenever we use nil in ActiveRecord it actually uses the SQL’s NULL characteristics. We saw earlier in SQL NULL = NULL
it equals to NULL
or more correctly Unknown
, while in Ruby nil == nil
eqauls to true
, or SQL’s NULL != 'Jeffrey'
eqauls to NULL
, while Ruby’s nil != 'Jeffrey'
eqauls to true
. To find more more about NULL check out this while for ActiveRecord quering check out this.
Also useful note, that is associated with this problem, will be to avoid the Three-state Boolean Problem. But I feel like that is another post for another day.