SQL's NULL != ActiveRecord's nil

Many developers nowadays are working on a daily basis with some ORM (which stands for Object Relation Mapping) library, that is handling the interaction with the database for them, along with making their code more readable, maintainable and closer to the programming language of their choosing. Plus most of ORMs are database agnostic, meaning they support writing queries for multiple database adapters.

ActiveRecord is one of the most popular ones, a Ruby gem that inspired a lot of other ORMs throughout various languages. For the rest of this article I will assume that you are familiar with it, and with Ruby of course. If not, bare with me, Ruby is widely known among programming languages for its readability, so I doubt that you will have troubles understanding it.

Demo database

The subject of this article, like any subject associated with software development, includes abstractions, theory, but without using practical examples I feel like most readers will lose interest or have a tougher time understanding it. So we are going to work with a demo database, that for the sake of simplicity we will have a really simple design and couple of records, so please don’t complain about subjects that aren’t focus (for instance normalization, schema, constraints). The database includes a table called people that looks like this

id first_name middle_name last_name
1 Michael Jeffrey Jordan
2 Drazen NULL Petrovic
3 Larry Joe Bird
4 Earvin NULL Johnson

 

The people table has four columns, id as a primary key, first_name, middle_name and last_name. I think the content of the table is simple enough and self explanatory, so I won’t dig into the details of the SCHEMA, I would just mention that every record in the table, like almost every person, has a first and last name and an optional middle name. Because of that, some of the records in the table under the column middle_name are marked with the special reserved word in SQL called NULL which indicates non-existance of a value.

Its ActiveRecord counterpart, following its grammar convention, will be a Ruby Class (popularly known as a Model) named in a singular form of the table name. So it will look something like

  class Person < ActiveRecord::Base
    # some ruby code
  end

Having said that, if some of you asks me about the identity of these people, meaning you don’t know who they are, then you should stop reading immediately and Google them before you continue. If you think that I am joking, you are wrong. I am being serious. Very serious.

Basketball Mount Rushmore

The translator

We know that it is one of ActiveRecord’s jobs to translate its queries from Ruby to SQL, in order to execute them on the database level. So for instance the query to retrieve all the records from the ‘people’ table

  Person.all

will be translated to

  SELECT people.* FROM people

Another simple example will be to query for the records where the first name equals to ‘Michael’

  Person.where(first_name: 'Michael')

The translated SQL

  SELECT people.* FROM people WHERE people.first_name = 'Michael'

A visual representation of the database result will be

id first_name middle_name last_name
1 Michael Jeffrey Jordan

 

Maybe we want all the people where first name is not ‘Michael’

  Person.where.not(first_name: 'Michael')

the SQL, something similar to

  SELECT people.* FROM people WHERE people.first_name != 'Michael'

The visual representation

id first_name middle_name last_name
2 Drazen NULL Petrovic
3 Larry Joe Bird
4 Earvin NULL Johnson

 

Simple right? What is the complexity in this? Performing simple comparisons in the queries? In fact, the truth is, it is really simple, but sometimes living (well coding actually) in the Ruby world, we tend to forget that ActiveRecord even though it looks like Ruby, under the hood, is still performing SQL logic and that logic is different from the one in Ruby. That is where the trouble begins.

NULL == nil right?

SQL’s NULL, in ActiveRecord is represented with Ruby’s nil. That is why when using ‘nil’ along with ActiveRecord we need to remember that it may look like the singleton object from the class NilClass, but under the hood this will have the SQL’s ‘NULL’ characteristics.

Nothing will be more clear without an example. Lets say we want to get all the records that don’t have a middle name ‘Jeffrey’. Again with this simple nonesence right? We could do something like

  Person.where.not(middle_name: 'Jeffrey')

ActiveRecord will translate this to

  SELECT people.* FROM people WHERE people.middle_name != 'Jeffrey'

And voila

id first_name middle_name last_name
3 Larry Joe Bird

 

Wait…but we wanted all the records that don’t have a middle name ‘Jeffrey’. Where are the other two? Hmmm…lets try this with plain Ruby without ActiveRecord

  Person.all.select { |person| person.middle_name != 'Jeffrey' }

The visual representation

id first_name middle_name last_name
2 Drazen NULL Petrovic
3 Larry Joe Bird
4 Earvin NULL Johnson

 

There it is! Well this will be ok here. The table only has 4 records. What happens if the table has 1000 or 10000 or even more? We know that for most of the times a general rule is that whenever we can query with SQL (ActiveRecord) we should not use Ruby, cause SQL is more optimal. But why does our query work with plain Ruby and not when using ActiveRecord?

Well the answer is again simple. We should write Ruby syntax with SQL logic. To further explain the case when we are performing the query without ActiveRecord, with the select enumerable method, lets disect what actually happens.

First we are instantiating objects from the Person class for all records in the people table. Because they are ActiveRecord objects, that are representations of the database records, the data types for the columns from the table will be casted accordingly with their counterparts in ActiveRecord as values for the objects’ accessors. So as we said previously NULL is represented with nil. The objects will look something like

  [
    <Person:0x0000005184458 id: 1, first_name: "Michael", middle_name: "Jeffrey", last_name: "Jordan">,
    <Person:0x0000005184458 id: 2, first_name: "Drazen", middle_name: nil, last_name: "Petrovic">,
    <Person:0x0000005184458 id: 3, first_name: "Larry", middle_name: "Joe", last_name: "Bird">,
    <Person:0x0000005184458 id: 4, first_name: "Earvin", middle_name: nil, last_name: "Johnson">,
  ]

Next we are iterrating and comparing the values for the middle_name accessor accross all of the objects with the string ‘Jeffrey’. Becase we are comparing Ruby values, we are leaning on the boolean system that is used by Ruby. The results from the comparisons will look like

  "Jeffrey" != "Jeffrey"
  => false
  nil != "Jeffrey"
  => true
  "Joe" != "Jeffrey"
  => true
  nil != "Jeffrey"
  => true

Lastly the select method will retrieve the objects for which the comparison in the code block returns a truthy value, stored in an Array. I’ve highlighted the truthy word because it is one of the things where Ruby’s nil differs from SQL’s NULL.

Conclussion NULL != nil

I owe you an explanation for the ActiveRecord case of the query above. We will explain it in this chapter more thoroughly.

Repetition is the mother of learning. So I’ve mentioned it before and I’m going to mention it again NULL is not a value in SQL. It is not a member of any data domain, but just a marker, a reserved word indicating the absence of value. Therefore the comparisons with NULL (except in two cases) can never be either True or False, but always a third logical result, known as Unknown. So SQL implements a three-valued logical system. A detailed visual representation of the three-valued logic looks like

p q p OR q p AND q p = q p != q
True True True True True False
True False True False False True
True NULL True Unknown Unknown Unknown
False True True False False True
False False False False True False
False NULL Unknown False Unknown Unknown
NULL True True Unknown Unknown Unknown
NULL False Unknown False Unknown Unknown
NULL NULL Unknown Unknown Unknown Unknown

 

p NOT p
True False
False True
NULL Unknown

 

You may be confused with NULL and UNKNOWN, but to make it more clear when using NULL in BOOLEAN logic it also may be referred to UNKNOWN. So wherever we populated the table with Unknown you can use NULL and vice-versa.

Back to our example from the previous chapter, where we are performing the ActiveRecord statement

  Person.where.not(middle_name: 'Jeffrey')

Which as we said translated to SQL

  SELECT people.* FROM people WHERE people.middle_name != 'Jeffrey'

The WHERE clause just like the select method in Ruby will only return the records for which the comparison returns a truthy value. So the comparisons will look something like

  SELECT 'Jeffrey' != 'Jeffrey';
  => False
  SELECT NULL != 'Jeffrey';
  => Unknown
  SELECT 'Joe' != 'Jeffrey';
  => True
  SELECT NULL != 'Jeffrey';
  => Unknown

And that is why we only get

id first_name middle_name last_name
3 Larry Joe Bird

 

Because of this SQL provides two special comparison predicates called IS NULL and IS NOT NULL to test whether there is an absence of a value in the field or not.

So depending on our version of ActiveRecord, our statement should be

For version below 5 we can’t use the hash conditions since there is no or method, but we can use placeholder condition which is a close call

  Person.where('middle_name != :name OR middle_name IS NULL', name: 'Jeffrey')

For version at least 5

  Person.where.not(middle_name: 'Jeffrey').or(Person.where(middle_name: nil))

It will translate to SQL

  SELECT people.* FROM people WHERE people.middle_name != 'Jeffrey' OR people.middle_name IS NULL

and now the real voila!

id first_name middle_name last_name
2 Drazen NULL Petrovic
3 Larry Joe Bird
4 Earvin NULL Johnson

 

See ActiveRecord was smart enough to translate the query when using the hash condition

  Person.where(middle_name: nil)

to the wanted SQL predicate

  SELECT people.* FROM people WHERE people.middle_name IS NULL

Again be careful!

Since almost everything in Ruby is an object, ActiveRecord does not have much choice, so it chooses nil (which is an object) as the closest representation of NULL. Because of that we have to keep in mind that whenever we use nil in ActiveRecord it actually uses the SQL’s NULL characteristics. We saw earlier in SQL NULL = NULL it equals to NULL or more correctly Unknown, while in Ruby nil == nil eqauls to true, or SQL’s NULL != 'Jeffrey' eqauls to NULL, while Ruby’s nil != 'Jeffrey' eqauls to true. To find more more about NULL check out this while for ActiveRecord quering check out this.

Also useful note, that is associated with this problem, will be to avoid the Three-state Boolean Problem. But I feel like that is another post for another day.