to all posts

Finding the right person (in Postgres)

Published

Even though Patrick McKenzie lists a lot of good reasons not to, a project I am contributing to stores people’s names like this:

CREATE TABLE people (
    -- The beefy first column is the PostgreSQL way of specifying
    -- an autoincrementing integer primary key
    id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    first_name text NOT NULL,
    middle_name text,
    last_name text NOT NULL
);

Besides the issues mentioned in McKenzie’s article, this schema also makes it not-very-straightforward to find people by searching for their name. Take, for instance, me:1

INSERT INTO
    people (first_name, middle_name, last_name)
VALUES
    ('Erik', 'André', 'Jakobsen');

Prelude

To learn more about why and how we can handle it, I followed Ecto’s getting started guide to set up an Elixir project called Friends with a PostgreSQL database. After getting everything set up, I create the people table with an Ecto migration. If you’re following along, run this in your command line:

❯ mix ecto.gen.migration create_people
* creating priv/repo/migrations/20250326160917_create_people.exs

This gives us a file where we can create our table.

defmodule Friends.Repo.Migrations.CreatePeople do
  use Ecto.Migration

  def change do
    create table :people do
      add :first_name, :string, null: false
      add :middle_name, :string
      add :last_name, :string, null: false
    end
  end
end

We’ll also need an Ecto schema, i.e. a fancy struct which represents entities in our database. A changeset function which lets us validate the data comes in handy as well:

defmodule Friends.People.Person do
  use Ecto.Schema
  import Ecto.Changeset

  schema "people" do
    field :first_name, :string
    field :middle_name, :string
    field :last_name, :string
  end

  def changeset(%__MODULE__{} = person, params \\ %{}) do
    person
    |> cast(params, [:first_name, :middle_name, :last_name])
    |> validate_required([:first_name, :last_name])
  end
end

Finally, create a module called Friends.People which will use our schema to interact with our database. To start, we’ll have a simple function to insert new friends:

defmodule Friends.People do
  alias Friends.People.Person
  alias Friends.Repo

  def create(params) do
    %Person{}
    |> Person.changeset(params)
    |> Repo.insert()
  end
end

We’re now ready to load our project in a REPL with iex -S mix and call the function to insert a very nice name into the database:

❯ iex -S mix
Erlang/OTP 27 [erts-15.2.2] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]

Compiling 2 files (.ex)
Generated friends app
Interactive Elixir (1.18.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Friends.People.create(%{first_name: "Erik", middle_name: "André", last_name: "Jakobsen"})

17:20:57.813 [debug] QUERY OK source="people" db=1.1ms decode=0.9ms queue=0.7ms idle=741.3ms
INSERT INTO "people" ("first_name","middle_name","last_name") VALUES ($1,$2,$3) RETURNING "id" ["Erik", "André", "Jakobsen"]
{:ok,
 %Friends.People.Person{
   __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
   id: 1,
   first_name: "Erik",
   middle_name: "André",
   last_name: "Jakobsen"
 }}

Finally! Now let’s find that son of a bitch

A reasonable first attempt at a search function might look something like this:

defmodule Friends.People do
  alias Friends.People.Person
  alias Friends.Repo
  import Ecto.Query # 👈 import Ecto.Query to access the Query DSL

  # ✂️ create function cut

  def find_by_name(name) when is_binary(name) do
    name_pattern = "%#{name}%"

    from(p in Person,
      where:
        ilike(
          fragment(
            "? || ' ' || ? || ' ' || ?",
            p.first_name,
            p.middle_name,
            p.last_name
          ),
          ^name_pattern
        )
    )
    |> Repo.all()
  end
end

That is, wrap the query in % and use string concatenation and ILIKE to search for the name. Calling find_people_by_name("Erik") will issue a query pretty close to this:

SELECT
	p.first_name, p.middle_name, p.last_name
FROM
	people as p
WHERE
	p.first_name || ' ' || p.middle_name || ' ' || p.last_name ILIKE '%Erik%';

And it works pretty well! I can search for my complete name, and be lazy about capitalization:

iex(2)> recompile # recompile the project without leaving the REPL
Compiling 1 file (.ex)
Generated friends app
:ok

iex(3)> Friends.People.find_by_name("erik andré jakobsen")

# debug log cut
[
  %Friends.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 1,
    first_name: "Erik",
    middle_name: "André",
    last_name: "Jakobsen"
  }
]

Searching for any one of my names works fine as well.

iex(4)> Friends.People.find_by_name("jakobsen")
[
  %App.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 1,
    first_name: "Erik",
    middle_name: "André",
    last_name: "Jakobsen"
  }
]

Great! We’re done here. I’m making some coffee.

Not everyone has a middle name you know

What? Oh, I’m sure that’s still fine.

iex(5)> Friends.People.create(%{first_name: "José", last_name: "Valim"})

{:ok,
 %Friends.People.Person{
   __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
   id: 2,
   first_name: "José",
   middle_name: nil,
   last_name: "Valim"
 }}
iex(6)> Friends.People.find_by_name("José Valim")

[]

Right.

To get a better idea of what’s going on, let’s run a SQL query straight in Postgres:

❯ psql -U postgres -d friends_repo
psql (17.2, server 17.0 (DBngin.app))
Type "help" for help.

friends_repo=# SELECT 'José' || ' ' || NULL || ' ' || 'Valim' AS full_name;
full_name
-----------

(1 row)

Err, so 'something' || NULL is NULL. According to some person on Stack Overflow™ this is part of the SQL standard, and I’m not about to spend hundreds of dollars veryfing that, so we’ll just accept it as a fact of life and move on.

Postgres has a lot of string functions that could help out. Maybe concat, which makes the slightly more sensible choice of simply ignoring nulls, is the answer?

friends_repo=# SELECT concat('José', ' ', NULL, ' ', 'Valim') AS full_name;
  full_name
-------------
 José  Valim
(1 row)

That’s closer, but we’re left with two spaces between the names instead of one, which messes things up

friends_repo=# SELECT * FROM people WHERE concat(first_name, ' ', middle_name, ' ', last_name) ILIKE 'José Valim';
 id | first_name | middle_name | last_name
----+------------+-------------+-----------
(0 rows)

Happily, the Postgres people have thought about this, and they give us concat_ws. The first argument is used as a separator when concatenating the following arguments, and nulls are ignored.

friends_repo=# SELECT concat_ws(' ', 'José', NULL, 'Valim') AS full_name;
 full_name
------------
 José Valim
(1 row)

We update our search function to use it:

def find_by_name(name) when is_binary(name) do
  name_pattern = "%#{name}%"

  from(p in Person,
    where:
      ilike(
        fragment(
          "concat_ws(' ', ?, ?, ?)",
          p.first_name,
          p.middle_name,
          p.last_name
        ),
        ^name_pattern
      )
  )
  |> Repo.all()
end

And would you look at that:

iex(7)> recompile
Compiling 1 file (.ex)
Generated friends app
:ok
iex(8)> Friends.People.find_by_name("José Valim")

[
  %Friends.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 2,
    first_name: "José",
    middle_name: nil,
    last_name: "Valim"
  }
]

Now, where did I put that coff—

I didn’t even know you had a middle name!

Oh don’t worry, just run a search without it.

iex(8)> App.People.find_people_by_name("erik jakobsen")
[]

Alright, so back to the drawing board, and away from raw SQL. Rather than taking the name search and using it as-is, we’ll split it up and iteratively build a query.

In Ecto, queries are simply data structures that can be stored in variables, and the query can be extended as many times as we need:

iex(9)> query = from p in Friends.People.Person
#Ecto.Query<from p0 in Friends.People.Person>
iex(10)> query = from p in query, where: ilike(p.first_name, "josé")
#Ecto.Query<from p0 in Friends.People.Person,
# where: ilike(p0.first_name, "josé")>
iex(11)> Friends.Repo.all(query)

[
  %Friends.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 2,
    first_name: "José",
    middle_name: nil,
    last_name: "Valim"
  }
]

For our purposes, we first make a private function which accepts a name and partial query, and extends the latter with an OR query on all three name columns using the former:

defp add_name_to_query(name, query) do
  name_pattern = "%#{name}%"

  from p in query,
    where:
      ilike(p.first_name, ^name_pattern) or
        ilike(p.middle_name, ^name_pattern) or
        ilike(p.last_name, ^name_pattern)
end

Using a comprehension with the :reduce option, we build a query using all names individually, and then use that to search the database:

def find_people_by_name(name_query) do
  names = String.split(name_query)

  query =
    for name <- names, reduce: from(p in Person) do
      partial_query -> add_name_to_query(name, partial_query)
    end

  Repo.all(query)
end

Let’s break up what that for comprehension is doing a little more. In the do block we are calling this function with query:

partial_query -> add_name_to_query(name, partial_query)

For each iteration, the partial_query will be the query we have built up so far. In the first iteration we haven’t started yet, so this is where Elixir will use the from(p in Person) expression that we passed to the reduce option. That means that once we’re done building our query for "erik jakobsen", we’ll have something like this:

from p in Person,
  where:
    ilike(p.first_name, "%erik%") or
      ilike(p.middle_name, "%erik%") or
      ilike(p.last_name, "%erik%") or
      ilike(p.first_name, "%jakobsen%") or
      ilike(p.middle_name, "%jakobsen%") or
      ilike(p.last_name, "%jakobsen%")

This gives us what we want:

iex(12)> Friends.People.find_by_name("erik jakobsen")

[
  %Friends.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 1,
    first_name: "Erik",
    middle_name: "André",
    last_name: "Jakobsen"
  }
]

Happy now?

What’s with the é-s?

Fair enough.

iex(13)> Friends.People.find_by_name("jose")

[]

Postgres has our backs once again, this time with the unaccent extension. To use it with our database, we’ll need another migration to add it:

❯ mix ecto.gen.migration add_unaccent
* creating priv/repo/migrations/20250326174022_add_unaccent.exs

In it, we use execute/2 to add the extension.

defmodule Friends.Repo.Migrations.AddUnaccent do
  use Ecto.Migration

  def change do
    execute "CREATE EXTENSION IF NOT EXISTS unaccent", "DROP EXTENSION unaccent"
  end
end

The first argument to execute is what to do when running our migration:

❯ mix ecto.migrate

18:42:01.986 [info] == Running 20250326174022 Friends.Repo.Migrations.AddUnaccent.change/0 forward

18:42:01.987 [info] execute "CREATE EXTENSION IF NOT EXISTS unaccent"

18:42:02.242 [info] == Migrated 20250326174022 in 0.2s

and the second tells Ecto what to do in case we change our minds and want to roll back

❯ mix ecto.rollback

18:43:42.534 [info] == Running 20250326174022 Friends.Repo.Migrations.AddUnaccent.change/0 backward

18:43:42.535 [info] execute "DROP EXTENSION unaccent"

18:43:42.537 [info] == Migrated 20250326174022 in 0.0s

But we’re not rolling back, so, uuh, undo the undo please.

❯ mix ecto.migrate

18:44:04.214 [info] == Running 20250326174022 Friends.Repo.Migrations.AddUnaccent.change/0 forward

18:44:04.215 [info] execute "CREATE EXTENSION IF NOT EXISTS unaccent"

18:44:04.220 [info] == Migrated 20250326174022 in 0.0s

Finally, throw around unaccent in our query builder with great abandon:

defp add_name_to_query(name, query) do
  name_pattern = "%#{name}%"

  from p in query,
    where:
      fragment("unaccent(?) ILIKE unaccent(?)", p.first_name, ^name_pattern) or
        fragment("unaccent(?) ILIKE unaccent(?)", p.middle_name, ^name_pattern) or
        fragment("unaccent(?) ILIKE unaccent(?)", p.last_name, ^name_pattern)
end

Look, it’s beautiful:

iex(14)> Friends.People.find_by_name("jose")
[
  %Friends.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 2,
    first_name: "José",
    middle_name: nil,
    last_name: "Valim"
  }
]

iex(15)> Friends.People.find_by_name("erik andre")
[
  %Friends.People.Person{
    __meta__: #Ecto.Schema.Metadata<:loaded, "people">,
    id: 1,
    first_name: "Erik",
    middle_name: "André",
    last_name: "Jakobsen"
  }
]

Isn’t this all very slow?

That depends!

The project I’m working on only stores about 2000 names, so queries finish pretty much instantly.

But, as an experiment, I added the Faker library, used it to generate five million fake names, each of which sounds English, Spanish, French or Italian at random, and inserted each into my database:

Task.async_stream(
  1..5_000_000,
  fn _ ->
    faker =
      Enum.random([
        Faker.Person.En,
        Faker.Person.Es,
        Faker.Person.Fr,
        Faker.Person.It
      ])

    Friends.People.create(%{
      first_name: faker.first_name(),
      middle_name: faker.first_name(),
      last_name: faker.last_name()
    })
  end
)
|> Stream.run()

We all have hobbies.

Searching for my name again2 finishes in about 1.5 seconds (notice the db=1541.1ms below), which, frankly, isn’t great:

iex(16)> Friends.People.find_by_name("Erik André Jakobsen")

18:59:42.242 [debug] QUERY OK source="people" db=1541.1ms decode=3.1ms queue=2.0ms idle=13.6ms
# ✂️ cut

First off, having a database with that many names in it sounds to me like you either have a very good kind of problem, or that the authorities are looking for you.

In any case, if my project took off (it won’t), we would have to make some changes.

For one, we don’t even have an index on any of this. We could, for instance precompute and store an unqouted full name column with a GIN index, and then implement a search using trigrams. Or set up something like tantivy. Or give Ansible some money. Or rent a more powerful server.

These are all left as an exercise for the reader.


Footnotes

  1. I am, after all, a selfish bastard.

  2. See previous footnote.