Finding the right person (in Postgres)
Even though Patrick McKenzie lists a lot of good reasons not to, a project I am contributing to stores people’s names like this:
CREATE TABLE people (
-- The beefy first column is the PostgreSQL way of specifying
-- an autoincrementing integer primary key
id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
first_name text NOT NULL,
middle_name text,
last_name text NOT NULL
);
Besides the issues mentioned in McKenzie’s article, this schema also makes it not-very-straightforward to find people by searching for their name. Take, for instance, me:1
INSERT INTO
people (first_name, middle_name, last_name)
VALUES
('Erik', 'André', 'Jakobsen');
Prelude
To learn more about why and how we can handle it, I followed Ecto’s getting started guide to set up an Elixir project called Friends
with a PostgreSQL database. After getting everything set up, I create the people
table with an Ecto migration. If you’re following along, run this in your command line:
❯ mix ecto.gen.migration create_people
* creating priv/repo/migrations/20250326160917_create_people.exs
This gives us a file where we can create our table.
defmodule Friends.Repo.Migrations.CreatePeople do
use Ecto.Migration
def change do
create table :people do
add :first_name, :string, null: false
add :middle_name, :string
add :last_name, :string, null: false
end
end
end
We’ll also need an Ecto schema, i.e. a fancy struct which represents entities in our database. A changeset
function which lets us validate the data comes in handy as well:
defmodule Friends.People.Person do
use Ecto.Schema
import Ecto.Changeset
schema "people" do
field :first_name, :string
field :middle_name, :string
field :last_name, :string
end
def changeset(%__MODULE__{} = person, params \\ %{}) do
person
|> cast(params, [:first_name, :middle_name, :last_name])
|> validate_required([:first_name, :last_name])
end
end
Finally, create a module called Friends.People
which will use our schema to interact with our database. To start, we’ll have a simple function to insert new friends:
defmodule Friends.People do
alias Friends.People.Person
alias Friends.Repo
def create(params) do
%Person{}
|> Person.changeset(params)
|> Repo.insert()
end
end
We’re now ready to load our project in a REPL with iex -S mix
and call the function to insert a very nice name into the database:
❯ iex -S mix
Erlang/OTP 27 [erts-15.2.2] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit]
Compiling 2 files (.ex)
Generated friends app
Interactive Elixir (1.18.0) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Friends.People.create(%{first_name: "Erik", middle_name: "André", last_name: "Jakobsen"})
17:20:57.813 [debug] QUERY OK source="people" db=1.1ms decode=0.9ms queue=0.7ms idle=741.3ms
INSERT INTO "people" ("first_name","middle_name","last_name") VALUES ($1,$2,$3) RETURNING "id" ["Erik", "André", "Jakobsen"]
{:ok,
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 1,
first_name: "Erik",
middle_name: "André",
last_name: "Jakobsen"
}}
Finally! Now let’s find that son of a bitch
A reasonable first attempt at a search function might look something like this:
defmodule Friends.People do
alias Friends.People.Person
alias Friends.Repo
import Ecto.Query # 👈 import Ecto.Query to access the Query DSL
# ✂️ create function cut
def find_by_name(name) when is_binary(name) do
name_pattern = "%#{name}%"
from(p in Person,
where:
ilike(
fragment(
"? || ' ' || ? || ' ' || ?",
p.first_name,
p.middle_name,
p.last_name
),
^name_pattern
)
)
|> Repo.all()
end
end
That is, wrap the query in %
and use string concatenation and ILIKE
to search for the name. Calling find_people_by_name("Erik")
will issue a query pretty close to this:
SELECT
p.first_name, p.middle_name, p.last_name
FROM
people as p
WHERE
p.first_name || ' ' || p.middle_name || ' ' || p.last_name ILIKE '%Erik%';
And it works pretty well! I can search for my complete name, and be lazy about capitalization:
iex(2)> recompile # recompile the project without leaving the REPL
Compiling 1 file (.ex)
Generated friends app
:ok
iex(3)> Friends.People.find_by_name("erik andré jakobsen")
# debug log cut
[
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 1,
first_name: "Erik",
middle_name: "André",
last_name: "Jakobsen"
}
]
Searching for any one of my names works fine as well.
iex(4)> Friends.People.find_by_name("jakobsen")
[
%App.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 1,
first_name: "Erik",
middle_name: "André",
last_name: "Jakobsen"
}
]
Great! We’re done here. I’m making some coffee.
Not everyone has a middle name you know
What? Oh, I’m sure that’s still fine.
iex(5)> Friends.People.create(%{first_name: "José", last_name: "Valim"})
{:ok,
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 2,
first_name: "José",
middle_name: nil,
last_name: "Valim"
}}
iex(6)> Friends.People.find_by_name("José Valim")
[]
Right.
To get a better idea of what’s going on, let’s run a SQL query straight in Postgres:
❯ psql -U postgres -d friends_repo
psql (17.2, server 17.0 (DBngin.app))
Type "help" for help.
friends_repo=# SELECT 'José' || ' ' || NULL || ' ' || 'Valim' AS full_name;
full_name
-----------
(1 row)
Err, so 'something' || NULL
is NULL
. According to some person on Stack Overflow™ this is part of the SQL standard, and I’m not about to spend hundreds of dollars veryfing that, so we’ll just accept it as a fact of life and move on.
Postgres has a lot of string functions that could help out. Maybe concat
, which makes the slightly more sensible choice of simply ignoring nulls, is the answer?
friends_repo=# SELECT concat('José', ' ', NULL, ' ', 'Valim') AS full_name;
full_name
-------------
José Valim
(1 row)
That’s closer, but we’re left with two spaces between the names instead of one, which messes things up
friends_repo=# SELECT * FROM people WHERE concat(first_name, ' ', middle_name, ' ', last_name) ILIKE 'José Valim';
id | first_name | middle_name | last_name
----+------------+-------------+-----------
(0 rows)
Happily, the Postgres people have thought about this, and they give us concat_ws
. The first argument is used as a separator when concatenating the following arguments, and nulls are ignored.
friends_repo=# SELECT concat_ws(' ', 'José', NULL, 'Valim') AS full_name;
full_name
------------
José Valim
(1 row)
We update our search function to use it:
def find_by_name(name) when is_binary(name) do
name_pattern = "%#{name}%"
from(p in Person,
where:
ilike(
fragment(
"concat_ws(' ', ?, ?, ?)",
p.first_name,
p.middle_name,
p.last_name
),
^name_pattern
)
)
|> Repo.all()
end
And would you look at that:
iex(7)> recompile
Compiling 1 file (.ex)
Generated friends app
:ok
iex(8)> Friends.People.find_by_name("José Valim")
[
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 2,
first_name: "José",
middle_name: nil,
last_name: "Valim"
}
]
Now, where did I put that coff—
I didn’t even know you had a middle name!
Oh don’t worry, just run a search without it.
iex(8)> App.People.find_people_by_name("erik jakobsen")
[]
…
Alright, so back to the drawing board, and away from raw SQL. Rather than taking the name search and using it as-is, we’ll split it up and iteratively build a query.
In Ecto, queries are simply data structures that can be stored in variables, and the query can be extended as many times as we need:
iex(9)> query = from p in Friends.People.Person
#Ecto.Query<from p0 in Friends.People.Person>
iex(10)> query = from p in query, where: ilike(p.first_name, "josé")
#Ecto.Query<from p0 in Friends.People.Person,
# where: ilike(p0.first_name, "josé")>
iex(11)> Friends.Repo.all(query)
[
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 2,
first_name: "José",
middle_name: nil,
last_name: "Valim"
}
]
For our purposes, we first make a private function which accepts a name and partial query, and extends the latter with an OR
query on all three name columns using the former:
defp add_name_to_query(name, query) do
name_pattern = "%#{name}%"
from p in query,
where:
ilike(p.first_name, ^name_pattern) or
ilike(p.middle_name, ^name_pattern) or
ilike(p.last_name, ^name_pattern)
end
Using a comprehension with the :reduce
option, we build a query using all names individually, and then use that to search the database:
def find_people_by_name(name_query) do
names = String.split(name_query)
query =
for name <- names, reduce: from(p in Person) do
partial_query -> add_name_to_query(name, partial_query)
end
Repo.all(query)
end
Let’s break up what that for
comprehension is doing a little more. In the do
block we are calling this function with query:
partial_query -> add_name_to_query(name, partial_query)
For each iteration, the partial_query
will be the query we have built up so far. In the first iteration we haven’t started yet, so this is where Elixir will use the from(p in Person)
expression that we passed to the reduce
option. That means that once we’re done building our query for "erik jakobsen"
, we’ll have something like this:
from p in Person,
where:
ilike(p.first_name, "%erik%") or
ilike(p.middle_name, "%erik%") or
ilike(p.last_name, "%erik%") or
ilike(p.first_name, "%jakobsen%") or
ilike(p.middle_name, "%jakobsen%") or
ilike(p.last_name, "%jakobsen%")
This gives us what we want:
iex(12)> Friends.People.find_by_name("erik jakobsen")
[
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 1,
first_name: "Erik",
middle_name: "André",
last_name: "Jakobsen"
}
]
Happy now?
What’s with the é-s?
Fair enough.
iex(13)> Friends.People.find_by_name("jose")
[]
Postgres has our backs once again, this time with the unaccent
extension. To use it with our database, we’ll need another migration to add it:
❯ mix ecto.gen.migration add_unaccent
* creating priv/repo/migrations/20250326174022_add_unaccent.exs
In it, we use execute/2
to add the extension.
defmodule Friends.Repo.Migrations.AddUnaccent do
use Ecto.Migration
def change do
execute "CREATE EXTENSION IF NOT EXISTS unaccent", "DROP EXTENSION unaccent"
end
end
The first argument to execute
is what to do when running our migration:
❯ mix ecto.migrate
18:42:01.986 [info] == Running 20250326174022 Friends.Repo.Migrations.AddUnaccent.change/0 forward
18:42:01.987 [info] execute "CREATE EXTENSION IF NOT EXISTS unaccent"
18:42:02.242 [info] == Migrated 20250326174022 in 0.2s
and the second tells Ecto what to do in case we change our minds and want to roll back
❯ mix ecto.rollback
18:43:42.534 [info] == Running 20250326174022 Friends.Repo.Migrations.AddUnaccent.change/0 backward
18:43:42.535 [info] execute "DROP EXTENSION unaccent"
18:43:42.537 [info] == Migrated 20250326174022 in 0.0s
But we’re not rolling back, so, uuh, undo the undo please.
❯ mix ecto.migrate
18:44:04.214 [info] == Running 20250326174022 Friends.Repo.Migrations.AddUnaccent.change/0 forward
18:44:04.215 [info] execute "CREATE EXTENSION IF NOT EXISTS unaccent"
18:44:04.220 [info] == Migrated 20250326174022 in 0.0s
Finally, throw around unaccent
in our query builder with great abandon:
defp add_name_to_query(name, query) do
name_pattern = "%#{name}%"
from p in query,
where:
fragment("unaccent(?) ILIKE unaccent(?)", p.first_name, ^name_pattern) or
fragment("unaccent(?) ILIKE unaccent(?)", p.middle_name, ^name_pattern) or
fragment("unaccent(?) ILIKE unaccent(?)", p.last_name, ^name_pattern)
end
Look, it’s beautiful:
iex(14)> Friends.People.find_by_name("jose")
[
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 2,
first_name: "José",
middle_name: nil,
last_name: "Valim"
}
]
iex(15)> Friends.People.find_by_name("erik andre")
[
%Friends.People.Person{
__meta__: #Ecto.Schema.Metadata<:loaded, "people">,
id: 1,
first_name: "Erik",
middle_name: "André",
last_name: "Jakobsen"
}
]
Isn’t this all very slow?
That depends!
The project I’m working on only stores about 2000 names, so queries finish pretty much instantly.
But, as an experiment, I added the Faker library, used it to generate five million fake names, each of which sounds English, Spanish, French or Italian at random, and inserted each into my database:
Task.async_stream(
1..5_000_000,
fn _ ->
faker =
Enum.random([
Faker.Person.En,
Faker.Person.Es,
Faker.Person.Fr,
Faker.Person.It
])
Friends.People.create(%{
first_name: faker.first_name(),
middle_name: faker.first_name(),
last_name: faker.last_name()
})
end
)
|> Stream.run()
We all have hobbies.
Searching for my name again2 finishes in about 1.5 seconds (notice the db=1541.1ms
below), which, frankly, isn’t great:
iex(16)> Friends.People.find_by_name("Erik André Jakobsen")
18:59:42.242 [debug] QUERY OK source="people" db=1541.1ms decode=3.1ms queue=2.0ms idle=13.6ms
# ✂️ cut
First off, having a database with that many names in it sounds to me like you either have a very good kind of problem, or that the authorities are looking for you.
In any case, if my project took off (it won’t), we would have to make some changes.
For one, we don’t even have an index on any of this. We could, for instance precompute and store an unqouted full name column with a GIN index, and then implement a search using trigrams. Or set up something like tantivy. Or give Ansible some money. Or rent a more powerful server.
These are all left as an exercise for the reader.