Lateral Joins Instead of Window Functions
Every time I need to use a window function, I cringe a little. They’re not that hard to use. But I find them unintuitive, and I use them so infrequently that I always have difficulty figuring out what I’m partitioning over which rows.
And most times, when I reach for a window function, I’m not trying to do
anything complex. I’m usually calculating something for each row of my
results. If only Postgres had a
for_each function… 🤔
Let’s dive into a concrete problem. Suppose we have the following
Someone asks you to get the latest
log_in event for a given user (e.g.
No problem. You’ve done an
ORDER BY _ DESC LIMIT 1 query many a time.
SELECT user_id, created_at FROM events WHERE events.name = "log_in" AND user_id = 35 ORDER BY created_at DESC LIMIT 1
You probably didn’t even break a sweat.
But now, what query would you write if someone asks you to get a list of all
user emails with their latest
SELECT users.email, events.created_at AS latest_log_in FROM users INNER JOIN events ... WHERE events.name = "log_in"
🤨 Not as easy, right?
It’d be nice if we could use our first query inside our second query. Something like this:
-- pseudocode SELECT users.email, events.created_at AS latest_log_in FROM users INNER JOIN ( SELECT created_at FROM events WHERE events.name = "log_in" AND user_id = 35 ORDER BY created_at DESC LIMIT 1 ) as events
The problem is that we have to do that for each user, not just user
can we do that?
As it turns out, lateral joins are just the thing we need.
Using LATERAL joins
A LATERAL item can appear… within a JOIN tree… [It] can… refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.
JOIN LATERAL allows us to reference values from the main query inside
the subquery. So, we can access the
users.id from within our subquery and
remove the hard-coded
SELECT users.email, events.created_at AS latest_log_in FROM users JOIN LATERAL ( SELECT created_at FROM events WHERE events.user_id = users.id AND events.name = "log_in" ORDER BY created_at DESC LIMIT 1 ) events ON true
JOIN LATERAL, Postgres will grab each row in
FROM, and it will evaluate
LATERAL item using that row’s value. Then it’ll join the tables as usual.
So, we select all of our users, and for each row in the
users table, we
evaluate the most recent
log_in event for that given user. Then we join those
Isn’t that amazing?! 🤯
These resources helped me understand
LATERAL joins. I hope they’re helpful to