Lateral Joins Instead of Window Functions

Every time I need to use a window function, I cringe a little. They’re not that hard to use. But I find them unintuitive, and I use them so infrequently that I always have difficulty figuring out what I’m partitioning over which rows.

And most times, when I reach for a window function, I’m not trying to do anything complex. I’m usually calculating something for each row of my results. If only Postgres had a for_each function… 🤔

The problem

Let’s dive into a concrete problem. Suppose we have the following users and events tables:

Users    
id name email
1 John john@example.com
2 Jean jean@example.com
   
100030 Sandra sandra@example.com
Events    
user_id name created_at
1 log_in 2021-02-23
2 log_in 2021-02-23
9 purchase 2021-02-24
   
999 log_in 2021-12-23

Someone asks you to get the latest log_in event for a given user (e.g. id = 35).

No problem. You’ve done an ORDER BY _ DESC LIMIT 1 query many a time.

SELECT user_id, created_at
FROM events
WHERE events.name = "log_in"
  AND user_id = 35
ORDER BY created_at DESC
LIMIT 1

You probably didn’t even break a sweat.

But now, what query would you write if someone asks you to get a list of all user emails with their latest log_in events?

SELECT users.email, events.created_at AS latest_log_in
FROM users
INNER JOIN events
...
WHERE events.name = "log_in"

🤨 Not as easy, right?

It’d be nice if we could use our first query inside our second query. Something like this:

-- pseudocode
SELECT users.email, events.created_at AS latest_log_in
FROM users
INNER JOIN (
  SELECT created_at
  FROM events
  WHERE events.name = "log_in"
    AND user_id = 35
  ORDER BY created_at DESC
  LIMIT 1
) as events

The problem is that we have to do that for each user, not just user 35. How can we do that?

As it turns out, lateral joins are just the thing we need.

Using LATERAL joins

A LATERAL item can appear… within a JOIN tree… [It] can… refer to any items that are on the left-hand side of a JOIN that it is on the right-hand side of.

PostgreSQL’s SELECT docs

Using JOIN LATERAL allows us to reference values from the main query inside the subquery. So, we can access the users.id from within our subquery and remove the hard-coded user_id:

SELECT users.email, events.created_at AS latest_log_in
FROM users
JOIN LATERAL (
  SELECT created_at
  FROM events
  WHERE events.user_id = users.id
    AND events.name = "log_in"
  ORDER BY created_at DESC
  LIMIT 1
) events ON true

With JOIN LATERAL, Postgres will grab each row in FROM, and it will evaluate the LATERAL item using that row’s value. Then it’ll join the tables as usual.

So, we select all of our users, and for each row in the users table, we evaluate the most recent log_in event for that given user. Then we join those results.

Isn’t that amazing?! 🤯

More resources

These resources helped me understand LATERAL joins. I hope they’re helpful to you too:

Want my latest thoughts, posts, and projects in your inbox?

    I will never send you spam. Unsubscribe any time.