The Phoenix testing pyramid

August 11, 2023

I recently jumped on a call with a couple of people to give them an intro to testing in Elixir.

One of the topics I touched on was the testing pyramid, and since it seemed to resonate with them, I wanted to share it here.

The testing pyramid

I believe the original testing pyramid came from Mike Cohn, and it had three levels:

Top level: UI tests
Middle level: Service tests
Bottom level: Unit tests

I like this image from an article on Martin Fowler’s website:

A pyramid with three levels. Top level: UI tests. Middle level: Service tests. Bottom level: Unit tests. Arrow on the left going from top to bottom: more integration at the top, more isolation at the bottom. Arrow on the right going from top to bottom: slower at the top, faster at the bottom.

With single-page applications, those definitions might be confusing to some, since you can now write UI tests that test a UI component in isolation. And I think the same is true about some LiveView tests – you can write a unit test that is only testing UI logic.

So, to avoid confusion, let’s quickly describe the original testing pyramid, and then I’ll share how I apply it to Phoenix.

The testing pyramid explained

The idea behind the testing pyramid is that the higher up you go, the fewer tests you should have. The higher the level of your test, the more it is testing the integration of several components in your app. And the more you do that, the slower the test will be.

So, the tests at the top of the pyramid should be few and coarse. Because they’re slow, we don’t want to test fine-grained details. For example, we don’t want to test every possible conditional path there.

In the middle of the pyramid, we have other integration tests. Those run a bit faster and thus can be more fine-grained. So, we can have more of them. But what are services? For now, ignore the concrete name “Service”. You’ll see how I apply that to Phoenix below.

Finally, we have unit tests at the bottom. Those run fast, and thus, we should have lots of them. This is where we should be testing the logic of isolated components.

If the testing pyramid is still confusing, let me offer one more illustration.

Some people contrast the testing pyramid with an inverted pyramid (also called an ice-cream cone setup). There, we have many slow end-to-end tests trying to cover the application's flows and edge cases, and far fewer unit tests.

The problem with an ice-cream cone setup is that our tests run very slowly, and it’s a sign that we’re testing deeply nested logic through high-level integration tests.

Phoenix testing pyramid

In Phoenix, I apply the testing pyramid like this:

A pyramid with three levels. Top level: End-to-end tests. Middle level: Core app tests. Bottom level: Unit tests. Arrow on the left going from top to bottom: more integration at the top, more isolation at the bottom. Arrow on the right going from top to bottom: slower at the top, faster at the bottom.

Top level: end-to-end tests (typically running through the UI)

If you are using LiveView, this could be a LiveView test. But we need to differentiate between a test that’s only testing UI logic and one that’s testing an end-to-end flow. At the top of the pyramid, we have end-to-end LiveView tests.

If you’re not using LiveView, you can do end-to-end testing with Wallaby. Wallaby drives a real browser through a web driver such as chromedriver, so the tests actually exercise the UI.

Those top-level tests exercise the entire app, starting with the UI, navigating through pages, clicking on buttons and links, and ensuring flows work as expected.

The tests are slow, and unfortunately, sometimes brittle. For example, Wallaby tests can fail because some JavaScript in our page takes too long or triggers events in weird ways.

That makes running and maintaining the tests expensive. Therefore, we want fewer of them. Note that we still want some, just not as many as in the pyramid’s lower levels.

In practice, I always test from the outside in. So, I start work with a feature test at the top level. But I avoid testing every conditional path through the feature test. The test ensures that my components are integrated correctly, and that the core user experience is correct. But I leave more fine-grained testing to lower-level tests.

Middle level: integration test at the app boundary

In Phoenix apps, we typically separate Phoenix’s web layer from the core of our app. Phoenix 1.3 pushed developers in that direction with the introduction of Phoenix Contexts.

That separation is visible in our controllers and in our live views’ handle_event/3 callbacks. We place our controllers and live views in our myapp_web directory. But they call our app’s core modules and functions, which live in the myapp directory.

That directory difference outlines a clear boundary, and that’s the seam where I define the middle level of the testing pyramid.

Thus, the middle-level tests are integration tests that test the behavior of our core app (away from the delivery mechanism). I believe that’s the same role service tests had in the original testing pyramid.

Since these tests don’t need a web driver to click through a browser, they run faster. And since they’re testing the core of our app, we can write them at a more granular level. They’re still coarser than unit tests, but we can test a lot more.

For example, I test each public function that is exposed to the outside world, checking for both the happy and sad paths. That ensures that our core app is working as expected, regardless of how we’re delivering those results to our end users.
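
As a rough sketch of what a test at this boundary might look like, the example below defines a hypothetical MyApp.Accounts context with a single public function, register_user/1, and tests its happy and sad paths. The module names, the function, and the Agent-backed store are all assumptions made for illustration; in a real Phoenix app the context would typically talk to the database through Ecto instead. The file is written to run on its own with `elixir accounts_test.exs`.

```elixir
ExUnit.start()

# Hypothetical in-memory store, standing in for a database so the sketch
# runs without a Phoenix project.
defmodule MyApp.UserStore do
  use Agent

  def start_link(_opts), do: Agent.start_link(fn -> %{} end, name: __MODULE__)

  # Stores the user unless the email is already taken.
  def insert(user) do
    Agent.get_and_update(__MODULE__, fn users ->
      if Map.has_key?(users, user.email) do
        {{:error, :email_taken}, users}
      else
        {{:ok, user}, Map.put(users, user.email, user)}
      end
    end)
  end
end

# Hypothetical context module: the public API the web layer would call.
defmodule MyApp.Accounts do
  def register_user(%{email: email}) when is_binary(email) do
    email = email |> String.trim() |> String.downcase()

    if String.contains?(email, "@") do
      MyApp.UserStore.insert(%{email: email})
    else
      {:error, :invalid_email}
    end
  end

  def register_user(_attrs), do: {:error, :invalid_email}
end

defmodule MyApp.AccountsTest do
  use ExUnit.Case

  setup do
    # Each test gets a fresh store; ExUnit stops it when the test ends.
    start_supervised!(MyApp.UserStore)
    :ok
  end

  test "register_user/1 stores a normalized user (happy path)" do
    assert {:ok, %{email: "ada@example.com"}} =
             MyApp.Accounts.register_user(%{email: " Ada@Example.com "})
  end

  test "register_user/1 rejects a malformed email (sad path)" do
    assert {:error, :invalid_email} = MyApp.Accounts.register_user(%{email: "nope"})
  end

  test "register_user/1 rejects a duplicate email (sad path)" do
    {:ok, _user} = MyApp.Accounts.register_user(%{email: "ada@example.com"})

    assert {:error, :email_taken} =
             MyApp.Accounts.register_user(%{email: "ADA@example.com"})
  end
end
```

Note that the tests only go through MyApp.Accounts, never through MyApp.UserStore directly. The store is exercised as part of the integration, which is what makes this a middle-level test rather than a unit test.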

Bottom level: unit tests

At the bottom level we find our unit tests. They should test individual units of logic, and since they’re fast, we can write as many of them as we want.

What qualifies as a “unit of logic” is up for debate. Some people say it’s one isolated function. Others say the function can call dependent modules so long as it’s a conceptual unit of logic. I think both are fine.

Whatever “unit of logic” means, that’s the place to write fine-grained tests. I typically write several tests for every function: testing happy paths, sad paths, edge cases, and conditional logic. In short, that’s where we write all the tests that ensure the function behaves as expected under normal and abnormal conditions.
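
To make that concrete, here is a minimal sketch of fine-grained unit tests around one small function. MyApp.Pricing and apply_discount/2 are hypothetical names invented for this example, not part of any real app or library. The function is pure (no database, no processes), which is what keeps tests like these fast. The file is written to run on its own with `elixir pricing_test.exs`.

```elixir
ExUnit.start()

# Hypothetical pure function: just logic, nothing to set up or tear down.
defmodule MyApp.Pricing do
  # Returns the price in cents after a percentage discount.
  # The discount amount is rounded down to a whole cent.
  def apply_discount(cents, percent)
      when is_integer(cents) and cents >= 0 and percent in 0..100 do
    {:ok, cents - div(cents * percent, 100)}
  end

  def apply_discount(_cents, _percent), do: {:error, :invalid_input}
end

defmodule MyApp.PricingTest do
  use ExUnit.Case, async: true

  test "applies a percentage discount (happy path)" do
    assert MyApp.Pricing.apply_discount(1000, 25) == {:ok, 750}
  end

  test "handles the boundaries of the percent range (edge cases)" do
    assert MyApp.Pricing.apply_discount(1000, 0) == {:ok, 1000}
    assert MyApp.Pricing.apply_discount(1000, 100) == {:ok, 0}
    assert MyApp.Pricing.apply_discount(0, 50) == {:ok, 0}
  end

  test "rounds the discount amount down to a whole cent (edge case)" do
    # 15% of 999 is 149.85, so the discount applied is 149 cents.
    assert MyApp.Pricing.apply_discount(999, 15) == {:ok, 850}
  end

  test "rejects out-of-range or malformed input (sad paths)" do
    assert MyApp.Pricing.apply_discount(1000, 101) == {:error, :invalid_input}
    assert MyApp.Pricing.apply_discount(1000, -5) == {:error, :invalid_input}
    assert MyApp.Pricing.apply_discount(-1, 10) == {:error, :invalid_input}
    assert MyApp.Pricing.apply_discount(1000, 12.5) == {:error, :invalid_input}
  end
end
```

Covering every branch like this would be expensive in a feature test, but here each case costs almost nothing to run.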