Turborepo 101

TL;DR

Pros:

  • Vercel’s remote caching saves lots of CI minutes by caching build artifacts and task logs
  • No need for a package registry: dependencies can be imported locally thanks to npm/pnpm/yarn workspaces
  • Better control/code visibility and less duplication (shared contracts, configuration, libraries)

Cons:

  • Independent versioning of packages becomes harder
  • Transitioning a mature project to a monorepo is not trivial
  • If you plan to use Docker, Dockerfile steps will increase significantly

Multiple services, same repo

Several years back, monoliths were the way to go when building software, so you would likely have a single repository containing your project. Things started to change with the advent of SOA: it was natural to split each service into its own repository, to separate concerns and avoid having giant JSP apps crammed together into a single place. Fast forward to the present day: with microservices, we end up with loads of repos, one for each service (let’s not forget the libraries!).

Managing this many repos can be challenging, even more so for small teams. Big Tech companies such as Google, Meta and Microsoft solved the issue by developing complex build systems such as Bazel (way before git was popular). Smaller companies unfortunately lack the resources to do it. How can we deliver value to our clients if a small tweak to the build system takes a team’s full week? Perhaps monorepos weren’t that bad… but how can they be both efficient and practical in 2023?

Turborepo

Let’s focus on Node.js today. Introducing: Turborepo.

Turborepo is a build tool that leverages the workspace system built into the most common Node package managers (npm, yarn, pnpm) to manage dependencies. It also boosts efficiency by caching the results of the tasks it runs.

Usually a repo is structured in this way:

🌳 monorepo
 ├─ 📁 packages
 ├─ 📁 apps
 ├─ 📄 package.json
 └─ 📄 turbo.json

Both the packages and apps folders are configured as workspaces of your package manager of choice, so each subfolder is its own npm package.

The packages folder contains, you guessed it, packages: common code shared between services. The apps folder contains the actual services that power the app.

turbo.json is used to configure Turborepo. Let’s have a look.

Performance

🌳 monorepo
 ├─ 📁 packages
 │  └── 📁 test-package
 ├─ 📁 services
 │  ├── 📁 service-1
 │  └── 📁 service-2
 ├─ 📄 package.json
 └─ 📄 turbo.json

test-package, service-1 and service-2 contain TS code that needs to be transpiled. When npm run build is executed, the JS code ends up in a dist folder inside each package’s own folder.

Let’s suppose that both service-1 and service-2 depend on test-package (their package.json files list test-package as a dependency). How can we optimize the build process?

turbo.json comes to our help. This is what it contains:

{
  "$schema": "https://turborepo.org/schema.json",
  "pipeline": {
    "build": {
      "inputs": ["src/**/*.ts"],
      "outputs": ["dist/**"],
      "dependsOn": ["^build"]
    }
  }
}

Starting with a clean slate and executing npm run build, we will see that each package’s own build script gets called exactly once.

Let’s change the content of service-1 and run build again: only service-1 is built again.

Not only that, but if you look closely, service-1 and service-2 are built in parallel. Fascinating, isn’t it?
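
To picture why the two services can be built side by side, here is a rough sketch of the ordering that "dependsOn": ["^build"] implies: a package is ready to build once all of its internal dependencies have been built. The graph below is the example repo written out by hand, and buildWaves is a hypothetical helper of mine, not Turborepo’s actual scheduler (which is likely smarter than fixed waves).

```typescript
// Hand-written dependency map for the example repo: package -> internal dependencies.
type DependencyGraph = Record<string, string[]>;

const graph: DependencyGraph = {
  "test-package": [],
  "service-1": ["test-package"],
  "service-2": ["test-package"],
};

// Groups packages into "waves": every package in a wave only depends on
// packages from earlier waves, so a whole wave can be built in parallel.
function buildWaves(deps: DependencyGraph): string[][] {
  const done = new Set<string>();
  const pending = new Set(Object.keys(deps));
  const waves: string[][] = [];

  while (pending.size > 0) {
    const ready = [...pending].filter((pkg) =>
      deps[pkg].every((dep) => done.has(dep)),
    );
    if (ready.length === 0) {
      // Nothing can make progress: the remaining packages depend on each other.
      throw new Error("Dependency cycle detected");
    }
    for (const pkg of ready) {
      pending.delete(pkg);
      done.add(pkg);
    }
    waves.push(ready.sort());
  }
  return waves;
}

console.log(buildWaves(graph));
// [ [ 'test-package' ], [ 'service-1', 'service-2' ] ]
```

test-package has to go first, then both services land in the same wave, which is exactly the parallel build we saw.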

Turborepo stores all the metadata required to improve performance in a .turbo folder. This poses a further question: what about the CI? How can we get this performance improvement if the .turbo folder is not committed to the repo?

Turborepo offers remote caching, and you can use it for free through the cache Vercel provides. By doing this on a test repo with over 20 fake services (all depending on a single package), I was able to cut build time in half when making changes to a single service.
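
If you are wondering how a cache can know that only service-1 needs rebuilding, the general idea is a fingerprint of the task inputs used as a cache key. The sketch below is a toy model under my own assumptions: fnv1a, fingerprint and runCached are hypothetical names, the hash is a simple stand-in, and the Map plays the role of the .turbo folder or a remote cache. It is not how Turborepo is implemented internally.

```typescript
// Toy 32-bit FNV-1a hash: a stand-in for a real content hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

interface TaskInputs {
  files: Record<string, string>; // path -> content, e.g. whatever matches "inputs"
  dependencyFingerprints: string[]; // fingerprints of the internal dependencies
}

// Same files and same dependency fingerprints always give the same key.
function fingerprint(inputs: TaskInputs): string {
  const fileParts = Object.keys(inputs.files)
    .sort()
    .map((path) => `${path}\n${inputs.files[path]}`);
  const depParts = [...inputs.dependencyFingerprints].sort();
  return fnv1a([...fileParts, ...depParts].join("\n--\n"));
}

// In-memory stand-in for the local .turbo folder or a remote cache.
const cache = new Map<string, string>();

function runCached(task: string, inputs: TaskInputs, build: () => string): "hit" | "miss" {
  const key = `${task}:${fingerprint(inputs)}`;
  if (cache.has(key)) return "hit"; // outputs would be restored instead of rebuilt
  cache.set(key, build());
  return "miss";
}

const service1: TaskInputs = {
  files: { "src/index.ts": "export const a = 1;" },
  dependencyFingerprints: [],
};

console.log(runCached("service-1#build", service1, () => "dist output")); // miss
console.log(runCached("service-1#build", service1, () => "dist output")); // hit
service1.files["src/index.ts"] = "export const a = 2;";
console.log(runCached("service-1#build", service1, () => "dist output")); // miss
```

Since the key only depends on content, a CI runner that has never seen the repo can compute the same key and fetch the result from a shared store.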

How to manage shared code

Package versioning is a daunting experience every true JavaScript dev has faced in their life.

I’d like to focus on the way package versioning is managed in a monorepo. In short, YOU DON’T!

The proper way

As for the proper explanation, picture this: if the packages and the services that depend on them are all together, what is it that needs versioning? The code needs it (and we already have git for that), not the packages, nor the services! A sensible argument may be that your product needs versioning. Well, we have a root package.json, don’t we? We should use that instead! No more internal versioning, only version your product! Happy management, happy devs.

Perhaps this approach won’t be viable if you are building a library that comes with plugins as separate packages. If you are both a developer and a mentally stable person (I know, it’s a hard combo), you should take a look at changesets. It works well if you use it with its GitHub plugin; on other providers it still feels very manual and less elegant. If you really are crazy though, you might look at auto, which looks very similar to the OG semantic-release. Anyway, more on these tools later.

The ‘I let copilot write all of the code for me’ way

Let’s imagine that 5 years ago you created a package called logger. You moved it to the monorepo alongside the rest of your product. You have a very old, ugly and unmaintainable service (created at the same time) and you want to never touch it again. Ever! You also have a relatively new service which is nicer. Both services depend on the logger package. Management is planning to add features that would require a new service, which also needs logger. While planning to add the new service, you think it would be nice to update logger.

🌳 monorepo
 ├─ 📁 packages
 │  └── 📁 logger
 └─ 📁 services
    ├── 📁 legacy-service
    └── 📁 kinda-new-service

You should copy the logger package into a new folder, rename it logger-legacy and make legacy-service point to it. You can then proceed to update logger and the other service you had, and to add your new service.

🌳 monorepo
 ├─ 📁 packages
 │  ├── 📁 logger
 │  └── 📁 logger-legacy
 └─ 📁 services
    ├── 📁 legacy-service
    ├── 📁 kinda-new-service
    └── 📁 cool-bleeding-edge-outdated-in-two-months-service

This way, not only have you updated the logger package, but you also have a clearer view of the technical debt in the codebase.

Mind you, you can also let these two logger packages have the same name and only specify a different version!

🌳 monorepo
 ├─ 📁 packages
 │  ├── 📁 logger (logger@2.0.0)
 │  └── 📁 logger-legacy (logger@1.0.0)
 └─ 📁 services
    ├── 📁 legacy-service (uses logger@1.0.0)
    ├── 📁 kinda-new-service (uses logger@2.0.0)
    └── 📁 cool-bleeding-edge-outdated-in-two-months-service (uses logger@2.0.0)

Publishing and Versioning

Back to versioning: how can we version public packages if we plan to publish them to a registry?

We have a few options. The sane ones are changesets and auto.

Changesets

Changesets simplifies (for the maintainer) the version bumping process, and that’s all it does. No magic. You can take a look at the details yourself here. What’s really interesting is the Changesets bot, which blocks a PR if it has no changeset (and thus makes the project maintainer’s life easier). This useful bot is only available for GitHub. Yikes.

Auto

Auto does what semantic-release did back in the day, but better. It supports monorepos and manages:

  • commit-based version bumping
  • changelogs
  • tags
  • package publishing

And much more thanks to plugins.
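
To make "commit-based version bumping" a bit more concrete, here is a small sketch of the general idea: read the commit messages since the last release and derive a semver bump from them. I am assuming Conventional Commits style messages here; bumpForCommit, bumpForRelease and applyBump are hypothetical helpers, and the rules Auto actually applies may well differ.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Ordered from weakest to strongest bump.
const order: Bump[] = ["none", "patch", "minor", "major"];

// Maps a single commit message to a bump, Conventional Commits style.
function bumpForCommit(message: string): Bump {
  const header = message.split("\n")[0];
  const match = /^(\w+)(\([^)]*\))?(!)?: /.exec(header);
  if (!match) return "none";
  if (match[3] === "!" || /^BREAKING CHANGE: /m.test(message)) return "major";
  if (match[1] === "feat") return "minor";
  if (match[1] === "fix") return "patch";
  return "none";
}

// The release bump is the strongest bump among all commits.
function bumpForRelease(messages: string[]): Bump {
  return messages
    .map(bumpForCommit)
    .reduce<Bump>(
      (highest, next) => (order.indexOf(next) > order.indexOf(highest) ? next : highest),
      "none",
    );
}

function applyBump(version: string, bump: Bump): string {
  const [major, minor, patch] = version.split(".").map(Number);
  switch (bump) {
    case "major":
      return `${major + 1}.0.0`;
    case "minor":
      return `${major}.${minor + 1}.0`;
    case "patch":
      return `${major}.${minor}.${patch + 1}`;
    default:
      return version;
  }
}

const commits = [
  "fix(logger): handle empty messages",
  "feat(logger): add JSON output",
  "docs: update README",
];

console.log(applyBump("1.4.2", bumpForRelease(commits))); // 1.5.0
```

The changelog and the tag then follow from the same list of commits, which is why these tools tend to bundle all of it together.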

Unfortunately, much like its older brother, it’s harder to set up (and troubleshoot) than changesets. Luckily there’s a boilerplate that can help you get started.

Docker

Taking a look at the docs we see how to avoid unnecessary image rebuilds and how to enable remote caching. We are also provided with a working example.

What’s cool about this is that the monorepo gets pruned of all the unused services and packages. So if you have a repo like this:

🌳 monorepo
 ├─ 📁 packages
 │  ├── 📁 backend-logger
 │  ├── 📁 backend-service-template
 │  ├── 📁 frontend-components
 │  └── 📁 frontend-tracking
 ├─ 📁 apps
 │  ├── 📁 backend-service-1
 │  ├── 📁 backend-service-2
 │  ├── 📁 admin-web-app
 │  └── 📁 web-app
 ├─ 📄 package.json
 └─ 📄 turbo.json

The built image for backend-service-1 will only contain:

/app
 ├─ 📁 packages
 │  ├── 📁 backend-logger
 │  └── 📁 backend-service-template
 ├─ 📁 apps
 │  └── 📁 backend-service-1
 ├─ 📄 package.json
 └─ 📄 turbo.json
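
Conceptually, the pruning step (turbo prune in the docs) boils down to walking the internal dependency graph starting from the service you want to ship. Here is a minimal sketch of that walk. The dependency edges are my assumption for the example repo (only the backend packages are used by backend-service-1), and pruneFor is a hypothetical helper, not the real implementation.

```typescript
interface Workspace {
  dir: string;
  dependencies: string[]; // internal (workspace) dependencies only
}

// Assumed dependency edges for the example repo above.
const workspaces: Record<string, Workspace> = {
  "backend-logger": { dir: "packages/backend-logger", dependencies: [] },
  "backend-service-template": {
    dir: "packages/backend-service-template",
    dependencies: ["backend-logger"],
  },
  "frontend-components": { dir: "packages/frontend-components", dependencies: [] },
  "frontend-tracking": { dir: "packages/frontend-tracking", dependencies: [] },
  "backend-service-1": {
    dir: "apps/backend-service-1",
    dependencies: ["backend-service-template"],
  },
  "backend-service-2": {
    dir: "apps/backend-service-2",
    dependencies: ["backend-service-template"],
  },
  "admin-web-app": { dir: "apps/admin-web-app", dependencies: ["frontend-components"] },
  "web-app": {
    dir: "apps/web-app",
    dependencies: ["frontend-components", "frontend-tracking"],
  },
};

// Returns the folders needed to build `target`: the target itself plus
// everything it depends on, directly or transitively.
function pruneFor(target: string, all: Record<string, Workspace>): string[] {
  const keep = new Set<string>();
  const stack = [target];
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (keep.has(name)) continue;
    const workspace = all[name];
    if (!workspace) throw new Error(`Unknown workspace: ${name}`);
    keep.add(name);
    stack.push(...workspace.dependencies);
  }
  return [...keep].map((name) => all[name].dir).sort();
}

console.log(pruneFor("backend-service-1", workspaces));
// [ 'apps/backend-service-1', 'packages/backend-logger', 'packages/backend-service-template' ]
```

Everything frontend-related simply never gets visited, so it never reaches the image.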

What I’d like to see here is a simple way to only include the production code. If you are using TypeScript, you will still find the sources inside the image. Unless you get really crazy with your Dockerfiles, of course.

/app
 ├─ 📁 packages
 │  ├── 📁 backend-logger
 │  │   ├── 📁 dist
 │  │   └── 📁 src
 │  └── 📁 backend-service-template
 │      ├── 📁 dist
 │      └── 📁 src
 ├─ 📁 apps
 │  └── 📁 backend-service-1
 │      ├── 📁 dist
 │      └── 📁 src
 ├─ 📄 package.json
 └─ 📄 turbo.json

Transition Effort

As we can see, Turborepo is a wonderful tool that can help us reduce repository hell without impacting our performance or workflows. But how difficult would it be to transition to the monorepo?

Let’s divide the process into its steps:

  • Setup turborepo
  • Move inside services and packages
  • Start updating dependencies
  • Configure turbo scripts
  • Git related changes
  • CI changes

Setup turborepo

This is the easiest step: have a look at the examples and find the one that suits your needs the most. Then start hacking away everything that you don’t use.

🌳 monorepo
 ├─ 📁 packages
 ├─ 📁 apps
 ├─ 📄 package.json
 └─ 📄 turbo.json

Move inside services and packages

Gather up all your repos and copy all of the relevant files into your monorepo, one folder per repo.

🌳 monorepo
 ├─ 📁 packages
 │  └── 📁 lib
 │      ├── 📄 .gitignore 
 │      ├── 📄 .eslintrc.cjs
 │      ├── 📁 src
 │      ├── 📁 test
 │      └── 📄 package.json
 ├─ 📁 apps
 │  ├── 📁 backend
 │  │   ├── 📄 .gitignore
 │  │   ├── 📄 .eslintrc.cjs
 │  │   ├── 📁 migrations
 │  │   ├── 📁 src
 │  │   ├── 📁 test
 │  │   └── 📄 package.json
 │  └── 📁 frontend
 │      ├── 📄 .gitignore 
 │      ├── 📄 .eslintrc.cjs
 │      ├── 📁 src
 │      ├── 📁 test
 │      └── 📄 package.json
 ├─ 📄 package.json
 └─ 📄 turbo.json

Start updating dependencies

Hey, there’s duplicate code! Let’s adapt and move .gitignore to the root folder. We will also create a custom shared config for ESLint.

🌳 monorepo
 ├─ 📄 .gitignore 
 ├─ 📁 packages
 │  ├── 📁 eslint-config-custom
 │  │   ├── 📄 index.js
 │  │   └── 📄 package.json
 │  └── 📁 lib
 │      ├── 📄 .eslintrc.cjs
 │      ├── 📁 src
 │      ├── 📁 test
 │      └── 📄 package.json
 ├─ 📁 apps
 │  ├── 📁 backend
 │  │   ├── 📄 .eslintrc.cjs
 │  │   ├── 📁 migrations
 │  │   ├── 📁 src
 │  │   ├── 📁 test
 │  │   └── 📄 package.json
 │  └── 📁 frontend
 │      ├── 📄 .eslintrc.cjs
 │      ├── 📁 src
 │      ├── 📁 test
 │      └── 📄 package.json
 ├─ 📄 package.json
 └─ 📄 turbo.json

After this is done let’s update backend, lib and frontend’s package.json to use our new package.

  //...
  "devDependencies": {
    "eslint-config-custom": "workspace:*" // If you are using npm remove the 'workspace:' prefix
  },
  //...

Let’s not forget to use it in each package’s .eslintrc.cjs:

module.exports = {
  root: true,
  // This tells ESLint to load the config from the package `eslint-config-custom`
  extends: ['custom'],
  //...
};

We should also update the lib dependency in the services.

  //...
  "dependencies": {
    "lib": "workspace:*" // If you are using npm remove the 'workspace:' prefix
  },
  //...

Configure turbo scripts

Now that the dependencies are fixed and there’s no duplication, let’s modify turbo.json. We have build, test and lint commands in all three of our original packages, so let’s add them in.

{
  "$schema": "https://turborepo.org/schema.json",
  "pipeline": {
    "build": {
      "outputs": ["dist/**"],
      "dependsOn": ["^build"]
    },
    "test": {
      "outputs": ["coverage/**"],
      "dependsOn": ["^build"]
    },
    "lint": {}
  }
}

Git related changes

Now that all your code is in a single repo, you should consider adopting a clear git strategy if you haven’t done so already. Look no further than conventional commits, and consider giving trunk based development or GitHub flow a chance.

CI changes

Having a root package.json like this…

  //...
  "scripts": {
    "build": "turbo build",
    "test": "turbo test",
    "lint": "turbo lint"
  },
  //...

…allows you to have a very clean CI:

name: CI

on:
  push:
    branches: 
      - "master"
  pull_request:
    branches: 
      - "master"

jobs:
  build-test-lint:
    name: Build Test Lint
    runs-on: ubuntu-latest
    env:
      TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
      TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
      TURBO_REMOTE_ONLY: true

    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: 18
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - run: npm run test
      - run: npm run lint

What if you need to deploy your service from the CI? We are not quite there yet. If you use auto, there’s a plugin that manages Docker image publishing. You can use that together with turbo to only push the services that changed.
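
As for "only the services that changed", the underlying idea is simple enough to sketch: map the changed files to packages, then propagate the change to everything that depends on them. The snippet below assumes the lib/backend/frontend layout from the transition example, with both apps depending on lib; affectedPackages is a hypothetical helper, not a turbo or auto API.

```typescript
interface Pkg {
  dir: string;
  dependencies: string[]; // internal (workspace) dependencies only
}

// Assumed layout: both apps depend on lib.
const packages: Record<string, Pkg> = {
  lib: { dir: "packages/lib", dependencies: [] },
  backend: { dir: "apps/backend", dependencies: ["lib"] },
  frontend: { dir: "apps/frontend", dependencies: ["lib"] },
};

// Returns the packages touched by the changed files, plus every package
// that depends on them, directly or transitively.
function affectedPackages(changedFiles: string[], all: Record<string, Pkg>): string[] {
  const names = Object.keys(all);
  const affected = new Set(
    names.filter((name) =>
      changedFiles.some((file) => file.startsWith(all[name].dir + "/")),
    ),
  );

  // Keep propagating to dependents until nothing new gets added.
  let grew = true;
  while (grew) {
    grew = false;
    for (const name of names) {
      if (!affected.has(name) && all[name].dependencies.some((dep) => affected.has(dep))) {
        affected.add(name);
        grew = true;
      }
    }
  }
  return [...affected].sort();
}

// Only services under apps/ get deployed; shared packages are just inputs.
function servicesToDeploy(changedFiles: string[], all: Record<string, Pkg>): string[] {
  return affectedPackages(changedFiles, all).filter((name) =>
    all[name].dir.startsWith("apps/"),
  );
}

console.log(servicesToDeploy(["apps/backend/src/index.ts"], packages)); // [ 'backend' ]
console.log(servicesToDeploy(["packages/lib/src/index.ts"], packages)); // [ 'backend', 'frontend' ]
```

A change in a single service redeploys just that service, while a change in lib fans out to everything built on top of it.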

Wrapping up

If you managed to get through this post, it means that you found it somewhat interesting. Or maybe you really like scrolling, I don’t know!

Anyway, I hope it was useful. You have now completed Turborepo 101. Here is your badge 🥇!