Monkey-patching the Express router

I maintain Scribe, a library for automatically generating your HTTP API documentation from your codebase. When you run the generate command, Scribe fetches your routes from your codebase, uses a bunch of strategies to figure out details about them, and transforms those details into an HTML docs page. There are a lot of moving parts and complexity in this process, but this post is about the first part: how Scribe is able to fetch the routes from your app.

In the Laravel version, this process is hella easy: just call Illuminate\Support\Facades\Route::getRoutes(), and you get an array of Route objects, with each object containing the route details, including the path, HTTP methods, and handler (controller/method). It's a similar thing for the AdonisJS version.

The Express version is another story, though. Express doesn't provide a simple method you can call on the app to fetch all defined routes. This means it's time for a hobby of mine: monkey-patching. When our generate command is run, we'll dynamically modify the Express code so that it gives us what we want.

First attempt: decorating

My first attempt had a couple of moving parts:

a "decorator" function that would take the Express app object and modify its methods (app.get(...), app.post(...), etc) so that they recorded where they were called. This part was important so I could fetch any docblocks there later on. For the decorator to be useful, it had to be called before the user started registering routes.
a getRoutes function that would receive the app object and try to fetch the user's routes from the Express "API".
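
To make the first part concrete, here is a minimal sketch of what such a decorator could look like. It is not Scribe's actual code: fakeApp is a stand-in for a real Express app, decorate and callSites are made-up names, and the stack-frame indexing assumes V8-style stack traces.

```javascript
// Hypothetical sketch: wrap an app's HTTP-verb methods so each call records where it came from.
const httpMethods = ['get', 'post', 'put', 'patch', 'delete'];

function decorate(app) {
  app.callSites = [];
  httpMethods.forEach((method) => {
    const original = app[method];
    app[method] = function (path, ...handlers) {
      // In a V8 stack trace, line 0 is "Error", line 1 is this wrapper, line 2 is the caller
      const callerFrame = (new Error().stack || '').split('\n')[2] || '';
      app.callSites.push({ method, path, calledFrom: callerFrame.trim() });
      // Hand over to the original method so the route is still registered
      return original.call(this, path, ...handlers);
    };
  });
  return app;
}

// Stand-in app (assumption): verb methods that do nothing except return the app for chaining
const fakeApp = {};
httpMethods.forEach((method) => {
  fakeApp[method] = function () { return this; };
});

decorate(fakeApp);
fakeApp.get('/users', () => {});
fakeApp.post('/users', () => {});
```

After this runs, fakeApp.callSites holds one entry per registered route, each with the file and line it was called from, which is what you'd need to go looking for docblocks later.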

Here's what the end user's code would look like:

// app.js

const express = require('express');
const app = express();

// Add this 👇 (the decorator)
require('@knuckleswtf/scribe-express')(app); 

// Register your routes...
app.get(...);

// Add this 👇 (needed for getRoutes)
module.exports = app;

The user would run the Scribe command like this: npx scribe generate --app app.js, and it basically did this:

const appFile = args.app;
const app = require(appFile); // (1)
const routes = getRoutes(app); // (2)

Line (1) would execute the app file, thereby running the decorator and registering the routes. Line (2) would then pass the exported app object to our getRoutes function, which would try to fetch the routes from it.

Assessment

There are three main metrics I use to judge the success of a process like this: how easy it is for a user to opt in, how easy it is to implement, and how reliable it is. How did this perform?

User integration: not great

There were too many steps and things to remember:

The user had to remember to add require('@knuckleswtf/scribe-express')(app).
They had to remember to add that line after creating the app, but before registering any routes.
They had to make sure they exported the app object from the file (many people don't, because it's not needed for anything).
On top of all this, they had to run the generate command.

That's too many points of failure for me; too many opportunities for users to make mistakes. It's not awful, but not ideal. In an ideal monkey-patch, the user should only have to make a very small modification to their code, or none at all.

Ease of implementation: poor

I used quotes earlier when talking about Express' API, because Express doesn't have one. Its internals are a complicated mess. It works, and it does many clever things, but it's unnecessarily complex. You have all sorts of objects all over the place—routers, layers, stacks, routes, handles, and whatnot. Routes can have layers, and layers can have routes, and a layer can be a route, or some other shit. And mixed right in there with routes are middleware like authentication, so you need to filter those out too. Express isn't made for developers to hook into that way.

Figuring out how to extract the actual routes from the tangle of layers and stacks took several days. And by the time I was done, I never wanted to touch the thing again.

Reliability: poor

On the reliability front, this was also pretty poor. Because Express' internals were so complicated to figure out, a lot of what I did was based on assumptions, guesses and workarounds.

For instance, Express doesn't retain the original path you set when you create a route (like app.post('/path', handler)). Instead, it converts the path to a regex, like /^\/?path\/?$/. This means that, when fetching the routes, I had to convert the regexes back to strings, which works, but I don't know if it's guaranteed to always give exactly what the user wrote.
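
Here's a rough sketch of that reverse conversion, using the regex shape quoted above. The regexToPath name is made up, and the function only copes with static paths. The regexes Express actually generates can differ (flags, parameters, wildcards), so treat this as an illustration of why the approach is fragile rather than a working converter.

```javascript
// Rough sketch: turn a simple path regex back into a path string.
// Assumes a static path of the form /^\/?some\/path\/?$/ and nothing fancier.
function regexToPath(regex) {
  return regex.source
    .replace(/^\^/, '')        // drop the leading ^ anchor
    .replace(/\$$/, '')        // drop the trailing $ anchor
    .replace(/\\\/\?$/, '')    // drop the optional trailing slash: \/?
    .replace(/^\\\/\?/, '/')   // the optional leading slash becomes a real slash
    .replace(/\\\//g, '/');    // unescape any remaining slashes
}

const recovered = regexToPath(/^\/?path\/?$/);
const recoveredNested = regexToPath(/^\/?users\/list\/?$/);
```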

There was also the fact that people can register routes in Express in a bunch of different ways—you can have multiple apps, routers, sub-routers, sub-apps, and so on, and Express doesn't combine them into one simple data structure. Since my decorator implementation was tied to a specific app instance, it made these other options really difficult to get right.

Seriously, this experience was the final thing that convinced me people need to stop using Express. It was a key part of Node.js' development, but I don't buy this middleware approach that leaves you having to implement everything yourself, and doesn't expose a proper API. 😕

A better solution

My first attempt was more of "decorating" than monkey-patching—it only added some specific behaviour to the app object that was passed to it. But it would be better if we had true monkey-patching: attaching this behaviour automatically to any instance of express(), even those created in a different file. There's good news: in Node.js, there are defined APIs to do that, by hooking into the module system.

For instance, if I overwrite the entry for express in require.cache (docs), when the user calls require('express'), they'll get my new version. This means I can give them a modified express, so that when they call express(), the app object they get is already decorated. No need to pass in your app manually to the decorator anymore.
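
A small sketch of the idea, under some assumptions: so it can run without Express installed, it writes a throwaway module (fake-express.js, a made-up name) to a temp directory and patches that instead. One detail worth knowing is that require.cache is keyed by the resolved filename, not the package name, so require.resolve() is used to find the right key.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Stand-in for an installed package (assumption): a factory that returns an "app" object
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cache-demo-'));
const modulePath = path.join(dir, 'fake-express.js');
fs.writeFileSync(modulePath, 'module.exports = () => ({ patched: false });');

// require.cache is keyed by the resolved filename
const resolved = require.resolve(modulePath);
const original = require(resolved); // first require loads the module and caches it

// Swap the cached exports for a wrapper that decorates every app it creates
require.cache[resolved].exports = function patchedFactory(...args) {
  const app = original(...args);
  app.patched = true;
  return app;
};

// Any later require() of the same module now gets the wrapper
const createApp = require(modulePath);
const demoApp = createApp();
```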

This led to another realization: since users don't need to pass in an app anymore, then they don't even need to call the decorator in their code at all. The generate command could call the decorator itself to monkey-patch Express, before executing the user's app file. So now we can eliminate the first two frustrations I had with the user integration process.

The final big improvement came from me realising that I could skip the Express "API" altogether. When a user calls app.get('/path', handler) on our monkey-patched app instance, we already have all the information we need — the path, the handler, the HTTP method, the file name and line. So we can just record this route somewhere before handing over to Express to handle it, rather than trying to fetch it from the router later.

This was awesome! It eliminated my final concerns: this approach was way more reliable, since I wasn't relying on Express' internal API (layers and stacks), but their public API (app.get(...)), which was documented and stable. I could get the original path, handle sub-apps and sub-routers, and users didn't even need to export the app object anymore.

Implementation

I was going to implement this manually, but then I remembered my friends at Elastic specialise in this. Thanks to their work with require-in-the-middle and shimmer, it was fairly straightforward, although it still took some work to figure out what exactly I needed to patch. For Express, there were three main entry points:

the HTTP methods on express.Route.prototype, because Express internally creates a new instance of Route and calls route.get(...) when you call app.get(...) or router.get(...).
the express.application.use() and express.Router.use() methods (for app.use('path', router)).
the express.Router.route() method, which is called when you create a route using app.route('path').get(handler). Here we just record the router the route belongs to, so that when our patched get is called (the first entry point above), we have all the details we need.

You can see the full code on GitHub, but the decorator now looks like this:

function decorator() {
    decorateExpress();
}

// A flag that keeps us from doing the monkey-patch multiple times
decorator.decorated = false;

// The place where we record our routes as they're registered
decorator.subRouters = new Map();

function decorateExpress() {
    const hook = require('require-in-the-middle');
    const shimmer = require('shimmer');

    // Hook into require('express') and return ours
    hook(['express'], function (exports, name, basedir) {
        if (decorator.decorated) {
            return exports;
        }

        // Handle app.get(...), app.post(...), and so on
        const httpMethods = ['get', 'post', 'put', 'patch', 'head', 'delete'];
        httpMethods.forEach(function shimHttpMethod(httpMethod) {
            shimmer.wrap(exports.Route.prototype, httpMethod, original => {
                return patchHttpVerbMethod(original, httpMethod);
            });
        });

        // Handle sub-routers and sub-apps 
        // ie app.use('/path', otherApp), app.use('/path', otherRouter), router.use('/path', otherRouter), etc
        shimmer.wrap(exports.application, 'use', patchAppUseMethod);
        shimmer.wrap(exports.Router, 'use', patchRouterUseMethod);

        // Handle app.route(path).get()
        shimmer.wrap(exports.Router, 'route', original => {
            return function (...args) {
                const routeObject = original.apply(this, args);
                // Track the router that this route belongs to
                routeObject.___router = this;
                return routeObject;
            }
        });

        decorator.decorated = true;
        return exports;
    });
}

The patchHttpVerbMethod, patchAppUseMethod, and patchRouterUseMethod functions do the real work of recording the routes in decorator.subRouters.
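
To give a feel for the shape of these functions, here's a guess at what a patchHttpVerbMethod could look like. This is a sketch, not the code from the repo: the Route class is a stand-in for Express's, wrap is a hand-rolled substitute for shimmer.wrap so the example runs with no dependencies, and the shape of the recorded route object is invented.

```javascript
// Stand-in for Express's Route (assumption): stores handlers and returns itself for chaining
class Route {
  constructor(path) { this.path = path; }
  get(...handlers) { this.handlers = handlers; return this; }
}

// Stand-in for decorator.subRouters
const subRouters = new Map();

// Hand-rolled equivalent of shimmer.wrap(target, name, wrapper)
function wrap(target, name, wrapper) {
  target[name] = wrapper(target[name]);
}

function patchHttpVerbMethod(original, httpMethod) {
  return function (...handlers) {
    // `this` is the route; ___router was attached earlier by the patched route() method
    const router = this.___router;
    const routes = subRouters.get(router) || [];
    routes.push({ method: httpMethod.toUpperCase(), path: this.path, handlers });
    subRouters.set(router, routes);
    // Hand over to the original method so the route still gets registered
    return original.apply(this, handlers);
  };
}

wrap(Route.prototype, 'get', (original) => patchHttpVerbMethod(original, 'get'));

// Simulate what happens when a user registers a GET route on a router
const router = { name: 'main router' };
const route = new Route('/a');
route.___router = router;
route.get(function someHandlerFn() {});
```

The important part is the order: record first, then call the original, so the user's app behaves exactly as it would without the patch.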

The decorator.subRouters map is the key that allows us to support sub-apps and sub-routers. It acts as a sort of tree. Every route in Express is added to a router instance. A user can have multiple apps and routers, and mount them on each other using app.use() or router.use(). But since we don't know when they'll call app.use('/somepath', thisrouter), we store all the routes they add on thisrouter under its own key, decorator.subRouters.get(thisrouter). When the user finally calls app.use('/somepath', thisrouter), we fetch the routes from that entry and add them to decorator.subRouters.get(app).

const app = express();

// decorator.subRouters is empty.

app.get('/a', someHandlerFn);

// decorator.subRouters now looks like this:
//       app =>  GET /a

app.post('/a', someHandlerFn);

// decorator.subRouters now looks like this:
//       app => GET /a
//              POST /a

const subRouter = express.Router();
subRouter.get('/a', someHandlerFn);

// decorator.subRouters now looks like this:
//       app       => GET /a
//                    POST /a
//       subRouter => GET /a

app.use('/b', subRouter);

// decorator.subRouters now looks like this:
//       app => GET /a
//              POST /a
//              GET /b/a
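
The bookkeeping in that walkthrough can be sketched in a few lines. The recordRoute and mount helpers below are hypothetical names, and plain objects stand in for the app and router. In the real thing this logic would live inside the patched get/post and use methods.

```javascript
// Stand-in for decorator.subRouters: router => list of routes registered on it
const subRouters = new Map();

// What a patched app.get()/router.get() would do (hypothetical helper)
function recordRoute(router, method, path) {
  if (!subRouters.has(router)) subRouters.set(router, []);
  subRouters.get(router).push({ method, path });
}

// What a patched app.use(prefix, child) would do (hypothetical helper)
function mount(parent, prefix, child) {
  const childRoutes = subRouters.get(child) || [];
  // Move the child's routes onto the parent, prefixed with the mount path
  childRoutes.forEach(({ method, path }) => recordRoute(parent, method, prefix + path));
  subRouters.delete(child);
}

// Same sequence as the walkthrough above
const app = { name: 'app' };
const subRouter = { name: 'subRouter' };
recordRoute(app, 'GET', '/a');
recordRoute(app, 'POST', '/a');
recordRoute(subRouter, 'GET', '/a');
mount(app, '/b', subRouter);

const summary = subRouters.get(app).map((r) => `${r.method} ${r.path}`);
```

Deleting the child's entry after mounting is what leaves a single router in the map at the end. A router mounted in more than one place would need different handling, which this sketch ignores.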

By the time all the routes have been registered, there should be only one router left in decorator.subRouters: the main router. To get the routes, the generate command (GitHub) simply has to read that router's entry.

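// Monkey-patch the Express router so that when a user adds a route, we record it
// (assuming ./decorator exports the decorator function shown above)
const decorator = require('./decorator');
decorator();

// Execute app file so routes are registered
require(appFile);

// Only the main router should be left in the map, so take its routes
const [routes] = [...decorator.subRouters.values()];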

And that's it. The new solution is much better in terms of user experience, and was much easier to implement, although it was still demanding. There are still a few parts that are fragile, where I had to rely on some specific internal detail of the Express code (like this), but it's way more reliable than before, plus there are now tests to check that it works. So it's a definite win.
