Practical module loader design

There are many projects trying to fill the unfortunate loader API vacuum left by TC39¹. New module loaders & bundlers like Browserify, SystemJS, and webpack are today’s trendiest ways to write modules and load them into browsers. Unfortunately, these loaders tend to repeat a few basic architectural mistakes in areas that the AMD standard got right many years ago, and AMD implementers tend to be dismissed when they make implementation suggestions.

In the hopes of preventing the mistakes made by these newer loaders from being propagated into future loaders (and into whatever final standard hopefully emerges), this article describes the best mental model I know for Web module loaders, with detailed descriptions of each part and rationales describing why each of those parts is critical to the design.

The mental model for a Web module loader

After six years of experience working with JavaScript modules, I feel like I can safely say that any good Web module loader architecture will look more or less like this:

Loader
  • Logical module layer
      • Module ID resolver: ① normalisation ② redirection/remapping
      • Plugin engine
      • Module cache
  • Physical storage layer
      • Resource ID resolver: ① MID-to-RID mapping ② normalisation
      • Data retrieval
      • Data processing: ① unbundling ② compilation

When using this model to think about how a module is imported, the “flow” of the import starts at the top of the diagram and goes down.

The logical module layer

The logical module layer handles the import and export of module values at runtime.

It exposes the API for making requests to retrieve a module (like require or Loader#import), processes those requests, and returns the correct value from the module cache.

This part of the loader operates exclusively on module IDs and values already loaded into memory. File paths are never involved in this part of the loader. It’s possible for a fully prepopulated loader to never access the network or a file system and operate purely within the logical module layer.

This design makes it easy to deploy tiny loaders that do no dynamic loading at all, like the 1kB almond loader. (Browserify is another example of a loader that doesn’t offer dynamic loading at all by default.)

The module ID resolver

The module ID resolver is the first step in any module import request.

When a module is requested, whether through a static ES6 import statement or a dynamic require/Loader#import call, the resolver takes two pieces of information:

  1. The ID of the module being imported (the “import”), plus
  2. The ID of the module that is importing it, if any (the “importer”)

From these two IDs, it generates a third resolved module ID that will be used by the rest of the loader to return the correct value for an import.

To generate the resolved module ID, the resolver goes through two stages.

During the first stage, if an import uses a relative module ID, this is converted to a normalised (absolute) module ID by combining the import with the importer. In other words, if module a/b imports ./c, the absolute module ID of ./c is a/c. (Using a relative module ID from outside of another module is undefined behaviour.)
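
For concreteness, here is a minimal sketch of such a normaliser (the function name and edge-case behaviour are my own, not from any spec):

function normalise(importId, importerId) {
    // Absolute module IDs pass through untouched
    if (importId.charAt(0) !== '.') {
        return importId;
    }

    // Resolve against the importer’s parent: a/b importing ./c starts from [ 'a' ]
    var segments = importerId ? importerId.split('/').slice(0, -1) : [];

    importId.split('/').forEach(function (segment) {
        if (segment === '..') {
            segments.pop();
        }
        else if (segment !== '.') {
            segments.push(segment);
        }
    });

    return segments.join('/');
}

normalise('./c', 'a/b');    // 'a/c'
normalise('../d', 'a/b/c'); // 'a/d'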

The main purpose of this normalisation is to allow “package portability”, which is a fancy way of saying that packages² should be able to be renamed without breaking any of their own internal dependencies. (A nice side-effect of this feature is that relative module IDs often require less typing.)

During the second stage, the partially-resolved module ID & importer ID are passed through a remapper. The purpose of this second stage is to allow imports in one module to be transparently redirected to a different module within the module loader at runtime.

Redirection of modules at runtime offers an elegant solution to many problems that come up in the real world. For example:

  1. Your code uses a third party charting library that makes Ajax requests to get data using a standard library (like jQuery or dojo/request). Your server requires authentication, but the charting library doesn’t let you provide authentication information. You can use module ID remapping to give the third party charting library a modified copy of the Ajax library that automatically adds the necessary authentication to its requests, without modifying any third party code.
  2. Your developers are lazy and want to type shorter module IDs that get automatically expanded into their correct form by the loader.
  3. Your application uses a third party virtual DOM library that has a slightly buggy diffing function. You can use module ID remapping to monkey patch only the buggy function in the library with a fixed version, with no need to copy the entire library to a new file, and no concern over race conditions that might cause the function to be used before it is patched.
  4. You have a large application that you want to migrate in parts to a newer incompatible version of a core library or framework by allowing old and new code to coexist side by side during the migration.

It’s possible to solve problems like these without module ID remapping, but the only way to solve them trivially—and without future maintenance nightmares—is by including an ID remapper in the module loader.

It’s also possible to treat module IDs as file paths (at least SystemJS and Node.js do this) and remap purely on file paths, but it introduces all sorts of trouble and complexity and limits how the parts of the loader can be implemented. For example:

  • Does the map source include some kind of base path? If so, what happens if the base path changes—does the map also need to change?
  • If the module loader does have the ability to do file system walking, like Node.js, what do you use for the source and destination?
  • What about built-in or in-memory modules, what is their “file path”?
  • In a multi-version scenario, how do you decide which version is the “correct” version for a particular dependency?
  • What about loaders that might load a single module from many different sources, like an edge cached CDN with fallback, or a local store like IndexedDB with a network fallback? Is this even possible? (No :()

These questions, and many others like them, are neatly side stepped by keeping module IDs logically distinct from file paths—just as URLs are logically distinct from file paths even though they’re often direct maps back to files on a server’s hard drive. (If you’re skeptical about this, just imagine how difficult it would be if URLs had to be direct maps back to files; that’s how most module loaders are designed today.)

Because this is a feature that I’ve only seen implemented correctly by the AMD specification, I’m going to take extra time to break down the mapping tables and implementation details necessary to solve these examples, since relatively few people seem to have taken the time to understand how ID mapping actually works.

Example 1: Adding authentication to a third party library

The mapping to solve this problem is just a single entry, remapping requests from any charting module to jquery:

{
    charting: {
        jquery: 'my/authenticated-jquery'
    }
}

It’s not immediately clear just by looking at this configuration, so to be explicit: the keys in this map match ID segments, not full module IDs. (ID segments are just like file path segments: the parts of the ID that are separated by /.)

In other words, the rule above for charting will apply to any module ID that matches the regular expression /^charting(\/|$)/:

  • charting, charting/foo, and charting/charts/line all match, so they will all get my/authenticated-jquery when they ask to import jquery.
  • charting2 is not matched, because only complete segments match, not parts of segments.
  • my/charting is not matched, because segments only match from the start of the module ID, not anywhere in the module ID.
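
In code, this segment-prefix rule is tiny; a hypothetical helper (the name is mine) could be:

// True only when `prefix` covers whole leading segments of `id`
function idMatchesPrefix(id, prefix) {
    return id === prefix || id.indexOf(prefix + '/') === 0;
}

idMatchesPrefix('charting/charts/line', 'charting'); // true
idMatchesPrefix('charting2', 'charting');            // false (partial segment)
idMatchesPrefix('my/charting', 'charting');          // false (not at the start)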

Example 2: Auto-expanding module aliases

The mapping to solve this problem involves adding one entry for each alias you want to create inside the special star map:

{
    '*': {
        charts: 'charting/lib/advanced/charts',
        cline: 'charting/lib/advanced/charts/line'
    }
}

The star map, defined using * for the importer ID, is a general fallback that applies to all modules. So, with this map, any module that imports cline will get charting/lib/advanced/charts/line instead.

This example also shows how only the matched segment in the requested module ID is replaced. In other words, with this remapping table:

  • charts will be mapped to charting/lib/advanced/charts
  • charts/pie will be mapped to charting/lib/advanced/charts/pie
  • charts/bar will be mapped to charting/lib/advanced/charts/bar

This means you don’t need to add configuration for every single module you want to alias. Instead, you can simply shorten the longest common segment with a single map entry.

Example 3: Fixing bugs in third party code

The mapping for this problem involves setting up one entry for redirection, plus a second entry to cancel the redirection in a more specific case:

{
    my: {
        vdom: 'my/fixed-vdom'
    },
    'my/fixed-vdom': {
        vdom: 'vdom'
    }
}

This third example introduces two final concepts to the ID remapper.

First, mappings are always applied from longest segment to shortest segment. In other words, no matter what order the configuration was provided to the loader, my/fixed-vdom will be matched first, and my second, because my/fixed-vdom is longer.

Second, only one mapping can match per requested ID. In other words, if my/fixed-vdom matches, my will not match. This prevents any possibility of infinite recursion in the remapping system, and ensures that mappings can be undone for specific module IDs, as is necessary in this case.

Note that this example could have just as easily used the star map to redirect everyone to the fixed vdom library. As it is, any module not starting with my will continue to receive the buggy original module, which may or may not be what you want. (If it isn’t… just change the map!)
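
Putting all of these rules together, a minimal remapper sketch (reusing the hypothetical idMatchesPrefix helper from earlier; a real implementation would precompute the sorted keys) might look like this:

function remap(map, importId, importerId) {
    // Find the rule sets whose importer prefix matches; '*' matches everyone
    var scopes = Object.keys(map).filter(function (scope) {
        return scope === '*' || idMatchesPrefix(importerId || '', scope);
    });

    // Longest importer prefix wins; the star map sorts last
    scopes.sort(function (a, b) {
        return (b === '*' ? -1 : b.length) - (a === '*' ? -1 : a.length);
    });

    for (var i = 0; i < scopes.length; i++) {
        var rules = map[scopes[i]];
        var prefixes = Object.keys(rules).sort(function (a, b) {
            return b.length - a.length;
        });
        for (var j = 0; j < prefixes.length; j++) {
            if (idMatchesPrefix(importId, prefixes[j])) {
                // Replace only the matched segments; return immediately so
                // only one mapping can ever apply (no infinite recursion)
                return rules[prefixes[j]] + importId.slice(prefixes[j].length);
            }
        }
    }

    return importId;
}

var map = {
    my: { vdom: 'my/fixed-vdom' },
    'my/fixed-vdom': { vdom: 'vdom' }
};

remap(map, 'vdom', 'my/app');        // 'my/fixed-vdom'
remap(map, 'vdom', 'my/fixed-vdom'); // 'vdom' (the redirection is cancelled)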

Example 4: Gradual migration

The mapping for this problem involves redirecting dependencies of all code that expects a particular (older or newer) version of a package.

Either the old code can be redirected to a renamed old package, like this:

{
    'my/old-stuff': {
        ember: 'ember1',
        'ember-grid': 'ember1-grid'
    },
    'ember1-grid': {
        ember: 'ember1'
    }
}

Or, new code can be redirected to a renamed new package, like this:

{
    'my/new-stuff': {
        ember: 'ember2',
        'ember-grid': 'ember2-grid'
    },
    'ember2-grid': {
        ember: 'ember2'
    }
}

In either case, the dependencies of modules that require a specific version of code are redirected transparently to the correct version, without changing any dependencies in the code and without any sort of path-walking system like the one Node.js uses to resolve version conflicts. (This mechanism is also more robust than Node.js path walking because it gives you down to single-module granularity, whereas Node.js has a maximum of package-level granularity.)

The plugin engine

The plugin engine is essentially AMD’s loader plugin system.

This part of the loader uses a specific module ID pattern that invokes a plugin function to dynamically calculate the resolved value of an import.

AMD uses a syntax of <plugin module ID>!<arbitrary resource data passed to the plugin>. This syntax works well because it makes clear which plugin is being used (the plugin module’s ID is always before the !) and the order of execution of nested plugins (left-to-right), and it requires zero configuration.

With the plugin engine, you can do things like:

  • Perform code compilation at runtime (CoffeeScript, TypeScript, JSX, etc.)
  • Load non-JavaScript resources like stylesheets
  • Conditionally load modules, using static import syntax (e.g. only loading shims when native features are unavailable)

A good plugin engine enables users to extend loader behaviour without requiring any changes to the loader specification or built-in loader API. It also means all dependencies can be expressed identically throughout a code base, and can be bundled together using identical tools.

Plugins written for the plugin engine may leverage the loader’s existing physical storage layer to load modules, or they may use their own data loading mechanism, depending upon what the plugin does and how it needs to load data.
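
As an illustration, here is what a minimal text-loading plugin looks like in AMD (the fetch-based retrieval is just one possible mechanism; real plugins like requirejs/text are more thorough):

// text.js: an AMD loader plugin module
define(function () {
    return {
        // `resourceId` is everything after the `!`; calling `onload`
        // asynchronously resolves the import with the plugin's value
        load: function (resourceId, parentRequire, onload, config) {
            fetch(parentRequire.toUrl(resourceId))
                .then(function (response) { return response.text(); })
                .then(onload, onload.error);
        }
    };
});

// Usage: the plugin runs before `template` is handed to this module
define([ 'text!templates/nav.html' ], function (template) {
    document.querySelector('nav').innerHTML = template;
});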

Approaches like the Node.js extensions map and SystemJS plugins aren’t good enough here: they both require configuration and don’t support arbitrary nesting (and in the case of Node.js, do not allow asynchronous resolution).

Webpack’s similar “loaders” system is also a build-time-only transformation—fine for now, but not really suitable for the Web platform, which is supposed to offer a compiler-free edit-and-reload experience. This is actually a common problem with most of the newer loaders: they all require some kind of build-time processing to work, even once the various platforms gain native support for ES module syntax.

Modules that are resolved using the plugin engine have their values placed in the module cache the same way as any other non-plugin module, so multiple imports of the same module ID (plugin ID + arbitrary resource data) get the same data, and plugin-generated values can be prepopulated in the cache to avoid executing the plugin engine at runtime.

The module cache

The module cache is a simple in-memory map that stores the exported values of modules.

When a module is imported by another module, if the imported module is already in the cache, the value from the cache is returned. This is important for memory efficiency, and also allows many modules to communicate or share state easily through a shared dependency.

The module cache can also be prepopulated with values, or with factory functions that will generate values, which is the primary mechanism by which module bundles work. (More about those later in the discussion on unbundling.)
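
A sketch of such a cache (all names here are mine) is little more than a couple of maps:

var cache = {};     // module ID → exported value
var factories = {}; // module ID → function that will generate the value

function getCached(id) {
    if (id in cache) {
        return cache[id];
    }

    // Factories (e.g. registered by a bundle) run at most once; every
    // subsequent import of the same ID sees the same value
    if (id in factories) {
        var factory = factories[id];
        delete factories[id];
        return (cache[id] = factory());
    }

    // Cache miss: see the two options described below
    throw new Error('Module "' + id + '" has not been loaded');
}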

When a module is requested but is not inside the module cache (a cache miss), there are two options.

One option is that the loader can simply give up and return an error. In this case, the module cache must be prepopulated through some external means (like a <script> tag), and the physical storage layer does not exist.

Alternatively, the loader can include a physical storage layer that populates the module cache on-demand by loading modules from some kind of external storage.

The physical storage layer

For a module loader to properly load modules from storage, the physical storage layer needs to perform a specific series of steps:

  1. Convert module ID to a resource identifier
  2. Load raw code from the data source
  3. Convert the raw code to executable code
  4. Execute the code (which may import other modules)
  5. Put the resulting exported value into the module cache

It’s a lot of work, but each step remains clear and distinct, which makes it easy to understand and easy to implement.

(Obviously, for a Web loader, this mechanism must be asynchronous, as there is no reasonable way to perform synchronous I/O in the main browser thread. So the Node.js loader model does not fit here.)
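
Assuming subsystems with these (invented) names exist, as described in the subsections below, the entire layer reduces to a short asynchronous pipeline:

function loadUncachedModule(moduleId) {
    // ① convert the module ID to a resource identifier
    var resourceId = resolveResourceId(moduleId);

    // ② load raw code from the data source
    return retrieve(resourceId).then(function (rawCode) {
        // ③ + ④ convert the raw code to executable code and execute it
        var exported = compileAndExecute(rawCode, moduleId);

        // ⑤ put the resulting exported value into the module cache
        cache[moduleId] = exported;
        return exported;
    });
}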

The resource identifier resolver

In the same way that there must be a mechanism in the logical module layer for modules to be remapped, there must also be a mechanism for choosing how to map a module ID to a resource identifier.

A resource identifier can be anything: a URL, a URN, a file path, a handle, a database key, a memory address… any identifier that the data retrieval system will be able to use to find code. (The fact that a resource identifier could be anything reinforces the concept that module IDs should never be automatically presumed to be file paths in the loader model.)

The way resource identifiers are resolved is generally environment-specific.

Node.js today uses a pretty complicated file system walking algorithm to find the proper resource identifier; this is fine for a loader with file system access, but not for the Web, since it is not possible to walk URLs in this manner.

The AMD standard uses a much simpler two-stage approach. (SystemJS does something similar.) Because this approach makes good sense as a default for any file- or URL-based loader, I’ll describe it in greater detail.

In the first stage, module IDs are looked up in a map that’s very similar to the module ID remapping table:

{
    jquery: 'modules/third-party/jquery',
    'my/local': '.',
    my: '//cdn.mysite.com/js'
}

Unlike the module ID remapper, this map doesn’t use information about the importer: at this point in the module loading lifecycle, we’re working on fixing a module cache miss, not resolving imports, so who originally requested the module is irrelevant.

As with module ID remapping, the keys in the paths map match ID segments, from the start of the module ID, from longest to shortest. So, the map above will generate file paths like these:

Module ID        Intermediate file path
jquery           modules/third-party/jquery
jquery/ajax      modules/third-party/jquery/ajax
my/local/a       ./a
my/fixed-vdom    //cdn.mysite.com/js/fixed-vdom
other/foo        other/foo

As shown in the last row, module IDs that don’t match an entry in the path mapping table are used as-is.

Explicit path remapping enables deployment of code to CDNs and allows file system or network path structures to be arbitrarily rearranged without requiring changes to any code. For certain types of distributed workloads, this is extremely valuable, since it means the same code can be built once and deployed to many servers and only the loader configuration needs to change.

After explicit path mapping is performed, a final normalisation pass occurs.

First, any non-absolute paths are combined with a master base path or URL, if one is configured in the loader. This is largely for user convenience, since it is extremely common for all code to be stored in a subdirectory.

Second, a .js extension is added to the path, if no extension exists, since this is the registered filename extension for all JavaScript code.³

Once all this is done, we should now have a fully qualified resource identifier that we can use to retrieve code from storage.
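
A sketch of this default resolver (assuming a config object holding the paths map and base path, and reusing the hypothetical idMatchesPrefix helper from earlier) might be:

function resolveResourceId(moduleId) {
    var resourceId = moduleId;

    // Stage 1: longest-prefix segment match against the paths map
    var prefixes = Object.keys(config.paths).sort(function (a, b) {
        return b.length - a.length;
    });
    for (var i = 0; i < prefixes.length; i++) {
        if (idMatchesPrefix(moduleId, prefixes[i])) {
            resourceId = config.paths[prefixes[i]] + moduleId.slice(prefixes[i].length);
            break;
        }
    }

    // Stage 2a: combine non-absolute paths with the configured base path
    if (!/^(\/|\w+:)/.test(resourceId)) {
        resourceId = config.basePath + '/' + resourceId;
    }

    // Stage 2b: add the default .js extension if none exists
    if (!/\.\w+$/.test(resourceId)) {
        resourceId += '.js';
    }

    return resourceId;
}

// With config = { basePath: 'js', paths: { jquery: 'modules/third-party/jquery' } }:
// resolveResourceId('jquery/ajax') → 'js/modules/third-party/jquery/ajax.js'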

Note that there is no reason why the resolver needs to generate a single identifier. Multiple identifiers could be generated and passed to the data retrieval system to provide better fallback functionality. For example, one could try to load from an edge CDN first, then fall back to loading from the local domain if the CDN failed. (This is not possible if module IDs are treated as file paths!)

The data retrieval system

The penultimate step when loading an uncached module is to actually retrieve raw data for the module from storage.

For server-side JavaScript, this normally means making a system call to load a file from the file system, though it could just as easily be a call into a memory-backed store or database.

For client-side JavaScript, this pretty much always means a request to the network, either using the Fetch API, XMLHttpRequest, or <script> injection—though, again, retrieval through a file system API, a compressed archive, local storage, or a service worker cache is also possible.
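
To sketch the multiple-identifier fallback mentioned in the previous section (names are mine, Fetch-based), retrieval can simply walk a list of candidate resource identifiers in order:

function retrieve(resourceIds) {
    // Try each candidate in turn, e.g. [ CDN URL, same-origin URL ]
    return resourceIds.reduce(function (previous, resourceId) {
        return previous.catch(function () {
            return fetch(resourceId).then(function (response) {
                if (!response.ok) {
                    throw new Error('HTTP ' + response.status + ' for ' + resourceId);
                }
                return response.text();
            });
        });
    }, Promise.reject(new Error('no sources')));
}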

The data processing system

Once the raw module code is retrieved, it needs to be converted into executable code and executed.

Certain data retrieval mechanisms do this automatically—for example, loading a module with <script> tag injection can leverage the browser’s normal code compilation and execution system, so the loader doesn’t need to do anything else. Other data retrieval mechanisms, like fs.readFile or XMLHttpRequest, need extra calls to compile and execute module code.

If the code that was retrieved from storage is a single, unprocessed JavaScript module, the exported value of this module is simply added to the module cache at the module ID given to the physical storage layer at the start of the request.

However, for the sake of network efficiency, retrieved code may often actually be a module bundle—a resource containing multiple modules that should populate the module cache in order to avoid additional network requests. In this case, the data processing system must either know how to transform the bundle format into a set of modules and populate the module cache that way, or the loader API needs to specify what a bundle should call in order to populate the loader with extra modules.

Today’s bundle formats typically need to be <script>-injected and make calls to known APIs, since any format that requires additional post-processing on the client would violate CSP unsafe-eval restrictions. A native loader wouldn’t have such a restriction, but any backwards-compatible loader API would.
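
For example, a bundle in a hypothetical format (the loader.register API here is invented for illustration) could be a plain script that registers module factories with the loader, so <script>-injecting it prepopulates the cache without any eval:

// bundle.js, produced by a build tool and loaded via <script> injection
loader.register('charting/charts/line', function (require) {
    // Module body as written by the author, wrapped by the build tool
    return function drawLineChart(data) { /* … */ };
});

loader.register('charting/charts', function (require) {
    return { line: require('charting/charts/line') };
});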

It’s Web scale!

While a runtime module system isn’t as simple as just #includeing a bunch of files together, it’s also not that complicated, and doesn’t need to be terribly confusing to users or implementers.

The model described above is used by all conformant AMD loaders. As such, I can say confidently that it works extremely well. It’s had over seven years of real-world use, been implemented by multiple vendors, and is proven to scale from simple single-module apps up to absolutely enormous million-LOC enterprise projects.

As alluded to in the introduction, the #1 problem I see in basically every new module loader proposal or implementation (looking at you, TypeScript and SystemJS) is that people mix the logical and physical layers together—treating module IDs like file paths. Doing this leads to a broken model that makes the system’s capabilities highly dependent on file system structures that do not always exist. If there is one idea I want people to take away from this article, it’s to never do this.

The nice thing about the model I’ve described here is that it doesn’t restrict developers from doing more cool new things with module loaders. There’s plenty of room for innovation! (For instance, I’d love to see some combination of the ES-compatible two-stage module format of SystemJS with the loader semantics of AMD.) This model simply ensures any loader will end up with a basic design that won’t paint users into a really ugly corner.


  1. “YK: The loader pipeline will be done in a "living spec" (a la HTML5) so that Node and the browser can collaborate on shared needs.” Those who forget history are doomed to repeat it. CommonJS was a collaboration, and AMD was originally defined through CommonJS. However, Node.js folks decided they were better than everyone else and quit. What a logjam. 

  2. Packages: as in, those things that you download from npm, or bower, or jspm, or apt, or pacman. 

  3. This normalisation does not prevent loading other types of resources through the plugin engine, it is simply the default normalisation scheme used by the default resolver. Furthermore, compilers like TypeScript can choose to use a default normalisation that uses .ts/.d.ts/.tsx instead, since those are the extensions used by TypeScript. A scheme that requires file extensions to be a part of the module ID cannot ever be used in this way.