Phantom Dependencies In Node.js And How PNPM Prevents Them

Justin Etighe · 7 min. read

Three package managers dominate the Node.js ecosystem:

  1. npm (Node Package Manager): npm was originally created by Isaac Z. Schlueter in 2009. It has since become the default package manager for Node.js and has seen contributions from a large number of developers. When you install Node.js, npm is installed along with it, much as pip comes with Python.

  2. Yarn: Yarn was developed by Facebook in collaboration with Exponent, Google, and Tilde. It was announced in 2016 as an alternative to npm, with a focus on performance and security. Key contributors include Sebastian McKenzie and Christoph Pojer.

  3. pnpm: Created by Zoltan Kochan, pnpm serves as an easy replacement for npm. It supports the full suite of standard npm commands, including their shortened aliases, but it brings a fresh twist to the task of package management. pnpm keeps packages in a central store to maximize caching and reuse. Each package file in a project’s local node_modules is then a hard link to the corresponding file in the central store (directories, which cannot be hard-linked, are connected with symlinks instead). This approach cuts down installation times and conserves disk space.
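
To make the hard-link idea concrete, here is a minimal Node.js sketch. It does not call pnpm or reproduce its real store layout; the directory names (store, project-a, project-b) and the package name some-package are made up for illustration. It only shows that hard-linking one file into two projects leaves a single copy of the data on disk, which is the property a shared store relies on.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical layout: one "store" directory plus two projects that link into it.
function linkFromStore() {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), "store-demo-"));
  const storeFile = path.join(root, "store", "some-package-index.js");
  fs.mkdirSync(path.dirname(storeFile), { recursive: true });
  fs.writeFileSync(storeFile, "module.exports = 'hello';\n");

  const inodes = [fs.statSync(storeFile).ino];
  for (const project of ["project-a", "project-b"]) {
    const target = path.join(root, project, "node_modules", "some-package", "index.js");
    fs.mkdirSync(path.dirname(target), { recursive: true });
    // A hard link is a second name for the same file data, not a copy.
    fs.linkSync(storeFile, target);
    inodes.push(fs.statSync(target).ino);
  }

  // nlink counts how many names currently point at the file's data.
  const linkCount = fs.statSync(storeFile).nlink;
  fs.rmSync(root, { recursive: true, force: true });
  return { inodes, linkCount };
}

const result = linkFromStore();
console.log(result);
```

All three paths report the same inode number, so the file contents exist once no matter how many projects use them.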

Additionally, pnpm eliminates “phantom dependencies”. The next section discusses what a phantom dependency is and how pnpm prevents it.

Phantom Dependencies

In the Node.js world, the term “phantom dependencies” refers to packages that your code uses but that are not explicitly declared in your package.json file. This can happen because Node.js’ module resolution algorithm can reach any package installed in node_modules, even one that was never declared as a dependency in package.json.

For example, let’s say Package A depends on Package B and Package C, and both B and C are correctly listed in A’s package.json file. However, if Package B also depends on Package D, Node.js allows Package A to require Package D even if it is not listed in A’s package.json. So D becomes a phantom dependency of A because A can access it even when it is not declared as A’s direct dependency.

This can silently “fix” a missing dependency during local development, but it can also lead to unrepeatable installs and failures in other environments. Suppose A’s package.json does not list Package D and the dependency tree changes (maybe Package B no longer needs D). The implicitly available Package D disappears, and your project breaks because A still tries to require D. Compromised packages might also be able to exploit this loophole for nefarious purposes.

pnpm combats this problem by enforcing stricter package access rules: a package can only access its explicitly declared dependencies. This limits the risk associated with phantom dependencies and gives you a more predictable and secure environment.
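
The difference can be reproduced without any package manager. The sketch below is an imitation under stated assumptions, not pnpm itself: it builds two throwaway node_modules trees by hand, one flat (everything hoisted to the top level) and one that mimics a symlinked layout with a .pnpm folder. The package names demo-pkg-b and demo-pkg-d, the version 1.0.0, and the helper functions are all hypothetical. It then asks Node.js which packages a file in the project root can resolve.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { createRequire } = require("node:module");

const B_SOURCE = 'module.exports = "b uses " + require("demo-pkg-d");\n';
const D_SOURCE = 'module.exports = "d";\n';

// Write a tiny package (just an index.js) into the given directory.
function writePackage(dir, source) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, "index.js"), source);
}

// Can a file sitting in projectDir resolve the given package name?
function canResolve(projectDir, name) {
  const requireFromApp = createRequire(path.join(projectDir, "app.js"));
  try {
    requireFromApp.resolve(name);
    return true;
  } catch (err) {
    if (err.code === "MODULE_NOT_FOUND") return false;
    throw err;
  }
}

// Flat layout: B and its dependency D both sit at the top of node_modules.
function buildFlat(root) {
  const project = path.join(root, "flat");
  writePackage(path.join(project, "node_modules", "demo-pkg-b"), B_SOURCE);
  writePackage(path.join(project, "node_modules", "demo-pkg-d"), D_SOURCE);
  return project;
}

// Isolated layout: only the direct dependency B is exposed at the top level.
function buildIsolated(root) {
  const project = path.join(root, "isolated");
  const bModules = path.join(project, "node_modules", ".pnpm", "demo-pkg-b@1.0.0", "node_modules");
  const dModules = path.join(project, "node_modules", ".pnpm", "demo-pkg-d@1.0.0", "node_modules");
  writePackage(path.join(bModules, "demo-pkg-b"), B_SOURCE);
  writePackage(path.join(dModules, "demo-pkg-d"), D_SOURCE);
  // D is linked next to B so that B (and only B) can require it.
  fs.symlinkSync(path.join(dModules, "demo-pkg-d"), path.join(bModules, "demo-pkg-d"), "junction");
  // The project root sees a link to B, and nothing else.
  fs.symlinkSync(path.join(bModules, "demo-pkg-b"), path.join(project, "node_modules", "demo-pkg-b"), "junction");
  return project;
}

const root = fs.mkdtempSync(path.join(os.tmpdir(), "phantom-demo-"));
const flat = buildFlat(root);
const isolated = buildIsolated(root);

const report = {
  flatSeesD: canResolve(flat, "demo-pkg-d"),
  isolatedSeesB: canResolve(isolated, "demo-pkg-b"),
  isolatedSeesD: canResolve(isolated, "demo-pkg-d"),
  bStillWorks: createRequire(path.join(isolated, "app.js"))("demo-pkg-b"),
};
fs.rmSync(root, { recursive: true, force: true });
console.log(report);
```

In the flat tree the project can reach demo-pkg-d even though it never asked for it, which is the phantom dependency. In the isolated tree the same lookup fails, while demo-pkg-b can still load its own dependency.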

Enter pnpm

Pitching the elimination of phantom dependencies as the only reason to try pnpm would be underselling it. It goes beyond that. The first two advantages listed below are easiest to notice if you don’t have a very fast SSD, CPU, and internet connection, which is the case for most people:

Why Switch?

  1. Exceptional Efficiency and Disk Space Management: Unlike Yarn and npm, which install all individual package files directly in node_modules, pnpm employs a different strategy. It maintains a single shared store where all package versions are kept. When you install a package, pnpm simply creates hard links from this store into your project’s node_modules directory. The result is a drastic reduction in disk space usage because each package version needs to be saved only once, and it can be reused across multiple projects.

  2. Superior Speed with Concurrent Downloads: When fetching packages from the registry, pnpm performs downloads concurrently which leads to quicker installation times. When executing pnpm install, depending on network conditions, you can fine-tune the level of concurrency used by pnpm with the following flags:

    • --network-concurrency : If not specified, the default value is 16. You’d decrease this number if you’re experiencing network errors due to an excess number of concurrent requests.
    • --child-concurrency : This is most relevant for operations involving lifecycle scripts, linking, and node-gyp compilation during installations. If not specified, the default is 5 on a CI server and auto elsewhere. With auto, pnpm uses a count that is equal to half your core count, respecting a minimum of 4.

  3. Deterministic Installations Ensuring Consistency: pnpm creates a system where a package and its specific version exist only once in the pnpm store. This feature ensures deterministic and consistent reproduction of the node_modules directory across environments. Consistent installations not only help in reducing disk space usage but also ensure that your project behaves the same way across different environments such as different developer machines, CI/CD environments, and production systems.

  4. Resilient and Easily Understandable Package Locking: pnpm generates a pnpm-lock.yaml file, which is less prone to the merge conflicts frequently seen with npm’s package-lock.json and Yarn’s yarn.lock. The structured data format (YAML) used by pnpm-lock.yaml is also more human-readable.

  5. Seamless Operational Switch with pnpm import: Transitioning an existing project from npm or Yarn to pnpm is a breeze thanks to the pnpm import command, which generates a pnpm-lock.yaml from an existing package-lock.json or yarn.lock. This helps teams adopt pnpm without the fear of a complex migration process.

  6. Uncompromising on Dependency Access Rules: Because pnpm follows a strict policy that packages can only access their explicitly listed dependencies, packages are isolated from one another, and there are fewer opportunities for coding errors or security issues caused by unexpected package access. npm and Yarn lack this level of strictness, so hidden errors or security issues can slip through.

  7. Custom Control with pnpmfile.js: pnpm allows the use of a pnpmfile.js (named .pnpmfile.cjs in recent pnpm versions) to customize packages as they’re installed or to apply patches. This gives you finer control over the package installation process. You might not use this every day, but you’ll be glad it exists if you ever need it.

  8. For the Love of Simplicity: Say we have two conventionally named scripts in our package.json, start and dev. Here is how executing them compares between pnpm and npm:

    pnpm vs npm
    npm start      # Works
    pnpm start     # Also works
    
    npm dev        # Unknown command: "dev"
    npm run dev    # Works
    pnpm run dev   # Also works
    pnpm dev       # This is the convenience
    

    Not having to write run before the script names counts. For me, it’s on the same relief level as writing just apt instead of apt-get on modern Linux distros.

Installation

Since pnpm is just another Node.js package, you can install it using npm:

Install pnpm
npm install -g pnpm  # This installs it globally
pnpm --version       # Print the installed pnpm version

Final Thoughts

I had a rollercoaster ride with Yarn between 2018 and 2021. I ran into pesky glitches that needed workarounds whenever a project used Yarn instead of npm, such as a TailwindCSS/PostCSS issue caused by a bug in Yarn v2. Eventually, I stopped using Yarn altogether and reverted to good ol’ npm.

I first tried out pnpm in 2021 when I adopted Rush.js for a monorepo solution, because Rush recommended it as one of its supported package managers. Within my first week of tinkering with Rush and pnpm, I noticed a neat payoff: I was burning less time downloading node_modules, and that was a noticeable boost in my productivity.

Fast forward to today: pnpm is my go-to for package management in TypeScript/JavaScript projects. I have yet to encounter any issue that would make me consider switching back to npm or Yarn.
