While building my personal computation proof-of-concept, one thing became clear very quickly: I needed a graceful and extensible way to manage service dependencies. I wanted to avoid a long list of services that had to be installed, configured, and running before the application could start. I also didn’t want the system to be restricted by the services I chose up front.
Choosing a data storage layer is a particularly tricky problem. There are many RDBMS and NoSQL options to choose from, and each has its own set of strengths and weaknesses. Is the data write-heavy or read-heavy, mutable or immutable, tabular or schemaless, relational, geospatial, searchable, or streamable? Can you choose one database technology that is ideal for all possible use cases?
Beyond data stores, I wanted a system where any existing standalone software could be packaged, distributed, and reused. What if, for example, Stanford researchers released an open source Named Entity Recognition algorithm and classification data set, packaged in one replicable… Java app?
Enter Containers
Docker, the popular Linux container solution (originally built on LXC), provides a rich ecosystem and toolchain for building, running, and distributing containers. No more installing endless dependencies, or building them from source, just to get a piece of software to work. Inside a container lives a known-good configuration and environment for a process that you can interact with over a network interface. Once someone has solved that packaging problem, using the result is as trivial as downloading the image, running the container, and connecting via a local IP address and port.
Package Management
While no package manager is without its shortcomings, I’m rather fond of the many things npm does right (and can live with the many other things it does poorly) and how well it works with nodejs.
However, while you can npm-install a node module that makes it trivial to connect to a database, you cannot use npm to download, configure, and run the database service itself. To its credit, Docker provides a CLI that makes this process as easy as using a software package manager.
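To make the comparison concrete, here is a minimal Node.js sketch of driving the Docker CLI the way one would drive a package manager. The helper names and the shape of the service description ({ name, image, port }) are assumptions made up for illustration, not part of my proof-of-concept; the subcommands and flags (pull, run, -d, --name, -p) are the standard Docker ones.

```javascript
const { execFileSync } = require('child_process');

// Hypothetical service description; this shape is an assumption for the sketch.
const redis = { name: 'poc-redis', image: 'redis', port: 6379 };

// Arguments for `docker pull <image>`: fetch the image, much like `npm install`.
function dockerPullArgs(service) {
  return ['pull', service.image];
}

// Arguments for `docker run`: start a detached, named container and
// publish its port on the same port number of the host.
function dockerRunArgs(service) {
  return [
    'run', '-d',
    '--name', service.name,
    '-p', `${service.port}:${service.port}`,
    service.image,
  ];
}

// Runs both steps in order; this needs a local Docker installation.
function installAndStart(service) {
  execFileSync('docker', dockerPullArgs(service), { stdio: 'inherit' });
  execFileSync('docker', dockerRunArgs(service), { stdio: 'inherit' });
}
```

Calling installAndStart(redis) would pull the image and start the container. Building the argument lists separately keeps them easy to inspect without Docker installed.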
What neither addresses is a problem very specific to my use case. When developing an application, it makes perfect sense to install any services your application will need and then install any packages your application code will require to connect to those services. But what if you want to start an application without any services and be able to add new services on demand at runtime?
Managing Dependencies
By using npm as a distribution mechanism for plugins, the software package problem is handled trivially. All that’s left is service dependencies. The solution I devised involves inspecting a plugin’s declared service dependencies before running the plugin’s code and, once the requested containers are available, starting the plugin and providing it with each container’s IP address and exposed port(s).
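The flow described above can be sketched roughly as follows. This is not the actual implementation: the "services" field in the plugin manifest, the runtime.ensureRunning interface, and the function names are all hypothetical, chosen only to show the order of operations (inspect the declarations, wait for the containers, then start the plugin).

```javascript
// Read the service dependencies a plugin declares in its manifest.
// ASSUMPTION: a "services" field in package.json mapping a name to
// { image, port }; this field is invented for the sketch.
function declaredServices(manifest) {
  const services = manifest.services || {};
  return Object.keys(services).map((name) => {
    const { image, port } = services[name] || {};
    if (typeof image !== 'string' || !Number.isInteger(port)) {
      throw new Error(
        `plugin ${manifest.name}: service "${name}" needs an image and a port`
      );
    }
    return { name, image, port };
  });
}

// Start a plugin only after every declared service is available.
// ASSUMPTION: runtime.ensureRunning(service) is a hypothetical interface
// that resolves to { host, port } once the container accepts connections.
async function loadPlugin(manifest, runtime, start) {
  const endpoints = {};
  for (const service of declaredServices(manifest)) {
    endpoints[service.name] = await runtime.ensureRunning(service);
  }
  // The plugin code runs last and receives the container addresses.
  return start(endpoints);
}

// Example manifest, as a plugin author might write it (hypothetical).
const exampleManifest = {
  name: 'entity-tagger',
  services: { cache: { image: 'redis', port: 6379 } },
};
```

A plugin with no "services" field yields an empty list and starts immediately. Containers are requested one at a time here for simplicity; independent services could just as well be started in parallel.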
While this is not an approach I would advocate for common cases, where all services and functionality are baked in at build time, it works remarkably well for a long-running application that can be extended at runtime.