Service dependencies

In building my personal computation proof-of-concept, one thing became clear very quickly: I needed a graceful and extensible way to manage service dependencies. I wanted to avoid a long list of services that had to be installed, configured, and running before the application could start, and I didn’t want the system to be restricted to the services I chose up front.

Choosing a data storage layer is a particularly tricky problem. There are many RDBMS and NoSQL options to choose from, and each has its own unique set of strengths and weaknesses. Is the data write- or read-heavy, mutable or immutable, tabular or schemaless, relational, geospatial, searchable, or streamable? Can you choose one database technology that is ideal for all possible use cases?

Beyond data stores, I wanted a system where any existing standalone software could be packaged, distributed, and reused. What if, for example, Stanford researchers released an open source Named Entity Recognition algorithm and classification data set, packaged in one replicable… Java app?

Enter Containers

Docker, the popular Linux container solution, provides a rich ecosystem and toolchain for building, running, and distributing containers. No more installing endless dependencies or building from source just to get a piece of software working. Inside a container lives a known-good configuration and environment for any process you can interact with over a network interface. If someone solves that once, reusing their work is as trivial as downloading the image, running the container, and connecting via a local IP address and port.

Package Management

While no package manager is without its shortcomings, I’m rather fond of the many things npm does right (and can live with the many other things it does poorly) and how well it works with nodejs.

However, while you can npm-install a Node module that makes it trivial to connect to a database, you cannot use npm to download, configure, and run the database service itself. To its credit, Docker provides a CLI that makes this process as easy as using a software package manager.
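As a rough illustration of how low the barrier is, here is a minimal sketch of starting a service container from Node by shelling out to the Docker CLI. The choice of image (redis) and the port mapping are just examples, not part of my actual setup.

```javascript
// Minimal sketch: start a Redis container from Node via the Docker CLI.
// The image name and port mapping are illustrative examples.
var execFile = require('child_process').execFile;

execFile('docker', ['run', '-d', '-p', '6379:6379', 'redis'], function (err, stdout) {
  if (err) throw err;
  // `docker run -d` prints the new container's ID on stdout.
  console.log('redis container started:', stdout.trim());
});
```

The same pattern works for any service someone has already containerized: one command (or one exec call) and the service is running and reachable on a known port.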

What neither addresses is a problem very specific to my use case. When developing an application, it makes perfect sense to install any services your application will need and then install any packages your application code requires to connect to those services. But what if you want to start an application with no services at all and be able to add new services on demand at runtime?

Managing Dependencies

By using npm as the distribution mechanism for plugins, the software package problem is handled trivially. All that’s left is service dependencies. The solution I devised involves inspecting a plugin’s declared service dependencies before running the plugin’s code and, once the requested containers are available, starting the plugin and providing it with each container’s IP address and exposed port(s).
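To make that flow concrete, here is a minimal sketch of the idea, assuming a hypothetical manifest format and an ensureContainer helper that starts (or finds) a container for a named service and reports its address; none of these names come from the actual implementation.

```javascript
// Hypothetical sketch of the plugin-loading flow described above.
var myPlugin = {
  services: ['redis'], // the plugin's declared service dependencies
  start: function (services) {
    // e.g. services.redis = { host: '172.17.0.2', port: 6379 }
    console.log('connecting to redis at ' + services.redis.host + ':' + services.redis.port);
  }
};

// ensureContainer(name) is assumed to start or locate a container for the
// named service and return its { host, port }.
function loadPlugin(plugin, ensureContainer) {
  var resolved = {};
  plugin.services.forEach(function (name) {
    resolved[name] = ensureContainer(name);
  });
  plugin.start(resolved);
}
```

The key point is that the plugin never knows (or cares) how the service was started; it only receives an address it can connect to.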

While this is not an approach I would advocate for common cases, where all services and functionality are baked-in at build time, it works remarkably well for a long-running application that can be extended at runtime.

Personal Computation

Discover Supercomputer 2 by nasa_goddard on flickr

During the 1980s and 1990s, a powerful revolution took place in computing hardware. Programs had historically been executed on mainframe computers, fed commands manually at first and then from terminals. As computing power increased and chip prices fell, terminals went from thin clients that logged in and executed programs remotely to full-fledged computing devices. Apple and Microsoft notably capitalized on this revolution. Bill Gates’ dream was to put “a computer on every desk and in every home.”

By moving computation on-site, users were given unprecedented power. Even if Internet connectivity had not been prohibitively slow, it would have been difficult for mainframes to compete with the responsiveness and personalization of an on-site computer.

This selective retelling and framing of history is intended to form the foundation of a present-day analogy. The personal computer revolution of the 20th century is similar to a revolution I would like to see take place in personal computation.

Personal Computation

We can define personal computation as a system that uses data and algorithms to provide a highly personalized experience to a single user. Compared to a massive, centralized, mainframe-like system, a personal computation system would take different approaches to hardware, software, and data.

Hardware

Through virtualization and containerization, hardware is becoming less directly tied to computation. While there are other configurations, let’s compare two: infrastructure you don’t own or have access to (e.g. SaaS/cloud services); and infrastructure where nobody has more control or access than you (e.g. a web host or IaaS).

“There is no cloud. It’s just someone else’s computer.”

Running a “personal cloud” on a web host (or Infrastructure as a Service provider) is still technically using someone else’s computer, but gives you different levels of control, access, privacy, and security. Notably, if you rent a computing resource from a utility, they should not have access to your data and should not be able to dictate what software you can or cannot run on that resource.

Note that this is a comparison between two cloud-based infrastructures, not between the cloud and on-device computation: to preserve battery life on our mobile devices, we will ultimately need to perform many types of computation on a remote service.

Software

Cloud providers perform personalized computation by developing complex algorithms and running them against the vast amounts of data they collect on you, your friends, and people who resemble you.

Algorithm: a set of rules for calculating some sort of output based on input data.

Facebook’s News Feed, Google Now, and Apple’s Siri give users a lot of functionality, but the algorithms only work within each service provider’s walled garden. You can’t use Facebook’s News Feed to surface important content your friends posted outside of Facebook (if they didn’t also post it to Facebook). You can’t use Google Now to automatically remind you of upcoming travel, appointments, and bills if you don’t use Gmail. You can’t use Siri’s speech recognition outside of iOS or OS X. You also can’t add new functionality or features to any of those services unless their engineers build them or the provider offers a specific mechanism for 3rd party extensions or apps.

Currently, these service providers’ state-of-the-art algorithms are far more advanced and better integrated than anything you can install and run on your own. However, there is no technical reason you can’t get most of what they provide without connecting to a company’s centralized service. In fact, if you had access to and control of these algorithms, you could apply them to data that exists outside of a walled garden or to personal data you don’t want to share with a 3rd party.

That is to say, despite all the amazing machine learning algorithms deployed by centralized service providers, they’re only able to scratch the surface of what would be possible with software outside of a walled garden.

Data

Today, in order to get highly personalized computation, people give up access to a vast array of highly personal information—where you are at all times, who you talk to and when, what you say to your friends and family, and what you like to eat, wear, watch, listen to, and do. This information is valuable enough to advertisers that ad publishers can profitably pour money into servers and software development while giving the computation away for free.

As we’ve discovered during the last decade, people in general are not at all reticent about giving up privacy in return for being able to do basic things like talk to friends online or get driving directions.

If you told everyone it’s not necessary to completely surrender their privacy to many 3rd party companies, would they care? It seems like only a minuscule fraction of people would. Ultimately, if there is a limit to how much privacy people are willing to give up, it would probably (hopefully) be different for a 3rd party company than for a system only the user has access to. Given more comprehensive data about the user, a personal cloud type of system has the potential to deliver even deeper personalization.

The Alternative to the Mainframe

Given the striking resemblance of today’s web services to mainframes, could there be a revolution on the horizon similar to that of the personal computer? There is no shortage of personal cloud projects or decentralized services. Sadly, many of them attempt to be open source clones of existing centralized services, which is akin to installing a mainframe in your living room and calling it a PC.

What would the software equivalent of the personal computer look like? What if we could run whatever algorithm we want on whatever data we want? What if we owned and controlled a system that had no ulterior motives, couldn’t be acqui-hired, and wouldn’t prevent us from escaping a walled garden?

Existing services, like those provided by Google, Apple, Facebook, and so on, set a high bar for ease of use, user experience, seamless integration, and powerful yet simple-looking (“magic”) functionality. But at the end of the day, it is just software, and there is no technical reason why it needs to run in a 3rd party’s data center.

Servers and storage are getting cheaper, software is easier than ever to distribute, programming languages and frameworks are enabling ever more complex and extensible applications, and complex systems are becoming more easily reproducible with containerization.

I’ve been thinking about and working on a rough proof-of-concept of this for a few years, and quit my job a couple months ago to work on it full-time. It’s certainly not as easy or simple as it sounds, but I’ve come up with some interesting strategies and solutions. During the next few weeks, I’m planning on writing about some of these concepts, as I’m stretching the limits of what I can accomplish while working in a vacuum.

The THX Sound and Random Numbers

I recently stumbled upon a seven-year-old article detailing the creation of the THX sound in the 1980s. It was apparently produced from about 250,000 ASP commands, generated by a program consisting of 20,000 lines of C code.

While I was thoroughly impressed by the work and the result, this part stuck out to me:

> There are many, many random numbers involved in the score for the piece. Every time I ran the C-program, it produced a new “performance” of the piece. The one we chose had that conspicuous descending tone that everybody liked. It just happened to end up real loud in that version.
>
> Some months after the piece was released (along with “Return of the Jedi”) they lost the original recording. I recreated the piece for them, but they kept complaining that it didn’t sound the same. Since my random-number generators were keyed on the time and date, I couldn’t reproduce the score of the performance that they liked. I finally found the original version and everybody was happy.

In visualization work, I’ve come across instances where I needed random numbers to get a desired look but also needed repeatable results.

Take these two cases:

1.) I was re-rendering grid-like portions of a scatter-plot where data could be added but not removed. If I had 100 randomly placed points in a grid cell and needed to redraw it with 101 points, I needed the first 100 to be redrawn in the same positions they were in, plus the new one. (This one was in Flash)

2.) I was rendering a video with hundreds of thousands of “people” scattered around a floor in 3D space, but needed to be able to zoom in on particular people. If they were positioned in a truly random formation, the hard-coded positions I zoomed into would no longer show the specific people I was targeting. I needed to be able to re-render the video, position the people randomly, but have them in the same random positions they were in previously. (This one was in Processing)

For both of those cases, I came up with similar solutions. I created an alternate random() function that accepts an input parameter—a seed—and returns a random-ish floating point number between 0 and 1. For example, if you send it a seed of 123456, you might get back 0.382919. If you send 123457, you might get 0.716254. What’s most important, however, is that sending 123456 again would return the same value, 0.382919.

There are many ways to accomplish this, and I certainly didn’t set out to find the most efficient or scalable method. Within my seeded random function, I simply used the value of pi, performed some arithmetic on it using the seed, and returned only the digits after the decimal point. That produces a very random-looking floating point number between 0 and 1, just like the basic, non-ranged random() methods tend to provide.
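The exact arithmetic isn’t important, so here is a minimal JavaScript sketch of a function with the same shape; the particular mixing formula below is an assumption for illustration, not the code I used.

```javascript
// Hypothetical seeded random(): deterministic, returns a value in [0, 1).
// The mixing formula is an illustrative stand-in, not the original arithmetic.
function seededRandom(seed) {
  var x = Math.sin(seed) * Math.PI * 10000; // scramble the seed a bit
  return x - Math.floor(x);                 // keep only the fractional part
}

seededRandom(123456); // always returns the same value for the same seed
```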

In Case #1 above, I seeded the function with the grid cell’s static X and Y coordinates plus the UID of the dot being rendered, then generated a second number from a slightly varied seed so that I had two values: an X and a Y coordinate. In Case #2, all I needed was the person’s UID, again varied slightly for the second call, to retrieve a random X and a random Y position.
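In terms of the sketch above, the usage looked something like the following; the seed mixing and the parameter names are again hypothetical.

```javascript
// Case #1: the same grid cell and dot UID always produce the same position.
function dotPosition(cellX, cellY, dotUid, cellWidth, cellHeight) {
  var seed = cellX * 100000 + cellY * 1000 + dotUid; // hypothetical seed mixing
  return {
    x: seededRandom(seed) * cellWidth,
    y: seededRandom(seed + 1) * cellHeight // vary the seed slightly for Y
  };
}

// Case #2: a person's UID alone determines their position on the floor.
function personPosition(uid, floorWidth, floorDepth) {
  return {
    x: seededRandom(uid) * floorWidth,
    y: seededRandom(uid + 1) * floorDepth
  };
}
```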

As for the THX sound, they could have used a similar approach with a single hard-coded base seed value (e.g. the number 1) that is used in all subsequent seeded random function calls. This would result in the same sound being produced every time. If they wanted to “compose” a new, randomized piece, they would simply change that base seed value (e.g. 1 becomes 2). To go back to a previous version, change the number back. Upon finding a composition they wanted to keep, they could simply leave that base seed in the code. The only way to lose the sound would be to lose the entire codebase, at which point the hopes of recreating it would obviously be slim, even if no random (or random-ish) numbers were involved.
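In code, that idea might look something like this, reusing the hypothetical seededRandom() from above; the base seed and how it’s combined with each call are illustrative assumptions.

```javascript
// One hard-coded base seed makes every "performance" reproducible.
var BASE_SEED = 1; // bump to 2, 3, ... to "compose" a new random variation

function scoreRandom(n) {
  // Combine the base seed with a per-call index so each call gets its own
  // deterministic value, while the whole piece stays keyed to BASE_SEED.
  return seededRandom(BASE_SEED * 1000000 + n);
}
```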

Ludum Dare #23 48h Compo

Until Friday night, I had never heard of Ludum Dare, a global game development event celebrating its 10th anniversary. I found it via Reddit or Hacker News or Twitter or something, and discovered the event had kicked off 4 hours earlier. Everyone participating in the Compo was given 48 hours to create a game from scratch around a theme announced that night. The alternate Jam competition had more relaxed rules, allowed teams, and ran for 72 hours. The theme was “Tiny Worlds,” which made coming up with a compelling game idea tricky. Fortunately, the theme and rules allow for pretty broad interpretation.

I wasn’t sure I would participate, but an idea popped into my head. Inspired in part by a segment of Dragon Ball Z Kai, I pondered what it would be like to jump from tiny planet to tiny planet. Mainly, what would happen to your perspective of “up” and “down” if you jumped from the top of one planet to the bottom of another? The concept of “down” essentially just means “in the direction of the pull of gravity.” What drove me to build the game was the idea of making a 2D platformer where up, down, left, and right are completely fluid, determined by the gravitational pull of the planets around the character.
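The core of that mechanic is simple to express: “down” is whatever direction the net gravitational pull points. Here is a rough JavaScript sketch of that calculation; the object shapes and the inverse-square formula are illustrative assumptions, not the actual game code (which was CoffeeScript).

```javascript
// "Down" is the direction of the net gravitational pull from all planets.
// Assumed shapes: player is { x, y }, each planet is { x, y, mass }.
function downAngle(player, planets) {
  var gx = 0, gy = 0;
  planets.forEach(function (p) {
    var dx = p.x - player.x;
    var dy = p.y - player.y;
    var distSq = dx * dx + dy * dy;
    var dist = Math.sqrt(distSq);
    var pull = p.mass / distSq;  // inverse-square attraction toward the planet
    gx += pull * (dx / dist);
    gy += pull * (dy / dist);
  });
  return Math.atan2(gy, gx);     // angle the character should treat as "down"
}
```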

Even though I hadn’t used CoffeeScript or Processing.js before, I decided to try them out. Probably not a good idea on such a short timeline, but oh well.

The result was an auto-orienting 2D puzzle game where you have to jump from tiny planet to tiny planet to get to your goal before your oxygen runs out. You can walk around on planets and jump, but after leaving a planet’s surface, there is no longer any control over the character. That means if you miss a planet, you can drift off into space forever!

I finished an hour before the deadline, set up the game at PicoPlanets.com (a play on the prefix “pico,” meaning one trillionth), and made the source code available on GitHub. You can also see screenshots and blog posts about the progress (a “making of”) here on my Ludum Dare author page.

Firebug: Once you download the page, you own it

I’m going to geek out for a second.

Firebug is an extension for Firefox that I’ve been using for years. These days, most (maybe all?) modern browsers come with some sort of interactive console. It’s great for web development, because you can tweak HTML & CSS without reloading the page, come up with what you like, then copy and paste the resulting code back into your files.

I use Firebug beyond that, though. You see, when I’m viewing a web page, I have this mindset that it is living on my computer and I can do with it what I want. Think about big overlay ads on sites that block the content you’re viewing for X seconds. You can delete that entire element and go back to reading in seconds.

Figure 1
Today, I was reading up on a GPS photo-logging device I recently purchased. It doesn’t seem to work. I was trying to read a blog post reviewing it when I got too annoyed with a quirk in the page. See Fig 1.

You see those broken characters after every period?

No thanks!

Sure, I could just try to ignore them, but I’m too easily distracted. Instead, I opened Firebug and typed out a line of JavaScript, after verifying that the blog I was reading had jQuery installed. Within a few moments (about the time it would have taken to just finish reading the post), I had this:

wtf = document.getElementsByTagName("p"); jQuery.each(wtf, function (index, val) { jQuery(val).html(jQuery(val).html().replace(/Â/g, "")); });

To break it down, I’m creating an array of all paragraph tags on the page, then looping through them and setting their HTML contents to be their original HTML contents with the strange characters replaced with nothing.

While the blog post didn’t have anything to help me get this device working, I figured it was worth sharing the creative use of Firebug.

Another recent creative use of Firebug was browsing Google Maps with my friend Ed and hearing him say, “Man, it’d be cool if we could see just the dots, without the map.” I fired up Firebug, poked around, and deleted the image tiles below the points. The result? Fig 2.

Figure 2