Posts Tagged ‘Data Viz’
Recently, I have been wanting to jump into several programming languages. Unfortunately, I’m very short on time. As a full-time Flash developer, it is bad enough that I am lagging behind on adopting AS3, Flex, AIR, and Papervision. I would love to pick up Ruby, Python, and Silverlight, too.
However, the data visualizationist in me has been captivated this year by a language called Processing. Flash is a great tool for visualizing sets of data, and has served me fairly well in the past. While Flash is mostly tied to the web (and with AIR, it takes a step out of the browser), Processing is designed to be put in many more places. More on that in a second.
Let’s start with a summary of the platform. Here is an excerpt from Processing.org:
Processing is an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool. Processing is developed by artists and designers as an alternative to proprietary software tools in the same domain. Processing is free to download and available for GNU/Linux, Mac OS X, and Windows.
Processing is built on Java, which means you can create applications for the web, desktop, and mobile devices. Even more fascinating, there is a project related to Processing, called Wiring, that allows your Processing application to communicate with homemade hardware (circuit boards, solder, and microprocessors, oh my!). This means that no matter what input or output medium you want to use, Processing should be able to do it.
Before I get motivated enough to try out any new technology, I need to see it in action. The more impressive the examples, the more motivated I will be to try it out.
I found out about Jonathan Harris thanks to his TED Talk in 2006. The project that I found particularly fascinating was “We Feel Fine”. It features many colorful objects floating in space that represent feelings the application discovered while scraping blogs.
As a Flash developer, I could not miss Audi’s “Rhythm of Lines” micro-site built using Papervision3D that features moving lines in 3D space, shaping the outline of an Audi A5. I was surprised to learn weeks later that the accompanying TV spot for the “Rhythm of Lines” campaign was created using Processing. Upon doing more research on the subject, I came across this detailed AIGA article about Processing.
Another feature that Processing boasts over Flash (something I have been wishing to see added to Flash for years) is OpenGL-powered hardware acceleration. Papervision3D is filling the gap a little bit, but it still requires a lot of work from the CPU, instead of throwing it onto the GPU where it should be. OpenGL support means that you can add more complexity, some shading, and other fancy effects without taking much of a performance hit.
Even though I am very short on time, if I get inspired to do a particular data visualization, I might invest some of my sleep hours into a micro-project.
Part of the purpose of any social web site is to build a network of friends. Using the recently released Digg API, I created a map of Digg users and how they’re connected to each other.
The Map [Link]
On the map, users are organized by the length of time they have had accounts on Digg. The oldest accounts are at the center and accounts created in the last few months are around the edges. The map only includes users who utilize Digg’s friendship feature.
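The layout rule above (oldest accounts at the center, newest at the edges) can be sketched as a mapping from account age to a radius. This is only a minimal illustration, not the code behind the map: the function name `radial_position`, the `angle` parameter, and the dates in the usage note are all hypothetical, and the post does not say how each user's angle around the center was chosen.

```python
import math
from datetime import date


def radial_position(joined, oldest, newest, angle, max_radius=1.0):
    """Return (x, y) for an account, with older accounts nearer the center.

    joined, oldest, newest are datetime.date values; angle is in radians.
    How the angle is picked is an assumption left to the caller.
    """
    # Total time span covered by the map, guarded against a zero-day span.
    span_days = (newest - oldest).days or 1
    # 0.0 for the oldest account, 1.0 for the newest one.
    t = (joined - oldest).days / span_days
    radius = t * max_radius
    return (radius * math.cos(angle), radius * math.sin(angle))
```

With made-up dates, `radial_position(date(2006, 1, 1), date(2005, 1, 1), date(2007, 1, 1), 0.0)` lands exactly halfway between the center and the edge.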
I’m known for making an argument that data visualization can be very useful. Charts and graphs, while they may be aesthetically pleasing, can point out trends and habits on a broad scale that would probably be missed using typical statistical analysis.
However, I’ll be the first to say that the Digg friendship map I created has very little value as a practical analysis tool. It’s an idea I’ve had in my head for a few months, and I finally got around to building it. I thought it would be fun. I thought it would be neat.
Making it fun and neat
The map itself, as a JPG image, may already appeal to people with an interest in data visualization. However, if I’m going to make something fun and neat, I’m going to try to appeal to more people than just data visualization enthusiasts.
When I rendered the image, I stored the coordinates for every user in a database. This allows me to go back afterwards and query the database for a specific user and retrieve that data point. I created a simple Flash interface where people can type in their (or others’) Digg usernames to find out where they are on the map.
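The store-then-look-up idea described above can be sketched with an in-memory SQLite database. This is a minimal sketch under assumptions: the table name `map_points`, its columns, and the helper functions are hypothetical, and the post does not say which database or schema was actually used.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one row per plotted user."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE map_points (username TEXT PRIMARY KEY, x REAL, y REAL)"
    )
    return conn


def save_point(conn, username, x, y):
    """Record where a user was drawn while the image is being rendered."""
    conn.execute(
        "INSERT OR REPLACE INTO map_points VALUES (?, ?, ?)",
        (username.lower(), x, y),
    )


def find_point(conn, username):
    """Return (x, y) for a username, or None if the user is not on the map."""
    return conn.execute(
        "SELECT x, y FROM map_points WHERE username = ?",
        (username.lower(),),
    ).fetchone()
```

A front end only needs the `find_point` result to place a marker over the pre-rendered image; usernames are lowercased here so the lookup is not case-sensitive, which is a design choice rather than something the post specifies.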
In the last seven weeks, DiggTaggr has delivered about 115,000 sets of links to relevant stories to several thousand unique users. This is a good-sized dataset to tinker with, so I decided to hack through it and see if I could present the data in an interesting way.
I have to admit, I was partially inspired to do this by Stamen Design’s data visualizations of Digg’s traffic. If you haven’t seen their scatter-plots, you should check them out.
Graph #1: User ID vs. Story ID
This was my first attempt at displaying the dataset in an interesting manner. Two stories related to DiggTaggr hit the front page and are labeled on the graph. DiggTaggr debuted on Friday, February 2nd, and received a complete redesign on the 4th.
You can see the curve of new users leveling off through most of the graph, which illustrates that there are fewer and fewer new users. There are also grid-like patterns emerging. Horizontal lines represent highly active users, while the dark horizontal gaps represent users who tried the tool and stopped using it. Vertical lines represent high activity during peak hours, while dark vertical gaps represent low activity on weekends.
Graph #2: Time vs. User ID
The Story ID axis in the previous graph gave a fairly accurate chronological reference, but if it’s time you want, it’s time you should use.
This graph illustrates peak hours and peak days of the week in a more explicit way. I labeled the distinct patterns of weekdays and weekends. You can see that the pattern is more clearly defined in a certain area of the graph. These users involuntarily grouped themselves together by seeing the tool first thing Monday morning (the white horizontal line at the top of that section of users), while daily users had already seen the tool for 2 days.
The graph is color-coded to see how quickly users went through 40 Digg stories using DiggTaggr. Some users quickly went to red, while others used the tool less frequently.
Graph #3: Stories Viewed vs. Time
Okay, okay. I was having a little bit of fun with this one. This one took over an hour to render on my laptop, partially because each dot had its own database query to determine how many previous instances there were for that user.
Each squiggly line represents a user. When the line is vertical, the user is viewing stories quickly one after another. You can see that only a handful of users have made it to the 1,000 story mark. I should give them a prize!
The density at the bottom left illustrates the high volume of new users. Some Digg at a rapid pace and shoot up, while others are more moderate and gradually climb. The density at the bottom tells us that a high percentage of DiggTaggr users either rarely visit Digg or uninstalled the tool.
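The slow render mentioned for this graph came from running one database query per dot to count a user's earlier views. One possible alternative, sketched here and not the code that produced the graph, is to walk the events once in time order and keep a running count per user. The function name `stories_viewed_points` and the `(timestamp, user_id)` event shape are assumptions made for illustration.

```python
from collections import defaultdict


def stories_viewed_points(events):
    """Turn (timestamp, user_id) events into (timestamp, user_id, count) points.

    count is the number of stories that user has viewed so far, including
    the current one, so each point can be plotted without another query.
    """
    seen = defaultdict(int)  # running number of views per user
    points = []
    # Sort by timestamp so counts grow in chronological order.
    for timestamp, user_id in sorted(events):
        seen[user_id] += 1
        points.append((timestamp, user_id, seen[user_id]))
    return points
```

Connecting each user's points in order gives one line per user, and the whole dataset is read once instead of once per dot.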
Data visualization is still fascinating and fun.
DiggTaggr has sent almost half a million links to relevant stories to its users.
Yesterday, I chose to parse datasets instead of going outside. geek++
I still enjoy hearing from users. Feedback is always welcome and appreciated.