Writing Tools for Digg Users: The Inevitable Digg Effect

Digg.com is an ideal target for creating instant gratification content. You can come out of nowhere and have tens of thousands of new users in 24 hours. An inevitable woe is the Digg Effect — a sign of success and often the key to server failure.

How do you make 10,000 comparisons of Digg users to the rest of the community?

How do you display 600,000 links to relevant Digg stories in thousands of users’ browsers?

How do you tell 40,000 users where they are located on a map of 67,000 users?

How do you display 15,000,000 Diggs on that map?

How do you do it all on one shared hosted server?

Birth of a Digg Tinkerer

In December, I ran an experiment that required collecting a lot of data on thousands of users. Other parts of the experiment included measuring traffic and Google Adsense revenue during the Digg Effect. I created a little gadget, called “DiggStatus“, where Digg users could type in their user names to see how their Digg usage statistics stack up against the rest of the community. The experiment was much more successful than I had anticipated.

Weekend Projects

Every once in a while, it’s nice to get away from clients, meetings, deadlines, revisions, and product launches. Build something you want to build. Make your own deadlines. Change your mind — no deadlines. Finish it when and if you feel like it.

During the last few months, I have created a small collection of projects geared toward Digg users. Each is intended to be in itself a one-hit wonder. Get to the front page, get viewed by thousands, then recede into oblivion.

The Inevitable

Creating content for Digg is all or nothing. Either it hits the front page and you get blasted with thousands of visitors in a short period of time, or it passes through Digg’s upcoming stories section without a hint of interest.

Creating tools and gadgetry for the Digg front page puts a developer with limited resources in a sticky situation. You have to last through the Digg Effect, or nobody will see it.

Looking into a Broken Mirror

Digg users are accustomed to using mirror sites to view content after the target web server crumbles under the load. Unfortunately, mirrors are only useful for static content. Mirrors can’t learn your application’s recommendation algorithm. Mirrors don’t know the coordinates of users on the map you just made.

As a tool creator, it is imperative to withstand the Digg Effect.

Every K Counts

Look at the amount of data your web server will have to send for every visitor.

If your web site has a layout that utilizes images, jump ship. Set up your to-be-Dugg page outside of your layout. Add color with HTML. Structure your page with very light-weight HTML. A logo at the top of the page can be all it takes to push your web server over the edge. With links back to your site, only interested traffic will be directed to your normal site.

If you require images, find somewhere else to put it, so the bandwidth can be diffused between multiple locations.

Keep Quiet

With thousands of Digg users slamming your server in a very short period of time, even the simplest database call can be destructive to your server. If you use a framework for your site, make sure you have caching set up so you don’t ping the database for the same data thousands of times just to display the page.

In the tool itself, you need to optimize, optimize, optimize! Database queries add up faster than you may think.

Put Your Money Where Your Mouth Is

This week, I released two tools that utilize a map I generated last weekend of the Digg community.

The first tool allows visitors to search for a user name to find where that user is located on the map. This one required 3 database queries per request — one to find the location of the user, one to count the user’s friends, and one to count the user’s fans. While it was on Digg’s front page, about 40,000 users were requested. The database was therefore queried about 100,000 times. All of this and I didn’t receive any mention of downtime from users. If I had used one more database call, and taken the total up to 130,000 queries, the server might not have been able to handle the load.

The second tool I released this week displays Diggs (votes) on the map in real-time (with a small delay) and placed at the Digging users’ location on the map. In 24 hours, the tool has displayed 15,000,000 of those votes. Unfortunately, there were a few hours that users had intermittent access to my site. Contributing to the overload, the first tool was still getting quite a few hits from Digg, it was featured on the BBspot.com home page as a Daily Link, and there were quite a few StumbleUpon users.

However, despite a little bit of downtime, this one did quite well, thanks to some heavy optimization. It may appear like there are tons of queries going on, with all those icons popping up on thousands of screens around the world. The search functionality in the other tool required 3 queries per request. This tool only required one query per 100 Diggs (though, the tool may at times display over 1,000 Diggs per minute). The results were returned to the client in a compact comma-delimited list, where the Flash front-end would then break the results apart.

Moral of the Story

If you create content for Digg, take a close look at the your page and/or tool’s weight on the server. You need to save every byte you can. You should only contact the database when absolutely necessary.

The Digg Effect should be a sign of victory not a sign of defeat.

Data Visualization: Mapping the Digg Community

Part of the purpose of any social web site is to build a network of friends. Using the recently released Digg API, I created a map of Digg users and how they’re connected to each other.

The Map [Link]

On the map, users are organized by the length of time they have had accounts on Digg. The oldest accounts are at the center and accounts created in the last few months are around the edges. The map only includes users who utilize Digg’s friendship feature.

Usefulness

I’m known for making an argument that data visualization can be very useful. Charts and graphs, while they may be aesthetically pleasing, can point out trends and habits on a broad scale that would probably be missed using typical statistical analysis.

However, I’ll be the first to say that the Digg friendship map I created has very little value as a practical analysis tool. It’s an idea I’ve had in my head for a few months, and finally got around to building it. I thought it would be fun. I thought it would be neat.

Making it fun and neat

The map itself, as a JPG image, may already appeal to people with an interest in data visualization. However, if I’m going to make something fun and neat, I’m going to try to appeal to more people than just data visualization enthusiasts.

When I rendered the image, I stored the coordinates for every user in a database. This allows me to go back afterwards and query the database for a specific user and retrieve that data point. I created a simple Flash interface where people can type in their (or others’) digg user names to find out where they are on the map.

Check it out!

SXSW Bowling Technique

This is a video of me bowling at SXSW 2007 in Austin, TX. The video shows step-by-step the technique I’ve developed (though, I use that term loosely). First, you extend the ball forward, in front of you. Next, you lift it over your head. Then, you bring it behind you. At the release, you spin your arm around the ball (I recommend ‘palming’ the ball — don’t use the thumb hole). You should try to land the ball near the left edge of the lane, aiming at the right gutter. This will give it a more dramatic effect as it curves at the edge of the gutter and comes back to strike the pins. The most important part, though, is a good victory dance! Even if you don’t get a strike! (just don’t fall down..)

Data Visualization: Digging Into DiggTaggr’s Usage Stats

The Challenge

In the last seven weeks, DiggTaggr has delivered about 115,000 sets of links to relevant stories to several thousand unique users. This is a pretty good size dataset to tinker with, so I decided to hack through it and see if I could present the data in an interesting way.

I have to admit, I was partially inspired to do this by Stamen Design’s data visualizations of Digg’s traffic. If you haven’t seen their scatter-plots, you should check them out.

Graph #1: User ID vs. Story ID

This was my first attempt at displaying the dataset in an interesting manner. Two stories related to DiggTaggr hit the front page and are labeled on the graph. DiggTaggr debuted on Friday, February 2nd, and received a complete redesign on the 4th.

You can see the curve of new users accelerating through most of the graph. This illustrates that there are fewer and fewer new users. There are also grid-like patterns emerging. Horizontal lines represent highly active users, while the dark horizontal gaps represent users who tried the tool and stopped using it. Vertical lines represent high activity during peak hours, while dark vertical gaps represent low activity on weekends.

Graph #2: Time vs. User ID

The Story ID axis in the previous graph gave a fairly accurate chronological referrence, but if it’s time you want, it’s time you should use.

This graph illustrates peak hours and peak days of the week in a more explicit way. I labeled the distinct patterns of weekdays and weekends. You can see that the pattern is more clearly defined in a certain area of the graph. These users involuntarily grouped themselves together by seeing the tool first thing Monday morning (the white horizontal line at the top of that section of users), while daily users had already seen the tool for 2 days.

The graph is color-coded to see how quickly users went through 40 Digg stories using DiggTaggr. Some users quickly went to red, while others used the tool less frequently.

Graph #3: Stories Viewed vs Time

Okay, okay. I was having a little bit of fun with this one. This one took over an hour to render on my laptop, partially because each dot had its own database query to determine how many previous instances there were for that user.

Each squiggly line represents a user. When the line is vertical, the user is viewing stories quickly one after another. You can see that only a handful of users have made it to the 1,000 story mark. I should give them a prize!

The density at the bottom left illustrates the high volume of new users. Some Digg at a rapid pace and shoot up, while others are more moderate and gradually climb. The density at the bottom tells us that a high percentage of DiggTaggr users either rarely visit Digg or uninstalled the tool.

Conclusions

Data visualization is still fascinating and fun.

DiggTaggr has sent almost half a million links to relevant stories to its users.

Yesterday, I chose to parse datasets instead of going outside. geek++

I still enjoy hearing from users. Feedback is always welcome and appreciated.
Email: brian@shaler.name