February 2010 – Brian Shaler

Usually, a “user group” revolves around a computer language, a development platform, or a subset of computing technologies. This title is phrased to imply that data is a platform on which statisticians, data analysts, and visualizers converge.

Last night, I had a conversation with Mark Ng and Marc Chung, two people whom I have recently found to be highly enthusiastic about analyzing large data sets. The conversation may lead to two organizations: a user group and a work group.

The User Group:
I’m an interface guy who’s been doing data visualization lightly for 4 years and heavily for 1 year. My skill set for dealing with large amounts of data is creeping its way back, back, back from the front-end interface into the deep abyss of things that drive data visualization: statistical analysis, data mining, and distributed computing. In researching these topics, I’ve learned about some fascinating and useful tools that can do mind-boggling things with mind-bogglingly large data sets. This is stuff I would love to share, and even more, I’m interested to see what other people know and have done with these types of tools. My proposition was to start a recurring meet-up that would consist of presentations and/or demos of tools, languages, platforms, and cloud computing technologies.

The Work Group:
One VERY hot topic driving data visualization forward right now is government transparency. More and more local, state, and federal government bodies are releasing gargantuan amounts of data for the public to review. The problem? Gargantuan means BIG! Here, we need to connect a few dots:

First, we need to get the data. That can be through public repositories, or, as an example, a local news outlet that submits public records requests to obtain public data.

Second, we need to get the data into the right hands. Extremely large data sets are unmanageable for people who aren’t statisticians. So let’s get statisticians involved!

Third, we need to make the results public, which could mean looping back with a local news outlet to get coverage. It could also mean building interactive data visualizations and embedding them in local news web sites, much like the New York Times does.

I think both groups are excellent ideas and they even complement each other well (the user group would be an excellent resource pool for the work group). It is important to get data wranglers, statistics enthusiasts, and visualization gurus to come out of the woodwork and help these ideas come to fruition! Connect with me, Mark Ng, and/or Marc Chung to get in touch and stay in the loop.


A User Group… For Data?