Nathan Yau: 5 misconceptions about visualization

AUTHOR Nathan Yau
DATE September 23, 2011
LICENSE Creative Commons BY-NC

1. Visualization is for making data flashy

This is probably the most common one. It’s easy to look at a lot of the best visualization projects and want your data to look and feel the same way. So people ask, “I have such and such data. Is there a visualization technique that I can use to make it look cooler?”

Well, maybe. Not if you only have five data points though. You can spend a lot of time with icons or fancy print, but the graphics are interesting because the data that the visuals represent is interesting.

For example, I mapped the growth of Walmart a while back (it’s amazing how much mileage I get out of this graphic), and people seem to like it because of the organic growth pattern. It starts in one area and spreads outwards like a virus.

Okay, compared to Toby Segaran’s original, I did add some interactive flourishes, but even without, the growth pattern is what makes the animation interesting.

For comparison, here’s a map with the same style as my Walmart one, but it shows the spread of Target. It’s not nearly as fun to watch, because Target took a more opportunistic approach to expansion. Locations pop up kind of randomly at times. It’s mostly interesting as a contrast to the Walmart map.

It should always be data first. Certain graphics get eyeballs because they show something that wouldn’t be seen in a table.

2. Software does everything

There are a lot of options for visualization, and the “best” one will change depending on who you ask.

Personally, I use a lot of R and have a lot of fun in Illustrator. More recently, I’ve been working with JavaScript. Flexibility is a huge plus for me, and I like to have full control over how my graphics look and how the interactive ones work. Most of what I do, though, is present data to a wider audience. If I were an analyst tasked with digging through a large dataset, I might take a different route before I make something custom.

My main point is that there is no one piece of software that will do everything for you.

Some software is good for analysis, some is good for specific types of analysis, and some is good for storytelling.

3. The more information in a single graphic, the better

A misstep a lot of people take when they’re trying to advance “beyond Excel” is to layer too much information on top of their basic graphic. I’m all for providing context and highlighting interesting spots in your data, but at some point it’s better to split your one chart into two or three charts.

Some people try to be clever by using multiple axes on a single plot or multiple visual cues in a single chart to save space. Again, this works sometimes. A lot of the time it doesn’t. Oftentimes, simple and clear is better than clever and compact.

My favorite test is to show a graphic to someone who doesn’t know the data and isn’t a visualization expert and see what they take away from the visual.

4. Visualization is too biased to be useful

There’s a certain amount of subjectivity that goes into any visualization as you choose what data to show and how to show it. By focusing on one part of the data, you might inadvertently obscure another. However, if you’re careful, get to know the data that you’re dealing with, and stay true to what’s there, then it should be easier to overcome bias.

After all, statistics is somewhat subjective, too. You choose what you analyze, what methods to use, and pick what to point out in reports.

News organizations, for example, have to do this all the time. They get a dataset and decide what story they want to tell (or find what story the data has to tell). Browse through graphics by The New York Times, and you can see how you can add a layer of information that objectively describes what the data is about.

5. It has to be exact

If you’re using visualization to show the exact value of every single data point, along with every standard error, you’re probably using it wrong. Accuracy is important. Yes. But visualization is less about the individual values and more about the distribution of them over time and space. You’re looking for (or showing) patterns. You’re comparing and contrasting.

If all you care about are individual data points, you might as well put them in a table.
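
A rough sketch of the difference, in Python with made-up numbers: the same values printed once as exact figures and once as a crude text histogram. The helper name bin_counts and the bin width of 10 are assumptions chosen for illustration, not part of any library. The first view tells you what each value is; the second shows where most of the values sit and which one is out on its own.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Group integer values into fixed-width bins.

    Returns a list of (lower_edge, count) pairs covering every bin from the
    lowest occupied one to the highest, including empty bins in between.
    """
    # Map each value to the lower edge of its bin, then tally per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lowest, highest = min(counts), max(counts)
    # Counter returns 0 for bins that no value fell into.
    return [(edge, counts[edge]) for edge in range(lowest, highest + bin_width, bin_width)]


# Made-up sample values, purely for illustration.
values = [12, 15, 21, 22, 24, 25, 27, 28, 29, 31, 33, 34, 36, 41, 58]

# View 1: the "table" of exact values.
print("Exact values:", ", ".join(str(v) for v in values))

# View 2: the shape of the data. Individual values are lost, the pattern is not.
for edge, count in bin_counts(values, 10):
    print(f"{edge:>3}-{edge + 9:<3} {'#' * count}")
```

Neither view is wrong. They just answer different questions, and only the second one answers "what does this data look like?"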