Chapter [ ]: Visualization

What is your favorite data visualization book or blog? And why?

How would you design a chart or graph for a color-blind audience?

Explain Edward Tufte's concept of "chart junk."

Chartjunk refers to all visual elements in charts and graphs that are not necessary to comprehend the information represented on the graph, or that distract the viewer from this information.

The term chartjunk was coined by Edward Tufte in his 1983 book The Visual Display of Quantitative Information.


Tufte writes: "an unintentional Necker Illusion, as two back planes optically flip to the front. Some pyramids conceal others; and one variable (stacked depth of the stupid pyramids) has no label or scale."

Here is a more modern example from exceluser, where the column plot is very hard to read because workers and cranes obscure the columns.

The problem with such decorations is that they force readers to work much harder than necessary to discover the meaning of the data.

Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How do you efficiently represent 5 dimensions in a chart (or in a video)?

There are many good tools for data visualization. R, Python, Tableau, and Excel are among the most commonly used by data scientists.
There are many ways to represent more than 2 dimensions in a chart. A third dimension can be shown with a 3D scatter plot that can be rotated. Further dimensions can be encoded with color, shading, shape, and size. Animation can be used effectively to show a time dimension (change over time).

Here is a good example.

5-dimensional scatter plot of Iris data, with size: sepal length; color: sepal width; shape: class; x-column: petal length; y-column: petal width, from here.
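The encoding in that caption can be sketched in Python. This is a minimal sketch, not the code behind the figure: the function names, the marker choices, the size range, and the handful of sample rows are all assumptions made for illustration, and the plotting step assumes matplotlib is installed.

```python
# Illustrative rows in the Iris layout (an assumption, not the full data set):
# (sepal_length, sepal_width, petal_length, petal_width, species)
SAMPLE_ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
]

# Hypothetical shape assignment: one matplotlib marker per class.
MARKERS = {"setosa": "o", "versicolor": "s", "virginica": "^"}


def scale_sizes(values, smallest=30.0, largest=300.0):
    """Linearly map values onto marker areas between smallest and largest."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [smallest + (v - lo) / span * (largest - smallest) for v in values]


def encode(rows):
    """Group rows by class and split each group into x, y, size, and color."""
    sizes = scale_sizes([row[0] for row in rows])  # size: sepal length
    groups = {}
    for row, size in zip(rows, sizes):
        _, sepal_width, petal_length, petal_width, species = row
        group = groups.setdefault(
            species, {"x": [], "y": [], "size": [], "color": []}
        )
        group["x"].append(petal_length)    # x-column: petal length
        group["y"].append(petal_width)     # y-column: petal width
        group["size"].append(size)
        group["color"].append(sepal_width)  # color: sepal width
    return groups


def plot_five_dimensions(rows, path="iris_5d.png"):
    """Draw one scatter call per class so each class gets its own marker shape."""
    # Imported here so the encoding helpers above work without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    groups = encode(rows)
    all_colors = [c for g in groups.values() for c in g["color"]]
    fig, ax = plt.subplots()
    for species, g in groups.items():
        points = ax.scatter(
            g["x"], g["y"], s=g["size"], c=g["color"],
            marker=MARKERS[species], cmap="viridis",
            vmin=min(all_colors), vmax=max(all_colors),  # one shared color scale
            label=species,
        )
    fig.colorbar(points, ax=ax, label="sepal width")
    ax.set_xlabel("petal length")
    ax.set_ylabel("petal width")
    ax.legend(title="class")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling plot_five_dimensions(SAMPLE_ROWS) writes the chart to iris_5d.png. Position carries the two variables that matter most, since readers judge position more accurately than size or color.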

For more than 5 dimensions, one approach is Parallel Coordinates, pioneered by Alfred Inselberg.

Iris data in parallel coordinates
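The idea behind parallel coordinates can be sketched in a few lines of Python: every variable gets its own vertical axis, every value is rescaled to that axis's range, and every row becomes one line crossing all the axes. This is a minimal sketch under assumptions (the function names and the min-max scaling are illustrative, and the plotting step assumes matplotlib is installed); it is not the code behind the figure.

```python
def normalize_columns(rows):
    """Min-max scale every column to [0, 1] so all axes share one vertical range."""
    scaled_columns = []
    for column in zip(*rows):
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # a constant column maps to 0.0 everywhere
        scaled_columns.append([(value - lo) / span for value in column])
    # Transpose back so each inner list is one row, i.e. one polyline.
    return [list(row) for row in zip(*scaled_columns)]


def plot_parallel_coordinates(rows, labels, axis_names, path="parallel.png"):
    """Draw each row as a polyline across one vertical axis per variable."""
    # Imported here so normalize_columns works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    lines = normalize_columns(rows)
    classes = sorted(set(labels))
    # "C0", "C1", ... are matplotlib's default color-cycle names.
    colors = {name: f"C{i}" for i, name in enumerate(classes)}
    positions = list(range(len(axis_names)))

    fig, ax = plt.subplots()
    for line, label in zip(lines, labels):
        ax.plot(positions, line, color=colors[label], alpha=0.6)
    for x in positions:
        ax.axvline(x, color="grey", linewidth=0.5)  # the parallel axes
    ax.set_xticks(positions)
    ax.set_xticklabels(axis_names)
    ax.set_yticks([0, 1])
    ax.set_yticklabels(["min", "max"])
    fig.savefig(path)
    plt.close(fig)
    return path
```

If pandas is available, pandas.plotting.parallel_coordinates draws a similar chart directly from a DataFrame, although it does not rescale the columns for you.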


Of course, when you have a lot of dimensions, it is best to reduce the number of dimensions or features first.
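One very simple way to cut the number of features before plotting is to keep only those that vary the most. The sketch below is an assumption-laden illustration, not a recommended pipeline: the function name is hypothetical, and ranking by variance after min-max scaling is just one crude heuristic. Projection methods such as PCA are another common option.

```python
from statistics import pvariance


def top_variance_features(rows, names, keep=2):
    """Return the names of the `keep` features with the highest scaled variance."""
    scored = []
    for name, column in zip(names, zip(*rows)):
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # a constant feature gets variance 0
        # Scale to [0, 1] first so features in large units do not win by default.
        scaled = [(value - lo) / span for value in column]
        scored.append((pvariance(scaled), name))
    scored.sort(reverse=True)  # highest variance first
    return [name for _, name in scored[:keep]]
```

The selected names can then be used to pick the columns to plot. A constant feature scores zero and is dropped first, which is usually what you want, since it carries no information a chart could show.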
