The Presentation of Blank Space

(Greetings to the residents of Tokelau who have taken an interest in this website. We’d love to hear from you.)

By The Metric Maven

Over my career as an engineer, I slowly took more and more interest in the presentation of data and numbers. For small sets of data, tables are often preferable to graphs. Edward Tufte states:

Tables are preferable to graphics for many small data sets. A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them… [1]

When constructing a table, I have often needed to contemplate the presentation of numbers before I design it, and often need to review it afterward. The problem is not with the numbers themselves, but with their presentation.

I’ve been exposed to graphic arts and printing for many decades, but when I was introduced to TeX I became much more interested in typesetting. Some typefaces are far more readable than others. The typeface known as Comic Sans is generally disparaged and has become something of a phenomenon. Helvetica is perhaps the most well-known typeface, and is ubiquitous. Some typefaces are known for their readability over long periods, but one very important aspect of creating a typeface and putting words on a page with it is the spacing between letters (known as glyphs). Adjusting the spacing between glyphs to produce a visually pleasing result is known as kerning.

In my view, this applies to numerical presentation as much as it does to prose presentation using a typeface. It was also of concern to the founders of the metric system:

At the time of the creation of the metric system in France, financiers and businessmen were increasingly separating whole numbers in sets of three with commas between. This made them easier to read. The triad grouping was adopted, but the comma was thought to be inelegant and confusing. Laplace and Lagrange stated: “…, it is hoped that the use of a comma to separate groups of thousands will be abandoned, or that other means be used for this purpose.” Other means were adopted, which is the small space between groups of thousands. [2]

It has been my experience that introducing commas can really obscure information. For instance, in my essay The Expanding Universe, the table presented shows the expected size of the universe over time:

I used full spaces to separate numerical triads in the table. The columns are easily seen in this case. Now here is the table with commas:

The comma “separators” act to perceptually unite the string of numerical glyphs rather than separate them as a space does. In the first table one can clearly pick out each column that goes with each metric unit as shown at the bottom.

The modern international standard eschews commas and adopts spaces, as was desired from the beginning. The numbers are to be separated into triads, or groups of three. Mr. Reid, a physicist and teacher, has a nice essay called Stop Putting Commas In Your Numbers. The amount of blank space separation is said to be a “thin space.” This is defined as a fifth of an em (or sometimes a sixth) for the Unicode Character THIN SPACE (U+2009). There is already a little waffling about the size of the space. Mr. Reid presents a helpful table that demonstrates his view:

The BIPM has this to say:

…for numbers with many digits the digits may be divided into groups of three by a thin space, in order to facilitate reading. Neither dots nor commas are inserted in the spaces between groups of three. However, when there are only four digits before or after the decimal marker, it is customary not to use a space to isolate a single digit. The practice of grouping digits in this way is a matter of choice; it is not always followed in certain specialized applications such as engineering drawings, financial statements, and scripts to be read by a computer.

This gets to the heart of this essay. I’ve always had difficulty deciding:

1) Whether, when there are four digits, it would be best to use a thousands space separator or not.

2) If I use a thousands space separator for a four-digit number, how large should this space be to provide the most aesthetic presentation?
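The BIPM guidance quoted above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names format_si and group_digits are hypothetical, the number is passed as a string of plain digits so that no floating-point rounding creeps in, and the decimal marker is assumed to be a dot.

```python
THIN_SPACE = "\u2009"  # Unicode THIN SPACE


def group_digits(digits: str, sep: str, from_right: bool) -> str:
    """Split a run of digits into triads joined by sep.

    Runs of four digits or fewer are returned unchanged, following the
    BIPM custom of not isolating a single digit.
    """
    if len(digits) <= 4:
        return digits
    if from_right:
        # Whole-number part: the leftmost group may hold 1, 2 or 3 digits.
        head = len(digits) % 3 or 3
        groups = [digits[:head]]
        groups += [digits[i:i + 3] for i in range(head, len(digits), 3)]
    else:
        # Fractional part: triads are counted from the decimal marker.
        groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return sep.join(groups)


def format_si(value: str, sep: str = THIN_SPACE) -> str:
    """Format a plain decimal string such as '12345.67891' with triads."""
    sign = "-" if value.startswith("-") else ""
    whole, _, frac = value.lstrip("-").partition(".")
    out = sign + group_digits(whole, sep, from_right=True)
    if frac:
        out += "." + group_digits(frac, sep, from_right=False)
    return out


print(format_si("102652"))       # thin space between 102 and 652
print(format_si("9999"))         # four digits: left as a single run
print(format_si("12345.67891"))  # grouped on both sides of the marker
```

Passing sep=" " gives the full-space version, which makes it easy to compare the two styles side by side.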

There does not seem to be a single definition of a thin space. Merriam-Webster claims it is either a fourth of an em or a fifth of an em. Others say a sixth of an em. In the end the choice may come down to kerning. In the TeX typesetting language, the \thinspace command is defined as \kern .16667em, or one-sixth of an em.
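Unicode reflects the same lack of agreement by offering several narrow space characters. The short Python sketch below lists a few candidates by their official names using the standard unicodedata module. The choice of candidates is mine, and the width actually rendered for each one depends on the font, so this shows what is available rather than what any of them will look like.

```python
import unicodedata

# Candidate separators, chosen for illustration only.
CANDIDATES = ["\u0020", "\u2005", "\u2009", "\u2006", "\u202f"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in CANDIDATES:
    # Show each candidate between two digit groups for visual comparison.
    print(f"{describe(ch):32} 12{ch}345")
```

The NARROW NO-BREAK SPACE (U+202F) is worth a look when a number must not be split across lines.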

It appears that in the tables above, which have multiple groups of metric triads, a full space is aesthetic and the data is very accessible to the eye. It is when the data in a table does not go beyond five digits that I’ve been hard pressed to decide how best to display it. Below I have taken the data for energy use in the US for 2016 and presented it with full space, thin space and no space thousands separators:

The full space thousands separator seems a bit awkward: so much blank space slices each number that its parts look like separate values. The thin space is probably the best amount of separation in this situation. The four-digit values still read as single entities, yet the separator also works with the larger numbers. Using no space seems a bit disjointed, but in practice it is often difficult to produce a thin space, so the alternative of using no spaces up to 9999 might be a good option.
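For readers who want to try this comparison on their own data, here is a rough Python sketch that prints one column per style. The values are made up for illustration and are not the 2016 energy figures, the helper name group is hypothetical, and the third column reflects my reading of “no spaces up to 9999”: four-digit numbers are left alone while longer ones still get a full space.

```python
THIN = "\u2009"  # Unicode THIN SPACE


def group(n: int, sep: str, min_digits: int = 4) -> str:
    """Group a non-negative integer into triads joined by sep.

    Numbers with fewer than min_digits digits are left ungrouped.
    """
    s = str(n)
    if len(s) < min_digits:
        return s
    # Let Python do the triad grouping, then swap in the chosen separator.
    return f"{n:,}".replace(",", sep)


values = [610, 2480, 8420, 14200, 35900]  # illustrative values only

print(f"{'full':>8} {'thin':>8} {'none<=9999':>10}")
for v in values:
    full = group(v, " ")
    thin = group(v, THIN)
    none = group(v, " ", min_digits=5)
    print(f"{full:>8} {thin:>8} {none:>10}")
```

In a monospaced terminal the thin space may render at full width, so the difference is best judged in a proportional typeface.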

The above table is in a random order of values. When it is ascending, the table can look quite different:

When presented this way, the thin space column and the no space column have a similar aesthetic, and when it is not possible to use a thin space, no space for the four-digit numbers looks good. The table can look different when the lines are removed between rows:

One might now prefer the full space column to the thin space column. It would probably be best to remove most of the rules, as some typographers often argue.

Tufte would probably recommend a table like this:

In this case, one might like the full space column the best.

There is no real right and wrong way to do this, just more appealing and less appealing ways, and appeal is a very difficult value to measure. We each must find our balance between the aesthetics of numerical presentation and the clear presentation of information.

[1] Tufte, Edward, The Visual Display of Quantitative Information, Graphics Press, 1983, pg. 178

[2] Bancroft, Randy, The Dimensions of The Cosmos, Outskirts Press, 2016, pg. 9

If you liked this essay and wish to support the work of The Metric Maven, please visit his Patreon Page

The Metric Maven has published a book titled The Dimensions of The Cosmos. It examines the basic quantities of the world from yocto to Yotta with a mixture of scientific anecdotes and may be purchased here.


US Scientists Not Using The Metric System

By The Metric Maven

Bulldog Edition

A Vox article, American energy use, in one diagram, shows that US scientists using the metric system are Mormons Making Coffee, without adding any coffee. A diagram is presented for 2016 energy use in the United States:

The units used are quadrillion BTUs. BTUs are not even a well-defined unit. It is stated that a BTU is about 1055 joules. A quadrillion is 10^15, the same factor as the metric prefix Peta-, so a quadrillion BTUs is about 1055 Petajoules. The chart has this run-down for energy consumed in the US:
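The conversion is simple enough to write down as a small Python sketch. The constant and function names are mine, and the figure of 1055 joules per BTU is the approximate value quoted above, not an exact definition.

```python
J_PER_BTU = 1055          # approximate value quoted in the text
BTU_PER_QUAD = 10**15     # a quad is one quadrillion BTUs
J_PER_PETAJOULE = 10**15  # the prefix Peta- means 10^15


def quads_to_petajoules(quads: float) -> float:
    """Convert an energy value in quads to Petajoules."""
    return quads * BTU_PER_QUAD * J_PER_BTU / J_PER_PETAJOULE


# The two factors of 10^15 cancel, leaving only the joules-per-BTU factor.
print(quads_to_petajoules(1))
```

Because the quadrillion and the Peta- prefix cancel, converting any value on the chart amounts to multiplying it by 1055.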

Because the energy values are BTUs nested inside of a name called a Quad, this is even worse than using Olde English prefixes. The actual energy unit is hidden in a nickname. Clearly it could be worse: the different energy sources could be a mixture of kWh, “metric tons” of coal, and so on. The Quad is simply an argot, used by insiders to make what they do less transparent. See my essay, John and the Argot-nauts. The author of the article tries to put a Quad in perspective by offering this list of Quad equivalents:

A “quad” is one quadrillion (a thousand trillion) BTUs. Here, according to Wikipedia, are a few things equivalent to a quad:

8,007,000,000 gallons (US) of gasoline
293,071,000,000 kilowatt-hours (kWh)
36,000,000 metric tons of coal
970,434,000,000 cubic feet of natural gas
25,200,000 metric tons of oil

So a quad is a lot of energy. The US consumed 97.3 quads in 2016, an amount that has stayed roughly steady (within a quad or so) since 2000.

This list of units seems to ask a reader to add apples, oranges, grapes, strawberries, blueberries and then compare the sum to bananas. In the metric system we choose but one fruit for comparison. In this case the choice of Petajoules will produce integer comparison values for the smallest and the largest values.

If we use Naughtin’s Laws to rewrite this list in metric we obtain:

Total Energy: 102 652 Petajoules (without rounding)

The data is presented in all integers and the numbers are easily comparable. Solar and Geothermal do not contribute much to the total, but Natural Gas, Coal, and Petroleum do. Even in the US, a joule is almost certainly a more recognizable energy unit than a Quad, as is the metric prefix modifier Peta- (Petabytes of data storage). The units are suppressed in the original diagram, so we could indicate all values are in Petajoules (PJ) and simplify the table further:

The article notes that most people immediately notice the amount of wasted energy, which is about two thirds according to the article, or about 68 435 Petajoules.

The same diagram from 1970 is presented, also in Quads:

It shows that in 1970 we generated about 71 213 Petajoules of energy and wasted 32 178 Petajoules. Wow, we now officially waste about as much energy as we generated in 1970!
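The comparison can be checked with a few lines of arithmetic in Python. The 2016 total of 102 652 Petajoules and the two-thirds waste fraction are the figures given earlier in this essay; the variable names are mine, and the results are only as good as those rounded inputs.

```python
total_2016_pj = 102_652
wasted_2016_pj = round(total_2016_pj * 2 / 3)  # about two thirds wasted

generated_1970_pj = 71_213
wasted_1970_pj = 32_178

# 2016 waste expressed as a fraction of everything generated in 1970.
ratio = wasted_2016_pj / generated_1970_pj

# Fraction of the 1970 energy that was wasted.
share_1970 = wasted_1970_pj / generated_1970_pj

print(f"2016 waste: {wasted_2016_pj} PJ")
print(f"2016 waste / 1970 generation: {ratio:.2f}")
print(f"1970 waste share: {share_1970:.2f}")
```

With every value in the same unit, the comparison is a single division, which is the point of choosing one unit in the first place.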

In 1950 the total generated energy was 32 810 Petajoules, of which about half was

The larger point is that scientists at LLNL continue to express energy values the same way they did in 1950. There is also a strange implicit assumption that if the values are presented in pre-metric units, they will somehow be better understood by the public. This is probably just a rationalization for using internal argot to express these values. One can only speculate why there has never been a change. One thing that is certain is that the complexity of our energy generation in the US has changed significantly since the 1950s. The 1950 diagram has four energy inputs; today we have nine. To best understand this information, one should examine how it has been presented in the past and consider a simpler, more intuitive way of expressing this data. The metric system would be a good start, and reading Edward Tufte might be the next step for government scientists to investigate better ways to express this data, assuming they actually want to, not just for public understanding, but for scientists, engineers and others.

Thanks to Peter Goodyear for bringing this article to my attention.

Related Articles:

Joule in The Crown

John and The Argot-Nauts
