Data Normalization

Normalize or not?

A key decision in thematic mapping is whether to use the data in raw (non-normalized) form (e.g. the population of each country) or in normalized form (e.g. population per square kilometer of territory). In the first case, the map shows how many people live in each country; in the second, it shows how densely populated the territory is. The main reason for normalizing data is to make very different territories comparable. For example, you can directly compare a large country like Canada with a small country like Switzerland. Although Canada has a much larger population than Switzerland, its population density is much lower. Without normalizing the data, this fact would be hard to see.

Note: If you are going to create a choropleth, use ONLY normalized data.

To summarize: if you want your users to see magnitude (absolute totals), use raw (non-normalized) data. If, however, you want to show relative differences (which already account for things like area size), use normalized data.

Is my data already normalized?

It very well may be! If the description of the numerical data includes “x per square kilometer/mile/…”, “x per capita”, “percentage”, or “x / y ratio”, you can skip the data normalization step.

How to normalize your data

The main ways to normalize data are to divide it by (1) the area that the data corresponds to, producing data of the form “x per square kilometer/mile”; or (2) the number of people within that area, producing data of the form “x per capita” or “x as % of total population”.
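The two divisions described above can be sketched in a few lines of Python. This is only a minimal illustration: the region names, the figures, and the helper names (per_area, per_capita, percent_of_population) are all hypothetical and are not taken from any real dataset or library.

```python
def per_area(value, area_km2):
    """Normalize a raw count by area: 'x per square kilometer'."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return value / area_km2


def per_capita(value, population):
    """Normalize a raw count by population: 'x per capita'."""
    if population <= 0:
        raise ValueError("population must be positive")
    return value / population


def percent_of_population(value, population):
    """Express a raw count as 'x as % of total population'."""
    return 100.0 * per_capita(value, population)


# Hypothetical figures, invented purely for illustration.
regions = {
    "Region A": {"population": 1_000_000, "area_km2": 50_000, "cases": 2_500},
    "Region B": {"population": 200_000, "area_km2": 2_000, "cases": 1_000},
}

# (1) Divide by area: population density in people per square kilometer.
density = {
    name: per_area(r["population"], r["area_km2"])
    for name, r in regions.items()
}

# (2) Divide by population: cases as a percentage of total population.
case_percent = {
    name: percent_of_population(r["cases"], r["population"])
    for name, r in regions.items()
}

for name in regions:
    print(name, density[name], case_percent[name])
```

In this made-up example, Region A has the larger raw totals, but Region B comes out higher on both normalized measures, which is the kind of reversal that only becomes visible after normalization.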

The basics of data classification

When to use

If you are going to classify your data, you must decide on both the number of classes and the method of partitioning the data into intervals (classes). There are many different ways to classify data systematically; below we discuss their advantages and disadvantages.

The point of classification is to reduce a large number of observations by grouping them into a few intervals or classes. Why? Because it is much easier for users to perceive several well-defined classes than “raw” data. If the classification is done well, it helps to convey the message of the map in a much simpler and clearer way. However, classification is not an easy process, and very often it is not possible to find the “ideal” method for a particular data set at the first attempt. It is always important to understand the data you are working with and not just apply your “favorite” classification method. An inappropriate classification method can create false patterns on a map that have little to do with the real geographic phenomenon you’re trying to visualize. Maps that use questionable classification methods are not just ineffective; they are misleading.
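To make this concrete, here is a small Python sketch that groups the same observations into three classes in two different ways. The two methods shown, equal interval and quantile, are common choices picked here only for illustration; the data values and function names are hypothetical.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes):
    """Split the range [min, max] into n_classes intervals of equal width.

    Returns the upper bounds of every class except the last.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]


def quantile_breaks(values, n_classes):
    """Put (roughly) the same number of observations into each class.

    Returns the upper bounds of every class except the last.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_classes - 1] for i in range(1, n_classes)]


def classify(value, breaks):
    """Return the 0-based class index; each break is an inclusive upper bound."""
    return bisect_left(breaks, value)


# Hypothetical skewed data: eight small values and one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

equal_classes = [classify(v, equal_interval_breaks(values, 3)) for v in values]
quantile_classes = [classify(v, quantile_breaks(values, 3)) for v in values]

print(equal_classes)     # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(quantile_classes)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

With equal intervals, the outlier pushes every other value into the lowest class and leaves the middle class empty; with quantiles, the same data spreads evenly across all three classes. Neither result is automatically "right", which is why it matters to look at the data before choosing a method.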

Classification is important because grouping data is one of the most fundamental aspects of map generalization, the process of simplifying the real world to fit the framework of a map canvas. Even small differences in this process can dramatically change the appearance of a map and its message. Despite this, users rarely give it much thought and do not question the classes they are shown, and yet this is one of the easiest ways to “cheat with maps”, whether intentionally or out of ignorance. Nevertheless, classification is very useful and is a basic skill in creating thematic maps.
