Insights in Visualizing Populations-Part 1
Why do we care about populations? In any organization, there is often a need to compare groups. For example, these could be groups of our customers, clients, products, orders, suppliers, or employees. Thus, someone might ask: "What is the average order size from our customers in the Northeast versus our customers in the Southeast?" In the case below, we ask the question: "How does the average salary of our male and female employees compare at each Job Level?" As with every business intelligence question, we will see that it is critically important to carefully select both the metric and the method used to communicate that metric to the mind of management.
A Case Study in Salary Equity Analysis
Let's assume you were recently promoted to VP of Human Resources of a Fortune-500 company. That's the good news. The bad news is that tomorrow morning you have a meeting with a delegation from the EEOC. That is the Equal Employment Opportunity Commission, the agency of the United States Government that enforces the federal employment discrimination laws. It seems that someone has filed a complaint that your company discriminates by paying lower salaries to women than it does to men.
To prepare for this meeting, you ask your business intelligence staff to prepare a graphic presentation for you. In an effort to fulfill your information request, two different analysts have each sent you a chart. The data, presented in Figures 1 and 2, is real data from a Fortune-500 company, though masked to hide the identity of that organization.
Which chart would you rather show at tomorrow morning's meeting, Figure 1 or Figure 2? Before you read on, make a decision.
Figure 1
Figure 2
After seeing these two charts, most people quickly decide that Figure 1 looks much better. After all, at each Job Level the pairing of the average male and female salaries seems to be rather close. In Figure 2, there is a very visible sawtooth effect. It is quite evident in Figure 2 that there is a notable disparity between the average male and female salaries at most Job Levels.
Those who make this quick, and often firm, decision are invariably quite surprised to learn that the data in Figures 1 and 2 are identical. There is absolutely no difference between the data in the two figures!
Now, many readers are probably scratching their heads and thinking: there are only three dimensions here, and a total of only 14 numbers on the entire chart. What is going on to muddle our perceptions?
To solve the problem, let's look at the dimensions. In both Figures 1 and 2, the three dimensions are:
- Salary, a ratio dimension, on the vertical axis
- Job Level, an ordinal dimension, on the horizontal axis
- Gender (M/F), a nominal dimension, also on the horizontal axis
Job Level is what we call the 'primary sort,' as its sequence is sorted first across the horizontal axis. It goes from the lowest Job Level on the left to the highest Job Level on the right. This is an expected ordering, and we would be surprised to see it reversed.
Gender is the 'secondary sort' in both figures. There is no expected ordering here. Figure 1 happens to present the 'gallant' sort, with 'ladies before gentlemen.'
Across the primary sort (Job Level) in both figures there is an expected tendency for the average salary to be higher for employees at a higher Job Level. In Figure 1, with the gallant sort for Gender, the average female salary always comes first, and is shown to the left, with the average male salary to the right in each pair at the same Job Level. The overpowering trend of the primary sort leads us to expect the average salary to rise as the eye moves across the chart to the right. As a result, the fact that the average male salary is clearly higher than the average female salary at almost every Job Level is hardly discernible.
When the sequence of the secondary sort is changed, as shown in Figure 2, the female salary, which is now to the right of the male salary in each Job Level pair, seems conspicuously low. The average female salary in each pair, being lower on the right, is bucking the trend of the primary sort, and the sawtooth effect is glaring to the eye. The only difference between the two figures is the seemingly innocuous sequence of this secondary sort. Those with effective graphicacy skills know to look for this potential problem whenever the secondary sort is a nominal dimension, that is, a named dimension with no standard ordering sequence.
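The effect of the secondary sort can be sketched in a few lines of Python. This is only an illustration: the salary figures, the four job levels, and the helper names `bar_sequence` and `count_drops` are all made up for the example and are not the data behind Figures 1 and 2. The sketch lays out the bars left to right under each ordering and counts how often a bar is lower than its left-hand neighbor, which is a rough stand-in for the sawtooth the eye picks up.

```python
# Hypothetical average salaries per Job Level (illustrative values only,
# NOT the masked Fortune-500 data shown in Figures 1 and 2).
AVG_SALARY = {
    1: {"F": 30_000, "M": 32_000},
    2: {"F": 38_000, "M": 41_000},
    3: {"F": 47_000, "M": 52_000},
    4: {"F": 60_000, "M": 66_000},
}


def bar_sequence(avg_salary, secondary_order):
    """Bar heights left to right: Job Level is the primary sort,
    Gender (in the given order) is the secondary sort."""
    return [
        avg_salary[level][gender]
        for level in sorted(avg_salary)
        for gender in secondary_order
    ]


def count_drops(bars):
    """Count how many bars are lower than the bar immediately to their left."""
    return sum(1 for left, right in zip(bars, bars[1:]) if right < left)


female_first = bar_sequence(AVG_SALARY, ("F", "M"))  # the 'gallant' sort
male_first = bar_sequence(AVG_SALARY, ("M", "F"))    # the reversed secondary sort

# Both orderings contain exactly the same values.
print(sorted(female_first) == sorted(male_first))
print("drops, female first:", count_drops(female_first))
print("drops, male first:  ", count_drops(male_first))
```

With these made-up numbers, the female-first ordering rises steadily from left to right, while the male-first ordering drops once inside every Job Level pair, even though the underlying values are identical.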
Now that you know about this issue:
- Which figure gives a fair representation of the reality in the data?
- Which figure would you show in your presentation to the delegation from the EEOC?
- Which figure would you use internally to help management fix a salary equity problem that you really want to fix?
Advocacy Graphics:
The questions above lead to the realm of what I call 'Advocacy Graphics': data visualizations consciously selected to advocate for a particular position. In the true case described above, the original creation of the images in Figures 1 and 2 was an innocent mistake, made entirely by accident. The way they may be used, though, invokes the issue of 'advocacy.' If you really were this VP of Human Resources, would you be faulted for using Figure 1 in your EEOC meeting and Figure 2 at your next internal management meeting?
Sadly, the worst advocacy graphics are created by those using some graphicacy knowledge for nefarious purposes. The only way to fight back is to be trained in graphicacy, to spot any such effort. Increasingly, presentations are being made using live data. The days of overhead transparencies or 35mm slides are long behind us. What will keep people from using underhanded advocacy graphics techniques is the fear of someone in the audience asking, "Can you please change that secondary sort, now, while we watch?" As more managers become sensitized to graphicacy issues such as this (and other graphicacy issues as well) we can expect a decrease in advocacy graphics and an increase in the number of decisions made dispassionately, based on a full understanding of the reality in the data.
Is this the best metric to address the Salary Equity question?
Beyond advocacy graphics, if you truly want to understand the reality in the data, is 'average' the appropriate metric to use? Averages are commonly used in many organizations because they are generally easy to calculate and seemingly easy for an audience to understand. Reflecting back on Figures 1 and 2 we can consider what it took to get the 14 numbers presented there.
Each employee on the corporate payroll was sorted into a virtual bucket of Job Level, and each bucket was sorted into male and female sub-buckets. Then the salaries for all of the females in each sub-bucket were added up and the sum was divided by the number of women in that sub-bucket. This produced the average salary for all of the women at that Job Level. The same thing was repeated for the men at that Job Level, and again for every other Job Level. With more than 50,000 employees in the data pool used in Figures 1 and 2, it was good to have a computer to add, count, and divide.
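The bucket-and-average process described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the employee records, the tuple layout `(job_level, gender, salary)`, and the function name `average_salary_by_group` are assumptions made for the example, not the system actually used to produce the figures.

```python
from collections import defaultdict

# Hypothetical employee records: (job_level, gender, salary).
# Illustrative values only, not the article's data.
employees = [
    (1, "F", 30_000), (1, "F", 34_000), (1, "M", 33_000),
    (2, "F", 40_000), (2, "M", 42_000), (2, "M", 46_000),
]


def average_salary_by_group(records):
    """Sort each record into a (job_level, gender) sub-bucket,
    then add, count, and divide to get the average per sub-bucket."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, gender, salary in records:
        key = (level, gender)
        totals[key] += salary   # add
        counts[key] += 1        # count
    return {key: totals[key] / counts[key] for key in totals}  # divide


averages = average_salary_by_group(employees)
for (level, gender), avg in sorted(averages.items()):
    print(f"Job Level {level} {gender}: {avg:,.0f}")
```

Each Job Level and Gender pair yields one number, so seven Job Levels would give the 14 values mentioned earlier.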
The above process was a technical challenge for the mid-20th century. How can we do better now? Our next Data Visualization column, "Insights in Visualizing Populations-Part 2," will take this process to a higher level and present a 21st century visualization opportunity.