Tips for grouping data

There are several options for how you group -- or don't group -- the data you've defined in your data series. How do you know what to pick?

We encourage you to experiment a bit with the different options to see the various stories you can tell by aggregating your data to different attributes. However, we also want you to understand when and why you might use some of these options. First, let's look at a table of when you can use each option:

Grouping Option | User-uploaded dataset | IHS Polk dataset | Survey-style dataset (e.g., Nielsen Scarborough)
Define Data Groupings - single attribute | Supported | Supported | Supported
Define Data Groupings - flexible groupings | Not supported | Not supported | Supported
Show Totals | Supported | Supported | Supported

Table 1-1 Supported Grouping Options

When to use Define Data Groupings - Single Attribute

  • When your dataset does not support flexible groupings
  • When you want to aggregate to only a single attribute
  • When you want to aggregate to all of the values within an attribute (this is often true with attributes that have very few values, such as sex of respondent, or that have a small number of highly relevant values, such as level of education or race)
  • When you want to explore the data; once you find information you want to dig into, use flexible groupings to refine your results (assuming your dataset supports flexible grouping)

When to use Define Data Groupings - Flexible Groupings

First, let's define what flexible groupings means. Whereas in the previous option you were restricted to aggregating to all of the values in a single attribute, with flexible grouping, you can:

  • Select only certain answers from an attribute (question): Previously, all answers were selected, which often placed unwanted data on visualizations, especially when there were answers like "none."
  • Select answers from different attributes (questions): Previously, you could aggregate to only a single attribute. Now, you can select as many different attributes -- and as many (or as few) answers for each attribute -- as you want.
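
To make the difference concrete, here is a minimal Python sketch on a toy survey-style table. The records, the attribute names, and both helper functions are invented for illustration; they are not part of the product or of any real dataset, and the product may compute its groupings differently.

```python
from collections import Counter

# Hypothetical survey records: one answer per attribute (question) per respondent.
respondents = [
    {"exercise": "jogging", "diet": "organic food"},
    {"exercise": "none", "diet": "organic food"},
    {"exercise": "yoga", "diet": "none"},
    {"exercise": "jogging", "diet": "none"},
]


def group_by_single_attribute(rows, attribute):
    """Aggregate to every value of one attribute, wanted or not."""
    return Counter(row[attribute] for row in rows)


def group_by_flexible_selection(rows, selection):
    """Count only the chosen answers, drawn from any number of attributes.

    `selection` maps an attribute name to the set of answers to keep.
    """
    counts = Counter()
    for row in rows:
        for attribute, answers in selection.items():
            if row[attribute] in answers:
                counts[(attribute, row[attribute])] += 1
    return counts


# Single attribute: the unwanted "none" answer shows up alongside the others.
single = group_by_single_attribute(respondents, "exercise")

# Flexible grouping: selected answers from two attributes, with "none" left out.
flexible = group_by_flexible_selection(
    respondents,
    {"exercise": {"jogging", "yoga"}, "diet": {"organic food"}},
)
```

In this sketch, the single-attribute result includes a bucket for "none", while the flexible result contains only the selected answers and spans both attributes.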

So, when would you want to use this option? Let's look at an example. Steve is researching whether certain demographics participate in healthy behaviors in a market; his research will help his healthy-lifestyle-oriented client determine how to position its ads. Steve is using the Nielsen Scarborough USA+ dataset, which scatters information about healthy living across multiple categories and questions. He can find relevant answers in questions like "Activities Past 12 Months," "Eco-Friendly Activity Activities Done on a Regular Basis," "Food Products HouseHold Used Past 7 Days," and "Lifestyle Characteristics," just to name a few. Each of these questions has many answers, and only some apply to his healthy-lifestyle research. Steve carefully picks only those answers from the questions and ends up with a flexible grouping that looks similar to this:

Figure 35: An example of flexible groupings

When he creates his visualization, it might look similar to this (assuming he created data series defining his target demographic, as well as a context series to compare this market's behavior to the nation). He can quickly see how health-oriented this particular target demographic is and use the findings in his client research.

Figure 36: The resulting visualization, using flexible groupings

When to use Show Totals

Use Show Totals when:

  • You have carefully crafted multiple data series to include exactly the attributes you want to examine
  • You do not want to further break out your data by applying a grouping
  • You are interested in knowing only the net totals for the data series you have defined and comparing them across the series

For example, let's imagine that Nicole wants to know specifically how many women in certain income and age brackets purchased a Honda in 2015 in the Atlanta DMA. She carefully creates four different data series that will allow her to compare the total number of registrations for each target demographic she is interested in. Her data series might look something like this:

Figure 37: The four data series that Nicole constructed; each is very detailed

The default visualization is a table. If Nicole adds a table for each data series, she can quickly see that among Atlanta women aged 25-44, those who make between $50,000 and $100,000 a year account for significantly more Honda registrations than any other defined target demographic.

Figure 38: The default table visualization when you choose the Show Totals grouping option

You can use the Show Totals option with table, column, and bar chart visualizations, and you can use one of the data series as an index. If you use a table visualization, there is only a single row, with an optional Net Total row. The Proportion column will always have a value of 100% because you are showing a single record: the total count of everything defined in your data series. Likewise, if you've defined a data series to be used as an index, the Index column will also have a value of 100%.

Figure 39: A table and a bar chart showing the output from the Show Totals option; the table contains Proportion and Index columns
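
The 100% values described above follow from simple arithmetic. The sketch below assumes that Proportion is a row's share of its data series total and that Index is a series' proportion divided by the index series' proportion, times 100. Those formulas, the function names, and the totals are assumptions made for illustration; this page does not state the product's exact calculations.

```python
def proportion(row_count, series_total):
    """Share of the series total held by one row, as a percentage (assumed formula)."""
    return 100.0 * row_count / series_total


def index(series_proportion, base_proportion):
    """Series proportion relative to the index series' proportion (assumed formula)."""
    return 100.0 * series_proportion / base_proportion


# Hypothetical totals for a target data series and the series used as the index.
target_total = 1250
base_total = 48000

# With Show Totals, each series has a single row that holds its entire total.
target_proportion = proportion(target_total, target_total)
base_proportion = proportion(base_total, base_total)

# Both proportions are 100, so their ratio is 100 as well.
target_index = index(target_proportion, base_proportion)
```

Under these assumptions, the single row always equals the whole series, so Proportion cannot be anything other than 100%, and the Index inherits that value.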

Three caveats about using Show Totals:

  • Sample size becomes critical here. It's possible to create such a specific data series that only a few -- or no -- respondents match it (for example, a data series that specifies roller-skating nuns aged 65-99 in the ZIP Code 15213 most likely has no data associated with it because it's too specific).
  • If you set the Show Totals option while creating a template, the option is locked, and every presentation created from that template uses it. Likewise, if you create a template with Define Data Groupings selected, you cannot select Show Totals in any presentation you create from the template.
  • Weighted calculations are not applied when using Show Totals. This means that the value you get might differ from the value you'd get if you grouped your data series instead (for example, if you decided to aggregate them by ZIP Code or another attribute).
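
To see why an unweighted total can differ from a grouped, weighted result, consider the following sketch. It assumes one common meaning of weighting in survey data, where each respondent carries a projection weight; this page does not describe how the product applies weights, and the records and weights below are invented.

```python
# Hypothetical respondents matching a data series: (ZIP Code, projection weight).
matches = [("30301", 1.5), ("30301", 0.5), ("30305", 2.5)]

# Unweighted: every matching respondent counts once.
unweighted_total = len(matches)

# Weighted and grouped by ZIP Code: each respondent contributes their weight.
weighted_by_zip = {}
for zip_code, weight in matches:
    weighted_by_zip[zip_code] = weighted_by_zip.get(zip_code, 0.0) + weight

# Sum of the weighted groups, for comparison with the unweighted total.
weighted_total = sum(weighted_by_zip.values())
```

Here the unweighted total is 3, while the weighted groups sum to 4.5, so the two views of the same data series do not agree.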

Related information

Group (or aggregate) your data

Target series

Examine data in your presentation


© 2016, Rhiza, Inc. All rights reserved. Last updated May 23, 2016 03:49:16 PM.