Categorizing Numeric Data

DataCracker allows you to categorize your numeric data.  Your survey may have asked a question such as "How many years has your business been operating?" or "How old are you?" and asked the respondent to move a slider or enter a number as their answer.  DataCracker will normally infer that this data is Numeric.  If the data is Text, however, you will first need to convert it to Numeric using DATA MANIPULATION > Structure > Average.  Categorizing is the act of allocating the numbers into categories defined as numeric ranges.

This is also known as:

  • The act of bucketing, where buckets are the categories.
  • The act of bracketing, where brackets are the categories.
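
To make the idea concrete, here is a minimal sketch in Python of what allocating numbers into numeric ranges means.  It is not DataCracker code: the function name, cutoffs and labels are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

def categorize(value, cutoffs, labels):
    """Return the label of the numeric range that `value` falls into.

    `cutoffs` holds the sorted lower bound of every category after the
    first, so len(labels) must equal len(cutoffs) + 1.
    """
    return labels[bisect_right(cutoffs, value)]

# Hypothetical cutoffs and labels for "How old are you?"
cutoffs = [25, 45]
labels = ["Less than 25", "25 to 44", "45 or more"]

ages = [19, 25, 31, 44, 45, 62]
for age in ages:
    print(age, "->", categorize(age, cutoffs, labels))
```

Each cutoff belongs to the category above it, so an answer of exactly 25 lands in "25 to 44" rather than "Less than 25".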

The benefits of categorizing are:

  • Avoid having to define categories up front in your survey question, which is useful when you do not yet understand your market.
  • Dynamically create and adjust categories/buckets that suit the distribution of your responses (e.g. if you get a lot of young people answering your survey, you can specify lots of young age categories).
  • View percentages instead of an average.
  • Break down other questions in your survey by the categories.
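
The last two benefits can be illustrated with a small Python sketch.  The responses, cutoffs and labels below are made up for illustration and are not taken from any real data set; the sketch only shows the kind of summary that categories make possible, not how DataCracker computes it.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses: (years in operation, number of locations)
responses = [(3, 1), (5, 1), (8, 2), (15, 1), (30, 3), (70, 4)]

def category(years):
    # Illustrative cutoffs only
    if years < 10:
        return "Less than 10"
    if years < 30:
        return "10 to 29"
    return "30 or more"

years = [y for y, _ in responses]

# A single average hides how the answers are spread out
print("Average years:", round(mean(years), 1))

# Percentages per category show the spread
counts = Counter(category(y) for y in years)
percentages = {label: 100 * n / len(years) for label, n in counts.items()}
print(percentages)

# Break down another question (number of locations) by the categories
by_category = {}
for y, locations in responses:
    by_category.setdefault(category(y), []).append(locations)
average_locations = {label: mean(v) for label, v in by_category.items()}
print(average_locations)
```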

In DataCracker, categorizing is performed with a tool that accompanies Histogram charts.  It is a visual way of categorizing.  Here is a tutorial:

1. Upload the bus phone survey.sav data set into DataCracker, selecting Other as the source of data, and selecting a Short report, single chart page layout.

2. Select the page titled Q5. Years of operation of business (numeric)

You can see that DataCracker has automatically chosen to represent this data as a histogram.  The average number of years of business operation is 21.7 (bold on the X axis).  The program has also chosen an automatic number of bins.

3. Click the HISTOGRAM ribbon tab.

4. Click the Categories button.

1.       This is the tool that allows you to allocate the numeric data into categories (“buckets” means the same thing, but the standard terminology is categories).

2.       Observe that the options are:

          i.      Do not generate categories – currently selected

          ii.     Generate categories with equal proportions – this is a starting point, where the data is categorized into 3 categories with equal proportions (e.g. 33%, or as close as it can be, according to the data)

          iii.    Generate categories with equal intervals – this is an alternative starting point, where the 3 categories are equally spaced between the minimum and maximum.

6. Choose Generate categories with equal proportions.

1.       Observe that 2 red lines have been overlaid on top of the histogram.

2.       Behind the scenes, a new data item has been added which represents the percentages of people in each category.  The labels of the data match the labels shown above the histogram (“Less than 51,” “51 – 100” and “102 or more”).

7. You may customize the categories.  For example:

1.       Change the number of categories to 4 by clicking the Categories button again, and changing Number of categories from 3 to 4.  A new red line is added on top of the histogram.

2.       Change the category cutoff points.  For example, click on the first red line so it appears selected.  Once it appears selected (a new grey rectangle appears around it), click and drag the line to the left or right to change its cutoff point.  Once you let go of your mouse, you can observe that the category labels and percentages update automatically.
Tip: when the red lines are overlapping the blue bins, they can be difficult to select.  Try selecting near the red value at the top of the line, or zooming in first.

3.       Change the category labels.  For example, click on the first category label (Less than XX).  Observe you now have all category labels selected.  Click on the first label again to select it alone.  Now, type in a new label in the ribbon at HISTOGRAM > Histogram > Category label, and press Enter when done.  Observe the label updates on the histogram, and behind the scenes, it also updates the new data item that represents the categories.

8. You may now use the new categorized data in other charts or tables:

1.       Select the page Q4. The businesses number of locations

2.       In HOME > Data Selection > By, select the new data Histogram categories - Q5. Years of operation of business (numeric)
Tip: this new data will be at the bottom of the list, not the top!

3.       Observe that the chart now shows the data by your categories.
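
For readers who want to see the difference between the two starting points offered by the Categories button, here is a rough Python sketch.  It is an assumption about the general technique, not DataCracker's actual algorithm: equal proportions is approximated with quantiles, equal intervals by dividing the range between the minimum and maximum, and the function names and data are hypothetical.

```python
from statistics import quantiles

def equal_interval_cutoffs(values, n_categories=3):
    """Cutoffs that split the range min..max into equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_categories
    return [lo + width * i for i in range(1, n_categories)]

def equal_proportion_cutoffs(values, n_categories=3):
    """Cutoffs at quantiles, so each category holds roughly the same share."""
    return quantiles(values, n=n_categories, method="inclusive")

# Hypothetical, right-skewed "years in operation" answers
years = [1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 40, 60, 100]

print("Equal intervals:  ", equal_interval_cutoffs(years))
print("Equal proportions:", equal_proportion_cutoffs(years))
```

With skewed data like this, equal intervals places most answers in the first category, while equal proportions moves the cutoffs toward the low end so that each category holds a similar number of answers.  Either one is only a starting point that you then adjust by dragging the red lines.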

 

Now that you have created category data, there are two important points to note about DataCracker in general that will help you work more effectively:

  • DataCracker is dynamic: if you further customize your categories in the histogram (e.g. changing the cutoff points or category labels), that will automatically flow through to any chart that shows this categorized data.
  • If you make a mistake when categorizing, you can use the Undo button at any time.
