In our previous post, we introduced a model that attempts to explain the relationship between mean scores and results summarized as the percent of respondents who gave the highest rating (Top Box or TB), the two highest ratings (Top 2 Box or T2B), or the three highest ratings (Top 3 Box or T3B). We plotted our model against actual data from a study with 21 rating items to demonstrate that the fit was reasonable. In Part 2, we take our work further and address some follow-up questions that this modeling raises. First, how can we improve the fit between mean and TB/T2B/T3B? Second, is this model robust enough to apply to any rating scale?
Naturally, our data modelers strive for a better fit, as long as the improvement justifies the additional complexity. The question can be reworded non-technically as, “How can we get closer to the actual data with the fewest and simplest additional steps possible?” The technical version of the question is, “How do we minimize the sum of squared residuals (SSR) with the fewest additional parameters?”
We can take advantage of the fact that TB/T2B/T3B results are bounded by 0% and 100% by adding an exponent to our original model. (We’ll call it P for ‘Power.’) The new model will still be bounded by 0% and 100% as long as P ≥ 0. When P equals 1.0, the new model reduces to the original model, so a fitted P close to 1.0 means the original model was already close to the best possible fit.
We used non-linear optimization to find values for P which minimize SSR for each curve. Below are the results for the data we have been examining. The original model’s results are shown as dashed lines while the new model’s results are shown as solid lines.
| SSR, new model | 0.033 | 0.022 | 0.015 |
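To make the fitting step concrete, here is a minimal Python sketch of how a single exponent P could be chosen to minimize SSR. It rests on several assumptions that are not from our study: `base_model` is a hypothetical stand-in for the original model (a simple linear rescaling of the mean to the 0 to 1 range), the exponent is applied to that model's output, the data points are synthetic values generated only for illustration, and the optimizer is a plain golden-section search rather than whatever routine a statistics package would use.

```python
import math

def base_model(mean, scale_min=1.0, scale_max=6.0):
    """Hypothetical stand-in for the original model (an assumption):
    rescale the mean of a 6-point scale to a proportion in [0, 1]."""
    x = (mean - scale_min) / (scale_max - scale_min)
    return min(1.0, max(0.0, x))  # clamp so the power stays well defined

def power_model(mean, p):
    """New model: the base proportion raised to the power P (P >= 0)."""
    return base_model(mean) ** p

def ssr(p, means, observed):
    """Sum of squared residuals between the power model and observed proportions."""
    return sum((power_model(m, p) - y) ** 2 for m, y in zip(means, observed))

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal one-dimensional function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]: shrink from the right.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2

# Synthetic illustration only (NOT study data): item means and "observed"
# proportions generated from the power model with a known exponent of 1.8.
means = [3.2, 3.8, 4.1, 4.5, 4.9, 5.3]
observed = [base_model(m) ** 1.8 for m in means]

fitted_p = golden_section_min(lambda p: ssr(p, means, observed), 0.0, 10.0)
ssr_original = ssr(1.0, means, observed)   # P = 1 reproduces the base model
ssr_power = ssr(fitted_p, means, observed)

print("fitted P:", round(fitted_p, 4))
print("SSR, original model:", ssr_original)
print("SSR, new model:", ssr_power)
```

Because P = 1 is always inside the search range, the fitted power model can never have a higher SSR than the base model on the same data. The only question is whether the reduction is large enough to justify the extra parameter. In practice, each curve (TB, T2B, T3B) would get its own fitted P.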
The chart below shows a modest improvement of the new model (Power Fit) over the original model (Original Fit) on the original data, which used 6-point agreement scales.
Responses on the 6-point agreement scales are broadly distributed, which is close to an ideal situation for statistical modeling. We also tested the original and new models on a variety of other scale types. On both 5-point and 6-point agreement and satisfaction questions, both models perform very well, with the new model having only a slight advantage.
However, as the scales become smaller and the results more skewed, the new model performs significantly better. You can see the advantage of the new model on these 4-point satisfaction questions.