== Scoring and analysis ==

After the questionnaire is completed, each item may be analyzed separately or, in some cases, item responses may be summed to create a score for a group of items. Hence, Likert scales are often called summative scales.
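
The summation described above can be sketched in a few lines of Python. This is only an illustration: the function name <code>scale_score</code>, the five-point default, and the optional reverse-coding of negatively worded items are assumptions introduced here, not part of any standard procedure.

```python
def scale_score(responses, reverse_items=(), n_points=5):
    """Sum one respondent's item responses, reverse-coding where requested."""
    total = 0
    for index, value in enumerate(responses):
        if not 1 <= value <= n_points:
            raise ValueError(f"response {value} is outside 1..{n_points}")
        # A reverse-keyed item maps 1 -> n_points, 2 -> n_points - 1, and so on.
        total += (n_points + 1 - value) if index in reverse_items else value
    return total


respondent = [4, 5, 2, 4]  # hypothetical answers to four 5-point items
print(scale_score(respondent))                     # 15
print(scale_score(respondent, reverse_items={2}))  # 17 (third item reversed)
```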

Whether individual Likert items can be treated as interval-level data, or should instead be treated as ordered-categorical data, is the subject of considerable disagreement in the literature,<ref>{{cite journal |last=Jamieson |first=Susan |date=2004 |title=Likert Scales: How to (Ab)use Them |journal=Medical Education |volume=38 |issue=12 |pages=1217–1218|doi=10.1111/j.1365-2929.2004.02012.x |pmid=15566531 |s2cid=42509064 |url=http://eprints.gla.ac.uk/59552/1/59552.pdf }}</ref><ref name=NormanGeoff>{{cite journal |last=Norman |first=Geoff |date=2010 |title=Likert scales, levels of measurement and the "laws" of statistics |journal=Advances in Health Sciences Education |volume=15 |issue=5 |pages=625–632|doi=10.1007/s10459-010-9222-y |pmid=20146096 |s2cid=6566608 }}</ref> with strong convictions about which methods are most applicable. This disagreement can be traced back, in many respects, to the extent to which Likert items are interpreted as being [[level of measurement|ordinal]] data.

There are two primary considerations in this discussion. First, Likert scales are arbitrary. The value assigned to a Likert item has no objective numerical basis, either in terms of [[measure theory]] or scale (from which a [[distance metric]] can be determined). The value assigned to each Likert item is simply determined by the researcher designing the survey, who makes the decision based on a desired level of detail. However, by convention Likert items tend to be assigned progressive positive integer values. Likert scales typically have between 2 and 10 response points, with 3, 5, or 7 being the most common.<ref>{{cite web |title=Likert Scale Explanation - With an Interactive Example |url=https://www.surveyking.com/help/likert-scale-example |website=SurveyKing |access-date=13 August 2017}}</ref> Further, this progressive structure of the scale is such that each successive Likert item is treated as indicating a 'better' response than the preceding value. (This may differ in cases where reverse ordering of the Likert scale is needed.)

The second, and possibly more important, point is whether the "distance" between each successive item category is equivalent, which is traditionally assumed. For example, in the above five-point Likert item, the inference is that the 'distance' between category 1 and 2 is the same as between category 3 and 4. In terms of good research practice, an equidistant presentation by the researcher is important; otherwise a bias in the analysis may result. For example, a four-point Likert item with categories "Poor", "Average", "Good", and "Very Good" is unlikely to have all equidistant categories since there is only one category that can receive a below-average rating. This would arguably bias any result in favor of a positive outcome. On the other hand, even if a researcher presents what he or she believes are equidistant categories, they may not be interpreted as such by the respondent.

A good Likert scale, as above, will present a ''symmetry'' of categories about a midpoint with clearly defined linguistic qualifiers. In such symmetric scaling, equidistant attributes will typically be more clearly observed or, at least, inferred. It is when a Likert scale is symmetric and equidistant that it will behave more like an interval-level measurement. So while a Likert scale is indeed [[level of measurement|ordinal]], if well presented it may nevertheless approximate an interval-level measurement. This can be beneficial since, if it were treated just as an ordinal scale, valuable information could be lost because the 'distance' between Likert items would not be available for consideration. The important idea here is that the appropriate type of analysis depends on how the Likert scale has been presented.

The validity of such measures depends on the underlying interval nature of the scale. If interval nature is assumed for a comparison of two groups, the paired-samples {{var|t}}-test is an acceptable choice.<ref name="IndivLikert "/> If non-parametric tests are to be performed, the Pratt (1959)<ref name="Pratt ">{{cite journal |last1=Pratt |first1=J. |title=Remarks on zeros and ties in the Wilcoxon signed rank procedures |journal=Journal of the American Statistical Association |date=1959 |volume=54 |issue=287 |pages=655–667 |doi=10.1080/01621459.1959.10501526}}</ref> modification to the Wilcoxon signed-rank test is recommended over the standard [[Wilcoxon signed-rank test]].<ref name="IndivLikert "/>
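
Pratt's modification matters for Likert data because paired responses are often identical, producing zero differences. The following Python sketch computes the two signed-rank sums under that modification: zero differences are ranked together with the others and then left out of both sums. The function name and the sample data are hypothetical, and the sketch stops at the rank sums rather than deriving a p-value.

```python
def signed_rank_pratt(before, after):
    """Return (W_plus, W_minus) using Pratt's treatment of zero differences."""
    diffs = [a - b for b, a in zip(before, after)]
    # Rank absolute differences, zeros included, giving ties their average rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    # Zero differences keep their rank but contribute to neither sum.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


before = [3, 4, 2, 5, 3, 4]  # hypothetical paired 5-point responses
after = [4, 4, 3, 3, 5, 5]
print(signed_rank_pratt(before, after))  # (14.5, 5.5)
```

In practice a statistics library would normally be used instead; for example, SciPy's <code>scipy.stats.wilcoxon</code> accepts <code>zero_method="pratt"</code>.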

Responses to several Likert questions may be summed provided that all questions use the same Likert scale and that the scale is a defensible approximation to an interval scale, in which case the [[central limit theorem]] allows treatment of the data as [[Level of measurement|interval]] data measuring a latent variable.{{citation needed|date=April 2013}} If the summed responses fulfill these assumptions, parametric statistical tests such as the [[analysis of variance]] can be applied. A typical cutoff for considering this approximation acceptable is a minimum of four, and preferably eight, items in the sum.<ref name="Carifio"/><ref name=NormanGeoff/>
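
As an illustration of applying a parametric test to summed scores, the sketch below computes the one-way analysis of variance {{var|F}} statistic by hand. The function name and the group data are hypothetical, and the sketch assumes the summed scores already satisfy the conditions described above.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical summed scale scores for two groups of three respondents.
print(one_way_anova_f([[10, 12, 14], [16, 18, 20]]))  # (13.5, 1, 4)
```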

To model binary Likert responses directly, they may be represented in a [[binomial distribution|binomial]] form by summing agree and disagree responses separately. The [[chi-squared test|chi-squared]], [[Cochran's Q test]], or [[McNemar test]] are common statistical procedures used after this transformation. Non-parametric tests such as the [[chi-squared test]], [[Mann–Whitney test]], [[Wilcoxon signed-rank test]], or [[Kruskal–Wallis test]] are often used in the analysis of Likert scale data.<ref name="stats">{{cite web |last=Mogey |first= Nora |title=So You Want to Use a Likert Scale? |url=http://www.icbl.hw.ac.uk/ltdi/cookbook/info_likert_scale/index.html |work=Learning Technology Dissemination Initiative |publisher=Heriot-Watt University |date=March 25, 1999 |access-date=April 30, 2009}}</ref>
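
A minimal sketch of the binary approach follows, using an exact McNemar test on paired responses. Two choices here are assumptions made for illustration only: five-point responses are collapsed into "agree" and "disagree", and pairs involving the neutral midpoint are ignored. The function names and data are hypothetical.

```python
from math import comb


def collapse(response, n_points=5):
    """Map a response to 'agree', 'disagree', or None for the neutral midpoint."""
    midpoint = (n_points + 1) / 2
    if response > midpoint:
        return "agree"
    if response < midpoint:
        return "disagree"
    return None


def mcnemar_exact(first, second):
    """Two-sided exact McNemar p-value for paired agree/disagree responses."""
    pairs = [(collapse(a), collapse(b)) for a, b in zip(first, second)]
    # Only discordant pairs (the respondent changed sides) carry information.
    b = sum(1 for x, y in pairs if x == "agree" and y == "disagree")
    c = sum(1 for x, y in pairs if x == "disagree" and y == "agree")
    n = b + c
    if n == 0:
        return 1.0
    # Binomial tail with success probability 1/2, doubled for a two-sided test.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


first = [4, 5, 2, 1, 4, 3, 5, 2]   # hypothetical responses at time 1
second = [2, 1, 2, 1, 5, 4, 1, 4]  # same respondents at time 2
print(mcnemar_exact(first, second))  # 0.625
```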

Alternatively, Likert scale responses can be analyzed with an [[ordered probit]] model, preserving the ordering of responses without the assumption of an interval scale. The use of an ordered probit model can prevent errors that arise when treating ordered ratings as interval-level measurements.<ref name="Liddell">{{cite journal |last1=Liddell |first1=T. |last2=Kruschke |first2=J. |author-link2=John K. Kruschke |title=Analyzing ordinal data with metric models: What could possibly go wrong? |journal=Journal of Experimental Social Psychology |date=2018 |volume=79 |pages=328–348 |doi=10.1016/j.jesp.2018.08.009 |hdl=2022/21970 |hdl-access=free}}</ref> [[Consensus-based assessment]] (CBA) can be used to create an objective standard for Likert scales in domains where no generally accepted or objective standard exists. It can also be used to refine or even validate generally accepted standards.{{citation needed|date=April 2013}}
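
The core of an ordered probit model can be illustrated as follows: a latent normal variable is cut by ordered thresholds, and the probability of each response category is the area between adjacent thresholds. The sketch assumes a latent variable with unit variance; the function name and the threshold values are hypothetical, and no model fitting is shown.

```python
from statistics import NormalDist


def ordered_probit_probs(latent_mean, thresholds):
    """Category probabilities for a unit-variance latent normal variable."""
    cdf = NormalDist().cdf
    # Cumulative probability of falling at or below each threshold.
    cumulative = [cdf(t - latent_mean) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for value in cumulative:
        probs.append(value - previous)
        previous = value
    return probs


thresholds = [-1.5, -0.5, 0.5, 1.5]  # hypothetical cut points for a 5-point item
for mu in (0.0, 1.0):
    rounded = [round(p, 3) for p in ordered_probit_probs(mu, thresholds)]
    print(mu, rounded)
```

Because only the order of the thresholds matters, their spacing need not be equal, which is how the model avoids the interval-scale assumption.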

=== Latent variable models ===

A common practice for analyzing responses to collections of Likert scale items is to summarize them via a [[latent variable model]], for example using [[factor analysis]] or [[item response theory]].

==== Rasch model ====

Likert scale data can, in principle, be used as a basis for obtaining interval level estimates on a continuum by applying the [[polytomous Rasch model]], when data can be obtained that fit this model. In addition, the polytomous Rasch model permits testing of the [[hypothesis]] that the statements reflect increasing levels of an attitude or trait, as intended. For example, application of the model often indicates that the neutral category does not represent a level of attitude or trait between the disagree and agree categories.
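
For illustration, the category probabilities of a polytomous Rasch model can be computed as below, where <code>theta</code> is the person location and <code>thresholds</code> are the item's category thresholds on the same continuum. The function name and the numeric values are hypothetical, and the sketch covers only the probability formula, not estimation or tests of fit.

```python
from math import exp


def rasch_category_probs(theta, thresholds):
    """Probability of each score 0..m for one item under a polytomous Rasch model."""
    # The log-weight of score x is the sum of (theta - tau_k) over the first x thresholds.
    log_weights = [0.0]
    for tau in thresholds:
        log_weights.append(log_weights[-1] + (theta - tau))
    weights = [exp(w) for w in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


ordered = [-1.0, 1.0]    # hypothetical thresholds in their intended order
disordered = [0.5, -0.5]  # hypothetical thresholds in reversed order
for theta in (-1.0, 0.0, 1.0):
    print(theta,
          [round(p, 3) for p in rasch_category_probs(theta, ordered)],
          [round(p, 3) for p in rasch_category_probs(theta, disordered)])
```

With the reversed thresholds in this example, the middle category is never the most probable response at any person location, which is one way a middle category can fail to act as an intermediate level.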

Not every set of Likert-scaled items can be used for Rasch measurement. The data must be checked thoroughly to confirm that they satisfy the strict formal [[axiom]]s of the model. However, the raw scores are the [[sufficient statistics]] for the Rasch measures, a deliberate choice by [[Georg Rasch]], so anyone prepared to accept the raw scores as valid can also accept the Rasch measures as valid.

=== Visual presentation of Likert-type data ===
An important part of data analysis and presentation is the visualization (or plotting) of data. The subject of plotting Likert (and other) rating data is discussed at length in two papers by Robbins and Heiberger.<ref>{{cite book |last1=Robbins |first1=N. B. |last2=Heiberger |first2=R. M. |chapter=Plotting Likert and Other Rating Scales |title=JSM Proceedings, Section on Survey Research Methods |date=2011 |pages=1058–1066 |publisher=American Statistical Association |chapter-url=http://www.asasrms.org/Proceedings/y2011/Files/300784_64164.pdf}}</ref> In the first, they recommend the use of what they call diverging stacked bar charts and compare them to other plotting styles. The second paper<ref>{{cite book |last1=Heiberger |first1=R. M. |last2=Robbins |first2=N. B. |chapter=Design of Diverging Stacked Bar Charts for Likert Scales and Other Applications |title=Journal of Statistical Software |date=2014 |volume=57 |issue=5 |pages=1–32 |publisher=American Statistical Association |doi=10.18637/jss.v057.i05 |doi-access=free |s2cid=61139330 |chapter-url=http://www.jstatsoft.org/v57/i05}}</ref> describes the use of the <code>likert</code> function in the HH package for R, and gives many examples of its use. Another paper<ref>{{cite journal |last1=Koo |first1=M. |last2=Yang |first2=S. W. |title = Likert-Type Scale| journal = Encyclopedia| volume = 5| issue = 1| page = 18| year = 2025| doi = 10.3390/encyclopedia5010018|doi-access=free }}</ref> also provides [[Python (programming language)|Python]] code to create a clustered diverging stacked bar chart of 5-point Likert scale responses.
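
The layout behind a diverging stacked bar can be sketched without any plotting library. The function below, whose name and sample counts are hypothetical, converts the response counts for one item into bar segments positioned around zero. It assumes one common layout in which a middle (neutral) category, if present, is split evenly across the zero line; it is not taken from the papers cited above.

```python
def diverging_segments(counts):
    """Return (left_edge, width) per category, in percent, centred on the midpoint."""
    total = sum(counts)
    widths = [100 * c / total for c in counts]
    half = len(widths) // 2
    # Everything below the midpoint, plus half of a middle category if there
    # is one, is placed to the left of zero.
    left = -sum(widths[:half])
    if len(widths) % 2 == 1:
        left -= widths[half] / 2
    segments = []
    for width in widths:
        segments.append((left, width))
        left += width
    return segments


# Hypothetical counts from "strongly disagree" to "strongly agree".
for left, width in diverging_segments([10, 20, 40, 20, 10]):
    print(f"left={left:6.1f}  width={width:5.1f}")
```

Each pair can then be passed to a horizontal bar routine, for example Matplotlib's <code>Axes.barh(y, width, left=left)</code>, with one call per response category.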