== Theoretical approaches ==
Psychometricians have developed a number of different measurement theories. These include [[classical test theory]] (CTT) and [[item response theory]] (IRT).<ref>{{Cite book |last1=Embretson |first1=Susan E. |title=Item Response Theory for Psychologists |last2=Reise |first2=Steven Paul |date=2000 |publisher=L. Erlbaum Associates |isbn=978-0-8058-2818-4}}</ref><ref>Hambleton, R.K., & Swaminathan, H. (1985). ''Item Response Theory: Principles and Applications.'' Boston: Kluwer-Nijhoff.</ref> The [[Rasch model]] for measurement is mathematically similar to IRT but distinctive in its origins and features. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences.<ref>Rasch, G. (1960/1980). ''Probabilistic models for some intelligence and attainment tests''. Copenhagen, Danish Institute for Educational Research, expanded edition (1980) with foreword and afterword by B.D. Wright. Chicago: The University of Chicago Press.</ref>

Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include [[factor analysis]],<ref>Thompson, B.R. (2004). ''Exploratory and Confirmatory Factor Analysis: Understanding Concepts and Applications.'' American Psychological Association.</ref> a method of determining the underlying dimensions of data. One of the main challenges faced by users of factor analysis is a lack of consensus on appropriate procedures for [[Factor analysis#Criteria for determining the number of factors|determining the number of latent factors]].<ref name="Zwick1986">{{cite journal |last1=Zwick |first1=William R. |last2=Velicer |first2=Wayne F. |title=Comparison of five rules for determining the number of components to retain. |journal=Psychological Bulletin |date=1986 |volume=99 |issue=3 |pages=432–442 |doi=10.1037/0033-2909.99.3.432}}</ref> A common procedure is to stop factoring when [[Eigenvalues and eigenvectors|eigenvalues]] drop below one, because such a factor accounts for less variance than a single standardized variable. The lack of agreed cut-off points also affects other multivariate methods.<ref>{{Cite book |last=Singh |first=Manoj Kumar |url=https://books.google.com/books?id=wodCEAAAQBAJ&dq=A+usual+procedure+is+to+stop+factoring+when+eigenvalues+drop+below+one+because+the+original+sphere+shrinks.&pg=PA107 |title=Introduction to Social Psychology |date=2021-09-11 |publisher=K.K. Publications |language=en}}</ref>
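
The eigenvalue-below-one stopping rule described above (often called the Kaiser criterion, a name not used in this section) can be illustrated with a minimal sketch. The correlation matrix is hypothetical, and the helper <code>symmetric_eigenvalues</code> is an illustrative pure-Python Jacobi routine rather than a recommended implementation; in practice a numerical library would normally be used. Given the lack of consensus noted above, the count returned by this rule is one heuristic among several.

```python
import math

def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [list(row) for row in matrix]  # work on a copy, leave the input intact
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix: variables 1-2 and 3-4 form two clusters.
correlations = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]

eigenvalues = symmetric_eigenvalues(correlations)
# Retain only the factors whose eigenvalue exceeds one.
factors_retained = sum(1 for value in eigenvalues if value > 1.0)
```

For this made-up matrix the rule retains two factors, matching the two clusters of correlated variables built into the example.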

[[Multidimensional scaling]]<ref>[[Mark L. Davison|Davison, M.L.]] (1992). ''Multidimensional Scaling.'' Krieger.</ref> is a method for finding a simple representation for data with a large number of latent dimensions. [[Cluster analysis]] is an approach to finding objects that are like each other. Factor analysis, multidimensional scaling, and cluster analysis are all multivariate descriptive methods used to distill simpler structures from large amounts of data.

More recently, [[structural equation modeling]]<ref>Kaplan, D. (2008). ''Structural Equation Modeling: Foundations and Extensions'', 2nd ed. Sage.</ref> and [[path analysis (statistics)|path analysis]] represent more sophisticated approaches to working with large [[Covariance matrix|covariance matrices]]. These methods allow statistically sophisticated models to be fitted to data and tested to determine if they are adequate fits. Because at a granular level psychometric research is concerned with the extent and nature of multidimensionality in each of the items of interest, a relatively new procedure known as bi-factor analysis<ref>{{Cite journal |last=DeMars |first=Christine E. |date=2013-10-01 |title=A Tutorial on Interpreting Bifactor Model Scores |url=https://www.tandfonline.com/doi/abs/10.1080/15305058.2013.799067 |journal=International Journal of Testing |volume=13 |issue=4 |pages=354–378 |doi=10.1080/15305058.2013.799067 |issn=1530-5058}}</ref><ref>{{Cite journal |last=Reise |first=Steven P. |date=2012-09-01 |title=The Rediscovery of Bifactor Measurement Models |journal=Multivariate Behavioral Research |volume=47 |issue=5 |pages=667–696 |doi=10.1080/00273171.2012.715555 |issn=0027-3171 |pmc=3773879 |pmid=24049214}}</ref><ref>{{Cite journal |last1=Rodriguez |first1=Anthony |last2=Reise |first2=Steven P. |last3=Haviland |first3=Mark G. |date=June 2016 |title=Evaluating bifactor models: Calculating and interpreting statistical indices. |url=https://doi.apa.org/doi/10.1037/met0000045 |journal=Psychological Methods |language=en |volume=21 |issue=2 |pages=137–150 |doi=10.1037/met0000045 |pmid=26523435 |issn=1939-1463}}</ref> can be helpful. 
Bi-factor analysis can decompose "an item's systematic variance in terms of, ideally, two sources, a general factor and one source of additional systematic variance."<ref>{{Cite journal |last1=Schonfeld |first1=Irvin Sam |last2=Verkuilen |first2=Jay |last3=Bianchi |first3=Renzo |date=August 2019 |title=An exploratory structural equation modeling bi-factor analytic approach to uncovering what burnout, depression, and anxiety scales measure. |url=https://doi.apa.org/doi/10.1037/pas0000721 |journal=Psychological Assessment |language=en |volume=31 |issue=8 |pages=1073–1079 |doi=10.1037/pas0000721 |pmid=30958024 |issn=1939-134X}}</ref>
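
As a rough illustration of this decomposition, the sketch below assumes standardized loadings, orthogonal factors, and items that each load on the general factor and exactly one specific factor. Under those assumptions an item's variance splits into a general part, a specific part, and a unique remainder. The loadings and function names are hypothetical, and the explained common variance (ECV) index is shown only as one example of a summary statistic used with bifactor models.

```python
def decompose_item_variance(general_loading, specific_loading):
    """Split a standardized item's variance into general, specific and unique parts.

    Assumes orthogonal factors, so squared loadings add up.
    """
    general = general_loading ** 2
    specific = specific_loading ** 2
    return {
        "general": general,
        "specific": specific,
        "unique": 1.0 - general - specific,
    }

def explained_common_variance(general_loadings, specific_loadings):
    """Share of the common (systematic) variance attributable to the general factor."""
    general = sum(loading ** 2 for loading in general_loadings)
    specific = sum(loading ** 2 for loading in specific_loadings)
    return general / (general + specific)

# Hypothetical loadings for a two-item illustration.
item = decompose_item_variance(general_loading=0.6, specific_loading=0.5)
ecv = explained_common_variance([0.6, 0.7], [0.5, 0.3])
```

An ECV close to one would suggest that the general factor dominates the items' systematic variance; how close is close enough is a judgment call that this sketch does not settle.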

=== Key concepts ===
Key concepts in classical test theory are [[Reliability (psychometric)|reliability]] and [[Test validity|validity]]. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. Reliability is necessary, but not sufficient, for validity.

Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called ''test-retest reliability.''<ref name="gifted.uconn">{{cite web|url=http://www.gifted.uconn.edu/Siegle/research/Instrument+Reliability+and+Validity/Reliability.htm|title=Home – Educational Research Basics by Del Siegle|website=www.gifted.uconn.edu|date=17 February 2015}}</ref> Similarly, the equivalence of different versions of the same measure can be indexed by a [[Pearson product-moment correlation coefficient|Pearson correlation]], and is called ''equivalent forms reliability'' or a similar term.<ref name="gifted.uconn"/>
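
A minimal sketch of a test-retest calculation follows. The scores are invented for illustration, and <code>pearson_r</code> is a hypothetical helper written out in full so that the formula is visible; the same function would serve for equivalent forms reliability by passing scores from two versions of a measure.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores of five people tested on two occasions.
time_1 = [12, 15, 9, 20, 17]
time_2 = [13, 14, 10, 19, 18]

test_retest_reliability = pearson_r(time_1, time_2)
```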

Internal consistency, which addresses the homogeneity of a single test form, may be assessed by correlating performance on two halves of a test, which is termed ''split-half reliability''; the value of this [[Pearson product-moment correlation coefficient]] for two half-tests is adjusted with the [[Spearman–Brown prediction formula]] to correspond to the correlation between two full-length tests.<ref name="gifted.uconn"/> Perhaps the most commonly used index of reliability is [[Cronbach's α]], which is equivalent to the [[mean]] of all possible split-half coefficients. Other approaches include the [[intra-class correlation]], which is the ratio of the variance between targets to the total variance of all measurements.
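
The split-half procedure, the Spearman–Brown adjustment, and Cronbach's α can be sketched as follows. The response matrix is hypothetical (rows are respondents, columns are items), the odd-even split is only one of many possible ways to halve a test, and the function names are illustrative.

```python
from math import sqrt
from statistics import variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

def spearman_brown(half_test_r):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * half_test_r / (1 + half_test_r)

def split_half_reliability(responses):
    """Correlate odd- and even-numbered item totals, then apply Spearman-Brown."""
    odd_totals = [sum(row[0::2]) for row in responses]
    even_totals = [sum(row[1::2]) for row in responses]
    return spearman_brown(pearson_r(odd_totals, even_totals))

def cronbach_alpha(responses):
    """Cronbach's alpha from item variances and the variance of total scores."""
    k = len(responses[0])
    item_variances = [variance(column) for column in zip(*responses)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: six respondents, four items scored 1-5.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]

split_half = split_half_reliability(responses)
alpha = cronbach_alpha(responses)
```

The two indices need not agree for a particular split, since the split-half value depends on which items end up in each half.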

There are a number of different forms of validity. [[Criterion validity|Criterion-related validity]] refers to the extent to which a test or scale predicts a sample of behavior, i.e., the criterion, that is "external to the measuring instrument itself."<ref>Nunnally, J.C. (1978). ''Psychometric theory'' (2nd ed.). New York: McGraw-Hill.</ref> That external sample of behavior can be many things: another test; college grade point average, as when the high school SAT is used to predict performance in college; or even behavior that occurred in the past, as when a test of current psychological symptoms is used to predict the occurrence of past victimization (more accurately termed postdiction). When the criterion measure is collected at the same time as the measure being validated, the goal is to establish ''[[concurrent validity]]''; when the criterion is collected later, the goal is to establish ''[[predictive validity]]''. A measure has ''[[construct validity]]'' if it is related to measures of other constructs as required by theory. ''[[Content validity]]'' is a demonstration that the items of a test do an adequate job of covering the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a ''[[job analysis]]''.

[[Item response theory]] models the relationship between [[latent trait]]s and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be deduced from their score on a university test and then be compared reliably with a high school student's knowledge deduced from a less difficult test. Scores derived by classical test theory do not have this characteristic, and actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.
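
The history-test example can be sketched with the Rasch model, used here as one simple instance of the models discussed in this section. The sketch assumes that item difficulties are already known and expressed on a common scale. The difficulty values, response patterns, and function names are hypothetical, and the Newton-Raphson routine is a bare-bones maximum-likelihood estimator, not a production method.

```python
from math import exp, sqrt

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))

def estimate_ability(responses, difficulties, max_iterations=50):
    """Maximum-likelihood ability estimate and its standard error.

    responses: list of 0/1 item scores; difficulties: known item difficulties.
    """
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        raise ValueError("all-correct or all-wrong patterns have no finite estimate")
    theta = 0.0
    for _ in range(max_iterations):
        probabilities = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw_score - sum(probabilities)
        information = sum(p * (1 - p) for p in probabilities)
        step = gradient / information  # Newton-Raphson update
        theta += step
        if abs(step) < 1e-10:
            break
    probabilities = [rasch_probability(theta, b) for b in difficulties]
    information = sum(p * (1 - p) for p in probabilities)
    return theta, 1.0 / sqrt(information)

# Hypothetical item difficulties on a shared scale (higher = harder).
school_test = [-2.0, -1.5, -1.0, -0.5, 0.0]
university_test = [0.0, 0.5, 1.0, 1.5, 2.0]

# Both students answer three of five items correctly on their own test.
school_theta, school_se = estimate_ability([1, 1, 1, 0, 0], school_test)
university_theta, university_se = estimate_ability([1, 1, 1, 0, 0], university_test)
```

Although the raw scores are identical, the estimated locations differ because the university items are harder, which is the sense in which the two students can be compared on one scale. The standard error comes from the test information at the estimated location.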