==Reliability and validity==
===Reliability===
{|class="wikitable sortable" style="font-size:small; float:right; text-align:center; margin:0 0 0.5em 1em" summary="Sortable table showing actual I.Q. scores of twelve students on three different I.Q. tests, with students identified by pseudonyms in cited data source."
|+ IQ scores can differ to some degree for the same person on different IQ tests, so a person does not always belong to the same IQ score range each time the person is tested. (IQ score table data and pupil pseudonyms adapted from description of KABC-II norming study cited in {{harvp|Kaufman|2009}}.<ref name="Kaufman2009Fig5.1" /><ref name="KaufmanSB2013Fig3.1" />)
|-
! class="unsortable" |Pupil!!KABC-II!!WISC-III!!WJ-III
|-
|A||90||95||111
|-
|B||125||110||105
|-
|C||100||93||101
|-
|D||116||127||118
|-
|E||93||105||93
|-
|F||106||105||105
|-
|G||95||100||90
|-
|H||112||113||103
|-
|I||104||96||97
|-
|J||101||99||86
|-
|K||81||78||75
|-
|L||116||124||102
|}

Psychometricians generally regard IQ tests as having high [[Reliability (psychometrics)|statistical reliability]].{{sfn|Neisser et al.|1995}}<ref name="Mackintosh2011p169">{{Harvnb |Mackintosh|2011|page=169}} "after the age of 8–10, IQ scores remain relatively stable: the correlation between IQ scores from age 8 to 18 and IQ at age 40 is over 0.70."</ref> Reliability represents the measurement consistency of a test.<ref name="Weiten">{{cite book|vauthors= Weiten W|title=Psychology: Themes and Variations |publisher=[[Cengage Learning]]|year=2016|page=281|isbn=978-1305856127 |url=https://books.google.com/books?id=ALkaCgAAQBAJ&pg=PT331}}</ref> A reliable test produces similar scores upon repetition.<ref name="Weiten"/> On aggregate, IQ tests exhibit high reliability, although test-takers may have varying scores when taking the same test on differing occasions, and may have varying scores when taking different IQ tests at the same age. Like all statistical quantities, any particular estimate of IQ has an associated standard error that measures uncertainty about the estimate. For modern tests, the confidence interval can be approximately 10 points and reported [[standard error of measurement]] can be as low as about three points.<ref>{{cite web |title=WISC-V Interpretive Report Sample |website=Pearson |url=https://images.pearsonclinical.com/images/assets/wisc-v/WISC-VInterpretiveReportSample-1.pdf |access-date=29 September 2020 |pages=18 |archive-date=22 April 2022 |archive-url=https://web.archive.org/web/20220422190842/https://images.pearsonclinical.com/images/assets/wisc-v/WISC-VInterpretiveReportSample-1.pdf |url-status=dead }}</ref> Reported standard error may be an underestimate, as it does not account for all sources of error.<ref>{{cite book |last1=Kaufman |first1=Alan S. |last2=Raiford |first2=Susan Engi |last3=Coalson |first3=Diane L. 
|year=2016 |title=Intelligent testing with the WISC-V |publisher=Wiley |location=Hoboken, NJ |isbn=978-1-118-58923-6 |pages=683–702 |quote=Reliability estimates in Table 4.1 and standard errors of measurement in Table 4.4 should be considered best-case estimates because they do not consider other major sources of error, such as transient error, administration error, or scoring error (Hanna, Bradley, & Holen, 1981), which influence test scores in clinical assessments. Another factor that must be considered is the extent to which subtest scores reflect portions of true score variance due to a hierarchical general intelligence factor and variance due to specific group factors because these sources of true score variance are conflated.}}</ref>
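As a rough illustration of how a standard error of measurement relates to a confidence interval, the sketch below uses the classical test theory relation SEM = SD × √(1 − reliability). The reliability of 0.96, the conventional IQ standard deviation of 15, and the function names are illustrative assumptions, not figures taken from any test manual. The interval is centred on the observed score, which is a simplification.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_interval(score, sem, z=1.96):
    """Symmetric interval around an observed score (z=1.96 gives about 95%)."""
    half_width = z * sem
    return (score - half_width, score + half_width)

# Assumed values: SD of 15 and a hypothetical reliability of 0.96.
sem = standard_error_of_measurement(15.0, 0.96)
low, high = confidence_interval(100, sem)
print(f"SEM = {sem:.2f}, 95% interval = {low:.1f} to {high:.1f}")
```

Under these assumptions the SEM comes out at about three points and the 95% interval is roughly twelve points wide. Because reported reliabilities may be best-case estimates, the true interval could be wider.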

Outside influences such as low motivation or high anxiety can occasionally lower a person's IQ test score.<ref name="Weiten" /> For individuals with very low scores, the 95% confidence interval may be greater than 40 points, potentially complicating the diagnosis of intellectual disability.<ref>{{cite journal |last1=Whitaker |first1=Simon |title=Error in the estimation of intellectual ability in the low range using the WISC-IV and WAIS-III |journal=Personality and Individual Differences |date=April 2010 |volume=48 |issue=5 |pages=517–521 |url=https://www.researchgate.net/publication/222824571 |access-date=22 January 2020 |doi=10.1016/j.paid.2009.11.017}}</ref> Similarly, very high IQ scores are substantially less reliable than those near the population median.<ref>{{harvnb|Lohman|Foley Nicpon|2012|p={{page needed|date=October 2020}}}}. "The concerns associated with SEMs [standard errors of measurement] are actually substantially worse for scores at the extremes of the distribution, especially when scores approach the maximum possible on a test ... when students answer most of the items correctly. In these cases, errors of measurement for scale scores will increase substantially at the extremes of the distribution. Commonly the SEM is from two to four times larger for very high scores than for scores near the mean (Lord, 1980)."</ref> Reports of IQ scores much higher than 160 are considered dubious.<ref>{{harvnb|Urbina|2011|p=20}} "[Curve-fitting] is just one of the reasons to be suspicious of reported IQ scores much higher than 160"</ref>
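The variation between tests shown in the score table of this subsection can be summarised with a short calculation. The sketch below takes the twelve pupils' scores from that table and computes, for each pupil, the gap between their highest and lowest score across the three tests. The variable names are illustrative only.

```python
# Scores per pupil in the order (KABC-II, WISC-III, WJ-III), copied from the table.
scores = {
    "A": (90, 95, 111),
    "B": (125, 110, 105),
    "C": (100, 93, 101),
    "D": (116, 127, 118),
    "E": (93, 105, 93),
    "F": (106, 105, 105),
    "G": (95, 100, 90),
    "H": (112, 113, 103),
    "I": (104, 96, 97),
    "J": (101, 99, 86),
    "K": (81, 78, 75),
    "L": (116, 124, 102),
}

# Gap between each pupil's highest and lowest score.
spread = {pupil: max(s) - min(s) for pupil, s in scores.items()}

widest = max(spread, key=spread.get)
mean_spread = sum(spread.values()) / len(spread)

print(f"Largest gap: pupil {widest}, {spread[widest]} points")
print(f"Mean gap across pupils: {mean_spread:.1f} points")
```

For these twelve pupils the gaps range from 1 point (pupil F) to 22 points (pupil L), with a mean of 12 points. This shows how the same person can land in different score ranges on different tests.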

===Validity as a measure of intelligence===
Reliability and validity are very different concepts. While reliability reflects reproducibility, validity refers to whether the test measures what it purports to measure.<ref name="Weiten" /> While IQ tests are generally considered to measure some forms of intelligence, they may fail to serve as an accurate measure of broader definitions of [[human intelligence]] inclusive of, for example, [[creativity]] and [[social intelligence]]. For this reason, psychologist Wayne Weiten argues that their [[construct validity]] must be carefully qualified, and not be overstated.<ref name="Weiten" /> According to Weiten, "IQ tests are valid measures of the kind of intelligence necessary to do well in academic work. But if the purpose is to assess intelligence in a broader sense, the validity of IQ tests is questionable."<ref name="Weiten" />

Some scientists have disputed the value of IQ as a measure of intelligence altogether. In ''[[The Mismeasure of Man]]'' (1981, expanded edition 1996), [[Evolutionary biology|evolutionary biologist]] [[Stephen Jay Gould]] compared IQ testing with the now-discredited practice of determining intelligence via [[craniometry]], arguing that both are based on the fallacy of [[Reification (fallacy)|reification]], "our tendency to convert abstract concepts into entities".<ref name="TMoMp24">{{harvnb|Gould|1981|p=24}}. {{harvnb|Gould|1996|p=[https://books.google.com/books?id=WTtTiG4eda0C&pg=PA56 56]}}.</ref> Gould's argument sparked a great deal of debate,<ref name="Kaplan et al">{{cite journal|last1=Kaplan|first1=Jonathan Michael |last2=Pigliucci|first2=Massimo|last3=Banta|first3=Joshua Alexander|year=2015|title=Gould on Morton, Redux: What can the debate reveal about the limits of data? |journal=Studies in History and Philosophy of Biological and Biomedical Sciences |url=http://philpapers.org/archive/KAPGOM.pdf |volume=30|pages=1–10}}</ref><ref>{{Cite journal|last1=Weisberg|first1=Michael|last2=Paul|first2=Diane B.|date=19 April 2016 |title=Morton, Gould, and Bias: A Comment on "The Mismeasure of Science" |journal=PLOS Biology |volume=14 |issue=4 |at=e1002444 |doi=10.1371/journal.pbio.1002444 |issn=1544-9173 |pmc=4836680 |pmid=27092558 |doi-access=free }}</ref> and the book is listed as one of ''[[Discover (magazine)|Discover Magazine]]''{{'}}s "25 Greatest Science Books of All Time".<ref>{{cite magazine |date=7 December 2006 |url=https://www.discovermagazine.com/the-sciences/25-greatest-science-books-of-all-time |title=25 Greatest Science Books of All Time |magazine=Discover}}</ref>

Along these same lines, critics such as [[Keith Stanovich]] do not dispute the capacity of IQ test scores to predict some kinds of achievement, but argue that basing a concept of intelligence on IQ test scores alone neglects other important aspects of mental ability.{{sfn|Neisser et al.|1995}}<ref>[[David Brooks (journalist)|Brooks, David]] (14 September 2007). [https://www.nytimes.com/2007/09/14/opinion/14brooks.html "The Waning of I.Q."]. ''[[The New York Times]]''.</ref> [[Robert Sternberg]], another significant critic of IQ as the main measure of human cognitive abilities, argued that reducing the concept of intelligence to the measure of ''g'' does not fully account for the different skills and knowledge types that produce success in human society.<ref>Sternberg, Robert J., and Richard K. Wagner. "The g-ocentric view of intelligence and job performance is wrong." Current directions in psychological science (1993): 1–5.</ref>

Despite these objections, clinical psychologists generally regard IQ scores as having sufficient [[Validity (statistics)|statistical validity]] for many clinical purposes.{{Specify |reason=Quick summary of which clinical purposes it's used for, and ideally some of the limitations. Is this sentence better suited for the reliability section?|date=October 2020}}<ref name="Kaufman2009"/>{{sfn|Anastasi|Urbina|1997|pp=326–327}}

===Test bias or differential item functioning===
Differential item functioning (DIF), sometimes referred to as measurement bias, occurs when participants from different groups (e.g. gender, race, disability) with the same [[Latent trait|latent abilities]] give different answers to specific questions on the same IQ test.<ref>Embretson, S. E., Reise, S. P. (2000). ''Item Response Theory for Psychologists''. New Jersey: Lawrence Erlbaum.</ref> DIF analysis examines such specific items on a test while estimating participants' latent abilities from other, similar questions. A consistently different group response to a specific question among similar types of questions can indicate DIF. It does not count as differential item functioning if both groups have the same chance of giving different responses to the same questions. Such bias can be a result of culture, educational level and other factors that are independent of group traits. DIF is considered only if test-takers from different groups ''with the same underlying [[Latent variable|latent]] ability level'' have a different chance of giving specific responses.<ref name=":1">{{cite journal |last1=Zumbo|first1=B.D. |year=2007|title=Three generations of differential item functioning (DIF) analyses: Considering where it has been, where it is now, and where it is going |journal=Language Assessment Quarterly |volume=4 |issue=2 |pages=223–233|doi=10.1080/15434300701375832|s2cid=17426415}}</ref> Such questions are usually removed to make the test equally fair for both groups. Common techniques for analyzing DIF are [[item response theory]] (IRT) based methods, Mantel-Haenszel, and [[logistic regression]].<ref name=":1" />
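The Mantel-Haenszel approach mentioned above can be sketched as follows. Test-takers are first grouped into strata of similar ability (for example by total score), and for each stratum the counts of correct and incorrect answers to one item are tabulated for a reference group and a focal group. The common odds ratio pools these tables; a value near 1 suggests no DIF on that item. All counts and names below are hypothetical and chosen only to illustrate the arithmetic.

```python
def mantel_haenszel_odds_ratio(strata):
    """Pooled odds ratio across ability strata.

    Each stratum is a tuple
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_ok, ref_bad, focal_ok, focal_bad in strata:
        n = ref_ok + ref_bad + focal_ok + focal_bad
        if n == 0:
            continue  # skip empty strata
        numerator += ref_ok * focal_bad / n
        denominator += ref_bad * focal_ok / n
    return numerator / denominator

# Hypothetical item where the reference group does better at every ability level.
item_with_dif = [
    (20, 30, 10, 40),  # low-ability stratum
    (30, 20, 20, 30),  # middle stratum
    (40, 10, 30, 20),  # high-ability stratum
]

# Hypothetical item where both groups answer identically within each stratum.
item_without_dif = [
    (20, 30, 20, 30),
    (30, 20, 30, 20),
    (40, 10, 40, 10),
]

ratio_dif = mantel_haenszel_odds_ratio(item_with_dif)
ratio_no_dif = mantel_haenszel_odds_ratio(item_without_dif)
print(ratio_dif, ratio_no_dif)
```

In practice the odds ratio is accompanied by a significance test and an effect-size classification before an item is flagged; this sketch covers only the pooled ratio.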

A 2005 study found that "differential validity in prediction suggests that the [[Wechsler Adult Intelligence Scale|WAIS-R]] test may contain cultural influences that reduce the validity of the WAIS-R as a measure of cognitive ability for Mexican American students,"<ref>{{cite journal|last1=Verney|first1=SP|last2=Granholm |first2=E|last3=Marshall|first3=SP|last4=Malcarne|first4=VL|last5=Saccuzzo|first5=DP|year=2005 |title=Culture-Fair Cognitive Ability Assessment: Information Processing and Psychophysiological Approaches |journal=Assessment|volume=12|issue=3|pages=303–19|doi=10.1177/1073191105276674|pmid=16123251 |s2cid=31024437}}</ref> indicating a weaker positive correlation relative to sampled white students. Other studies have questioned the culture-fairness of IQ tests when used in South Africa.<ref>{{cite journal|last1=Shuttleworth-Edwards|first1=Ann|last2=Kemp|first2=Ryan|last3=Rust |first3=Annegret |last4=Muirhead|first4=Joanne|last5=Hartman|first5=Nigel|last6=Radloff|first6=Sarah|year=2004|title=Cross-cultural Effects on IQ Test Performance: A Review and Preliminary Normative Indications on WAIS-III Test Performance |journal=Journal of Clinical and Experimental Neuropsychology|volume=26|issue=7|pages=903–20 |doi=10.1080/13803390490510824 |pmid=15742541 |s2cid=16060622}}</ref><ref>{{cite journal|last1=Cronshaw |first1=Steven F. |last2=Hamilton|first2=Leah K.|last3=Onyura|first3=Betty R. |last4=Winston |first4=Andrew S. 
|year=2006|title=Case for Non-Biased Intelligence Testing Against Black Africans Has Not Been Made: A Comment on Rushton, Skuy, and Bons (2004)|journal=International Journal of Selection and Assessment |volume=14|issue=3|pages=278–87|doi=10.1111/j.1468-2389.2006.00346.x |s2cid=91179275}}</ref> Standard intelligence tests, such as the Stanford–Binet, are often inappropriate for [[autistic]] children; the alternatives, developmental or adaptive skills measures, are relatively poor measures of intelligence in autistic children and may have resulted in incorrect claims that a majority of autistic children are of low intelligence.<ref>{{cite journal|last1=Edelson|first1=M. G. |year=2006|title=Are the Majority of Children With Autism Mentally Retarded?: A Systematic Evaluation of the Data|journal=Focus on Autism and Other Developmental Disabilities |volume=21|issue=2|pages=66–83 |doi=10.1177/10883576060210020301|s2cid=145809356}}</ref>

===Flynn effect===
{{Main|Flynn effect}}

Since the early 20th century, raw scores on IQ tests have increased in most parts of the world.<ref name="Neisser1998">{{Cite book|title=The Rising Curve: Long-Term Gains in IQ and Related Measures |editor-last=Neisser |editor-first=Ulric |display-authors=8 |author1=Ulric Neisser |author2=James R. Flynn |author3=Carmi Schooler |author4=Patricia M. Greenfield |author5=Wendy M. Williams |author6=Marian Sigman |author7=Shannon E. Whaley |author8=Reynaldo Martorell |author9=Richard Lynn |author10=Robert M. Hauser |author11=David W. Grissmer |author12=Stephanie Williamson |author13=Sheila Nataraj Kirby |author14=Mark Berends |author15=Stephen J. Ceci |author16=Tina B. Rosenblum |author17=Matthew Kumpf |author18=Min-Hsiung Huang |author19=Irwin D. Waldman |author20=Samuel H. Preston |author21=John C. Loehlin |year=1998 |publisher=American Psychological Association |location=Washington, DC |isbn=978-1-55798-503-3 |series=APA Science Volume Series |url=https://archive.org/details/risingcurvelongt00neis}}{{Page needed|date=January 2011}}</ref>{{sfn|Mackintosh|1998|p={{Page needed|date=January 2011}}}}{{sfn|Flynn|2009|p={{Page needed|date=January 2011}}}} When a new version of an IQ test is normed, the standard scoring is set so that performance at the population median results in a score of IQ 100. Because raw-score performance has been rising, test-takers scored by a constant standard scoring rule have shown IQ scores rising at an average rate of around three IQ points per decade. This phenomenon was named the Flynn effect in the book ''[[The Bell Curve]]'' after [[Jim Flynn (academic)|James R. Flynn]], the author who did the most to bring it to the attention of psychologists.<ref name="Flynn1984">{{cite journal |last1=Flynn |first1=James R. |title=The mean IQ of Americans: Massive gains 1932 to 1978. 
|journal=Psychological Bulletin |volume=95 |issue=1 |pages=29–51 |year=1984 |doi=10.1037/0033-2909.95.1.29 |s2cid=51999517}}</ref><ref name="Flynn1987">{{cite journal |last1=Flynn |pages=171–91 |first1=James R. |issue=2 |volume=101 |year=1987 |title=Massive IQ gains in 14 nations: What IQ tests really measure. |doi=10.1037/0033-2909.101.2.171 |journal=Psychological Bulletin}}</ref>
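The arithmetic behind the rate quoted above can be made concrete with a small sketch. It assumes, purely for illustration, that the average gain of about three points per decade is constant and linear, which later research on slowing and reversal calls into question. The function name and the 20-year example are hypothetical.

```python
POINTS_PER_DECADE = 3.0  # average rate quoted in the text

def expected_norm_drift(years_since_norming, points_per_decade=POINTS_PER_DECADE):
    """Points by which outdated norms would overstate a score,
    assuming a constant linear gain (an illustrative simplification)."""
    return points_per_decade * years_since_norming / 10.0

# A person performing at today's population median, scored against
# norms set 20 years earlier.
drift = expected_norm_drift(20)
score_on_old_norms = 100 + drift
print(f"Drift: {drift} points; score on old norms: {score_on_old_norms}")
```

Under this assumption, median performance today would earn about 106 on norms that are two decades old. This is why tests are periodically renormed.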

Researchers have been exploring whether the Flynn effect is equally strong on performance of all kinds of IQ test items, whether the effect may have ended in some developed nations, whether there are social subgroup differences in the effect, and what the possible causes of the effect might be.<ref>{{Cite book |last1=Zhou |first1=Xiaobin |last2=Grégoire |first2=Jacques |last3=Zhu |first3=Jianjin |title=WAIS-IV Clinical Use and Interpretation: Scientist-Practitioner Perspectives |editor1-last=Weiss |editor1-first=Lawrence G. |editor2-last=Saklofske |editor2-first=Donald H. |editor3-last=Coalson |editor3-first=Diane |editor4-last=Raiford |editor4-first=Susan |chapter=The Flynn Effect and the Wechsler Scales |year=2010 |publisher=Academic Press |location=Amsterdam |series=Practical Resources for the Mental Health Professional |isbn=978-0-12-375035-8}}{{Page needed|date=January 2011}}</ref> In the 2011 textbook ''IQ and Human Intelligence'', [[Nicholas Mackintosh|N. J. Mackintosh]] noted that the Flynn effect demolishes fears that IQ would decline, and asked whether it represents a real increase in intelligence beyond IQ scores.{{sfn|Mackintosh|2011|pp=25–27}} A 2011 psychology textbook, lead-authored by Harvard psychology professor [[Daniel Schacter]], noted that humans' inherited intelligence could be [[dysgenics|going down]] while acquired intelligence goes up.<ref>{{cite book |first1=Daniel L. |last1=Schacter |first2=Daniel T. |last2=Gilbert |first3=Daniel M. |last3=Wegner |title=Psychology|date=2011 |publisher=Palgrave Macmillan |location=Basingstoke |isbn=978-0230579835 |page=384}}</ref>

Research has suggested that the Flynn effect has slowed or reversed course in some Western countries beginning in the late 20th century. The phenomenon has been termed the ''negative Flynn effect''.<ref name=":4">{{Cite journal|last1=Bratsberg|first1=Bernt|last2=Rogeberg|first2=Ole|date=26 June 2018|title=Flynn effect and its reversal are both environmentally caused |journal=Proceedings of the National Academy of Sciences|volume=115 |issue=26|pages=6674–6678|doi=10.1073/pnas.1718793115|pmid=29891660|pmc=6042097|bibcode=2018PNAS..115.6674B |doi-access=free}}</ref> A study of Norwegian military conscripts' test records found that IQ scores have been falling for generations born after the year 1975, and that the underlying cause of both initial increasing and subsequent falling trends appears to be environmental rather than genetic.<ref name=":4" />

===Age===
[[Ronald S. Wilson]] is largely credited with the idea that IQ heritability rises with age.<ref>{{Cite journal |date=May 1987 |title=Ronald S. Wilson (1933–1986) |url=https://link.springer.com/10.1007/BF01065501 |journal=Behavior Genetics |language=en |volume=17 |issue=3 |pages=211–217 |doi=10.1007/BF01065501 |pmid=3307742 |issn=0001-8244 |last1=Plomin |first1=R. }}</ref> Researchers building on this phenomenon dubbed it "The Wilson Effect" after the behavioral geneticist.<ref name=":14">{{Cite journal |last=Bouchard |first=Thomas J. |date=October 2013 |title=The Wilson Effect: The Increase in Heritability of IQ With Age |url=https://www.cambridge.org/core/product/identifier/S1832427413000546/type/journal_article |journal=Twin Research and Human Genetics |language=en |volume=16 |issue=5 |pages=923–930 |doi=10.1017/thg.2013.54 |pmid=23919982 |issn=1832-4274}}</ref> A paper by [[Thomas J. Bouchard Jr.]], examining twin and adoption studies, including twins "reared apart," finds that the heritability of IQ "reaches an asymptote at about 0.80 at 18–20 years of age and continuing at that level well into adulthood. In the aggregate, the studies also confirm that shared environmental influence decreases across age, approximating about 0.10 at 18–20 years of age and continuing at that level into adulthood."<ref name=":14" /> IQ can change to some degree over the course of childhood.{{sfn|Kaufman|2009|pp=[https://archive.org/details/iqtestingpsych00phdd/page/n234 220]–222}} In one [[longitudinal study]], the mean IQ scores of tests at ages 17 and 18 were correlated at [[Correlation coefficient|{{nowrap|1=''r'' = 0.86}}]] with the mean scores of tests at ages five, six, and seven, and at {{nowrap|1=''r'' = 0.96}}{{Explain|date=October 2020|reason=Please provide context to r correlation values. Are 0.86 and 0.96 good? How do they compare with correlation at older ages?}} with the mean scores of tests at ages 11, 12, and 13.{{sfn|Neisser et al.|1995}}

The current consensus is that [[fluid intelligence]] generally declines with age after early adulthood, while [[crystallized intelligence]] remains intact.{{sfn|Kaufman|2009|loc="Chapter 8"|p={{Page needed|date=January 2011}}}} However, the exact peak age of fluid intelligence or crystallized intelligence remains elusive. Cross-sectional studies usually show that fluid intelligence in particular peaks at a relatively young age (often in early adulthood), while longitudinal data mostly show that intelligence is stable until mid-adulthood or later. After that point, intelligence seems to decline slowly.<ref name="DesjardinsWarnke2012">{{cite journal |last1=Desjardins |first1=Richard |last2=Warnke |first2=Arne Jonas |year=2012 |title=Ageing and Skills |url=http://www.oecd-ilibrary.org/education/ageing-and-skills_5k9csvw87ckh-en |journal=OECD Education Working Papers |doi=10.1787/5k9csvw87ckh-en |doi-access=free |hdl-access=free |hdl=10419/57089}}</ref>

For decades, practitioners' handbooks and textbooks on IQ testing have reported IQ declines with age after the beginning of adulthood. However, later researchers pointed out that this phenomenon is related to the [[Flynn effect]] and is in part a [[Cohort (statistics)|cohort]] effect rather than a true aging effect. A variety of studies of IQ and aging have been conducted since the norming of the first Wechsler Intelligence Scale drew attention to IQ differences between age groups of adults. Both cohort effects (the birth year of the test-takers) and practice effects (test-takers taking the same form of IQ test more than once) must be controlled to obtain accurate data.{{Inconsistent|date=October 2020|reason=Resolve the distinction between IQ (which, by definition, is age-normalized) and intelligence (which IQ attempts to measure) in this section.}} It is unclear whether any lifestyle intervention can preserve fluid intelligence into older ages.{{sfn|Kaufman|2009|loc="Chapter 8"|p={{Page needed|date=January 2011}}}}