==Motivation==
{{see also|Errors and residuals in statistics}}

The key reason for studentizing is that, in [[regression analysis]] of a [[multivariate distribution]], the variances of the ''residuals'' at different input variable values may differ, even if the variances of the ''errors'' at these different input variable values are equal. The issue is the difference between [[errors and residuals in statistics]], particularly the behavior of residuals in regressions.

Consider the [[simple linear regression]] model

:<math> Y = \alpha_0 + \alpha_1 X + \varepsilon. \, </math>

Given a random sample (''X''<sub>''i''</sub>, ''Y''<sub>''i''</sub>), ''i'' = 1, ..., ''n'', each pair (''X''<sub>''i''</sub>, ''Y''<sub>''i''</sub>) satisfies

:<math> Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i,\,</math>

where the ''errors'' <math>\varepsilon_i</math> are [[statistical independence|independent]] and all have the same variance <math>\sigma^2</math>. The '''residuals''' are not the true errors, but ''estimates'', based on the observable data. When the method of least squares is used to estimate <math>\alpha_0</math> and <math>\alpha_1</math>, the residuals <math>\widehat{\varepsilon\,}</math>, unlike the errors <math>\varepsilon</math>, cannot be independent, since they satisfy the two constraints

:<math>\sum_{i=1}^n \widehat{\varepsilon\,}_i=0</math>

and

:<math>\sum_{i=1}^n \widehat{\varepsilon\,}_i x_i=0.</math>

(Here <math>\varepsilon_i</math> is the ''i''th error, and <math>\widehat{\varepsilon\,}_i</math> is the ''i''th residual.) These two constraints are the [[Ordinary least squares|normal equations]] of the least-squares fit, obtained by setting the derivatives of the sum of squared residuals with respect to <math>\alpha_0</math> and <math>\alpha_1</math> equal to zero.

The residuals, unlike the errors, ''do not all have the same variance:'' the variance decreases as the corresponding ''x''-value gets farther from the average ''x''-value. This is not a feature of the data itself, but of the regression better fitting values at the ends of the domain. It is also reflected in the [[Influence function (statistics)|influence functions]] of various data points on the [[regression coefficient]]s: endpoints have more influence. This can also be seen because the residuals at the endpoints depend greatly on the slope of the fitted line, while the residuals in the middle are relatively insensitive to the slope.

The fact that ''the variances of the residuals differ'', even though ''the variances of the true errors are all equal'' to each other, is the ''principal reason'' for the need for studentization. It is not simply a matter of the population parameters (mean and standard deviation) being unknown; rather, ''regressions'' yield ''different residual distributions'' at ''different data points'', unlike ''point [[estimators]]'' of [[univariate distribution]]s, which share a ''common distribution'' for residuals.
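The dependence of the residual variance on the location of the ''x''-value can be made explicit. For the simple linear regression above, the variance of the ''i''th residual is determined by the [[Leverage (statistics)|leverage]] <math>h_i</math>:

:<math>\operatorname{var}(\widehat{\varepsilon\,}_i) = \sigma^2 (1 - h_i), \qquad h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2}.</math>

Observations far from <math>\bar{x}</math> thus have larger leverage and correspondingly smaller residual variance. Dividing each residual by an estimate of its own standard deviation, <math>\widehat{\sigma}\sqrt{1 - h_i}</math>, places the residuals on a common scale; this rescaling is what studentization accomplishes.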