
==Prediction (interpolation and extrapolation) {{anchor|Prediction|Interpolation|Extrapolation|Interpolation and extrapolation}}==
{{further|Predicted response|Prediction interval}}

[[File:CurveWeightHeight.png|thumb|upright=1.5|In the middle, the fitted straight line represents the best balance between the points above and below this line. The dotted straight lines represent the two extreme lines, considering only the variation in the slope. The inner curves represent the estimated range of values considering the variation in both slope and intercept. The outer curves represent a prediction for a new measurement.<ref>{{cite book |last=Rouaud |first=Mathieu |title=Probability, Statistics and Estimation|year=2013 |page=60 |url=http://www.incertitudes.fr/book.pdf }}</ref>]]

Regression models '''''predict''''' a value of the ''Y'' variable given known values of the ''X'' variables. Prediction {{em|within}} the range of values in the dataset used for model-fitting is known informally as ''[[interpolation]]''. Prediction {{em|outside}} this range of the data is known as ''[[extrapolation]]''. Performing extrapolation relies strongly on the regression assumptions. The further the extrapolation goes outside the data, the more room there is for the model to fail due to differences between the assumptions and the sample data or the true values.

A ''[[prediction interval]]'' that represents the uncertainty may accompany the point prediction. Such intervals tend to expand rapidly as the values of the independent variable(s) move outside the range covered by the observed data.

For these reasons, among others, some authors advise against extrapolation.<ref>Chiang, C.L. (2003) ''Statistical methods of analysis'', World Scientific. {{isbn|981-238-310-7}} - [https://books.google.com/books?id=BuPNIbaN5v4C&dq=regression+extrapolation&pg=PA274 page 274 section 9.7.4 "interpolation vs extrapolation"]</ref>
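
The widening of prediction intervals outside the observed range can be illustrated for the simple case of one predictor and ordinary least squares. The following is a minimal sketch, not a reference implementation: the function names <code>fit_line</code> and <code>prediction_half_width</code>, the data values, and the hard-coded Student ''t'' critical value are all assumptions made for illustration. It uses the standard textbook expression for the standard error of a new observation, in which the squared distance of the new ''X'' value from the sample mean appears explicitly.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def prediction_half_width(xs, ys, x_new, t_crit):
    """Half-width of the prediction interval for a new observation at x_new."""
    n = len(xs)
    a, b = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    # Residual variance estimate with n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)
    # The (x_new - x_bar)^2 term makes the interval widen away from the data.
    se_pred = math.sqrt(s2 * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))
    return t_crit * se_pred

# Made-up illustrative data; X ranges from 1 to 6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
# Assumed two-sided 95% Student t critical value for n - 2 = 4 degrees of freedom.
T_CRIT = 2.776

# Compare a point inside the data, one at its edge, and one far outside.
for x_new in (3.5, 6.0, 12.0):
    print(x_new, prediction_half_width(xs, ys, x_new, T_CRIT))
```

In this sketch the half-width is smallest near the mean of the observed ''X'' values and grows as the new point moves away from it. The calculation is valid only if the linear form and the other regression assumptions continue to hold at the new point, which the data cannot confirm.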

===Model selection===
{{Further|Model selection}}
The assumption of a particular form for the relation between ''Y'' and ''X'' is another source of uncertainty. A properly conducted regression analysis will include an assessment of how well the assumed form is matched by the observed data, but it can only do so within the range of values of the independent variables actually available. This means that any extrapolation is particularly reliant on the assumptions being made about the structural form of the regression relationship. If prior knowledge includes the fact that the dependent variable cannot go outside a certain range of values, this can be used in selecting the model, even if the observed dataset has no values particularly near such bounds. The choice of an appropriate functional form for the regression can therefore have large implications when extrapolation is considered. At a minimum, it can ensure that any extrapolation arising from a fitted model is "realistic" (or in accord with what is known).
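
The effect of the functional form on extrapolation can be sketched with a dependent variable that is known to be a proportion, and so must lie between 0 and 1. The example below is illustrative only: the data are made up, the helper names are hypothetical, and the bounded model is obtained by fitting a straight line to logit-transformed responses, which is a simple stand-in and not maximum-likelihood [[logistic regression]]. Both models are fitted to the same points and then evaluated well outside the observed range.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def logit(p):
    """Map a proportion in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map a real number back to a proportion in (0, 1)."""
    return 1 / (1 + math.exp(-z))

# Made-up proportions observed for X between 1 and 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.10, 0.18, 0.30, 0.45, 0.60]

# Model 1: straight line fitted directly to the proportions.
a_lin, b_lin = fit_line(xs, ys)
# Model 2: straight line fitted on the logit scale, so predictions stay in (0, 1).
a_log, b_log = fit_line(xs, [logit(y) for y in ys])

x_new = 10.0  # far outside the observed range of X
linear_pred = a_lin + b_lin * x_new
logistic_pred = inv_logit(a_log + b_log * x_new)

print("linear extrapolation:  ", linear_pred)
print("logistic extrapolation:", logistic_pred)
```

Within the observed range the two fits give similar values, but at the extrapolated point the straight line predicts a proportion greater than 1, while the logit-scale model respects the known bounds by construction. Respecting the bounds does not by itself make the extrapolated value accurate.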