Cross-validation (statistics)
==Using prior information==
When users apply cross-validation to select a good configuration <math>\lambda</math>, they might want to balance the cross-validated choice with their own estimate of the configuration. In this way, they can attempt to counter the volatility of cross-validation when the sample size is small and include relevant information from previous research. In a forecasting combination exercise, for instance, cross-validation can be applied to estimate the weights that are assigned to each forecast. Since a simple equal-weighted forecast is difficult to beat, a penalty can be added for deviating from equal weights.<ref name="Hoornweg2018SUS">{{cite book |last1=Hoornweg |first1=Victor |title=Science: Under Submission | date=2018 |publisher=Hoornweg Press |isbn=978-90-829188-0-9 |url=https://victorhoornweg.com/docs/Hoornweg%202018%20Science%20Under%20Submission.pdf }}{{pn|date=November 2024}}{{self-published inline|date=November 2024}}</ref> Likewise, if cross-validation is applied to assign individual weights to observations, one can penalize deviations from equal weights to avoid discarding potentially relevant information.<ref name = "Hoornweg2018SUS" />

Hoornweg (2018) shows how a tuning parameter <math>\gamma</math> can be defined so that a user can intuitively balance between the accuracy of cross-validation and the simplicity of sticking to a reference parameter <math>\lambda_R</math> that is defined by the user. If <math>\lambda_i</math> denotes the <math>i^{th}</math> candidate configuration that might be selected, then the [[Loss function#Statistics|loss function]] that is to be minimized can be defined as

: <math> L_{\lambda_i} = (1-\gamma) \mbox{ Relative Accuracy}_i + \gamma \mbox{ Relative Simplicity}_i. </math>

Relative accuracy can be quantified as <math>\mbox{MSE}(\lambda_i)/\mbox{MSE}(\lambda_R)</math>, so that the mean squared error of a candidate <math>\lambda_i</math> is made relative to that of the user-specified <math>\lambda_R</math>. The relative simplicity term measures the amount that <math>\lambda_i</math> deviates from <math>\lambda_R</math>, relative to the maximum amount of deviation from <math>\lambda_R</math>. Accordingly, relative simplicity can be specified as <math>\frac{(\lambda_i-\lambda_R)^2}{(\lambda_{\max}-\lambda_R)^2}</math>, where <math>\lambda_{\max}</math> corresponds to the <math>\lambda</math> value with the highest permissible deviation from <math>\lambda_R</math>. With <math>\gamma\in[0,1]</math>, the user determines how much influence the reference parameter has relative to cross-validation.

One can add relative simplicity terms for multiple configurations <math>c=1,2,...,C</math> by specifying the loss function as

: <math> L_{\lambda_i} = \mbox{ Relative Accuracy}_i + \sum_{c=1}^C \frac{\gamma_c}{1-\gamma_c} \mbox{ Relative Simplicity}_{i,c}. </math>

Hoornweg (2018) shows that a loss function with such an accuracy-simplicity tradeoff can also be used to intuitively define [[shrinkage estimator]]s like the (adaptive) lasso and [[Bayesian regression|Bayesian]] / [[ridge regression]].<ref name = "Hoornweg2018SUS" /> See the [[Lasso (statistics)#Interpretations of lasso|lasso]] for an example.
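The single-configuration tradeoff above can be sketched in a few lines of Python. This is an illustrative sketch, not Hoornweg's implementation: the function <code>mse</code> stands in for whatever cross-validated mean-squared-error routine the user has, and the candidate grid, reference value <math>\lambda_R</math>, and <math>\gamma</math> below are hypothetical choices.

```python
def select_lambda(candidates, mse, lambda_ref, gamma):
    """Pick the candidate minimising
    (1 - gamma) * relative accuracy + gamma * relative simplicity.

    candidates : list of candidate configuration values (grid of lambdas)
    mse        : callable returning the cross-validated MSE of a lambda
                 (stand-in for the user's own CV routine)
    lambda_ref : user-specified reference configuration lambda_R
    gamma      : tradeoff weight in [0, 1]
    """
    mse_ref = mse(lambda_ref)
    # lambda_max: the candidate with the largest squared deviation from
    # lambda_R, used to normalise the simplicity term to [0, 1].
    lam_max = max(candidates, key=lambda lam: (lam - lambda_ref) ** 2)
    max_dev = (lam_max - lambda_ref) ** 2

    def loss(lam):
        rel_acc = mse(lam) / mse_ref
        rel_simp = (lam - lambda_ref) ** 2 / max_dev
        return (1 - gamma) * rel_acc + gamma * rel_simp

    return min(candidates, key=loss)
```

Setting <math>\gamma=0</math> recovers the purely cross-validated choice, while <math>\gamma=1</math> returns the reference parameter itself (its simplicity penalty is zero); intermediate values shrink the selection toward <math>\lambda_R</math>.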