Likelihood Cross-Validation

The cross-validation method used for optimum kernel bandwidth estimation is based on the maximum likelihood principle. In the classical sense, the maximum likelihood principle considers the best “explanation” of the observed data to be the probability model \theta that maximizes the likelihood function, that is, the model under which the probability of obtaining what was actually observed is greatest. In density estimation, the probability model \theta must be estimated from the same data that will also be used to test the goodness of fit of the statistical model. One can therefore use leave-one-out cross-validation: the score is defined as the average of the log-likelihoods of the data points x_i, each evaluated under the density estimate \hat{f}_{-i} built from all the data except x_i, i.e.

 \ \ \ \ CV(h)= \frac{1}{n}\sum_{i=1}^n \log \hat{f}_{-i}(x_i).

The score  CV is a function of bandwidth  h, since the density estimate  \hat{f} for a fixed data set is a function of bandwidth and kernel type. The optimum choice of bandwidth  h by likelihood cross-validation is then
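
To make the dependence on h explicit, one possible concrete form is given here as an illustration. It assumes univariate data and a generic kernel function K, neither of which is fixed by the text above. Under those assumptions the leave-one-out estimate at x_i can be written as

 \ \ \ \ \hat{f}_{-i}(x_i) = \frac{1}{(n-1)h} \sum_{j \neq i} K\left( \frac{x_i - x_j}{h} \right),

so that h enters CV(h) both through the argument of K and through the normalizing factor 1/((n-1)h).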

 \ \ \ \ h_{CV} = \arg \max_{h} \left\{ \frac{1}{n} \sum_{i=1}^n \log \hat{f}_{-i}(x_i) \right\} .
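
The maximization above can be approximated numerically. The following is a minimal sketch, not a reference implementation: it assumes one-dimensional data, a Gaussian kernel, and a simple search over a finite grid of candidate bandwidths. The function names loo_log_likelihood and select_bandwidth, the synthetic sample, and the grid are all illustrative choices that do not come from the text.

```python
import math
import random
import sys


def loo_log_likelihood(x, h):
    """Average leave-one-out log-likelihood CV(h).

    Assumes 1-D data and a Gaussian kernel (an illustrative choice).
    """
    n = len(x)
    # Normalizing factor 1 / ((n - 1) * h) times the Gaussian constant.
    norm = 1.0 / ((n - 1) * h * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i, xi in enumerate(x):
        # Kernel sum over every point except x_i itself.
        s = sum(
            math.exp(-0.5 * ((xi - xj) / h) ** 2)
            for j, xj in enumerate(x)
            if j != i
        )
        # Guard against log(0) when the kernel sum underflows.
        f_loo = max(norm * s, sys.float_info.min)
        total += math.log(f_loo)
    return total / n


def select_bandwidth(x, candidates):
    """Return the candidate bandwidth with the largest CV score."""
    return max(candidates, key=lambda h: loo_log_likelihood(x, h))


# Usage on synthetic data: 100 draws from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
# Logarithmically spaced candidates between 0.01 and 10.
grid = [0.01 * 1000.0 ** (k / 29) for k in range(30)]
h_cv = select_bandwidth(sample, grid)
print("selected bandwidth:", h_cv)
```

A grid search only finds the best bandwidth among the candidates supplied, so the result depends on the grid's range and resolution. A continuous optimizer could be used instead if finer resolution is needed.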
