SciELO - Scientific Electronic Library Online

Revista Colombiana de Estadística

Print version ISSN 0120-1751

Abstract

ZHONG, Yi; HE, Jianghua and CHALISE, Prabhakar. Nested and Repeated Cross Validation for Classification Model With High-dimensional Data. Rev.Colomb.Estad. [online]. 2020, vol.43, n.1, pp.103-125. Epub 05-Jun-2020. ISSN 0120-1751. https://doi.org/10.15446/rce.v43n1.80000.

With the advent of high-throughput technologies, high-dimensional datasets are increasingly available. This has not only opened up new insights into biological systems but also posed analytical challenges. One important problem is selecting an informative subset of features and predicting future outcomes. It is crucial that models are not overfitted and that they give accurate results on new data. In addition, reliable identification of informative features with high predictive power (feature selection) is of interest in clinical settings. We propose a two-step framework for feature selection and classification model construction that uses a nested and repeated cross-validation method. We evaluated our approach using both simulated data and two publicly available gene expression datasets. The proposed method showed better predictive accuracy for new cases than the standard cross-validation method.
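The general idea of nested, repeated cross-validation can be illustrated with a minimal sketch. This is not the authors' implementation: the keywords mention elastic net, random forest and support vector machines, whereas here a simple nearest-centroid classifier stands in, with a hypothetical "number of top-ranked features" hyperparameter. All function names are hypothetical and only the Python standard library is used. The key property is that feature ranking and hyperparameter tuning (inner loop) see only the outer training rows, so each outer-fold AUC is computed on data that played no part in selection, and the outer split is repeated with fresh shuffles.

```python
import random
import statistics


def auc(scores, labels):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split positions 0..len(labels)-1 into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def rank_features(X, y, rows):
    """Rank features by |mean(class 1) - mean(class 0)|, using the given rows only."""
    scores = []
    for f in range(len(X[0])):
        a = [X[i][f] for i in rows if y[i] == 1]
        b = [X[i][f] for i in rows if y[i] == 0]
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)))
    return sorted(range(len(scores)), key=lambda f: -scores[f])


def fit_predict(X, y, train, test, n_features):
    """Nearest-centroid classifier on the top n_features; returns scores for test rows."""
    feats = rank_features(X, y, train)[:n_features]
    c1 = [statistics.fmean(X[i][f] for i in train if y[i] == 1) for f in feats]
    c0 = [statistics.fmean(X[i][f] for i in train if y[i] == 0) for f in feats]
    out = []
    for i in test:
        d1 = sum((X[i][f] - m) ** 2 for f, m in zip(feats, c1))
        d0 = sum((X[i][f] - m) ** 2 for f, m in zip(feats, c0))
        out.append(d0 - d1)  # larger score = closer to the class-1 centroid
    return out


def cv_auc(X, y, rows, n_features, k, rng):
    """Inner loop: plain k-fold CV AUC restricted to `rows` (outer training rows)."""
    sub_y = [y[i] for i in rows]
    scores, labels = [], []
    for fold in stratified_folds(sub_y, k, rng):
        test = [rows[j] for j in fold]
        test_set = set(test)
        train = [i for i in rows if i not in test_set]
        scores += fit_predict(X, y, train, test, n_features)
        labels += [y[i] for i in test]
    return auc(scores, labels)


def nested_repeated_cv(X, y, grid, outer_k=5, inner_k=3, repeats=3, seed=0):
    """Outer k-fold split repeated `repeats` times; inner CV tunes n_features."""
    rng = random.Random(seed)
    rows = list(range(len(y)))
    results = []
    for _ in range(repeats):
        for fold in stratified_folds(y, outer_k, rng):
            test_set = set(fold)
            train = [i for i in rows if i not in test_set]
            # Tuning uses outer training rows only, so the outer fold stays unseen.
            best = max(grid, key=lambda n: cv_auc(X, y, train, n, inner_k, rng))
            s = fit_predict(X, y, train, fold, best)
            results.append(auc(s, [y[i] for i in fold]))
    return statistics.fmean(results), results


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): 60 samples, 50 features,
    # only the first 3 features shift with the class label.
    rng = random.Random(42)
    y = [i % 2 for i in range(60)]
    X = [[rng.gauss(1.5 * yi if f < 3 else 0.0, 1.0) for f in range(50)] for yi in y]
    mean_auc, per_fold = nested_repeated_cv(X, y, grid=[1, 3, 10, 50])
    print(f"nested repeated CV AUC: {mean_auc:.3f} over {len(per_fold)} outer folds")
```

Averaging over repeated outer splits reduces the dependence of the estimate on one particular random partition; a standard (non-nested) scheme that tunes and evaluates on the same folds would tend to report an optimistic AUC.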

Keywords: Area under ROC curve; Cross-validation; Elastic net; Random forest; Support vector machine.