Summary:
The main aim of this experimental study is to analyze different regression methods. The experiments use 52 real-world datasets, with a number of variables within the interval [2, 60] and a number of examples
within the interval [43, 45730]. All experiments follow a 5-fold cross-validation model (5fcv), i.e., each dataset has been split randomly into 5 folds, each one containing 20% of the patterns
of the dataset. In each run, four folds are used for training and the remaining one for testing. The properties of these datasets are presented in Table I: short name of the dataset (NAME), number of variables (VAR), and number of
examples (EXAMPLES). You may download all datasets in the Weka format.
These datasets have been downloaded from the following web pages:
NAME | VAR | EXAMPLES | NAME | VAR | EXAMPLES | NAME | VAR | EXAMPLES
2DPLANES | 10 | 40768 | DELTAAIL | 5 | 7129 | MPG8 | 7 | 392
ABA | 8 | 4177 | DELTAELV | 6 | 9517 | MV | 10 | 40768
ADD10 | 10 | 9792 | DIABETES | 2 | 43 | PLA | 2 | 1650
AIL | 40 | 13750 | DIAMOND | 18 | 308 | POLE | 26 | 14998
AIRFOIL | 5 | 1503 | ELE1 | 2 | 495 | PUMA32 | 32 | 8192
ANA | 7 | 4052 | ELE2 | 4 | 1056 | PUMA8 | 8 | 8192
AUTOPRICE | 15 | 159 | ELV | 18 | 16599 | PYRIM | 27 | 74
BANK32 | 32 | 8192 | FAT | 14 | 252 | QUA | 3 | 2178
BANK8 | 8 | 8192 | FOR | 12 | 517 | STO | 9 | 950
BAS | 16 | 337 | FRIED | 5 | 1200 | STRIKES | 6 | 625
BOSTON | 13 | 506 | HOUSE16 | 16 | 22784 | TRE | 15 | 1049
CA | 21 | 8192 | HOUSE8 | 8 | 22784 | TRIAZ | 60 | 186
CAL | 8 | 20640 | KINE32 | 32 | 8192 | WA | 9 | 1609
CASP | 9 | 45730 | KINE8 | 8 | 8192 | WI | 9 | 1461
CCPP | 4 | 9568 | LASER | 4 | 993 | WPBC | 32 | 194
CONCRETE | 8 | 1030 | MACHINECPU | 6 | 209 | YH | 6 | 308
CPU_SMALL | 12 | 8192 | MOR | 15 | 1049
DEE | 6 | 365 | MPG6 | 5 | 392
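The 5fcv partitioning described above can be sketched as follows. This is only a minimal illustration, not the code used in the study: the function names, the seed, and the round-robin way of forming folds are assumptions, and when the number of examples is not divisible by 5 the folds hold approximately (not exactly) 20% of the patterns.

```python
import random


def make_folds(n_examples, n_folds=5, seed=0):
    """Randomly partition the indices 0..n_examples-1 into n_folds folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    # Fold k takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    return [indices[k::n_folds] for k in range(n_folds)]


def train_test_splits(n_examples, n_folds=5, seed=0):
    """Yield (train, test) index lists: one fold for testing, the rest for training."""
    folds = make_folds(n_examples, n_folds, seed)
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Example with 392 examples, the size of MPG6 in Table I.
splits = list(train_test_splits(392))
for train, test in splits:
    print(len(train), len(test))
```

Each example appears in exactly one test fold, so every pattern is used once for testing and four times for training.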
Six software tools have been used to run the algorithms analyzed in the experimental study. These tools, each with a short description, can be found below:
The complete results obtained by the 164 studied methods on all the datasets can be found in a downloadable spreadsheet. The results are grouped in tables by algorithm, where each table shows the average of the results obtained by each algorithm over all the studied datasets. For each algorithm, the first four columns show the average MSE on training and test data (MSETra/MSETst) together with their respective standard deviations (SDs), and the last column shows the average computational cost in seconds (AvTime).
Notice that the headers of each table are colored in order to highlight the different software categories. For example, blue marks the methods that are available in R.
Complete results in .xlsx format can be downloaded here
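As a rough illustration of how columns such as MSETra/MSETst, their SDs, and AvTime could be derived from per-fold results, consider the sketch below. It assumes the standard MSE definition (mean of squared errors) and the population standard deviation; the study may use a different MSE scaling or the sample SD, and the function and key names are hypothetical.

```python
from statistics import mean, pstdev


def mse(y_true, y_pred):
    """Mean squared error (standard definition; assumed, not confirmed by the study)."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def summarize(fold_results):
    """Aggregate per-fold results into one row of the results table.

    fold_results: list of dicts with hypothetical keys
    'mse_tra', 'mse_tst' and 'seconds', one dict per fold.
    """
    tra = [r["mse_tra"] for r in fold_results]
    tst = [r["mse_tst"] for r in fold_results]
    return {
        "MSETra": mean(tra),
        "SD_Tra": pstdev(tra),  # population SD is an assumption
        "MSETst": mean(tst),
        "SD_Tst": pstdev(tst),
        "AvTime": mean(r["seconds"] for r in fold_results),
    }


# Toy per-fold values for illustration only (not results from the study).
row = summarize([
    {"mse_tra": 1.0, "mse_tst": 2.0, "seconds": 0.5},
    {"mse_tra": 3.0, "mse_tst": 4.0, "seconds": 1.5},
])
print(row)
```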
Here we provide two spreadsheets with the complete results obtained by the 164 studied methods, sorted by Friedman's ranking, when only high-dimensional datasets are considered (>= 9 variables) and when only low-dimensional datasets are considered (< 9 variables). The same is provided when T2, from the data complexity framework, is used to separate the datasets (T2 >= 250 or T2 < 250, respectively).
Complete results separated by Dimensionality in .xls format can be downloaded here
Complete results separated by T2 in .xls format can be downloaded here
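The sketch below shows one way a Friedman-style ranking could be computed separately for high- and low-dimensional datasets. It assumes that methods are ranked per dataset by test error (rank 1 is best, ties share the average rank) and that ranks are then averaged over the datasets in each group. The method names and error values are toy placeholders, not results from the study; only the variable counts come from Table I.

```python
def rank_with_ties(values):
    """Return 1-based ranks (lowest value = rank 1); ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks


def average_ranks(errors, datasets):
    """Average per-dataset ranks of each method over the given datasets."""
    methods = sorted(next(iter(errors.values())))
    totals = {m: 0.0 for m in methods}
    for d in datasets:
        ranks = rank_with_ties([errors[d][m] for m in methods])
        for m, r in zip(methods, ranks):
            totals[m] += r
    return {m: totals[m] / len(datasets) for m in methods}


# Number of variables taken from Table I.
n_vars = {"ABA": 8, "BOSTON": 13, "CCPP": 4, "CA": 21}
# Toy test errors for three hypothetical methods (illustration only).
errors = {
    "ABA": {"A": 1.0, "B": 2.0, "C": 3.0},
    "BOSTON": {"A": 3.0, "B": 1.0, "C": 2.0},
    "CCPP": {"A": 2.0, "B": 2.0, "C": 1.0},
    "CA": {"A": 2.0, "B": 1.0, "C": 3.0},
}
high = [d for d, v in n_vars.items() if v >= 9]
low = [d for d, v in n_vars.items() if v < 9]
high_ranks = average_ranks(errors, high)
low_ranks = average_ranks(errors, low)
print(sorted(high_ranks.items(), key=lambda kv: kv[1]))
print(sorted(low_ranks.items(), key=lambda kv: kv[1]))
```

The same functions would apply to the T2-based separation by replacing the condition on the number of variables with a condition on each dataset's T2 value.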
© Universidad de Jaén. Webmaster: María José Gacto Colorado mgacto.