MCEC algorithm

Java implementation of an information-theoretic algorithm that combines Multivariate Correlations with Early Classification


Program description

The MCEC (Multivariate Correlations for Early Classification) algorithm is a Java implementation of an information-theoretic method for examining the early classification opportunity in a dataset. The dataset contains univariate or multivariate time series together with their respective class labels. The program can be downloaded here.

Input

The input file must be in comma-separated values (CSV) format, containing the time series and the respective class labels.

The number of attributes (dimensions) is also required as input. The time series can be univariate or multivariate, but they must all have the same fixed length. Numeric attributes must be discretized beforehand, and the dataset cannot contain missing values, because the program does not include an imputation procedure.

Dataset example:

X1_1, X2_1, X1_2, X2_2, class
TRUE, FALSE, FALSE, FALSE, C1
FALSE, FALSE, TRUE, FALSE, C0
TRUE, TRUE, FALSE, FALSE, C0
TRUE, FALSE, TRUE, TRUE, C1
(...)
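As an illustration of this layout, the following is a minimal sketch of how one row could be split into time points, assuming the columns are ordered time point by time point (all attributes of time point 1, then all attributes of time point 2, and so on) with the class label in the last column, as in the example above. The class and method names (DatasetLayout, Instance, parseRow) are hypothetical and are not taken from the MCEC source code.

```java
class DatasetLayout {
    /** One labelled instance: values[t][a] holds attribute a at time point t (both 0-based). */
    static final class Instance {
        final String[][] values;
        final String label;

        Instance(String[][] values, String label) {
            this.values = values;
            this.label = label;
        }
    }

    /**
     * Splits one CSV row into n attributes per time point plus a class label.
     * Assumes no quoted fields and no missing values.
     */
    static Instance parseRow(String line, int n) {
        String[] cells = line.split(",");
        int featureCells = cells.length - 1; // the last cell is the class label
        if (n <= 0 || featureCells <= 0 || featureCells % n != 0) {
            throw new IllegalArgumentException(
                "Expected a multiple of " + n + " feature columns plus one class column");
        }
        int length = featureCells / n; // time series length L
        String[][] values = new String[length][n];
        for (int i = 0; i < featureCells; i++) {
            values[i / n][i % n] = cells[i].trim();
        }
        return new Instance(values, cells[featureCells].trim());
    }

    public static void main(String[] args) {
        // First data row of the example above, with two attributes per time point.
        Instance inst = parseRow("TRUE, FALSE, FALSE, FALSE, C1", 2);
        System.out.println("time points: " + inst.values.length + ", class: " + inst.label);
    }
}
```

With two attributes per time point, the four feature columns of the example yield a time series of length L = 2.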

Output

The results of the entropy-difference, log-likelihood, MDL score, AIC score and classification accuracy analyses, each computed for n = 1, ..., L (where L is the time series length), are written by the Java program to text files.

The implementation uses some functionality from the Weka data mining software. An additional Matlab script is provided to generate the five graphs that represent the results.

Observations

The implementation also provides the Markov Lag, an alternative to the standard Early Classification approach. Instead of analysing the correlations from the first time point to the last, it proceeds in the inverse order (from the last time point to the first). The idea is to check how much information from the most recent past is needed to obtain a satisfactory prediction.
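The difference between the two orderings can be sketched as follows. This is an illustrative reading, not the MCEC implementation: it assumes that step n of the standard approach uses the first n time points and that step n of the Markov Lag uses the last n time points. The empirical conditional entropy of the class given the selected columns is used here only as a generic stand-in for "how much information is still missing"; it is not necessarily any of the quantities the program reports. The names WindowOrder, selectColumns and conditionalEntropy are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class WindowOrder {
    /**
     * Column indices used at step n, for attrs attributes per time point and
     * a series of the given length. Standard order takes the first n time
     * points; the Markov Lag variant is assumed to take the last n.
     */
    static int[] selectColumns(int n, int attrs, int length, boolean markovLag) {
        int start = markovLag ? (length - n) * attrs : 0;
        int[] cols = new int[n * attrs];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = start + i;
        }
        return cols;
    }

    /** Empirical H(class | selected columns) in bits; the class label is the last cell of each row. */
    static double conditionalEntropy(String[][] rows, int[] cols) {
        // Count class labels separately for every observed configuration of the selected columns.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] row : rows) {
            StringBuilder key = new StringBuilder();
            for (int c : cols) {
                key.append(row[c]).append('|');
            }
            counts.computeIfAbsent(key.toString(), k -> new HashMap<>())
                  .merge(row[row.length - 1], 1, Integer::sum);
        }
        double h = 0.0;
        for (Map<String, Integer> byClass : counts.values()) {
            int total = 0;
            for (int c : byClass.values()) {
                total += c;
            }
            double weight = (double) total / rows.length;
            for (int c : byClass.values()) {
                double p = (double) c / total;
                h -= weight * p * (Math.log(p) / Math.log(2));
            }
        }
        return h;
    }

    public static void main(String[] args) {
        // The four rows of the dataset example (two attributes, two time points).
        String[][] rows = {
            {"TRUE", "FALSE", "FALSE", "FALSE", "C1"},
            {"FALSE", "FALSE", "TRUE", "FALSE", "C0"},
            {"TRUE", "TRUE", "FALSE", "FALSE", "C0"},
            {"TRUE", "FALSE", "TRUE", "TRUE", "C1"},
        };
        for (int n = 1; n <= 2; n++) {
            double standard = conditionalEntropy(rows, selectColumns(n, 2, 2, false));
            double lag = conditionalEntropy(rows, selectColumns(n, 2, 2, true));
            System.out.println("n=" + n + " standard=" + standard + " markovLag=" + lag);
        }
    }
}
```

On these four rows, the first time point alone already separates the classes, whereas the last time point alone leaves two rows ambiguous. This is the kind of contrast the two orderings are meant to expose.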

Libraries

The MCEC algorithm depends on two external libraries:

Usage

Execute the jar file:

$ java -jar MCECalgorithm.jar dataset-filename.csv N optionClass MarkovLag
where the command-line options correspond to:


dataset-filename     Type: String - Name of the dataset file to be analysed.
N                    Type: Integer - Number of features per time point.
optionClass          Type: Boolean - With classification analysis (TRUE)
                     or without classification analysis (FALSE).
MarkovLag            Type: Boolean - With Markov lag approach (TRUE)
                     or with standard Early Classification (FALSE).
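For readers who want to wrap the tool in their own code, the four positional arguments can be validated along the following lines. This is only a sketch of the documented argument order and types. The class McecArgs and its fields are hypothetical and do not reflect how MCECalgorithm.jar parses its arguments internally. In particular, accepting TRUE and FALSE case-insensitively is an assumption.

```java
class McecArgs {
    final String datasetFile;   // dataset-filename
    final int n;                // N: number of features per time point
    final boolean optionClass;  // TRUE: with classification analysis
    final boolean markovLag;    // TRUE: Markov lag, FALSE: standard Early Classification

    McecArgs(String datasetFile, int n, boolean optionClass, boolean markovLag) {
        this.datasetFile = datasetFile;
        this.n = n;
        this.optionClass = optionClass;
        this.markovLag = markovLag;
    }

    /** Accepts only TRUE or FALSE (case-insensitive, an assumption) and rejects anything else. */
    static boolean parseFlag(String name, String value) {
        if (value.equalsIgnoreCase("TRUE")) {
            return true;
        }
        if (value.equalsIgnoreCase("FALSE")) {
            return false;
        }
        throw new IllegalArgumentException(name + " must be TRUE or FALSE, got: " + value);
    }

    static McecArgs parse(String[] args) {
        if (args.length != 4) {
            throw new IllegalArgumentException(
                "usage: dataset-filename.csv N optionClass MarkovLag");
        }
        int n = Integer.parseInt(args[1]); // throws NumberFormatException if not an integer
        if (n < 1) {
            throw new IllegalArgumentException("N must be at least 1");
        }
        return new McecArgs(args[0], n,
            parseFlag("optionClass", args[2]),
            parseFlag("MarkovLag", args[3]));
    }

    public static void main(String[] args) {
        McecArgs parsed = parse(new String[] {"syntheticTest.csv", "1", "TRUE", "FALSE"});
        System.out.println(parsed.datasetFile + " N=" + parsed.n
            + " optionClass=" + parsed.optionClass + " markovLag=" + parsed.markovLag);
    }
}
```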

Synthetic dataset example

The very simple syntheticTest.csv dataset example is described in the following table:

The command for analysing the early classification opportunity is

$ java -jar MCECalgorithm.jar syntheticTest.csv 1 TRUE FALSE

and produces the following files:

Running the Matlab script MCEC_program.m (with the associated functions) in the same directory as the text files produces the following graphs: