updateMetricsAndFit

Update performance metrics in incremental drift-aware learning model given new data and train model

Since R2022b

    Description

    Mdl = updateMetricsAndFit(Mdl,X,Y) returns an incremental drift-aware learning model Mdl, which is the input incremental drift-aware learning model Mdl with the following modifications:

    1. updateMetricsAndFit measures the model performance on the incoming predictor and response data, X and Y respectively. When the input model is warm (Mdl.IsWarm is true), updateMetricsAndFit overwrites previously computed metrics, stored in the Metrics property, with the new values. Otherwise, updateMetricsAndFit stores NaN values in Metrics instead.

    2. updateMetricsAndFit fits the modified model to the incoming data by performing incremental drift-aware learning.

    The input and output models have the same data type.
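The Examples section states that updateMetricsAndFit first calls updateMetrics on the incoming data and then calls fit. The sketch below runs both paths on identical copies of one model so you can compare them. It assumes Statistics and Machine Learning Toolbox R2022b or later. The synthetic data and the variable names MdlA and MdlB are illustrative only.

```matlab
rng(1) % For reproducibility

% Hypothetical binary classification chunk: 100 observations, 2 predictors
X = randn(100,2);
Y = X(:,1) + 0.5*randn(100,1) > 0;

% Build one drift-aware model and copy it
BaseLearner = incrementalClassificationNaiveBayes(MaxNumClasses=2,Metrics="classiferror");
dd = incrementalConceptDriftDetector("hddma");
MdlA = incrementalDriftAwareLearner(BaseLearner,DriftDetector=dd);
MdlB = MdlA;

% One call...
MdlA = updateMetricsAndFit(MdlA,X,Y);

% ...versus the two-step sequence: evaluate first, then train
MdlB = updateMetrics(MdlB,X,Y);
MdlB = fit(MdlB,X,Y);

disp(MdlA.Metrics)
disp(MdlB.Metrics)
```

Because metrics are computed before training in both paths, the model is never scored on data it has just learned from.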

    Mdl = updateMetricsAndFit(Mdl,X,Y,Name=Value) uses additional options specified by one or more name-value arguments. For example, you can specify that the columns of the predictor data matrix correspond to observations, and set observation weights.
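The sketch below illustrates the name-value syntax. It passes a chunk whose observations are stored in columns, together with observation weights. It assumes an incrementalClassificationLinear base learner, because ObservationsIn is supported only when the base learner accepts it. The data, the weights, and the choice of base learner are assumptions made for illustration.

```matlab
rng(0) % For reproducibility

p = 3;                   % number of predictors
n = 50;                  % number of observations in the chunk
X = randn(p,n);          % hypothetical predictors, one observation per column
Y = X(1,:)' > 0;         % hypothetical logical labels, one per observation
W = rand(n,1) + 0.5;     % hypothetical positive observation weights

% Base learner that supports the ObservationsIn name-value argument
BaseLearner = incrementalClassificationLinear();
dd = incrementalConceptDriftDetector("hddma");
Mdl = incrementalDriftAwareLearner(BaseLearner,DriftDetector=dd);

% Columns of X are observations; W(j) weights observation j
Mdl = updateMetricsAndFit(Mdl,X,Y,ObservationsIn="columns",Weights=W);
```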

    Examples

    Create the random concept data generators and the concept drift generator using the helper functions HelperSineGenerator and HelperConceptDriftGenerator, respectively.

    concept1 = HelperSineGenerator(ClassificationFunction=1,IrrelevantFeatures=true,TableOutput=false);
    concept2 = HelperSineGenerator(ClassificationFunction=3,IrrelevantFeatures=true,TableOutput=false);
    driftGenerator = HelperConceptDriftGenerator(concept1,concept2,15000,1000);

    When ClassificationFunction is 1, HelperSineGenerator labels all points that satisfy x1 < sin(x2) as 1 and all other points as 0. When ClassificationFunction is 3, the rule is reversed: HelperSineGenerator labels all points that satisfy x1 >= sin(x2) as 1 and all other points as 0 [2]. Because TableOutput is false, the generators return the data as matrices for use with incremental learners.
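HelperSineGenerator itself is not shown on this page. The following hypothetical re-implementation of only the labeling rule makes the two concepts concrete. It assumes that x1 and x2 are drawn uniformly from [0,1], as in the SINE generator family, and it ignores the irrelevant features that IrrelevantFeatures=true adds.

```matlab
rng(0) % For reproducibility

n = 8;
x1 = rand(n,1);   % assumed uniform on [0,1]
x2 = rand(n,1);   % assumed uniform on [0,1]

labels1 = double(x1 < sin(x2));    % ClassificationFunction = 1
labels3 = double(x1 >= sin(x2));   % ClassificationFunction = 3 (reversed rule)

disp([x1 x2 labels1 labels3])
```

Every point receives opposite labels under the two concepts, so a model trained on the first concept misclassifies almost everything once the stream switches to the second.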

    HelperConceptDriftGenerator establishes the concept drift. The object uses the sigmoid function 1./(1+exp(-4*(numobservations-position)./width)) to decide which stream to sample from when generating data [3]. The sigmoid value increases with the number of observations and is the probability of sampling from the second stream, so the probability of sampling from the first stream is one minus this value. In this case, the position argument is 15000 and the width argument is 1000. As the number of observations exceeds the position value minus half of the width, the probability of sampling from the first stream decreases noticeably. The sigmoid function allows a smooth transition from one stream to the other. Larger width values produce a longer transition period, during which both streams have a substantial probability of being selected.
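To see how quickly the transition happens with position 15000 and width 1000, you can evaluate the sigmoid from the text directly. The function handle names pFirst and pSecond are illustrative and are not properties of HelperConceptDriftGenerator.

```matlab
% Drift-generator settings from the example
position = 15000;   % center of the transition, in observations
width = 1000;       % length scale of the transition, in observations

% Probability of sampling from the second stream (sigmoid from the text)
pSecond = @(numobservations) 1./(1+exp(-4*(numobservations-position)./width));
% Probability of sampling from the first stream (complement)
pFirst = @(numobservations) 1 - pSecond(numobservations);

t = [13000 14500 15000 15500 17000]';
disp([t pFirst(t) pSecond(t)])
```

For example, pSecond(14500) = 1/(1+exp(2)), roughly 0.12, and pSecond(15000) = 0.5. With 10 observations per chunk, the midpoint of the drift therefore falls near iteration 1500.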

    Create an incremental drift-aware model for classification as follows:

    1. Create an incremental naive Bayes classification model for binary classification.

    2. Create an incremental concept drift detector that uses the Hoeffding's bounds drift detection method with moving average (HDDMA).

    3. Using the incremental naive Bayes model and the concept drift detector, create an incremental drift-aware model. Specify the training period as 5000 observations.

    BaseLearner = incrementalClassificationNaiveBayes(MaxNumClasses=2,Metrics="classiferror");
    dd = incrementalConceptDriftDetector("hddma");
    idal = incrementalDriftAwareLearner(BaseLearner,DriftDetector=dd,TrainingPeriod=5000);

    Specify the number of observations in each chunk and the number of iterations for creating a stream of data.

    numObsPerChunk = 10;
    numIterations = 4000;

    Preallocate variables for tracking the drift status and drift times, and for storing the classification error.

    dstatus = zeros(numIterations,1);
    statusname = strings(numIterations,1);
    driftTimes = [];
    ce = array2table(zeros(numIterations,2),VariableNames=["Cumulative" "Window"]);

    Simulate a data stream with incoming chunks of 10 observations each and perform incremental drift-aware learning. At each iteration:

    1. Simulate predictor data and labels, and update driftGenerator using the helper function hgenerate.

    2. Call updateMetricsAndFit to update the performance metrics and fit the incremental drift-aware model to the incoming data.

    3. Track and record the drift status and the classification error for visualization purposes.

    rng(12); % For reproducibility
    
    for j = 1:numIterations
    
        % Generate data
        [driftGenerator,X,Y] = hgenerate(driftGenerator,numObsPerChunk);
    
        % Update performance metrics and fit
        idal = updateMetricsAndFit(idal,X,Y);
    
        % Record drift status and classification error
        statusname(j) = string(idal.DriftStatus);
        ce{j,:} = idal.Metrics{"ClassificationError",:};
        if idal.DriftDetected
            dstatus(j) = 2;
            driftTimes(end+1) = j; % Iteration at which drift was detected
        elseif idal.WarningDetected
            dstatus(j) = 1;
        else
            dstatus(j) = 0;
        end
    
    end

    Plot the cumulative and per-window classification error. Mark the warmup and training periods and the iterations at which drift was detected.

    h = plot(ce.Variables);
    
    xlim([0 numIterations])
    ylim([0 0.22])
    ylabel("Classification Error")
    xlabel("Iteration")
    
    xline(idal.MetricsWarmupPeriod/numObsPerChunk,"g-.","Warmup Period",LineWidth=1.5)
    xline(idal.MetricsWarmupPeriod/numObsPerChunk+driftTimes,"g-.","Warmup Period",LineWidth=1.5)
    xline(idal.TrainingPeriod/numObsPerChunk,"b-.","Training Period",LabelVerticalAlignment="middle",LineWidth=1.5)
    xline(driftTimes,"m--","Drift",LabelVerticalAlignment="middle",LineWidth=1.5)
    
    legend(h,ce.Properties.VariableNames,Location="best")

    Figure: Classification Error versus Iteration, showing the Cumulative and Window curves with lines marking the warmup period, training period, and detected drift.

    The updateMetricsAndFit function first evaluates the performance of the model by calling updateMetrics on incoming data, and then fits the model to data by calling fit:

    The updateMetrics function evaluates the performance of the model as it processes incoming observations. The function writes specified metrics, measured cumulatively and within a specified window of processed observations, to the Metrics model property.
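To clarify the difference between the cumulative and window metrics, the sketch below computes both for a made-up sequence of per-observation misclassification indicators. It is a simplification under stated assumptions: unweighted observations and a trailing window over the most recent windowSize observations. The actual updateMetrics function may buffer observations and refresh the window metric on a different schedule. The variable windowSize is hypothetical and only plays the role of the metrics window size.

```matlab
% Hypothetical per-observation misclassification indicators
% (1 = misclassified), in the order the observations were processed
isWrong = [0 1 0 0 1 1 0 0 0 1]';
windowSize = 4;   % hypothetical stand-in for the metrics window size

% Cumulative metric: all observations processed so far
cumulativeError = mean(isWrong);

% Window metric: only the most recent windowSize observations
windowError = mean(isWrong(end-windowSize+1:end));

fprintf("Cumulative: %.2f  Window: %.2f\n", cumulativeError, windowError)
```

Here the cumulative error is 4/10 = 0.40, while the window error is 1/4 = 0.25. After a drift, the window metric reacts quickly, whereas the cumulative metric is dominated by older observations. This is why the Window curve in the plot responds to the drift faster than the Cumulative curve.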

    The fit function fits the model by updating the base learner and monitoring for drift given an incoming batch of data. When you call fit, the software performs the following procedure:

    • Trains the model up to NumTrainingObservations observations.

    • After training, the software starts tracking the model loss to see if any concept drift has occurred and updates drift status accordingly.

    • When the drift status is Warning, the software trains a temporary model to replace the BaseLearner in preparation for an imminent drift.

    • When the drift status is Drift, the temporary model replaces the BaseLearner.

    • When the drift status is Stable, the software discards the temporary model.

    For more information, see the Algorithms section.
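The bookkeeping in the bulleted procedure can be sketched as a small state machine. In this hypothetical sketch, the models are string placeholders rather than real incremental learners, and the status sequence is made up. It also assumes that a Warning always precedes Drift. It shows only when the temporary model is created, promoted, or discarded, not how it is trained.

```matlab
% Hypothetical sequence of drift statuses reported after the training period
statuses = ["Stable" "Warning" "Warning" "Drift" "Stable"];

baseLearner = "model A";   % placeholder for Mdl.BaseLearner
tempModel = "";            % "" means no temporary model exists

for k = 1:numel(statuses)
    switch statuses(k)
        case "Warning"
            % Start a temporary model if one is not already being trained
            if tempModel == ""
                tempModel = "model B";
            end
        case "Drift"
            % The temporary model replaces the base learner
            % (this sketch assumes a Warning always precedes Drift)
            baseLearner = tempModel;
            tempModel = "";
        case "Stable"
            % Discard any temporary model
            tempModel = "";
    end
    fprintf("%-8s base: %-8s temp: %s\n", statuses(k), baseLearner, tempModel)
end
```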

    Plot the drift status versus the iteration number.

    gscatter(1:numIterations,dstatus,statusname,"gmr","o",5,"on","Iteration","Drift Status","filled")

    Figure: Drift Status versus Iteration, with markers for the Stable, Warning, and Drift statuses.

    Input Arguments

    Mdl

    Incremental drift-aware learning model fit to streaming data, specified as an incrementalDriftAwareLearner model object. You can create Mdl using the incrementalDriftAwareLearner function. For more details, see the object reference page.

    X

    Chunk of predictor data to which the model is fit, specified as a floating-point matrix of n observations and Mdl.BaseLearner.NumPredictors predictor variables.

    When Mdl.BaseLearner accepts the ObservationsIn name-value argument, the value of ObservationsIn determines the orientation of the variables and observations. The default ObservationsIn value is "rows", which indicates that observations in the predictor data are oriented along the rows of X.

    The length of the observation responses (or labels) Y and the number of observations in X must be equal; Y(j) is the response (or label) of observation j (row or column) in X.
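The following sketch shows how the orientation of a hypothetical chunk changes the way n is counted, and the length check that the preceding paragraph requires. The data are made up. The variables nRows and nCols are illustrative only.

```matlab
% Hypothetical chunk: 6 observations of 3 predictors, one observation per row
X = rand(6,3);
Y = [0; 1; 1; 0; 1; 0];

% The same chunk oriented for ObservationsIn="columns"
Xcols = X.';

% Number of observations under each orientation
nRows = size(X,1);      % ObservationsIn="rows" (default)
nCols = size(Xcols,2);  % ObservationsIn="columns"

% Y(j) is the label of observation j, so the lengths must agree
assert(numel(Y) == nRows && numel(Y) == nCols)
```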

    Note

    • If Mdl.BaseLearner.NumPredictors = 0, updateMetricsAndFit infers the number of predictors from X, and sets the corresponding property of the output model. Otherwise, if the number of predictor variables in the streaming data changes from Mdl.BaseLearner.NumPredictors, updateMetricsAndFit issues an error.

    • updateMetricsAndFit supports only floating-point input predictor data. If your input data includes categorical data, you must prepare an encoded version of the categorical data. Use dummyvar to convert each categorical variable to a numeric matrix of dummy variables. Then, concatenate all dummy variable matrices and any other numeric predictors. For more details, see Dummy Variables.
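As a minimal sketch of the encoding step in the preceding note, the code below converts one hypothetical categorical predictor with dummyvar (Statistics and Machine Learning Toolbox) and concatenates the result with a numeric predictor. The variable names and values are assumptions for illustration. The dummy columns follow the order of categories(color), which is alphabetical here.

```matlab
% Hypothetical predictors for 4 observations
color = categorical(["red"; "blue"; "red"; "green"]);   % categorical predictor
numericPredictor = [1.5; 2.0; 0.7; 3.1];                % numeric predictor

% One 0/1 column per category: blue, green, red
D = dummyvar(color);

% Floating-point predictor matrix suitable for updateMetricsAndFit
Xencoded = [numericPredictor D];
disp(Xencoded)
```

The number of dummy columns must stay the same from chunk to chunk, because updateMetricsAndFit issues an error if the number of predictors changes. In practice, fix the full set of categories before streaming begins.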

    Data Types: single | double

    Y

    Chunk of responses (or labels) to which the model is fit, specified as one of the following:

    • Floating-point vector of n elements for regression models, where n is the number of observations in X.

    • Categorical, character, or string array, logical vector, or cell array of character vectors for classification models. If Y is a character array, it must have one class label per row. Otherwise, Y must be a vector with n elements.

    The length of Y and the number of observations in X must be equal; Y(j) is the response (or label) of observation j (row or column) in X.

    For classification problems:

    • When Mdl.BaseLearner.ClassNames is nonempty, the following conditions apply:

      • If Y contains a label that is not a member of Mdl.BaseLearner.ClassNames, updateMetricsAndFit issues an error.

      • The data type of Y and Mdl.BaseLearner.ClassNames must be the same.

    • When Mdl.BaseLearner.ClassNames is empty, updateMetricsAndFit infers Mdl.BaseLearner.ClassNames from data.
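The two conditions that apply when ClassNames is nonempty can be written as simple checks. In the sketch below, classNames and Y are hypothetical stand-ins for Mdl.BaseLearner.ClassNames and a chunk of labels. The code computes only the conditions; updateMetricsAndFit itself issues the error.

```matlab
% Hypothetical stand-ins for Mdl.BaseLearner.ClassNames and a chunk of labels
classNames = ["setosa"; "versicolor"];
Y = ["setosa"; "versicolor"; "virginica"];

% Condition 1: every label in Y must be a member of ClassNames
isKnown = ismember(Y,classNames);
allKnown = all(isKnown);

% Condition 2: Y and ClassNames must have the same data type
sameType = strcmp(class(Y),class(classNames));

fprintf("All labels known: %d, same data type: %d\n", allKnown, sameType)
```

Here the label "virginica" is not a member of classNames, so a chunk like this would cause updateMetricsAndFit to issue an error.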

    Data Types: single | double | categorical | char | string | logical | cell

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: ObservationsIn="columns",Weights=W specifies that the columns of the predictor matrix correspond to observations, and the vector W contains observation weights to apply during incremental learning.

    ObservationsIn

    Predictor data observation dimension, specified as "columns" or "rows".

    updateMetricsAndFit supports ObservationsIn only if Mdl.BaseLearner supports the ObservationsIn name-value argument.

    Example: ObservationsIn="columns"

    Data Types: char | string

    Weights

    Chunk of observation weights, specified as a floating-point vector of positive values. updateMetricsAndFit weights the observations in X with the corresponding values in Weights. The length of Weights must equal n, the number of observations in X.

    By default, Weights is ones(n,1).

    Example: Weights=w

    Data Types: double | single

    Output Arguments

    Mdl

    Updated incremental drift-aware learning model, returned as an incrementalDriftAwareLearner model object, the same data type as the input model Mdl.

    Algorithms

    References

    [1] Barros, Roberto S. M., et al. "RDDM: Reactive drift detection method." Expert Systems with Applications, vol. 90, Dec. 2017, pp. 344–55. https://doi.org/10.1016/j.eswa.2017.08.023.

    [2] Bifet, Albert, et al. "New Ensemble Methods for Evolving Data Streams." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, 2009, p. 139. https://doi.org/10.1145/1557019.1557041.

    [3] Gama, João, et al. "Learning with drift detection." Advances in Artificial Intelligence – SBIA 2004, edited by Ana L. C. Bazzan and Sofiane Labidi, vol. 3171, Springer Berlin Heidelberg, 2004, pp. 286–95. https://doi.org/10.1007/978-3-540-28645-5_29.

    Version History

    Introduced in R2022b