kfoldLoss

Loss for cross-validated partitioned quantile regression model

Since R2025a

    Description

    L = kfoldLoss(CVMdl) returns the loss (quantile loss) obtained by the cross-validated quantile regression model CVMdl. For every fold, kfoldLoss computes the loss for validation-fold observations using a model trained on training-fold observations. CVMdl.X and CVMdl.Y contain both sets of observations.

    L = kfoldLoss(CVMdl,Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify the quantiles for which to return loss values.

    Examples

    Compute the quantile loss for a quantile neural network regression model, first partitioned using holdout validation and then partitioned using 5-fold cross-validation. Compare the two losses.

    Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a table containing the predictor variables Acceleration, Cylinders, Displacement, and so on, as well as the response variable MPG. View the first eight observations.

    load carbig
    cars = table(Acceleration,Cylinders,Displacement, ...
        Horsepower,Model_Year,Origin,Weight,MPG);
    head(cars)
        Acceleration    Cylinders    Displacement    Horsepower    Model_Year    Origin     Weight    MPG
        ____________    _________    ____________    __________    __________    _______    ______    ___
    
              12            8            307            130            70        USA         3504     18 
            11.5            8            350            165            70        USA         3693     15 
              11            8            318            150            70        USA         3436     18 
              12            8            304            150            70        USA         3433     16 
            10.5            8            302            140            70        USA         3449     17 
              10            8            429            198            70        USA         4341     15 
               9            8            454            220            70        USA         4354     14 
             8.5            8            440            215            70        USA         4312     14 
    

    Remove rows of cars where the table has missing values.

    cars = rmmissing(cars);

    Categorize the cars based on whether they were made in the USA.

    cars.Origin = categorical(cellstr(cars.Origin));
    cars.Origin = mergecats(cars.Origin,["France","Japan",...
        "Germany","Sweden","Italy","England"],"NotUSA");

    Partition the data using cvpartition. First, create a partition for holdout validation, using approximately 80% of the observations for the training data and 20% for the test data. Then, create a partition for 5-fold cross-validation.

    rng(0,"twister") % For reproducibility
    holdoutPartition = cvpartition(height(cars),Holdout=0.20);
    kfoldPartition = cvpartition(height(cars),KFold=5);

    Train a quantile neural network regression model using the cars data. Specify MPG as the response variable, and standardize the numeric predictors. Use the default 0.5 quantile (median).

    Mdl = fitrqnet(cars,"MPG",Standardize=true);

    Create the partitioned quantile regression models using crossval.

    holdoutMdl = crossval(Mdl,CVPartition=holdoutPartition)
    holdoutMdl = 
      RegressionPartitionedQuantileModel
          CrossValidatedModel: 'QuantileNeuralNetwork'
               PredictorNames: {'Acceleration'  'Cylinders'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
        CategoricalPredictors: 6
                 ResponseName: 'MPG'
              NumObservations: 392
                        KFold: 1
                    Partition: [1×1 cvpartition]
            ResponseTransform: 'none'
                    Quantiles: 0.5000
    
    
      Properties, Methods
    
    
    kfoldMdl = crossval(Mdl,CVPartition=kfoldPartition)
    kfoldMdl = 
      RegressionPartitionedQuantileModel
          CrossValidatedModel: 'QuantileNeuralNetwork'
               PredictorNames: {'Acceleration'  'Cylinders'  'Displacement'  'Horsepower'  'Model_Year'  'Origin'  'Weight'}
        CategoricalPredictors: 6
                 ResponseName: 'MPG'
              NumObservations: 392
                        KFold: 5
                    Partition: [1×1 cvpartition]
            ResponseTransform: 'none'
                    Quantiles: 0.5000
    
    
      Properties, Methods
    
    

    Compute the quantile loss for holdoutMdl and kfoldMdl by using the kfoldLoss object function.

    holdoutL = kfoldLoss(holdoutMdl)
    holdoutL = 
    0.9423
    
    kfoldL = kfoldLoss(kfoldMdl)
    kfoldL = 
    0.9685
    

    holdoutL is the quantile loss computed using one holdout set, while kfoldL is an average quantile loss computed using five holdout sets. Because a k-fold estimate averages over several validation sets, it tends to be a more reliable indicator of a model's performance on unseen data than an estimate from a single holdout set.

    Compute the loss for a cross-validated quantile regression model, and specify the prediction to use for observations with missing predictor values.

    Load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s. Create a matrix X containing the predictor variables Acceleration, Displacement, Horsepower, and Weight. Store the response variable MPG in the variable Y.

    load carbig
    X = [Acceleration,Displacement,Horsepower,Weight];
    Y = MPG;

    Train a cross-validated quantile linear regression model. Use the 0.25, 0.50, and 0.75 quantiles (that is, the lower quartile, median, and upper quartile). To improve the model fit, tighten the beta tolerance from the default value 1e-4 to 1e-6, and use a ridge (L2) regularization term of 1. Specify 10-fold cross-validation by setting CrossVal="on".

    rng(0,"twister") % For reproducibility
    CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75], ...
        BetaTolerance=1e-6,Lambda=1,CrossVal="on")
    CVMdl = 
      RegressionPartitionedQuantileModel
        CrossValidatedModel: 'QuantileLinear'
             PredictorNames: {'x1'  'x2'  'x3'  'x4'}
               ResponseName: 'Y'
            NumObservations: 398
                      KFold: 10
                  Partition: [1×1 cvpartition]
          ResponseTransform: 'none'
                  Quantiles: [0.2500 0.5000 0.7500]
    
    
      Properties, Methods
    
    

    CVMdl is a RegressionPartitionedQuantileModel object.

    Compute the quantile loss for each fold and quantile. Use a NaN prediction for test set observations with missing predictor values.

    L = kfoldLoss(CVMdl,Mode="individual",PredictionForMissingValue=NaN)
    L = 10×3
    
        1.5388    1.6703    1.3547
           NaN       NaN       NaN
        1.9140    2.1864    2.0922
           NaN       NaN       NaN
        1.4339    2.2040    1.7293
        1.5513    1.9968    1.8037
           NaN       NaN       NaN
        1.3979    2.0011    2.0695
           NaN       NaN       NaN
        1.8021    2.2161    1.5746
    
    

    The rows of L correspond to folds, and the columns correspond to quantiles. A row of NaN values indicates that the corresponding test set contains at least one observation with a missing predictor value. Here, that is the case for the second, fourth, seventh, and ninth test sets. You can find the predictor values for the observations in the second test set by using the following code.

    test2Indices = test(CVMdl.Partition,2);
    test2Observations = CVMdl.X(test2Indices,:)
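
    To locate the rows that trigger the NaN loss, you can flag observations that have at least one missing predictor value. The following is a minimal sketch that uses a small made-up matrix (testObs, a hypothetical stand-in for test2Observations) so that it runs on its own.

```matlab
% Made-up predictor matrix standing in for test2Observations
testObs = [12 307 130 3504;
           11 NaN 150 3436;
           10 429 NaN 4341];

% Logical column vector: true where a row has at least one missing value
hasMissing = any(ismissing(testObs),2);

% Row indices (within the test set) of the flagged observations
rowsWithMissing = find(hasMissing)
```

    Applying the same two calls to test2Observations would list the second-fold observations that have missing predictor values.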
    

    Instead of using a NaN prediction for test set observations with missing predictor values, remove the observations from the computation.

    newL = kfoldLoss(CVMdl,Mode="individual", ...
        PredictionForMissingValue="omitted")
    newL = 10×3
    
        1.5388    1.6703    1.3547
        1.6612    2.1528    1.4820
        1.9140    2.1864    2.0922
        2.1431    2.6693    2.0767
        1.4339    2.2040    1.7293
        1.5513    1.9968    1.8037
        1.2971    1.8850    1.8236
        1.3979    2.0011    2.0695
        1.6716    2.0485    1.5921
        1.8021    2.2161    1.5746
    
    

    Input Arguments

    CVMdl

    Cross-validated quantile regression model, specified as a RegressionPartitionedQuantileModel object.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: kfoldLoss(CVMdl,Quantiles=[0.25 0.5 0.75]) returns the quantile loss for the 0.25, 0.5, and 0.75 quantiles.

    Quantiles

    Quantiles for which to compute the loss, specified as a vector of values in CVMdl.Quantiles. The software returns loss values only for the quantiles specified in Quantiles.

    Example: Quantiles=[0.4 0.6]

    Data Types: single | double | char | string

    Folds

    Fold indices to use, specified as a positive integer vector. The elements of Folds must be within the range from 1 to CVMdl.KFold. The software uses only the folds specified in Folds.

    Example: Folds=[1 4 10]

    Data Types: single | double

    LossFun

    Loss function, specified as "quantile" or a function handle.

    • "quantile" — Quantile loss.

    • Function handle — To specify a custom loss function, use a function handle. The function must have this form:

      lossval = lossfun(Y,YFit,W,q)

      • The output argument lossval is a numeric scalar.

      • You specify the function name (lossfun).

      • Y is a length-n numeric vector of observed responses.

      • YFit is a length-n numeric vector of corresponding predicted responses.

      • W is an n-by-1 numeric vector of observation weights.

      • q is a numeric scalar in the range [0,1] corresponding to a quantile.

    Example: LossFun=@lossfun

    Data Types: char | string | function_handle
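
    As an illustration of the function-handle form, the following sketch defines a hypothetical custom loss (pinball) with the required four inputs: a weighted pinball (check) loss for quantile q. The name pinball and the toy data are assumptions made for this example, and the normalization by the sum of the weights is a choice of the sketch, not a statement about how the built-in "quantile" option is computed.

```matlab
% Hypothetical custom loss: weighted pinball (check) loss for quantile q.
% The (:) indexing lets the function accept row or column vectors.
pinball = @(Y,YFit,W,q) sum(W(:) .* max(q.*(Y(:)-YFit(:)), ...
    (q-1).*(Y(:)-YFit(:)))) ./ sum(W(:));

% Toy data, made up for illustration
Y    = [1; 2; 4];      % observed responses
YFit = [2; 2; 2];      % predicted responses
W    = ones(3,1);      % observation weights

lossMedian = pinball(Y,YFit,W,0.5);   % loss evaluated for the 0.5 quantile
lossUpper  = pinball(Y,YFit,W,0.9);   % loss evaluated for the 0.9 quantile

% With a cross-validated model CVMdl in the workspace, the handle
% would be passed as:
%   L = kfoldLoss(CVMdl,LossFun=pinball);
```

    The function returns a numeric scalar for each quantile, as the form above requires.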

    Mode

    Aggregation level for the output, specified as "average" or "individual".

    Value           Description
    "average"       The output is a 1-by-q vector of loss values, averaged over the folds specified by the Folds name-value argument. q is the number of quantiles specified by the Quantiles name-value argument.
    "individual"    The output is a k-by-q matrix of loss values, where k is the number of folds specified by the Folds name-value argument and q is the number of quantiles specified by the Quantiles name-value argument.

    Example: Mode="individual"

    Data Types: char | string
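
    The Folds, Quantiles, and Mode arguments together determine the size of the output. The following sketch, which assumes the carbig data and the fitrqlinear call used in the examples above, requests the loss for three folds and two quantiles and then checks the output sizes against the descriptions in this section. The variable names avgL and indL are chosen for this illustration.

```matlab
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% Restrict the computation to folds 1, 4, and 10 and to two quantiles
avgL = kfoldLoss(CVMdl,Folds=[1 4 10],Quantiles=[0.25 0.75]);
indL = kfoldLoss(CVMdl,Folds=[1 4 10],Quantiles=[0.25 0.75], ...
    Mode="individual");

size(avgL)   % one row, one column per requested quantile
size(indL)   % one row per requested fold, one column per requested quantile
```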

    PredictionForMissingValue

    Predicted response value to use for observations with missing predictor values, specified as "quantile", "omitted", a numeric scalar, or a numeric vector.

    Value                       Description
    "quantile"                  kfoldLoss uses the specified quantile of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values.
    "omitted"                   kfoldLoss excludes observations with missing predictor values from the loss computation.
    Numeric scalar or vector    See the following list.

    • If PredictionForMissingValue is a scalar, then kfoldLoss uses this value as the predicted response value for observations with missing predictor values. The function uses the same value for all quantiles.

    • If PredictionForMissingValue is a vector, its length must be equal to the number of quantiles specified by the Quantiles name-value argument. kfoldLoss uses element i in the vector as the quantile i predicted response value for observations with missing predictor values.

    If an observation is missing an observed response value or an observation weight, then kfoldLoss does not use the observation in the loss computation.

    Example: PredictionForMissingValue="omitted"

    Data Types: single | double | char | string
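
    The vector form lets you supply one fallback prediction per quantile. The following sketch assumes the carbig data and the fitrqlinear call from the examples above. The variable fallback is introduced only for this illustration and holds the quantiles of the full response vector, which differs from the "quantile" option because that option uses the training-fold responses of each fold.

```matlab
load carbig
X = [Acceleration,Displacement,Horsepower,Weight];
Y = MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

% One fallback prediction per model quantile, computed from all observed
% responses (quantile treats NaN values in Y as missing)
fallback = quantile(Y,[0.25 0.50 0.75]);

% Element i of fallback is used as the quantile i prediction for
% observations with missing predictor values
L = kfoldLoss(CVMdl,PredictionForMissingValue=fallback)
```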

    Output Arguments

    L

    Loss, returned as a numeric row vector or numeric matrix. The loss is computed with the function specified by LossFun, comparing the validation-fold observations with the predictions made by a quantile regression model trained on the training-fold observations.

    • If Mode is "average", then L is the average loss over the folds. That is, L is a 1-by-q vector of loss values, averaged over the folds specified by the Folds name-value argument. q is the number of quantiles specified by the Quantiles name-value argument.

    • If Mode is "individual", then L is a k-by-q matrix of loss values, where k is the number of folds specified by the Folds name-value argument and q is the number of quantiles specified by the Quantiles name-value argument.

    Algorithms

    kfoldLoss computes losses according to the loss object function of the trained compact models in CVMdl (CVMdl.Trained). For more information, see the model-specific loss function reference pages in the following table.

    Model Type                                      loss Function
    Quantile linear regression model                loss
    Quantile neural network model for regression    loss
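
    The relationship between kfoldLoss and the loss function of the trained models can be sketched by looping over the folds manually. The code below is an illustration under stated assumptions, not a description of the internal implementation: it removes rows with missing values first so that missing-value handling plays no role, relies on the default uniform observation weights, and uses the variable names tbl, manualL, builtinL, and maxDiff only for this example.

```matlab
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
X = tbl{:,1:4};
Y = tbl.MPG;

rng(0,"twister") % For reproducibility
CVMdl = fitrqlinear(X,Y,Quantiles=[0.25,0.50,0.75],CrossVal="on");

numFolds = CVMdl.KFold;
manualL = zeros(numFolds,numel(CVMdl.Quantiles));
for k = 1:numFolds
    idx = test(CVMdl.Partition,k);   % logical index of validation-fold rows
    % Loss of the model trained on the other folds, evaluated on fold k
    manualL(k,:) = loss(CVMdl.Trained{k},CVMdl.X(idx,:),CVMdl.Y(idx));
end

builtinL = kfoldLoss(CVMdl,Mode="individual");

% Largest absolute difference between the manual and built-in results
maxDiff = max(abs(manualL - builtinL),[],"all")
```

    If the manual loop mirrors the computation described above, maxDiff is expected to be close to zero.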

    Version History

    Introduced in R2025a