Dear Professors,
I apologize for bothering you during your busy schedules. The program below reads three files and performs ROC analysis with machine learning to calculate three AUCs. We have tried to extend it to test whether the differences among these AUCs are statistically significant, but despite multiple attempts we keep encountering errors that prevent progress. In particular, I am having a lot of trouble implementing the DeLong test. Could you please help us revise the program? I would greatly appreciate it if you could add the necessary modifications to the code below. Thank you very much for your assistance.
Best regards,
% Specify the paths to the CSV files
filePaths = {'C:\Users\rms56\Desktop\1-491B.csv', ...
    'C:\Users\rms56\Desktop\1-491C.csv', ...
    'C:\Users\rms56\Desktop\1-491D.csv'};
% Create cell arrays to store X and y
X_all = cell(3, 1); % Cell array for the features X
y_all = cell(3, 1); % Cell array for the labels y
% Read the three files in order and assign X and y
for i = 1:3
    % Read the CSV file
    data = readmatrix(filePaths{i});
    % Select the X and y columns for each file
    if i == 1 % '1-491B.csv': columns 1-3 are X, column 4 is y
        X_all{i} = data(:, 1:3); % Columns 1 to 3 are X
        y_all{i} = data(:, 4); % Column 4 is y
    elseif i == 2 % '1-491C.csv': columns 1-6 are X, column 7 is y
        X_all{i} = data(:, 1:6); % Columns 1 to 6 are X
        y_all{i} = data(:, 7); % Column 7 is y
    elseif i == 3 % '1-491D.csv': columns 1-3 are X, column 4 is y
        X_all{i} = data(:, 1:3); % Columns 1 to 3 are X
        y_all{i} = data(:, 4); % Column 4 is y
    end
end
% Run the analysis for each file in a loop
for fileIndex = 1:3
    % Get the data for this file
    X = X_all{fileIndex}; % Features
    y = y_all{fileIndex}; % Labels
    % Set up cross-validation
    k = 5; % Number of folds
    cv = cvpartition(y, 'KFold', k); % Cross-validation partition
    accuracy = zeros(k, 1); % Accuracy of each fold
    % Train and test on each fold
    for i = 1:k
        trainIdx = training(cv, i);
        testIdx = test(cv, i);
        % Split the data
        XTrain = X(trainIdx, :);
        yTrain = y(trainIdx, :);
        XTest = X(testIdx, :);
        yTest = y(testIdx, :);
        % Train the SVM model
        model = fitcsvm(XTrain, yTrain, ...
            'KernelFunction', 'polynomial', ...
            'PolynomialOrder', 2, ...
            'KernelScale', 'auto', ...
            'BoxConstraint', 1, ...
            'Standardize', true);
        % Predict the test set with the model
        [predictions, score] = predict(model, XTest);
        % Compute the accuracy of the current fold
        accuracy(i) = sum(predictions == yTest) / length(yTest);
        fprintf('File %d - Fold %d Accuracy: %.2f%%\n', fileIndex, i, accuracy(i) * 100);
    end
    % Compute the mean accuracy over all folds
    averageAccuracy = mean(accuracy);
    fprintf('File %d - Average Accuracy: %.2f%%\n', fileIndex, averageAccuracy * 100);
    % Compute the ROC curve and AUC
    % (yTest and score come from the last fold only; Xroc and Yroc are needed for the plot)
    [Xroc, Yroc, ~, AUC_final] = perfcurve(yTest, score(:, 2), 1);
    % Compute the confidence interval by bootstrap
    nBoot = 1000; % Number of bootstrap iterations
    [AUC_bootstrap, CI_final] = bootstrapAUC(yTest, score(:, 2), nBoot);
    % Compute the confusion matrix
    confusionMatrix_final = confusionmat(yTest, predictions);
    tn = confusionMatrix_final(1, 1);
    fp = confusionMatrix_final(1, 2);
    fn = confusionMatrix_final(2, 1);
    tp = confusionMatrix_final(2, 2);
    % Compute the metrics
    sensitivity_final = tp / (tp + fn);
    specificity_final = tn / (tn + fp);
    ppv_final = tp / (tp + fp);
    npv_final = tn / (tn + fn);
    accuracy_final = (tp + tn) / sum(confusionMatrix_final(:));
    % Display the results
    fprintf('File %d - Final AUC: %.2f\n', fileIndex, AUC_final);
    fprintf('File %d - 95%% Confidence Interval via bootstrap: [%.2f, %.2f]\n', fileIndex, CI_final(1), CI_final(2));
    fprintf('Sensitivity: %.2f\n', sensitivity_final);
    fprintf('Specificity: %.2f\n', specificity_final);
    fprintf('PPV: %.2f\n', ppv_final);
    fprintf('NPV: %.2f\n', npv_final);
    fprintf('Accuracy: %.2f\n', accuracy_final);
    % Plot the ROC curve (perfcurve returns the false positive rate on X by default)
    figure;
    plot(Xroc, Yroc, 'b-', 'LineWidth', 2);
    xlabel('1 - Specificity');
    ylabel('Sensitivity');
    title(sprintf('File %d - ROC Curve', fileIndex));
    grid on;
    % Plot the confusion matrix
    figure;
    confusionchart(confusionMatrix_final, {'Negative', 'Positive'}, 'RowSummary', 'row-normalized', ...
        'ColumnSummary', 'column-normalized');
    title(sprintf('File %d - Confusion Matrix', fileIndex));
end
% Function to compute the AUC and its confidence interval by bootstrap
function [AUC, CI] = bootstrapAUC(yTrue, scores, nBoot)
    % Initialize
    AUC = zeros(nBoot, 1);
    for i = 1:nBoot
        idx = randi(length(yTrue), [length(yTrue), 1]); % Resample with replacement
        yBoot = yTrue(idx);
        scoresBoot = scores(idx);
        [~, ~, ~, AUC(i)] = perfcurve(yBoot, scoresBoot, 1); % Compute AUC
    end
    % Compute the confidence interval
    CI = prctile(AUC, [2.5 97.5]); % 95% confidence interval
end

5 Comments

Sandeep Mishra on 30 Sep 2024
Could you please provide the specific error messages you encountered while running the code?
Additionally, sharing the CSV files you are using would be helpful for a more thorough analysis of the issue.
% Specify the paths to the CSV files
filePaths = {'C:\Users\rms56\Desktop\1-491B.csv', ...
    'C:\Users\rms56\Desktop\1-491C.csv', ...
    'C:\Users\rms56\Desktop\1-491D.csv'};
% Create cell arrays to store X and y
X_all = cell(3, 1); % Cell array for feature X
y_all = cell(3, 1); % Cell array for label y
% Arrays to store AUC and scores
AUC_all = zeros(3, 1);
scores_all = cell(3, 1); % Store the model's scores for each file
% Load the 3 files in sequence and assign X and y
for i = 1:3
    % Read the CSV file
    data = readmatrix(filePaths{i});
    % Specify X and y columns for each file
    if i == 1 % '1-491B.csv': Columns 1-3 for X, column 4 for y
        X_all{i} = data(:, 1:3); % Assign columns 1 to 3 to X
        y_all{i} = data(:, 4); % Assign column 4 to y
    elseif i == 2 % '1-491C.csv': Columns 1-6 for X, column 7 for y
        X_all{i} = data(:, 1:6); % Assign columns 1 to 6 to X
        y_all{i} = data(:, 7); % Assign column 7 to y
    elseif i == 3 % '1-491D.csv': Columns 1-3 for X, column 4 for y
        X_all{i} = data(:, 1:3); % Assign columns 1 to 3 to X
        y_all{i} = data(:, 4); % Assign column 4 to y
    end
end
% Loop through each file for analysis
for fileIndex = 1:3
    % Get the corresponding data for the file
    X = X_all{fileIndex}; % Features
    y = y_all{fileIndex}; % Labels
    % Set up cross-validation
    k = 5; % Number of folds
    cv = cvpartition(y, 'KFold', k); % Partition the data for cross-validation
    accuracy = zeros(k, 1); % Array to store accuracy for each fold
    % Perform training and testing for each fold
    for i = 1:k
        trainIdx = training(cv, i);
        testIdx = test(cv, i);
        % Split the data into training and test sets
        XTrain = X(trainIdx, :);
        yTrain = y(trainIdx, :);
        XTest = X(testIdx, :);
        yTest = y(testIdx, :);
        % Train an SVM model
        model = fitcsvm(XTrain, yTrain, ...
            'KernelFunction', 'polynomial', ...
            'PolynomialOrder', 2, ...
            'KernelScale', 'auto', ...
            'BoxConstraint', 1, ...
            'Standardize', true);
        % Predict the test set using the trained model
        [predictions, score] = predict(model, XTest);
        % Calculate accuracy for the current fold
        accuracy(i) = sum(predictions == yTest) / length(yTest);
        fprintf('File %d - Fold %d Accuracy: %.2f%%\n', fileIndex, i, accuracy(i) * 100);
    end
    % Calculate the average accuracy across all folds
    averageAccuracy = mean(accuracy);
    fprintf('File %d - Average Accuracy: %.2f%%\n', fileIndex, averageAccuracy * 100);
    % Calculate the ROC curve and AUC
    [Xroc, Yroc, ~, AUC_final] = perfcurve(yTest, score(:, 2), 1);
    AUC_all(fileIndex) = AUC_final; % Store AUC
    scores_all{fileIndex} = score(:, 2); % Store scores
    % Calculate confidence intervals using bootstrap method
    nBoot = 1000; % Number of bootstrap iterations
    [AUC_bootstrap, CI_final] = bootstrapAUC(yTest, score(:, 2), nBoot);
    % Confusion matrix
    confusionMatrix_final = confusionmat(yTest, predictions);
    tn = confusionMatrix_final(1, 1);
    fp = confusionMatrix_final(1, 2);
    fn = confusionMatrix_final(2, 1);
    tp = confusionMatrix_final(2, 2);
    % Calculate metrics
    sensitivity_final = tp / (tp + fn);
    specificity_final = tn / (tn + fp);
    ppv_final = tp / (tp + fp);
    npv_final = tn / (tn + fn);
    accuracy_final = (tp + tn) / sum(confusionMatrix_final(:));
    % Display results
    fprintf('File %d - Final AUC: %.2f\n', fileIndex, AUC_final);
    fprintf('File %d - 95%% Confidence Interval via bootstrap: [%.2f, %.2f]\n', fileIndex, CI_final(1), CI_final(2));
    fprintf('Sensitivity: %.2f\n', sensitivity_final);
    fprintf('Specificity: %.2f\n', specificity_final);
    fprintf('PPV: %.2f\n', ppv_final);
    fprintf('NPV: %.2f\n', npv_final);
    fprintf('Accuracy: %.2f\n', accuracy_final);
    % Plot ROC curve (perfcurve returns the false positive rate on X by default)
    figure;
    plot(Xroc, Yroc, 'b-', 'LineWidth', 2);
    xlabel('1 - Specificity');
    ylabel('Sensitivity');
    title(sprintf('File %d - ROC Curve', fileIndex));
    grid on;
    % Plot confusion matrix
    figure;
    confusionchart(confusionMatrix_final, {'Negative', 'Positive'}, 'RowSummary', 'row-normalized', ...
        'ColumnSummary', 'column-normalized');
    title(sprintf('File %d - Confusion Matrix', fileIndex));
end
% Perform DeLong test to compare AUCs
fprintf('\n--- AUC Comparison via DeLong Test ---\n');
comparisons = {'B vs C', 'B vs D', 'C vs D'};
pairs = [1 2; 1 3; 2 3]; % Comparison pairs (B vs C, B vs D, C vs D)
for compIndex = 1:3
    i = pairs(compIndex, 1);
    j = pairs(compIndex, 2);
    % Run DeLong test and obtain p-value
    [pValue, zScore] = delongTest(scores_all{i}, scores_all{j}, y_all{i}, y_all{j});
    % Display the results
    fprintf('DeLong test p-value for %s: %.4f\n', comparisons{compIndex}, pValue);
end
% Function to calculate AUC and confidence intervals via bootstrap method
function [AUC, CI] = bootstrapAUC(yTrue, scores, nBoot)
    % Initialize
    AUC = zeros(nBoot, 1);
    for i = 1:nBoot
        idx = randi(length(yTrue), [length(yTrue), 1]); % Resample with replacement
        yBoot = yTrue(idx);
        scoresBoot = scores(idx);
        [~, ~, ~, AUC(i)] = perfcurve(yBoot, scoresBoot, 1); % Compute AUC
    end
    % Calculate confidence intervals
    CI = prctile(AUC, [2.5 97.5]); % 95% confidence interval
end
% Function to perform DeLong test for AUC comparison
function [pValue, zScore] = delongTest(scores1, scores2, labels1, labels2)
    % Ensure labels are identical
    if ~isequal(labels1, labels2)
        error('The labels for the two datasets must be identical.');
    end
    % Get indices for positive and negative labels
    posIdx = (labels1 == 1);
    negIdx = (labels1 == 0);
    % Rank transformation
    v1 = tiedrank(scores1);
    v2 = tiedrank(scores2);
    % Sum of scores for positive samples
    auc1 = sum(v1(posIdx)) - sum(1:sum(posIdx));
    auc2 = sum(v2(posIdx)) - sum(1:sum(posIdx));
    % Calculate AUC
    auc1 = auc1 / (sum(posIdx) * sum(negIdx));
    auc2 = auc2 / (sum(posIdx) * sum(negIdx));
    % Variance calculation
    q1 = auc1 * (1 - auc1) / (sum(posIdx) * sum(negIdx));
    q2 = auc2 * (1 - auc2) / (sum(posIdx) * sum(negIdx));
Error message:
The function "bootstrapAUC" was closed with an 'end', but at least one other function definition was not. All functions in a script must be closed with an 'end'.
The program above is the complete version, translated into English and adapted to compare the AUCs of the three files (B, C, D) using the DeLong test. It calculates the AUC for each file, compares the AUCs in pairs (B vs C, B vs D, C vs D), and computes the p-value for each pair.
The error message translates to:
"Logical indices contain true values that are out of bounds of the array.
Error: untitled5>delongTest (line 171) auc1 = sum(v1(posIdx)) - sum(1:sum(posIdx));"
Sandeep Mishra on 30 Sep 2024
Could you verify if your delongTest function is complete? The return values (pValue, zScore) aren't being set.
Also, there's a size mismatch between scores_all{i} and y_all{i} for each 'i', causing an error while calling v1(posIdx) inside the delongTest function.
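For reference, a minimal sketch of what a completed paired DeLong test could look like. It is not the asker's function: the name delongPaired, the single shared label vector, and the toy data are assumptions made for illustration. It assumes 0/1 labels and that both score vectors refer to the same cases in the same order, and it uses erfc for the normal tail so that no toolbox function is needed.

```matlab
% Toy data (hypothetical): six cases scored by two models on the same labels.
labels  = [1; 1; 1; 0; 0; 0];
scoresA = [0.9; 0.8; 0.4; 0.5; 0.3; 0.2];
scoresB = [0.6; 0.7; 0.3; 0.8; 0.4; 0.2];
[pValue, zScore, auc] = delongPaired(scoresA, scoresB, labels);
fprintf('AUC A = %.3f, AUC B = %.3f, z = %.3f, p = %.4f\n', ...
    auc(1), auc(2), zScore, pValue);

function [pValue, zScore, auc] = delongPaired(scoresA, scoresB, labels)
    % Paired DeLong test for two score vectors evaluated on the SAME cases.
    scoresA = scoresA(:); scoresB = scoresB(:); labels = labels(:);
    if numel(scoresA) ~= numel(labels) || numel(scoresB) ~= numel(labels)
        error('Scores and labels must have the same length.');
    end
    pos = (labels == 1);
    neg = (labels == 0);
    m = sum(pos);                 % number of positive cases
    n = sum(neg);                 % number of negative cases
    S = [scoresA, scoresB];
    V10 = zeros(m, 2);            % structural components of the positives
    V01 = zeros(n, 2);            % structural components of the negatives
    for r = 1:2
        xp = S(pos, r);           % m-by-1 scores of positives
        xn = S(neg, r).';         % 1-by-n scores of negatives
        % m-by-n kernel: 1 if positive > negative, 0.5 for ties, 0 otherwise
        psi = double(xp > xn) + 0.5 * double(xp == xn);
        V10(:, r) = mean(psi, 2);
        V01(:, r) = mean(psi, 1).';
    end
    auc = mean(V10, 1);           % Mann-Whitney AUC of each model
    covAUC = cov(V10) / m + cov(V01) / n;   % 2-by-2 covariance of the two AUCs
    varDiff = covAUC(1, 1) + covAUC(2, 2) - 2 * covAUC(1, 2);
    zScore = (auc(1) - auc(2)) / sqrt(varDiff);   % NaN if the scores are identical
    pValue = erfc(abs(zScore) / sqrt(2));         % two-sided p-value
end
```

Unlike the posted delongTest, this sketch takes the covariance between the two AUCs into account, which is what makes the comparison a paired one. It only applies when the compared models are scored on identical cases with identical labels.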
Takeharu Kiso on 30 Sep 2024
Dear Sandeep Mishra,
Thank you very much for providing the detailed feedback regarding the program corrections. I have made the revisions based on your comments, and upon running the program, we were able to calculate the p-value. We had been struggling to resolve this issue, and I am sincerely grateful for your help. Thank you once again.
Moving forward, we plan to compare the AUC using different machine learning models with the same program. Should we encounter any further questions or uncertainties, we would greatly appreciate your assistance once again.
In closing, I would like to express my heartfelt thanks once more.
Thank you very much.
Sincerely,
[Takeharu Kiso]
Sandeep Mishra on 1 Oct 2024
Edited: Sandeep Mishra on 1 Oct 2024
Thanks for the confirmation, @Takeharu Kiso.
I will post the same as an answer so that the question can be marked as answered.
Please feel free to reach out if you have any further questions or encounter any issues.

Accepted Answer

Sandeep Mishra on 1 Oct 2024
Edited: Sandeep Mishra on 1 Oct 2024

Hi Takeharu,
I executed the code snippet in MATLAB R2023a and encountered the same error you mentioned.
Debugging showed a size mismatch inside the 'delongTest' function. For each index i, 'scores_all{i}' has a size of 98x1 while 'y_all{i}' has a size of 491x1, because 'score' holds the predictions for the last test fold only (about one fifth of the 491 rows) whereas 'y_all{i}' holds every label. The logical index 'posIdx' built from the labels therefore has 491 elements, and 'v1(posIdx)' fails on the 98-element rank vector.
To resolve this, refactor the code so that the score and label vectors passed to the 'delongTest' function have the same length and refer to the same observations.
I hope this helps you rectify the issue.
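One possible way to get matching dimensions is to keep one out-of-fold score per observation instead of only the scores of the last fold. The sketch below assumes this approach and uses synthetic stand-in data; the variable oofScore and the generated X and y are illustrative and not part of the original program. It needs the Statistics and Machine Learning Toolbox, like the original code.

```matlab
% Hypothetical stand-in data; replace with X_all{fileIndex} and y_all{fileIndex}.
rng(0);                                  % reproducible synthetic example
n = 200;
y = [zeros(n/2, 1); ones(n/2, 1)];       % 0/1 labels
X = randn(n, 3) + 0.8 * y;               % class 1 shifted so the classes differ

k = 5;
cv = cvpartition(y, 'KFold', k);         % stratified k-fold partition
oofScore = nan(numel(y), 1);             % one out-of-fold score per row of y

for i = 1:k
    trainIdx = training(cv, i);
    testIdx = test(cv, i);
    model = fitcsvm(X(trainIdx, :), y(trainIdx), ...
        'KernelFunction', 'polynomial', ...
        'PolynomialOrder', 2, ...
        'KernelScale', 'auto', ...
        'BoxConstraint', 1, ...
        'Standardize', true);
    [~, score] = predict(model, X(testIdx, :));
    oofScore(testIdx) = score(:, 2);     % column 2 is the score for class 1
end

% Scores and labels now have the same length and the same row order.
[Xroc, Yroc, ~, AUC] = perfcurve(y, oofScore, 1);
fprintf('Pooled out-of-fold AUC: %.3f (%d scores, %d labels)\n', ...
    AUC, numel(oofScore), numel(y));
```

Pooling scores across folds assumes that the SVM scores of the five models are on a comparable scale, which is a simplification. For a paired comparison, the three feature sets would also need identical labels in the same row order, and reusing the same cv object for all three keeps the folds aligned.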

Version

R2023a
