The R^2 of my linear model changes every time I change the input, but the non-linear model gives the same R^2 even when I change the input.

I followed all the lecture notes but can't figure out my error. Is anyone able to spot my mistake? Below is all my code, which my lecturer told me is correct, but it is now the weekend and I don't have access to him. I am unsure whether I need to retrain my model for the non-linear model, since that has already been done for the linear model.
% Load data from Excel file
DATA = readmatrix('Concrete_Data.xlsx');
% Compute mean and standard deviation for each column of data
M = mean(DATA);
S = std(DATA);
% Create labels for each variable
text = ["Cement (kgm^3)", "Blast Furnace Slag (kgm^3)", "Fly Ash (kgm^3)","Water (kgm^3)", "Superplasticizer (kgm^3)", "Coarse Aggregate (kgm^3)", "Fine Aggregate (kgm^3)", "Age (day)"];
% Plot scatter plots and compute R-squared values for each variable
r_squared = zeros(1,8); % pre-allocate array to store R-squared values
figure;
for i = 1:8
    subplot(2,4,i);
    scatter(DATA(:,i), DATA(:,9),'filled');
    title(sprintf('Average = %5.2f\n Standard Deviation = %5.2f',M(i), S(i)));
    xlabel(text(i));
    ylabel("Concrete Compressive Strength (MPa)");
    box on ; grid on ;
    hold on;
    x = DATA(:,i);
    y = DATA(:,9);
    p = polyfit(x, y, 1);
    yfit = polyval(p,x);
    yresid = y - yfit;
    SSresid = sum(yresid.^2);
    SStotal = (length(y)-1)*var(y);
    r_squared(i) = 1 - SSresid/SStotal;
    rsq = r_squared(i);
    fprintf('R-squared for %s: %5.2f\n', text(i), rsq);
    hold off;
end
input_threshold = input('Enter R-Squared threshold: ');
variable_names = [];
t = 0;
for i = 1:8
    if r_squared(i) > input_threshold
        t = t + 1;
        significant_data(:,t) = DATA(:,i);
        variable_names = [variable_names; text(i)];
    end
end
fprintf('Variables with R-Squared values above:\n');
disp(variable_names);
significant_data(:,t+1) = DATA(:,9);
rng(1);
cv = cvpartition(length(significant_data),'HoldOut', 0.3);
training_DATA = significant_data(cv.training,:);
testing_DATA = significant_data(cv.test,:);
model = fitlm(training_DATA(:,1:end-1), training_DATA(:,end))
predictions = predict(model, testing_DATA(:,1:end-1));
nlm = @(b,x)b(1)+b(2).*x(:,1).^2+b(3).*x(:,1);
Non_linear_model = fitnlm(training_DATA(:,1:end-1), training_DATA(:,end), nlm,[1 1 1])
Non_linear_predictions = predict(Non_linear_model, testing_DATA(:,1:end-1));
5 Comments
the cyclist on 29 Apr 2023
After having taken just a quick look (and not really answering your question), I'll mention one other general thing.
When one refers to a "linear" model, that means that it is linear in the parameters, not the terms. Having a term that is x1^2 does not make the model non-linear. That model is linear in the parameters (b1, b2, b3).
So, you can ignore what I said in my prior comment, about R^2 not being useful.
Also, there is no reason to use fitnlm to fit that model. You can use fitlm just fine. (I believe you should get the same coefficients, within perhaps some roundoff error.)
I don't know if that will help explain the problem you are seeing, which I have not tried to investigate yet.
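To illustrate the point about fitlm and fitnlm, here is a minimal sketch on synthetic data (an assumption; it does not use Concrete_Data.xlsx). It fits the same one-variable quadratic both ways and prints the two sets of coefficients side by side. The variable names nonlin and lin are only for illustration.

```matlab
% Synthetic data (assumption: stands in for one predictor and the response)
rng(1);
x = linspace(0, 10, 50)';
y = 2 + 0.5*x.^2 - 3*x + 0.1*randn(50, 1);

% Same model form as in the question, fitted with fitnlm
nlm = @(b,x) b(1) + b(2).*x(:,1).^2 + b(3).*x(:,1);
nonlin = fitnlm(x, y, nlm, [1 1 1]);

% Equivalent fit with fitlm. The intercept is added automatically;
% the predictors are passed as [x.^2, x] so the coefficient order matches.
lin = fitlm([x.^2, x], y);

% Column 1: fitnlm estimates, column 2: fitlm estimates
disp([nonlin.Coefficients.Estimate, lin.Coefficients.Estimate]);
```

The two columns should agree up to the convergence tolerance of the iterative fitnlm solver.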
Sam on 29 Apr 2023
My knowledge of MATLAB is very poor, so that's why I'm posting in this forum. The only reason I used fitnlm is that this is what we were taught in lectures.


Answers (1)

the cyclist on 29 Apr 2023
Here is why you always get the same result for your "non-linear" model.
Notice that your first variable (Cement) is the one with the highest R^2. Therefore, any threshold that is low enough to include any variables is going to include Cement, and it is going to be your first column.
Your "non-linear" model only looks at the first column of input data, because you specify it like this:
nlm = @(b,x)b(1)+b(2).*x(:,1).^2+b(3).*x(:,1);
Notice you only use x(:,1).
Therefore, the model you estimate using fitnlm will only ever use the Cement variable, and you will always get the same result.
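A small sketch of this effect, using synthetic data rather than the concrete data set (the matrix X and the names m1 and m3 are assumptions for illustration). The same model function is fitted once with one column and once with three columns passed in; because the function only reads x(:,1), the extra columns should make no difference.

```matlab
% Synthetic predictors and response (assumption: not the question's data)
rng(1);
n = 200;
X = rand(n, 3);
y = 1 + 4*X(:,1) + 2*X(:,2) + 0.2*randn(n, 1);

% Model from the question: only the first column of x is ever used
nlm = @(b,x) b(1) + b(2).*x(:,1).^2 + b(3).*x(:,1);

m1 = fitnlm(X(:,1),   y, nlm, [1 1 1]);  % one column passed in
m3 = fitnlm(X(:,1:3), y, nlm, [1 1 1]);  % three columns passed in

fprintf('R^2 with 1 column:  %.6f\n', m1.Rsquared.Ordinary);
fprintf('R^2 with 3 columns: %.6f\n', m3.Rsquared.Ordinary);
```

In the question's code the situation is the same: changing the threshold changes how many columns are in significant_data, but the model function never looks past the first one.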
4 Comments
Sam on 30 Apr 2023
I have now uploaded the brief as well, so hopefully that makes things clearer.
the cyclist on 30 Apr 2023
I think there are two tricky aspects to answering #4. One is conceptual, and one is technical.
The conceptual issue is that it is very unclear what non-linear model to try here. Looking at your nice subplots, there seems to be almost no dependence of compressive strength on any variable, other than the slight correlation with cement. I can't really offer any suggestions there.
The technical challenge is that if you don't know how many explanatory variables are going into the model, it is difficult to write the model formula. You wrote
nlm = @(b,x)b(1)+b(2).*x(:,1).^2+b(3).*x(:,1);
which is fine if there is only one variable. But if there are two variables, then you maybe want
nlm = @(b,x) b(1) + b(2).*x(:,1).^2 + b(3).*x(:,1) ...
+ b(4).*x(:,2).^2 + b(5).*x(:,2);
and for three variables you could do
nlm = @(b,x) b(1) + b(2).*x(:,1).^2 + b(3).*x(:,1) ...
+ b(4).*x(:,2).^2 + b(5).*x(:,2) ...
+ b(6).*x(:,3).^2 + b(7).*x(:,3) ...
;
and so on. So, you would need a list of if statements that chooses the correct model formula, based on the number of selected variables. Maybe there is another way, but I can't think of one.
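One possible way to write the formula without knowing the number of variables in advance is to use a matrix product, so that the same anonymous function handles any number of columns. This is only a sketch on synthetic data. The coefficient layout (intercept first, then all squared terms, then all linear terms) is an assumption of this example and differs in order from the formulas written out by hand, although the fitted model is the same. The fitlm call with the 'purequadratic' model specification is included for comparison, since it fits an intercept plus linear and squared terms for each predictor.

```matlab
% Model for any number of columns k in x (layout is an assumption):
%   b(1)         intercept
%   b(2:k+1)     coefficients of the squared terms
%   b(k+2:2k+1)  coefficients of the linear terms
nlm = @(b,x) b(1) + [x.^2, x] * reshape(b(2:end), [], 1);

% Synthetic data with two predictors (assumption: not the question's data)
rng(1);
X = rand(100, 2);
y = 1 + 2*X(:,1).^2 - X(:,1) + 0.5*X(:,2).^2 + 3*X(:,2) + 0.05*randn(100, 1);

% One starting value per coefficient: 2 per column plus the intercept
k = size(X, 2);
beta0 = ones(1, 2*k + 1);
mdl = fitnlm(X, y, nlm, beta0);

% Comparison: the same terms fitted with fitlm
lin = fitlm(X, y, 'purequadratic');

fprintf('fitnlm R^2: %.6f\n', mdl.Rsquared.Ordinary);
fprintf('fitlm  R^2: %.6f\n', lin.Rsquared.Ordinary);
```

With the question's variables, X would be training_DATA(:,1:end-1), so beta0 grows automatically with the number of variables that pass the threshold.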


Version: R2022b
