Efficient script to isolate one sub-dataset k-times.

1 Ansicht (letzte 30 Tage)
Vic
Vic am 3 Mär. 2024
Kommentiert: Vic am 7 Mär. 2024
Hi everyone,
The idea is to divide the main dataset into k sub-datasets and delete 1 bin each time and remerge the other sub-datasets. In a nutshell, k bins will create k different sub-datasets. Since the number of bins mays not be a multiple of the number of row in the matrix (Bin k has often less rows), I had to use cell arrays.
Here is an illustration of the general idea for k = 2.
Question:
How can I remove the loop or make this code more efficient?
Here is my script.
------------------------------------------------------
Variables = rand(245,57);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
for i = 1:length(Bin_size)-1
if i == 1
Bin_Variables2{1} = Variables(Bin_size(2):Bin_size(end),:);
else
Bin_Variables2{i} = [Variables(Bin_size(1):Bin_size(i)-1,:); Variables(Bin_size(i+1):Bin_size(end),:)];
end
end
Thanks for your inputs
  2 Kommentare
Voss
Voss am 5 Mär. 2024
Bearbeitet: Voss am 5 Mär. 2024
Two observations:
  1. The last row of Variables is included as the last row of every element of Bin_Variables2 (because Bin_size(end) is always included).
  2. When size(Variables,1) is a multiple of Bin_numb, I expect you'd want each element of Bin_Variables2 to be the same size, but that's not what happens.
To illustrate:
Variables = rand(242,7);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
for i = 1:length(Bin_size)-1
if i == 1
Bin_Variables2{1} = Variables(Bin_size(2):Bin_size(end),:);
else
Bin_Variables2{i} = [Variables(Bin_size(1):Bin_size(i)-1,:); Variables(Bin_size(i+1):Bin_size(end),:)];
end
end
Observation 1: last row always the same:
fprintf('%36s%s\n','Last row of Variables: ',sprintf('%6.4g ',Variables(end,:)));
Last row of Variables: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
for ii = 1:numel(Bin_Variables2)
fprintf('%36s%s\n',sprintf('Last row of Bin_Variables2{%d}: ',ii),sprintf('%6.4g ',Bin_Variables2{ii}(end,:)));
end
Last row of Bin_Variables2{1}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{2}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{3}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{4}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{5}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{6}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{7}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{8}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{9}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{10}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156 Last row of Bin_Variables2{11}: 0.02797 0.5595 0.2128 0.4162 0.0364 0.1367 0.6156
Observation 2: unequally sized result matrices even though 242 is a multiple of 11:
bin_sizes = cellfun(@(x)size(x,1),Bin_Variables2)
bin_sizes = 1×11
220 220 220 220 220 220 220 220 220 220 221
Vic
Vic am 7 Mär. 2024
@Voss Thanks for these observations. @Manikanta Aditya & @Dyuman Joshi Thanks for your help. I haven't thought about the logical array. This is an elegant way to solve it.
Here is my current script.
Variables = rand(245,7);
Bin_numb = 11;
Bin_size = 1:floor(length(Variables)/Bin_numb):length(Variables);
if length(Variables)-Bin_size(end) <= 12
Bin_size(end) = length(Variables);
end
Bin_Variables2 = cell(1, length(Bin_size)-1);
for i = 1:length(Bin_size)-1
idx = true(length(Variables), 1);
idx(Bin_size(i):Bin_size(i+1)) = false;
Bin_Variables2{i} = Variables(idx, :);
end
for ii = 1:numel(Bin_Variables2)
fprintf('%1s%s\n',sprintf('Last row {%d}: ',ii),sprintf('%6.4g ',Bin_Variables2{ii}(end,:)));
end
bin_sizes = cellfun(@(x)size(x,1),Bin_Variables2)
length(Variables)-bin_sizes
Bin_size
Unrecognized function or variable 'Variables'.
Invalid expression. Check for missing or extra characters.
I forced a if condition to change Bin_size(end) = length(Variables) if size(Variables,1) is not a multiple of Bin_numb. Therefore, the last bin has floor(length(Variables)/Bin_numb) + mod(length(Variables),Bin_numb) rows (22+3) and I get this:
bin_sizes =
222 222 222 222 222 222 222 222 222 222 220
length(Variables)-bin_sizes =
23 23 23 23 23 23 23 23 23 23 25
It works.
As of the last row always being the same; it seems to be fine now but I still have some doubts about bin N-1 and its size.
Last row {1}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {2}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {3}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {4}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {5}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {6}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {7}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {8}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {9}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {10}: 0.6559 0.4365 0.5963 0.3045 0.6676 0.5343 0.5316
Last row {11}: 0.1865 0.9516 0.07304 0.0887 0.697 0.9751 0.5142

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Manikanta Aditya
Manikanta Aditya am 4 Mär. 2024
Verschoben: Dyuman Joshi am 4 Mär. 2024
Just check out this code snippet which I can propose to make the code more efficient by using logical indexing instead of a loop:
Variables = rand(245,57);
Bin_numb = 11;
Bin_size = [1:floor(length(Variables)/Bin_numb):length(Variables) length(Variables)];
Bin_Variables2 = cell(1, length(Bin_size)-1);
for i = 1:length(Bin_size)-1
idx = true(size(Variables, 1), 1);
idx(Bin_size(i):Bin_size(i+1)-1) = false;
Bin_Variables2{i} = Variables(idx, :);
end
In this code, 'idx' is a logical array that is true for the rows of Variables that you want to keep. This approach avoids the need to concatenate arrays, which can be slow in MATLAB because it involves memory allocation. Instead, you’re just creating a logical index and using it to select the rows you want.
  2 Kommentare
Dyuman Joshi
Dyuman Joshi am 4 Mär. 2024
Bearbeitet: Dyuman Joshi am 4 Mär. 2024
@Manikanta Aditya, This looks good, though I would suggest to use size(Bin_size,1) instead of length(Bin_size).
" ... by using logical indexing instead of a loop:"
You are still using a loop.
@Vic, an important part of the code above is Preallocation, which is a good programming practice in MATLAB resulting in improved code performance.
Manikanta Aditya
Manikanta Aditya am 4 Mär. 2024
Thanks @Dyuman Joshi for the reply back. My bad I didn't see the statement about the loop.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Kategorien

Mehr zu Just for fun finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by