How to remove values using a histogram

I have the following data, in which my original value is 15, with a count of 7360.
I want to remove the remaining values whose counts are less than 33% of the original value's count, or which are multiples of the original value.
For example, in this case I have 30, 45, 60, 75 and 90, and I want to remove these values, and the value of 1 as well.
How can I do that in MATLAB?

Answers (1)

Star Strider on 24 Jan 2023

I have only a vague idea of what you want to do, especially since the .mat file does not appear to contain the same data as depicted in the posted plot image.
Try this —
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'))
LD = struct with fields:
ans: [8839×1 double]
v = LD.ans;
Ev = linspace(0, 100, 101)
Ev = 1×101
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
figure
hh = histogram(v, Ev);
Vals = hh.Values;
Edgs = hh.BinEdges;
Retain = (Vals > max(Vals)/3);
Out = Vals(Retain)
Out = 1×2
2896 4568
OutBinsLowerEdge = Edgs(Retain)
OutBinsLowerEdge = 1×2
14 15
If you want to remove the associated data in the original file corresponding to those values, that would be relatively straightforward using logical indexing. Another approach would be to use histcounts, return its third output (the bin index of each data point), and index into that.
.
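As a rough illustration of the histcounts route mentioned above, here is a minimal sketch. The data are synthetic stand-ins (an assumption, not the posted .mat file), and the names Keep and vKept are made up for this example.

```matlab
% Synthetic stand-in for the posted data (assumption, not the original file)
rng(0);
v = [15 + 0.01*randn(700,1); 30*ones(40,1); 45*ones(30,1); ones(20,1)];

Ev = 0:1:100;                          % bin edges, one unit wide
[N,~,Bin] = histcounts(v, Ev);         % Bin(k) is the bin index of v(k)
Retain = N > max(N)/3;                 % bins above one-third of the tallest bin

inRange = Bin > 0;                     % histcounts returns 0 for values outside Ev
Keep = false(size(v));
Keep(inRange) = Retain(Bin(inRange));  % look up each point's bin in Retain
vKept = v(Keep);                       % data with the low-count bins removed
```

Values outside the edges get a bin index of 0, which is why the sketch masks them before indexing into Retain.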

21 Comments

Med Future on 24 Jan 2023
The dataset is the same as in the original plot. How can I get the array after processing it?
What if I have multiples of the value?
Med Future on 24 Jan 2023
Does Out give an output array like the one in the original .mat file?
Star Strider
I still have no clear idea of what you want to do. It would help somewhat if you posted the histogram call that produced the data you posted. I have no idea what that is.
If you want to eliminate only those values whose bin counts are less than one-third of the highest bin count, one option would be:
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'))
LD = struct with fields:
ans: [8839×1 double]
Data = LD.ans;
Ev = linspace(0, 100, 1001); % Change This To Produce The Values You Want
figure
hh = histogram(Data, Ev);
[N,Edges,Bin] = histcounts(Data, Ev);
Retain = N > max(N)/3;
FindBins = find(Retain)
FindBins = 1×2
150 151
RetainDataLv = (Bin >= FindBins(1)) & (Bin <= FindBins(2)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = Data(RetainDataLv) % Return Desired Subset OF 'Data'
RetainData = 7360×1
15.0200 15.0100 14.9800 15.0100 15.0000 15.0150 14.9950 15.0050 15.0050 15.0050
See if this does what you want.
.
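One caveat about the range test on FindBins(1) and FindBins(2) in the code above: it fits the case shown, where exactly two adjacent bins pass the threshold. A hedged alternative that does not rely on the retained bins being adjacent is ismember. The data below are hypothetical and chosen so that the retained bins are far apart.

```matlab
% Hypothetical data: two tall peaks (15.2 and 30.2) with low-count values between them
Data = [15.2*ones(50,1); 30.2*ones(45,1); 20.2*ones(3,1); 45.2*ones(5,1); 1.2];

Ev = 0:1:100;                            % assumed edges for this sketch
[N,~,Bin] = histcounts(Data, Ev);
FindBins = find(N > max(N)/3);           % retained bins need not be adjacent

RetainDataLv = ismember(Bin, FindBins);  % true where a point falls in any retained bin
RetainData = Data(RetainDataLv);         % keeps 15.2 and 30.2, drops 20.2, 45.2 and 1.2
```

A range test from the first to the last retained bin would also keep the 20.2 values here, because their bin lies between the two peaks.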
Med Future on 25 Jan 2023
@Star Strider What is Ev? Why did you select those values?
Star Strider on 25 Jan 2023
The ‘Ev’ vector is the vector of the edges.
I selected it because I still have no idea what you want to do. It seems to produce something similar to the original histogram plot you posted. You never defined how you coded that, so I am doing my best to fill in those gaps.
Image Analyst on 25 Jan 2023
@Star Strider you're not the only one. I have no idea what he wants to do. Perhaps removing data points based on bin heights, then re-histogramming, or possibly making the one bin not so high. It certainly needs a better explanation, because everyone is confused and doesn't know what @Med Future wants.
Med Future
Let me explain it again for you. I have the original value, which has the highest number of counts (7360), as you can see in the histogram.
The remaining values are noise.
I want to Delete the remaining Values which have counts less than 33 of maximum counts of value
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Med Future on 25 Jan 2023
@Star Strider @Image Analyst I hope you now understand my problem. If not, please let me know and I will explain again.
Image Analyst on 25 Jan 2023
Yes, it's clearer now but I'm turning in for the night. If Star doesn't answer you, I'll answer tomorrow.
Med Future on 25 Jan 2023
@Star Strider @Image Analyst I have shared the data above, and this is a second dataset. Your code does not work on this dataset either.
Star Strider
I still do not understand ‘I want to Delete the remaining Values which have counts less than 33 of maximum counts of value’, so I am guessing that ‘33’ actually means 33 counts, although that also has an ambiguous reference. So I’m interpreting it as keeping only the bins whose counts are within 33 of the count in the maximum bin.
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData
NewDataset = 1×16075
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)-33;
FindBins = find(Retain)
FindBins = 99
RetainDataLv = (Bin == FindBins(1)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = NewDataset(RetainDataLv) % Return Desired Subset OF 'Data'
RetainData = 1×3474
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
Using histcounts makes this easier.
.
Med Future on 25 Jan 2023
@Star Strider @Image Analyst Sorry for making this ambiguous. I want to delete the values whose counts are less than 33% of the count of the maximum value.
For example, in my data the count of the maximum value is 4641, and 33% of 4641 is about 1532. I want to remove the values whose counts are less than that.
Star Strider
O.K. That requires one small change:
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData
NewDataset = 1×16075
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 99
RetainDataLv = (Bin == FindBins(1)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = NewDataset(RetainDataLv) % Return Desired Subset OF 'Data'
RetainData = 1×3474
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
This returns the original values in bins whose counts are greater than one-third of the maximum bin count.
.
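The threshold arithmetic can be checked in isolation. The sketch below uses a hypothetical count vector whose maximum is the 4641 mentioned in the thread; the variable frac is an assumption introduced here so that the fraction (0.33, or 1/3 as in the code above) is stated explicitly.

```matlab
N = [12 4641 900 1600 3];    % hypothetical bin counts; the maximum is 4641
frac = 0.33;                 % fraction of the maximum count used as the cutoff
threshold = frac * max(N);   % 0.33 * 4641 = 1531.53
Retain = N >= threshold;     % keep bins whose counts are not less than the cutoff
```

With max(N)/3 instead of 0.33*max(N) the cutoff would be 1547, so the two choices can differ for bins whose counts fall between those two numbers.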
Med Future
The BinLimits are 1 to 10000. When I run the following code, FindBins shows that there are 5 bins: 99, 100, 150, 200 and 250.
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
But with the following code I get only one bin, 99:
RetainDataLv = (Bin == FindBins(1)); % Values In 'Bin' Corresponding To 'Retain' Test
RetainData = NewDataset(RetainDataLv)
It retains the values of only one bin, not of all the bins.
Star Strider on 26 Jan 2023
Edited: Star Strider on 26 Jan 2023
That must be different data.
Using slightly changed code on both available data —
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData(:)
NewDataset = 16075×1
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 99
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv);
RetainData = NewDataset(any(RetainDataLv,min(SzRD))) % Return Desired Subset OF 'Data'
RetainData = 100.0000
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'))
LD = struct with fields:
ans: [8839×1 double]
Data = LD.ans(:);
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×2
14 15
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv)
SzRD = 1×2
8839 2
RetainData = Data(any(RetainDataLv,min(SzRD))) % Return Desired Subset OF 'Data'
RetainData = 7464×1
15.0200 15.0100 14.9800 15.0100 15.0000 15.0150 14.9950 15.0050 15.0050 15.0050
See if that does what you want.
EDIT — (26 Jan 2023 at 13:20)
Added ‘(:)’ in the assignment defining the data after the load call to force the data vectors to be column vectors.
.
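A note on the second argument of any in the code above: min(SzRD) returns the smallest size value, not the index of the smallest dimension. When only one bin is retained, the logical array is N-by-1, min(SzRD) is 1, and any collapses along dimension 1 to a single logical, which would explain a scalar RetainData. The sketch below, with hypothetical bin indices, compares that call with one that names the dimension explicitly after forcing the orientations.

```matlab
Bin = [99; 3; 99; 100; 7];        % hypothetical bin indices, one per data point
FindBins = 99;                    % a single retained bin

Lv = (Bin(:) == FindBins(:).');   % N-by-K logical matrix (K = 1 here)
wrong = any(Lv, min(size(Lv)));   % min(size(Lv)) is 1, so this reduces over the data points
right = any(Lv, 2);               % reduces over retained bins: one logical per data point
```

Here wrong is a single true value, while right is a 5-by-1 mask that can index the data directly.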
Med Future
The BinLimits changed from [1 100] to [1 10000]. Now FindBins has the values [99 100 150 200 250].
When I run the code on the 'seconddata' dataset, it gives this error:
Arrays have incompatible sizes for this operation.
[N,Edges,Bin] = histcounts(NewData,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
Star Strider on 26 Jan 2023
I do not understand the reason you are getting the error.
My latest code (adjusted to work with row or column matrices of ‘RetainDataLv’) runs without error when I ran it in my previous Comment with both .mat files.
Having consistent row or column data files would help.
Med Future on 26 Jan 2023
You have to change the BinLimits from [1 100] to [1 10000]. In your previous code you only have one bin, 99, which is why there is no error in that code.
Star Strider
Changed:
LD = load(websave('seconddata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1273595/secondata.mat'));
NewDataset = LD.NewData(:)
NewDataset = 16075×1
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(NewDataset,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×5
99 100 150 200 250
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv);
[~,idx] = min(SzRD);
RetainData = NewDataset(any(RetainDataLv,idx)) % Return Desired Subset OF 'Data'
RetainData = 10289×1
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
LD = load(websave('histogram','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1272595/histogram.mat'));
Data = LD.ans(:);
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×2
14 15
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv)
SzRD = 1×2
8839 2
[~,idx] = min(SzRD);
RetainData = Data(any(RetainDataLv,idx)) % Return Desired Subset OF 'Data'
RetainData = 7464×1
15.0200 15.0100 14.9800 15.0100 15.0000 15.0150 14.9950 15.0050 15.0050 15.0050
This appears to work.
.
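The histcounts calls in this thread pass a bin count (10000), a BinMethod and a BinWidth all at once. The reported FindBins = 99 for data at 100 with limits [1 100] suggests that the unit BinWidth is what actually sets the bins. A minimal sketch comparing the thread's call with a reduced one, on hypothetical sample data:

```matlab
% Hypothetical sample data (assumption, not the thread's .mat files)
x = [100*ones(1,30), 15*ones(1,10), 42.5, 7];

% The call as written in the thread
[N1,E1] = histcounts(x,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 100]);

% Reduced call that states only the bin width and the limits
[N2,E2] = histcounts(x,'BinWidth',1,'BinLimits',[1 100]);

numBins = numel(N2);                             % one-unit bins between 1 and 100
sameBinning = isequal(E1,E2) && isequal(N1,N2);  % do both calls bin identically?
```

If sameBinning is true on your release, the shorter call states the intended binning more directly.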
Med Future
@Star Strider @Image Analyst I get the same error with secondata.
Can you please solve this?
Arrays have incompatible sizes for this operation.
Data=NewData;
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv)
[~,idx] = min(SzRD);
RetainData = Data(any(RetainDataLv,idx))
Star Strider
You need to force ‘Data’ to be a column vector to work with my code, using the ‘(:)’ operator:
Data=NewData(:);
I decided to do this to make my code compatible with all the data sets, since some are row vectors and some are column vectors.
Try this —
LD = load(websave('secondata','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1275815/secondata.mat'));
Data = LD.NewData(:) % Force Column Vector
Data = 16075×1
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
h=histogram(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
[N,Edges,Bin] = histcounts(Data,10000,"BinMethod","sturges",'BinWidth',1,'BinLimits',[1 10000]);
Retain = N > max(N)/3; % Retain Values In Bins Greater Than One-Third Of The Maximum Bin Count Value
FindBins = find(Retain)
FindBins = 1×5
99 100 150 200 250
RetainDataLv = (Bin == FindBins); % Values In 'Bin' Corresponding To 'Retain' Test
SzRD = size(RetainDataLv);
[~,idx] = min(SzRD);
RetainData = Data(any(RetainDataLv,idx)) % Return Desired Subset OF 'Data'
RetainData = 10289×1
100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000 100.0000
That should work with all the data vectors, regardless of whether their initial orientation is as row or column vectors.
.
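To make the orientation issue concrete, here is a small sketch with a hypothetical row vector. Comparing a row of bin indices against a row of retained bins of a different length raises the size error reported above, while a column of bin indices against a row of retained bins expands to a matrix. The edges and variable names are assumptions for this example.

```matlab
NewData = [100 100 150 150 150 200 7];   % hypothetical row vector
Edges = 0.5:1:250.5;                     % assumed unit-width edges
[N,~,Bin] = histcounts(NewData, Edges);  % Bin is a row vector, like NewData
FindBins = find(N > max(N)/3);           % row vector of retained bins (two here)

try
    Lv = (Bin == FindBins);              % 1-by-7 against 1-by-2: sizes are incompatible
    failed = false;
catch
    failed = true;
end

Data = NewData(:);                       % force a column vector
Lv = (Bin(:) == FindBins);               % 7-by-1 against 1-by-2 expands to 7-by-2
RetainData = Data(any(Lv, 2));           % points that fall in any retained bin
```

Forcing the column orientation once, right after loading, keeps the rest of the code independent of how each .mat file stored its vector.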

Version

R2021b

Asked: 24 Jan 2023

Commented: 27 Jan 2023
