Word segmentation based on projection histogram?

Nguyen Hien on 3 Aug 2015
Commented: Ayush Gupta on 1 Apr 2018
Hi all,
I am currently working on an OCR project, and I am stuck at word segmentation. The basic algorithm uses the horizontal projection of each segmented line (the column sums of the line image): I look for the blank space between a falling edge and the next rising edge. The problem is that I cannot differentiate between the space between words and the space between characters; in other words, I cannot automatically find the proper threshold to crop out a word. Any help would be appreciated, thank you.

By the way, how could I contact Image Analyst directly?

Here is my code so far:
% Read in the image
close all; clear;
I = imread('C:\Users\Nguyen Duy Hien\Desktop\bible.jpg');
% Convert to grayscale (only needed if the image is RGB)
if size(I, 3) == 3
    I = rgb2gray(I);
end
% Binarize with Otsu's threshold
level = graythresh(I);
BW = im2bw(I, level);
%BW = imadjust(I);
% Smooth the image with a small vertical Gaussian
h = fspecial('gaussian', [3 1], 0.8);
BW = imfilter(BW, h);
% Invert so that text pixels are true
BW = ~BW;
BWedge = edge(uint8(BW));
BW = imfill(BWedge, 'holes');
figure(1), imshow(BW)
%---line segmentation
pV = sum(BW, 1);   % column sums
pH = sum(BW, 2);   % row sums
figure(2), plot(pH)
figure(3), plot(pV)
lines = pH > 0;
% Detect rising and falling edges; zero padding keeps the counts equal
% even when text touches the top or bottom border
d = diff([0; lines; 0]);
lineStarts = find(d > 0);      % first row of each text line
lineEnds   = find(d < 0) - 1;  % last row of each text line
n = numel(lineStarts);
subImage = cell(1, n);
pHline = cell(1, n);
wordarr = {};
count = 1;
for k = 1:n
    subImage{k} = BW(lineStarts(k):lineEnds(k), :);
    figure(4)
    subplot(n, 1, k), imshow(subImage{k})
    % Column sums of this line: zeros mark blank columns
    pHline{k} = sum(subImage{k}, 1);
    figure(5)
    subplot(n, 1, k), plot(pHline{k})
    lineN = pHline{k} > 0;
    a = diff([0, lineN, 0]);
    startingCol = find(a > 0);      % first column of each ink run
    endingCol   = find(a < 0) - 1;  % last column of each ink run
    % Blank columns between consecutive ink runs (recomputed for every line)
    space = startingCol(2:end) - endingCol(1:end-1) - 1;
    % Placeholder rule: gaps wider than this line's mean gap count as
    % word breaks. Choosing this threshold robustly is the open problem.
    isWordBreak = space > mean(space);
    buf_start = [startingCol(1), startingCol(find(isWordBreak) + 1)];
    buf_end   = [endingCol(isWordBreak), endingCol(end)];
    for i = 1:numel(buf_end)
        wordarr{count} = subImage{k}(:, buf_start(i):buf_end(i));
        figure, imshow(wordarr{count})
        count = count + 1;
    end
end
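A sketch of one way to choose the word-gap threshold automatically, under the assumption that the blank gaps in a line fall into two groups (narrow gaps between characters, wide gaps between words): sort the gap widths and split them at the largest jump between consecutive values. This is an illustration of that assumption, not the solution the poster later reports finding. `profile` is made-up data standing in for one line's column sums (`pHline{k}` in the code above), and the other variable names are hypothetical.

```matlab
% Toy column-sum profile of one text line (hypothetical data):
% nonzero entries are ink columns, zeros are blank columns.
profile = [3 4 0 2 5 0 0 0 0 4 4 0 3 0 0 0 0 0 2 2];

% Rising/falling edges of the ink runs; zero padding handles runs
% that touch the left or right border.
ink = profile > 0;
d = diff([0, ink, 0]);
runStarts = find(d > 0);       % first column of each ink run
runEnds   = find(d < 0) - 1;   % last column of each ink run

% Number of blank columns between consecutive ink runs.
gaps = runStarts(2:end) - runEnds(1:end-1) - 1;

% Assumption: gaps form two clusters (character vs. word spacing).
% Split at the largest jump between consecutive sorted gap widths.
if numel(gaps) < 2
    gapThreshold = Inf;   % too few gaps to split: treat the line as one word
else
    sortedGaps = sort(gaps);
    [~, idx] = max(diff(sortedGaps));
    gapThreshold = sortedGaps(idx);   % widest gap still counted as intra-word
end
isWordBreak = gaps > gapThreshold;

% First and last column of each word.
wordStarts = [runStarts(1), runStarts(find(isWordBreak) + 1)];
wordEnds   = [runEnds(isWordBreak), runEnds(end)];
disp([wordStarts; wordEnds])
```

For the toy profile the gaps are [1 4 1 5], the largest jump in the sorted widths is from 1 to 4, so the threshold is 1 and the words span columns 1 to 5, 10 to 13 and 19 to 20. The rule breaks down when a line has only one kind of gap (for example a single word), which is why the guard above falls back to "no breaks".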
2 Comments
Walter Roberson on 4 Aug 2015
Image Analyst does not wish to be contacted privately. He responds to some posts, if it amuses him to do so.
sayar chit on 14 Nov 2017
Hi! I am studying segmentation of printed documents. Line segmentation and word segmentation work well, but I cannot segment the words into characters. Can anyone help? These are my input words. I want to split them as follows: မ,ိ,ှု,င,်,း,တ,ိ,ု,က,်,၍


Answers (3)

Nguyen Hien on 4 Aug 2015
Thank you all so much for your help. Fortunately, I have figured out the solution.
2 Comments
somanath prakash on 3 Apr 2017
Nguyen Hien, please let me have the solution!
Ayush Gupta on 1 Apr 2018
Hey, can you send the solution?



Image Analyst on 4 Aug 2015
You just did contact me directly, as directly as it gets. Sorry, I don't do private consulting; besides, OCR is not even my field. I'd refer you to the Computer Vision System Toolbox or, if that doesn't work, to Vision Bib: http://www.visionbib.com/bibliography/contentschar.html#OCR,%20Document%20Analysis%20and%20Character%20Recognition%20Systems

Also, you didn't attach your image, so we can't try your code, and I can't detect problems like yours just by reading the code and imagining what it would do with an image. Sorry, but if it's major algorithm development, we just don't have the time for that here. If it's something quick, like a few minutes to correct syntax or logic flow, then maybe we can help.

Walter Roberson on 4 Aug 2015
There is no fixed number of pixels that can be used to define the difference between spacing between characters and spacing between words. Some languages do not have spacing between words. And the spacing between characters on a very large sign could be larger than the total length of a word on a smaller sign.
You need to examine the relative distance between centroids, perhaps as compared to the width of the blobs.
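The centroid idea can be sketched as follows, under stated assumptions. Each "blob" is taken to be a run of ink columns in one line's projection profile (with the Image Processing Toolbox, `regionprops(BW, 'Centroid', 'BoundingBox')` would give per-blob centroids and boxes from connected components instead). The centroid is the ink-weighted mean column, and the distance between neighbouring centroids is divided by the mean width of the two blobs, so the test does not depend on absolute font size. The toy profile and the cut-off `ratioThreshold = 2` are hypothetical values for illustration, not tuned for any real document.

```matlab
% Toy column-sum profile of one text line (hypothetical data).
profile = [3 4 0 2 5 0 0 0 0 4 4 0 3 0 0 0 0 0 2 2];

% Ink runs ("blobs") in the profile.
ink = profile > 0;
d = diff([0, ink, 0]);
runStarts = find(d > 0);       % first column of each blob
runEnds   = find(d < 0) - 1;   % last column of each blob
nBlobs = numel(runStarts);

% Width and ink-weighted horizontal centroid of every blob.
widths    = runEnds - runStarts + 1;
centroids = zeros(1, nBlobs);
for b = 1:nBlobs
    cols = runStarts(b):runEnds(b);
    centroids(b) = sum(cols .* profile(cols)) / sum(profile(cols));
end

% Centroid-to-centroid distance in units of the neighbours' mean width.
pairWidth = (widths(1:end-1) + widths(2:end)) / 2;
spacing = diff(centroids) ./ pairWidth;

% Hypothetical cut-off on the relative spacing.
ratioThreshold = 2;
isWordBreak = spacing > ratioThreshold;

% First and last column of each word.
wordStarts = [runStarts(1), runStarts(find(isWordBreak) + 1)];
wordEnds   = [runEnds(isWordBreak), runEnds(end)];
```

Because the spacing is measured relative to blob width, the same ratio applies to large and small text, which is the point about signs of different sizes made above; the ratio itself still has to be chosen for the documents at hand.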
