Automatic Text Detection from Image Based on Edge and Fuzzy Logic

Abstract:

OCR technology has progressed significantly in the recognition of optical characters. However, most OCR engines in current use can only read text printed on plain paper sheets.

Detecting and extracting text from document images is a well-known problem in computer vision research. The aim of this research is to extract and recognize text using edge-based methods and fuzzy logic. The proposed algorithm will be implemented and evaluated on a set of document images.


To my dearest father, mother and family,

for their encouragement, blessing and inspiration…


I am grateful to ALLAH on His blessing and mercy for

giving me the strength along the challenging journey of carrying out this project and

making this project successful.

First of all, I would like to express my deepest appreciation to my supervisor,

for his effort, guidance and support throughout this

project. Without his advice, suggestions and guidance, the project would have not been

Successful and achieve the objectives.

To all lecturers who have taught me, thank you for the lesson that has been

delivered. Not forgetting all my friends, thank you for their useful idea, information and

moral support during the course of study.

Last but not least, I would like to express my heartiest appreciation to my parents

and my family, who are always there when it matters most.


Table of contents

Abstract ……………………………………………………………………………2

Table of contents ……………………………………………………………………..5

List of tables …………………………………………………………………………11

List of figures ………………………………………………………………………..12

List of symbols ………………………………………………………………………15

List of appendices ……………………………………………………………………16

Chapter 1:      1.1 Introduction ……….………………………………………….17

Chapter 2:      2.1 Image Processing ………………………………………………..19

            2.2 Components of Image Processing System: …………………….19

2.3       Fundamental Steps in Digital Image Processing: …………20

2.4       Sampling and Quantization: ……………………………….22

2.4.1    Digital Image Definition ……………………………………22

2.4.2    Representing Digital Images: ………………………………23

2.4.3    Image Sensing and Acquisition: ……………………………24

2.5       Image sampling and Quantization …………………………26

2.6       Digital Image Representation ………………………………26

2.7       Image Restoration ………………………………………….27

2.8       Degradation Model …………………………………………27

2.9       Noise Models ……………………………………………….28

2.10     Boundary Representation …………………………………29

2.11     Types of digital images …………………………………….29

2.12     What is Feature Extraction? ……………………….……..33

2.13     Why Feature Extraction is Useful? ………………………33

2.14     Applications of Feature Extraction ………………………33

2.15     Image Recognition …………………………………………34

2.16     What is Image recognition? ………………………………35

2.17     Image Processing Techniques …………………………….36

2.18     Working of Convolutional and Pooling layers …………..36

2.19     What is Object Recognition? ……………………………..37

2.20     Image Edge Detection ……………………………………..40

2.21     Fuzzy Noise Estimation ……………………………………41

2.22     Fuzzy Image Smoothing ……………………………………41

Chapter 3:      Literature Review ………………………………………….42

3.1       Introduction ………………………………………………..42

3.2       Segmentation Categories ………………………………….42

3.2.1    Threshold Based Segmentation ……………………………42

3.2.2    Clustering Techniques ……………………………………..42

3.2.3    Matching ……………………………………………………42

3.2.4    Edge Based Segmentation …………………………………43

3.2.5    Region Based Segmentation ……………………………….43

3.3       Categories of Variance Text ………………………………43

3.3.1    Lighting Variance …………………………………………43

3.3.2    Scale Variance ……………………………………………..43

3.3.3    Orientation Variance ………………………………………43

3.3.4    Imperfect Imaging Conditions in Uncontrolled Environments …44

3.4       Text Recognition …………………………………………44

3.4.1    Text Detection ……………………………………………..45

3.4.2    Text Area Identification …………………………………45

3.4.3    Text Region Localization ………………………………..46

3.4.4    Text Extraction and Binary Image ……………………..46

3.5       Analytic Segmentation …………………………………..47

3.5.1    Pattern Recognition ………………………………………47

3.5.1.1 Statistical Pattern Recognition ………………………….47

3.5.1.2 Data Clustering …………………………………………..47

3.5.1.3 Fuzzy sets …………………………………………………48

3.5.1.3.1 Fuzzy Image Processing ………………………………..48

3.5.1.4 Neural Networks ………………………………………….49

3.5.1.5 Structural Pattern Recognition ………………………….49

3.5.1.6 Syntactic Pattern Recognition ……………………………49

3.5.1.7 Approximate Reasoning Approach ……………………..50

3.5.1.8 Application of Support Vector Machine (SVM) ……….50

3.5.1.9 Template Matching ………………………………………50

3.5.1.10 K-Nearest Neighbor Classifier …………………………50

3.5.2    Pattern Recognition System …………………………….51

3.5.2.1 The Structure of Pattern Recognition ………………….51

3.5.3    Application of Pattern Recognition …………………….52

3.5.4    Character Recognition ………………………………….52

3.5.5    Text Verification ………………………………………..52

3.6       Run-Length Coding Algorithm ………………………..53

3.6.1    Neighbors ………………………………………………..53

3.6.2    Path ………………………………………………………55

3.6.3    Foreground ……………………………………………..55

3.6.4    Connectivity …………………………………………….55

3.6.5    Connected Component …………………………………56

3.6.6    Background …………………………………………….56

3.6.7    Boundary ……………………………………………….56

3.6.8    Interior …………………………………………………56

3.6.9    Surrounds ………………………………………………57

3.6.10  Component Labeling …………………………………..57

3.7       Text Properties …………………………………………58

3.7.1    Removing the Borders …………………………………58

3.7.2    Divide the Text into Rows …………………………….59

3.7.3    Divide the Row “Lines” into the Words ……………..59

3.7.4    Divide the Word into Characters …………………….60

3.8       Identify Character …………………………………….60

3.9       Fuzzy Logic ……………………………………………61

3.9.1    What is Fuzzy Logic? ……………………………………61

3.9.2    What is the Fuzzy Logic Toolbox?  ……………………62

3.9.3    Fuzzy Sets ………………………………………………63

3.9.4    Membership Function …………………………………63

3.9.5    If-Then Rules ………………………………………….64

3.9.6    Fuzzy Inference System ………………………………65

3.9.7    Rule Review ……………………………………………65

3.9.8    Surface Review ………………………………………..65

3.10     Summary ………………………………………………66

Chapter 4:      Methodology …………………………………………67

4.1       Introduction  …………………………………………..67

4.2       Problem Statement and Literature Review …………68

4.3       System Development …………………………………68

4.4       Performance Evaluation ……………………………..69

4.5       General Steps of Proposed Techniques ……………..69

4.5.1    Proposed Algorithm: Edge Based Text Extraction …70

4.5.1.1 Detection ………………………………………………70

4.5.1.2 Feature Map and Candidate Text Region Detection.70

4.5.1.2.1 Directional Filtering …………………………………75

4.5.1.2.2 Edge Selection ……………………………………….75

4.5.1.2.3 Feature Map Generation ……………………………77

4.5.1.3    Localization …………………………………………77

4.5.1.4    Character Extraction ………………………………77

4.6       Connected Components ……………………………..78

4.7       Fuzzy Logic …………………………………………..83

4.8       Summary ……………………………………………..84

Chapter 5:      Results …………………………………………………………85

5.1       Introduction ………………………………………….85

5.2       Input Image ………………………………………….85

5.3       Complement Edge Detect with them ………………86

5.4       Total Edge Detection ………………………………..87

5.5       Document Localization ……………………………..88

5.6       Separate Text from Background …………………..88

5.10     Determine Character by Run-Length ……………..91

Chapter 6:      Conclusion ………………………………………….95

6.1       Introduction …………………………………………95

6.2       Discussion on Results ………………………………95

6.3       Project Advantage ………………………………….95

6.4       Suggestion and Future Works …………………….96

6.5       Conclusion ………………………………………….96

References …………………………………………………..95

Appendices …………………………………………………101


List of Tables

Table 4.1: Results of objects to rows ………………………………………………..81

Table 4.2: Result of the document scan …………………………………………….82

Table 5.1: Performance evaluation 1 ………………………………………………89

Table 5.2: Performance evaluation 2 ………………………………………………89


List of Figures

Figure 2.1: Survey image …………………………………………………………19

Figure 2.2: knowledge base ………………………………………………………21

Figure 2.3: (a) input image (b) output image ……………………………………21

Figure 2.4: Continuous image ……………………………………………………22

Figure 2.5: Illumination …………………………………………………………..24

Figure 2.6: Image acquisition using sensor arrays ………………………….25

Figure 2.7: Acquisition image ……………………………………………………25

Figure 2.8: Illumination ………………………………………………………….26

Figure 2.9: Transform image ……………………………………………………28

Figure 2.10: Coordinates of shape ………………………………………………29

Figure 2.11: Binary image ……………………………………………………….30

Figure 2.12: True color or red-green-blue image ………………………………30

Figure 2.13: Color image …………………………………………………………31

Figure 2.14: Map image ………………………………………………………….33

Figure 2.15: Recognition image …………………………………………………35

Figure 2.16: Numerical data image ……………………………………………..35

Figure 2.17: Convolution ………………………………………………………..36

Figure 2.18: Structure of pattern recognition …………………………………38

Figure 2.19: System of fuzzy logic ………………………………………………40

Figure 3.1: General model of extraction text ………………………………….44

Figure 3.2: Fuzzy Image Processing ……………………………………………48

Figure 3.3: The composition of a PR system ………………………………….52

Figure 3.4: Horizontal projection calculated from run-length code …………53

Figure 3.5: 4 – and 8 –neighborhoods …………………………………………..54

Figure 3.6: 4-path and 8-path …………………………………………………..55

Figure 3.7: Border of an image …………………………………………………56

Figure 3.8: Ambiguous border …………………………………………………56

Figure 3.9: A binary image with its boundaries ………………………………57

Figure 3.10: Connected components …………………………………………..58

Figure 3.11: Divide the text into rows …………………………………………59

Figure 3.12: Divide the rows into the words ………………………………….59

Figure 3.13: Divide the word into characters …………………………………60

Figure 3.14: Identify character ………………………………………………..60

Figure 3.15: A classic set and Fuzzy set ………………………………………62

Figure 3.16: Input of (a) pixel (b) location for a pixel ……………………….63

Figure 3.17: Output variable “letter” …………………………………………64

Figure 3.18: Building the system with fuzzy logic ……………………………65

Figure 4.1: Proposed method ………………………………………………….68

Figure 4.2: Block diagram of general steps of proposed approach …………69  

Figure 4.3: Gaussian filter ……………………………………………………..70

Figure 4.4: Samples Gaussian pyramid with 8 levels ………………………..71

Figure 4.5: Extraction operation ………………………………………………72

Figure 4.6: Edges detection ……………………………………………………74

Figure 4.7: U shaped object with 4 runs ………………………………………80

Figure 4.8: 8-neighborhoods …………………………………………………..83

Figure 4.9: Identify the character ……………………………………………..83

Figure 4.10: Example of fuzzy ………………………………………………….84

Figure 5.1: Colored document …………………………………………………85

Figure 5.2: Edge detection ……………………………………………………..86

Figure 5.3: Effect of adding two edges ………………………………………..87

Figure 5.4: Total edge detection ……………………………………………87

Figure 5.5: Localized of text …………………………………………………..88

Figure 5.6: Separate text from background ………………………………….88

Figure 5.7: Test document …………………………………………………….89

Figure 5.8: Test document …………………………………………………….89

Figure 5.9: Determine Character ……………………………………………..91

Figure 5.10: Ten inputs and one output ………………………………………91  

Figure 5.11: Input one N1 ………………………………………………….…92

Figure 5.12: Output ……………………………………………………………93

Figure 5.13: Output of extracted text ………………………………………..93


List of Symbols

OCR – Optical character recognition

CC – Connected components

BAG – Black adjacency graph

AMA – Aligning-and-merging analysis

SVM – Support vector machine

RLC – Run-length code

PR – Pattern recognition

SE – Structuring element

MFs – Membership functions

FIS – Fuzzy Inference System


List of Appendices

Appendix A1: Matlab command to find binary image ………………………….101

Appendix A2: Matlab command using fuzzy logic to identify characters ………101


Chapter 1

Introduction

1.1 Introduction

Text embedded in documents and videos usually provides brief but important information about their content. A sign is an object that suggests the presence of a fact; it may be displayed with letters or as advertising. Signs are everywhere in our lives, and life is simpler when they can be read automatically. Edge detection can serve many applications; vehicle license plate detection is a well-known example. Edge detection is the process of finding the edges in an image, and representing a document by its edges significantly reduces the amount of data to be processed. Current OCR technology can only handle text printed against a plain, monochrome background; OCR engines cannot recognize text embedded in a complex background. OCR has therefore mainly been used to extract text from scanned documents, which are often skewed and slanted. In the last few years there has been rapid growth in applications that use fuzzy logic, a multi-valued logical system, in a wide variety of fields; in this work, fuzzy logic is used to identify characters after the text has been extracted from the document [10]. Although such a system is useful, it cannot be applied in many situations, because characters that appear at a larger size and angle than in documents scanned with a scanner introduce many shortcomings that can be improved [11]. The aim of this research is to improve an algorithm that automatically detects text using edge information, based on a directional filter with eight kernels produced by circular rotation of the filter coefficients. Edge detection is used to handle digital documents corrupted by noise and sharp intensity changes, and the edge-based approach is multi-scale and multi-orientation so that it can be applied across a document. Text data is particularly interesting because a document can contain text that differs in size, orientation, and alignment. The prototype targets English text, which normally has horizontal, vertical, and diagonal alignments, and the algorithm exploits this. The algorithm must be robust to the candidate text's font, style, orientation, and alignment, and to the effects of lighting, while keeping the computation time sufficiently small.

In this research, a text identification algorithm is built on top of an edge-based text extraction algorithm.


Chapter 2

2.1 Image Processing

The Image Processing Toolbox is a collection of functions that extend the capabilities of MATLAB's numeric computing environment. Most image processing problems can be expressed in a simple, clear way in this environment, which makes it ideal for creating software prototypes that solve image processing problems.

2.2 Components of Image Processing System:

Figure 2.1 survey image

Image Sensors: two elements are required to acquire digital images. The first is a physical device that is sensitive to the energy radiated by the object we want to image; the second, called a digitizer, converts the output of the physical sensing device into digital form.

Specialized Image Processing Hardware: usually consists of the digitizer just mentioned, plus hardware that performs other primitive operations, such as an arithmetic logic unit (ALU).

Computer: a general-purpose computer, which can range from a PC to a supercomputer; in dedicated applications, a specially designed computer is used to achieve the required level of performance.

Software: image processing software consists of specialized modules that perform specific tasks. A well-designed package also provides the ability to write code that, as a minimum, uses those specialized modules; more sophisticated software packages allow the modules to be integrated.

Mass Storage: the ability to store images is a must in image processing applications. An uncompressed image of 1024 x 1024 pixels, in which the intensity of each pixel is an 8-bit quantity, requires one megabyte of storage (1024 x 1024 pixels x 1 byte = 1,048,576 bytes).

Storage for image processing applications falls into three principal categories:

  • Short term storage for use during processing.
  • On line storage for relatively fast retrieval.
  • Archival storage such as magnetic tapes and disks.

Image Display: the displays in use today are mainly color television monitors. The monitors are driven by the outputs of image and graphics display cards that are an integral part of the computer system.

Hardcopy Devices: devices for recording images include laser printers, film cameras, heat-sensitive devices, and digital units such as optical and CD-ROM disks. Film provides the highest possible resolution, but paper is the obvious medium of choice for written material.

Networking: networking is almost a default function in any computer system in use today. Because of the large amount of data inherent in image processing applications, the key consideration in the transmission of images is bandwidth.

2.3 Fundamental Steps in Digital Image Processing:

Digital image processing methods fall into two broad categories:

  1. Methods whose inputs and outputs are images.
  2. Methods whose inputs may be images but whose outputs are attributes extracted from those images.

Figure 2.2 knowledge base

Image acquisition: this stage can be as simple as being given an image that is already in digital form; it generally also involves preprocessing of the image, such as scaling.

Image enhancement: this is among the simplest and most appealing areas of digital image processing. The idea is to bring out detail that is obscured, or simply to highlight certain features of interest in the image. Enhancement is a very subjective area of image processing.

Figure 2.3 (a) input image (b) output image

Image restoration: this is an area that also deals with improving the appearance of an image. Unlike enhancement, which is subjective, restoration is objective in the sense that restoration techniques tend to be based on mathematical or probabilistic models of image degradation, whereas enhancement is based on human preferences about what constitutes a “good” enhancement.

2.4 Sampling and Quantization:

To create a digital image, we need to convert continuous sensed data into digital form. This involves two processes: sampling and quantization. An image may be continuous with respect to the x- and y-coordinates and also in amplitude; to convert it to digital form, we have to sample the function in both coordinates and in amplitude. Digitizing the coordinate values is called sampling, and digitizing the amplitude values is called quantization. Consider the image values along the line segment AB: to sample the function, we take equally spaced samples along line AB. The location of each sample is marked by a small vertical tick in the bottom part of the figure, and the samples are shown as small squares superimposed on the function; the set of these discrete locations gives the sampled function. The gray-level values must also be converted (quantized) into discrete quantities.

To do this, we divide the gray scale into, say, eight discrete levels. The continuous gray levels are quantized by assigning one of the eight discrete gray levels to each sample; the assignment depends on the vertical proximity of a sample to a gray-level tick. Starting at the top of the image and carrying out this procedure line by line produces the digital image.
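To make the two operations concrete, here is a minimal MATLAB sketch (assuming the Image Processing Toolbox; 'cameraman.tif' is a demo image shipped with it, and the choice of eight levels mirrors the example above):

% Minimal sketch: quantize an 8-bit grayscale image to 8 discrete gray levels.
f = imread('cameraman.tif');            % already sampled: 256 gray levels, uint8
levels = 8;                             % target number of gray levels
step = 256 / levels;                    % width of each quantization interval
fq = uint8(floor(double(f) / step) * step + step/2);  % map each pixel to its interval center
figure, imshow(f),  title('Original: 256 levels')
figure, imshow(fq), title('Quantized: 8 levels')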

Digital Image Definition:

A digital image f(m, n) described in a 2-D discrete space is derived from an analog image f(x, y) in a 2-D continuous space through a sampling process frequently referred to as digitization. The mathematics of the sampling procedure is described in later chapters; for now we look at some basic definitions of digital images. The two-dimensional continuous image f(x, y) is divided into N rows and M columns, with m = 0, 1, 2, …, M−1 and n = 0, 1, 2, …, N−1. The intersection of a row and a column is termed a pixel, and the value assigned to the integer coordinates (m, n) is f(m, n). In fact, the function is often a function of many more variables, including depth (z), color (λ), and time (t).

                                     Figure 2.4 continuous image

The processing of images is performed using three types of computerized processes:

1- Low-level processes: primitive operations such as noise reduction, contrast enhancement, and image sharpening. A low-level process is characterized by the fact that both its inputs and its outputs are images.

2- Mid-level processes: tasks such as segmentation and the description of objects, which reduce them to a form suitable for computer processing, and the classification of individual objects. A mid-level process is usually performed on images, but its outputs are generally attributes extracted from those images.

3- High-level processes: “making sense” of an ensemble of recognized objects, as in image analysis, and performing the cognitive functions normally associated with vision.

Representing Digital Images:

The result of sampling and quantization is a matrix of real numbers. Suppose that an image f(x, y) is sampled so that the resulting digital image has M rows and N columns. The coordinates (x, y) are now discrete quantities; for notational convenience, the origin is at (x, y) = (0, 0), and the next coordinate value along the first row is (x, y) = (0, 1).

These are not necessarily the actual values of the physical coordinates when the image was sampled.

Each element of the matrix is therefore called an image element, picture element, pixel, or pel.

The image can also be represented as a matrix. The sampling process may be viewed as partitioning the xy-plane into a grid, with the coordinates of the center of each grid cell being a pair of elements from the Cartesian product Z², which is the set of all ordered pairs of elements (zi, zj) with zi and zj being integers from Z.

Image Sensing and Acquisition:

Most images are generated by the combination of an “illumination” source and the reflection or absorption of energy from that source by the elements of the “scene” being imaged. We use the terms illumination and scene here in a much more general sense than the familiar situation in which a visible light source illuminates a common everyday 3-D (three-dimensional) scene.

                                            Figure 2.5 illumination

The illumination may originate from a source of electromagnetic energy such as radar, infrared, or X-ray energy. But, as noted, it could also originate from less traditional sources, such as ultrasound or even a computer-generated illumination pattern. Similarly, the scene elements could be familiar objects, but they can just as easily be molecules, buried rock formations, or a human brain. We could even image the Sun. Depending on the nature of the source, the illumination energy is reflected from, or transmitted through, objects.

An example in the first category is light reflected from a planar surface; an example in the second category is X-rays passing through a patient's body to produce a diagnostic X-ray film. In some applications, the reflected or transmitted energy is focused onto a photoconverter (e.g., a phosphor screen), which converts the energy into visible light. The idea is simple: incoming energy is transformed into a voltage by the combination of input electrical power and sensor material that is responsive to the particular type of energy being detected. The output voltage waveform is the response of the sensor(s), and a digital quantity is obtained by digitizing that response. In this section, we look at the principal modalities for image sensing and generation.

Figure 2.6 Image acquisition using sensor arrays

Figure 2.7 acquisition image

2.5 Image sampling and Quantization:

The digital image is created by converting continuous data into digital form.

This involves two processes:

  1. Sampling and
  2. Quantization

Consider a continuous image F(x, y) that we want to convert to digital form. The image may be continuous in the x- and y-coordinates as well as in amplitude; to convert it to digital form, we have to sample the function in both coordinates and in amplitude.

Digitizing the coordinate values is called Sampling.

Digitizing the amplitude values is called Quantization.

Figure 2.8 illumination

2.6 Digital Image Representation:

A digital image is a finite collection of samples (pixels) of some visible object. The pixels represent a two-dimensional view of the object, each pixel having a specific value in a limited range. The pixel values may represent the amount of visible light, infrared light, X-rays, electrons, or any other measurable quantity. The image need not represent a visual scene; it is sufficient that the samples form a two-dimensional spatial structure that can be presented as a picture. The images can be obtained by a digital camera, a scanner, an electron microscope, or any other optical or non-optical sensor.

2.7 Image Restoration:

Restoration improves an image in some predefined sense; unlike enhancement, it is an objective process. Restoration attempts to recover an image that has been degraded, using a priori knowledge of the degradation phenomenon: restoration techniques model the degradation and then apply the reverse process to recover the original image. Restoration techniques are based on mathematical or probabilistic models of image degradation, whereas enhancement is based on human preferences about what constitutes a “good” enhancement. Degradation may be introduced:

  1. During display mode
  2. Acquisition mode, or
  3. Processing mode

Typical sources of degradation include:

  4. Sensor noise
  5. Blur due to camera misfocus
  6. Relative object-camera motion
  7. Random atmospheric turbulence
  8. Others

2.8 Degradation Model:

The degradation process is modeled as a degradation function H that, together with an additive noise term η(x, y), operates on an input image f(x, y) to produce a degraded image g(x, y). The objective of restoration is to obtain an estimate f′(x, y) of the original image, given g(x, y) and some knowledge about the degradation function H and the noise η; the more we know about H and η, the closer f′(x, y) will be to f(x, y). In the spatial domain, g(x, y) = f(x, y) * h(x, y) + η(x, y), where * denotes convolution. We can write this equation in the frequency domain as G(u, v) = F(u, v)H(u, v) + N(u, v), where the terms in capital letters are the Fourier transforms of the corresponding terms in the spatial domain.

Figure 2.9 transform image
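As an illustration (not the method used later in this thesis), the degradation model can be simulated in MATLAB as follows; the blur length, angle, and noise variance are assumed values:

% Minimal sketch of g(x,y) = f(x,y)*h(x,y) + eta(x,y).
f = im2double(imread('cameraman.tif'));
h = fspecial('motion', 15, 45);           % degradation function h: motion blur
g = imfilter(f, h, 'conv', 'circular');   % f(x,y) * h(x,y) (convolution)
g = imnoise(g, 'gaussian', 0, 0.001);     % + eta(x,y): additive Gaussian noise
% The same model in the frequency domain: G = F.*H + N
F = fft2(f);
H = fft2(h, size(f,1), size(f,2));
figure, imshow(g), title('Degraded image g(x,y)')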

2.9 Noise Models:

The noise in digital images arises principally during image acquisition and/or transmission. The performance of an imaging sensor is affected by a variety of factors, such as environmental conditions during image acquisition and the quality of the sensing elements themselves.

Transmission, in particular, is affected by interference in the channels used to transmit the images. Given the assumption that noise is caused by atmospheric disturbance and by the image sensors, the noise model can be considered spatially invariant (independent of spatial location) and uncorrelated with the object's function. These noise models are used frequently because of their tractability in both the spatial and the frequency domain. The PDF of a Gaussian random variable z is p(z) = (1/(√(2π)σ)) exp(−(z − μ)²/(2σ²)), where z represents gray level, μ is the mean of z, and σ is its standard deviation.

2.10 Boundary Representation:

Boundary representation models are a more explicit representation than CSG.

The object is represented by a complex data structure giving information about each of the object's faces, edges, and vertices, and about how they are joined together. Since the surface information is readily available, this seems a more natural representation for vision. The description of the object can be divided into two parts:

Topology: records the connectivity of the faces, edges, and vertices by means of pointers in the data structure.

Geometry: the exact shape and position of all the edges, faces, and vertices.

Figure 2.10 coordinates of shape

2.11 Types of digital images:

We will look at four basic types of digital image. Binary image: each pixel is just black or white. Since there are only two possible values for each pixel, we need only one bit per pixel in storage. Such images are therefore very efficient to store, and images for which a binary representation is suitable include text (printed or handwritten), fingerprints, and architectural plans.

Figure 2.11 binary image

Grayscale image: each pixel can be represented by 8 bits, i.e., exactly 1 byte, representing a shade of gray, normally from 0 (black) to 255 (white). This range is very natural for image file handling; in fact, 256 different gray levels are sufficient for the recognition of most natural objects.

A grayscale image (also called gray-scale or gray-level) is a data matrix whose values represent intensities within some range.

Figure 2.12 True color or red-green-blue image.

True color (RGB) image: each pixel has a particular color, described by the amounts of red, green, and blue in it. If each of these components has a range of 0-255, this gives a total of about 16.7 million different possible colors, which is enough for any picture. Since the number of bits required for each pixel is 24, such images are also called 24-bit color images. The image can be considered as a stack of three matrices, representing the red, green, and blue values for each pixel.

That is, the color of each pixel is specified by three values, giving the red, blue, and green components of that pixel's color. Graphics file formats store 24-bit images with the red, green, and blue components as 8 bits each, giving a potential of 16 million colors.

The term true-color image has come to denote images that faithfully reproduce real-life colors. A true-color array can be of class uint8, uint16, single, or double. In a true-color array of class single or double, each color component is a value between 0 and 1: a pixel whose color components are (0, 0, 0) is displayed as black, and a pixel whose color components are (1, 1, 1) is displayed as white.

Figure 2.13 color image

The code sample below creates a simple RGB image, extracts a separate matrix for each of the three color planes used in the image, displays each color plane separately, and then displays the original image.

RGB = reshape(ones(64,1)*reshape(jet(64),1,192), [64,64,3]);
R = RGB(:,:,1);
G = RGB(:,:,2);
B = RGB(:,:,3);
imshow(R)
figure, imshow(G)
figure, imshow(B)
figure, imshow(RGB)

The indexed image: most color images use only a small subset of the more than 16 million possible colors. For convenience of storage and file handling, such an image has an associated color map, or color palette, which is simply a list of all the colors used in that image. Each pixel stores an index into the map rather than a full color specification (as in a red-green-blue [RGB] image). If an image has 256 colors or fewer, the index values require only 1 byte each to store; for this reason, some image file formats allow only 256 colors. An indexed image consists of an array and a color map matrix, and the pixel values in the array are direct indices into the color map. This document uses the variable X to refer to the array and map to refer to the color map. The color map matrix is an M-by-3 array of class double.

Figure 2.14 map image
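A short MATLAB sketch of the array/color-map pair that makes up an indexed image (the palette and index values are arbitrary examples):

% Minimal sketch: build and display an indexed image.
% X holds indices into map; map is an M-by-3 matrix of RGB values in [0,1].
map = jet(16);                      % a 16-color palette (16-by-3, class double)
X = uint8(randi(16, 64, 64) - 1);   % random indices; uint8 indices are zero-based
figure, imshow(X, map), title('Indexed image with a 16-color map')
RGB = ind2rgb(X, map);              % conversion to a true-color array when needed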

2.12 What is Feature Extraction?

Feature extraction is a part of the dimensionality reduction process, in which an initial set of raw data is divided into smaller, more manageable groups, so that it is easier to process. An important characteristic of these large data sets is that they contain a large number of variables, and processing those variables requires a lot of computing resources.

Feature extraction selects and combines the best features of the large data set into new features, effectively reducing the amount of data while still describing the original data set accurately and completely.

2.13 Why Feature Extraction is Useful?

Extracting features is useful when you have a large data set and need to reduce the amount of resources required for processing. Feature extraction also reduces the amount of redundant data in the data set.

In the end, the reduction of the data helps to build the model with less machine effort, and it also increases the speed of learning and the generalization ability of the machine learning process.

2.14 Applications of Feature Extraction:

* Bag-of-Words: the most widely used technique in natural language processing. In this process the words, or features, are extracted from a sentence, document, website, etc., and are then classified by their frequency of use. Feature extraction is the most important part of the whole process.

* Image processing: image processing is one of the best and most interesting domains for feature extraction. In this domain we work directly with images, using many techniques to extract features such as shapes, edges, or motion from digital images or videos.

* Auto-encoders: the main purpose of auto-encoders is the efficient, unsupervised encoding of data. Feature extraction is applicable here: identifying the key features of the data helps the auto-encoder learn a coding of the original data set from which new representations can be derived.

https://www.mygreatlearning.com/blog/feature-extraction-in-image-processing/

2.15 Image Recognition

* Image recognition: image recognition, or computer vision, is a technical discipline that seeks ways to automate all the tasks that a human visual system can do.

* Examples of deep learning image recognition systems:

TensorFlow by Google, DeepFace by Facebook, and Project Oxford by Microsoft. On the other hand, hosted APIs such as Google Cloud Vision, Clarifai, and Imagga allow businesses to save money on costly computer vision development teams.

2.16 What is Image recognition?

Figure 2.15 recognition image

Image recognition refers to technologies that identify places, logos, people, objects, buildings, and several other variables in digital images. Humans can recognize different images, such as images of animals, very easily: we can easily recognize a cat and distinguish it from a horse. But this is not so easy for a computer. A digital image is composed of pixels, each with a finite, discrete quantity of numerical representation. Image recognition feeds the numerical values of these pixels to algorithms that detect patterns and regularities in the numerical data.

Figure 2.16 numerical data image

The open-source systems mentioned above offer numerous benefits, and cloud computing makes image recognition technology more efficient and much cheaper.

2.17 Image Processing Techniques

Image processing is generally divided into several stages: importing the image, analysis, manipulation, and image output. Image processing can be digital or analog.

https://medium.com/@Adoriasoft/image-recognition-and-image-processing-techniques-fe3d35d58919

2.18 Working of Convolutional and Pooling layers

Convolutional neural networks are built from convolutional layers and pooling layers. We will look at them in detail:

How does the convolutional layer work? The convolutional layer contains a set of learnable filters (or kernels), which have a small receptive field. These filters scan the pixels and collect information from a batch of images/photographs. Each filter is convolved with the input layer, and the result is passed to the next layer, much as a neuron in the visual cortex responds to a specific stimulus.

Figure 2.17 convolution

An example convolution is shown in figure 2.17; all pixels are processed in the same way.
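The following MATLAB sketch is a simplified illustration of the two operations; a fixed kernel stands in for a learned filter (a real convolutional layer would learn its weights):

% Minimal sketch: 2-D convolution followed by 2x2 max pooling.
f = im2double(imread('cameraman.tif'));
k = [1 0 -1; 2 0 -2; 1 0 -1];            % a 3x3 kernel with a small receptive field
fm = conv2(f, k, 'same');                % feature map: every pixel treated alike
pooled = blockproc(fm, [2 2], @(b) max(b.data(:)));   % 2x2 max pooling
figure, imshow(mat2gray(fm)),     title('Feature map')
figure, imshow(mat2gray(pooled)), title('After max pooling')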

https://www.mygreatlearning.com/blog/image-recognition/

2.19 What is Object Recognition?

The term object recognition describes a collection of related computer vision tasks that involve identifying objects in digital photographs. Image classification involves predicting the class of one object in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a box around their extent. Object detection combines these two tasks: it localizes and classifies one or more objects in an image. The term “object recognition” is often used to mean “object detection”. Here, the term object recognition will be used to encompass both image classification (a task requiring an algorithm to determine what object classes are present in the image) and object detection (a task requiring an algorithm to localize all objects present in the image).

We can also distinguish these three tasks in computer vision:

  • Image Classification: Classify the object type or class of the image.

Input: a single image with one object, such as a photograph.

Output: a class label (e.g., a single integer that is mapped to a class label).

  • Object Localization: locate the object in the image and indicate its location with a bounding box.

Input: an image with one or more objects, such as a photograph.

Output: one or more bounding boxes (e.g., each defined by a point, a width, and a height).

  • Object Detection: detect the presence of objects with bounding boxes, together with the types or classes of those objects.

Input: an image with one or more objects, such as a photograph.

Output: a set of bounding boxes (e.g., each defined by a point, a width, and a height), and a class label for each bounding box.

A further extension is object segmentation, also called “object instance segmentation” or “semantic segmentation”, in which the recognized objects are indicated by the specific pixels of the object instead of a coarse bounding box. Object recognition thus refers to a series of challenging computer vision tasks.

Figure 2.18 structure of pattern recognition

Fuzzy image processing is the collection of all approaches that understand, represent, and process images, their segments, and their features as fuzzy sets. The representation and processing depend on the chosen fuzzy technique and on the problem to be solved. Fuzzy image processing has three main stages: image fuzzification, modification of the membership values, and image defuzzification. The fuzzification and defuzzification steps are needed because we do not possess fuzzy hardware; the coding of image data (fuzzification) and the decoding of the results (defuzzification) are the steps that make it possible to process images with fuzzy techniques.

The main power of fuzzy image processing lies in the middle step, the modification of the membership values. After the image data are transformed from the gray-level plane to the membership plane (fuzzification), appropriate fuzzy techniques modify the membership values; these can be a fuzzy clustering, a fuzzy rule-based approach, or a fuzzy integration approach. Image processing is uncertain in many respects, and in many image processing applications we need expert knowledge to overcome the difficulties (e.g., object recognition, scene analysis). The acquisition process is inherently ambiguous: images are noisy and distorted, object definitions are not always precise, and the output of low-level processes is often vague and contradictory. These problems are fuzzy in nature. Questions such as whether a pixel should become darker or brighter, or where the boundary between two image segments lies, are examples of situations in which a fuzzy approach is more appropriate. First, vagueness and ambiguity can be handled efficiently and effectively by fuzzy techniques. Second, fuzzy logic is easy to understand, and fuzzy reasoning is very easy to express in mathematical concepts. In many image processing applications, problems are solved using expert knowledge, which fuzzy logic captures in the form of fuzzy if-then rules; imperfection is handled better by a fuzzy method than by traditional methods. For edge detection, the output of a fuzzy inference system (FIS) can be compared with a high-pass filter, a first-order edge operator (the Sobel operator), and a low-pass (median) filter applied to the original image; fuzzy logic represents the uncertainty of the information. A remaining problem of the FIS approach is that further tuning of the weights associated with the fuzzy inference rules is required to reduce the number of detected pixels that do not belong to edges.

Figure 2.19 system of fuzzy logic
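As a toy illustration of the fuzzification, membership modification, and defuzzification chain (a sketch only, using the classic intensification (INT) operator for contrast enhancement rather than any specific method from this thesis):

% Minimal sketch of the three stages of fuzzy image processing.
f  = im2double(imread('cameraman.tif'));
mu = f;                                % fuzzification: gray level -> membership in [0,1]
% Modification of membership values: intensification (INT) operator.
idx = (mu <= 0.5);
mu(idx)  = 2 * mu(idx).^2;
mu(~idx) = 1 - 2 * (1 - mu(~idx)).^2;
g = mu;                                % defuzzification: membership -> gray level
figure, imshow(g), title('After fuzzy intensification')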

2.20 Image Edge Detection:

Segmentation is the process of dividing an image into a set of internally homogeneous regions; without segmentation, the entire scene is a single region. To identify and classify the objects in a scene, it is necessary to segment the image. One approach divides the image at the locations of abrupt changes in gray level; each region can then be described by the edge contour that encloses it. The detection of edges in digital images is the main area of interest in this category. An edge corresponds to a discontinuity of the image intensity function; the discontinuity reflects a rapid intensity change, such as the boundary between different regions, a shadow boundary, or an abrupt change in surface orientation or material properties. Edges represent the outline of a shape and the transition between colors. In scene understanding, edges can be used to estimate region boundaries, and corresponding edges can be found in several images of the same scene. In images of people, edges define the shape of the object, the facial appearance, and the body shape.

2.21 Fuzzy Noise Estimation:

In this part we determine whether a pixel is corrupted by noise or not. The following criteria are taken into account:

  1. If a pixel is very noisy, no other gray value in its vicinity is close to it, so the minimum absolute gray-level difference between the pixel and its neighbors is very large. If the difference between a pixel and its neighbors is small, we assume that the pixel is not noisy. The first parameter of the fuzzy rule system is therefore the minimum gray-level difference:

dif = min |f(x, y) − f(x′, y′)|, where (x′, y′) ranges over the 8-neighborhood of (x, y).

  2. The number of similar pixels in the vicinity is a key parameter for deciding whether a pixel is intact, so we count the pixels in the 8-neighborhood whose gray value differs from the inspected pixel by less than a defined threshold. We use this count as the second parameter of the fuzzy rule system (see the sketch below).
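A minimal MATLAB sketch that computes both parameters for every pixel (the similarity threshold is an assumed value; borders wrap around, which is acceptable for a sketch):

% Minimal sketch: the two inputs of the fuzzy noise-detection rules.
f = double(imread('cameraman.tif'));
thr = 20;                                % similarity threshold (assumed value)
difMin = inf(size(f));                   % min |f(x,y) - f(x',y')| over the 8-neighborhood
simCnt = zeros(size(f));                 % number of similar 8-neighbors
for dx = -1:1
    for dy = -1:1
        if dx == 0 && dy == 0, continue; end
        d = abs(f - circshift(f, [dx dy]));   % difference to one neighbor
        difMin = min(difMin, d);
        simCnt = simCnt + (d < thr);
    end
end
% Large difMin and small simCnt suggest a noisy pixel.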

2.20 Fuzzy Image Smoothing:

We use fuzzy rules to calculate a correction term Δ for each pixel value. The rules take the following form: if no edge is assumed to be present in a given direction, the (crisp) derivative value in that direction is used to compute the correction term. The first part (the edge assumption) is realized by means of fuzzy derivative values, and the second part (the filtering) uses those values to smooth the pixel.

Chapter 3

Literature Review

3.1 Introduction

This chapter describes the state of the art of segmentation categories, with a focus on text recognition. Analytic segmentation, the run-length coding algorithm, and character identification are also described. Finally, character determination by fuzzy logic is presented.

3.2 Segmentation Categories

Dividing images into meaningful structures is a main step in image analysis, object representation, visualization, and many other image processing tasks.

A great variety of segmentation methods has been proposed in the past decades, and some categories need to be defined properly. The classification presented here therefore focuses on approaches that produce a strict division of the image.

3.2.1 Threshold Based Segmentation

Histogram thresholding and slicing techniques are used to segment the image. They may be applied directly to the image, or they may be combined with pre- and post-processing.

3.2.2 Clustering Techniques

Although clustering is sometimes used as a synonym for (agglomerative) segmentation techniques, we use it here for techniques that are primarily applied in the exploratory data analysis of high-dimensional measurements, grouping together patterns that are similar in some sense [58].

3.2.3 Matching

When we know what an object we wish to identify in an image (approximately) looks like, we can use this knowledge to locate the object in the image [58].

3.2.4 Edge Based Segmentation

Edge detection is used to find the edges in the image, from which objects can be identified using the known boundaries of the object.

3.2.5 Region Based Segmentation

A region-based technique locates the object itself by growing and filling in regions, in contrast to edge-based techniques that first find the object boundaries. The focus of this research is on edge-based segmentation. In addition, text problems such as multi-orientation and multi-scale can be faced; this is because character segmentation is complex and requires a sophisticated approach.

3.3 Categories of Variance Text

Posters, signs on roads, and notices on doors come with different sizes, types, colors, and orientations of text, and can be used in a large variety of applications. The categories of text variance are therefore described in detail below.

3.3.1 Lighting Variance

[48] (Gatos, Pratikakis, Kepene and Perantonis, 2005a) observe that images vary in lighting conditions; an extraction method should remain effective when the lightness of the text varies across the image.

3.3.2 Scale Variance

The distance between the camera and the scene varies from image to image, and the resolution of the image and of the text is affected by that distance [25].

3.3.3 Orientation Variance

The image may be captured at different angles depending on the camera. Natural scenes are virtually unpredictable, and the camera angle can change the apparent size of the text in the image. Confusion and mistakes may also be caused by patterns that resemble text (e.g., tree leaves, traffic signs, bricks, windows, and stockades), or by occlusions caused by foreign objects [1].

3.3.4 Imperfect Imaging Conditions in Uncontrolled Environments

In uncontrolled environments, the quality of text in images and videos cannot be guaranteed. The text may be blurry because of poor lighting conditions, inadequate distance or angle, or an out-of-focus or shaking camera [1]; it may be noisy on account of low light levels, or corrupted by highlights or shadows.

3.4 Text Recognition

Text extraction can be described as a combination of modules that process text in images from raw data until the text is extracted. [8] (Tsung, Yung, Chih, 2006c) clearly describe text extraction as shown in figure 3.1. Static text is the main focus of the image text detection and extraction method.

Figure 3.1 General model of extraction text

First, the raw image is passed to the initial text detection step, which checks whether text is present in the image and puts it into a suitable form. Next, the text localization task determines where the text is located. Then, text extraction extracts the text from the image. All of these steps rely on digital image processing.

Digital image processing means changing digital data to improve the quality of images with the aid of computers. The two main components of digital image processing are image enhancement and information extraction. Image enhancement techniques improve the visibility of any portion or feature of the image. Image enhancement is the simplest and most popular area of digital image processing; the idea behind enhancement techniques is generally to bring out details that are obscured, or simply to highlight certain aspects of interest in the image [5].

3.4.1 Text Detection

In fact, the entire process that takes place before the localization and extraction steps is known as text detection. The purpose of detection is to convert the raw data into a suitable form and to calibrate the text lineaments.

[28] determine the precise location of edges with an algorithm based on the Sobel operator; the gradient map is computed for the three RGB components. An “open” operation is used to disconnect parts of the map that are too narrow to contain text. Afterwards, a projection profile of the image blocks, which compresses the spatial pixels, is computed; after profile projection and comparison with a threshold, the bounding edges of the dense blocks in the map are obtained. Another method is based on the information content of the subimage coefficients of the wavelet transform, in which candidate text is characterized by “dense edges” [27]. Edge information is the main source of information for the detection of text: such a system detects text by computing the edge map of the entire image with the Sobel detector, combined with integrated methods. [51] propose a new edge detection method based on the Neutrosophic Set (NS) using maximum normality (EDA-NMNE). Indeterminacy is not resolved by many expert and intelligent systems, including fuzzy systems; the NS approach divides a set into True (T), False (F), and Indeterminate (I) subsets.
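A sketch of the Sobel-based gradient map described above (the threshold and structuring-element sizes are assumptions; 'peppers.png' is a demo image shipped with MATLAB):

% Minimal sketch: gradient map over the three RGB components via Sobel.
rgb = im2double(imread('peppers.png'));
sy = fspecial('sobel');                  % Sobel kernel, vertical gradient
sx = sy';                                % transposed: horizontal gradient
gmap = zeros(size(rgb,1), size(rgb,2));
for c = 1:3                              % one gradient map per color component
    gx = imfilter(rgb(:,:,c), sx, 'replicate');
    gy = imfilter(rgb(:,:,c), sy, 'replicate');
    gmap = max(gmap, hypot(gx, gy));     % combine: keep the strongest response
end
bw = gmap > 0.3;                         % threshold the map (assumed value)
bw = imopen(bw, strel('rectangle', [1 3]));   % 'open' to disconnect narrow joins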

3.4.2 Text Area Identification

The task of text area identification is to verify that a detected area really contains text. [28] confirm the candidate text blocks by three rules. First, the height and width of a text block should be large enough to hold at least two characters of a text line. Second, in the zoomed image, text blocks whose height is smaller than 8 pixels or larger than 32 pixels are rejected. Third, a block may contain too many edge pixels and still not be text. A wavelet feature and an SVM classifier are then used to verify the candidate text. Some strokes are very weak and irregular; text consists of strokes, i.e., horizontal, vertical, up-right-slanting, and up-left-slanting strokes. Considered as a single block they are regular, but they are not regular at the level of each pixel. In addition, the detected text area is refined; to determine whether a detected area is a true text area, the pixels with horizontal edges in the text area are counted.

3.4.3 Text Region Localization

Text usually appears in clusters, i.e., it is arranged compactly. The characteristics of clustering can therefore be used to localize text regions. A simple global thresholding of the intensity of the feature map can be employed to highlight regions with a high text possibility [25]. A morphological dilation operator can easily connect regions that are very close together, while regions that lie far apart remain isolated. To obtain the regions referred to as text blobs, a morphological dilation operator is applied to the binary image [59]. A filter with a size constraint is then used to filter out all the small isolated blobs.

The retained blobs are enclosed in rectangular bounding boxes; each box is determined by the maximum and minimum coordinates of the top, bottom, left, and right points of the corresponding blob [60]. A small amount of padding is added to avoid losing pixels near or just outside the initial boundary.
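A minimal sketch of this localization step, assuming a binary feature map bw from the previous stage (the structuring-element size, blob-size limit, and padding are assumed values):

% Minimal sketch: dilate the feature map, drop small blobs, box the rest.
se = strel('rectangle', [5 15]);          % wide SE connects characters into lines
blobs = imdilate(bw, se);                 % merge close regions into text blobs
blobs = bwareaopen(blobs, 100);           % filter out small isolated blobs
stats = regionprops(blobs, 'BoundingBox');
figure, imshow(bw), hold on
pad = 2;                                  % small padding around each box
for k = 1:numel(stats)
    b = stats(k).BoundingBox;             % [x y width height]
    rectangle('Position', [b(1)-pad, b(2)-pad, b(3)+2*pad, b(4)+2*pad], 'EdgeColor', 'r');
end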

3.4.4 Text Extraction and Binary Image

Text extraction is the final step before recognition. The localized text region is converted into an OCR-ready binary image in which all character pixels are black and all background pixels are white (or vice versa). Bitmap integration over time is often used to remove the moving background of a text area in order to extract static text. The algorithm can also be modified to remove false text character regions in order to improve the recognition rate.

The Otsu method is often used to calculate the threshold that separates text from background. Horizontal and vertical adaptive thresholding can also be applied, and the resulting binary images are combined with an AND operation to form the final text area [25]. After the text has been extracted from the binary image, a uniform representation of white characters on a pure black background is used for recognition.
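A short MATLAB sketch of Otsu binarization for a localized text region (the demo image stands in for a cropped text region):

% Minimal sketch: separate text from background with Otsu's threshold.
region = imread('cameraman.tif');        % stand-in for a localized text region
t  = graythresh(region);                 % Otsu threshold, normalized to [0,1]
bw = imbinarize(region, t);              % binary image at that threshold
if nnz(bw) > numel(bw)/2                 % normalize polarity: text should be the
    bw = ~bw;                            % minority; make characters white on black
end
figure, imshow(bw), title('Otsu-binarized text region')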

3.5 Analytic Segmentation

[50] (Chaitanya and Ashwini, 2017) describe a stepwise method in which word segmentation stages are matched with recognition stages.

3.5.1 Pattern Recognition

Pattern recognition covers the measurement, description, and classification of objects. It is a collection of mathematical, statistical, heuristic, and inductive techniques that play a fundamental role in enabling computers to perform tasks of human beings [30].

Pattern recognition comprises a wealth of methods, which imply the development of many applications in different fields; in essence, these methods perform intelligent emulation [61].

3.5.1.1 Statistical Pattern Recognition

Statistical decision and estimation theories have been commonly used in PR for a long time. Statistical pattern recognition (SPR) is a classical method of PR that was worked out during a long development process; it is based on the distribution of feature vectors, which is obtained from probability and statistical models. The statistical model is defined by a family of class-conditional probability density functions Pr(x|ci) (the probability of feature vector x given class ci). In detail, in SPR we put the features in some optional order and then regard the set of features as a feature vector [62][63]. Statistical pattern recognition deals with features only, without considering the relations between features.

3.5.1.2 Data Clustering

The aim is to find a few similar clusters in a mass of data without using any information about the known clusters; it is an unsupervised method. Clustering methods can be divided into two classes: hierarchical clustering methods and partitioning methods.

3.5.1.3 Fuzzy Sets

Human thinking and feeling are often uncertain, and human language is often fuzzy; in fact, we cannot always give a crisp answer or classification, and the theory of fuzzy sets was born from this observation. Many concepts can only be described in a fuzzy way. [51] overcome this limitation by using deep learning to extract features from each modality and then projecting them into a common affective space that has been clustered into different emotions. In the real world, individuals tend to have partial or mixed feelings about the target of an opinion, so a fuzzy logic classifier is used to predict the degree of emotion [64]. The convolutional fuzzy sentiment classifier is a combination of deep neural networks and fuzzy logic.

3.5.1.3.1 Fuzzy Image Processing

[52] (Tarun, Amit 2013) describe fuzzy image processing as a form of information processing in which both input and output are images. The images, their segments, and their features are represented as fuzzy sets, using different fuzzy approaches. Fuzzy image processing is divided into three main stages: image fuzzification, modification of the membership values, and image defuzzification (if necessary).

Figure. 3.2: Fuzzy Image Processing

Fuzzification and defuzzification are performed because of the absence of fuzzy hardware; in this way we encode the image data (fuzzification) and decode the results (defuzzification). After the first phase (fuzzification), the intermediate step (modification of the membership values) applies appropriate fuzzy image processing techniques (such as fuzzy clustering, a fuzzy rule-based approach, a fuzzy integration approach, and so on).

3.5.1.4 Neural Networks

Since the first neural network model (the McCulloch-Pitts model) was proposed in 1943, neural networks have developed very quickly, especially the Hopfield neural network and the famous BP algorithm. Neural networks perform data clustering based on distance measurement, and the method is model-free. The neural approach applies biological concepts to recognize patterns [66]; the artificial neural networks inspired by knowledge of the human brain are the result of this effort. Neural networks consist of series of units [62]. Moreover, genetic algorithms (Holland, 1975) are statistical algorithms that can be used to optimize a neural network.

3.5.1.5 Structural Pattern Recognition

Structural pattern recognition is not based on a firm theory; it relies on segmentation and feature extraction. Its basis is the description of structure, namely how simpler sub-patterns compose an object. Structural pattern recognition uses two main methods: syntax analysis and structure matching. The basis of syntax analysis is the theory of formal languages. Structural pattern recognition is at its best when the relations between the parts of the object are considered. In contrast to other methods, structural recognition of symbols, possibly combined with statistical classification or neural networks, is always associated with the more complex problems of pattern recognition. [63]

3.5.1.6 Syntactic Pattern Recognition

This method basically emphasizes the rules of composition. An attractive aspect of syntactic methods is their suitability for dealing with recursion [62]. After a series of rules describing the relations among the parts of an object has been defined, syntactic pattern recognition, which is a special kind of structural pattern recognition, can be used [66].

3.5.1.7 Approximate Reasoning Approach to Pattern Recognition

The problem can be dealt with using two concepts: fuzzy relations and the compositional rule of inference [67].

3.5.1.8 Applications of Support Vector Machine (SVM) for Pattern Recognition

SVM is a relatively new technique that has been studied widely since its inception [32]. SVM is based on statistical learning theory and can solve classification and regression problems; it has been applied to tasks such as face detection, verification and recognition, object detection and recognition, speech recognition, etc. [67].

3.5.1.9 Template Matching

[4] (Rodel, Edwin, Arboleda and Rhowel, 2019) Image processing is a method of converting an image into digital form and performing operations on it, in order to obtain a better image or to extract useful information from it. Template matching uses a template to find small parts of an image: in digital image processing, the technique searches for the parts of an image that match the template image. It can be used for quality control, for the navigation of a mobile robot, or as a way to detect the edges of an image.

3.5.1.10 K-Nearest Neighbor Classifier

[54] (Rodel, Edwin, Arboleda and Rhowel, 2019) KNN is a method that classifies objects based on the closest examples in the feature space; it is the most basic type of instance-based learning. It assumes all instances are points in an n-dimensional space, and the "closeness" of instances is determined by a distance measure. KNN classifies an instance by finding its nearest neighbors and selecting the most popular class among them. In KNN, the training samples are described by n-dimensional numeric attributes.
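A minimal MATLAB sketch of this distance-based scheme is given below; classifyknn and its inputs (an n×d matrix of training features, an n×1 label vector, a 1×d sample and k) are hypothetical names, not part of the cited work.

function label = classifyknn(train, labels, sample, k)
% classify one sample by majority vote among its k nearest training points
d = sqrt(sum(bsxfun(@minus, train, sample).^2, 2));  % Euclidean distances
[~, idx] = sort(d);                                   % order by closeness
label = mode(labels(idx(1:k)));                       % most popular class
end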

 

3.5.2 Pattern Recognition System

A recognition system can be regarded as a process that deals with real and noisy data. The decision made by the system mainly imitates the decision of a human expert [68]. We propose a unified system in which the classification step distinguishes video types and the text detection step extracts text from the video frames; the parameters of the corresponding steps are automatically derived from training samples.

3.5.2.1 The Structure of Pattern Recognition System

In a pattern recognition system, the information obtained from pattern analysis is used by the computer to organize objects into classes. The recognition system includes five steps, and its kernel is classification / regression / description (Fig. 3.3).

Classification is the problem of assigning objects to classes; the PR system gives an integer label, such as classifying a product as "1" or "0" [62].

Regression is a generalization of the classification task, in which the PR system outputs a real-valued number, such as predicting the share price of a firm based on past performance and stock market prices.

Description is the problem of representing an object in terms of a series of primitives, and the PR system produces a structural or linguistic description. The general composition of a PR system is set out below [68].

Figure.3.3 The composition of a PR system

3.5.3 Applications of Pattern Recognition

PR theory is an important component of many applications. PR technology has been used in many fields, one of them being character recognition [67].

3.5.4 Character Recognition

Character recognition in scene images is based on the identification of local target areas. Recognition of characters is usually performed after the image has been binarised using a single threshold; however, extracting the characters from the background is difficult.

The binarised local target area will then be similar to a document image, since the text on a signboard is usually confined to the signboard region.

A threshold value determined from the local target area is used to separate the character and background regions of that area.

3.5.5 Text Verification

[55] (Jing, Dmitry and Rangachar, 2008) The proposed verification method investigates the relationship between candidate text blocks and their neighbours, so that false alarms can be removed.

3.6 Run-Length Coding Algorithm

In binary images we usually use one and zero to represent the foreground and background respectively (Chengjie, Jie and Trac, 2002b; Kofi, Andrew, Patrick and Jonathan, 2007). Run-length coding is a standard block coding technique for image / video compression: each pair encodes a RUN (the number of consecutive zeros) and a LEVEL (the value of the following nonzero coefficient). Here a RUN means a block of consecutive foreground pixels in the same row, and the horizontal projection of the image can be calculated directly from the run-length code.
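A minimal MATLAB sketch of run-length encoding one binary row, and of obtaining the horizontal projection from the runs, is given below; the example row is hypothetical.

row = [0 1 1 1 0 0 1 1 0];                 % example binary row
d = diff([0 row 0]);                       % transitions between 0 and 1
s = find(d == 1);                          % start offset of each run
e = find(d == -1) - 1;                     % end offset of each run
runs = [s' e'];                            % one run per line: [start end]
hproj = sum(e - s + 1);                    % horizontal projection of this row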

The problem is then to group all the runs that belong to the same object into one object image. We assume that such points are spatially close; the notion of spatial proximity needs a more precise definition so that an algorithm can be designed to group spatially close points together.

 

Figure 3.4 Horizontal projection calculated from run-length code

3.6.1 Neighbors

Two pixels are 4-neighbors if they share an edge, and 8-neighbors if they share at least one corner. For example, the pixel at location [i,j] has 4-neighbors [i+1,j], [i-1,j], [i,j+1] and [i,j-1]. The 8-neighbors of a pixel include the 4-neighbors plus [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1]. A pixel is said to be 4-connected to its 4-neighbors and 8-connected to its 8-neighbors, as shown in Figure 3.5 below.

 

4-neighbors: [i+1,j], [i-1,j], [i,j+1] and [i,j-1]

 

8-neighbors: the 4-neighbors plus [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1]

Figure 3.5: 4- and 8-neighborhoods for rectangular tessellation; [i,j] is located in the middle of the figure.

3.6.2 Path

A path from pixel [i0,j0] to pixel [in,jn] is a sequence of pixels [i0,j0], [i1,j1], …, [in,jn] such that each pixel [ik,jk] is a neighbor of [ik-1,jk-1]. If the neighbor relation uses 4-connection, the path is a 4-path; if it uses 8-connection, the path is an 8-path, as shown in Figure 3.6 below.

 

Figure 3.6 4-path and 8-path

3.6.3 Foreground

The foreground is the set of all "1"-valued pixels in the image; it is usually denoted by S.

3.6.4 Connectivity

A pixel p ϵ S is said to be connected to q ϵ S if there is a path from p to q consisting entirely of pixels of S. Note that connectivity is an equivalence relation. For any three pixels p, q and r in S, we have the following properties.

  1. Pixel p is connected to p (reflexivity).
  2. If p is connected to q, then q is connected to p (symmetry).
  3. If p is connected to q and q is connected to r, then p is connected to r (transitivity).

 

3.6.5 Connected components

A connected component is a maximal set of pixels in S in which every pixel is connected to all the other pixels of the set.

3.6.6 Background

The component of S' (the complement of S) that contains the border of the image is called the background; all the other components of S' are called holes. In Figure 3.7 below we see a simple picture.

Figure 3.7 Border of an image

How many objects and how many holes does it contain? Using 4-connectedness, there are four objects, each one pixel in size, and one hole in the background; using 8-connectedness, there is only one object. In both cases the situation is ambiguous, and the same ambiguity is also found in a simple case like Figure 3.8 below.

Figure 3.8 Ambiguous border

If the 1s are considered connected, then the 0s should not be. To avoid this awkward situation, different connectedness is used for S and its complement: if 8-connectedness is used for S, then 4-connectedness should be used for S'.

3.6.7 Boundary

The boundary of S is the set of pixels of S that have 4-neighbors in S' (the complement of S).

The boundary is usually denoted by S'.

3.6.8 Interior

The interior is the set of pixels of S that are not in its boundary: the interior of S is (S − S').

 

3.6.9 Surrounds

Region T surrounds region S (or S is inside T) if any 4-path from any point of S to the border of the picture must intersect T. Figure 3.9 below shows a simple binary image with its boundary, interior and surrounding region.

 

Figure 3.9 A binary image with its boundary, interior and surround.

3.6.10 Component Labeling

Finding the connected components of an image is one of the most common operations in computer vision. The points of an object surface viewed by a camera project to spatially close image points, so connectedness in a digital image captures this notion of "spatial proximity". It should also be mentioned that the connected component algorithm is often a bottleneck in a binary vision system: the algorithm is sequential in nature, because finding connected components is a global operation. If there is only one object in the image, finding the connected components may not be necessary; however, if there are many objects in the image, the connected components must be found to identify each of them.

A component labeling algorithm finds all connected components in an image and assigns a unique label to each component [69]. Figure 3.10 shows an image and the labels of its connected components.

Figure 3.10 connected components (a) and (b)

3.7 Text Properties

3.7.1 Removing the Borders

When the border is removed, the image shrinks: only the rectangular part of the image containing the text remains. Border rows and columns that contain only background pixels "0" and no text pixels "1" are removed in four stages:

First, the top: if there is no pixel "1" in the first row, the row is removed, and rows continue to be removed until a row containing a pixel "1" is found; this row is the upper border of the text. The same is applied bottom-up until a row containing a pixel "1" is found. The same operation is then performed vertically: columns are scanned from the first column (from the left) until a column containing a pixel "1" is found, and likewise from the right in reverse.
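A minimal MATLAB sketch of this border removal is given below; it strips the empty border rows and columns in one step by locating the first and last rows and columns that contain a "1" pixel (the file name is hypothetical).

bw = im2bw(imread('document.jpg'));        % assumed binarised document
rows = find(any(bw, 2));                   % rows containing at least one "1"
cols = find(any(bw, 1));                   % columns containing at least one "1"
cropped = bw(rows(1):rows(end), cols(1):cols(end));  % border-free text area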

3.7.2 Divide the Text into Rows

After the border removal, the remaining area is divided into rows, i.e. into text lines from the first line to the last line of the text. The division differs from one text to another because of the multi-scale variances in text [33]; it depends on the size of the text. Within one row, the lengths and widths of the connected components are roughly aligned horizontally, whereas components in different rows may have different properties and sizes. We can therefore scan from the top of the text to its bottom, splitting where blank rows occur, as shown in Figure 3.11 below.

Figure 3.11 divide the text into rows

3.7.3 Divide the Rows “Lines” into the Words

Then each line is divided into words. The main question is how to distinguish the spaces between characters from the spaces between words, given that the size of these spaces also depends on the size of the text. If we know the lengths and widths of the connected components, we can rely on them to set a gap threshold. A word may consist of a single character or more, and sizes may differ even within the same word, as shown in Figure 3.12 below (a sketch follows the figure).

Figure 3.12 divide the rows into the words
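A minimal projection-based MATLAB sketch of the line and word division described in Sections 3.7.2 and 3.7.3 is given below; bw is the border-free binary text from the previous sketch, and the word-gap threshold is an assumption.

blank = all(bw == 0, 2);                   % rows with no foreground pixel
edges = diff([1; blank; 1]);
tops = find(edges == -1);                  % first row of each text line
bots = find(edges == 1) - 1;               % last row of each text line
line1 = bw(tops(1):bots(1), :);            % the first extracted line
vproj = sum(line1, 1);                     % vertical projection inside a line
% columns where vproj == 0 are gaps; gaps wider than an assumed threshold
% (for example, twice the median gap width) are treated as word separators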

 

3.7.4 Divide the Word into Characters

Then we determine the characters in each word, and thus the text. Fuzzy logic is used to recognize each character, including any connected component not yet recognized. The result is shown in Figure 3.13 below.

Figure 3.13 divide the word into characters

3.8 Identify Character

Each connected component is enclosed in a rectangular segment, which has four corners, as shown in Figure 3.14.

Figure 3.14 Identify character

Any character can be identified by its four corners and the center point along the x- and y-axes. We can then determine the character based on these features: if the pixel at one of these positions is "1" it is considered on, and if it is "0" it is off. Each character has a different set of properties [33].

We see that character "a" has the upper left corner off, the upper right corner off, the lower left corner off, the lower right corner on, and the center pixel off. These properties differ from those of other characters, so we can recognize a character that has no noise or distortion.
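A minimal MATLAB sketch of reading these features is given below; comp is an assumed binary image holding one connected component.

[r, c] = find(comp);                       % foreground pixel coordinates
box = comp(min(r):max(r), min(c):max(c));  % tight bounding rectangle
[h, w] = size(box);
ul = box(1, 1);   ur = box(1, w);          % upper left / upper right states
ll = box(h, 1);   lr = box(h, w);          % lower left / lower right states
ctr = box(round(h/2), round(w/2));         % center pixel state
features = [ul ur ll lr ctr];              % passed on to the fuzzy classifier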

3.9 Fuzzy Logic

The goal of fuzzy logic was to make computers think like people. Fuzzy logic can deal with the vagueness inherent in human thinking and natural language, which is different from randomness [70]. Fuzzy logic algorithms can enable machines to understand and respond to vague human concepts such as hot, cold, large, small, etc. [71]. This approach can also provide a relatively simple solution when only imprecise information is available.

3.9.1 What Fuzzy Logic?

The term fuzzy logic has been used in two different senses, so it is necessary to clarify the distinction between them. In the narrow sense, fuzzy logic is a logical system that generalizes classical two-valued logic for reasoning under uncertainty [72]. In the broad sense, fuzzy logic refers to all theories and technologies that use fuzzy sets, which are classes without sharp boundaries.

In classical set theory, the concept of "warm room temperature" may be expressed as a range (e.g. [70 F, 78 F]); however, the concept does not have a natural, sharply defined boundary. A gradual transition from "not warm" to "warm" represents the concept in a way closer to human interpretation. To achieve this, the degree of membership in the set must be defined; this is the essence of a fuzzy set. Figure 3.15 shows a classical set and a fuzzy set.

Figure 3.15 A classical set and a fuzzy set of "warm room temperature".

 

 

3.9.2 What is the Fuzzy Logic Toolbox?

The Fuzzy Logic Toolbox is a collection of functions built on the MATLAB® numeric computing environment. It provides MATLAB tools for creating and editing fuzzy inference systems [3].

An edge in an image is a boundary or contour at which a sudden change in some physical characteristic of the image occurs. Edge detection is one of the most important tasks in image processing, and edge detection algorithms are used for registration, identification and recognition. Here a fuzzy inference system is used to detect the edges, and the image is converted into a binary image to calculate the number of edges. Fuzzy logic is a form of many-valued logic that deals with approximate rather than exact values: in contrast to traditional binary logic (where variables must be true or false), it extends the notion of truth to partial truth, where the truth value may lie anywhere between completely true and completely false. Image processing is any form of signal processing for which the input is an image; the output can be either an image or a set of characteristics or parameters associated with the image [1].

 

3.9.3 Fuzzy Sets

Fuzzy logic starts with the concept of a fuzzy set. A fuzzy set is a set without a crisp, clearly defined boundary; it can contain elements with only a partial degree of membership. First, consider what we might call a classical set: a classical set is a container that wholly includes or wholly excludes any given element [56]. For example, the pixel values above are represented as "0" or "1", and the location of the pixel (for instance at a corner or at the centre) forms the second input; their membership functions are described in the next section.

Figure 3.16 (a) Input of a pixel

Figure 3.16 (b) Input of location for a pixel

3.9.4 Membership Function

The membership function (MF) is a curve that defines how each value of the input space is mapped to a membership value (or degree of membership) between 0 and 1 [57]. The input space is sometimes referred to by the simple concept of the universe of discourse. The membership function of the pixel input in Figure 3.16 above is defined so that a pixel value of "0-0.5" maps to "off" and "0.6-1" maps to "on"; likewise the location of the pixel maps to low, median or high for the ranges "0.1-0.3", "0.4-0.6" and "0.7-0.9" respectively.

The output variable built from the pixel value and pixel location inputs is shown in Figure 3.17 below; a Toolbox sketch is given after the figure.

Figure 3.17 Output variable “letter”
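A minimal sketch of building these variables with the classic Fuzzy Logic Toolbox functions is given below; the membership ranges follow the text, while the exact membership shapes (trapmf/trimf), the output definition and the example rule are assumptions.

fis = newfis('letter');
fis = addvar(fis, 'input', 'pixel', [0 1]);
fis = addmf(fis, 'input', 1, 'off', 'trapmf', [0 0 0.4 0.5]);
fis = addmf(fis, 'input', 1, 'on',  'trapmf', [0.6 0.7 1 1]);
fis = addvar(fis, 'input', 'location', [0 1]);
fis = addmf(fis, 'input', 2, 'low',    'trimf', [0.1 0.2 0.3]);
fis = addmf(fis, 'input', 2, 'median', 'trimf', [0.4 0.5 0.6]);
fis = addmf(fis, 'input', 2, 'high',   'trimf', [0.7 0.8 0.9]);
fis = addvar(fis, 'output', 'letter', [0 2]);
fis = addmf(fis, 'output', 1, 'c', 'trimf', [0 1 2]);
fis = addrule(fis, [2 3 1 1 1]);  % IF pixel is on AND location is high THEN letter is c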

3.9.5 If-Then Rules

Fuzzy sets and fuzzy operators are the subjects and verbs of fuzzy logic. These if-then rule statements are used to formulate the conditional statements that comprise fuzzy logic. The basic unit of knowledge capture in many fuzzy systems is a fuzzy rule.

A fuzzy rule has two components: an IF part (also referred to as the antecedent) and a THEN part (also called the consequent):

IF <antecedent> THEN <consequent>

where the antecedent describes the condition and the consequent describes the conclusion.

 

 

3.9.6 Fuzzy Inference Systems

Fuzzy inference is the process of mapping from a given input to an output using fuzzy logic. The mapping then provides a basis for decision-making or pattern analysis. The fuzzy inference process involves the concepts described in the previous sections: membership functions, fuzzy logic operators, and if-then rules.

3.9.7 Rule Viewer

The Rule Viewer displays a roadmap of the entire fuzzy inference process; it is based on the fuzzy inference diagram.

3.9.8 Surface Viewer

For systems with two or more inputs and one output, the Surface Viewer is especially useful: it displays the output surface of the system. Building the system with fuzzy logic is shown in Figure 3.18 below.

Figure 3.18 Building the system with fuzzy logic

3.10 Summary

Text recognition plays a key role in character recognition, which has been detailed in this chapter. Text detection, text area identification, text localization, text extraction and binarisation were described, and each part explained separately. In addition, segmentation analysis, the run-length coding algorithm, and the division of text into rows, words and characters were covered. Finally, the fuzzy logic applied to the text was presented.


Chapter 4: Methodology

 

4.1 Introduction

The methodology of the project is described in this chapter. The methodology used in this study provides a systematic procedure and set of principles to achieve the objectives of the study.

The methodology is designed to simplify the analysis process and to explain the requirements and specifications of the project. It is important to ensure that the project can be completed smoothly and efficiently.

 

 

 

 

Figure 4.1 Proposed method

 

 

4.2 Problem Statement and Literature Review

The project proposes a segmentation approach to solve the invariant complex document problem stated in Chapter 1. In this context, a study was carried out on related and recent literature in the text segmentation field. The study covers edge-based segmentation, text detection, text identification, text localization, text extraction, analytical segmentation and current segmentation techniques. This investigation is essential to the design of a novel method and can lead to better performance.

4.3 System Development

The two parts of the proposed approach will be developed separately. First, using the eight compass-kernel orientations, we collect each pair of opposite edge orientations so that four directions of edges can be detected; the run-length algorithm is then used to determine the connected components and extract the text from the document. Second, character identification is performed on each connected component: a rectangle is fitted to the connected component and its identified pixels are used to recognize the character.

4.4 Performance Evaluation

The performance of the proposed method is evaluated on the edge-detected documents. Errors occurred when the edges were weak or when the background was very similar to the characters. In addition, where the proposed method does not handle multi-scale and multi-orientation text well, the system development would be revised to improve the proposed method.

4.5 General Steps of Proposed Techniques

In this context, the project proposes a segmentation approach based on compass kernels as the edge detection operator. The detected edges are then used to extract the text, and fuzzy logic is used for character identification. Finally, a character extraction algorithm is used to extract the correct characters. Figure 4.2 shows the general outline of the proposed approach.

Figure 4.2 Block diagram of general steps of proposed approach

 

 

4.5.1 Proposed Algorithm for Edge Based Text Region Extraction

The basic steps of the edge-based text extraction algorithm are given below. The details are explained in the following.

Step 1: Input the document, i.e. read the document with its original colors.

Step 2: Create a Gaussian pyramid by convolving the input document with a Gaussian kernel and successively down-sampling each dimension by half.

Step 3: Create directional kernels to detect edges at the 0, 45, 90, 135, 180, 225, 270 and 315 degree orientations.

Step 4: Convolve each document in the Gaussian pyramid with each orientation filter.

Step 5: Collect the kernel responses to detect edges at the 0+180, 45+225, 90+270 and 135+315 orientation pairs.

Step 6: Dilate the resulting document using a sufficiently large structuring element (3×3) to cluster candidate text regions together.

Step 7: Create the final output document with white text pixels against a plain black background.

 

4.5.1.1 Detection

The detection proceeds as follows. Given a document, the regions that may contain text are detected by convolving the document with a Gaussian kernel of size 3×3; the process of resizing documents to lower resolutions is called down-sampling. Figure 4.3 shows the Gaussian filter of size 3×3. Each level of the pyramid corresponds to one resolution of the input; Figure 4.4 shows a sample Gaussian pyramid with 8 levels. Directional filters are then applied and the opposite orientations combined: the horizontal angles (0 + 180), the vertical angles (90 + 270) and the diagonal angles (45 + 225 and 135 + 315). The kernels used are shown in Figure 4.5 and their application in Figure 4.6.

 

Figure 4.3 Gaussian filter

Figures 4.4 Samples Gaussian pyramid with 8 levels

Figure 4.5 Extraction operation

The kernels above are stored in the variables kernel0, kernel45, kernel90, kernel135, kernel180, kernel225, kernel270 and kernel315; these eight kernels are then placed in an array called kernel{}. Next, we create the Gaussian kernel as shown in the equation below.

GK = fspecial('gaussian')                                                                                    (4.1)

After creating the Gaussian kernel, we build the Gaussian pyramid by convolving the document with the Gaussian filter over eight levels, starting from the original size, as shown in the equations below.

pyramid{i} = document1,  i = 0, …, 7                                                          (4.2)

document2 = imfilter(document1, GK, 'conv'),  i = 0, …, 7                              (4.3)

Next, we down-sample by 0.5 from i = 0 to i = 7, as shown in the equation below.

pyramid{i} = imresize(document2, 0.5),  i = 0, …, 7                                       (4.4)

Next, we convolve the document at each level of the pyramid with the eight edge detection kernels, as shown in the equation below.

Conv{i,j} = imfilter(pyramid{i}, kernel{j}, 'conv'),  i, j = 0, …, 7                     (4.5)

Then we resize the documents back to the original document size:

Conv2{i,j} = imresize(Conv{i,j}, [size(document1,1) size(document1,2)]),  i, j = 0, …, 7   (4.6)

We now have eight pyramid levels, each with edge detection at the orientations {0, 45, 90, 135, 180, 225, 270, 315}. After returning each level to the original document size, we collect each orientation across the eight levels, which gives the edge detections at {0, 45, 90, 135, 180, 225, 270, 315}; the results for these edges can be seen in Chapter 5, Figure 5.2. The equation below shows the collection operation, and a consolidated sketch follows it.

total{i} = im2bw( Conv2{1,i} + Conv2{2,i} + Conv2{3,i} + Conv2{4,i} + Conv2{5,i} + Conv2{6,i} + Conv2{7,i} + Conv2{8,i} ),  i = 0, …, 7                                  (4.7)
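A minimal runnable consolidation of equations (4.1)-(4.7) is given below; it assumes document1 is a grayscale double image and that the eight directional kernels are already stored in the cell array kernel{1..8} (MATLAB indices 1-8 stand for the levels i = 0-7 in the equations).

GK = fspecial('gaussian');                         % equation (4.1)
doc = document1;
for i = 1:8                                        % build the pyramid, (4.2)-(4.4)
    doc = imfilter(doc, GK, 'conv');               % smooth the current level
    pyramid{i} = doc;
    doc = imresize(doc, 0.5);                      % down-sample by half
end
for i = 1:8                                        % directional filtering, (4.5)-(4.6)
    for j = 1:8
        c = imfilter(pyramid{i}, kernel{j}, 'conv');
        Conv2{i,j} = imresize(c, [size(document1,1) size(document1,2)]);
    end
end
for j = 1:8                                        % collect levels per orientation, (4.7)
    acc = Conv2{1,j};
    for i = 2:8
        acc = acc + Conv2{i,j};
    end
    total{j} = im2bw(acc);
end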

Given k operators, gk(x,y) is the document obtained by convolving f(x,y) with the kth operator at the original size. (The seven remaining levels could also be shown, but that would take a large area, so the original-size document suffices for the explanation, as shown in Figure 4.6.) The gradient is defined as

g(x,y) = max_k gk(x,y)                                           (4.8)

Figure 4.6 Edges detection

After the document has been processed with the orientation kernels, we collect the opposite edge orientations pairwise, so that the final edges of the document can be detected.

Edge first = Edge "0" + Edge "180"                                                               (4.9)

Edge second = Edge "45" + Edge "225"                                                        (4.10)

Edge third = Edge "90" + Edge "270"                                                            (4.11)

Edge fourth = Edge "135" + Edge "315"                                                       (4.12)

Edge total = Edge first + Edge second + Edge third + Edge fourth                  (4.13)

4.5.1.2 Feature map and candidate text Region detection

A feature map is then created: every pixel is classified as a candidate or non-candidate for a text region by weighting factors. A pixel that responds strongly to the directional filters is a candidate for text. The feature map is thus a combination of all the maps at different scales and orientations, with the highest weights on text pixels.

4.5.1.2.1 Directional Filtering

Text in a document is normally characterized by intensity peaks, which are measured using the second derivative of intensity. The average edge strength is also calculated, and the variance of orientations is evaluated using the eight orientations 0, 45, 90, 135, 180, 225, 270 and 315 to determine the effectiveness and efficiency of the orientation.

4.5.1.2.2 Edge Selection

The dominant vertical strokes of characters reflect the height of the corresponding characters. In real scenes, many other objects, such as windows, doors and walls, also produce strong vertical edges, so text cannot be localized from all vertical edges alone. However, the vertical edges of such non-character objects are usually very long; they can therefore be eliminated by grouping the vertical edges into long and short edges before further processing.

However, long vertical text edges may be broken, which could cause false positives. The proposed method therefore uses a two-stage edge generation method. The first stage obtains the strong vertical edges by collecting the "0" and "180" edges, as described in the equations below.

 

 

 

Edge_v^strong = |E_v|_z                                                                   (4.14)

Edge_v^strong = Edge"0"_bw + Edge"180"_bw                                 (4.15)

where E_v is the "0" + "180" intensity edge document, and z is a thresholding operator used to obtain a binary map of the vertical edges; the result is not very sensitive to the threshold.

The second stage is used to obtain the weak vertical edges.

dilated = Dilation(Edge_v^strong)_3×3                                         (4.16)

closed = Closing(dilated)_m×3                                                      (4.17)

Edge_v^weak = |E_v (closed − dilated)|_z                                      (4.18)

The closing operator is used to connect the broken strong vertical edges, using a vertical linear structuring element of size m×3. The resulting vertical edges are the combination of the strong and weak edges, as described in the following equation.

Edge_v = Edge"v"_bw^strong + Edge"v"_bw^weak                                         (4.19)

With the equations above, the two-stage vertical edge generation is complete. A morphological thinning operator is then applied to the resulting vertical edges, followed by a connected component labeling and analysis algorithm.

Thinned = thinning(Edge_v)                                               (4.20)

Labeled = bwlabel(Thinned, 8)                                          (4.21)

where the morphological thinning operator reduces each vertical edge to a width of one pixel. Edge lengths are then measured on the labeled document, and a simple thresholding is used to separate out the short edges.

Short_v_bw = |E_v^length(labeled)|_z                                                          (4.22)
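A minimal morphological sketch of equations (4.16)-(4.22) is given below, assuming edgeV is the binary strong vertical edge map from (4.15); the structuring-element sizes (3×3 dilation, m×3 closing) follow the text, while the height m and the length threshold are assumed example values.

m = 7;                                             % assumed closing height
dilated = imdilate(edgeV, ones(3));                % (4.16)
closed  = imclose(dilated, ones(m, 3));            % (4.17)
weak    = closed & ~dilated;                       % (4.18) weak vertical edges
edgeAll = edgeV | weak;                            % (4.19) strong + weak
thinned = bwmorph(edgeAll, 'thin', Inf);           % (4.20) one-pixel-wide edges
labeled = bwlabel(thinned, 8);                     % (4.21) label the edges
stats = regionprops(labeled, 'BoundingBox');
short = false(size(thinned));                      % (4.22) keep short edges only
for k = 1:numel(stats)
    if stats(k).BoundingBox(4) < 4 * m             % assumed length threshold
        short = short | (labeled == k);
    end
end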

4.5.1.2.3 Feature Map Generation

The average edge density, strength and variance of orientation are significantly higher in text areas than in non-text areas. The candidate text regions are refined using these three characteristics, with the procedure described below.

Candidate = Dilation(Short_v_bw)_m×m                           (4.23)

where the morphological dilation with an m×m structuring element is applied to the selected short vertical edge document to obtain the potential candidate text regions.

4.5.1.3 Localization

This corresponds to Step 6 of the algorithm in Section 4.5.1. Localization is further enhanced by removing non-text regions. Since text pixels lie close together, a morphological dilation operation can be used to group them into clusters. Dilation with a structuring element of the required shape and size enlarges the region of interest and merges nearby regions; a [3×3] structuring element has been used in this algorithm.

The document after dilation may still contain some non-text areas or noise, which must be removed; a noise-removal step is therefore carried out on the document.

4.5.1.4 Character Extraction

Corresponding to Step 7 of the algorithm in Section 4.5.1, the output document must be easily parsed and recognized by an OCR system: the text and background should be monochrome and their contrast should be high. The output document is therefore generated with white text against a black background.

4.6 Connection Component

A fundamental operation in pattern recognition is the identification of the connected components of a binary document. The algorithm converts a binary document into a symbolic document in which each component carries a unique numeric label. After converting the document to binary form, the document can be represented in a number of ways: arrays, run-lengths, quadtrees, octrees and bintrees. A RUN means a block of consecutive foreground pixels in the same row, and the horizontal projection can be calculated from the run-length code. The main focus of most labeling algorithms, which aim at little implementation effort and minimal memory, is the representation and resolution of the equivalence table. Here a run-length encoding is used. Converting the original document to a run-length encoding is easy to parallelize by processing multiple rows in parallel, and the run-length encoded format is much more compact than the binary document (each run carries a single label), so processing is much faster than with the usual algorithm. The algorithm is described below; our implementation is carried out in the following stages:

  1. Pixels are converted to runs in parallel by rows,
  2. Initial labeling and propagation of labels
  3. Equivalence table resolution
  4. Translating run labels to connected component

The design is parallelized as much as possible. Although stages 2 and 3 are sequential, they operate on runs, which are far less numerous than pixels. Similar to stage 1, stage 4 can be executed in parallel by row. A run has the properties {ID, EQ, s, e, r}, where ID is the identity number of the run, EQ is its equivalence value, s the x offset of the start pixel, e the x offset of the end pixel, and r the row.

The first stage converts pixels to runs in parallel by rows. Depending on the location and access mode of the memory holding the binary document, the document may be divided into n parts to carry out n run-length encodings in parallel. Using runs rather than pixels reduces the size of the equivalence table. For a document of size M × N, the following sequential local operations are performed in parallel:

Algorithm 3.1: PIXELTORUNS (T)

∀T: T(x, y) = I(x, y)
i ← 0
if T(x, y) = 1 and isBlock = 0
    then s_i ← x
         isBlock ← 1
if isBlock = 1 and (T(x, y) = 0 or x = M)
    then e_i ← x − 1
         r_i ← y
         ID_i ← EQ_i ← 0
         i ← i + 1
         isBlock ← 0

The document T is scanned; a run starts when a foreground pixel (value 1) is encountered, and the run is complete when the end of the row is reached or a background pixel is encountered. The maximum possible number of runs in an M × N document is MN/2.
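A minimal MATLAB sketch of this stage is given below; each row of the binary document T is converted independently to runs {s, e, r}, which is why the stage parallelizes by row.

runs = [];                                 % columns: [s e r]
for y = 1:size(T, 1)
    d = diff([0 T(y, :) 0]);               % 0->1 and 1->0 transitions
    s = find(d == 1);                      % run start offsets
    e = find(d == -1) - 1;                 % run end offsets
    runs = [runs; [s' e' repmat(y, numel(s), 1)]];
end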

The second stage involves initial labelling and propagation of labels. The IDs and equivalences (EQs) of all runs are initialized to zero. This is followed by a raster scan of the runs, assigning provisional labels which propagate to any adjacent runs on the row below. For any unassigned run (IDi = 0) a unique value is assigned to both its ID and EQ. For each run i with ID IDi, excluding runs on the last row of the document, the runs one row below run i are scanned for an overlap. An overlapping run in 4-adjacency (i.e. si ≤ ej and ei ≥ sj) or 8-adjacency (i.e. si − 1 ≤ ej and ei + 1 ≥ sj) is assigned the ID IDi if and only if IDj is unassigned. If there is a conflict (an overlapping run already has IDj assigned), the equivalence of run i, EQi, is set to IDj. This is summarized in the algorithm below:

Algorithm 3.2: INITLABELLING (runs)

m ← 1
for i ← 1 to TotalRuns
do
    if ID_i = 0
        then ID_i ← EQ_i ← m
             m ← m + 1
    for each run_j in row r_i + 1
    do
        if ID_j = 0 and e_i ≥ s_j and s_i ≤ e_j
            then ID_j ← ID_i
                 EQ_j ← ID_i
        if ID_j ≠ 0 and e_i ≥ s_j and s_i ≤ e_j
            then EQ_i ← ID_j

Runs on the last row of the document are excluded, since there are no runs below them. As an example, the algorithm is applied to the "U"-shaped object with four runs shown in Figure 4.7.

Figure 4.7 U shaped object with 4 runs

Table 4.1 Labeling result for the object's runs

The third stage is the resolution of conflicts, as described in Algorithm 3.3. In the example (Figure 4.7 and Table 4.1) the conflict occurs at B3: its initial EQ = 1 is changed to EQ = 2 because of the overlap with B4 in the first iteration. ResolveConflict() then propagates the change so that all four runs have EQ = 2. Even though the two if-statements in the second loop can be executed simultaneously, ResolveConflict() is highly sequential. Since each run stores its s, e and r values, the labeled document can be written directly at the appropriate pixel locations without rescanning the document.

Algorithm 3.3: RESOLVECONFLICT (runs)

for i ← 1 to TotalRuns
do
    if ID_i ≠ EQ_i
        then TID ← ID_i
             TEQ ← EQ_i
             for j ← 1 to TotalRuns
             do
                 if ID_j = TID
                     then ID_j ← TEQ
                 if EQ_j = TID
                     then EQ_j ← TEQ

Table 4.2 shows the results of the document scan, where St = start, En = end and Rw = row.

The scan can be performed using either 4-adjacency or 8-adjacency labeling; here, 8-adjacency labeling is used.

We will use 8-neighbors, which share at least one corner; their positions are [i+1,j], [i-1,j], [i,j+1], [i,j-1], [i+1,j+1], [i+1,j-1], [i-1,j+1] and [i-1,j-1], as shown in Figure 4.8 below.

Figure 4.8: 8-neighborhood for rectangular document tessellation; [i,j] is located in the middle of the figure.

4.7 Fuzzy logic

After determining the connected components of the document, we use fuzzy logic to decide whether each connected component is a character. To identify a connected component, we must know the state of the pixel in each corner, together with the other identified pixels.

Figure 4.9 identify the character

Afterwards, we assign a value between 0 and 1 to each identified pixel and send these values to the fuzzy system to recognize the identified pixels of the connected component. When the fuzzy system receives the values of the identified pixels and their "on" or "off" status, the fuzzy logic determines whether the component is a character or noise. Figure 4.10 below shows the design of the fuzzy inputs for the location and status of a pixel; a sketch of the recognition call is given after the figures.

Figure 4.10 (a) Example of fuzzy

Figure 4.10 (b) Example of fuzzy
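A minimal sketch of the recognition call is given below, assuming fis is the fuzzy system built earlier, extended to the ten identified-pixel inputs, and feat holds assumed example values; an output falling in the range assigned to a character means a match, anything else is treated as noise.

feat = [1 1 0 0 1 0 0 1 0 0.5];            % assumed ten identified-pixel values
out = evalfis(feat, fis);                  % classic Toolbox evaluation call
if out >= 0 && out <= 2                    % range assumed for one character
    disp('recognized as a character');
else
    disp('treated as noise');
end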

4.8 Summary

The proposed methodology is described in this chapter. The objectives of this project must be achieved in accordance with the plan discussed here. Some details of our methodology were shown: the problem statement was developed, followed by the literature review, system development, performance evaluation, the proposed technique, the connected component analysis and the fuzzy logic. In addition, the procedure for each phase of the project is briefly described. Finally, the report is written. Each stage plays a main role in the implementation of the project.

Chapter 5: Results

 

5.1 Introduction

This chapter presents the results of the experiments on the methods described in Chapter 4. It is mainly concerned with the findings of this project: the extracted text and the recognition of characters will be presented.

5.2 Input Document

Figure 5.1 below shows the input document in its original colors.

Figure 5.1 Original document

In the following, we show the document after edge detection at each orientation within the Gaussian pyramid, as shown in Figure 5.2 below.

 

Figure 5.2 Edge detection a) “0” edges, b) “180” edges, c) “45” edges, d) “270” edges, e) “90” edges, f) “225” edges, g) “135” edges and h) “315” edges

5.3 Complement Edge Detect with them

After the edges have been detected and the Gaussian pyramid formed, we add the opposite pairs of edges together; with the two angles combined, we can detect all the edges. The effect of adding two edges is shown in Figure 5.3 below.

 

Figure 5.3 Effect of adding two edges a) Add edges "0" and "180", b) Add edges "45" and "225", c) Add edges "90" and "270", d) Add edges "135" and "315"

5.4 Total of Edges Detection

The result of adding all the detected edges is shown in Figure 5.4.

Figure 5.4 Total of edges detection

5.5 Document Localization

Figure 5.5 below shows the process of localization.

Figure 5.5 Localization of text

5.6 Separate Text from Background

The text and background should be monochrome and the contrast should be high. The document is generated with white text against a black background. Figure 5.6 shows the effect of this operation.

Figure 5.6 Separate text from background

Figures 5.7 and 5.8 show the results of the algorithm.

 

Figure 5.7 Test document 1

Figure 5.8 Test document 2

However, there is no fully accepted method for evaluating text localization performance. We therefore manually count the number of correctly localized characters and assess the accuracy of the algorithm through its precision rate, recall rate and false positive rate. Equations (5.1), (5.2) and (5.3) are used to evaluate the performance.

Precision = ((correctly located)/(correctly located + false positive))*100%               (5.1)

Recall = ((correctly located)/(correctly located + false negative))*100%                   (5.2)

False positive = ((false positive)/(correctly located + false negative))*100%            (5.3)

Correctly located: the localized region corresponds to real text in the image; in other words, the detected text is exactly where the text actually is.

False positive: a non-text region that is incorrectly localized as text.

False negative: real text that the localization process fails to detect.
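A minimal MATLAB sketch of equations (5.1)-(5.3) is given below; the counts are assumed example values, not results from the experiments.

correct = 30; falsePos = 10; falseNeg = 15;          % assumed counts
precision = correct / (correct + falsePos) * 100;    % equation (5.1)
recall    = correct / (correct + falseNeg) * 100;    % equation (5.2)
fpRate    = falsePos / (correct + falseNeg) * 100;   % equation (5.3)
fprintf('precision %.0f%%, recall %.0f%%, FP %.0f%%\n', precision, recall, fpRate)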

Table 5.1 Performance evaluation 1

No sample    Resolution    Precision rate    Recall rate    False positive
Sample1      256×255       75%               66%            33%
Sample2      150×190       68%               65%            53%
Sample3      250×190       80%               60%            90%
Sample4      408×300       55%               62%            40%
Sample5      373×498       60%               71%            65%

 

Next we examine the text extraction performance. We manually count the number of correctly detected text items and the number of false detections in each image; using equations (5.4) and (5.5), we can then calculate the recall rate and the false alarm rate of the algorithm.

Recall = ((number of correctly detect text)/(number of text))*100%                                  (5.4)

False alarm rate = ((number of false detected text)/(number of detected text))*100%    (5.5)

The number of correctly detected text: the number of text characters that the algorithm detects correctly.

Number of characters: the total number of characters in the text.

The number of false text: the number of items incorrectly detected as text by the algorithm.

Table 5.2 Performance evaluation 2

No sample    Resolution    False alarm rate    Recall rate
Sample1      256×255       25%                 80%
Sample2      150×190       5.26%               95%
Sample3      250×190       1%                  99%
Sample4      408×300       40%                 70%
Sample5      373×498       30%                 75%

 

5.10 Determine Character by Run-length and Recognize by Fuzzy logic

We determine the characters, how many characters there are in each word and how many words in the text; finally the text is extracted, once the components have been connected and the pixels identified. The result is shown in Figure 5.11 below.

Figure 5.11 Determine Character

Let us suppose the following:

N1 = upper left center     N4 = center center off    N7 = lower left off
N2 = upper center on       N5 = center center on     N8 = lower center on
N3 = upper right off       N6 = center right off     N9 = lower right off
N10 = half lower center

Figure 5.12 below shows the ten inputs N1 to N10 and the single output.

Figure 5.12 Ten inputs and one output

Figure 5.13 Input one N1

Figure 5.14 Output

As the output in Figure 5.14 shows, the output for character "c" lies between 0 and 2, so we know this is character "c". The same scheme is applied to all characters, and the output for the extracted text is shown in Figure 5.15 below.

Figure 5.15 output of extracted text


Chapter 6: Conclusion

 

6.1 Introduction

OCR has recently become widely used to extract characters from documents, and the field of text extraction from images is very large. In this work, character recognition is performed after determining the connected components and identifying their pixels, using edge detection based on the eight directional kernels; the identified pixels are then sent to the fuzzy logic, which classifies each component.

6.2 Discussion on Result

The project findings include:

  1. A set of eight kernels was implemented and used to detect the edges of the document.
  2. The proposed method gives appropriate performance for the detection of text in complex documents.
  3. The run-length coding was fast and appropriate for determining the connected components while reducing time and memory.
  4. The "identified pixel" features are most appropriate for undamaged characters.
  5. The classification of the characters was possible with fuzzy logic.

6.3 Project Advantage

The project has some advantages:

  1. The software is very simple, with nothing complex.
  2. It is easy to work with the software.
  3. We clearly defined the objectives of the project before beginning.
  4. It gives suitable results.

6.4 Suggestion and Future Works

This work can be improved in the future through several suggestions:

  1. Improve the method to focus on extracting text from large complex documents, for example where text and background have similar colors.
  2. Improve the method to extract text in different styles and formats.
  3. Add other features to the software: increase the character functions and remove noise.

6.5 Conclusion

In conclusion, the objectives and scope of the project have been met.


References

 

[1] S. Long and X. He, "Scene Text Detection and Recognition", manuscript, arXiv:1811.04256v5, 2020.

[2] H. G. Daway, E. G. Daway and H. H. Kareem, "Colour Image Enhancement by Fuzzy Logic Based on Sigmoid Membership Function", International Journal of Intelligent Engineering and Systems, vol. 13, no. 5, 2020.

[3] M. V. R. V. Prasad, M. Sirisha and R. Bharath Kumar, "Estimating the Age of the Human by Using Fuzzy Logic and Image Processing", International Journal for Research in Applied Science and Engineering Technology, ISSN 2321-9653, 2019.

[4] R. Emille, E. R. Arboleda, A. Andilab and R. M. Dellosa, International Journal of Scientific and Technology Research, ISSN 2277-8616, vol. 8, issue 08, August 2019.

[5] G. Ruth and Ch. Samson, "Survey on Image Enhancement Techniques", International Journal of Computational Engineering Research, ISSN 2205-3005, vol. 09, issue 3, March 2009.

[6] J. Samarabandu, and X. Liu,”An Edge-Based Text Region Extraction Algorithm for in Door Mobile Robot Navigations”. International Journal of Signal Processing, no. N6A 5B9, pp.273-280, 2007.

[7] B.Gatos, I.Pratikakis, K.Kepene and J.perantonis. “Text Dectection in Indoor/Outdoor Scene Document s”. Proceedings to National center for scientific research “Demokritos”,no. GR-153 10 Agia Paraskevi, Athens, Greece,pp. 127-132,2003

[8] T.Han, Y.Chien, C.Lun ,”Comprehensive Motion Videotext Detection Localization and Extraction Algorithm “,National Central University ,Taoyuan County 320 , .no.0-7803-9584-0.,pp.113-116,2006

[9] Q. Ye, W. Gao, W. Wang and W. Zeng, "Robust Text Detection Algorithm in Images and Video Frames", ICICS-PCM, no. 0-7803-8185-8, pp. 1-5.

[10] B.Shanwar and N.Zarka ,”Multiscale Edge-Based Text Extraction from Complex Document “,ResearchGate,no 1-4244-0367-7.,pp.1-26 ,.2016.

[11] R.Farhoodi and S.Kasaei,”Text Segmentation from Document with Textured and Colored Background. Sharif University of Technology, Tehran, Iran, PP.395-398, 2005

[12] J.Liu,J.Sun and S.Wang,”Pattern Recognition: An overview”. IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.6. 130012 Changchun, pp.57-61 June 2006

[13] Holland and J.H, “Adaption in Natural and Artificial systems”. University of MICHIGAN press, Vol.5 No.21, pp.256-347, December 1, 2014

[14] V.Dutt,V.Chadhury and I.Khan,” Different Approach in Pattern Recognition”, Computer Science and Engineering,no.10.5923,pp.32-35,2011.

[15] S.Asht,R.Dass, “Pattern Recognition Techniques:Review”:International Journal of Computer Sciences and Telecommunication,Vol.3,pp.25-29 2012.

[16] C.Tu, J.Liang and T.D, “Adaptive Runlength Coding”,IEEE ICIP,no. 0-7803-7622- 6.pp.665-668,2002.

[17] M.Alata and M.Al-shabi,”Text Detection AND Character Recognition Using Document Processing”,Journal of Electrical Engineering .Vol.57, No.5. Jordan.pp.258-267, 2006

[18] S.N.Sivanandam, S.Sumathi and S.N.Deepa,”Introduction to Logic Using Matlab”, Springer, Inc. 2006.

[19] M.A.Rabbani and C.Chellappan,”Fast and New Approach to Gradient Edge Detection”.International Journal of Soft Computing,Vol.2 :pp.325-330 , 2007

[20] D.Kofi, R.Andrew, and P.Jonathan,.”Run-Length Based Connected Component Algorithm for FPGA Implementation”. University of Lincoln.England,pp245-250, .2007

[21] K.Wang and J.Kangas,”Character Location in Scene Document s from Digital camera. Pattern Recognition”. Journal of the pattern recognition ,pp 2287-2299,2003

[22] K. Kim, K. Jung and J. Hyung, "Texture-Based Approach for Text Detection in Document Using Support Vector Machine and Continuously Adaptive Mean Shift Algorithm", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1631-1639, 2003.

[23] K. C. Kim, H. R. Byun, Y. J. Song, Y. W. Choi, S. Y. Chi, K. K. Kim and Y. K. Chung, "Scene Text Extraction in Scene Documents Using Hierarchical Feature Combining and Verification", 17th International Conference on Pattern Recognition, no. 1051-4651/04, pp. 1-4, 2009.

[24] R.Lienhart and A.Wernicke ,”Localizing and Segmenting Text in Document s and Videos”, IEEE Transtion On Pattern Analysis and Machine Intelligence ,Vol 12,No 4, PP.256-268,.2002

[25] Xiaoqing and Jagath,.(2006a).Multiscale Edge-Based Text Extraction from Complex Image..IEEE.1-4244-0367-7.London, Ontario, N6A 5B9, Canada

[26] Jagath, and Xiaoqing. (2006b). An Edge-Based Text Region Extraction Algorithm for in Door Mobile Robot Navigations. International Journal of Signal Processing 3;4. Western Ontario, London, ON., N6A 5B9, Canada

[27] Tsai, Chen, Fang (2006c).A Comprehensive motion videotext detection localization and extraction method.IEEE.0-7803-9584-0.Taoyuan County 320, Taiwan P.R.C

[28] Qixiang, Wen, Weiqiang and Wei (2003a).Roust text detection algorithm in images and video frames.IEEE.0-7803-8185-8.School of Chinese academy of scienes,China

[29] Roshanak and Shohreh(2005c).text segmentation from image with textured and colored background. Proceedings to Sharif University of Technology, Tehran, Iran

[30] Jie ,Jigui and Shengsheng,(2006d). Pattern Recognition: An overview. Proceedings to IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.6. 130012 Changchun, China June 2006

[31] Pavlidis, T., Structural Pattern Recognition, Springer-Verlag, New York, 1977.

[32] Hyeran, Seong-Whan, Applications of Support Vector Machines for Pattern Recognition: A Survey, SVM 2002, LNCS 2388, pp. 213-236, 2002a

[33] Mohanad and Mohammad, 2006e. Text Detection AND Character Recognition Using Fuzzy Image Processing. Proceedings to Journal of Electrical Engineering.Vol.57, No.5. Jordan. 2006

[34] Fuzzy Logic Toolbox User’s Guide” The MathWorks, Inc. 2006f

[35] Kongqiao and Jari, (2003b).Character Location in Scene Images from Digitalcamera.Pattern Recognition. Proceedings to journal of the pattern recognition.Tampere, Finland.2003

[36] Rainer and Axel, (2002c). Localizing and Segmenting Text in Images and Videos. University Pittsburgh.2002

[37] Chunmei, Chunheng, Ruwei, (2005).Text Detection in Image Based on Unsupervised Classification of Edge-Based Features.IEEE. 0-7520-5263. Proceedings of the Eight International Conference on Document Analysis and Recognition. china.2005

[38] Datong,Herve and Jean, 2001. Text identification n complex background using SVM.IEEE.0-7695-1272-0.Dalle molle insitiute for perceptual artificial intelligence,Switzerland .UTM.2001.

[39] Yuzhong, kallekearu and anil, 1995. Locating Text in Complex Color Images.

[40] Jiang and Jie, 2000. An Adaptive Algorithm for Text Detection from Natural Scenes. University Pittsburgh. 2000.

[41] Takuma, Yasuaki and Minoru, 2003d. Digit Classification on Signboards for Telephone Number Recognition .IEEE. 0-7695-1960-1. Proceedings of the Seventh International Conference on Document Analysis and Recognition. Japan.2003

[42] Xilin, Jie, Jing and Alex, (2003e). Automatic Detection of Signs with Affine Transformation. University Mobile Technologies.2003.

[43] Ezaki, Bulacu and Schomaker, "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons", in Proceedings of the International Conference on Pattern Recognition (ICPR'04), pp. 683-686, 2004.

[44] Matsuo, Ueda and Michio, "Extraction of Character Strings from Scene Images by Binarizing Local Target Areas", Transactions of the Institute of Electrical Engineers, 122-C(2), pp. 232-241, Japan, 2002.

[45] Victor, Raghavan and Edward, "TextFinder: An Automatic System to Detect and Recognize Text in Images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 11, November 1999, Amherst.

[46] Qixiang, Qingming, Wen and Debin, 2005b .Fast and Robust Text Detection in Images and Video Frames.Image and Vision Computing 23.China.2005.

[47] Sivanandam, Deepa and Sumathi, 2007. Introduction to Fuzzy Logic Using Mathlab.

[48] Alasdir, 2004. Introduction to Digital Image Processing with Mathlab. Springer-Verlag Berlin Heidelberg 2007

[49] Yi-feng,xinwen and cheng-lin,2011,hybrid approach to detect and localize texts in natural scene image,IEEE transaction on image processing, vol 20, no3,march 2011.

[50] Chaitanya and Ashwin,text detection and recognition :review,IRJET,volume:04 issue 06,issn:2395=0056,2017

[51] Eser Sert and Derya Avci, Expert Systems with Applications, Elsevier, pp. 499-511, 2019.

[52] Tarun and Amit, "Image Enhancement Using Fuzzy Technique", IJRREST, ISSN 2278-6643, vol. 2, issue 2, June 2013.

[53] Kkk

[54] Rodel, Edwin, Adonis and Rhowel, "Implementation of Template Matching, Fuzzy Logic and K-Nearest Neighbor Classifier on Philippine Banknote Recognition System", IJSTR, ISSN 2277-8616, vol. 8, issue 08, August 2019.

[55] Jing, Dmitry and Rangachar, "New Edge-Based Text Verification Approach for Video", IEEE, 978-1-4244-2175-6/08, 2008.

[56] Ruby Bhari, "Satellite High Resolution Image Classification Using Fuzzy Logic", ISSN 0973-6107, vol. 10, no. 5, pp. 1427-1437, 2017.

[57] Omar Adil, Aous Y. and Balasem Salem, "Comparison Between the Effect of Different Types of Membership Function on Fuzzy Logic Controller Performance", IJEERT, ISSN 2349-4395, vol. 3, issue 3, pp. 76-83, March 2015.

[58] Digant, Sarman and Hiren, "Review: Various Active Contour Based Image Segmentation Methods", IJIERE, ISSN 2394-3343, vol. 3, issue 2, 2016.

[59] Xiaoqing and Jagath, "Multiscale Edge-Based Text Extraction from Complex Images", ICME, 1-4244-0376-7/06, 2006.

[60] A. Zanasi and M. Artioli, "Automatic Surveillance Systems Against Security Threats", Safety and Security Engineering III, ISSN 1743-3509, vol. 108, 2009.

[61] M. Subba and B. Eswara, "Comparative Analysis of Pattern Recognition Methods: An Overview", IJCSE, ISSN 0976-5166, vol. 2, no. 3, Jun-Jul 2011.

[62] Vinita, Vikas and Imran, "Pattern Recognition: An Overview", American Journal of Intelligent Systems, 2(1), pp. 23-27, 2012.

[63] Joseph Stroud, "Creating a Feature Vector to Identify Similarity Between MIDI Files", 2017.

[64] Yang, Suhang, Haiyun, Tao and Erik, "Disentangled Variational Auto-Encoder for Semi-Supervised Learning", ScienceDirect, vol. 482, May 2019.

[65] Vinod, Paulose and Femy, "Object Identification and Process Planning Using Adaptive Neuro-Fuzzy Systems", IJAR, ISSN 0973-4562, vol. 13, no. 3, 2018.

[66] M. Subba Rao, "Comparative Analysis of Pattern Recognition Methods: An Overview", IJCSE, ISSN 0976-5166, vol. 2, no. 3, Jun-Jul 2011.

[67] Jie, Jigui and Shengsheng, "Pattern Recognition: An Overview", IJCSNS, vol. 6, no. 6, June 2006.

[68] Debalina and Renish, "An Overview of Pattern Recognition", IJIRST, ISSN 2349-6010, vol. 2, issue 09, February 2016.

[69] George Bebis, "Data Structures", Spring 2012.

[70] Ahmed, Mohamed, Elkhatib and Yehia, "Model Design and Hardware Implementation of an Intelligent Laser Warning System", IOSR, ISSN 2320-3331, vol. 10, issue 5, ver. II, 2015.

[71] Sai Venkatesh, "Embedded System for Waste Management Using Fuzzy Logic".

[72] "Fuzzy Logic PI Controller", Computer Sciences & IT book chapter, IGI Global, 2021.

Appendices

 

x = imread('image.jpg');       % read the image
x1 = reducesize(x);            % reduce the size by half, 8 times
x2 = convolve(x1);             % convolve each image with the kernels
x3 = imresize(x2);             % return the image to its original size
x4 = addedge(x3);              % collect the edges together
% x4 holds the total of the added edges
x5 = dilation(x4);             % dilate the image
x6 = erosion(x5);              % erode the image
x7 = eliminated(x6);           % eliminate the long edges
x8 = extract(x7);              % extract the text from the image
imagebinary(x8);               % binarise the image

A1: Matlab command to find binary image

 

x1 = removeborders('image.jpg');   % read the image and remove its borders
x4 = bwlabel(x3, 8);               % label the connected components
x5 = identifedpixel(x4);           % identify the pixels of each connected component
x6 = sendfuzzylogic(x5);           % send the data into the fuzzy logic
x7 = recognizecharacter(x6);       % recognize the characters

A2: Matlab command used fuzzy logic for identify character
