Implementation Of Morphological Binary Image Processor

Nisanth Viswambharan, ME Applied Electronics student, Maharaja Engineering College, Avinashi, Coimbatore, India, Email:nishanth0222@gmail.com, P. Hemaletha, Assistant Professor in ECE department, Maharaja Engineering College, Avinashi, Coimbatore, India, Email: hemapratikme@gmail.com

ABSTRACT

Binary image processing is a powerful tool in many image and video applications. A reconfigurable processor is presented for binary image processing. The processor's architecture combines a reconfigurable binary processing module, input and output image control units and peripheral circuits. The reconfigurable binary processing module, which consists of mixed-grained reconfigurable binary compute units and output control logic, performs binary image processing operations, especially mathematical morphology operations, and implements related algorithms at more than 200 f/s for a 1024×1024 image. The peripheral circuits control the whole image processing and dynamic reconfiguration process. The processor is implemented on an EP2S180 field-programmable gate array. The experimental results demonstrate that the processor is suitable for real-time binary image processing applications. In addition, embedded or mobile applications require such systems to have low power consumption and low memory usage.

Keywords: Binary image processing, field-programmable gate array (FPGA), mathematical morphology, mixed grained, real time, reconfigurable.

1 INTRODUCTION

Binary image processing is extremely useful in various areas, such as object recognition, tracking, motion detection, machine intelligence, image analysis and understanding, video processing, computer vision, and identification and authentication systems. Binary image processing has commonly been implemented on general-purpose processors such as CPUs or DSPs. High-speed implementation of binary image processing operations can be realized more efficiently with chips specialized for binary image processing. Therefore, binary image processing chips have attracted much attention in the field of image processing.

Application-specific chips and hardware have been reported for various applications. A chip with a 500-dpi cellular-logic processing array was implemented to enhance and verify fingerprint images. A pointing device using a specialized algorithm was presented for motion detection. All the above-mentioned chips are designed for specific applications. The major drawback of application-specific chips is the lack of flexibility. With continuous CMOS technology scaling, the importance of flexibility exceeds that of silicon area, especially in vision chips. The reconfigurable technique can bridge the gap between application-specific integrated circuits and flexibility. A vision system with high flexibility and performance, small size and low power consumption can be implemented on a single chip.

Reconfigurable Binary Image Processing (RBIP) chips have been designed to generalize the binary image applications of a chip. Chips were presented to perform basic binary morphological operations, such as Dilation, Erosion, Opening and Closing. A vision chip with the architecture of a massively parallel cellular array of processing elements was presented for image processing by using the asynchronous or synchronous processing technique. It has been a common practice to build application-specific chips for real-time binary image processing.

Some of the chips are made of analog circuits and some are made up of an analog part and a digital part. Compared with the digital part, the analog part shows lower robustness, accuracy and scalability, although it has a small area and low power consumption. Other general-purpose chips have the architecture of a digital processor array, in which each digital processor handles one pixel. Such chips become extremely large when large-sized images are processed. Thus, further studies are needed to design a chip with high performance, small size and a wide application range for real-time binary image processing.

This paper presents a binary image processor that consists of a reconfigurable binary processing module, including reconfigurable binary compute units and output control logic, input and output image control units and peripheral circuits. The reconfigurable binary compute units have a mixed-grained architecture, which has the characteristics of high flexibility, efficiency and performance.

Basic mathematical morphology operations and complicated algorithms can easily be implemented on it. The processor has the merit of high speed, simple structure and wide application range.

2 ARCHITECTURE

The presented processor is designed for applications in image or video processing, computer vision, machine intelligence, and identification and authentication systems. Such systems need a processor with high flexibility and high performance to cover a wide range of applications; therefore, the processor design focuses on flexibility and speed. Some of the conventional designs target specific applications and some have large areas and high power consumption. Therefore, a reconfigurable binary processing module with high speed and simple structure is implemented for wide use while consuming fewer hardware resources. The architecture of the proposed processor is shown in Fig. 1. The core of the processor is a reconfigurable binary processing module consisting of binary compute units and output control logic. The processor also has two bus

2.1 Reconfigurable Binary Processing Module

The diagram of the reconfigurable binary processing module (RBPM) is given in Fig. 2. It can be divided into two main parts. The first part is the output control logic, which selects the output from all the binary compute unit outputs according to the given parameters and converts the serial data of 1-b binary images into parallel data. The second part consists of several binary compute units that perform binary logic and binary image operations at high speed. The binary image algorithms are realized by the operations in the individual binary compute units and the connection pattern of these units. The units can execute binary image operations in a pipelined or parallel manner. The operation executed in a binary compute unit is decided by configurable registers, including logic operation parameters, image resolution parameters, mask sizes, input and output selection parameters and auxiliary parameters.

Fig. 3 shows examples of how an RBPM with eight binary compute units works. In Fig. 3(a), the RBPM is reconfigured as an eight-stage pipelined architecture. In Fig. 3(b), the RBPM is reconfigured as two four-stage pipelined architectures so that two images can be processed simultaneously. In Fig. 3(c)-(e), the RBPM is reconfigured into parallel structures. In Fig. 3(c), eight images undergo the same image processing operation in the eight binary compute units, respectively. In Fig. 3(d), the same operation is performed on eight different parts of an image. In Fig. 3(e), different operations are performed on eight parts of an image. The reconfigurable architecture provides higher hardware utilization than a fixed pipelined architecture.
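The configurations in Fig. 3 can be pictured in software as two ways of wiring the same set of units. The following is a minimal Python sketch, not the hardware design: each binary compute unit is modelled as a plain function on an image (a list of row words), and the names `run_pipeline`, `run_parallel`, `invert` and `shift_left`, as well as the 8-bit row width, are hypothetical choices made for illustration.

```python
from functools import reduce

WIDTH = 8                      # illustrative row width in bits (assumption)
MASK = (1 << WIDTH) - 1

def invert(image):
    """Stand-in compute unit: complement every pixel of every row."""
    return [~row & MASK for row in image]

def shift_left(image):
    """Stand-in compute unit: shift every row left by one pixel."""
    return [(row << 1) & MASK for row in image]

def run_pipeline(units, image):
    """Pipelined configuration: the output of each unit feeds the next one."""
    return reduce(lambda data, unit: unit(data), units, image)

def run_parallel(units, images):
    """Parallel configuration: unit i processes input i independently."""
    return [unit(image) for unit, image in zip(units, images)]

image = [0b00001111, 0b10100000]
piped = run_pipeline([invert, shift_left], image)          # two-stage pipeline
both = run_parallel([invert, shift_left], [image, image])  # two units side by side
```

In this picture, reconfiguration only changes how the units are connected and which operation each one runs; the units themselves stay the same.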

The architecture of the binary compute unit is shown in Fig. 4(a). Each binary compute unit, which has two binary compute elements and one set operation element, can perform logic, reduction, median filtering and set operations. The binary compute unit has a mixed-grained architecture that has high flexibility, efficiency and performance and a short reconfiguration time. Granularity refers to the level of data manipulation.

The fine-grained architecture is highly flexible and the coarse-grained architecture has fewer reconfiguration parameters and is highly efficient. The mixed-grained architecture is more flexible and efficient than the coarse-grained architecture and has fewer reconfiguration parameters than the fine-grained architecture. The comparison of the Fine, Coarse and Mixed-Grained Architectures of one binary compute unit is listed in Table I.

The set operation element can perform binary set operations, such as union, intersection, complement, subtraction, addition and straight-through output. The inputs of the set operation element and the outputs of the binary compute unit are transmitted via two sets of multiplexers,

The detailed architecture of the binary compute element is shown in Fig. 4(b). The binary compute element comprises two input control multiplexers, n binary logic elements, a binary reduction element and a binary median filter. The input control multiplexer selects input data for the binary logic element from the line memories, the SDRAM and the parameters in the register group.

The input data are selected from the parameters in the register group or SDRAM when images other than videos are processed. The binary logic element can perform operations such as AND, OR, NOT, NAND, NOR, XOR, XNOR and straight-through output. The reduction element performs operations such as reduction AND, reduction OR, reduction NAND, reduction NOR, reduction XOR, reduction XNOR and straight through output.

The set operation element performs operations such as union, intersection, complement, subtraction and XOR. All the operation results from the binary logic elements, the reduction element and the binary median filter are synchronized and output via multiplexers to the next binary compute unit. The binary compute element has a coarse-grained architecture characterized by high performance and short reconfiguration time.
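The logic and set operations listed above map directly onto bitwise operations on packed pixel words. The sketch below illustrates this in Python under a few assumptions: pixels are packed into 32-bit words (the logic element width given later in the paper), and the table names `LOGIC_OPS` and `SET_OPS` and the helper `apply_op` are hypothetical, not part of the processor's register interface.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit word

# Two-input logic operations listed for the binary logic element.
LOGIC_OPS = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NOT":  lambda a, b: ~a & MASK32,        # unary: second input ignored
    "NAND": lambda a, b: ~(a & b) & MASK32,
    "NOR":  lambda a, b: ~(a | b) & MASK32,
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: ~(a ^ b) & MASK32,
    "PASS": lambda a, b: a,                  # straight-through output
}

# Set operations expressed with the same bitwise primitives.
SET_OPS = {
    "UNION":        lambda a, b: a | b,
    "INTERSECTION": lambda a, b: a & b,
    "COMPLEMENT":   lambda a, b: ~a & MASK32,
    "SUBTRACTION":  lambda a, b: a & ~b & MASK32,  # pixels in a but not in b
    "XOR":          lambda a, b: a ^ b,
}

def apply_op(table, name, a, b=0):
    """Apply the operation selected by `name` to the 32-bit words a and b."""
    return table[name](a & MASK32, b & MASK32)
```

For example, `apply_op(SET_OPS, "SUBTRACTION", 0b1110, 0b0110)` keeps only the pixel that is set in the first word and clear in the second.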

### Table I. Comparison Of Architectures

<table>
<thead>
<tr>
<th></th>
<th>Fine-Grained</th>
<th>Coarse-Grained</th>
<th>Mixed-Grained</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hardware resource (ALUTs)</td>
<td>641</td>
<td>734</td>
<td>641</td>
</tr>
<tr>
<td>Flexibility</td>
<td>High</td>
<td>Low</td>
<td>Medium</td>
</tr>
<tr>
<td>Reconfiguration Parameters (bits)</td>
<td>264</td>
<td>54</td>
<td>116</td>
</tr>
</tbody>
</table>

### 2.2 Input And Output Control Logic Units

The input control logic unit selects and synchronizes the inputs from video images, SDRAM and registers to the synchronization circuit. The block diagram of the input control logic unit is shown in Fig. 5. The unit contains four data converters and a synchronization circuit. Data converters 1 and 2 convert 1-b image signals into 32-b parallel data, which have the same format as the data from SDRAM and registers. Converters 3 and 4 convert the parallel data into 1-b image signals, which are then synchronized by the synchronization circuit. To increase the processing rate, two down-sampling circuits are added to down-sample image signals before they are processed by data converters 1 and 2. The output control logic unit writes the selected parallel image data from the reconfigurable binary processing module into SDRAM through bus interface 1.
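The conversion between 1-b image signals and 32-b parallel data performed by the data converters of the input control logic unit can be sketched as simple bit packing. This is a minimal Python illustration, assuming that the first pixel of each group goes to the least significant bit; the paper does not state the bit order, and the function names are hypothetical.

```python
def serial_to_parallel(bits, width=32):
    """Pack a stream of 1-b pixels into `width`-bit words (first pixel -> bit 0)."""
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= (bit & 1) << i
        words.append(word)
    return words

def parallel_to_serial(words, count, width=32):
    """Unpack words back into the first `count` 1-b pixels."""
    bits = [(word >> i) & 1 for word in words for i in range(width)]
    return bits[:count]

pixels = [1, 0, 1, 1] + [0] * 28 + [1]   # 33 pixels: one full word plus one bit
words = serial_to_parallel(pixels)
restored = parallel_to_serial(words, len(pixels))
```

A round trip through both functions returns the original pixel stream, which mirrors the roles of converters 1 and 2 (serial to parallel) and converters 3 and 4 (parallel to serial).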

The process control unit reads the configuration information in the configuration registers. It controls the operation process of the reconfigurable binary processing module. It also controls the input and output control logic units and the bus interfaces during data access. After the processed image data is written to SDRAM, the process control unit transmits interrupt requests to complete the interaction of the processor with external systems.

The configuration register group is an extremely important part in the proposed processor. It contains control parameters, reconfiguration information, operation parameters and interaction information. Most of the registers in the configuration register group are written by an external CPU via the system bus, and the rest are written by the internal modules in the proposed processor.

3 CIRCUIT IMPLEMENTATION

In this section, an Image Processing System based on the proposed binary image processor is implemented on an Altera Stratix II EP2S180C4 Field-Programmable Gate Array (FPGA) to verify the performance and feasibility of the processor in binary image processing. The following shows the main circuit blocks of the system.

3.1 Binary Compute Unit

With consideration of the size, generality and usability of the processor, the logic element is set to be 32-b wide. The maximum block size for image processing is 5×5. The two inputs of each binary computing element are the video image signal and register, respectively.

The line memories for each binary computing element are 4-line memories with a length of 1280 (the maximum size of the image to be processed is 1280×720, depending on the video camera resolution). The binary reduction element computes a 32-b, 25-b, or 9-b reduction operation under the control of the configuration registers. The binary median filter performs median filtering over 32 b, 25 b, or 9 b. The 25-b operation is performed on bits 4 to 28 of the logic element, while the 9-b operation is performed on bits 12 to 20. For the straight-through reduction operation, the output is the 16th bit of the logic element input.
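The bit ranges above can be read as three windows centred on the same bit of the 32-b logic element output. The following Python sketch shows how the reduction and median operations could select their operands. It assumes 0-based bit numbering with bit 0 as the least significant bit, treats the binary median as a strict majority vote, and uses hypothetical function names; none of these details are confirmed by the paper.

```python
# Inclusive (low, high) bit ranges for each supported operand width.
WINDOWS = {32: (0, 31), 25: (4, 28), 9: (12, 20)}

def window_bits(word, width):
    """Extract the `width` bits that take part in the operation."""
    low, high = WINDOWS[width]
    return (word >> low) & ((1 << (high - low + 1)) - 1)

def reduction_and(word, width):
    """1 only if every bit in the window is 1."""
    return int(window_bits(word, width) == (1 << width) - 1)

def reduction_or(word, width):
    """1 if any bit in the window is 1."""
    return int(window_bits(word, width) != 0)

def reduction_xor(word, width):
    """Parity of the bits in the window."""
    return bin(window_bits(word, width)).count("1") & 1

def median(word, width):
    """Binary median as a majority vote (strict majority assumed)."""
    ones = bin(window_bits(word, width)).count("1")
    return int(2 * ones > width)

def straight_through(word):
    """Pass the centre bit (bit 16 under the 0-based assumption)."""
    return (word >> 16) & 1
```

With a 5×5 neighbourhood packed into bits 4 to 28, `reduction_or` behaves like a dilation test and `reduction_and` like an erosion test for a full square structuring element.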

As shown in Fig. 6, the reconfigurable binary processing module contains four binary compute units and two converters. The inputs of binary compute unit 1 are the outputs of the input control logic unit. The inputs of binary compute units 2, 3 and 4 are the outputs of the input control logic unit and of binary compute units 1, 2 and 3, respectively. The multiplexers in each binary compute unit decide which inputs will be processed. Each binary compute unit needs 116 bits of control and configuration parameters for its operations. For the whole binary processing module, a 13×32-bit configuration register group is used for reconfiguration and image processing control.

3.2 Image Processing System

The binary image processing system with the proposed processor is shown in Fig.7. The bi-bus architecture is adopted for the system to improve the data access efficiency.

The SDRAM1 is used as the main memory for the CPU. SDRAM2 is used to store images.

The CPU is used as a controller. The register group 2 and interrupt controller are also used for the control of the system. The dynamic reconfiguration approach is applied for reconfiguration of the binary image processor. The reconfiguration parameters are reduced to 24×32 bits due to the mixed-grained architecture of the binary image processor. The reconfiguration time is less than 30 cycles.

3.3 Synthesis Results

The processor is synthesized with the SMIC 0.18-µm cell library and the synthesis results of the proposed processor are shown in Table II. The processor is then implemented on the Altera Stratix II EP2S180C4 FPGA board for verification.

Table II. Synthesis Result Of Proposed Processor

<table>
<thead>
<tr>
<th>Process</th>
<th>SMIC 0.18-µm</th>
</tr>
</thead>
<tbody>
<tr>
<td>Area (mm²)</td>
<td>2.56</td>
</tr>
<tr>
<td>Gate count (K)</td>
<td>45</td>
</tr>
<tr>
<td>Memory (mm²)</td>
<td>1.96</td>
</tr>
<tr>
<td>Power consumption (mW)</td>
<td>98.5</td>
</tr>
<tr>
<td>Speed (MHz)</td>
<td>220</td>
</tr>
</tbody>
</table>

Table III. Resource Utilization of the Modules

<table>
<thead>
<tr>
<th>Module</th>
<th>ALUTs</th>
<th>LC Registers</th>
<th>Block Memory (bits)</th>
</tr>
</thead>
<tbody>
<tr>
<td>RBPM</td>
<td>2564</td>
<td>1828</td>
<td>40960</td>
</tr>
<tr>
<td>Input and Output Logic</td>
<td>1331</td>
<td>844</td>
<td>19456</td>
</tr>
<tr>
<td>Registers and Control Logic</td>
<td>264</td>
<td>809</td>
<td>0</td>
</tr>
<tr>
<td>Whole Processor</td>
<td>4159</td>
<td>3481</td>
<td>60416</td>
</tr>
</tbody>
</table>

The detailed hardware consumption for each component of the processor is shown in Table III. The eight 4×1280-b line memories are implemented in the block memories of the reconfigurable binary processing module. Increasing the depth of the line memories allows the processor to handle wider images. For example, if the maximum horizontal image size is 1920, the line memory depth can be 1920. The block memories in the input and output logic units are used to buffer images for synchronization. On the FPGA, one binary compute unit uses 641 ALUTs, 457 registers and 10,240 memory bits. When the frequency is 100 MHz, the dilation or erosion operation can be performed at 200M pixels per second. This means that the frame rate achieves 200 f/s when dilation or erosion with a 5×5 structuring element is performed on a 1024×1024 image. The number of binary compute units can be adjusted to realize the target performance of the processor.
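The block memory figures quoted for the line memories follow from the line length, the number of lines per memory and the number of memories. The short Python sketch below reproduces that arithmetic; the function name `line_memory_bits` is hypothetical, and the 1920-pixel case is only the illustrative extension mentioned in the text, not a reported result.

```python
def line_memory_bits(line_length, lines=4, memories=8):
    """Total bits held by `memories` line memories of `lines` x `line_length` bits."""
    return line_length * lines * memories

rbpm_bits = line_memory_bits(1280)              # eight 4 x 1280-b line memories
unit_bits = line_memory_bits(1280, memories=2)  # two compute elements per unit
wide_bits = line_memory_bits(1920)              # hypothetical 1920-pixel lines
```

Under these assumptions `rbpm_bits` evaluates to 40960 and `unit_bits` to 10240, which agree with the RBPM block memory entry in Table III and the per-unit memory figure.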

4 BINARY IMAGE PROCESSING APPLICATIONS

In this section, binary image processing operations in the proposed processor are discussed, including binary mathematical morphology operations and algorithms, motion detection, and image feature extraction. The following examples are given to illustrate binary mathematical morphology operations and algorithms in the processor. The actual use of the processor is not confined to the examples given.

4.1 Binary Mathematical Morphology

Mathematical morphology is a powerful tool for image processing and analysis in a wide range of applications, including shape recognition, image processing, video processing, document authentication and computer vision.

![Binary logic element](image)

Fig. 8. (a) Dilation. (b) Erosion

The basic binary morphological operations are dilation and erosion. Each of the two operations has two operands: the input signal, which is usually an image, and the structuring element, characterized by its shape, size and center location. The other binary morphological operations, such as opening, closing and the hit-and-miss operation, are based on various combinations of the two basic operations.

Assuming that A is the image and B is the structuring element, the dilation is defined by

\[ A \oplus B = \{ x \mid (B')_x \cap A \neq \emptyset \} \quad \ldots (1) \]

The erosion is defined by

\[ A \ominus B = \{ x \mid (B)_x \subseteq A \} \quad \ldots (2) \]

where $(B)_x$ is the translation of $B$ by the vector $x$ and (B)

The implementation of binary opening and closing is illustrated in Fig. 9. The operation results of dilation, erosion, opening and closing are shown in Fig. 10.
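The set definitions of dilation and erosion translate almost literally into code. The following Python sketch is a software illustration only, not the processor's hardware implementation: images and structuring elements are represented as sets of (x, y) pixel coordinates, and opening and closing use the standard compositions (erosion followed by dilation, and dilation followed by erosion). All function names are hypothetical.

```python
def dilate(A, B):
    """Dilation: all points a + b, i.e. positions where reflected B hits A."""
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    """Erosion: all positions x where B translated by x fits inside A."""
    if not B:
        raise ValueError("structuring element must not be empty")
    b0x, b0y = next(iter(B))
    # Any valid x satisfies x + b0 in A, so candidates come from A itself.
    candidates = {(ax - b0x, ay - b0y) for (ax, ay) in A}
    return {
        (x, y) for (x, y) in candidates
        if all((x + bx, y + by) in A for (bx, by) in B)
    }

def opening(A, B):
    """Erosion followed by dilation: removes details smaller than B."""
    return dilate(erode(A, B), B)

def closing(A, B):
    """Dilation followed by erosion: fills gaps smaller than B."""
    return erode(dilate(A, B), B)

# 3x3 square structuring element centred on the origin.
SQUARE3 = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
BLOCK = {(x, y) for x in range(5) for y in range(5)}   # 5x5 filled block
```

For instance, opening `BLOCK | {(10, 10)}` with `SQUARE3` removes the isolated pixel, while closing `BLOCK - {(2, 2)}` fills the one-pixel hole.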

The hit-and-miss operation, denoted with $\otimes$, is expressed as

$$A \otimes B = (A \circ B) \cap (A^c \circ B) \quad \ldots (5)$$

where $A^c$ denotes the complement of $A$. The hit-and-miss implementation is illustrated in Fig. 11. Binary morphology with spatially variant structuring elements can also be implemented on the proposed processor.
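As an illustration of the hit-and-miss operation, the sketch below uses the common formulation with a pair of structuring elements: `B1` must fit inside the foreground $A$ and `B2` must fit inside the background $A^c$. This is an assumption, since the expression as printed uses a single $B$; the function name and the isolated-pixel example are likewise hypothetical and written in Python for illustration only.

```python
def hit_and_miss(A, B1, B2):
    """Positions where B1 fits inside A and B2 fits inside the complement of A."""
    if not B1:
        raise ValueError("foreground structuring element must not be empty")
    b0x, b0y = next(iter(B1))
    # Any match x satisfies x + b0 in A, so candidates come from A itself.
    candidates = {(ax - b0x, ay - b0y) for (ax, ay) in A}
    return {
        (x, y) for (x, y) in candidates
        if all((x + bx, y + by) in A for (bx, by) in B1)
        and all((x + bx, y + by) not in A for (bx, by) in B2)
    }

# Example pattern: a foreground pixel whose eight neighbours are all background.
CENTER = {(0, 0)}
RING = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)} - CENTER

image = {(0, 0), (5, 5), (5, 6)}
isolated = hit_and_miss(image, CENTER, RING)
```

Here `isolated` contains only (0, 0), because the pixels at (5, 5) and (5, 6) are neighbours of each other.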

5 PERFORMANCE AND COMPARISON

The binary image processing system shown in Fig. 7 is used to verify the proposed binary image processor. The system performance is shown in Table IV. The size of the structuring element is $5 \times 5$ and the image resolution is $1024 \times 1024$. Table IV shows that the execution time of each operation is less than 5 ms. The frame rate is therefore more than 200 f/s, far exceeding the real-time requirement. The overall performance of the binary image processor depends on the number of binary compute units. For a fair comparison, the area and power consumption of the chips are normalized to 0.18 $\mu$m CMOS technology.

The technology scaling of area and power is described as

$$\text{Power}_{L2} = \text{Power}_{L1} \times (L2/L1)^2 \times (V_{L2}/V_{L1})^2$$

$$\text{Area}_{L2} = \text{Area}_{L1} \times (L2/L1)^2$$

where $L1$ and $L2$ are the characteristic lengths of the two different processes and $V_{L1}$ and $V_{L2}$ are the corresponding supply voltages. Since some of the compared processors support both binary and gray operations, they cannot achieve higher performance for binary image processing. The memory in our processor occupies most of the processor area (1.96 mm²) in order to support horizontal image sizes of up to 1280.
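The normalization formulas can be applied as in the following Python sketch. The function names are hypothetical and the sample chip (1.0 mm², 10 mW at 0.09 µm and 1.0 V) is invented purely to show the arithmetic; it does not correspond to any processor in the comparison.

```python
def scale_area(area, l_from, l_to):
    """Scale a chip area from feature size l_from to feature size l_to."""
    return area * (l_to / l_from) ** 2

def scale_power(power, l_from, l_to, v_from, v_to):
    """Scale power consumption for a change in feature size and supply voltage."""
    return power * (l_to / l_from) ** 2 * (v_to / v_from) ** 2

# Hypothetical chip reported at 0.09 um and 1.0 V, normalized to 0.18 um and 1.8 V.
area_018 = scale_area(1.0, l_from=0.09, l_to=0.18)
power_018 = scale_power(10.0, l_from=0.09, l_to=0.18, v_from=1.0, v_to=1.8)
```

Doubling the feature size quadruples the normalized area, and the voltage ratio adds a further squared factor to the normalized power.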

Table IV. System Performance

<table>
<thead>
<tr>
<th>Operations</th>
</tr>
</thead>
<tbody>
<tr>
<td>Dilation</td>
</tr>
<tr>
<td>Erosion</td>
</tr>
<tr>
<td>Opening</td>
</tr>
<tr>
<td>Closing</td>
</tr>
<tr>
<td>Hit-and-miss</td>
</tr>
</tbody>
</table>

The architecture comparison shows that the 2-D SIMD array is suitable for processing small fixed-resolution images. The 1-D array Application-Specific Instruction Set Processor (ASIP) has a large chip area. The 1-D MIMD array has high performance, low power consumption and small area. It is suitable for the embedded systems to use for processing large images for vision applications.

6 CONCLUSION

In this paper, a reconfigurable binary image processor was proposed to perform real-time binary image processing. The processor consists of a reconfigurable binary processing module, input and output image control units and peripheral circuits. The reconfigurable binary processing module has a mixed-grained architecture with the characteristics of high efficiency and performance. The dynamic reconfiguration approach was used to increase the processor performance. Basic mathematical morphology operations and complicated algorithms can easily be implemented on it because of its simple structure. The processor, characterized by high speed, simple structure and a wide application range, is suitable for binary image processing tasks such as object recognition, object tracking, motion detection, computer vision, and identification and authentication. The comparison showed that our processor is well suited to binary image processing and vision systems.

REFERENCES