# Low-Power Digital Image Segmentation of Real-Time VGA-Size Motion Pictures

Takashi Morimoto, Yohmei Harada, Osamu Kiriyama, Hidekazu Adachi, Tetsushi Koide, and Hans Jürgen Mattausch

Research Center for Nanodevices and Systems, Hiroshima University, 1-4-2 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan; Phone: +81-824-24-6265; Fax: +81-824-22-7185; e-mail: {morimoto, koide, hjm}@sxsys.hiroshima-u.ac.jp

### 1. Introduction

The extraction of individual objects from natural input images is called *image segmentation*. As the necessary first step of object-oriented image processing, it faces strong demands for real-time processing in moving-picture applications such as intelligent robots and moving-object recognition. Several segmentation algorithms [1] and real-time segmentation architectures [2,3] have already been proposed. However, the emphasis on real-time processing has led to insufficient consideration of power dissipation. In this paper, we propose an improved version of our previous real-time image segmentation architecture for gray-scale/color images [3] that additionally achieves low power dissipation. A power reduction of more than 75% is achieved without sacrificing real-time processing, by adding a boundary-active-only (BAO) scheme [4] to the region-growing process and by replacing some power-hungry static circuits with low-power dynamic circuitry. The high segmentation speed of the present architecture further allows reduced hardware cost through a subdivided-image approach [4]. Consequently, low-power, real-time segmentation of VGA-size color images is expected to become possible in conventional 0.35µm CMOS technology with about 51mm<sup>2</sup> area for the segmentation network (cell-network), which forms the core of our architecture. These improvements make the architecture suitable for battery-powered, low-cost applications such as small robots and mobile communication equipment.

### 2. Segmentation-Concept Evaluation by CMOS Test-Chip

The previously proposed architecture for a cell-network-based segmentation algorithm [3] achieves real-time processing of VGA-size images (640 × 480) in about 500µsec@10MHz on average, and consists of 4 pipelined functional stages. In the 1<sup>st</sup> stage, connection-weights are calculated from the luminance differences (RGB-data differences for color images) between neighboring pixels. The 2<sup>nd</sup> stage determines, from the calculated connection-weights, the set of seeds for the region-growing process, called leader cells. The 3<sup>rd</sup> stage, the cell-network, is the core of the proposed architecture and carries out pixel-parallel image segmentation by region-growing, based on the calculated connection-weights and leader cells. The 4<sup>th</sup> stage outputs the segmentation result.

The cell-network (3<sup>rd</sup> stage) is shown in Fig. 1. It consists of cells $P_{ij}$, corresponding to pixels, and connection-weight-register blocks $WR_{ij}$ placed between the cells. All cells determine their present state, either self-excitation, excitation, or inhibition, in parallel from the states of their neighboring cells and the corresponding connection-weights. A region-growing process starts with the self-excitation of a leader cell. In each subsequent clock cycle, neighboring cells that satisfy the excitation condition, calculated from the corresponding connection-weights, are automatically excited. The region-growing process continues as long as excitable cells exist. When no excitable cells remain, the region-growing process is finished, and the excited segment-member cells are labeled with a segment number and inhibited. A global-inhibitor circuit detects whether further excitable cells exist.
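To make the stage sequence concrete, the following Python sketch models the four stages sequentially in software. It is a minimal illustration under assumptions, not the authors' hardware or exact algorithm: the connection-weight form `I_MAX / (1 + |ΔI|)`, the 4-neighborhood, the leader-cell rule (sum of weights above a hypothetical threshold `PHI_P`), and the excitation rule (sum of weights to already-excited neighbors above a hypothetical threshold `PHI_Z`) are all illustrative choices. Each iteration of the inner `while` loop stands for one clock cycle in which all cells update in parallel, and the empty-set check plays the role of the global inhibitor.

```python
# Hypothetical software model of the 4-stage cell-network segmentation.
# Assumptions (not from the paper): weight formula, thresholds, 4-neighborhood,
# and the "sum of weights to excited neighbors" excitation rule.
from typing import Iterator, List, Tuple

I_MAX = 255      # assumed maximum luminance (8-bit gray scale)
PHI_P = 600.0    # hypothetical leader-cell threshold
PHI_Z = 100.0    # hypothetical excitation threshold

Image = List[List[int]]


def neighbors(i: int, j: int, h: int, w: int) -> Iterator[Tuple[int, int]]:
    """4-connected neighbors of pixel (i, j) inside an h x w image."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w:
            yield ni, nj


def weight(img: Image, a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Stage 1: connection weight from the luminance difference (assumed form)."""
    return I_MAX / (1 + abs(img[a[0]][a[1]] - img[b[0]][b[1]]))


def segment(img: Image) -> List[List[int]]:
    h, w = len(img), len(img[0])
    # Stage 2: leader cells have a large total weight to their neighbors.
    leaders = [(i, j) for i in range(h) for j in range(w)
               if sum(weight(img, (i, j), n) for n in neighbors(i, j, h, w)) > PHI_P]
    label = [[0] * w for _ in range(h)]   # 0 = no segment number yet
    seg = 0
    # Stage 3: grow one segment at a time from each still-unlabeled leader cell.
    for leader in leaders:
        if label[leader[0]][leader[1]]:
            continue
        seg += 1
        excited = {leader}                # self-excitation of the leader cell
        while True:
            # One "clock cycle": all unexcited, unlabeled cells are evaluated
            # against the excited set of the previous cycle (synchronous update).
            new = {(i, j) for i in range(h) for j in range(w)
                   if (i, j) not in excited and not label[i][j]
                   and sum(weight(img, (i, j), n) for n in neighbors(i, j, h, w)
                           if n in excited) > PHI_Z}
            if not new:                   # global inhibitor: no excitable cells left
                break
            excited |= new
        for i, j in excited:              # label the segment and inhibit its cells
            label[i][j] = seg
    return label                          # Stage 4: output the segmentation result
```

With these assumed thresholds, a 3 × 4 image whose left and right halves differ strongly in luminance is split into two segments, and a uniform 3 × 3 image becomes a single segment.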

The chip photo of Fig. 2 shows the fabricated test-chip of a cell-network with $10 \times 10$ cells in 0.35µm CMOS technology. For compact implementation, we designed the cells and connection-weight-register blocks in full-custom layout. Correct segmentation through region-growing was verified by measurements on the test-chip. The characteristics of the fabricated chip are summarized in Table I. The measured average power dissipation is about 24.4mW@10MHz. At the 100nm CMOS technology node, the estimated pixel density is 263 pixel/mm², so that a cell-network with $100 \times 100$ pixels can be implemented on a 6.2mm × 6.2mm chip. However, without further architectural improvement, the power dissipation of such a chip would be about 1 Watt.
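The 6.2mm × 6.2mm estimate follows from the quoted pixel density. The short check below assumes a square cell-network array and ignores peripheral circuitry such as I/O pads, which the text does not quantify.

```python
import math

# Figures quoted in the text for the 100nm node.
PIXEL_DENSITY = 263        # estimated pixel density in pixel/mm^2
PIXELS = 100 * 100         # cell-network size

area_mm2 = PIXELS / PIXEL_DENSITY   # total array area in mm^2
side_mm = math.sqrt(area_mm2)       # side length, assuming a square array
print(f"area = {area_mm2:.1f} mm^2, side = {side_mm:.2f} mm")
```

The printed side length of about 6.17mm rounds to the 6.2mm stated above.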

### 3. Boundary-Active-Only (BAO) Scheme for Reduced Power

For battery-based applications, a further reduction of the power dissipation is considered indispensable. For this purpose, we propose a boundary-active-only (BAO) scheme [4] as a low-power technique that does not sacrifice real-time processing. BAO effectively exploits the region-growing characteristic of the algorithm: it is not necessary for all cells to evaluate their state transitions in parallel. In fact, only the boundary cells of a region have to be activated in each step of the growing process, as shown in Fig. 3. Consequently, a network cell that satisfies any of the following 3 conditions can enter a low-power stand-by mode: (1) it has no excited neighboring cells, (2) it is already excited, or (3) it already has a segment number. We implemented the BAO scheme by introducing a gated-clock concept into the cells, which substantially reduces the power dissipation of the cell-network. Figure 4 shows an implementation example of the BAO scheme with a clock controller. A hierarchical, low-power dynamic global-inhibitor circuit was also introduced; Fig. 5 shows this circuit for 4 rows. The circuit has to compute an OR function of the state signals of all cells. By cutting off the state signals from cells in stand-by mode and the clock signals from rows or row-portions without boundary cells, a further power reduction is possible.
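The stand-by rule and the row-level clock gating can be modeled behaviorally as below. This is a hypothetical software sketch: the state encoding, the function names, and the 4-neighborhood are assumptions, and the parameter `group=4` only mirrors the 4-row example of Fig. 5. A cell is active only if none of the three stand-by conditions holds; a row receives the gated clock only if it contains an active cell; and the hierarchical OR first combines row flags within groups, then combines the groups.

```python
# Hypothetical behavioral model of the BAO stand-by rule, the row-level gated
# clock, and the hierarchical global-inhibitor OR. The state encoding and all
# names are illustrative assumptions, not the chip's actual signals.
from typing import List

EXCITED = 1          # state value of an excited cell (0 = not excited)
Grid = List[List[int]]


def is_active(state: Grid, label: Grid, i: int, j: int) -> bool:
    """True if cell (i, j) must stay active, i.e. it is a region-boundary cell."""
    h, w = len(state), len(state[0])
    has_excited_neighbor = any(
        0 <= i + di < h and 0 <= j + dj < w and state[i + di][j + dj] == EXCITED
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    # Stand-by if (1) no excited neighbor, (2) already excited,
    # or (3) already labeled with a segment number.
    return has_excited_neighbor and state[i][j] != EXCITED and label[i][j] == 0


def clocked_rows(state: Grid, label: Grid) -> List[bool]:
    """Per-row OR: a row gets the gated clock only if it holds an active cell."""
    return [any(is_active(state, label, i, j) for j in range(len(state[0])))
            for i in range(len(state))]


def global_inhibitor(row_flags: List[bool], group: int = 4) -> bool:
    """Hierarchical OR: first within groups of `group` rows, then across groups."""
    group_flags = [any(row_flags[k:k + group])
                   for k in range(0, len(row_flags), group)]
    return any(group_flags)


if __name__ == "__main__":
    state = [[0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
    label = [[0] * 3 for _ in range(5)]
    rows = clocked_rows(state, label)
    print(rows, global_inhibitor(rows))
```

Passing the output of `clocked_rows` to `global_inhibitor` loosely mirrors Fig. 4, where the clock controller obtains the row activation information from the global-inhibitor circuit. In the example, only the two rows adjacent to the single excited cell are clocked.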

### 4. BAO-Scheme-Performance Simulation and Subdivided-Image Approach

We designed an image segmentation test-chip with the BAO scheme in a 0.35µm 3-metal CMOS technology. Figure 6 shows the layout of the test-chip, which includes $41 \times 33$ cells. From the layout of this chip design, we estimated the power dissipation of the proposed low-power architecture by worst-case analog circuit simulation (HSPICE). The results and a comparison with the previous architecture [3] are shown in Table II. The worst-case power dissipation is 6.81mW@10MHz, which corresponds to a power reduction of more than 75%.

For large images, the processing speed of the proposed architecture allows image segmentation by sequential pipelined processing of subdivided-image blocks with a correspondingly smaller cell-network. For a $41 \times 33$ cell-network, the estimated average processing time is about 23µsec@10MHz. Therefore, VGA-size images (640 × 480 pixels) can be divided into 16 × 15 overlapping blocks, as shown in Fig. 7, and processed in sequential pipeline mode by a $41 \times 33$ cell-network. The segment structure of the complete VGA-size image can then be constructed in a post-processing step by evaluating the segmentation results in the block-overlap regions. Applying this subdivided-image approach, we confirmed by simulation that VGA-size image segmentation in < 7.5msec, including data input to and segmentation-result output from the cell-network, is already possible at a 10MHz clock frequency in 0.35µm CMOS technology. For the designed cell-network (see Fig. 6), a power dissipation of 28.0mW and an area of 51.06mm<sup>2</sup> are obtained.
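The block count and the 7.49msec total of Fig. 7 can be reproduced with simple arithmetic. The sketch below assumes a one-pixel overlap between adjacent blocks (the text only states that the blocks overlap); under that assumption, 16 × 15 blocks of 41 × 33 pixels are the minimum needed to cover a 640 × 480 image. The per-block segmentation time (23µsec) and the 82 data in/out cycles of 0.1µsec each are taken from Fig. 7.

```python
import math

# Values from the text and Fig. 7, except OVERLAP, which is an assumption.
IMG_W, IMG_H = 640, 480        # VGA image size in pixels
BLK_W, BLK_H = 41, 33          # cell-network (block) size in pixels
OVERLAP = 1                    # ASSUMED overlap between adjacent blocks (pixels)
T_SEG_US = 23.0                # average segmentation time per block @10MHz
CYCLE_US = 0.1                 # one clock cycle @10MHz
IO_CYCLES = 82                 # data in/out cycles per block


def blocks_needed(img: int, blk: int, overlap: int) -> int:
    """Minimum number of overlapping blocks of size blk covering img pixels."""
    stride = blk - overlap
    return 1 + math.ceil(max(img - blk, 0) / stride)


nx = blocks_needed(IMG_W, BLK_W, OVERLAP)    # blocks per row
ny = blocks_needed(IMG_H, BLK_H, OVERLAP)    # blocks per column
n = nx * ny                                   # total number of blocks
t_seg_ms = n * T_SEG_US / 1000                # segmentation time in ms
t_io_ms = n * CYCLE_US * IO_CYCLES / 1000     # data in/out time in ms
print(nx, ny, round(t_seg_ms, 2), round(t_io_ms, 2), round(t_seg_ms + t_io_ms, 2))
```

Running it prints `16 15 5.52 1.97 7.49`, matching the segmentation, data in/out, and total times listed in Fig. 7.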

### 5. Conclusions

In this paper, we proposed a low-power, real-time digital image segmentation architecture that applies a boundary-active-only (BAO) region-growing scheme. A power reduction of more than 75% is realized compared with an architecture that does not use the BAO scheme [3]. With a subdivided-image approach [4], real-time VGA-size image segmentation should already become possible in 0.35µm CMOS, using a cell-network for $41 \times 33$ pixels with < 30mW power dissipation and < 60mm<sup>2</sup> area.



Fig. 1: Block diagram of the cell-network construction. The cell-network is implemented by laying out active cells $P_{ij}$ and weight-register blocks $WR_{ij}$.



Fig. 2: Chip photo of the fabricated test chip of the cell-network-based architecture in a 0.35µm 3-metal-layer CMOS technology.

### Acknowledgment

The test-chips in this study have been fabricated in the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Corporation and Toppan Printing Corporation.

Part of this work was supported by the Mazda Foundation's Research Grant.

## References

- [1] J. C. Russ, *The Image Processing Handbook*, pp. 371-429, CRC Press (1999).
- [2] S. Y. Chien, et al., Proc. 2002 IEEE AP-ASIC, pp. 233-236 (2002).
- [3] T. Morimoto, et al., Ext. Abst. SSDM, pp. 242-243 (2002).
- [4] T. Morimoto, et al., Ext. Abst. SSDM, pp. 146-147 (2003).

Table I: Characteristics of the designed image segmentation LSI chip.

| Weight Parallel Architecture (10 × 10 pixels) |
|-----------------------------------------------|
| 0.35µm, 2-Poly, 3-Metal CMOS                  |
| 3.3 V                                         |
| 25MHz                                         |
| 24.4mW@10MHz                                  |
| 249,810                                       |
| 19.6 pixel/mm<sup>2</sup>                     |

Table II: Power dissipation comparison between the previous architecture [3] and the proposed BAO-based architecture (0.35µm CMOS technology, 10MHz clock frequency).

|              | Previous<br>Architecture [3] | Proposed<br>Architecture | Reduction<br>Ratio |
|--------------|------------------------------|--------------------------|--------------------|
| Average Case | 24.4mW@10MHz                 | 5.80mW@10MHz             | 76.2%              |
| Worst Case   | 30.9mW@10MHz                 | 6.81mW@10MHz             | 78.0%              |
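The reduction ratios in Table II are consistent with the usual definition of relative power saving, which is assumed here since the table does not state the formula explicitly:

$$
R = 1 - \frac{P_{\text{proposed}}}{P_{\text{previous}}},\qquad
R_{\text{avg}} = 1 - \frac{5.80\,\text{mW}}{24.4\,\text{mW}} \approx 76.2\%,\qquad
R_{\text{worst}} = 1 - \frac{6.81\,\text{mW}}{30.9\,\text{mW}} \approx 78.0\%
$$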



Fig. 3: Conceptual diagram of the proposed boundary-active-only (BAO) scheme. Only boundary cells are in active mode, other cells are in stand-by mode.



Fig. 4: Block diagram of the clock controller for cell-network rows. Only rows at the region-growing boundary are activated by a gated clock signal. The information on which rows to activate is obtained from the global-inhibitor circuit.



Fig. 5: Dynamic global-inhibitor circuit which calculates an OR function of the state signals of all active cells. If there are cells in active mode, this circuit outputs a "1" (ZOR<sub>i</sub>=1).



Fig. 6: Layout of the test-chip with BAO, including $41 \times 33$ cells, designed in a 0.35µm 3-metal CMOS technology.



- 41 × 33-pixel block: estimated processing time 23µsec; estimated power dissipation 28.0mW at 10MHz clock frequency.
- Total processing time: 7.49msec@10MHz
  - Segmentation: (16 × 15) blocks × 23µsec = 5.52msec
  - Data in/out: (16 × 15) blocks × 0.1µsec × 82 cycles = 1.97msec

Fig. 7: Image segmentation of a VGA-size image by subdivided-image pipeline processing. Blocks of 41 × 33 pixels are processed sequentially by the cell-network with the BAO scheme.