342 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 2, FEBRUARY 2004 An ON–OFF Orientation Selective Address Event Representation Image Transceiver Chip Thomas Yu Wing Choi, Student Member, IEEE, Bertram E. Shi, Fellow, IEEE, and Kwabena A. Boahen Abstract—This paper describes the electronic implementation of a four-layer cellular neural network architecture implementing two components of a functional model of neurons in the visual cortex: linear orientation selective filtering and half wave rectification. Separate ON and OFF layers represent the positive and negative outputs of two-phase quadrature Gabor-type filters, whose orientation and spatial-frequency tunings are electronically adjustable. To enable the construction of a multichip network to extract different orientations in parallel, the chip includes an address event representation (AER) transceiver that accepts and produces two-dimensional images that are rate encoded as spike trains. It also includes routing circuitry that facilitates point-to-point signal fan in and fan out. We present measured results from a 32 64 pixel prototype, which was fabricated in the TSMC0.25- m process on a 3.84 by 2.54 mm die. Quiescent power dissipation is 3 mW and is determined primarily by the spike activity on the AER bus. Settling times are on the order of a few milliseconds. In comparison with a two-layer network implementing the same filters, this network results in a more symmetric circuit design with lower quiescent power dissipation, albeit at the expense of twice as many transistors. Index Terms—Address event representation (AER) , analog circuits, asynchronous logic, Gabor filter, image processing, neuromorphic engineering, nonlinear circuits, visual cortex. I. INTRODUCTION M OVING from the retina to higher levels of visual processing in the cortex, neurons become progressively more selective to more complex stimuli. Cells in the retina are sensitive along stimulus dimensions of position, spatial frequency (size), temporal frequency and color. In the primary visual cortex (V1), additional selectivity along the dimensions of orientation, direction of motion, and binocular disparity emerges. Subsequent areas are selective to higher order combinations of previous dimensions, e.g., curvature and illusory contours. Concurrently, there is a progressively more invariance along stimulus dimensions established earlier. For example, neurons in V2 respond to visual stimuli over a much larger spatial area than ganglion cells in the retina. A functional model that seems to account for the responses of a large proportion of cells in the primary visual cortex Manuscript received January 2, 2003; revised May 31, 2003. This work was supported in part by the Hong Kong Research Grants Council under Grant HKUST6218/01E, and in part by the National Science Foundation CAREER Grant ECS00-93 851. This paper was recommended by Associate Editor P. Arena. T. Y. W. Choi and B. E. Shi are with the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong (e-mail: [email protected]; [email protected]). K. A. Boahen is with the Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104-6392 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCSI.2003.822551 consists of a linear spatio-temporal filtering stage and three nonlinear mechanisms: half-wave rectification, expansive exponentiation, and contrast normalization [1]–[3]. Linear spatio-temporal filtering determines the neural selectivity along different stimulus dimensions. Half-wave rectification conserves metabolic energy by mapping mean levels to a low quiescent spike rate. Expansive exponentiation sharpens selectivity. Contrast normalization enables neurons to retain stimulus selectivity over a wide input contrast range. This paper describes a VLSI chip that implements two components of this model: linear orientation selective spatial filtering followed by half-wave rectification. Orientation selectivity is a predominant characteristic of neurons in the primary visual cortex [4]. The implementation of orientation selective neurons is an appropriate starting point in building a silicon model of the selectivity of neurons in the visual cortex, since orientation selective neurons are used in neural models of selectivity along other stimulus dimensions such as direction of motion [5], [6], and binocular disparity [7]. This chip implements neurons with spatial receptive field (RF) profiles that are similar to a Gabor function. In the functional model, the RF profile is the filter’s impulse response reflected around the and axes. Gabor functions fit the RF profiles measured from orientation selective cortical neurons well [8]–[10]. A Gabor function is a sinusoidal grating with and orientation modulated by a Gaussian frequency envelope (1) represents the original coordinate function transwhere and rotated by . The parameter determines lated by the spatial phase of the sinusoid with respect to the center of the Gaussian. The RF profiles of the neurons on this chip are Gabor-type, since their modulating function is not Gaussian. The chip described here processes a 32 by 64 pixel input image with two orientation selective filters using a continuous time analog processing network. The filters have even and odd symmetric impulse response that are said to be in phase . Physiological quadrature, since they differ in phase by measurements in cortex indicate that neighboring neurons often [11], with the distribution of phases differ in phase by clustering around even and odd symmetric RF profiles [12]. Energy models of motion and binocular disparity selectivity rely heavily on the existence of neurons with phase quadrature RF profiles [5], [7]. The differential ON–OFF channels used to represent all input, output and internal signals differentiates this chip from previous electronic implementations of orientation selective filtering 1057-7122/04$20.00 © 2004 IEEE CHOI et al.: ON–OFF ORIENTATION SELECTIVE AER IMAGE TRANSCEIVER CHIP networks, which mostly used single-ended representations [13]–[16]. Serrano–Gotarredona et al. propose an architecture that uses an internal differential representation to accumulate signals, but the input and output are single ended [17]. Liu et al. constructed orientation selective neurons from the output of a silicon retina that included both ON and OFF channels, but only used the OFF channels [18]. Although the ON–OFF circuit architecture requires as many transistors as a functionally equivalent single-ended design, it has several compelling advantages as we describe in the latter part of the paper. Section II describes the four-layer cellular neural network (CNN) architecture used to establish the orientation selectivity of the neurons on the chip. Section III derives the spatial transfer functions of the orientation selective filters and proves stability. Section IV describes the chip architecture, with a high level description of the address event representation (AER) communication circuits used for input and output. Section V describes in detail the pixel level processing circuits, including the analog circuits for the orientation selective filtering network, as well as the circuits converting between the digital asynchronous spike train representation used at the periphery and the continuous time current-mode representation used internally. Section VI reports experimental measurements from the chip, and compares the design here with a previous two-layer design. Section VII concludes with a summary and discussion of future directions. Preliminary reports of this work have appeared in [19]–[21]. II. NETWORK ARCHITECTURE Biological systems use ON and OFF channels to encode signal variations around a background level efficiently. While single neurons can encode positive and negative signals as variations around a quiescent firing rate, this representation is inefficient as each spike consumes metabolic resources. With separate ON and OFF channels, background signals correspond to low quiescent spike rates on both channels. Positive signals are encoded by increases in the ON-channel spike rate, negative signals by increases in the OFF-channel spike rate. For example, ON-center and OFF-center retinal ganglion cells respond to positive and negative contrast of the center with respect to the surround. Diffuse illumination applied to both center and surround elicits little response from either cell. ON–OFF signal representations prevail in computational models of visual cortical neurons. Most cortical neurons exhibit low spontaneous spike rates. Hubel and Weisel proposed that the oriented excitatory and inhibitory regions of the RF profiles arise from linear summation of feedforward input from corresponding ON-center and OFF-center cells in the lateral geniculate nucleus [4]. This basic model has been preserved in most subsequent work, which has extended it to include push–pull inhibition to account for contrast invariance or cortical feedback to sharpen orientation selectivity. Ferster and Miller give a review of recent work in [22]. Our network also adopts an ON–OFF signal representation. Each pixel in the image is associated with four neurons. Two neurons carry the positive (ON) and negative (OFF) half-wave rectified outputs of the even symmetric filter. The other two carry the output of the odd symmetric filter. We refer to the neurons as , EVEN OFF , ODD ON , and ODD OFF . EVEN ON 343 Our network establishes orientation selectivity through local recurrent interconnections between neurons, which facilitate implementation in VLSI while enabling the resulting RF profiles to extend over many pixels. We model the network as a four-layer CNN whose layers are indexed . Each layer consists of an by by array of cells, each with real valued input , state , and output , where indexes the array when referring to an entire layer. location. We drop the We assume that the input to all layers is strictly positive. The outputs of the ON and OFF layers are equal to the difference between their states positive and negative half wave rectified (2) where and . The state evolves according to the differential equation (3) where the summation is over all layers. The denotes the elementwise product of two arrays. The denotes correlation, e.g., . The coefficient ma, and are the state feedback, output feedback trices and feedforward cloning templates. This network differs from the classical multilayer CNN [23] in three ways. First, it adds the elementwise product with the state. Second, it contains an additional state feedback template , through which each cell’s state influences its neighboring cells’ states. Third, the output of each cell is a nonlinear function of the states in cells of two layers, which introduces additional coupling between layers. The nonzero cloning templates for orientation selective filtering are where In each template matrix, the central element indicates the (0, 0) term. The parameters are nonnegative reals and determine the 344 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 2, FEBRUARY 2004 common with the original network in (3). First, the two networks share a unique common equilibrium point where the state of all cells is positive. Second, any additional equilibrium point in the original network is unstable. Finally, we conjecture that stability of the common equilibrium point in the simplified network implies its stability in the original network. We derive the transfer function from a constant input image to the common equilibrium by expressing (4) in terms of the sum and difference of the ON and OFF input and state variables, e.g., and . We find that both the sum and difference components evolve according to linear differential equations, but that the sum components are driven by a nonlinear function of the difference components. The difference components evolve independently, and are related to the input difference components by the transfer function Fig. 1. Cell interconnections in a 1-D version of the four-layer network described by (2). Interconnections terminating with filled circles are inhibitory. Interconnections terminating with open triangles are excitatory. Interconnections by resistors are diffusive. The mutually inhibitory interconnections between positive and negative layers of the same type are introduced by the output nonlinearity. (5) and are the discrete Fourier transforms of and and are spatial-frequency variables, and where strength of the interconnections between cells. Substituting the template expressions into (3) Fig. 1 shows the interconnections between cells in a one-dimenis a discrete apsional (1-D) array. Correlation with the reflects proximation to a Laplacian operator. The template connections to a cell from cells in another layer to the left and reflects connections from the right and below. The template above. III. NETWORK ANALYSIS This section derives the spatial transfer function of the network that relates a constant input image with the steady state output and examines the stability of the network. The first part summarizes the main results. The subsections contain the detailed proofs, which can be skipped without loss of continuity. Because the analysis of this network is complicated by the element-wise product, we consider a simplified network without the product (4) Fig. 2(a) shows a block diagram of the interactions between the layers. This network is easier to analyze, but has much in This transfer function reaches its maximum value of , corresponding to orientation and spatial frequency . Since the transfer function drops by approximately one half and , we at and as the 6-dB half bandwidth in the refer to and directions. Although there are five parameters that determine the filter shape, only four can be specified independently by the paramis fixed by the choice of , eters. The filter gain according to and at (6) Because the parameters are positive, both and are nonnegative, corresponding to orientations between 0 and . Orientations outside this range can be obtained by reflecting the templates around the horizontal and/or vertical axes. Alternatively, we can flip the image from left to right before input to the network. CHOI et al.: ON–OFF ORIENTATION SELECTIVE AER IMAGE TRANSCEIVER CHIP 345 Fig. 2. Block diagram (a) showing the interconnections between and among layers due to the templates and the output nonlinearity, (b) of the interaction between the difference and sum components, and (c) of the difference and sum components when the output nonlinearity is removed. To relate this complex valued filter to real valued Gabor filter described previously, observe that where and are the and . The modulating transforms of (1) with can be approximated by a Laplacian function function in 1-D and by a Bessel function in two-dimensional (2-D) [24]. Because the network connections are spatially invariant, the Fourier modes evolve independently [25]. We establish the stability of the sum and difference components by showing that the eigenvalues of the feedback matrices lie in the left half plane for all parameters corresponding to valid filter parameters. For unstable parameters, the network exhibits spatially oriented Turing patterns [26]. Our implementation uses a transistor analog of a network of conductances, which Poggio and Koch suggested for solving problems in computational vision [27], [28]. The stability of conductance networks can be established by viewing the dynamics as gradient descent on a suitably defined cost function [29]–[33], and a similar approach can be taken for this network [24]. Incorporating the half wave rectifying nonlinearity into the feedback is critical in ensuring network stability. A. Original Versus Simplified Network Clearly, any equilibrium point of (4) is also an equilibrium point of (3). The existence of the spatial transfer function in- dicates that the equilibrium point of (4) is unique. The state of all cells is positive at this equilibrium point because the feedand correback loops containing the blocks spond to a lossy diffusion process driven by a strictly positive assumes its minimum at pixel input. Formally, assume that . The first equation in (4) evaluated at equilibrium gives where (7) The term since is a minimum. The next two terms since the outputs and the parameters are nonnegative. The last term by assumption. In the electronic implementation, this input is represented by the current through a transistor in saturation, which will always . Similar be positive due to leakage. Thus, arguments hold for the other layers. Any additional equilibrium point in the original network is unstable. First note that the state of any neuron at equilibrium must be nonnegative. If the minimum state is nonzero, it must satisfy (7), which implies that it must be positive. Any additional for some . equilibrium point must have deLinearizing around this equilibrium and letting note a small perturbation around it, we have that where by the arguments above. It seems reasonable that stability of the common equilibrium point in the simplified network should imply stability for the 346 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 2, FEBRUARY 2004 original network, since the element-wise product operation does not change the slope of the derivative at the equilibrium point, as the state is strictly positive at equilibrium. In addition, our numerical simulations and experimental measurements from the chip have not revealed any unexpected instability. B. Spatial Transfer Function Expressing (4) in terms of the sums and differences of the ON and OFF variables (8) where and denotes is a the element-wise absolute value of . Correlation by temdiscrete approximation to a directional derivative. The plate is a combination of even impulse pairs in the horizontal and vertical directions. Fig. 2(b) illustrates that the both the sum and difference components evolve linearly. The difference components are unaffected by the sum components, but the sum components are driven by the absolute value of the difference components. In [24], we studied the dynamics of the differential components. For completeness, we recapitulate that analysis here. The steady state response and the stability can be analyzed easily in the spatial-frequency domain. Assume a doubly infinite array , and to be the and define 2-D discrete Fourier transforms of the input, state, and output, . Taking the e.g., discrete Fourier transform of the first two equations in (8), corand correspond to multiplication by relation by where Fig. 3. Unstable regions in the network parameter space. Horizontal hatching parameter space where the where indicates the region in the the difference components are unstable. Vertical hatching indicates the region where the sum components of the network that does not include the ON–OFF nonlinearity are unstable. Cross hatching indicates regions where both components are unstable. 0 The sum components evolve according to a discrete approximation to a lossy diffusion equation driven by the sum of the full-wave rectified difference component and the input sum component. In the spatial-frequency domain where . and denote the discrete Fourier transforms of the absolute values of the even and odd difference components. Stability is ensured for all . since Incorporating the half wave rectifying nonlinearity into the dynamics is essential to ensure network stability. To see this, suppose that we remove the nonlinearity and instead let in (4) and make similar substitutions for , and . We find that . Thus (9) We suppress the dependence on and to avoid clutter. If and , then we define . Letting and defining to be the spatial transfer function at temporal steady state, we obtain (5). C. Stability Stability of the difference components is guaranteed for any set of parameters corresponding to valid filter parameters. , which implies Equation (6) implies for all . However, the network is unstable . It is imfor combinations of parameters that imply , because the parameters are nonpossible for negative. Fig. 3 shows that the network is unstable if the cross coupling between the even and odd layers, which is determined parameters, is large enough. by the Fig. 2(c) shows that the evolution of the difference components is identical to that in (8), but the sum component now evolves independently of the difference component. The sum components are unstable for some parameters that correspond to valid filter parameters. In the spatial-frequency domain For stability of the eigenvalues of the feedback matrix, , must be negative for all . Since the parameters are nonnegative, the largest eigenvalue occurs for . This implies that for stability, we must have . Fig. 3 shows that the stable parameter region is only a subset of the stable parameter region for the difference components. CHOI et al.: ON–OFF ORIENTATION SELECTIVE AER IMAGE TRANSCEIVER CHIP Fig. 4. (a) Three-chip system where the output of a silicon retina is fanned out to two orientation selective chips (Chip A and Chip B) tuned to different orientations. (b) Addressing scheme at the merge output of chip B. IV. CHIP ARCHITECTURE The visual cortex processes each region of the visual field with neurons selective to many different orientations, which are grouped into a hypercolumn. To replicate this organization, we require a set of chips, each processing the same image but tuned to different orientations. Fig. 4(a) depicts a three-chip network. To enable the construction of this multichip system, each chip is a transceiver, containing both a receiver to receive input images and transmitter to transmit output images. Each chip also includes asynchronous routing circuits to facilitate signal fan out and fan in, which will be described in detail in a forthcoming publication. Briefly, the split replicates its input, sending one copy into the processing array via a receiver and sending the other off chip. Fanning the output of a silicon retina (e.g., [34]) out to a set of chips by cascading the split output of one with the split input of the next, we can build an array of orientation selective hypercolumns. The merge circuit combines its input with the array output from the transmitter and sends the combined stream off chip. The encoding we use enables us to distinguish the different images at the merge output, but we can also use the merge to combine images additively, implementing signal fan in. Combining input signals from a retina with output signals from other chips, we can implement intracortical interconnections, as well as feedback interconnections from later processing stages. The vast majority of inputs to cortical neurons come from other nearby cortical neurons, i.e., neurons tuned to similar orientations [35], [36]. Feedback from extrastriate areas appears to modulate the responses of neurons in V1 [37]. Input and output images are rate encoded as arrays of spike trains, which are communicated using the AER protocol [38]. The AER protocol communicates continuous time spike activity from an array of silicon neurons in one chip to another chip over an asynchronous digital bus. It is more efficient than scanning when the spike activity within the array is sparse, as we expect here since only a few image locations will contain edges near the orientation selected by each chip. 347 The transmitter signals a spike occurrence by placing the location (address) of the spiking neuron onto the bus. The receiver takes the address that appears on the bus and feeds a spike to the corresponding neuron in its array. The protocol is asynchronous, with the time that the address appears on the bus encoding the spike time directly. Collisions between simultaneous spikes from two neurons are handled by arbitration. Addresses are placed onto the bus in “bursts,” where each burst encodes all of the simultaneous spikes from neurons within a given row and a given chip. We use a word serial format, where each burst is a sequence of addresses. As shown in Fig. 4(b), the transmitter signals the start of a burst by placing an address identifying the source chip onto the address lines (Addr) and taking the request signal ReqY high. Subsequent addresses are signalled by taking _ReqX low. The second address identifies the row. Each of the remaining addresses identifies one of the columns containing a neuron that spiked. The transmitter signals the end of the burst by taking ReqY low. The receiver acknowledges receipt of each address by a transition on the Ack line. We use absolute addressing to identify rows and columns within a chip, but relative addressing to identify each chip. Each chip signals its own activity with bursts whose chip addresses are set to zero. Every time a chip relays a burst from its split or a merge input, it increments the chip address. For example, a chip address of 1 at the merge output of Chip B in Fig. 4(a) indicates the spikes in the burst come from Chip A. For each pixel, the four neurons are addressed using the least significant bit (LSB) of the row and column addresses. EVEN and ODD neurons are indexed by row addresses with the least significant bit (LSB) at 0 and 1. ON and OFF neurons are indexed by column addresses with LSB 0 and 1. Thus, the network for by pixel image actually contains a by processing an array of neurons arranged into 2 2 blocks. V. PIXEL PROCESSING CIRCUITS Each pixel in the array contains the circuits necessary for processing four neurons. This includes four leaky integrators that convert input spike trains to continuous currents, current-mode analog processing circuits that implement the filtering/rectification network and four spiking neuron circuits that convert the current outputs of the network to spike trains. We represent each state variable array as the drain currents in an array of nMOS transistors with fixed gate voltage , as shown in Fig. 5(a). The sources are connected through capacitors to the ground. We assume all transistors operate in weak inversion and are saturated, so the drain currents representing layer are given by the (10) and are the gate and source voltages referwhere is the thermal voltage, is a process enced to the bulk node, is a process deand geometry dependent current and pendent constant. Representative parameters for the TSMC0.25 pA and . Differentiating with um process are , respect to time, is the current entering the capacitor . where 348 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 2, FEBRUARY 2004 Substituting (10), we get where . The total current flowing out of the capacitor at each node due to the five tranwhere sistors connected in the diffuser network implements . B. Cross-Coupling Circuits Layers are coupled through the cell outputs. The mapping from state to output in (2) can be specified by the implicit equations The ON–OFF circuit [34] in Fig. 5(b) implements a similar mapping (11) (12) is a small current set by . where Kirchoff’s current law applied at the sources of transistors and gives . Rearranging the left and right sides gives (11). The translinear principle applied to the loop gives where (13) Combining these equations with Fig. 5. (a) 2-D diffuser network. (b) Circuit mapping the state variables, represented by currents, to outputs, represented by currents. All transistors operate in saturation. (c) Circuit implementing the spatial shift operator with gain. To implement the network, we equate the current flowing out of the capacitor with the sum on the right hand side of the elementwise product operator in (3). Each sum can be grouped as three , and . For the first equation, currents describing the intralayer state feedback, describing the cross coupling between layers describing the input. In the fabricated design, we and do not implement the capacitor explicitly. However, the source nodes will invariably have some parasitic capacitances associated with them. The remainder of this section describes the circuits generating these three currents, as well as the spiking neuron circuit that converts each output current into a spike train. A. Layer Self-Feedback We use the diffuser/pseudoresistor network [39], [40] in . In total, the circuit contains four Fig. 5(a) to implement diffuser networks, one for each layer. In weak inversion, the drain current flowing through the horinto node izontal nMOS transistor from node is which yields (12). The upper bound in (12) is equal to the zero input quiescent output current in both and . Each spatial shift operator is implemented by a tilted current mirror shown in Fig. 5(c). The difference in the source voltage controls the gain: . The shift is implemented by connecting the drain voltage of the output transistor to the appropriate node of the diffuser network. The entire cross coupling between the even and odd arrays, requires one diode connected transistor with source voltage and four mirror transistors, two with source voltage and two with . C. Current-Mode Integrator Four current-mode integrators at each pixel convert the incoming spike trains to input currents . Fig. 6(a) shows the schematic of one integrator [41]. The inputs _RSelX and _RSelY are shared by one row or column of cells. The receiver takes both inputs low when an address event with the corresponding row and column address is received. This injects a charge packet and , and into the diode-capacitor integrator formed by pulls the acknowledge signal _Ack low, signalling the receiver controls that the spike has been delivered. The bias voltage the magnitude of the current pulse and the communication cycle-time determines its duration. The difference between the , controls the source voltages of the current mirror, gain and the time constant of the integrator. CHOI et al.: ON–OFF ORIENTATION SELECTIVE AER IMAGE TRANSCEIVER CHIP 349 TABLE I CHIP CHARACTERISTICS Fig. 7. Layout of the top half of one metapixel. Fig. 6. Schematic of (a) the leaky integrator and (b) the firing neuron. D. Spiking Neuron Four spiking neuron circuits at each pixel convert the four output currents, which are obtained by mirroring the diode connected transistor of the spatial shift circuit, into spike trains. We use the design shown in Fig. 6(b), which is similar to that in is initially high, and decreases as [42]. The voltage discharges . Once reaches a threshold value, the inverter switches and the neuron fires, bringing the row request line, _TReqY, low to signal a spike. Once a row has been selected, the AER transmitter takes TSelX high. All of the neurons in the selected row that have generated a spike then reset and pull the column request lines, _TReqX low. Once a row has been selected, no new neurons in that row can spike. Positive feedback through the current mirror minimizes the inverter switching time, saving power. The bias voltage controls the amount of feedback. If it is too high, current feedback is small and power consumption increases. If it is too low, the background firing rate is high and obscures the signal. The _RESET signal is a global signal which resets all neurons. VI. EXPERIMENTAL RESULTS We designed and fabricated an array of 32 64 pixels in the TSMC0.25 um mixed signal/RF process available through MOSIS. This process contains five metal layers and one poly layer, uses nonepitaxial wafers, and is intended for 2.5-V applications. Chip characteristics are summarized in Table I. We generated the array layout by tiling metapixels, which contain the circuits required for two pixels stacked vertically. Fig. 7 shows the layout of the top half of the metapixel. The bottom half is mirrored vertically so that the analog filter circuits are adjacent. We laid out the metapixels to minimize cross talk from the digital communication circuits (the spiking neurons and the current-mode integrators) to the analog spatial filtering circuits. The analog filtering circuits lie in the middle of the metapixel, with the digital circuits on the top and bottom. Within the digital parts, the integrators lie next to the analog circuits. The spiking neurons, which contain the most switching transistors, lie at the top and bottom, farthest from the analog processing circuits. Guard rings, which are inserted between the Gabor cells, the integrators and the spiking neurons, provide low impedance paths to collect the minority carriers injected by digital transistors, which would otherwise lead to variations in the bulk voltage when they reach a well or substrate. The digital and analog circuits use separate power and ground lines. Bias lines connected to source voltages controlling current mirror gains run wide on the top metal layer to reduce impedance. A. Steady-State Response With the Gabor-type filtering circuits turned off by setting , the spiking neuron circuit maintains a background spike rate because the gate node of transistors and in Fig. 6(b) is not fully discharged to ground during the reset of and the residual current through discharges the gate of . Increasing decreases the quiescent spike rate by reducing this residual current. However, it also increases power consumption per spike by decreasing the gain of the current feedback. In a tradeoff between these two effects, we set mV to minimize total power consumption. At this point, the average spike rate per neuron is 5.8 Hz with a standard deviation of 6.6 Hz. We computed these statistics using a total of 392 256 spikes collected from the merge output during an 8.2-s time window. 350 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 2, FEBRUARY 2004 Fig. 8. Measured average spike rate (relative to the baseline firing rate) for the array tuned to vertical orientations with a spike train applied at pixel (17, 32). Crosshairs indicate the pixel location of the impulse. (a) The e+ output. White and black correspond to relative spike rates of 15 and 188 Hz. Spike rates outside this range are clipped. (b) The o+ output. White/black = 15=117 Hz. (c) The e output. White/black = output. 15=66 Hz. (d) The o White/black = 15=132 Hz. 0 0 0 0 0 0 When the Gabor-type filter circuits turned on but with no input applied, the background spike rate and its variance increase due to the quiescent output current of the ON–OFF circuit. For the array tuned to vertical orientations, the average quiescent spike rate computed across the array was 15.1 Hz, with a standard deviation of 9.0 Hz. We computed these statistics tuning using a total of 392 714 spikes collected from the merge output during a 3.2 second time window. A similar increase is observed for other filter parameters. To test the spatial impulse response of the array, we excited the “ ” input of pixel (17, 32) with a 50-kHz spike train from a pattern generator. All other inputs were silent. A logic analyzer connected to the merge output collected the output spike train, which is digitally processed for analysis. Fig. 8 shows the four outputs of the array. We computed the spike statistics using 390680 spikes collected over a 3.0 second window. To show the tunability of the array, Fig. 9 shows the differences between the ON and OFF outputs for a spatial impulse input, when the array is tuned to vertical, diagonal, and horizontal orientations and different spatial scales. For vertical tuning, filter parameters which fit the response predicted by (1) in the least squares sense were radians/pixel radians/pixel. The signal-toand noise ratio, defined as the energy in the ideal filter output with the best fit parameters divided by the energy in the difference between the actual and ideal filter outputs was 11.2 dB. For the diagonal orientation tuning, best fit parameters were , corresponding to a spatial frequency and orientation , and . The signal-to-noise ratio was 11.8 dB. For the horizontal orientation tuning, best fit parameters were and . The signal-to-noise ratio was 8.8 dB. B. Temporal Response We measured the temporal response of the arrays by applying a step change in the spike rate applied to the input of pixel 6 Fig. 9. (a) The ed output for vertical tuning. White/black = 188 Hz. (b) The od output for vertical tuning. White/black = 132 Hz. (c) The ed output for horizontal tuning. White/black = 121 Hz. (d) The od output for horizontal tuning White/black = 116 Hz. (e) The ed output for diagonal tuning. White/black = 721 Hz. (f) The od output for diagonal tuning. White/black = 759 Hz. The larger spike rate for diagonal tunings is primarily due to a difference in the tuning of the input gain (V ) to the input integrator. 6 6 6 6 6 (17, 32) from 0 Hz to 25 kHz (stimulus onset) and vice versa (stimulus offset). Using a fast input spike rate better indicates the response of the current-mode processing array, since it minimizes temporal ripple in the output current of the current-mode integrator. More than 10 input spikes are integrated per output spike, so the output spike response is not influenced significantly by the temporal characteristics of the input spike train. These experiments revealed a temporal asymmetry between the response to stimulus onset and offset. Fig. 10 shoes the output at pixel (17, 32). The response to stimulus onset is essentially instantaneous. The steady state response is a spike rate of 1.6 kHz. This corresponds to an average interspike interval of 0.625 ms, which is the approximately the delay before the first spike. On the other hand, at stimulus offset the response took about 1 ms to die away. Fig. 11 shows a similar asymmetry in neuron at pixel (17, 33). In this case the the response of the steady state firing rate to the stimulus is 360 Hz, corresponding to an interspike interval of 2.8 ms. The temporal asymmetry is primarily due to the nonlinearity introduced by the elementwise multiplication in (3), which slows down the network at low input levels, but speeds it up at high input levels. The dynamics of the current-mode integrated has similar characteristics [41], so part of the asymmetry can be attributed to this stage. Despite the asymmetry, the settling times for onset and offset are both on the order a few milliseconds, which implies that for intended applications, the temporal dynamics of the array are negligible. For reference, consider that each frame in a video sequence occupies 30–40 ms or that the temporal bandwidth of cortical neurons is on the order of 10s of hertz. CHOI et al.: ON–OFF ORIENTATION SELECTIVE AER IMAGE TRANSCEIVER CHIP 351 Fig. 12. The solid line plots total power consumption versus the average output activity per neuron. The dotted line is a linear least squares fit to the data, which has slope 0.16 mW/Hz and vertical offset 0.77 mW. The quiescent power consumption with no input, but an average output activity of 14 Hz, is about 3 mW. The buffer circuits that drive the pads account for around 75% of the total power consumption. The digital spike communication circuits account for around 24%. The analog circuits consume less than 1%. D. Comparison With Single-Ended Architecture + Fig. 10. (a) Temporal response at the e output at (17, 32) to stimulus onset at time zero. (b) Temporal response at the e output at (17, 32) to stimulus offset at time zero. The upper figures show spike rasters from 50 trials. The bottom figures show peri-stimulus time histograms computed over 100 trials with a 0.25-ms bin size. + + An implementation of the same filter kernels using a single-ended representation, described in [43], requires half as many transistors for implementing the analog filtering network. If connected to an AER interface, each pixel would require half as many integrators and spiking neurons. However, the ON–OFF implementation described here has several advantages which outweigh the additional hardware cost. First, the filtering network has reduced quiescent (zero input) power dissipation. The single-ended implementation encodes positive and negative signals as variations around a quiescent which dissipates power, even if the output of bias current the filter is zero. In the ON–OFF implementation, the analogous bias current is the quiescent output currents in the ON–OFF circuit. To compare the power dissipation across a range of operating conditions, we assume that the maximum absolute signal in the two cases is the same. Since the bias current current limits the maximum negative signal excursion, the power dissiwhere pation of the single-ended implementation is is a constant of proportionality depending upon the supply voltage and the array tuning. An upper bound on the quiescent power dissipation of the ON–OFF implementation is - Fig. 11. Peri-stimulus time histograms of the response at the o output at (17, 33). Histograms were computed over 100 trials with a 0.25-ms bin size. C. Power Dissipation The power consumption is dominated by the activity of the communication circuits, rather than the processing circuits. We measured the power dissipation of the chip while stimulating pixel (16, 32) with spike trains ranging in frequency from 0 Hz to 100 kHz and plot the results in Fig. 12 as a function of average output activity per neuron, which is much higher than the input activity. The power increases linearly with the output activity. where the factor 2 arises because the ON–OFF implementation to ground. We estimate has twice as many paths from this by assuming that the output currents of the ON–OFF circuit are and computing the total current flowing from , ignoring the reduction in the output current due to the feedback which supplies current to the input. Since and should be in saturation, the source voltages of and must be a several multiples of below . Thus, the maximum output current of the ON–OFF circuit is . Combining this with 352 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 2, FEBRUARY 2004 (13) gives implies , which Typical parameters for the TSMC0.25 um process are 3.1 pA and . Choosing nA and , . we obtain However, there is a tradeoff between latency and power. In the ON–OFF implementation, weak signals are processed slower than fast signals because the elementwise product with the state slows the dynamics of the array when the current levels are smaller. The signal gain is independent of signal strength. Since this chip takes input from a contrast sensitive silicon retina, weak signals correspond to areas with little contrast. The slower response improves signal-to-noise ratio for weak signals by increasing temporal smoothing. Biological systems exploit the same strategy. In the retina, rods, which are sensitive for dim light, respond slower than cones, which are sensitive to bright light [44]. The response of cat retinal ganglion cells speeds up for higher contrast signals [45]. Second, the ON–OFF network exhibits reduced fixed pattern noise in the output. The primary source of fixed pattern noise in the single-ended architecture is mismatch in the transistors supplying the bias current, which adds spatial noise to the filter input. By reducing the quiescent bias current, the ON–OFF network reduces the fixed pattern noise. In [43], the standard deviation of the fixed pattern noise was 26–38% of the bias current. In this network, the standard deviation of the quiescent spike rate in Fig. 8 with no input (9.1 Hz) was 1.2% of the peak spike rate (752 Hz). Third, the ON–OFF signal representation includes half-wave rectification of the filter output, while the single-ended architecture does not. Although this could be added as a separate circuit to the single-ended architecture, its design is complicated by the large fixed pattern noise in the output, which means that the reference point around which to rectify varies from pixel to pixel. Fourth, the ON–OFF output representation conserves bandwidth on the AER bus. The low quiescent output currents map to near zero quiescent spike rates at the output of the spiking neuron circuit. For Fig. 8, the average quiescent spike rate (15.1 Hz) was 2.0% of the peak spike rate (752 Hz). If the output current of the single-ended architecture is fed into a spiking neuron circuit, quiescent spike rate must be 50% of the maximum spike rate, assuming that the maximum positive and negative signal excursions are identical. Given that power dissipation is dominated by the communication circuits, this would significantly increase power consumption as well. Finally, the ON–OFF circuit design is more symmetric. First, and in the ON–OFF architecture are all current gains positive, and are implemented using nMOS current mirrors. The single-ended architecture requires positive and negative current gains. Negative gains require an extra mirroring step through a pair of pMOS transistors, which increases mismatch. Second, the positive and negative signal excursions have the same limit . For the in the ON–OFF architecture, both being limited by single-ended architecture, the maximum negative signal is limited by the bias current while the maximum positive signal is limited by the largest current before the transistors leave weak inversion. VII. CONCLUSION Inspired by the functionality of visual cortical neurons, we have designed an orientation selective image filtering chip that uses an ON–OFF signal representation. The resulting circuit architecture has compelling engineering advantages over previous single-ended feedback circuit architectures for orientation selective filtering. Our current work seeks to incorporate this chip into a multichip functional model of the primary visual cortex. Each chip contains an array of neurons, all selective for the same orientation but different image locations. Sets of chips implement hypercolumns of neurons selective for different orientations. Because both input and output are AER encoded spike trains, this network will be able to include feedback interactions, such as competition between orientations to enhance orientation selectivity [46]. The orientation selective neurons may also be used in building neurons selective along other stimulus dimensions, such as binocular disparity and direction of motion. Because the network includes a rectifying nonlinearity, it may also be useful in modeling responses to second-order stimuli using a “filter-rectify-filter” model [47]. ACKNOWLEDGMENT The authors would like to thank P. Merolla and J. Arthur for their work in designing the transmitter and receiver circuits for the AER interface, and K. Hynna and B. Taba for their assistance in generating the chip layout. REFERENCES [1] D. G. Albrecht and W. S. Geisler, “Motion selectivity and the contrast response function of simple cells in the visual cortex,” Vis. Neurosci., vol. 7, pp. 531–546, 1991. [2] D. J. Heeger, “Normalization of cell responses in cat striate cortex,” Vis. Neurosci., vol. 9, pp. 181–197, 1992. , “Half-squaring in responses of cat striate cells,” Vis. Neurosci., [3] vol. 9, pp. 427–443, 1992. [4] D. H. Hubel and T. N. Wiesel, “Receptive fields, binocular interaction, and functional architecture in the cat’s visual cortex,” J. Physiol., vol. 160, pp. 106–154, 1962. [5] E. H. Adelson and J. R. Bergen, “Spatiotemporal energy models for the perception of motion,” J. Opt. Soc. Amer. A, vol. 2, pp. 284–299, 1985. [6] A. B. Watson and J. A. J. Ahumada, “Model of human visual-motion sensing,” J. Opt. Soc. Amer. A, vol. 2, pp. 322–342, 1985. [7] I. Ohzawa, G. C. DeAngelis, and R. D. Freeman, “Stereoscopic depth discrimination in the visual cortex: Neurons ideally suited as disparity detectors,” Science, vol. 249, pp. 1037–1041, 1990. [8] J. G. Daugman, “Two-dimensional spectral analysis of cortical receptive field profiles,” Vis. Res., vol. 20, pp. 847–856, 1980. [9] S. Marceljia, “Mathematical description of the responses of simple cortical cells,” J. Opt. Soc. Amer., vol. 70, pp. 1297–1300, 1980. [10] J. P. Jones and L. A. Palmer, “An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex,” J. Neurosci., vol. 58, no. 6, pp. 1233–1258, 1987. [11] D. A. Pollen and S. Ronner, “Visual cortical neurons as localized spatial-frequency filters,” IEEE Trans. Syst., Man, Cybern., vol. 13, pp. 907–916, 1973. [12] D. L. Ringach, “Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex,” J. Neurophysiol., vol. 88, pp. 455–463, 2002. CHOI et al.: ON–OFF ORIENTATION SELECTIVE AER IMAGE TRANSCEIVER CHIP [13] L. Raffo, “Resistive network implementing maps of Gabor functions of any phase,” Electron. Lett., vol. 31, no. 22, pp. 1913–1914, 1995. [14] P. Venier, A. Mortara, X. Arreguit, and E. A. Vittoz, “An integrated cortical layer for orientation enhancement,” IEEE J. Solid-State Circuits, vol. 32, pp. 177–186, Feb. 1997. [15] G. Cauwenberghs and J. Waskiewicz, “Focal-plane analog VLSI cellular implementation of the boundary contour system,” IEEE Trans. Circuits Syst. I, vol. 46, pp. 327–334, Feb. 1999. [16] R. Etienne-Cummings, Z. K. Kalayjian, and D. Cai, “A programmable focal-plane MIMD image processing chip,” IEEE Journal Solid-State Circuits, vol. 36, pp. 64–73, Jan. 2001. [17] T. Serrano-Gotarredona, A. G. Andreou, and B. Linares-Barranco, “AER image filtering architecture for vision-processing systems,” IEEE Trans. Circuits Syst. I, vol. 46, pp. 1064–1071, Sept. 1999. [18] S. C. Liu, J. Kramer, G. Indiveri, T. Delbruck, T. Burg, and R. Douglas, “Orientation selective a VLSI spiking neurons,” Neural Netw., vol. 14, pp. 629–643, 2001. [19] B. E. Shi, T. Y. W. Choi, and K. Boahen, “ON–OFF differential current-mode circuits for Gabor-type spatial filtering,” in Proc. IEEE Int. Symp. Circuits and Systems, vol. II, Phoenix, AZ, 2002, pp. 724–727. [20] T. Y. W. Choi, B. E. Shi, and K. Boahen, “An orientation selective 2-D AER transceiver,” in Proc. IEEE Int. Symp. Circuits and Systems, vol. IV, Bangkok, Thailand, 2003, pp. 800–803. [21] B. E. Shi, “Cortically inspired visual processing with a four-layer cellular neural network,” in Proc. Int. Joint Conf. Neural Networks, Portland, OR, 2003. [22] D. Ferster and K. D. Miller, “Neural mechanisms of orientation selectivity in the visual cortex,” Ann. Rev. Neurosci., vol. 23, pp. 441–471, 2000. [23] L. O. Chua and L. Yang, “Cellular neural networks: Theory,” IEEE Trans. Circuits Syst., vol. 35, pp. 1257–1272, 1988. [24] B. E. Shi, “Focal plane implementation of 2D steerable and scalable cortical filters,” J. VLSI Signal Processing, vol. 23, no. 2/3, pp. 319–334, 1999. [25] D. G. Kelley, “Stability in contractive nonlinear neural networks,” IEEE Trans. Biomed. Eng., vol. 37, pp. 231–242, Mar. 1990. [26] B. Shi, “Oriented spatial pattern formation in a four-layer CMOS cellular neural network,” Int. J. Bifurcation Chaos, to be published. [27] T. Poggio and C. Koch, “Ill-posed problems in early vision: From computational theory to analogue networks,” Proc. R. Soc. Lond. B, vol. 226, pp. 303–323, 1985. [28] T. Poggio, V. Torre, and C. Koch, “Computational vision and regularization theory,” Nature, vol. 317, pp. 314–319, 1985. [29] J. C. Maxwell, A Treatise on Electricity and Magnetism, Oxford, U.K.: Clarendon, 1873, vol. 1. [30] W. Millar, “Some general theorems for nonlinear systems possessing resistance,” Philosoph. Mag., ser. 7, vol. 42, pp. 1150–1160, 1951. [31] D. L. Standley and J. L. Wyatt Jr., “Stability criterion for lateral inhibition and related networks that is robust in the presence of integrated circuit parasitics,” IEEE Trans. Circuits Syst., vol. 36, pp. 675–681, May 1989. [32] J. L. Wyatt, “Little-known properties of resistive grids that are useful in analog vision chip designs,” in Vision Chips: Implementing Vision Algorithms with Analog VLSI Circuits, C. Koch and H. Li, Eds. Los Alamitos, CA: IEEE Computer Society Press, 1995, pp. 72–104. [33] B. Shi, “Pseudoresistive networks and the pseudovoltage-based cocontent,” IEEE Trans. Circuits Syst. I, vol. 50, pp. 56–64, Jan. 2003. [34] K. A. Zaghloul, “A silicon implementation of a novel model for retinal processing,” Ph.D. dissertation, Dept. Neurosci., Univ. Pennsylvania, Phladelphia, 2001. [35] M. Abeles, Corticonics: Neural Circuits of the Cerebral Cortex. New York: Cambridge Univ. Press, 1991. [36] V. Braitenberg and A. Schuz, Anatomy of the Cortex: Statistics and Geometry. Berlin, Germany: Springer-Verlag, 1991. [37] J. Bullier, J. M. Hupe, A. C. Hames, and P. Girard, “The role of feedback connections in shaping the responses of visual cortical neurons,” in Vision: From Neurons to Cognition, Progress in Brain Research, C. Casanova and M. Ptito, Eds. Amsterdam, The Netherlands: Elsevier, 2001, vol. 134, ch. 13, pp. 193–204. [38] K. A. Boahen, “Point-to-point connectivity between neuromorphic chips using address events,” IEEE Trans. Circuits Syst. II, vol. 47, pp. 416–434, May 2000. [39] A. G. Andreou and K. A. Boahen, “Neural information processing II,” in Analog VLSI: Signal and Information Processing, M. Ismail and T. Fiez, Eds. New York: McGraw-Hill, 1994. [40] E. Vittoz and X. Arreguit, “Linear networks based on transistors,” Electron. Lett., vol. 29, pp. 297–299, 1993. 353 [41] K. A. Boahen, “The retinomorphic approach: Pixel-parallel adaptive amplification, filtering, and quantization,” Analog Integr. Circuits Signal Processing, vol. 13, pp. 53–68, 1997. [42] E. Culurciello, R. Etienne-Cummings, and K. Boahen, “Arbitrated address event representation digital image sensor,” in Proc. IEEE Int. Solid-State Circuits Conf., San Francisco, CA, Feb. 2001, pp. 92–93. [43] B. E. Shi, “A low power orientation selective vision sensor,” IEEE Trans. Circuits Syst. II, vol. 47, pp. 435–440, May 2000. [44] J. E. Dowling, The Retinal: An Approachable Part of the Brain. Cambridge, MA: Belknap, 1987. [45] R. Shapley and J. Victor, “Nonlinear spatial summation and the contrast gain control of cat retinal ganglion cells,” J. Physiol., vol. 290, pp. 141–161, 1979. [46] B. E. Shi and K. Boahen, “Competitively coupled orientation selective cellular neural networks,” IEEE Trans. Circuits Syst. I, vol. 49, pp. 388–394, Mar. 2002. [47] C. L. Baker Jr. and I. Mareschal, “Processing of second-order stimuli in the visual cortex,” in Vision: From Neurons to Cognition, Progress in Brain Research, C. Casanova and M. Ptito, Eds. Amsterdam, The Netherlands: Elsevier, 2001, vol. 134, ch. 12. Thomas Yu Wing Choi (S’03) was born in Hong Kong, SAR, China, in 1975. He received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from the Hong Kong University of Science and Technology, Hong Kong, in 1997, 1999, and 2003, respectively. His research interests include analog VLSI design and neural networks. Bertram E. Shi (S’93–M’95–SM’00–F’01) received the B.S. and M.S. degrees in electrical engineering from Stanford University, Stanford, CA, and the Ph.D. degree in electrical engineering from the University of California at Berkeley. He is currently an Associate Professor in the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China. His research interests are in analog VLSI and cellular neural networks, bio-inspired and neuromorphic engineering, machine vision, and image processing. Dr. Shi was the Student Activities Chair of the IEEE Hong Kong Section, Secretary and Chair for the IEEE Circuits and Systems Society Technical Committee on Cellular Neural Networks and Array Computing and Distinguished Lecturer for the IEEE Circuits and Systems Society. He served as the Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND APPLICATIONS. Kwabena A. Boahen received the B.S. and M.S.E. degrees in electrical and computer engineering from Johns Hopkins University, Baltimore, MD, in the concurrent masters-bachelors program, both in 1989, and the Ph.D. degree in computation and neural systems from the California Institute of Technology, Pasadena, in 1997. He is an Associate Professor in the Bioengineering Department at the University of Pennsylvania, Philadelphia, where he holds a secondary appointment in electrical engineering. His current research interests include mixed-mode multichip VLSI models of biological sensory and perceptual systems, and their epigenetic development, and asynchronous digital interfaces for interchip connectivity. Dr. Boahen was awarded a Packard Fellowship in 1999, a National Science Foundation CAREER Grant in 2001, and an Office of Naval Research YIP Grant in 2002. He is a member of Tau Beta Kappa and has held a Sloan Fellowship for Theoretical Neurobiology at the California Institute of Technology.

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertisement