Logarithmic Image Sensor For Wide Dynamic Range Stereo Vision System

Christian Bouvier, Yang Ni
New Imaging Technologies SA, 1 Impasse de la Noisette, BP 426, 91370 Verrieres le Buisson, France
Tel: +33 (0)1 64 47 88 58
firstname.lastname@example.org, email@example.com

Abstract—Stereo vision is the most universal way to get 3D information passively. Rapid progress in digital image processing hardware now makes this computational approach practical. It is well known that stereo vision matches 2 or more images of a scene, taken by image sensors from different points of view. Any information loss in those images, even partial, will drastically reduce the precision and reliability of this approach. In this paper, we introduce a stereo vision system designed for depth sensing that relies on logarithmic image sensor technology. The hardware is based on dual logarithmic sensors controlled and synchronized at pixel level. This dual sensor module provides high quality contrast indexed images of a scene. This contrast indexed sensing capability can be conserved over more than 140dB without any explicit sensor control and without delay. It can accommodate not only highly non-uniform illumination and specular reflections but also fast temporal illumination changes.

Index Terms—HDR, WDR, CMOS sensor, Stereo Imaging.

I. INTRODUCTION

We are all familiar with our ability, as humans, to extract 3D information from our environment thanks to our stereoscopic vision. This level of perception is achieved through the sensing capability of the human eye and the processing capability of the human brain. In principle, extracting 3D information from an artificial stereoscopic vision system is simple. In the ideal case, if we are able to identify identical scene primitives with both sensor units, we can directly extract the depth information by triangulation.

Stereo vision systems find applications in a wide variety of fields such as robotics. For example, NASA's Curiosity rover has several stereo camera systems to scan the surface of Mars for navigation, hazard avoidance and terrain measurements. Stereo vision systems can also be found in automotive applications. Car manufacturer Subaru offers a stereo vision system that is used for collision avoidance, pedestrian detection and driving assistance. Stereo cameras also find applications in security, for people counting, and in industrial safety.

Today, in computer vision, the geometric problems, such as correcting the projection distortions and calibrating a stereo pair, are well known and can be dealt with through image processing and calibration steps. But even if we consider an ideal stereoscopic camera, the stereo correspondence process remains one of the most challenging tasks in extracting depth information. When the stereo vision system is used to extract depth information, the sensor dynamic range and reactivity become critical because in saturated areas the details are lost and consequently depth extraction becomes impossible. The sensor reactivity is also important to prevent saturation in changing environments.

In this communication we present a stereo vision system that relies on two 1280x720 WDR (Wide Dynamic Range) logarithmic image sensors. In section II we detail the characteristics of the sensors embedded in the stereo vision system. In section III the stereo vision system architecture is described. Finally, in section IV we give some indication of the stereo vision system performance.

Fig. 1. Sensor general structure (blocks: 1280x720 pixel array, V-scan, H-scan, exposure control, FPN compensation, video amplifier; signals: Vck, Vsync, Hck, Hsync, RST, RD1, RD2, G1, G2, Offset, COLbias, PIXbias, NEG, Out1, Out2).

II. SENSOR CHARACTERISTICS

Similarly to biological photoreceptors, the sensors embedded in the stereo vision system presented in this paper follow a logarithmic law.
This kind of sensor response is very interesting in imaging applications because it can deliver contrast indexed images over a very wide dynamic range with a single frame. Previous work on logarithmic sensors was based on current-to-voltage conversion using a MOS transistor in sub-threshold mode. The NSC1005 sensor presented in this paper features the pixel design that was described in a previous communication. In this pixel design the photodiode is used in solar cell mode in order to get a natural logarithmic law. This logarithmic pixel design resolves the problems of the previous logarithmic image sensors: internal FPN compensation is included, the sensitivity has been drastically improved and the lag problem has also been addressed.

TABLE I
SENSOR CHARACTERISTICS SUMMARY

Optical format                   1/2 inch
Pixel Size                       5.6um x 5.6um
Pixel Array Size                 1280x720
Fill Factor                      30%
MicroLens                        Available
Dynamic range                    > 140dB
Output Format                    Differential Analog Output
Maximum Frame rate               60Hz
Maximum H scanning rate          80MHz
CMOS process                     Standard 0.18 1P3M
High light Log sensitivity       50mV/decade
Low light linear sensitivity     1V/lux*s @ texpo = 40ms
Random Noise                     0.5mV over all DR
FPN                              <1mV
Shutter Mode                     Rolling Shutter
Power consumption                230mW

[Plot: NSC1005 response for t = 40ms, gain = 0dB. Vertical axis: VD (mV). Horizontal axis: optical flux at the faceplate (Lux @ 525nm), logarithmic scale.]

The general structure of the sensor is shown in figure 1. Pixel lines are selected sequentially. The FPN correction starts before each line readout. The selected line of the pixel array is loaded into the first column memory buffers by RD1. Then the RST signal resets the current line. On RD2 the dark signals are loaded into the second column memory buffers. A differential amplifier subtracts the contents of these two memory buffers for the FPN compensation.
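The two-sample readout described above can be illustrated with a small numerical sketch. It is only a behavioural model under stated assumptions, not the sensor's circuit: each pixel is assumed to carry a fixed additive offset that appears identically in the RD1 sample and in the RD2 dark sample, the gain setting in dB is assumed to be a voltage gain, and the helper name `read_line` is hypothetical.

```python
import random

def read_line(signal_mv, pixel_offset_mv, gain_db=0.0, output_offset_mv=0.0):
    """Behavioural model of one line readout (hypothetical helper).

    Assumption: the fixed pattern is a per-pixel additive offset that is
    present in both the exposed sample and the dark sample.
    """
    # RD1: exposed pixel levels are stored in the first column buffers.
    buf1 = [s + o for s, o in zip(signal_mv, pixel_offset_mv)]
    # RST, then RD2: dark levels are stored in the second column buffers.
    buf2 = list(pixel_offset_mv)
    # Differential amplifier: subtract the buffers, apply gain and offset.
    gain = 10 ** (gain_db / 20.0)  # assumption: dB setting is a voltage gain
    return [gain * (a - b) + output_offset_mv for a, b in zip(buf1, buf2)]

random.seed(0)
signal = [20.0, 60.0, 100.0, 140.0]                      # ideal levels in mV
offsets = [random.uniform(-15.0, 15.0) for _ in signal]  # illustrative offsets
raw = [s + o for s, o in zip(signal, offsets)]           # single-sample readout
compensated = read_line(signal, offsets)                 # two-sample readout

print("raw        :", [round(v, 2) for v in raw])
print("compensated:", [round(v, 2) for v in compensated])
```

In this model the subtraction removes the per-pixel offset entirely; a real sensor leaves a residual, which is what the FPN figure in Table I quantifies.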
The gain of the amplifier can be set to 4 different values (0dB, 4dB, 8dB and 12dB) using two digital control bits G0 and G1. An offset can be applied by adding an OFFSET voltage to match the sensor output with the input voltage range of an external differential ADC. Table I summarizes the sensor characteristics.

It has previously been demonstrated that the solar cell mode pixel photoelectric response VD is easily derived from the open-circuit voltage of the illuminated photodiode after the reset operation:

$$
V_D = V_T \ln\left( \frac{I_{ph} + I_S}{I_{ph} \exp\left( -\frac{(I_{ph} + I_S)\, t}{V_T C_D} \right) + I_S} \right) \qquad (1)
$$

where $V_T = kT/q$ and k, T, q, IS, Iph, t and CD are respectively the Boltzmann constant, the absolute temperature, the elementary charge, the junction current, the photocurrent, the exposure time and the junction capacitance. A quick analysis of the denominator of eq. 1 shows that for low illumination or short exposure time the exponential term is close to 1 and the sensor response is linear, whereas for high illumination the exponential term vanishes and the sensor response follows a logarithmic law.

Figure 2 shows the measured sensor photoelectric response for an exposure time t = 40ms. We can see on figure 2 that the proposed photoresponse model is valid. For low illuminations, the sensor is clearly operating in linear mode. Above about 0.8Lux, figure 2 shows that the sensor enters the logarithmic zone of the photoresponse. For these measurements, the sensor was illuminated using a green LED at 525nm. The dynamic range was measured to be over 140dB using laser illumination. Figure 3 shows a cropped sample image taken from the sensor looking directly at an intense halogen light.

Fig. 2. Sensor Photoelectric Response.

Fig. 3. Sample image (frame rate 25Hz, exposure time 40ms)

III. STEREO VISION CAMERA

When a stereo vision system is used to compute depth information, the sensor dynamic range and reactivity become critical because in saturated areas the contrast is lost and depth extraction is not possible. A stereo vision system has been developed to address those issues.
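The saturation argument rests on the shape of the photoresponse in eq. 1: linear at low light, then compressed logarithmically instead of clipping. The sketch below evaluates eq. 1 numerically to show both regimes. The junction current and junction capacitance are hypothetical placeholder values (the paper does not give them), and the thermal voltage assumes a temperature near 300 K, so the numbers are not expected to match Table I.

```python
import math

V_T = 0.02585   # thermal voltage kT/q in volts, assuming T near 300 K
I_S = 1e-15     # junction current in amperes (hypothetical value)
C_D = 10e-15    # junction capacitance in farads (hypothetical value)

def photoresponse(i_ph, t=0.040):
    """Evaluate eq. 1 for photocurrent i_ph (A) and exposure time t (s)."""
    decay = math.exp(-(i_ph + I_S) * t / (V_T * C_D))
    return V_T * math.log((i_ph + I_S) / (i_ph * decay + I_S))

# Low light: doubling the photocurrent roughly doubles the output (linear zone).
low_ratio = photoresponse(2e-18) / photoresponse(1e-18)

# Strong light: one decade of photocurrent adds about V_T * ln(10) volts
# (logarithmic zone of the ideal model).
decade_step = photoresponse(1e-9) - photoresponse(1e-10)

print(f"low-light output ratio for 2x photocurrent: {low_ratio:.4f}")
print(f"high-light step per decade: {decade_step * 1e3:.2f} mV")
```

Because the output grows only by a fixed step per decade in the logarithmic zone, a very large range of photocurrents maps into a modest voltage swing, which is why a fixed exposure time can be used.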
Table II summarizes the main characteristics of the stereo camera. At the core of this camera are 2 B&W WDR NSC1005 sensors that are slaves to the same controller. The differential analog outputs of both sensors are connected to 12-bit differential ADCs. In this design an FPGA generates all the sensor control signals (Vsync, Vck, Hsync, Hck, RST, RD1 and RD2) for both sensors, resulting in pixel level synchronization. The digitized left and right pixel levels are then sent to the FPGA. The FPGA packs the 12-bit left and right channels into a single 24-bit channel that is sent to a host PC (see fig. 4).

TABLE II
STEREO CAMERA MAIN CHARACTERISTICS

Sensors          2 x NSC1005
Data Output      24 bits CameraLink channel
A/D converter    12-bits ( x 2)
Baseline         5cm

Fig. 4. Stereo vision system architecture (the left and right sensors, each with a 12-bit ADC, send 12-bit raw data to the FPGA; the FPGA supplies the timing to both sensors and forwards the data over Camera Link to the host PC).

Thanks to the dynamic range provided by both logarithmic sensors, it is possible to set the exposure time and the frame rate at fixed values without risk of saturation even if the sensors are directly illuminated with strong light (Fig. 3). This means that gain and exposure time control becomes very simple for stereo vision systems. The wide dynamic range provided within a single frame for the 2 sensors makes the stereo vision system very robust to non-uniform illumination, specular reflections and fast temporal illumination changes. For example, in this stereo system design, the frame rate can be set to 25 or 30 Hz, the exposure time is set to the maximum possible value (the frame period) and the sensor gains are set from the host PC to the same value for both sensors.

The logarithmic conversion function also offers interesting properties for the processing of the left and right images.
If we consider the optical intensity OS(λ, x, y) incident on a pixel at coordinates (x, y), OS(λ, x, y) can be written as the product:

$$
OS(\lambda, x, y) = L(\lambda, x, y) \times R(\lambda, x, y) \times T(\lambda, x, y) \qquad (2)
$$

where L(λ, x, y), R(λ, x, y) and T(λ, x, y) are respectively the illuminance on the scene, the reflectance of the elements in the scene and the overall transmittance. This optical signal falling on a pixel generates a corresponding photocurrent:

$$
I_{ph}(x, y) = \alpha(\lambda, x, y) \times OS(\lambda, x, y) \qquad (3)
$$

where α(λ, x, y) is the spectral sensitivity of the photodiode. In section II, we saw that with enough light (Iph >> IS) the sensor enters the logarithmic portion of its photoelectric response (above 0.8Lux for an exposure time of t = 40ms). VD can then be approximated by VD ≈ VT ln(Iph/IS), and:

$$
\begin{aligned}
V_D(x, y) &\approx V_T \ln\left( \frac{\alpha(\lambda, x, y) \times OS(\lambda, x, y)}{I_S} \right) \\
&\approx V_T \left( \ln L + \ln R + \ln T + \ln \alpha - \ln I_S \right) \\
&\approx V_T \left( \ln L + \ln R + \ln T + \ln \alpha + FPN \right)
\end{aligned}
\qquad (4)
$$

We see that, when the sensor operates in logarithmic mode, the conversion function leads to a pixel output that is the sum of the log-luminance, log-reflectance, log-transmittance and log-spectral sensitivity. In most cases, the illumination on a scene is quite smooth and uniform. As a result, the illumination level, seen as an information vehicle, is transformed into an offset by the logarithmic response, and so is the spectral sensitivity of the photo-detector. Suppressing this offset normalizes the reference between the left and right sensors. In our FPGA implementation we use the minimum level of each image to normalize the left and right images. Once this illumination information is removed, the sensor signals index contrast instead of light intensity. Furthermore, once the sensor is working in logarithmic mode, the contrast sensitivity is constant provided that both sensor amplifiers are set to the same gain.
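The offset argument of eq. 4 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes an ideal logarithmic response VD = VT ln(Iph/IS) with no FPN or noise, a uniform illumination level per image, and normalization by subtracting each image's minimum level. The source states that the minimum level is used for normalization; the subtraction form, the helper names and all numeric values are assumptions.

```python
import math

V_T = 0.02585  # thermal voltage in volts, assuming T near 300 K

def log_response(photocurrents, i_s=1.0):
    """Ideal logarithmic pixel response (no FPN, no noise); units are arbitrary."""
    return [V_T * math.log(i / i_s) for i in photocurrents]

def normalize(image):
    """Remove the common offset by subtracting the image minimum."""
    floor = min(image)
    return [v - floor for v in image]

# Same scene reflectances seen under two very different illumination levels.
reflectance = [0.05, 0.2, 0.4, 0.8]
left = log_response([1000.0 * r for r in reflectance])   # strongly lit view
right = log_response([10.0 * r for r in reflectance])    # dimly lit view

left_n = normalize(left)
right_n = normalize(right)

print("raw difference      :", [round(a - b, 4) for a, b in zip(left, right)])
print("normalized left     :", [round(v, 4) for v in left_n])
print("normalized right    :", [round(v, 4) for v in right_n])
```

The raw images differ by a constant offset of VT ln(100), while the normalized images coincide: the illumination level has become an offset that the normalization removes, leaving only reflectance contrast.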
The direct consequence of this constant contrast sensitivity is that contrast will be identical in areas seen by both left and right sensors regardless of the illumination intensity on each sensor. This property is valuable for the stereo matching process, which relies on contrast information.

IV. SYSTEM PERFORMANCE

In order to operate the stereo vision system, companion software was developed to perform the stereo calibration and the rectification processes. The stereo calibration module implements a camera calibration algorithm based on corner extraction from chessboard chart images to correct the distortions of both sensor modules. Using the same sets of points, the calibration module also computes the essential matrix, the fundamental matrix, the rotation matrix between the left and right camera coordinate systems and the translation vector between the coordinate systems of the cameras. Finally the left and right images are rectified using Bouguet's algorithm. With this implementation of the calibration and rectification steps, we typically get an RMS re-projection error of the input calibration points of ERMS = 0.2 pixel.

After rectification of the left and right images, the relationship between the depth Z and the binocular disparity d is given by:

$$
Z = \frac{f\, T}{S_{pix}\, d} \qquad (5)
$$

where Z is given in meters, d is given in pixels, f is the focal length of the lenses, Spix is the pixel size and T is the system baseline. The default configuration of the stereo vision system features low distortion lenses with a focal length of 5.5mm. Considering that the pixel size is 5.6um and the baseline is 5cm, a disparity of one pixel corresponds to a maximum working range of Zmax ≈ 50m. The minimum working range is limited by the disparity range allowed for the stereo matching: the closer the object is to the camera, the higher the binocular disparity.
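As a quick check of eq. 5, the sketch below evaluates the depth for a given disparity using the focal length and pixel size quoted above and the 5cm baseline from Table II. The function name is hypothetical, and the 128 pixel disparity is used only as an example of a large disparity.

```python
FOCAL_LENGTH_M = 5.5e-3   # lens focal length f
BASELINE_M = 0.05         # stereo baseline T (Table II)
PIXEL_SIZE_M = 5.6e-6     # pixel size Spix

def depth_from_disparity(disparity_px):
    """Eq. 5: depth in meters for a disparity given in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return FOCAL_LENGTH_M * BASELINE_M / (PIXEL_SIZE_M * disparity_px)

for d in (1, 16, 128):
    print(f"d = {d:3d} px -> Z = {depth_from_disparity(d):.3f} m")
```

A disparity of one pixel gives about 49.1 m, consistent with the quoted Zmax ≈ 50m, and depth falls off as 1/d, so large disparities correspond to objects well under a meter away.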
Depending on the complexity of the stereo matching algorithm, a tradeoff should be made between the matching process complexity, the disparity range, the field of view and the processing speed. In our software implementation we typically set the maximum binocular disparity to 128 pixels. This gives a minimum working range Zmin ≈ 0.38m.

We can also compute the accuracy Zacc of our calibrated and rectified system as a function of the RMS re-projection error and the binocular disparity:

$$
Z_{acc} = E_{RMS}\, \frac{f\, T}{S_{pix}\, d^2} \qquad (6)
$$

By combining eq. 5 and eq. 6 we can compute the depth accuracy as a function of depth:

$$
Z_{acc} = E_{RMS}\, \frac{S_{pix}}{f\, T}\, Z^2 \qquad (7)
$$

Figures 5 and 6 give the short range and long range accuracy curves for ERMS = 0.2 pixel. With the default configuration, eq. 7 gives about 0.037m at Z = 3m and about 10m at Z = 50m. Those curves should be seen as the best accuracy that the stereo camera can achieve. Obviously, the absolute depth precision will depend on the performance of the stereo matching algorithm. In the case of an active system, the presence of an illuminator generating a pattern will also help to achieve the accuracy given by fig. 5 and fig. 6. Figure 7 gives some depth map sample images computed from the rectified left and right images using a previously published stereo matching algorithm, without any kind of illuminator or pattern generator.

Fig. 5. Stereo camera short range accuracy (plot: accuracy (m) versus depth (m), depth from 0 to 3m).

Fig. 6. Stereo camera long range accuracy (plot: accuracy (m) versus depth (m), depth from 0 to 50m).

Fig. 7. Left sensor image and depth map sample image (frame rate 25Hz, exposure time 40ms)

V. CONCLUSION

In this paper, we have presented a stereo vision system that relies on 2 logarithmic WDR 1280x720 sensors. This stereo vision system is able to provide high quality contrast indexed images of highly contrasted scenes. This contrast indexed sensing capability can be conserved over more than 140dB without any explicit sensor control and without delay. The logarithmic photoresponse is also interesting if we consider the robustness of the images to illumination variations. This dynamic range and contrast conservation makes this stereo camera very effective for outdoor 3D stereo vision applications. It also gives the unique possibility to combine passive 3D approaches with active illumination approaches without fear of losing information because of saturation.

REFERENCES

R. Tsai, “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses,” 1987.
Takeo Kanade, A. Yoshida, K. Oda, H. Kano, and M. Tanaka, “A stereo machine for video-rate dense depth mapping and its new applications,” in Proceedings of the 15th Computer Vision and Pattern Recognition Conference (ICVPR ’96), June 1996, pp. 196–202.
Zhengyou Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
Gary Bradski and Adrian Kaehler, Learning OpenCV, O’Reilly Media Inc., 2008.
S.G. Chamberlain and J. Lee, “A novel wide dynamic range silicon photoreceptor and linear imaging array,” IEEE Journal of Solid-State Circuits, vol. SC-19, no. 1, pp. 41–48, 1984.
N. Ricquier and B. Dierickx, “Pixel structure with logarithmic response for intelligent and flexible imager architectures,” in Solid State Device Research Conference, 1992. ESSDERC ’92. 22nd European, 1992, pp. 631–634.
T. Delbrück and C.A. Mead, “Analog VLSI phototransduction by continuous-time, adaptive, logarithmic photoreceptor circuits,” California Institute of Technology, Computation and Neural Systems program, CNS Memorandum 30, 1994.
K. Takada and S. Miyatake, “Logarithmic-converting CCD line sensor and its noise characteristics,” in IISW 1997, 1997.
S. Kavadias, B. Dierickx, and G. Meynants, “A self-calibrating logarithmic image sensor,” in IEEE Workshop on CCD&AIS, 1999.
M. Loose, K. Meier, and J. Schemmel, “A self-calibrating single-chip CMOS camera with logarithmic response,” IEEE Journal of Solid-State Circuits, vol. 36, no. 4, pp. 586–596, 2001.
Y. Ni and K. Matou, “A CMOS log image sensor with on-chip FPN compensation,” in ESSCIRC’01, Villach, Austria, September 2001, pp. 128–132.
Yang Ni, YiMing Zhu, and Bogdan Arion, “A 768x576 logarithmic image sensor with photodiode in solar cell mode,” in International Image Sensor Workshop. International Image Sensor Society, June 2011.
B. Hoefflinger, High-Dynamic-Range (HDR) Vision: Microelectronics, Image Processing, Computer Graphics (Springer Series in Advanced Microelectronics), Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007.
Heiko Hirschmuller, “Accurate and efficient stereo processing by semiglobal matching and mutual information,” in Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) - Volume 2, Washington, DC, USA, 2005, pp. 807–814, IEEE Computer Society.