Digital Sensor Processing Pipeline

This article examines the basic construction and operation of the digital image sensors commonly used in digital cameras.  The focus is on the factors affecting color accuracy, or fidelity.  Hopefully this will help the reader understand what goes on under the covers, and explain why raw image colors frequently look different when viewed with different software packages.

Most consumer and many professional digital cameras use the familiar Bayer sensor layout or some variation thereof.  This provides a high resolution image with a single exposure at low cost, but it requires interpolation for full color rendition.  In much of the literature this process of creating an image from the raw sensor data is known as the image pipeline.  This article is an attempt to describe that processing in layman's terms and to identify some of the challenges.

To begin, I must assume that the reader is familiar with the basic concepts of color management, ICC profiles, CIE color spaces, and the elementary properties of light.  But this article will not attempt to confuse the reader with a mind-boggling string of mathematical glyphs.  Naturally I will also assume that the reader is familiar with basic camera components, lenses, and such.  This article will not address on-board camera functions such as automatic exposure, automatic focus, and image previews.

The Bayer pattern is composed of color filtered sensors typically in a layout using two green sensors for each red and blue sensor.   But this is not the only arrangement.   The filter layout is called a color filter array (CFA).   The elements are called tiles and the collection is called a mosaic.   Therefore, the process of interpolating the colors in the image is called demosaicing.

There are other approaches to image capture such as scanning backs and beam splitters or three chip cameras.   But these have their own unique challenges limiting their use to specialized applications.   They will not be addressed here.

The primary objective of demosaicing is to provide full color information for each sensor element or photodiode.  These become the image pixels.  This could be accomplished by treating four sensors as a single image pixel and combining their color values.  However, this would result in significantly lower resolution, an undesirable tradeoff.  Therefore, each sensor element is treated as an image pixel and the missing color values are reconstructed from neighboring sensors.
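
As a rough illustration of the idea, the short sketch below fills in the two missing color values at each site of a tiny, hypothetical RGGB mosaic by averaging the immediate neighbors (simple bilinear interpolation).  The layout and numbers are invented for illustration; real converters use far more sophisticated, edge-aware algorithms.

    # Minimal bilinear demosaicing sketch for a hypothetical RGGB Bayer mosaic.
    raw = [
        [110, 200, 112, 198],   # R G R G
        [ 60, 130,  62, 128],   # G B G B
        [108, 196, 111, 202],   # R G R G
        [ 58, 132,  61, 126],   # G B G B
    ]
    H, W = len(raw), len(raw[0])

    def channel_at(y, x):
        # Which color filter covers site (y, x) in an RGGB layout?
        if y % 2 == 0:
            return 'R' if x % 2 == 0 else 'G'
        return 'G' if x % 2 == 0 else 'B'

    def average(y, x, offsets):
        # Average the in-bounds neighbors at the given offsets.
        vals = [raw[y + dy][x + dx] for dy, dx in offsets
                if 0 <= y + dy < H and 0 <= x + dx < W]
        return sum(vals) / len(vals)

    CROSS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
    DIAG = [(-1, -1), (-1, 1), (1, -1), (1, 1)]     # four diagonals

    def demosaic(y, x):
        c, v = channel_at(y, x), raw[y][x]
        if c == 'G':
            # On R rows the horizontal neighbors are red; on B rows they are blue.
            r = average(y, x, [(0, -1), (0, 1)] if y % 2 == 0 else [(-1, 0), (1, 0)])
            b = average(y, x, [(-1, 0), (1, 0)] if y % 2 == 0 else [(0, -1), (0, 1)])
            return (r, v, b)
        if c == 'R':
            return (v, average(y, x, CROSS), average(y, x, DIAG))
        return (average(y, x, DIAG), average(y, x, CROSS), v)

    print(demosaic(1, 1))   # full (R, G, B) estimate for a blue site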

Most of the literature focuses on the various algorithms for this interpolation, the common artifacts encountered, and how to address them.  While these artifacts are important and interesting, the focus of this article is on the accuracy, or fidelity, of the colors.  The fundamental problem is that each sensor element records a single monochrome intensity value from a filtered portion of the visible spectrum.  These values must be combined into sets that represent color, which we typically think of as hue, saturation, and brilliance or lightness.  Because different sensors and filters have different spectral sensitivities, algorithms that work for one sensor may not work for another.  A few examples of these filtered spectral sensitivity charts follow.

The important point is that not only are there different curves for RGB and CMY layouts, there are significant differences between CFAs of the same class.  Naturally the best sensors have good separation between the channels, low sensitivity outside the visible spectrum, and no ambiguities (no two distinct spectral stimuli producing the same set of channel responses).  Just as obviously, more than three color filters can enhance color accuracy.

We should also note that a portion of the sensitivity is due to the silicon materials.   The following figure shows the unfiltered spectral response of three different sensors.   These are overlaid on an approximation of the human response to the visible spectrum.   It should be obvious that most silicon materials have low sensitivity in the ultraviolet region and very high sensitivity in the infrared region.

These show how sensitivity varies across the spectrum.  In addition there is the question of how luminosity translates (scales) to recorded voltages; that is, how linear the response is.  The following chart shows an example from the specifications for a high-end (Dalsa) sensor.  Note the non-linearity in the extreme highlight area.  And this chart does not show the effects of temperature (from long exposure times) or of noise in the shadow areas.
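
In practice, raw converters deal with this by linearizing the data against black and white levels taken from the sensor characterization or the file metadata.  The sketch below uses invented levels purely to show the mechanics.

    # Hedged sketch: map raw counts to a linear 0..1 range and clip at the
    # point where the response saturates.  Both levels here are invented;
    # real values come from the sensor specs or the raw file metadata.
    BLACK_LEVEL = 128      # hypothetical dark-current offset (counts)
    WHITE_LEVEL = 15800    # hypothetical saturation point (counts)

    def linearize(count):
        value = (count - BLACK_LEVEL) / (WHITE_LEVEL - BLACK_LEVEL)
        return min(max(value, 0.0), 1.0)

    for raw_count in (100, 2048, 8192, 15800, 16383):
        print(raw_count, "->", round(linearize(raw_count), 4))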

Before looking at the image construction and demosaicing, we need to take a quick look at the basic components of the sensor and the processing within the camera.

The sensor is constructed from thin layers of etched silicon crystal.  The various layers contain controlled, minute impurities that create p (positive) or n (negative) regions.  Photons dislodging electrons in the topmost layer create a voltage difference between these materials.  The p and n materials shown in this diagram may be reversed.  In addition, there are two basic classifications of these light-sensing elements, photoconductive and photovoltaic.  The substrate design may be CCD or CMOS.  Fundamentally, this determines the sequence in which the photodiodes are read (addressing), how much logic is included on a single chip, the power requirements, and other performance characteristics.  They all produce an analog voltage at a p/n junction that is then converted to digital values.

These photodiode sensors (or sites) are small, typically 5 to 12 microns across.  A micron is one thousandth of a millimeter.  The sites can be square, rectangular, or hexagonal.  They are near the size of film grains and of the rods and cones in your eye.  The smaller the sensor sites, the higher the resolution.  The most significant difference is that film grains and retinal cones are randomly distributed, while the sites in a digital sensor array are spaced in a regular pattern.

But there are other factors to consider.  Fill factor measures the fraction of each site's surface area that actually collects light, as opposed to carrying control circuitry.  As the distance between the photodiodes (called the pixel pitch) decreases, it becomes harder to shrink the control circuitry a corresponding amount, so the fill factor decreases.  Quantum efficiency measures the fraction of photons striking the material that are actually counted as electrons.  This is also a function of the materials and size, including the depth of the silicon layers.  Between the fill factor and the quantum efficiency, a site is more likely to collect 20% of the available light than 80%.  In addition, very small sites are subject to optical diffraction limits; without going too deep into this, diffraction can limit the useful aperture sizes.  In summary, smaller sites collect fewer total electrons, making it harder to eliminate signal noise.  All sensors are not created equal.
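
A back-of-envelope sketch with invented numbers shows how these factors interact: fill factor and quantum efficiency limit the electrons a site can collect, while the diffraction spot (Airy disk) grows with the f-number until it dwarfs a small pixel pitch.

    import math

    # All numbers below are hypothetical and only illustrate the scale.
    pitch_um = 6.0       # pixel pitch in microns
    fill_factor = 0.45   # fraction of the site area that collects light
    quantum_eff = 0.40   # electrons counted per photon striking the site
    photons = 20000      # photons arriving at one site during the exposure

    electrons = photons * fill_factor * quantum_eff
    print("electrons collected:", int(electrons))   # 3600 of 20000 photons

    # Airy disk diameter is roughly 2.44 * wavelength * f-number.
    wavelength_um = 0.55                            # green light
    for f_number in (2.8, 8, 16, 22):
        airy_um = 2.44 * wavelength_um * f_number
        print("f/%-4s Airy disk %4.1f um vs pitch %.1f um"
              % (f_number, airy_um, pitch_um))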

There are several optical filters included as part of the sensor.  Of course, the color filters separate colors to the designated photodiodes.  A micro lens at each site focuses the light more directly on the sensor, increasing the local efficiency.  There may also be a low pass filter (not shown) to spread the light across several sites.  This can improve color accuracy and reduce moiré effects.  It is designed to limit the spatial frequency of a scene to match the sensor spacing, or Nyquist limit.  In addition, some sensors include an IR filter to reduce infrared light.
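
The Nyquist limit mentioned above is simply half the sampling frequency set by the pixel pitch.  A quick sketch with a hypothetical 6 micron pitch:

    # Half the sensor's sampling frequency is the finest detail it can record.
    pitch_mm = 0.006                   # 6 micron pitch expressed in mm
    sampling_freq = 1.0 / pitch_mm     # sites per mm
    nyquist = sampling_freq / 2.0      # line pairs per mm
    print("%.0f sites/mm -> Nyquist limit of %.0f lp/mm" % (sampling_freq, nyquist))
    # Scene detail finer than this must be blurred by the low pass filter,
    # or it will alias into moire patterns and false colors.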

There will always be some processing steps within the analog to digital circuits.   At a minimum these consist of gain control (ISO settings) and dark current noise subtraction.
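
A minimal sketch of those two steps, using invented numbers: subtract a dark (closed-shutter) reading, apply the gain implied by the ISO setting, and clip at the converter's full-scale value.

    # Hedged sketch of the minimal A/D-stage processing described above.
    dark_frame = [130, 126, 131, 129]      # counts read with the shutter closed
    exposed = [1420, 1890, 905, 2600]      # counts from the actual exposure
    base_iso, iso_setting = 100, 200
    gain = iso_setting / base_iso          # 2x gain for ISO 200
    full_scale = 4095                      # 12-bit converter limit

    corrected = [min(int((e - d) * gain), full_scale)
                 for e, d in zip(exposed, dark_frame)]
    print(corrected)   # [2580, 3528, 1548, 4095] - the last value clips at full scale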

Please note that in normal operation ISO does not change the bias in the A/D circuits; it changes the gain, and all that should change with it are your aperture and shutter settings.  But when you shoot near the design limits, at extremely high ISO or very long or very short shutter speeds, some reciprocity issues are typically addressed in the A/D circuits.

Your aperture and shutter settings determine how much of the available light falls on the sensor, and the ISO setting determines how strongly that signal is amplified.  These three settings define the photographic exposure.  They are selected by the user or by the camera based on the amount of light in the scene.
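
These settings are tied together by the standard exposure value formula, EV = log2(N^2 / t), where N is the f-number and t the shutter time in seconds.  The short sketch below shows two nominally equivalent aperture/shutter combinations and the one-stop shift that doubling the ISO allows; this is generic photographic arithmetic, not anything camera-specific.

    import math

    def ev(f_number, shutter_s):
        # Exposure value of a given aperture/shutter combination.
        return math.log2(f_number ** 2 / shutter_s)

    print(round(ev(8.0, 1 / 125), 1))   # about 13.0 at f/8, 1/125 s
    print(round(ev(5.6, 1 / 250), 1))   # about 12.9: one stop wider, one stop faster
    # Doubling the ISO lets you use settings one EV higher for the same scene:
    print(round(ev(8.0, 1 / 250), 1))   # about 14.0 at f/8, 1/250 s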

The digitized values are stored in a local buffer before processing by the camera firmware.   The larger the buffer, the more sequential shots can be taken without having to wait for writes to the storage media.   This is very useful for high-speed action photography.

If the image is to be rendered (JPG) in the camera, demosaicing and all other image processing take place in the firmware.  If the image is to be saved as raw, only metadata is added to the raw buffer contents.  This metadata describes many of the camera controls and settings at the time the shot was taken.  It may also describe the sensor configuration and how the data is to be interpreted.  Some cameras can also compress the raw data (lossy or lossless).  Most cameras will also include a small JPG preview image in the metadata.  All other image processing is deferred to the image editing software on your personal computer.

The raw sensor values representing the filtered colors are called the sensor color space.  These are initially converted to a reference color space, usually CIE XYZ.  From there, they are converted to a device color space such as sRGB.  Somewhere in the pipeline the colors are adjusted for the light source illuminating the scene.  When this is done as part of the XYZ transform, it is based on the correlated color temperature (CCT).  When it is done in the rendered target RGB color space, it is usually simply called white balance.  CCT adjustments are based on the spectral content of a light source, defined as tables of chromaticity coordinates for standard illuminants.  White balance adjustments are typically simple RGB scalars.
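
To make the sequence concrete, here is a minimal sketch of that pipeline.  The camera matrix and white balance multipliers are invented placeholders (real ones come from sensor characterization and the shot metadata); the XYZ to linear sRGB matrix is the standard published one for a D65 white point.

    import numpy as np

    CAMERA_TO_XYZ = np.array([   # hypothetical, sensor-specific matrix
        [0.55, 0.32, 0.10],
        [0.25, 0.68, 0.07],
        [0.02, 0.12, 0.85],
    ])
    WB_SCALARS = np.array([1.9, 1.0, 1.5])   # hypothetical R, G, B multipliers

    XYZ_TO_SRGB = np.array([     # standard XYZ -> linear sRGB (D65) matrix
        [ 3.2406, -1.5372, -0.4986],
        [-0.9689,  1.8758,  0.0415],
        [ 0.0557, -0.2040,  1.0570],
    ])

    camera_rgb = np.array([0.31, 0.52, 0.18])   # demosaiced, linear, 0..1
    xyz = CAMERA_TO_XYZ @ (camera_rgb * WB_SCALARS)
    srgb_linear = np.clip(XYZ_TO_SRGB @ xyz, 0.0, 1.0)
    print(np.round(srgb_linear, 3))   # still linear; gamma encoding comes later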

Combinations of color matching algorithms and transforms employing matrix arithmetic determine the image color values.  A color matching algorithm matches three or more filtered, single-channel intensity values to three or more colorimetric values.  These are called tristimulus values in CIE color spaces such as XYZ, and are usually called triplets in RGB or CMY color spaces.  Matrix transforms are also typically used to adjust colors for white balance or to convert colors from one color space to another.  The most commonly used transforms are known as von Kries, CAT97, and Bradford.
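
Of the transforms named above, the Bradford matrix is publicly documented, so a hedged sketch is easy to show: convert XYZ to "cone" responses, scale each channel by the ratio of destination to source white, and convert back.  The matrix and the two white points are standard published values; the input color is hypothetical.

    import numpy as np

    BRADFORD = np.array([
        [ 0.8951,  0.2664, -0.1614],
        [-0.7502,  1.7135,  0.0367],
        [ 0.0389, -0.0685,  1.0296],
    ])
    WHITE_D50 = np.array([0.9642, 1.0000, 0.8249])
    WHITE_D65 = np.array([0.9505, 1.0000, 1.0890])

    def adapt(xyz, src_white, dst_white):
        # Scale the cone responses by the ratio of the two white points.
        scale = np.diag((BRADFORD @ dst_white) / (BRADFORD @ src_white))
        return np.linalg.inv(BRADFORD) @ scale @ BRADFORD @ xyz

    color_d50 = np.array([0.40, 0.35, 0.20])
    print(np.round(adapt(color_d50, WHITE_D50, WHITE_D65), 4))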

The challenge when dealing with Bayer patterns is the fact that each sensor does not have all the spectral information needed to independently and accurately match colors.   Some information must be borrowed from neighboring sensors.   This is color interpolation.   This is where color matching needs to be addressed.

(Figure: color matching functions for CIE RGB and CIE XYZ)

The heart of color accuracy is in the color matching functions.  These are based on the fact that a particular color (spectral wavelength) can be matched to filtered measurements of the sensor values.  It is not the absolute value that is matched, just the relationship between three or more scaled values sampling the same object color.  In the oversimplified example above, the amounts of red and green are nearly equal and there is no blue.  This color would be yellow, at about 580 nm (nanometers).  This is then mapped to the equivalent XYZ values.  A different sensor might give very different results, which is why the sensor spectral information is so important.  The corresponding color matching and XYZ tables may have 1, 5, 10, or even 20 nanometer spacing.
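
As a rough illustration of how such tables are used, the sketch below weights a hypothetical spectral power distribution by rounded CIE 1931 color matching values and sums the results to get XYZ.  The five-entry, 50 nm table is far too coarse for real work; it only shows the mechanics.

    CMF = {   # wavelength nm: (x_bar, y_bar, z_bar), rounded CIE 1931 values
        450: (0.336, 0.038, 1.772),
        500: (0.005, 0.323, 0.272),
        550: (0.433, 0.995, 0.009),
        600: (1.062, 0.631, 0.001),
        650: (0.284, 0.107, 0.000),
    }

    # Hypothetical relative spectral power of the light reaching one pixel.
    spectrum = {450: 0.2, 500: 0.4, 550: 0.9, 600: 0.7, 650: 0.3}

    X = sum(spectrum[w] * CMF[w][0] for w in CMF)
    Y = sum(spectrum[w] * CMF[w][1] for w in CMF)
    Z = sum(spectrum[w] * CMF[w][2] for w in CMF)

    # Normalize so Y (luminance) is 1.0 for easier comparison.
    print([round(v / Y, 3) for v in (X, Y, Z)])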

Many of the demosaicing functions also address image noise, sharpening, anti-aliasing, moiré patterns, and dynamic range.  Most of these will change the color values, of course, so if they are applied too soon, color accuracy can be compromised.  When these are done in the camera firmware, what you see is what you get.  These are known as rendered images in a rendered color space.  Since the camera maker has access to the sensor spectral information, we rightfully expect both high quality and color accuracy.  In many cases we judge the image quality on aesthetics more than fidelity (accuracy).  There is nothing wrong with this, and we can easily accept or reject the results.

But when we shoot raw with one vendor's camera and use another vendor's image processing software to render it, the plot thickens.  The generic image editor may not have access to the spectral sensor information, so its developers have to resort to assumptions or to very detailed calibration measurements.  Some simply assume that the sensor response matches the CIE XYZ or RGB spectrum.  Some leave it up to the user to perform any calibration, albeit sometimes with limited tools.

The CIE color matching functions are well documented.  They assume that the input RGB values are scaled so that they add up to one.  The actual values determine the brightness, or luminous value; the relationship between the values determines a dominant spectral wavelength.  This in turn is an index to the corresponding colorimetric XYZ tristimulus values.  These are defined in tables covering the visible spectrum from 380 to 780 nm or 400 to 700 nm.  There are also simultaneous transformation equations for converting between CIE RGB and CIE XYZ tristimulus values.
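
For reference, the sketch below applies the published CIE 1931 RGB to XYZ linear transform and then the "add up to one" chromaticity normalization described above; the input triplet is hypothetical.

    import numpy as np

    # Published CIE 1931 RGB -> XYZ transform (the 1/0.17697 factor scales
    # Y to represent luminance directly).
    RGB_TO_XYZ = (1.0 / 0.17697) * np.array([
        [0.49000, 0.31000, 0.20000],
        [0.17697, 0.81240, 0.01063],
        [0.00000, 0.01000, 0.99000],
    ])

    rgb = np.array([0.6, 0.5, 0.1])   # hypothetical CIE RGB triplet
    xyz = RGB_TO_XYZ @ rgb

    # Chromaticity coordinates: scale the triplet so it adds up to one.
    # The absolute magnitude (brightness) is carried by Y.
    x, y, z = xyz / xyz.sum()
    print(np.round(xyz, 3), round(x, 3), round(y, 3))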

An alternative to unique functions and tables for a sensor is to design the filters to approximate the CIE XYZ or RGB spectral functions.  This requires that the filters peak at red 700, green 546.1, and blue 435.8 nm and that they approximate the shape of the CIE functions.  High accuracy is impossible, but this does allow the use of simple, publicly documented transforms.  One obvious problem is that the CIE RGB functions include negative values, and photodiodes do not produce negative values.

Thus the question is whether to perform spectral adjustments (color correction) before or after demosaicing.   In theory, this sequence is interchangeable.   In practice, many other functions have been integrated into the Bayer demosaicing algorithms.   Many of these also adjust colors, but for reasons unrelated to color accuracy.   So it is important to achieve color fidelity before performing other aesthetic image adjustments.

If you set up the camera to produce rendered images (JPG or TIF), you can create an input profile calibrated for the device, given the proper tools.  But these profiles are highly dependent on a consistent light source color temperature, so they are practical for studio use only.  Phase One's Capture One is the only package I have found that seems to support canned and user-generated camera profiles specifically for raw image processing.  Unfortunately its camera support is more limited than Photoshop's.  And I have not used it myself, so I cannot comment with authority.

The bottom line is that accurate spectral information about the sensor and filters is necessary for color fidelity.  When provided and used correctly, it is independent of the light source (CCT), so separate profiles for different lighting conditions should be unnecessary.  The key to color accuracy lies in the color matching functions, and the key to the color matching functions lies in accurate spectral information about the sensor.

This article will not cause any vendor to change their proprietary demosaicing algorithms.  But hopefully it has helped the reader understand what is happening under the covers, and why there can be such large differences in the colors produced by different software packages from the same raw image.  It is my belief that the long term solution lies in published spectral information for all cameras that are expected to produce high quality images, especially those that support raw image formats.  The ISO has provided for this in the TIFF/EP standards.  It would also be satisfactory to simply publish this information along with the technical specifications for each camera.  Until this happens, we cannot expect the raw image editing software to use it.


If you are interested in more information about metadata and the ISO TIFF/EP standard, note that among the tags related to spectral sensitivity are "SpectralSensitivity" and "SpatialFrequencyResponse".

   References:

Raw Image Processing Software               URL              MSRP
  ACR - Adobe Camera RAW (CS2)              Adobe.com        $1100
  Capture One - Phase One                   PhaseOne.com     $500
  Bibble Pro - Bibble Labs                  BibbleLabs.com   $130
  RawShooter Premium - Pixmantec            Pixmantec.com    $100
  NC - Nikon Capture                        NikonMall.com    $100
  DPP - Canon Digital Photo Professional    Canon.jp         $0
  Kodak Pro DCS Photo Desk - Kodak          Kodak.com        $0
  RAW File Converter EX - Fuji              Fujifilm.com     $0
  Image Data Converter - Sony               Sony.com         $0
  DCRAW - Dave Coffin: Open Source          CyberCom.net     $0
Prices shown are approximate MSRP for a single user license.   Most camera vendors include the respective software with appropriate camera purchases.   Upgrade prices and terms will vary.

I hope you also gained some new insight from this article.  If you have any comments or suggestions, I would welcome your input.  Please send me an Email.


Rags Gardner
Rags Int., Inc.
204 Trailwood Drive
Euless, TX 76039
(817) 267-2554
www.rags-int-inc.com
December 1, 2005
