The question of film versus digital image quality has been debated many times and I expect that the debate will continue for many more decades. This article is merely an opportunity for me to express my own opinions on the subject. As they say, “where fools rush in”, so here goes…
The premise of most arguments is resolution, that is, the ability to discriminate small details in the image. Film aficionados assume that film emulsions mean that film images are analogue while digital images consist of only discrete values. Neither assumption is totally correct.
Let’s start with an elementary review of a digital image format.
This illustration is not to scale. Each pixel can only hold one tone. So the grid shown is conceptual only.
Megapixels are the most commonly used metric of digital quality, but they do not tell the whole story. The megapixel count gives us the number of discrete elements in an image but it does not tell us anything about the size of the image. For that we need pixels per inch (PPI), or the pixel density.
The next metric is the bit-depth. This tells us how many discrete tonal differences can be represented in a pixel. Of course each pixel represents only one of these tones. More bits per pixel results in more tones and larger file sizes (megabytes). Don’t confuse bit-depth with dynamic range. Dynamic range defines the difference between the darkest and lightest tones captured from a scene. Bit-depth defines the granularity of the individual tones in between. If dynamic range were a ladder, bit-depth would define the number of steps on the ladder, not its height. Dynamic range would describe the height of the ladder.
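As a rough illustration of why the pixel count alone says nothing about size, the sketch below divides pixel dimensions by a pixel density to get a print size. The 3000 x 2000 image and the three PPI values are hypothetical examples, not figures from any particular camera.

```python
def megapixels(width_px, height_px):
    """Total pixel count expressed in millions."""
    return width_px * height_px / 1_000_000


def print_size_inches(width_px, height_px, ppi):
    """Physical size (width, height) in inches at a given pixel density."""
    return width_px / ppi, height_px / ppi


# Hypothetical 3000 x 2000 pixel image.
w, h = 3000, 2000
print(f"{megapixels(w, h):.1f} megapixels")
for ppi in (300, 150, 72):
    pw, ph = print_size_inches(w, h, ppi)
    print(f"{ppi:>3} PPI -> {pw:.1f} x {ph:.1f} inches")
```

The same six megapixels cover a small print at 300 PPI and a much larger one at 72 PPI; only the density of detail changes.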
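The relationship between bit-depth, tone count, and file size can be sketched in a few lines. The image dimensions and the three-channel assumption are illustrative only, and the size is for uncompressed data with no file overhead.

```python
def tones(bits_per_channel):
    """Number of discrete tonal steps a channel can represent."""
    return 2 ** bits_per_channel


def uncompressed_megabytes(width_px, height_px, bits_per_channel, channels=3):
    """Raw data size in megabytes (10^6 bytes), ignoring headers and compression."""
    total_bits = width_px * height_px * channels * bits_per_channel
    return total_bits / 8 / 1_000_000


# Hypothetical 3000 x 2000 pixel, three-channel image.
for bits in (8, 12, 16):
    size = uncompressed_megabytes(3000, 2000, bits)
    print(f"{bits:>2} bits: {tones(bits):>6} tones per channel, {size:.1f} MB")
```

More bits add steps to the ladder and bytes to the file; they do not make the ladder any taller.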
Before the image is stored on some removable media it may be compressed. So the file size is not a reliable indicator of the image quality.
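A small experiment shows why file size is a poor proxy for quality. Here the standard-library zlib compressor stands in for whatever compression a camera might apply (an assumption; cameras typically use other schemes), and the two byte strings are synthetic stand-ins for a featureless image and a very busy one.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after lossless zlib compression."""
    return len(zlib.compress(data))


n = 100_000  # same number of 8-bit samples in both cases

# A featureless "image": every sample is the same mid-gray tone.
flat = bytes([128]) * n

# A very busy "image": pseudo-random tones (seeded for repeatability).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(n))

print("flat :", compressed_size(flat), "bytes")
print("noisy:", compressed_size(noisy), "bytes")
```

Both inputs hold the same number of samples at the same bit-depth, yet the compressed sizes differ enormously.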
Digital sensors consist of many sensor sites that capture information about the light falling on them. The light consists of only two metrics, the wavelength, which determines the color, and the intensity, which determines the brightness. Together, these represent a tone in the visible spectrum.
A color sensor site consists of one or more sensor receptors also called sensor elements. Each sensor receptor is a unique photodiode. Photodiodes cannot record the wavelength (color) so they are covered with an optical filter that reduces the sensitivity to a single portion of the spectrum. Each receptor records the intensity for a different range of wavelengths. The physical arrangement of these receptors and the filters used accommodate different designs. There are also differences in the electronics such as CMOS and CCD, but they are not important at this point in the discussion.
The most prevalent design for consumer cameras is the Bayer layout of RGGB sensor receptors. This has redundant green receptors. Since the eye is most sensitive to subtle tonal differences in green and green occupies a fairly wide portion of the spectrum, this redundancy is not a bad thing. And, the sensor site results in a square pixel. Sony has a modification to this layout which substitutes a cyan filter for one of the green filters. Canon has a modification which uses cyan, yellow, green, and magenta filters.
The Foveon site is arranged in layers. The top layer captures the blue wavelengths. The middle layer captures the green wavelengths. The bottom layer captures the red wavelengths. This layout results in a square pixel. In theory, the site size can be much smaller, but in practice it is usually larger. This design is also less sensitive to artifacts from the spread function of light as it passes through the lens to the image plane. Click here to see more on the Foveon layout.
Fuji uses hexagonal sensor receptors arranged similar to the Bayer pattern but rotated 45°. Fuji has not published a clear diagram of the physical layout, so this representation cannot be verified. Their newest design (CCD SR) includes an additional receptor dedicated to capturing highlights. This receptor is not color filtered and has lower sensitivity.
A tri-linear arrangement is typically used in scanning devices and scanning camera backs. A row of photo sensors is moved across the image plane or the target image is moved across a row of sensors. And there are sensor-shifted configurations where the entire sensor array moves while multiple image samples are taken. Finally, it is possible to hold the image and the sensor stationary while multiple exposures are taken with different colored filters. These are rather slow requiring significantly longer exposure times but they can produce extremely high resolution and accuracy.
Then there are the beam-splitting configurations with prisms and multiple sensors. They were popular in early video cameras. These can present some unique optical challenges.
To my knowledge no one has designed a digital camera sensor to capture simply the wavelength and intensity, similar to capturing the wavelength and volume of an audio signal. Such a device already exists. It is called a spectrometer. It is used in astronomy to examine the chemical composition of distant objects in space. Spectrometers are also used to measure the light reflected off papers or dyes, and for film and video display measurements.
These alternate designs are interesting and they all have their merits and de-merits. There are also various optical filters that may be in front of the sensors. Some examples are low-pass, infrared blocking, and micro lenses to refocus or intentionally blur the light rays. My objective is not to evaluate these designs but to set the stage for the steps involved in the formation of the actual image pixels.
The formation of an image is the transformation of the data recorded by the individual receptors into something that accurately represents the full spectrum of light and can be mapped to an image pixel. Here the terminology can be very confusing.
The most common term used is demosaicing. It would be better if demosaicing were only used to refer to algorithms that increase the apparent resolution of low-resolution sensors. Essentially, that is what it does. Demosaicing is a form of interpolation that uses color information from neighboring sensor receptors to supplement the color information at a receptor being processed into a pixel.
Some articles refer to chromatic conversions. Chromaticity is the science of mapping the spectrum of light in terms of hue and saturation. It is a descriptive term. It is not an accurate measure of the visual response to colors. HSB, HSV, and HSL are chromatic encoding schemes. The CIE "XYZ" and "xyY" encoding schemes are examples of chromatic based values. They are based on transformations of spectral values.
Colorimetric conversion is a useful term, but it unfortunately has two meanings. One relates to the measurement of tones with light sensitive meters and the other relates tones to the human visual response. More room for confusion. In the first steps of image processing, the visual response is not important, the sensor response is. In the final steps of image formation the chromatic values need to be mapped to visual perception. The CIE "Lab" and "Luv" encoding schemes are examples of colorimetric values. RGB values that have been mapped to a color space are colorimetric values.
The dictionary defines mosaic as (abbreviated, verb): to make a picture with small bits of colored tiles. So, to demosaic would be to remove the tiles and replace them with tones. The fundamental assumption in this process is that each sensor (photo diode) is to be mapped to a separate image pixel. Since each sensor represents only one color channel or tile, at least two neighboring sensors must be examined to derive the missing colors. Any way you slice it, this is interpolation.
Since the neighboring sensors may not have seen the same light that the currently sampled sensor saw, color and resolution errors can be introduced. Increasing the number of neighborhood samples does not fix the problem. In fact, it softens the image further. Demosaicing algorithms may also be designed to further increase the apparent resolution by using two sensors to produce three image pixels.
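To make the interpolation concrete, here is a minimal sketch of one simple demosaicing approach: each receptor keeps its own measured channel and borrows the two missing channels by averaging same-colored neighbors in the surrounding 3 x 3 block. This is only an illustration under an assumed RGGB layout; real cameras use more elaborate, usually unpublished, algorithms. The test mosaic is a hypothetical neutral gray scene with a hard vertical edge.

```python
def bayer_color(row, col):
    """Filter color at (row, col) for an assumed RGGB layout: R G / G B."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic(mosaic):
    """One RGB triple per receptor; missing channels are neighbor averages.

    Expects a mosaic of at least 2 x 2 receptors.
    """
    rows, cols = len(mosaic), len(mosaic[0])
    image = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Collect every receptor in the 3 x 3 neighborhood, by color.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        color = bayer_color(rr, cc)
                        sums[color] += mosaic[rr][cc]
                        counts[color] += 1
            own = bayer_color(r, c)
            # Keep the measured channel; interpolate the other two.
            pixel = tuple(
                mosaic[r][c] if ch == own else sums[ch] / counts[ch]
                for ch in "RGB"
            )
            out_row.append(pixel)
        image.append(out_row)
    return image


# Neutral gray scene with a hard edge: dark on the left, bright on the right.
edge = [[10, 10, 200, 200] for _ in range(4)]
for pixel in demosaic(edge)[0]:
    print(pixel)
```

The scene contains no color at all, yet the pixels next to the edge come out tinted because their borrowed channels were measured on the other side of the edge. Widening the neighborhood would spread the error over more pixels rather than remove it.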
If each site is considered to be a full tonal collection of sensors (three or more), demosaicing is unnecessary. But there will be only one image pixel from each site, not one from each photo diode. The resolution is now a function of the site size, not the photo diode size. It is incorrect to say that all digital cameras need demosaicing.
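The alternative described here can be sketched just as briefly. Assuming an RGGB layout, each 2 x 2 group of receptors is treated as one site and yields exactly one pixel, with the two green readings averaged. The gray edge mosaic is a hypothetical test scene.

```python
def bin_sites(mosaic):
    """Combine each 2 x 2 RGGB site into a single RGB pixel.

    No values are borrowed from neighboring sites, so nothing is interpolated,
    but the output has half the width and half the height of the receptor grid.
    """
    rows, cols = len(mosaic), len(mosaic[0])
    image = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            red = mosaic[r][c]
            green = (mosaic[r][c + 1] + mosaic[r + 1][c]) / 2
            blue = mosaic[r + 1][c + 1]
            out_row.append((red, green, blue))
        image.append(out_row)
    return image


# Neutral gray scene with a hard edge aligned to the site boundary.
edge = [[10, 10, 200, 200] for _ in range(4)]
for row in bin_sites(edge):
    print(row)
```

Sixteen receptors become four pixels. In this example, where the edge happens to fall on a site boundary, each pixel stays neutral; the price is that the resolution is now set by the site size.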
Most current digital cameras employ demosaicing. This will often soften the image and produce other artifacts. These interpolations are useful in your mobile phone camera, but may be counter-productive in your high-resolution SLR camera.
The sensor receptors have recorded the intensity of sampled wavelengths. These each represent a rather broad range of tones. They are measurements of light intensity, not photographic RGB numbers. They are subject to the sensitivity curves of the individual receptor circuits and their associated filters. In addition, the different wavelengths of the spectrum react differently to the intensity of the light.
These two variables, wavelength range and intensity, are taken from each sensor channel and converted into three new variables, the RGB image channels. These more accurately represent a specific hue, the corresponding saturation, and the brilliance of an image pixel. These RGB values also enable us to reconstruct tones such as yellow or orange, which were not directly recorded.
The most accurate description would be assigning RGB values based on the spectral sensitivity of the sensors and the recorded luminance at each site. The mathematical formulas for this are typically known as tristimulus functions. The ratios between the separate luminance values under different filters at the same site are as important as the actual values.
This is sometimes referred to as the linear space since the linear channel values (voltage) are adjusted to linear spectrographic values. I would prefer to use the terms chromatic space or spectral space. Linear with respect to what? All of these color values are linear with respect to something and non-linear with respect to something else.
The accuracy of these adjustments will affect the tonal fidelity of the image and contribute to or subtract from the sharpness of the image.
This resulting RGB tone is a pixel. Before this step, they are sensor elements, not pixels. The generally accepted and better term for a sensor site is sensor element or "SEL". The term for a single photo diode would be a receptor. An SEL that represents full color must consist of three or more photo diodes therefore three or more channels. The data now consists of chromatic values.
These RGB values do not accurately represent colors as we see them until they are mapped into a color space. This is identified by a color profile. This process maps the brightness and tonal values to the visual response and determines the gamut of the recorded colors. This is where gamma correction occurs. Mathematically, this is another conversion. Now the color values have to represent something that can be reproduced within the gamut of the target color space. Since the gamut of the color space represents the response curves of human vision, this is often referenced as a gamma corrected or colorimetric color space.
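As an example of the gamma correction step, the sketch below applies the sRGB transfer curve to linear channel values. Choosing sRGB is an assumption made for illustration; the target color space is whatever the color profile specifies.

```python
def linear_to_srgb(value):
    """Encode a linear channel value in [0, 1] with the sRGB transfer curve."""
    if value <= 0.0031308:
        return 12.92 * value          # short linear segment near black
    return 1.055 * value ** (1 / 2.4) - 0.055


def to_8bit(value):
    """Scale an encoded value in [0, 1] to an 8-bit code."""
    return round(value * 255)


for linear in (0.0, 0.01, 0.18, 0.5, 1.0):
    encoded = linear_to_srgb(linear)
    print(f"linear {linear:.2f} -> encoded {encoded:.3f} -> code {to_8bit(encoded)}")
```

A linear value of 0.18 (a typical mid-gray reflectance) lands near the middle of the 8-bit scale after encoding, which is the point of the correction: code values are spent more evenly with respect to how brightness is perceived.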
Without understanding these important but different concepts, the terminology can be very confusing. Chromatic adjustments, colorimetric conversion, and demosaicing have unique and different functions that are performed to create the final RGB values. All too often, the term demosaic is used to encompass two or even all three of the steps.
For those who are obsessively interested in the math behind these conversions, I have created a Color Calculator script. It is described and available here: ColorCalculator
The next step is interpolation. This is necessary if the sensor site is not square since image editing software and display devices usually assume square pixels. Otherwise it is optional but sometimes employed by in-camera processing to make larger or smaller images. Whenever and wherever interpolation is performed some data is invented or some data is lost.
The next step is anti-aliasing. There are two factors that contribute to image aliasing. One factor is the spatial sampling of the image data. This is more descriptively referred to as moiré. Another factor is that low resolution images with diagonal or curved edges can appear to look jagged. This form of aliasing is usually much more important in graphic images and text than in photographic images, unless they have been interpolated. Anti-aliasing is usually performed by examining similar elements in neighboring image sites and smoothing the tonal transitions. This has the effect of emulating a higher sampling frequency. The basic process always is to replace some pixels along the edge with a tone between the two contrasting tones that define the edge. Not surprisingly, this can sometimes also lead to soft digital images.
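The basic edge-smoothing idea can be shown on a single row of pixels. This is a deliberately crude sketch: it averages every pixel with its immediate neighbors, whereas a real anti-aliasing filter would be more selective about which pixels it touches.

```python
def smooth_row(row):
    """Replace each pixel with the mean of itself and its immediate neighbors."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - 1): i + 2]   # shorter window at the ends
        out.append(sum(window) / len(window))
    return out


# A hard black-to-white edge in 8-bit values.
edge = [0, 0, 0, 255, 255, 255]
print(smooth_row(edge))
```

The two pixels that defined the edge are replaced by in-between tones, which hides jaggedness and also explains the softer look.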
There are several additional important processes that are typically performed on the image data. One is to achieve the correct color balance that matches the light source. Others include sharpening and noise reduction. There are many variations on these. The objective is always to enhance the image quality. It is up to you to evaluate the effectiveness of a manufacturer’s efforts.
With what are known as digital RAW files, the sensor data is simply recorded as normalized voltage values from the individual sensors. The image data is not recorded as RGB tonal values in any given color space. All processing is left to the editor software that opens the RAW file. There is no single industry standard for the format or structure of a digital RAW file. However, most of the popular formats are based on the ISO TIFF/EP and EXIF standards.
Recently (April 2005) there has been considerable and heated debate regarding these proprietary formats. Click here to read more about the Raw Standards. Click here to read more about the Digital Sensor Pipeline.
Film records a latent image in clumps (grain) of silver crystals and dyes on layers of emulsions. Each clump is analogous to a pixel and each dye layer is analogous to the bit depth. Some films have smaller grains and/or more dye layers than others.
The first metric for this discussion is pixel size at the sensor compared to film grain size. Film grain size is hard to quantify. The silver halide crystals are as small as 2 um (microns or micrometers). But graininess is determined by random clumps of these crystals and dyes. The sizes quoted for these are in the range of 6-8 um at the smallest. This varies with different films and ISO speeds of course. The ISO 800 average would be closer to 17 um or larger. Anyway, this is primarily where the quote of 11-13 million pixels for digital to film equality comes from. The D100 pixel size is 7.6 um. But, the sensor itself is only 2/3 the size of 35mm film. If it were full 35mm format the megapixel rating would be about 14.
Converting these metrics to pixels per inch (PPI) provides some more insight. A top quality film negative with 6 um grain would be 4233 PPI. The Nikon D100 sensor delivers 3333 PPI. Film scanners range from 1200 to 4000 PPI. Print scanners typically range from 300 to 1200 PPI. The point is that the density of the image details is closer than one would think.
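The conversion is simply the number of micrometers in an inch (25,400) divided by the element size. A short sketch, using the grain and pixel sizes mentioned in the text:

```python
MICROMETERS_PER_INCH = 25_400


def ppi_from_pitch_um(pitch_um):
    """Elements per inch for a given element size in micrometers."""
    return MICROMETERS_PER_INCH / pitch_um


for label, pitch in (("6 um film grain", 6.0),
                     ("7.6 um sensor pixel", 7.6),
                     ("17 um ISO 800 grain", 17.0)):
    print(f"{label}: {ppi_from_pitch_um(pitch):.0f} PPI")
```

A 7.6 um pixel works out to roughly 3340 PPI, in line with the rounded figure quoted above.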
Just for grins, the retinal sensors in the eyeball are about 5 um in size. There are 100 million of them, but only about 100,000 are sensitive to color. Of course, this image is never magnified, the sensor is spherical instead of flat, and focusing and color recognition are concentrated in a circular area only 1.5 mm in diameter. No comparison is really possible except to note that the equivalent pixel size is only slightly smaller than either film or digital.
Another metric is MTF. This measures the ability to detect line pairs (lp/mm) in an image. It is very dependent on the contrast available in the image. It is most often used to compare one lens against another or one film against another. For reference, the human eye has been quoted at 6 lp/mm at 2.5 cm and 1 lp/mm at 3.5 m. Lenses and film generally start the high-end measurements at 40 lp/mm. The film measurements generally quote the best numbers at contrast ratios of 1000:1 (unrealistic). A digital sensor’s MTF is limited by the size of two adjacent pixels. For a D100 this would be 66 lp/mm. Whether or not the adjacent pixels would be able to detect the lines would be a function of the contrast and pixel sensitivity. Once again, the metrics are not differing by orders of magnitude. And the MTF of the image is ultimately a product of the MTFs of each of the elements in the system including the lens, the printer, and the ultimate enlargement. The important point is that a high-density digital sensor should not be the limiting factor in image resolution.
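Two of the statements above are easy to check numerically: the sensor limit of one line pair per two pixels, and the system MTF as a product of component MTFs. The component contrast values in the second part are made-up numbers for illustration, not measurements.

```python
def sensor_limit_lp_per_mm(pitch_um):
    """One line pair needs two adjacent pixels, so the limit is 1 / (2 * pitch)."""
    pitch_mm = pitch_um / 1000
    return 1 / (2 * pitch_mm)


def system_mtf(*component_mtfs):
    """Contrast passed by the whole chain at one spatial frequency."""
    product = 1.0
    for mtf in component_mtfs:
        product *= mtf
    return product


print(f"7.6 um pixels: {sensor_limit_lp_per_mm(7.6):.0f} lp/mm")

# Hypothetical contrast figures at a single frequency: lens, sensor, printer.
print(f"system MTF: {system_mtf(0.8, 0.7, 0.9):.3f}")
```

Because every factor is at most 1, the system figure can never exceed that of its weakest element.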
The next important metric would be the dynamic range or the ability to detect and discriminate contrast. The eye can resolve about 7 to 10 stops of light (contrast) at a single glance. But a sun and shade daylight scene can easily contain 15 stops of light. So even with the eye, dark and light tones can wind up compressed with loss of detail. The eye can adjust from darkness to a bright scene in about 5 minutes. It takes up to 30 minutes to fully adjust from strong light to darkness. We do the same thing with film or digital images by varying the exposure. And we see the same artifacts, blown highlights or loss of shadow detail.
This is a contrived image intended only to illustrate the concept of dynamic range. It is possible to see the moon and the sun in the same sky, because they fit within the dynamic range of our vision. It is impossible to see the stars behind the moon in daylight because the sun has overwhelmed any contrast. The earth bound objects we see are illuminated by the sun. This city shoreline had an exposure value of approximately EV +15. The starry sky would be approximately EV -6. The moon itself is about EV+14, but it would only illuminate this shoreline at about EV -2. The direct noon sun itself is at least EV +22. (Please don't take your digital camera and point it directly at the sun, you can actually damage the sensor.) The dynamic range of this scene (if it could exist) would be about EV 28. No single film, sensor, or eyeball can take it all in at one view.
With prints, film, and images we can only measure the contrast range of the medium. Digitally, white is 255 and black is 0 (8-bit). How faithfully these are reproduced on paper is a printing matter. So, the dynamic range describes how well the media can capture extreme tonal ranges in a scene. To quantify this we need accurate measurements of the scene and accurate measurements of the resulting image. Ansel Adams’s Zone System assumes a maximum range of eleven stops.
The dynamic range of a device is the difference (contrast) between the minimum and maximum signal it can faithfully record. It is sometimes expressed as a ratio between the minimum and maximum radiance (decibels). For reference, an exposure stop difference, one EV, is approximately 6 dB (decibels). Image density is frequently expressed as brightness measured with a densitometer on a logarithmic scale of 0 to 4. A density of 3.0 is 10 times greater intensity than a density of 2.0. A contrast range of 100:1 is a density range of 2.0, and 1000:1 is a range of 3.0.
Expressed as the density range the numbers typically quoted are: prints 1.7-2.0, negative film 2.4-2.8, slide film 3.2-3.4, digital (8 bit) 2.4 and digital (12 bit) 3.6. For photographic discussions the dynamic range is usually expressed as zones or exposure stops. Expressed as exposure stops typical quoted values are: slides 5-6, negative film 8-10, black and white film 15, and digital at 8-12. There seem to be some discrepancies in these quotes.
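The three scales are just different logarithms of the same contrast ratio. The sketch below uses the 20 x log10 form of the decibel, which is an assumption chosen because it reproduces the 6 dB per stop figure quoted above.

```python
import math


def stops(ratio):
    """Exposure stops (EV): each stop doubles the light."""
    return math.log2(ratio)


def density(ratio):
    """Density range: base-10 logarithm of the contrast ratio."""
    return math.log10(ratio)


def decibels(ratio):
    """Decibels using the 20 * log10 convention (assumed here)."""
    return 20 * math.log10(ratio)


for ratio in (2, 100, 1000):
    print(f"{ratio:>4}:1 -> {stops(ratio):5.2f} stops, "
          f"density {density(ratio):.2f}, {decibels(ratio):5.1f} dB")
```

A 1000:1 contrast range is a density range of 3.0 and just under ten stops.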
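Converting the quoted density ranges to stops makes the discrepancies visible. One stop is a factor of two, which is a density step of log10(2), about 0.3. The sketch also shows where the 2.4 and 3.6 figures for 8-bit and 12-bit data come from.

```python
import math

DENSITY_PER_STOP = math.log10(2)   # about 0.301


def density_to_stops(density_range):
    """Convert a density range to exposure stops."""
    return density_range / DENSITY_PER_STOP


def bits_to_density(bits):
    """Density range spanned by the code values of an n-bit linear encoding."""
    return bits * DENSITY_PER_STOP


quoted = {
    "prints": (1.7, 2.0),
    "negative film": (2.4, 2.8),
    "slide film": (3.2, 3.4),
}
for medium, (low, high) in quoted.items():
    print(f"{medium}: {density_to_stops(low):.1f} to "
          f"{density_to_stops(high):.1f} stops")

for bits in (8, 12):
    print(f"{bits}-bit: density {bits_to_density(bits):.1f}")
```

The slide film density figures convert to roughly 10.6 to 11.3 stops, well away from the 5 to 6 stops quoted for slides, so the two sets of numbers cannot be describing the same thing.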
Whatever else the numbers provide, they bring a most welcome illusion of certainty.
It is easy to see the dynamic range in a digital histogram. But all you will see is the captured data, not what the scene may have contained that could not be captured. The following two pseudo histograms illustrate this.
The high contrast scene was flash with specular nearby subjects in a large dark room. Lots of deep shadows in the background and bright highlights from the flash. The scene has more dynamic range than the camera could capture. The difference between the deep shadows and the flash reflections was about 17 stops. This image cannot be fixed.
The low contrast scene was a landscape with no bright sky and no real shadows. The camera was able to capture a wider dynamic range than the scene contained. This was about nine stops. This image is rather easy to fix by increasing the contrast.
Some folks claim that negative film has both greater dynamic range and more latitude than digital. I disagree, especially if you shoot in the RAW formats. When you open a RAW image you typically have an option to decrease exposure by two stops or increase it by four stops. This is similar to push/pull processing during negative development. But there are other significant differences. One is that once the film has been developed, it is cooked. There is no opportunity to try the development again, although of course you can adjust the contrast during printing. With digital RAW that is not the case unless you overwrite the RAW image. The second is that with digital images blown highlights cannot be recovered while with film lost shadow detail cannot be recovered. They both suffer similar effects, just at different ends of the scale. With film it is common to tweak the contrast and tonal balance during the print processing. With digital we perform the same steps with editing programs with much more control and ease.
The electronic design and photon sensitivity of the individual sensors will affect the ability to faithfully capture tonal information. Similar considerations exist for film emulsions. Naturally folks can have favorite films and there are plenty of pros and cons in the debate between CCD and CMOS technologies. An objective measurement between the media types is very difficult simply because the comparative objective data is not generally available. So we are forced to rely on subjective evaluations.
The analog properties of the electronic sensor or film chemicals determine the dynamic range. For digital sensors the bit-depth of the sensor determines how much of this is captured and how many unique tones can be preserved. In black and white terms an 8-bit sensor can only record 256 tones. A 12-bit sensor can record 4,096 tones. So there are advantages to higher bit-depths, but they are not related to dynamic range. Clipping occurs first at the analog stage before any limits imposed by digitizing the data. Even the human eye has limits to its dynamic range, night vision versus daylight vision.
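A toy model of the digitizing step separates the two effects. The analog signal is first clipped at the sensor's full-scale level, and only then divided into code values. The full-scale value of 1.0 and the sample signals are arbitrary illustrative numbers.

```python
def digitize(signal, full_scale, bits):
    """Clip an analog signal to [0, full_scale], then quantize to an integer code."""
    clipped = min(max(signal, 0.0), full_scale)   # analog clipping comes first
    levels = 2 ** bits
    return min(int(clipped / full_scale * levels), levels - 1)


# Two nearly identical mid-tones: separable only at the higher bit-depth.
for bits in (8, 12):
    print(bits, "bits:", digitize(0.500, 1.0, bits), digitize(0.501, 1.0, bits))

# A signal beyond full scale is clipped to the top code at any bit-depth.
for bits in (8, 12):
    print(bits, "bits, overexposed:", digitize(1.7, 1.0, bits))
```

Extra bits separate tones that would otherwise merge, but they do nothing for a highlight that has already exceeded what the analog stage can hold.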
If the latitude is the tolerance to exposure errors, the difference between negative film and digital is in the highlights and shadows. Negative film is less tolerant to under-exposure in the shadows while slide film and electronic sensors are less tolerant to over-exposure in the highlights. Slide film and digital sensors both record positive images. This is an artifact of light properties, not simply something inherent in digital sensors. In both cases (film and digital) this clipping is an analog property. And tone compression at the opposite end of the scale is due to the fact that each exposure value is half of the one above it. As you approach the density of black the value approaches zero without ever really getting there (mathematically).
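The halving can be counted directly. Assuming a linear encoding, the brightest stop occupies the top half of the code values, the next stop half of what remains, and so on.

```python
def levels_per_stop(bits, stops):
    """Code values available to each stop of a linear encoding, brightest first."""
    top = 2 ** bits
    result = []
    for _ in range(stops):
        half = top // 2
        result.append(top - half)   # values between half and top
        top = half
    return result


print("12-bit:", levels_per_stop(12, 6))
print(" 8-bit:", levels_per_stop(8, 6))
```

By the sixth stop down, an 8-bit linear encoding has only a handful of values left, which is the tone compression in the shadows described above.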
Slide film has very little latitude primarily because there is typically no printing step. The development processing is the only variable and you only get one shot at it. With negative film and digital images, there is considerably more latitude because you can also manipulate the image during post-processing. In fact, if you shoot RAW images you have an opportunity to re-process the development phase. Once developed, the film image is "cooked".
Film suffers from reciprocity failures at very short or long exposure times. Faster film speeds suffer from color de-saturation and noise. Film manufacturers usually publish recommended exposure changes to compensate for reciprocity failures. They usually ignore short shutter speed corrections or at most comment that you may experience color shifts.
Digital sensors suffer similar reciprocity failures, again at both extremely short and very long shutter speeds. At extremely short shutter speeds in very bright light, the sensor wells may “splash” electrons onto neighboring sites, which then count them inaccurately. The results are color shifts and noise. At very long exposures in very low light, dark current noise becomes a factor, primarily from heat buildup on the chip.
Do not put too much emphasis on the histogram. Both film and digital can be subject to reciprocity failure. The extreme left and right sides of the histogram are not always precisely accurate.
Common measurements for film include the resolving power (lp/mm) at maximum and average contrast levels, spectral density, and color density. Digital cameras offer only pixel counts and bit-depths. The pixel count is not a definitive measure of resolution. The bit-depth does not measure spectral or color response, only the number of tones that can be recorded. When this kind of data is published for high-end digital cameras better comparisons will be possible.
There is a point where higher resolution will not improve image quality. This is known as the diffraction limit. Diffraction is an inescapable property of light and optics related to the size of the aperture (lens opening). As light passes through this opening it bends slightly at the edges. The smaller the opening, the greater the effect. This causes an otherwise sharply focused image to become blurred.
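The size of the effect can be estimated from two standard optics formulas: the Airy disk diameter, 2.44 x wavelength x f-number, and the diffraction cutoff frequency, 1 / (wavelength x f-number). The 550 nm wavelength (green light) is an assumption, and these are ideal-lens figures, not the values used for the chart.

```python
def airy_disk_diameter_um(f_number, wavelength_nm=550):
    """Diameter of the diffraction blur spot (to the first dark ring), in micrometers."""
    wavelength_um = wavelength_nm / 1000
    return 2.44 * wavelength_um * f_number


def diffraction_cutoff_lp_per_mm(f_number, wavelength_nm=550):
    """Spatial frequency at which an ideal lens passes no contrast at all."""
    wavelength_mm = wavelength_nm / 1_000_000
    return 1 / (wavelength_mm * f_number)


for n in (2.8, 5.6, 8, 11, 16, 22):
    print(f"f/{n:<4}: blur spot {airy_disk_diameter_um(n):5.1f} um, "
          f"cutoff {diffraction_cutoff_lp_per_mm(n):4.0f} lp/mm")
```

At f/8 the blur spot is already larger than a 7.6 um pixel, and by f/22 it covers several pixels.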
I constructed the following chart to illustrate the resolution characteristics and the limiting factors. The vertical scale is not linear above 120 to fit the range of the diffraction limit curve. I have expressed these blur circles as resolution in line pairs per millimeter (lp/mm). The horizontal axis shows the aperture settings.
I have also included the curve of a theoretical lens showing resolution at a constant (max) contrast as the f-stop is changed. Unfortunately most MTF charts for lenses show constant resolutions at varying contrast levels but only wide open (the lowest f-number) and at f/8. For this comparison, the resolution as a function of the aperture is desired.
The band of resolution factors for film and digital sensors is also shown though this does not change with aperture. A 6um digital pixel is most similar to ISO 50-100 film and a 12um pixel is more similar to some ISO 800+ films. Most medium format digital backs use 9um pixel sizes.
The maximum resolution of the system is a function of the limits of the individual components. There are some that believe this can be mathematically calculated via root mean square (RMS) formulas. I do not agree since this yields an average weighted value rather than a limit biased value. In other words, the system resolution cannot be better than the lowest component resolution.
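For readers who want to compare the two views numerically, here is a sketch. It assumes the root-sum-of-squares rule is the commonly quoted reciprocal form, 1/R^2 = 1/R1^2 + 1/R2^2 + ..., which may not be the exact formula others have in mind. The component resolutions are hypothetical.

```python
import math


def reciprocal_squares(*resolutions):
    """Combined resolution from 1/R^2 = sum(1/Ri^2), in the units of the inputs."""
    return 1 / math.sqrt(sum(1 / r ** 2 for r in resolutions))


def limiting_component(*resolutions):
    """The limit view: the system is no better than its weakest component."""
    return min(resolutions)


# Hypothetical figures in lp/mm: a lens at 80 and a sensor at 66.
lens, sensor = 80, 66
print(f"reciprocal squares: {reciprocal_squares(lens, sensor):.1f} lp/mm")
print(f"limiting component: {limiting_component(lens, sensor)} lp/mm")
```

In this form the combined figure always comes out below the weakest component, so it does not violate the limit stated above; the two approaches differ in how far below that limit they place the system.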
The bottom line is that the sharpest possible image will be in an area of this graph that falls under the limiting factors. This shows clearly why the “sweet spot” is said to be between f/5.6 and f/11.
The lines at the bottom of the chart show the CoC for various film and sensor formats again as resolution (lp/mm). These are not physical limits as with the previous metrics but subjective limits based on image resolution objectives for depth of field. This clearly demonstrates why larger image formats yield sharper images at higher f-stops even though the diffraction limits are the same and the lens limits are similar. Thus the related guideline that DX sensors are diffraction limited at f/16 while 35mm formats are diffraction limited at f/22.
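The guideline can be approximated by asking at which f-number the diffraction blur spot grows as large as the circle of confusion. The sketch assumes CoC values of 0.020 mm for DX and 0.030 mm for 35mm, which are commonly used figures rather than numbers taken from the chart, and a wavelength of 550 nm.

```python
def diffraction_limited_f_number(coc_mm, wavelength_nm=550):
    """F-number at which the Airy disk diameter (2.44 * wavelength * N) equals the CoC."""
    wavelength_mm = wavelength_nm / 1_000_000
    return coc_mm / (2.44 * wavelength_mm)


# Assumed circle-of-confusion values per format, in millimeters.
for fmt, coc in (("DX", 0.020), ("35mm", 0.030)):
    print(f"{fmt}: about f/{diffraction_limited_f_number(coc):.0f}")
```

Under these assumptions the result is about f/15 for DX and about f/22 for 35mm, close to the f/16 and f/22 rule of thumb.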
If you want to read more about the Circle of Confusion and diffraction limits click here.
To get an objective answer to the film versus digital question you need to measure a specific film against a specific digital sensor. In most cases the answers will be so close that they are insignificant. There are many other factors including the lens and optical properties that will make the difference in the final judgement of the image quality. Thus, my firm belief is that the debate is over and the race is a tie.
Smaller pixels will be possible in the future. Research and applications for nano-technology are already in process. Diffraction limits on the other hand are properties of light. Any breakthrough will have to be in the field of optics. It will have an impact as significant as the invention of the telescope. By that time we will be on the frontier of pico-technology. Peta pixels will be in vogue and we will store images in petafiles. In the meantime, there is little practical advantage in smaller pixels for serious photography. There are practical advantages for larger image formats.
Smaller sensor formats do have advantages for journalism, sports, and wildlife photography. Larger sensor formats have advantages for portrait, landscape, and artistic photography.
If the debate over image quality is over, that leaves economics and utility functions to be evaluated. The camera costs are just as variable in either media. You get what you pay for. Some attempt to compare the cost of film to the costs of a computer and software. One is capital equipment and one is supplies. Some try to compare lab costs to the user’s time investment. In fact either media can be processed by a lab or by the photographer. If you want to shoot digital but don’t want to invest in the equipment, training, or time to do the digital processing, just shoot JPG and take your images directly to your local super store.
In any comparison of utility, ease of use or degree of control, digital wins hands down. Press a button and those embarrassing shots disappear. You get instant feedback to see if you captured the scene that you wanted and if the quality is at least acceptable. You don’t have to change the film if the lighting has changed and you need more speed or a different white balance. If you want point and shoot ease, just shoot JPG images and let the camera do all the lab processing. If you want total control over the image processing, just shoot in RAW mode and do the lab work yourself with a digital editor.
A single digital media card can hold the equivalent of ten or more rolls of film. The image can be easily used for email to friends and family, for high quality small prints at home or a local lab, and for substantial enlargements at a quality lab. With film any dust in the image is purged when you advance the frame or change the plate. With digital you need to clean the sensor occasionally. To some, this is brain surgery.
If you are unhappy with the colors from film, you just try another film. There are literally thousands to choose from with different dyes and dye sensitivities, even different numbers of layers. If you are unhappy with the colors from digital you can try a different sensor or adjust them in Photoshop. Photoshop CS has taken a step in this direction already with the new lens blur, flare, and color filters. All we need is a PS filter that emulates the color curves of Velvia film. A different sensor means buying another camera.
Generally speaking, film is much more sensitive to ultra violet light than digital sensors. That’s one reason we put a UV filter over the lens. Silicon is much more sensitive to infrared light than film. Usually there is an IR filter just in front of the sensors. Customized digital cameras are available with this filter removed.
Film is still usually better for very long exposures in very low light. Digital sensors suffer from heat buildup under these conditions. In either case, reciprocity is an issue that needs to be addressed for this kind of shooting.
An image can be judged on its artistic or technical merits. There are three broad categories of technical quality. These are sharpness, fidelity, and noise. Sharpness is a subjective criterion but it can be objectively measured in terms of contrast and resolution. Fidelity is an assessment of faithful recording of luminosity and color. It can be evaluated with metrics such as spectral and color curves, and dynamic or tonal range. Noise is a broad category of artifacts that were not in the original scene but got recorded in the resulting image. Film grain, electrical noise, reciprocity failures, and lens artifacts fall into this category.
It is the emotional impact and artistic quality that sells an image.
That is just my two cents. I hope you also gained some new insight from this article. If you have any comments, or suggestions, I would welcome your input. Please send me an Email
Rags Int., Inc.
204 Trailwood Drive
Euless, TX 76039
September 14, 2004
This page last updated on: Wednesday October 03 2007