In order to take light coming into a lens and turn it into a video file on a flash card, a camera works through a series of steps in sequence, once for every frame it records.
At the highest level the steps are:
Sensor – Light passes through the lens (not dealt with here, a whole voodoo area of its own) and strikes the sensor, a matrix of light-sensitive cells.
Processor – An image processor takes the raw information from the sensor and turns it into an image file, by eliminating error, reducing noise and so on. Each file is a frame of the video.
Storage – The camera writes a stream of frames (several megabytes’ worth every second) to a single video file on the onboard flash card.
There are a number of key challenges around this that are familiar to any digital technology. Film is an analogue technology limited only by atoms; when light strikes film, the film reacts and stores colour information broadly across the spectrum of light. Digital technology is focused on capturing information in buckets; the more buckets there are, and the smaller and closer together they can be put, the smoother the result looks. You can see this with printed pictures. Each one is made up of lots of tiny dots that, from a distance, look like a smooth whole. Early printers produced quite jaggy pictures because the dots were quite large. More recent printers are able to place tinier and tinier dots, closer and closer together, until we can now print out pictures that look indistinguishable from developed film pictures.
Digital cameras work the same way, just in the opposite direction. Early cameras were low resolution and looked jaggy; recent ones have millions more image capture sites within the same size sensor, giving much, much higher resolution images.
The next challenge is how to capture not only which cells are receiving light, but also what levels of luminance (the intensity of the light) and chrominance (its colour and shade) are being received.
The final challenge is the quantity of information generated. More image-detecting cells, better colour information and higher frame rates all create larger volumes of data that need to be processed and written to storage.
As with all things digital, things keep getting smaller and faster, but bottlenecks still arise and need solutions to work around them. This article, then, is an attempt to explain, from a filmmaker’s perspective, what the various digital video terms that are bandied around actually mean, why some cameras are better than others and what the
A sensor comprises a matrix of individual photoreceptors (in the case of HD video, around 1920×1080 of them), each one being a tiny site that generates an electric charge when it is struck by light. The level of charge generated varies depending upon the intensity of the light, so as well as detecting the presence of light, it is also detecting how much light there is (luminance). The sensor looks at the amount of electricity coming out of each receptor and writes it all down as a matrix of black, grey or white dots. At the most basic level the output from the whole sensor is a high resolution black and white picture.
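As a rough illustration of that last step, the sketch below turns a charge reading into one of 256 grey levels. The function name, the `full_well` ceiling and the choice of 256 levels are assumptions made for the example, not details of any particular sensor.

```python
def charge_to_grey(charge, full_well=1.0, levels=256):
    """Map a charge reading to an integer grey level (0 = black)."""
    # Clamp the reading: a site cannot report less than nothing, and it
    # saturates once it reaches its (hypothetical) full-well capacity.
    clipped = max(0.0, min(charge, full_well))
    # Scale to the available levels, keeping the top reading in range.
    return min(levels - 1, int(clipped / full_well * levels))


# One row of made-up charge readings, from dark to over-exposed.
row = [0.0, 0.25, 0.5, 1.0, 1.7]
grey_row = [charge_to_grey(c) for c in row]
print(grey_row)  # [0, 64, 128, 255, 255]
```

Note how the over-exposed reading simply tops out at white: once a site is saturated, any extra light is lost.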
This is great. We are capturing light information on a number of receptors equal to the number of dots that make up an HD display (a TV or whatever). If we capture 24 of these frames a second we have a great high-def movie.
The challenge here is that most of the time we don’t want black and white; we want colour. Unfortunately photoreceptors can’t detect colour; they just ping when light hits them. We need an extra layer to capture the colour information.
As we know from school art classes, colour is made up of a mixture of red, green and blue, so our preference is that for every photoreceptor we can somehow pick up the red, green and blue information that makes up the light beam. This is a good example of where we have to use a workaround. Let’s look for a second at the other end of the process, at the LCD television that this is ultimately going to be shown on.
An LCD TV is made up of millions of tiny liquid crystal cells that, much like the receptors, deal only in degrees of light. Basically each one lets light through or stays dark, and the overall effect on the screen is a picture. On their own they don’t show colour information either. What the TV manufacturers do is place alternating red, green and blue filters over the LCD cells. By lighting each one in a cluster of three to a different degree, it is possible to represent any colour.
In the same way, in the camera sensor, red, green and blue filters are placed over the individual photoreceptors. The idea is that if red light hits a red filter the receptor underneath will detect light, but if blue or green light hits it, nothing will get through. This becomes a receptor that detects red light. Combining this with green and blue receptors, we can detect all colours.
This suggests that it takes three times as many receptor sites to capture a colour image as a black and white one. In fact it is four times. The standard layout of these filters is a 2×2 grid, called a Bayer filter, that comprises one red, one blue and two green filters. Green is doubled up because the human eye is most sensitive to green, so the green sites carry most of the luminance information.
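To make the 2×2 layout concrete, here is a small sketch that lays the filter pattern out over a tiny 4×4 sensor and counts the colours. It assumes the red site sits top-left (an "RGGB" ordering); real sensors vary in which corner each colour occupies, and the function names are made up for the example.

```python
from collections import Counter


def bayer_colour(row, col):
    """Filter colour over a sensor site, assuming an RGGB Bayer layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def bayer_mosaic(rows, cols):
    """Build the grid of filter colours for a sensor of the given size."""
    return [[bayer_colour(r, c) for c in range(cols)] for r in range(rows)]


mosaic = bayer_mosaic(4, 4)
for line in mosaic:
    print(" ".join(line))

# Half of all sites end up green, a quarter red and a quarter blue.
counts = Counter(colour for line in mosaic for colour in line)
print(counts["R"], counts["G"], counts["B"])  # 4 8 4
```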
Now, we don’t want to find that, when shooting in colour, we have a much lower resolution sensor. After all, if every 2×2 group of sites counted as a single colour pixel, a 16×16 B&W sensor would become an 8×8 RGB sensor, hugely reducing the resolution and also increasing the spacing of the notional RGB sensors. This is one of the areas where a workaround is used, but we will look at that in the Processor section.
Incidentally, the Bayer filter is only one of a range of solutions for adding colour information to photoreceptor sensors. One other solution is 3CCD, whereby three image sensors are used, one for each of red, green and blue, with a prism at the beginning of the chain splitting the light into three separate beams and sending each to the relevant sensor. This is more commonly found in professional equipment due to the considerable cost of this solution. There are numerous solutions being tested all the time and we can expect this area to continue to evolve.
Before moving on, it is probably worth addressing why video is shot at much lower resolutions than still images. A Canon 550D, for example, has an 18-megapixel sensor but records video only in HD (1920×1080, about 2 megapixels). Most of this is down to speed. A 550D can shoot around 3 frames per second at full resolution, and only for a short burst. Shooting the same resolution at 24 frames per second or higher, for 12 minutes of video, would produce far more data than the storage could keep up with. More to the point, the hardware in the camera just doesn’t have the processing power to keep up. Finally, full HD is all that is required for video today, so there is no need to strive for such massive resolutions.
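A quick back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes 3 bytes (24 bits) of colour per pixel and no compression at all, which is a simplification chosen for the example rather than what any camera actually writes.

```python
def raw_rate_mb_per_s(pixels, fps, bytes_per_pixel=3):
    """Uncompressed data rate in megabytes per second."""
    return pixels * bytes_per_pixel * fps / 1_000_000


# Full 18-megapixel frames versus HD frames, both at 24 frames per second.
full_sensor = raw_rate_mb_per_s(18_000_000, 24)
hd_frame = raw_rate_mb_per_s(1920 * 1080, 24)

print(f"18 MP at 24fps: {full_sensor:.0f} MB/s")    # 1296 MB/s
print(f"HD at 24fps:    {hd_frame:.0f} MB/s")       # 149 MB/s
print(f"ratio:          {full_sensor / hd_frame:.1f}x")  # 8.7x
```

Even the HD figure is far beyond what a flash card can absorb, which is why so much of the rest of the chain is about throwing data away gracefully.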
Challenges with sensors
Beyond that, what capabilities you get for the price you pay is all down to the sensor. How clean the output image is at higher ISO levels dictates how good the camera is at low-light shooting. The worse the camera, the sooner the picture becomes unusable as you ramp up the ISO to accommodate low lighting. Better cameras will also offer the important frame rates: 60fps being a nice high speed that you can slow down for slow motion, and 24p being the standard for filmmaking. Common problems seen on budget to mid-range cameras (i.e. under £6k) are rolling shutter, colour aliasing and jagging.
Rolling shutter is an issue that comes from the way in which the processor captures the image from the sensor. In CMOS sensors (the most common type of sensor) the whole sensor is not read at once but line by line. Because there is a difference in time between when the first line is read and when the last line is read, you can get strange visual effects when filming fast-moving objects or panning particularly fast. The pixels across the top of the frame will be skewed relative to those towards the bottom, as the object will have moved whilst the sensor was being scanned. Most commonly you will get a wobble or stretching effect that ruins the shot. Whilst this can be avoided by not shooting fast-moving objects (fast pans are fairly rare these days anyway), it does need to be considered when choosing a camera for your shoot.
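The skew is easy to reproduce on paper. The sketch below works out where a moving vertical edge would be seen on each row if the rows are read one after another. All the numbers (5 rows, 500 microseconds per line, an object moving at 2000 pixels per second) are invented for the illustration and do not describe any real sensor.

```python
def rolling_shutter_columns(rows, start_col, speed_px_per_s, line_time_us):
    """Column at which a moving vertical edge is seen on each row."""
    columns = []
    for row in range(rows):
        # Each row is read a little later than the one above it...
        elapsed_us = row * line_time_us
        # ...by which time the object has moved sideways by this much.
        shift = speed_px_per_s * elapsed_us / 1_000_000
        columns.append(start_col + shift)
    return columns


columns = rolling_shutter_columns(
    rows=5, start_col=10, speed_px_per_s=2000, line_time_us=500
)

# Draw the edge: a straight vertical line comes out as a diagonal.
for col in columns:
    print("." * int(col) + "#")
```

With a global shutter every row would be read at the same instant, so `line_time_us` would effectively be zero and the edge would stay vertical.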
The camera processor (Canon brands its version DIGIC) is unlike the CPU in a computer. It is a specialised chip that has been developed to undertake image processing. Effectively the processor takes the raw output from the sensor, still made up of a variety of red, blue and green blocks, and performs a range of processes on it. Most of the stuff the processor does is maths voodoo, invented by very clever people, to modify, clean up, fix and make the image into a final format ready for putting into the video file.
The first thing the processor does is sort out the resolution problem. You’ll remember that whilst we have the resolution we want, each individual pixel is only either red, green or blue. Using a clever mathematical technique called interpolation, the processor goes through all the existing data and, based on the colours surrounding a given cell, guesses what colour it should be. It doesn’t sound like it should be effective but trust me, this is what pretty much all cameras do and don’t they always look pretty good?
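As a minimal sketch of the guessing idea, the code below fills in the missing colours at each site by averaging the nearby sites that did measure that colour. It assumes an RGGB filter layout and a plain 3×3 average; real cameras use considerably more sophisticated interpolation, and all names here are made up for the example.

```python
def bayer_colour(row, col):
    """Filter colour over a sensor site, assuming an RGGB Bayer layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic(raw):
    """Estimate an (R, G, B) value for every site of a Bayer-filtered frame.

    `raw` is a grid (at least 2x2) with one measured value per site.
    """
    rows, cols = len(raw), len(raw[0])
    image = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            sums = {"R": 0, "G": 0, "B": 0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Look at this site and its immediate neighbours.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        colour = bayer_colour(rr, cc)
                        sums[colour] += raw[rr][cc]
                        counts[colour] += 1
            # Average whatever was measured nearby for each colour.
            out_row.append(tuple(sums[k] / counts[k] for k in "RGB"))
        image.append(out_row)
    return image


# A flat orange-ish patch as the sensor would see it through the filters.
level = {"R": 200, "G": 100, "B": 50}
raw = [[level[bayer_colour(r, c)] for c in range(4)] for r in range(4)]
image = demosaic(raw)
print(image[1][2])  # (200.0, 100.0, 50.0)
```

On a flat patch the guess is exact. It is on sharp edges and fine patterns that the guesses go wrong, which is where colour aliasing comes from.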
This results in a full resolution image where every pixel has a specific colour value. This will be stored as a certain number of bits, which is how computers store everything. Colour will usually be stored as 24 bits, 8 bits for each colour of RGB. Each colour can therefore take any value between 00000000 and 11111111, giving 256 different shades of red, 256 shades of green and 256 shades of blue. Altogether those 24 bits can describe a ‘colour space’ of 256×256×256 colours, or roughly 16.7 million. This is stored for each individual pixel, which is a lot of information.
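The arithmetic can be checked in a few lines. The sketch below also shows one common way of packing the three 8-bit values into a single 24-bit number; the function names are invented for the example, and cameras may lay the bits out differently.

```python
BITS_PER_CHANNEL = 8

shades_per_channel = 2 ** BITS_PER_CHANNEL   # values 0 to 255 inclusive
total_colours = shades_per_channel ** 3      # every R, G, B combination
print(shades_per_channel, total_colours)     # 256 16777216


def pack_rgb(r, g, b):
    """Pack three 8-bit channel values into one 24-bit integer."""
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Split a 24-bit integer back into its (r, g, b) channel values."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


white = pack_rgb(255, 255, 255)
print(f"{white:024b}")            # 24 ones
print(unpack_rgb(pack_rgb(200, 100, 50)))  # (200, 100, 50)
```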
Now that the image is as good as it can be, the focus changes to reducing the amount of data, whilst losing as little visual information as possible, in order to create a video stream that is manageable in terms of size and speed.
Next the processor performs something called chroma subsampling. According to visual theory the human eye is much less sensitive to the quality of colour (chroma) than to that of brightness (luma). Chroma subsampling is a technique whereby the amount of colour information is reduced to a half or a quarter, whilst all the brightness information is retained. This dramatically reduces the size of the data generated, whilst creating very little change visible to the naked eye. How chroma and luma are stored is a popular topic amongst filmmakers. The storage method is normally indicated using three numbers, such as 4/4/4 or 4:4:4. All 4s means that nothing has been reduced and full chroma and luma information is being stored. This is often loosely called full RGB and is seen as the holy grail. Various schemes are used where the chroma is reduced; the most commonly seen are 4:2:2, found on high end digital video, and 4:2:0, found on most common consumer formats such as DVD and AVCHD.
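To put numbers on the saving, the sketch below counts how many samples one HD frame needs under each scheme, assuming one luma plane and two chroma planes per frame. It also shows the crudest possible way to get to 4:2:0, keeping one chroma sample from each 2×2 block; that choice is an assumption for the illustration, as real encoders may average or filter instead.

```python
# Fraction of chroma samples kept, relative to luma, for each scheme.
CHROMA_FRACTION = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}


def samples_per_frame(width, height, scheme):
    """Total samples per frame: one luma plane plus two chroma planes."""
    luma = width * height
    chroma = int(2 * luma * CHROMA_FRACTION[scheme])
    return luma + chroma


def subsample_420(plane):
    """Keep one chroma sample per 2x2 block (every other row and column)."""
    return [row[::2] for row in plane[::2]]


full = samples_per_frame(1920, 1080, "4:4:4")
for scheme in CHROMA_FRACTION:
    count = samples_per_frame(1920, 1080, scheme)
    print(scheme, count, f"{count / full:.0%}")

chroma_plane = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(subsample_420(chroma_plane))  # [[1, 3], [9, 11]]
```

So 4:2:2 carries two thirds of the data of 4:4:4, and 4:2:0 only half, before any compression has been applied.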
It’s worth pointing out here that the processor isn’t examining a finished image and then converting it. In reality the processor is driven by a clock which tells it how frequently to check the sensor status (24 times, or frames, a second for example, although it will check more frequently as part of error correction). In terms of chroma subsampling, it simply keeps all the luma information and only some of the chroma, for example every other sample along a line, or every other line as well. In this way not only is there less data output, but the processor has less work to do.
One of the key reasons that chroma subsampling is a popular topic with filmmakers is that it can have an impact on successful chromakeying (green screening). When editing, the footage will usually be converted back into 4:4:4 using an interpolation technique similar to that used earlier. Often the interpolation can prevent the green screen from being truly a single, uniform green, causing artifacts and strange effects to appear amongst the green. As chromakey depends upon taking a given colour and removing it, any variance in the green, however tiny, will create visual distortions.
Some of the common hacks or firmware modifications that are released for cameras will modify things so that the camera doesn’t do as much of this, producing a higher quality output and a greater volume of data. This is much like overclocking a PC: the boundaries of what the hardware was designed to do are being pushed, which may cause problems but can often be used to get better quality on a lower budget.
At this point the frame exists as a viewable image. If the camera has an external output, such as HDMI, the picture can be sent out over HDMI to be viewed on a monitor. Otherwise the image is further prepared to be written to storage.
The processor then compresses the image in order to further reduce the amount of data to be stored. Again there is a wide range of formulas for compressing images. Commonly they will use techniques to eliminate duplication, such as marking a whole area as just black, rather than storing the colour of thousands of pixels as black. The amount of compression depends on the format used (there are a variety, including VC-2, H.264 and MPEG-4) and the level of compression chosen. An effective rule of thumb is that the cheaper the camera, the higher the rate of compression chosen by the manufacturer, to fit with the capability of the hardware in the camera. Once again this is where the firmware hacks come into use. Many of these modify the native operation of the camera to perform less compression and thus provide a better quality picture. The Panasonic GH-2, for example, normally outputs 24p footage at 24 megabits a second, which is OK but a long way from the 90 megabits/second required for BBC broadcast. With a firmware hack, however, this can be upped to 100 megabits/second, a vast improvement in quality. Obviously there will be issues with pushing a camera so much further than its design, but a DP who knows the camera well should be able to mitigate them.
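The duplicate-elimination idea mentioned above can be shown with run-length encoding, one of the simplest compression schemes there is. This is only an illustration of the principle; formats such as H.264 use far more elaborate methods, and the function names here are made up for the example.

```python
from itertools import groupby


def run_length_encode(pixels):
    """Replace each run of identical values with a (value, count) pair."""
    return [(value, len(list(group))) for value, group in groupby(pixels)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in runs for _ in range(count)]


# One line of a mostly black image: 0 is black, 255 is white.
line = [0] * 5 + [255] * 2 + [0] * 3
encoded = run_length_encode(line)
print(encoded)  # [(0, 5), (255, 2), (0, 3)]

# Nothing is lost: decoding gives back exactly the original line.
print(run_length_decode(encoded) == line)  # True
```

Ten values have become three pairs, and the longer the runs of identical pixels, the bigger the saving.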
Finally the image will be sent to storage to be saved into the video stream.
Most cameras use flash memory for storage, usually a CompactFlash (CF) or SecureDigital (SD) card. These are pretty fast, able to record around 48-80 megabits/second (6-10 megabytes/second). Whilst some newer cards are able to record at higher speeds, you can see from the above notes on the BBC broadcast requirement of 90 megabits/second that these cards are a bottleneck and the reason so much compression is needed. More advanced professional cameras record to tape or solid state disk (RED ONE cameras produce 224 megabit/second output).
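Megabits and megabytes are easy to mix up, so here is the conversion spelled out, along with a check of whether a card can keep up with a given stream. The 32 GB card size is a hypothetical example, and the sketch assumes decimal units (1 GB = 1000 MB) and ignores file system overhead.

```python
def megabits_to_megabytes(megabits):
    """There are 8 bits in a byte, so divide by 8."""
    return megabits / 8


def card_keeps_up(card_mbit_s, stream_mbit_s):
    """True if the card's write speed covers the video bitrate."""
    return card_mbit_s >= stream_mbit_s


def minutes_on_card(card_gb, stream_mbit_s):
    """Approximate recording time for a card of the given size."""
    card_megabits = card_gb * 1000 * 8
    return card_megabits / stream_mbit_s / 60


print(megabits_to_megabytes(48), megabits_to_megabytes(80))  # 6.0 10.0
print(card_keeps_up(80, 24))   # True: stock 24 megabit footage is fine
print(card_keeps_up(80, 90))   # False: too slow for a 90 megabit stream
print(round(minutes_on_card(32, 24)))   # 178
print(round(minutes_on_card(32, 100)))  # 43
```

Higher bitrates cost you twice: the card has to be faster, and it fills up sooner.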
A second approach to recording footage can be to record directly from the HDMI output to a solid state disk, bypassing some of the compression and outputting a much higher quality video stream, sometimes with less chroma subsampling. This will, again, depend on the camera, as not all will output uncompressed video over HDMI (in terms of DSLRs probably only the Nikon D800 does this, and even then it may be outputting in order to provide a secondary viewfinder, including overlaid graphics). Usually the disk recorded to will be part of a field recorder unit, which offers the additional advantage of recording the footage directly to a full size editing format, such as ProRes, which you would otherwise have to laboriously convert your footage to before starting to edit.