Video Processing Whitepaper

Teranex Video Processing and HQV – Defined

by Jed Deame

History of the Technology

The roots of Teranex HQV processing go back to the early 1980s, when Lockheed Martin developed it for military image and video processing. In the 15+ years of development by Lockheed Martin, over $100 million was invested in the technology, and 13 patents were issued.

Teranex was founded in 1998 to commercialize the Lockheed Martin technology. Since then, six additional patents on hardware, software, and algorithms have been filed thanks to the extensive work of Teranex's premier engineering team.

The company soon realized that the digital-media industry is particularly suited to the benefits of the technology. Teranex's video-processing platforms are used by the leading broadcasters around the world, including NBC, CBS, ABC, FOX, WB, and Turner Networks, along with others in Japan, Australia, China, and South Korea. The Teranex video processor is the preferred choice because it is able to process any type of material, whether it be video, film, animation, noisy satellite feeds, or clean camera feeds, with the best depth and clarity.

The post-production community also uses the Teranex box to ensure perfect conversion between the many different formats they encounter, as well as for film-to-video transfer (telecine). In the telecine process, it is important to remove any dirt and scratches along with reducing the film grain to provide a pleasing picture and aid in the DVD-encoding process. The Teranex Image Restore system is a leader in the market and can significantly clean up the content without adding blurring or smearing.

In 2002, Silicon Optix and Teranex realized that semiconductor technology had advanced enough that they could take the large Teranex video-processing box and condense it into an affordable single chip. In September 2004, the Realta HQV video processor, which matches the performance of Teranex's original 3RU video processor, was announced to the world.

The Programmable Advantage

The Teranex architecture has won so much business with broadcasters and post-production houses because it is fully programmable. With all the evolving formats and standards in different countries around the world, the Teranex video processor can adapt.

Rather than hard-coding the video-processing algorithms into silicon as competitors do, the Realta HQV executes the algorithms entirely in software on one of the world's most advanced array-processing engines. This is especially important when buying a format converter, which is an investment that should last at least 5–10 years. It just makes sense to have a video processor that can adapt to the confusing world of rapidly changing standards. Teranex customers can finally purchase a low-cost "future-proof" product that doesn't become obsolete after a few months. Also, as new formats are standardized and new features are developed, software updates will be provided to the customer base.

Of course, one of the primary advantages of this programmability is the Teranex video-processing software, which has been refined through 100,000 hours of content verification over the past six years by hundreds of the most demanding customers worldwide – the "Golden Eyes" of Hollywood's post-production and broadcast companies.

Interlace to Progressive Conversion

The process of interlace to progressive conversion consists of a number of complex image processing steps. It is important to understand the details of the processing steps in order to appreciate the many places where quality may be lost in the conversion process.

De-Interlacing

Why De-Interlace?

Most video sources, including standard-definition DV, Betacam, and Digi-Beta, and high-definition HDV, HDCam, and HD-D5, are predominantly interlaced. Instead of transmitting each video frame in its entirety (what is called progressive scan), most video sources transmit only half of the image in each frame at any given time. This concept also applies to recording video images: video cameras and film-transfer devices record only half of the image in each frame at a time.

The words "interlaced" and "progressive" arise from the days of CRT or "picture-tube" televisions, which form the image of each frame on the screen by scanning an electron beam horizontally across the picture tube, starting at the top and working its way down to the bottom. Each horizontal line "drawn" by the beam includes the part of the picture that falls within the space occupied by that line. If the scanning is interlaced, the electron beam starts by drawing every other line (all the odd-numbered lines) for each frame; this set of lines is called the odd field. Then it resets back to the top of the screen and fills in the missing information, drawing all the even-numbered lines, which are collectively called the even field. Together, the odd and even fields form one complete frame of the video image.

Translating the interlaced video signal from 480i and 1080i sources into progressive format is required as the first step in DTV format conversion. Until the image is in the progressive domain, it is not possible to scale it to the desired output resolution without creating unwanted image artifacts. This interlaced-to-progressive conversion is the most important step in the format conversion process and determines the overall quality of the output video signal.

Basic De-interlace Techniques:

If the objects in the video image are not moving, it is very easy to do the de-interlacing – the two fields can be weaved together and combined to form a complete frame. However, if there is motion in the image, this technique can generate significant artifacts. Since the recording is performed in an interlaced manner, the two source fields that make up a complete frame are not recorded at the same time. Each frame is recorded as an odd field at one point in time, and then as an even field recorded 1/50th or 1/60th of a second later.

Hence, if an object in the scene has moved in that fraction of a second, simply combining the fields causes errors in the image known as "combing" or "feathering" artifacts.
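To make the weave operation concrete, here is a minimal sketch in Python/NumPy (the language, array layout, and field-parity convention are illustrative assumptions, not part of the original text): two half-height fields are interleaved line by line into one full frame.

```python
import numpy as np

def weave(odd_field: np.ndarray, even_field: np.ndarray) -> np.ndarray:
    """Weave two fields (each h/2 x w) into one full h x w frame.

    Perfect for static content; wherever an object moved between the
    two field capture times, the interleaved lines disagree and the
    result shows the combing/feathering artifacts described above.
    """
    h, w = odd_field.shape[0] * 2, odd_field.shape[1]
    frame = np.empty((h, w), dtype=odd_field.dtype)
    frame[0::2] = odd_field   # odd (top) field -> lines 0, 2, 4, ...
    frame[1::2] = even_field  # even field      -> lines 1, 3, 5, ...
    return frame
```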

An example of feathering or combing

Simplest Competitor Approach (Non-Motion Adaptive):

The simplest approach to avoid these artifacts is to only process a single field. This is called a non-motion adaptive approach. In this method, when the two fields reach the processor, only data from the first field is used and the data from the second field is not taken into consideration.

The video-processing circuitry recreates or “interpolates” the missing lines by averaging pixels from above and below. While there are no combing artifacts, image quality is compromised because half of the detail and resolution have been discarded.
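A sketch of this single-field ("bob") approach, under the same illustrative assumptions as the weave example above: the missing lines are recreated by averaging the field lines above and below.

```python
import numpy as np

def bob(field: np.ndarray) -> np.ndarray:
    """Rebuild a full frame from one field by vertical averaging.

    No combing can occur, but half of the source resolution has
    been discarded before this function is ever called.
    """
    f = field.astype(np.float64)
    h, w = f.shape[0] * 2, f.shape[1]
    frame = np.empty((h, w))
    frame[0::2] = f                          # keep the real field lines
    frame[1:-1:2] = (f[:-1] + f[1:]) / 2.0   # average above/below
    frame[-1] = f[-1]                        # bottom line: repeat
    return frame
```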

More-advanced techniques have been adopted by virtually all standard-definition video processors, but this basic approach is still sometimes used for high-definition signals, due to the increased computational and data-rate requirements of higher video resolution.

With video processors from some competitors, only 540 lines from a 1080i source are used to create the image that makes it to the screen. This is true even for video processors from companies that may have been considered providers of flagship performance in the standard-definition era.

Advanced Competitor Approach (Frame-based Motion Adaptive):

More advanced de-interlacing techniques available from the competition include a frame-based, motion-adaptive algorithm. By default, these video processors use the same technique described above. However, by using a simple motion calculation, the video processor can determine when no movement has occurred in the entire picture.

If nothing in the image is moving, the processor combines the two fields directly. With this method, still images can have full vertical resolution, but as soon as there is any motion, half of the data is discarded and the resolution drops to half. With this technique static test patterns look sharp, but moving images tend to look soft.
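A sketch of this frame-level decision, building on the weave() and bob() sketches above (the threshold value is an illustrative assumption): one global motion measure selects the technique for the entire picture.

```python
import numpy as np

def frame_adaptive(prev_odd, odd_field, even_field, thresh=2.0):
    """Frame-based motion adaptivity: one decision for the whole frame.

    Compares two same-parity fields; below the threshold the whole
    frame is weaved (full resolution), otherwise the whole frame is
    bobbed (half resolution) -- even if only one small object moved.
    """
    motion = np.mean(np.abs(odd_field.astype(float) - prev_odd.astype(float)))
    return weave(odd_field, even_field) if motion < thresh else bob(odd_field)
```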

Teranex HQV Approach (Pixel-Based Motion Adaptive):

HQV processing represents the most advanced de-interlacing technique available: a true pixel-based motion-adaptive approach. With HQV processing, motion is identified at the pixel level rather than the frame level. While it is mathematically impossible to avoid discarding pixels in motion during de-interlacing, HQV processing is careful to discard only the pixels that would cause combing artifacts. Everything else is displayed with full resolution.

Only the pixels that would cause combing are removed.

Pixel-based motion-adaptive de-interlacing avoids artifacts in moving objects and preserves full resolution of non-moving portions of the screen even if neighboring pixels are in motion.
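A simplified per-pixel sketch, again building on the weave() and bob() sketches above (the threshold and field layout are illustrative assumptions, not HQV's actual algorithm). Note that it compares each current field against the same-parity field one frame earlier, i.e. a four-field analysis, which becomes relevant in the "Truth in Marketing" section below.

```python
import numpy as np

def pixel_adaptive(prev_odd, prev_even, odd_field, even_field, thresh=10.0):
    """Per-pixel motion adaptivity using four fields.

    Static pixels keep the woven (full-resolution) value; only pixels
    flagged as moving fall back to the interpolated (bobbed) value.
    """
    woven = weave(odd_field, even_field)
    bobbed = bob(odd_field)
    # Per-pixel motion from same-parity field differences.
    m_odd = np.abs(odd_field.astype(float) - prev_odd.astype(float)) > thresh
    m_even = np.abs(even_field.astype(float) - prev_even.astype(float)) > thresh
    mask = np.empty(woven.shape, dtype=bool)
    mask[0::2], mask[1::2] = m_odd, m_even
    return np.where(mask, bobbed, woven)
```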

Motion Compensation

Motion Compensated De-interlacing is often referred to as the "Holy Grail" of de-interlacing. In this technique, pixels in the first field are shifted to align with corresponding pixels in the second field in order to preserve the full resolution of objects in motion. This technique is extremely computationally complex and requires significant processing horsepower to do properly. Although it has the capability of providing full-resolution conversions of objects in motion, there are a number of practical limitations to how well this can be achieved. For instance, as the motion between fields increases, the search area grows rapidly (quadratically with the search radius), as does the probability that the objects will change in shape. This creates a much larger possibility for artifacts due to false matches in the search process.

It is important that motion compensation be performed on a per-pixel basis and be utilized in conjunction with a high-quality per-pixel motion-adaptive de-interlace framework in order to avoid the pitfalls and realize the potential benefits. Although region-based motion-compensation techniques may be used to reduce the computational complexity, the benefits are often indistinguishable from pure motion-adaptive approaches. Due to the high compute requirements and associated costs of properly performing motion compensation on a per-pixel basis, this technique is often relegated to systems serving the high end of post-production.
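For illustration, the core of motion estimation is a block search such as the following sketch (an exhaustive sum-of-absolute-differences search; the block size and search radius are illustrative assumptions). The nested loop makes the quadratic cost of a growing search radius obvious.

```python
import numpy as np

def best_offset(block, ref, y, x, radius=4):
    """Find the (dy, dx) within +/-radius that minimizes the sum of
    absolute differences between `block` and the reference field."""
    bh, bw = block.shape
    best_sad, best = np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= ref.shape[0] - bh and 0 <= xx <= ref.shape[1] - bw:
                cand = ref[yy:yy + bh, xx:xx + bw].astype(float)
                sad = np.abs(block.astype(float) - cand).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
    return best  # a false match here becomes a visible artifact on screen
```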

“Second Stage” Diagonal Interpolation

To recover some of the detail lost in the areas in motion in the pixel-based motion-adaptive approach, HQV processing implements a multi-direction diagonal filter that reconstructs some of the lost data at the edges of moving objects, filtering out any "jaggies." This operation is called "second-stage" diagonal interpolation because it's performed after the de-interlacing, which is the first stage of processing. Since diagonal interpolation is independent of the de-interlacing process, competitors have used similar algorithms with their frame-based de-interlacing approaches.

Diagonal lines resulting from moving object interpolation (green edges)

Highlighting the jaggy section of the line, the missing detail is re-created by averaging along the diagonal lines.

The new interpolated frame (full resolution)
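A minimal sketch of edge-directed interpolation of one missing line; the three candidate directions are an illustrative simplification of a multi-direction filter, not the actual HQV filter.

```python
import numpy as np

def diagonal_interp(above: np.ndarray, below: np.ndarray) -> np.ndarray:
    """Interpolate a missing line between two field lines.

    For each pixel, average along whichever direction (45 degrees,
    vertical, 135 degrees) has the smallest luminance difference, so
    diagonal edges are rebuilt without staircase 'jaggies'.
    """
    a = np.pad(above.astype(float), 1, mode="edge")
    b = np.pad(below.astype(float), 1, mode="edge")
    out = np.empty_like(above, dtype=float)
    for x in range(1, len(above) + 1):
        candidates = [
            (abs(a[x - 1] - b[x + 1]), (a[x - 1] + b[x + 1]) / 2),  # 45 deg
            (abs(a[x] - b[x]), (a[x] + b[x]) / 2),                  # vertical
            (abs(a[x + 1] - b[x - 1]), (a[x + 1] + b[x - 1]) / 2),  # 135 deg
        ]
        out[x - 1] = min(candidates)[1]  # pick the flattest direction
    return out
```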

Truth in Marketing

Teranex is not the only company to implement pixel-based motion-adaptive de-interlacing, and it is important to recognize that not all such de-interlacing is identical. In order to implement a true per-pixel motion-adaptive de-interlacer, the video processor must perform a four-field analysis. In addition to the two fields being analyzed in the current frame, the two previous fields are required in order to determine which pixels are in motion. Clearly, if a competing de-interlacer does not evaluate four fields, it simply does not have the data necessary to perform true per-pixel motion-adaptive analysis.

Some competing products implement region-based analysis, in which motion is determined by evaluating larger blocks of the image rather than complete frames or individual pixels. Obviously, then, a claim of “four-field” analysis alone does not imply per-pixel motion-adaptive de-interlacing.

HQV Processing continues to analyze at the per-pixel level using four-field analysis even in high-definition.

Film Cadence and Video/Film Detection

Automatic Video/Film Detection

An important step in the interlace-to-progressive conversion process is to detect whether the content is video-based (i.e., interlaced fields), film-based (e.g., 3:2, 2:2, or other assorted cadences), or mixed video and film in the same frame. The processing requirements are very different depending on the content type, and it is imperative that this detection logic be accurate in order to ensure that the proper conversion algorithms are used. It is also important that the video/film detection be fast and automatic. If it takes too long for the logic to make a decision, significant conversion artifacts may result.

If it is not automatic, the operator will need to constantly switch the processing mode to reflect the content type being processed. In some post-production applications where the content type is known and fixed, this may be desirable, but for broadcast or display applications where the content type changes often, automatic video/film detection is key.

If the video/film detector determines that the content is video-based, the techniques described above are utilized. If the content is film-based, an entirely different processing method must be used.

3:2 Film Mode Detection

Motion picture films are recorded at 24 frames per second. When the film is converted to video for DVD or television broadcast, those 24 frames must be converted into 60 interlaced fields. The process is as follows. Consider four frames of film: A, B, C, and D.

The first step is to convert these four frames into eight fields. This transforms 24 frames per second (fps) into 48 interlaced fields per second. Then, to account for the faster rate of the NTSC standard (roughly 30 frames per second or 60 interlaced fields per second), it is necessary to repeat certain fields. This is done by repeating a field every other frame.

That is, both fields of frame A are recorded (A-odd, A-even), but three fields of frame B are recorded (B-odd, B-even, B-odd). The cycle repeats with frames C and D. This is called a 2:3 (or 3:2) cadence because two fields of one frame are output, followed by three fields of the next frame.
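The field counting described above can be written out directly. A small sketch (the field parities follow the simplified description above rather than a broadcast-exact alternation):

```python
def pulldown_23(frames):
    """2:3 pulldown: alternate two fields, then three, per film frame.
    Four film frames (24 fps) become ten fields, i.e. 30 frames/s."""
    out, counts = [], [2, 3]
    for i, frame in enumerate(frames):
        parities = ["odd", "even", "odd"][: counts[i % 2]]
        out += [f"{frame}-{p}" for p in parities]
    return out

print(pulldown_23(["A", "B", "C", "D"]))
# ['A-odd', 'A-even', 'B-odd', 'B-even', 'B-odd',
#  'C-odd', 'C-even', 'D-odd', 'D-even', 'D-odd']
```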

When this sequence is played back on a progressive-scan video display, it is possible to implement the same de-interlacing techniques described earlier (non-motion adaptive vs. motion adaptive, etc.). However, it is possible to perfectly reconstruct the original frames without losing any data. Unlike interlaced video, in which the two fields were recorded a fraction of a second apart, these fields were recorded at the same time in the same film frame and later separated into fields.

So, to display a video signal that originated as 24fps film, all a video processor needs to do is analyze the fields and determine that there is a regularly alternating pattern of two fields followed by three fields, etc. This recognition and reconstruction is called 2:3 (or sometimes 3:2) pull-down detection, and it is found in all but the most basic de-interlacers. Unfortunately, nothing is quite that simple.

Mixed Video and Film

Sometimes, further editing and post-processing is done on film that has been converted to video. This includes titles, transitions, and other effects. As a result, simply reconstructing full frames using 2:3 frame matching results in combing artifacts because parts of the image are best processed using a standard de-interlacing approach, while other parts will look better by detecting the right cadence and reconstructing the original frames.

Like the various approaches to standard de-interlacing, there are many approaches to dealing with mixed video and film. If the processor interprets the material as film, feathering artifacts will appear around the video portion; if the processor interprets the material as video, the film portion will be displayed at half of its resolution. Some processors determine whether there is more film or more video content and choose the approach with the greatest benefit. Since this usually means film, the result is feathering artifacts. Other processors are designed with the idea that these artifacts should never be seen and use the video de-interlacing techniques in all cases, at the expense of half the video resolution.

HQV Processing, on the other hand, uses per-pixel calculations for all of its processing. This means it is possible for the HQV processor to implement cadence-detection strategies for the pixels that represent film content while implementing pixel-based motion-adaptive de-interlacing for the video content that has been superimposed.

Other Cadences

The HQV Processing advantage of true per-pixel de-interlacing becomes even more evident when dealing with other cadences. Although 24fps film and its associated 2:3 video cadence are the most common, they aren't the only cadences in use today.

Sometimes, TV stations accelerate their film-based movies and TV shows by dropping every twelfth field to make room for more commercials. This speedup is usually too small to be noticed by the average viewer, but these “vari-speed broadcasts” end up having unusual cadences such as 3:2:3:2:2. If a de-interlacer is unable to detect this sequence, as with most of the competition, half the resolution is lost.

The variety of cadences does not end there. Professional DVCPro camcorders are increasingly used in television and film production. In order to maximize the recording time, these camcorders use a 2:2:2:4 cadence or a 2:3:3:2 cadence to store the progressive source signal as 480i on the tape.

Animation is often rendered at 12fps. Two pull-down cadences can be used to convert this to the 30fps broadcast standard. Doubling every frame and then applying 2:3 pull-down to the resultant fields will generate a 5:5 cadence. Applying 3:2 pull-down to the frames (rather than the fields) will generate a 4:6 (or 6:4) cadence. The Japanese 'Anime' format is often rendered at 8fps. To convert this to 30fps, each frame of animation is repeated three times, and then 2:3 conversion is performed, resulting in an effective cadence of 7:8 (or 8:7).
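These cadences follow from simple field counting; a quick sketch (assuming frame repetition up to 24fps followed by 2:3 pulldown, as described above):

```python
def repeated_frame_cadence(repeats: int) -> list:
    """Fields per original animation frame when each frame is shown
    `repeats` times at 24 fps and 2:3 pulldown is then applied."""
    pattern = [2, 3]  # fields assigned to successive 24 fps frames
    cadence, pos = [], 0
    for _ in range(2):  # two originals cover the full repeating cycle
        total = 0
        for _ in range(repeats):
            total += pattern[pos % 2]
            pos += 1
        cadence.append(total)
    return cadence

print(repeated_frame_cadence(2))  # [5, 5] -> 12 fps animation
print(repeated_frame_cadence(3))  # [7, 8] -> 8 fps 'Anime'
```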

Most competitors’ processors count the incoming fields and try to match them against known sequences such as 2:3 or 2:2 in order to select the right decoding. This works in the simplest cases, but there may still be a short delay before the processor is able to “lock on” and determine the right cadence. In addition, when the video processor encounters an unusual sequence such as animation or DVCPro, it will resort to discarding half the data until it can lock to a known sequence.

With HQV processing, there is never any confusion about cadence. Instead of trying to match the incoming video against known patterns, HQV processing simply identifies complete frames as they come in. HQV processing is able to identify all known cadences, no matter how uncommon, and it can also detect cadences that have not yet been invented.
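One way to express that idea (an illustrative sketch, not the HQV algorithm itself): rather than matching a pattern table, test whether weaving two adjacent fields would yield a comb-free frame, using the weave() sketch from earlier. Any cadence, named or not, reduces to a stream of such pairing decisions.

```python
import numpy as np

def comb_energy(odd_field, even_field):
    """Vertical high-frequency energy of the woven frame; large values
    mean the two fields came from different moments in time."""
    f = weave(odd_field, even_field).astype(float)
    return np.mean(np.abs(f[:-2:2] - 2 * f[1:-1:2] + f[2::2]))

def fields_pair(odd_field, even_field, thresh=4.0):
    """Cadence-agnostic test: these two fields form a complete frame
    if weaving them produces (almost) no combing."""
    return comb_energy(odd_field, even_field) < thresh
```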

No matter what type of video you’re watching or where it comes from, HQV processing will always provide the best reconstruction of the image.

Noise Reduction

Random noise is an inherent problem with all recorded images. Noise can enter the system in many places along the path from acquisition to consumption. Not only does noise get introduced during post-production, compression, and transmission, but it is also present at the source in the form of film grain or imaging-sensor noise. Noise-reduction algorithms can minimize the noise in a picture.

The simplest approach to noise reduction is to use a spatial filter that removes high-frequency data. In this approach, only a single frame is evaluated at any given time, and structures in the image that are one or two pixels in size are nearly eliminated. This does remove the noise, but it also degrades the image quality because there is no way to differentiate between noise and detail. This approach can also cause an artificial appearance in which people's skin looks like plastic. Nevertheless, this remains the most widely used noise-reduction approach.

A temporal filter takes advantage of the fact that noise is a random element of the image that changes over time. Instead of simply evaluating individual frames, a temporal noise filter evaluates several frames at once. By identifying the differences between two frames and then removing that data from the final image, visible noise can be reduced very effectively. If there are no objects in motion, this is a virtually perfect noise-reduction technique that preserves most of the detail. This approach is used by many high-end competitors.
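A minimal sketch of such a temporal filter (recursive frame blending; the blend weight is an illustrative assumption):

```python
import numpy as np

def temporal_filter(prev_filtered, current, alpha=0.25):
    """Recursive temporal noise filter: blend each new frame with the
    filtered history. Zero-mean random noise averages away over time
    while static detail is preserved. Applied blindly, moving objects
    smear -- the problem addressed in the next paragraph."""
    return alpha * current.astype(float) + (1 - alpha) * prev_filtered
```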

However, a problem arises if there are moving objects in the image, which also cause differences from one frame to the next; of course, these differences should be retained. If moving objects are not distinguished from noise, they will be filtered and a ghosting or smearing effect is seen.

HQV processing uses a per-pixel motion-adaptive and noise-adaptive temporal filter to avoid the artificial appearance and artifacts associated with conventional noise filters. To preserve maximum detail, moving pixels do not undergo unnecessary noise processing.

In static areas, the strength of noise reduction is determined on a per-pixel basis, depending on the level of noise in the surrounding pixels as well as in previous frames, allowing the filter to adapt to the amount of noise in the image at any given time. The end result is a natural-looking picture with minimal noise and grain and maximum preservation of fine details.
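A simplified per-pixel sketch of this idea (the threshold and blend weight are illustrative assumptions, not HQV's actual tuning):

```python
import numpy as np

def adaptive_temporal_filter(prev_filtered, current,
                             motion_thresh=12.0, alpha=0.25):
    """Per-pixel motion-adaptive temporal filter: blend only the pixels
    whose frame-to-frame change looks like noise; pass moving pixels
    through untouched to avoid ghosting and smearing."""
    cur = current.astype(float)
    diff = np.abs(cur - prev_filtered)
    blended = alpha * cur + (1 - alpha) * prev_filtered
    return np.where(diff < motion_thresh, blended, cur)
```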

Detail Enhancement

Detail enhancement, also called sharpening, is an important component in format conversion, in both standard definition and high definition. Unfortunately, due to the historically poor implementations of sharpening algorithms, this process has received a reputation as something to avoid.

All digital video goes through a low-pass anti-aliasing filter to prevent false color and moiré effects that can occur during the digitization process. The filter improves overall image quality, but it necessarily blurs some of the detail. The data-compression stage can also remove some detail. Fortunately, much of the lost detail can be mathematically recovered.

Because the human visual system perceives sharpness in terms of apparent contrast, exaggerating the differences between light and dark can produce what appears to be a sharper image. Unfortunately, due to rudimentary implementations of sharpening in the past, this process has been associated with artifacts known as “ringing” or “halos” in which objects are surrounded by a bright white edge. The resulting image appears harsh and does not reflect what was originally captured. The halos can sometimes be more distracting than the softness from the uncorrected image. For that reason, it is often recommended that users turn down the sharpening on video devices.
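The classic contrast-based sharpening this paragraph describes is unsharp masking; a minimal sketch (using SciPy's Gaussian blur; the amount and radius values are illustrative). Pushing `amount` too high overshoots at edges, which is exactly the halo/ringing artifact described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, amount=0.6, radius=1.0):
    """Sharpen by adding back the difference between the image and a
    blurred copy, exaggerating local contrast at edges."""
    img = img.astype(float)
    blurred = gaussian_filter(img, sigma=radius)
    return np.clip(img + amount * (img - blurred), 0, 255)
```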

HQV Detail Enhancement technology is different. By using a more conservative algorithm and selectively identifying the area of blur before processing, HQV Detail Enhancement avoids halo or ringing artifacts even at the highest setting. Of course, it is also possible to disable HQV Detail Enhancement if the source has already applied sharpening. A key benefit of HQV Detail Enhancement is that, when used in conjunction with our 1024-tap scaler, standard-definition TV can be delivered at near high-definition quality.

1024-tap Scaling

Converting standard-definition video to high-definition video involves resizing an image to contain as much as six times the number of pixels it had originally. How this is done determines the quality of the resized image.

The most basic video processors perform their scaling calculations by analyzing no more than four pixels in the source image to create one pixel in the final image. This represents what is called a 4-tap scaler. (Without getting too technical, the number of "taps" determines the number of pixels that are analyzed.) With all other things being equal, a larger number of taps will result in better scaling quality. The average scaler uses no more than 16 taps. However, even this level of scaling can still produce blurry images.
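To make "taps" concrete, here is a one-dimensional resampling sketch with a windowed-sinc (Lanczos) kernel; the tap count is a parameter, and the kernel choice is an illustrative assumption rather than the filter any particular product uses.

```python
import numpy as np

def lanczos(x, taps):
    """Windowed-sinc kernel spanning `taps` source samples."""
    a = taps / 2.0
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)

def upscale_1d(line, factor, taps=16):
    """Each output pixel is a weighted sum of `taps` source pixels;
    more taps approximate the ideal reconstruction filter better."""
    line = np.asarray(line, dtype=float)
    out = np.empty(len(line) * factor)
    half = taps // 2
    for i in range(len(out)):
        center = i / factor               # position in source coordinates
        lo = int(np.floor(center)) - half + 1
        idx = np.arange(lo, lo + taps)
        w = lanczos(idx - center, taps)
        w /= w.sum()                      # normalize so flat areas stay flat
        out[i] = np.dot(w, line[np.clip(idx, 0, len(line) - 1)])
    return out
```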

HQV processing uses an image scaler with an unprecedented 1024 taps. This level of quality reflects the fact that HQV processing has its roots in Teranex algorithms, which were developed for defense and military image analysis. For every pixel, the HQV processor evaluates the surrounding 1024 pixels in order to provide the best image quality when scaling the image up from standard definition. Again, when this advanced upsampling technology is combined with HQV Detail Enhancement, standard-definition television sources are transformed to near-HD quality.

10-bit 4:4:4 Internal Data Paths

Not only does HQV processing implement some of the most advanced algorithms for video processing, but its internal data paths support 10 bits per channel with full 4:4:4 color sampling. (The term "4:4:4" refers to the fact that the color information can be carried at full horizontal resolution, and 10-bit data paths provide 1,024 steps of brightness and color.) The result is the ability to render over 1 billion colors. In comparison, conventional video processors with only 8-bit data paths can render only about 16.7 million colors. Simply put, by maintaining more bits in the data, HQV products can preserve all the fine detail and dynamic range found in the original source.
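The color counts quoted above follow directly from the bit depths; a quick check:

```python
# Levels per channel and total renderable colors for 8- and 10-bit paths.
for bits in (8, 10):
    levels = 2 ** bits
    print(bits, levels, levels ** 3)
# 8  ->   256 levels ->    16,777,216 colors (~16 million)
# 10 -> 1,024 levels -> 1,073,741,824 colors (~1 billion)
```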

Summary

As you can now understand, HQV processing represents an enormous leap in video processing, with true flagship performance in de-interlacing, noise reduction, and scaling with both standard-definition and high-definition signals. Silicon Optix and Teranex designed HQV processing as a no-compromise solution.

Most competing video processors only have enough computational horsepower to handle the simple cases of de-interlacing and 3:2 detection. Since they are not programmable, it is not possible for them to handle all of the “corner cases” common in today’s video content. With HQV processing, you can be assured you are getting the highest possible conversion quality, preserving the full resolution of the source.
