A Primer on
Video Compression
by Ken
Freed.
The
fabled '500 channels' is possible because of video
compression. Here's an early overview that's still valid
for those who want to understand the technology.
Video
Compression. The holy grail for digital media transport
of the fabled "500 channels." The legend, the myth, the
misquote. The promise of interactive television by 1995,
the interactivity we still await in 1997.
Sure, the job
turned out to be a lot tougher than people expected. But
patience and persistence do prevail. Digital set-tops and
high-speed cable modems are commercial at last. The FCC
finally has approved a standard for digital television.
And digital compression, squeezing more stuff into less
space, by now is a proven technology.
Which types of
compression equipment are suited to what kinds of
applications? Those handling compressed video in a
production setting have different requirements than those
backhauling a signal for uplink elsewhere. Complicating
matters, as digital video goes through multiple
generations of compression and decompression, as
artifacts multiply exponentially, any conscientious
engineer can start to second-guess earlier purchasing
decisions. Human nature strikes again.
Compression
Meets DTV
Compression
standards and techniques vary, but the essential
principle remains the same. Any video picture being
transmitted in a digital format contains picture elements
(pixels) that do not change from frame to frame, like
stationary background in a talking head shot. Compression
reduces or eliminates carriage of unchanging pixel data
until the picture changes. Compression works best pixel
by pixel.
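The principle can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the 4x4 grayscale frames and the function names changed_pixels and apply_changes are hypothetical, and real codecs work on blocks with motion estimation rather than on raw pixel lists.

```python
def changed_pixels(prev_frame, next_frame):
    """Return (row, col, new_value) for every pixel that differs between frames."""
    changes = []
    for r, (prev_row, next_row) in enumerate(zip(prev_frame, next_frame)):
        for c, (old, new) in enumerate(zip(prev_row, next_row)):
            if old != new:
                changes.append((r, c, new))
    return changes


def apply_changes(prev_frame, changes):
    """Rebuild the next frame from the previous frame plus the change list."""
    frame = [row[:] for row in prev_frame]
    for r, c, value in changes:
        frame[r][c] = value
    return frame


# Toy 4x4 frames: a static background with one bright pixel that moves right.
frame_a = [[10, 10, 10, 10],
           [10, 200, 10, 10],
           [10, 10, 10, 10],
           [10, 10, 10, 10]]
frame_b = [[10, 10, 10, 10],
           [10, 10, 200, 10],
           [10, 10, 10, 10],
           [10, 10, 10, 10]]

delta = changed_pixels(frame_a, frame_b)
rebuilt = apply_changes(frame_a, delta)
print(len(delta), "of 16 pixels had to be sent")
```

Only the pixels in the change list travel down the channel; the decoder reuses everything else from the frame it already holds.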
Compression
involves sampling the frame for encoding and then
reconstructing the frame upon decoding. Digital sampling
in MPEG compression, for instance, can be done in a video
camera or codec with a semiconductor array encoding the
picture on a digitized grid, each pixel accounting for a
tiny piece of the whole picture. Each video frame, in
turn, needs to be decoded for display on either an NTSC
analog screen or an ATSC digital screen.
For reference, the
standard NTSC frame is 720 picture elements across by 525
lines down (483 lines of active video per frame plus 42
lines of available vertical blanking intervals) with a
width-to-height aspect ratio of 4:3. MPEG-2 sampling
retains 480 active lines on a 525-line system and 576
active lines on a 625-line system.
The ATSC Digital TV
standard approved in December by the FCC specifies a more
rectangular aspect ratio of 16:9 with picture sizes of
1920 pixels by 1080 lines and 1280 pixels by 720 lines. The
DTV rules specify 704 pixels by 480 lines at 4:3 for
existing NTSC programming. A standard PC screen of 640 by
480 pixels with a 4:3 aspect ratio endures for "material
designed for VGA computer monitors." These indicate the
video frames being compressed today and for years to
come, in the USA, anyway, which may influence standard
decisions in other lands.
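To get a feel for why these frame sizes demand compression, here is a rough Python sketch of the uncompressed data rate for each. The 30 frames per second and the 12 bits per pixel (eight-bit samples at 4:2:0) are assumptions made for illustration, not figures from the DTV rules.

```python
def raw_bitrate_mbps(width, height, fps, bits_per_pixel):
    """Uncompressed video data rate in megabits per second."""
    return width * height * fps * bits_per_pixel / 1_000_000


# Frame sizes named in the text; rate and sample depth are assumptions.
FORMATS = {
    "1920 x 1080": (1920, 1080),
    "1280 x 720": (1280, 720),
    "704 x 480": (704, 480),
    "640 x 480": (640, 480),
}
ASSUMED_FPS = 30
ASSUMED_BITS_PER_PIXEL = 12  # 8-bit luma plus 4:2:0 chroma

for name, (width, height) in FORMATS.items():
    rate = raw_bitrate_mbps(width, height, ASSUMED_FPS, ASSUMED_BITS_PER_PIXEL)
    print(f"{name}: about {rate:.1f} Mbps uncompressed")
```

Under these assumptions even the 704 by 480 frame runs well over 100 Mbps before compression.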
In terms of the
broadcast transmissions to be compressed, the new DTV
standard supports one or two simultaneous high-definition
program streams, four or more NTSC analog programs with
improved quality, numerous digital audio signals, and
massive amounts of data. The December 26 order is the
first in a series of FCC rulings to revamp broadcasting
for the digital age, including channel re-allotments. The
FCC asserts faith that the new DTV standard leaves room for
innovation.
Compression
Method Selection
When selecting a
compression method for any given transport application,
in-house or to viewers' homes, many factors need to be
weighed: infrastructure requirements, network
interoperability, and capital budget restraints, for
starters. To nudge thought in a helpful direction,
below are eight sets of questions that can be raised
in any staff meeting considering possible compression
system purchases.
1. Is the
compression method scalable for varying transfer rates
and memory demands? Can the method control the bitrates
in channels with specified bandwidths? Is the method
suited to the type of information being sent? Is the
compression algorithm designed for the particular job at
hand, such as sending signal on telco lines from the
station to a far transmitter, or was the technology
ported from some other application?
2. If being used
for production, does the compression method under
consideration support random access for frame-accurate
video editing? Can you edit on the fly? Does the image
update fast enough? Does the method at least support
progressive decoding for fast previews?
3. Is scanning
progressive or interlaced, top to bottom or every other
line? How is quadrature YUV video sampled? 4:4:4, 4:2:2,
4:2:0, 4:1:1, and is it real 4:1:1 or something else?
4. Does the
objective quality of the compressed video meet or exceed
mathematically quantifiable criteria for standard
deviation and signal-to-noise ratio? What are the
functional fault tolerances for channel noise, channel
bandwidth fluctuations and channel degradation?
5. Given the
potential delays from compression and decompression
procedures at even the highest speeds, will the financial
expenditures for codec equipment support essential
real-time operations? Given the number of degrading
computer operations performed per pixel, does the total
cost of compression justify the risk of losing viewers?
6. What is the
minimal acceptable resolution? Has the subjective quality
and even the psycho-physiological quality of the video
passed muster in viewer tests and trials? Does the
related audio compression pass similar muster? And does
the staff like what they see and hear?
7. In the case of
interactions on subscription TV services, does the
compression technology support information integrity? Is
it compatible with favored encryption methods? Is
transaction security guaranteed?
8. And in the final
analysis, when the video is viewed on a home TV or PC or
NC screen, will the product deliver clean, sharp analog
or digital video on demand? Will it save or earn more
cash than it costs?
Kindly hold these
questions in mind in considering the compression methods
and formats discussed below, starting with MPEG
compression.
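Question 4 asks for mathematically quantifiable quality criteria. One widely used measure is peak signal-to-noise ratio (PSNR), sketched below in Python. The function name psnr_db and the tiny 2x2 test frames are hypothetical, and the sketch assumes eight-bit samples with a peak value of 255.

```python
import math


def psnr_db(original, decoded, peak=255):
    """Peak signal-to-noise ratio in decibels between two same-sized frames."""
    flat_original = [p for row in original for p in row]
    flat_decoded = [p for row in decoded for p in row]
    mse = sum((a - b) ** 2
              for a, b in zip(flat_original, flat_decoded)) / len(flat_original)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 luma frames before and after a codec round trip.
source = [[100, 100], [100, 100]]
round_trip = [[101, 99], [100, 100]]

print(f"PSNR: {psnr_db(source, round_trip):.2f} dB")
```

A higher figure means the decoded frame is numerically closer to the source, though a good number alone does not settle the subjective questions raised in item 6.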
MPEG-2
The Moving Picture
Experts Group (MPEG) of the International Organization
for Standardization (ISO) adopted MPEG-1 and then MPEG-2 as
international digital compression standards for
transmitting video.
The original MPEG-1
handled transport at 1.5 to 4 Mbps with a 352x240 pixel
sampling grid for 60 Hz systems in North America and
352x288 pixels for 50 Hz systems in Europe. MPEG-1
parameters and algorithms, intended for progressively
scanned video, support 6:1 compression.
Limited TV
applications for MPEG-1 prompted development of MPEG-2
(ISO 13818), which offers compression rates above 6:1 by
coding 704x480 pixels per frame at 30 frames per second
for combined video and audio at 4 to 16 Mbps. Higher data rates
produce better video playback. Also, the video in MPEG-2
can be divided into two or more coded bitstreams, vital
for multichannel transmissions, which is why MPEG-2 is
specified in the FCC ruling for DTV.
At its best, MPEG-2
codes interlaced source video at full bandwidth, reducing
storage and bandwidth costs as much as six times. This
helps explain why MPEG-2 has been adopted for headend
transmitters and receiver terminals in DSS and DVB
satcasting, cable and wireless cable (microwave) TV, and
at last digital broadcast TV.
Despite existence
of a program version differing slightly from the
transport version, MPEG-2 remains a poor choice for video
production because of a tendency to yield artifacts from
repeated compression. Most digital production operations,
therefore, use uncompressed D1 video to ensure video
quality prior to transmission.
A prominent example
of MPEG-2 deployment is the new $20 million digital
upgrade to the broadcasting network of Canadian Satellite
Communications. Cancom uses station feeds from Boston,
Detroit, Minneapolis, Seattle, Edmonton, and Hamilton
along with the network feeds of ABC, NBC, PBS, CBS, and
Fox to create regional programming packages for 2500
cable headend operators.
The MPEG-2 system
is replacing Cancom's analog transmission network. Cancom
now distributes analog programming to Canadian cable
companies from 25 uplinks across the top of North America
to about 14,000 Scientific-Atlanta MPEG-2 digital
satellite receivers in subscriber homes. Their new
Scientific-Atlanta network management system enables the
company to control 15 uplinks and 25 encoding systems
from one location.
"Cancom's use of
our MPEG system is a textbook application of both the
cost saving and increased programming options available
with digital video compression," said Dwight Duke,
president of satellite television networks at
Scientific-Atlanta. "By increasing the amount of
programming without driving up their transponder costs,
Cancom has great potential for increased growth in
revenue."
Elsewhere, ADC
Telecommunications announced a $6 million investment in
Optivision, a developer of MPEG compression products,
including an OptiVideo line of MPEG-1 and MPEG-2 encoders
and decoders designed for transmission and
video-on-demand markets. In alliance with C-Cube, a world
supplier of MPEG codec chips, Optivision is developing an
advanced MPEG-2 encoder, a genlock decoder and a high
speed four-channel, rack mounted decoder. These decoders
operate from 600 Kbps to 15 Mbps and support such data
transport formats as MPEG-1 System, MPEG-2 Program, and
MPEG-2 Transport.
"Applications for
the combined product lines of ADC and Optivision are in
the cable TV, distance learning and broadcast video
markets," says Fred Lawrence, senior vice president of
the ADC Transmission group. New ADC products include the
DV6000 digital transmission system, which operates at 2.4
Gbps while transporting simultaneously up to 16 channels
of digitized broadband traffic. The modular system
supports diverse network configurations and multiple
video formats. Ameritech, SNET, and Viacom Cable use the
DV6000 digital video transmission system in their video
networks. ADC also makes the PixlNet system, a multipoint
video conferencing control and switching unit that's
compatible with H.261 viewphone codecs.
Another of the
numerous players in the MPEG-2 arena is Alcatel Network
Systems. Alcatel video compression products include the
1745VC for 525-line screens, refreshing at 720 lines per
second at 45 Mbps, sampling at 14.3 MHz for short haul to
long haul service.
A full list of
MPEG-2 vendors would also include these familiar names:
C-Cube, Compression Labs, Divicom, General
Instruments, Hitachi, Hughes, Hyundai, Motorola, IBM, LSI
Logic, NEC, Pioneer, Shure, Siemens, Sony, Teleos,
Thomson, Toshiba, TVCOM, Vela Research, Zenith, and Xing.
These and other companies make decoders, encoders,
set-tops, and chipsets for varied applications. They all
have websites.
Studio Profile
MPEG-2
For anyone involved
in video production, from local stations to world-class
post houses to global network operations centers, the
biggest and perhaps best compression news at the start
of 1997 is the advent of 4:2:2 studio profile MPEG-2
compression at main level.
Content producers
have long complained that MPEG-2 video lacked sufficient
quality for studio applications, to put it politely.
Consequently, the Moving Picture Experts Group in 1994
began evaluation of the 4:2:2 component studio signal as
established in Recommendation 601 from the International
Telecommunication Union (ITU). The vital improvement in
studio profile MPEG-2 is more chroma sampling of the
digitized picture.
Explains Dave
Elliot, vice president of engineering services for the
ABC Television Network, "Standard MPEG and MPEG-2 uses a
4:2:0 sampling scheme, which means it takes a full sample
of the luminance, but it tosses out half of the
chrominance information, specifically, the color
coordinate on one axis of the color grid."
"Studio profile
MPEG increases the chrominance sample to 4:2:2," he says,
"thereby accounting for both axes on the color grid by
sampling every other element, which provides better
replication of the original 4:4:4 signal." The 4:2:2
profile preserves 512 lines on a 525-line NTSC system and
608 lines on a 625-line PAL system.
"The 4:2:0 sampling
is okay if a signal's going straight out because there's
little risk of picture degradation from transmission,"
Elliot says. "But the 4:2:2 sampling is better for
multiple iterations of a video signal where the video
will be compressed, decompressed and recompressed several
times before it finally goes out to viewers'
homes."
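The difference between the sampling schemes shows up clearly when the samples in one frame are counted. The Python sketch below uses the conventional reading of the notation, in which chroma is thinned horizontally (4:2:2, 4:1:1) or both horizontally and vertically (4:2:0). The 720 by 480 frame and the names CHROMA_DIVISORS and samples_per_frame are assumptions for illustration.

```python
# (horizontal divisor, vertical divisor) applied to each of the two chroma planes
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}


def samples_per_frame(width, height, scheme):
    """Total luma plus chroma samples in one frame for a sampling scheme."""
    h_div, v_div = CHROMA_DIVISORS[scheme]
    luma = width * height
    chroma = 2 * (width // h_div) * (height // v_div)
    return luma + chroma


WIDTH, HEIGHT = 720, 480  # assumed frame size
full = samples_per_frame(WIDTH, HEIGHT, "4:4:4")
for scheme in CHROMA_DIVISORS:
    count = samples_per_frame(WIDTH, HEIGHT, scheme)
    print(f"{scheme}: {count} samples ({count / full:.0%} of 4:4:4)")
```

On this count 4:2:2 keeps twice the chroma samples of 4:2:0, which is the extra headroom Elliot describes for signals that face several compression passes.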
MPEG 4:2:0
compression is limited to a maximum bitrate of 15
Megabits per second, and it prohibits simple editing. The
4:2:2 scheme supports speeds of 45 Mbps and permits
datastream editing on either tape or disk, an
alluring capability for content producers.
ABC Television
became the first network to deploy the new Sony MPEG-2
4:2:2 studio profile at main level in their national
broadcasts from the Republican National Convention in San
Diego. Transported over AT&T long distance fiber
lines from the convention center to network studios in
New York City, the MPEG-2 4:2:2 technology allowed ABC to
double their transmission capacity at increased
transmission speeds.
Sony MPEG-2 4:2:2
compression supports transmission of two
broadcast-quality video channels on a single 45 Mbps DS3
fiber line (using a serial digital data interface, SDDI).
The compression scheme also supports transmission of one
channel over DS3 at twice the speed.
For the San Diego
trial at the GOP convention, Sony provided the prototype
of two new products, the DSM-M1 multiplexer and its
companion unit, the DSM-D1 demultiplexer, which were used
with a prototype "LinkRunner" box from Lucent for
protocol transfer into DS3 framing. The Sony boxes could
accept two single channels or a single channel at double
the speed.
To help avoid
diffraction and cascading degradation, Sony is using the
same 4:2:2 profile at main level in their new generation
of Betacam SX players, file servers, nonlinear editors,
hybrid recorders and other digital systems. Sony SX
camcorders already have SDDI outputs, packetizing 18 Mbps
to ride in a 270 Mbps cable, so SX cameras are compatible
with the Lucent DS3 box and multiplexer, which will be
sold as a package through Sony, Lucent, AT&T, or
local telcos.
AT&T media
industries marketing director Jack Gelman says that a
4:2:2 system is a "vast improvement" over an NTSC codec
using a 45 Mbps line to carry a composite analog video
signal. "When you can get two digital component video
signals on the same bandwidth, when you can get twice as
much throughput, or when you can use that 45 Mbps pipe in
half the time, like an ENG crew sending a 30 minute tape
in 15 minutes, you not only can reduce transmission
costs, but you can get recorded footage onto the network
faster than ever before."
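Gelman's arithmetic is easy to check. The Python sketch below assumes each of the two channels is budgeted half of the 45 Mbps line and ignores DS3 framing overhead; the function name transfer_minutes is hypothetical.

```python
def transfer_minutes(program_minutes, stream_mbps, link_mbps):
    """Minutes needed to move recorded material over a link faster than the stream."""
    return program_minutes * stream_mbps / link_mbps


DS3_MBPS = 45.0
# Assumption: two channels share the line evenly, framing overhead ignored.
per_channel_mbps = DS3_MBPS / 2

minutes = transfer_minutes(30, per_channel_mbps, DS3_MBPS)
print(f"A 30-minute tape moves in {minutes:.0f} minutes")
```

Giving one channel the whole pipe halves the transfer time, which matches the 30-minute tape sent in 15 minutes.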
Motion-JPEG
The Joint
Photographic Experts Group developed JPEG for compressing
color or gray-scale images, such as photographs and
naturalistic artwork. JPEG generally is unsuited for text
and line art because of the amount of image content lost
upon decompression. Based on what tests show the human
eye can't detect, JPEG utilizes "color independent"
eight-bit and twelve-bit sampling in combinations that
can progressively scan frequency, amplitude and other
factors.
Motion-JPEG
algorithms can compress individual video frames without
looking at adjacent frames in a video sequence. Compared
to MPEG, JPEG offers lower compression (because there's
no interframe information in the datastream) but
compresses in real time, supports frame-by-frame editing
at a uniform bit rate, and costs less. The chief
disadvantage of JPEG is an inherent loss of image
quality, which may be addressed in the specification for
the new JPEG 2000.
"JPEG is a
well-established technology with viable applications in
television," says Peter Symes of Tektronix, manager of
advanced technology for Grass Valley products. "Yet to
make the compression technique more useful, the JPEG
committee is now in the process of formulating a new JPEG
standard that will be published by both the ISO/IEC and
ITU." A draft specification is expected in
1997.
"The new M-JPEG
will be backward compatible," Symes says, "and it will
offer more flexibility with the use of basic tools like
MPEG. The main improvement will be the quantizing
matrices, which JPEG now defines for the whole image. The
new JPEG will define different QM within one picture, so
you get different compression in different parts of the
picture, according to your needs."
Another M-JPEG
improvement is a new "lossless" mode, a mathematical
construct for more efficient coding by using fewer bits.
"The new JPEG lossless mode will allow you to get back
exactly what you put in," he says. "It uses statistical
prediction to compare pixels next to each other and
select the shortest code possible to represent each
pixel, thereby reducing the amount of code about
2:1."
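The idea of predicting each pixel from its neighbor can be sketched in Python. This is not the JPEG committee's scheme, only a minimal illustration under stated assumptions: each pixel is predicted from the one to its left, and Shannon entropy stands in for the length of the shortest code. The function names and the eight-pixel row are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Average bits per symbol an ideal code would need (Shannon entropy)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def predict_residuals(row):
    """Replace each pixel with its difference from the pixel to its left."""
    previous = 0
    residuals = []
    for pixel in row:
        residuals.append(pixel - previous)
        previous = pixel
    return residuals


def reconstruct(residuals):
    """Undo predict_residuals exactly, so nothing is lost."""
    previous = 0
    row = []
    for residual in residuals:
        previous += residual
        row.append(previous)
    return row


row = [100, 101, 102, 103, 104, 105, 106, 107]  # hypothetical smooth gradient
residuals = predict_residuals(row)

print("bits/pixel before prediction:", round(entropy_bits(row), 2))
print("bits/pixel after prediction: ", round(entropy_bits(residuals), 2))
print("lossless:", reconstruct(residuals) == row)
```

Because neighboring pixels tend to be similar, the residuals cluster around a few values and need fewer bits, while the decoder can still recover every original pixel.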
Symes notes that
Tektronix already has a "successful JPEG implementation"
in the Profile line of compressed disk recorders, which
soon will be enhanced with studio profile MPEG-2 at 4:2:2
sampling, which Symes says was a Tektronix initiative.
"The Tektronix staff did a lot of the drafting toward the
end."
Another company
implementing Motion-JPEG is Barco in Belgium, partly
owned by the Flemish government, which offers the
DigiTrunk video compression system for point-to-point
digital transmission, reportedly without artifacts.
Modular architecture with optional analog video and audio
inputs allows fairly flexible configuration of the 19-inch
rack-mountable units. At output rates of 10 to 25 Mbps,
DigiTrunk M-JPEG uses the MPEG-2 data packet format to
support MPEG-2 transmission equipment.
DVD By Any
Name
Call it "Digital
Video Disc" or "Digital Versatile Disc," but digital
optical disks are coming to market in 1997, and
compression may be the key to success. Deliver more
content faster. Push, push, push.
The digital disk
specification has several formats, such as DVD-ROM and
DVD-Audio. The DVD-Video format supports both DTV 16:9
and NTSC 4:3 frames along with eight tracks of digital
audio, each with eight channels of Dolby surround sound.
DVD handles frame searches along with seamless video
branching with up to nine camera angles available for
selection during playback, if not blocked by any
parental lockouts. High-end DVD players may offer
component video output at near-studio-quality if D1 video
is compressed with MPEG-2. DVD data rates vary from 1 to
10 Mbps, averaging 3 Mbps.
DVD presently uses
a red laser to read the disk, but DVD likely will shift
to a blue laser, as advocated by David Paul Gregg,
because the blue wavelength supports a finer focus,
expanding the amount of compressed video and audio and
data a digital disk can contain.
Until digital disk
technology replaces tape in professional camcorders,
digital video cassette (DVC) camcorders will rule the
field, but even here compression is crucial. DVC players
include Hitachi, Panasonic, Philips, Sony, Thomson, and
Toshiba, to list a few.
Offering viable 5:1
compression, DVC camcorders can compress fields
separately or combine two fields into a single
compression block. DVC quality, say varied sources, falls
between M-JPEG and MPEG-2.
Illustrating the
options, the Panasonic DVCPro and Sony Digital Betacam
compete head-to-head for ENG applications where there
will be digital post production. Sony DVCam uses a YUV
4:2:0 codec for European PAL where DVCPro compresses YUV
4:1:1, making the two incompatible. For NTSC
applications, both compress YUV at 4:1:1.
If the Sony Digital
Betacam or DVCam or the Panasonic DVCPro does not fit your
needs, another option is the JVC Digital S format.
Compression in
Perspective
Name any television
delivery system (terrestrial and satellite
broadcasting, microwave wireless, optical fiber, coax
cable, hybrid fiber-coax, utility power line, even plain
old telephone lines using twisted pairs of copper wires)
and there are compression products available for
video transport. Name any conventional or nonlinear
production house, and suitable compression products are
announced and ready to ship.
Not all the bugs
have been worked out, of course, and wondrous innovations
hiding around the corner may knock current thinking for a
loop, but the state of compression at the start of 1997
can be called realistically optimistic.
The dream is coming
true. City by city, town by town, county by county,
thanks to digital compression, the USA and the rest of
the industrialized and developing world are about to have
access to more information in a second than our ancestors
ever had in a lifetime.
Time is money in
digital transport, so investing in compression equipment
increasingly makes fiscal sense. Send more content
faster. Push, push, push. In the emerging open
marketplace of digital services, the companies that can
reliably compress the most content with the most quality
and least signal degradation will have a competitive
advantage.
How MPEG
Compression Works
MPEG is based on a
full quadrature sampling of every digital picture element
in an image, designated "4:4:4." The first digit
represents how often luminance (light), the degree of
brightness, is sampled. The next two digits in the formula
represent how often the two chrominance (color) components
are sampled relative to luminance. With eight-bit samples,
the two chrominance values identify a precise spot on a
grid of 256 colors by 256 colors. As the picture is converted from
RGB to YUV, each frame is broken into 16x16 macroblocks.
These blocks are broken into four 8x8 luma (Y) blocks,
and two 8x8 chrominance (CrCb/UV) blocks. The image then
is subsampled as YUV. MPEG-1 samples YUV at 4:2:0. MPEG-2
samples at 4:4:4, 4:2:2 and 4:2:0.
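The macroblock breakdown can be sketched in Python for the 4:2:0 case. This is an illustration under stated assumptions, not a codec: the conversion uses the common ITU-R BT.601 weights, chroma is thinned by simple 2x2 averaging, and the flat gray macroblock and all function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y, Cb, Cr using BT.601 weights."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr


def split_into_8x8(plane):
    """Split a 16x16 plane into four 8x8 blocks, left to right, top to bottom."""
    return [[row[left:left + 8] for row in plane[top:top + 8]]
            for top in (0, 8) for left in (0, 8)]


def subsample_2x2(plane):
    """Average each 2x2 group of samples, halving width and height (4:2:0)."""
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]


# Hypothetical macroblock: 16x16 pixels of flat mid-gray RGB.
macroblock = [[(128, 128, 128)] * 16 for _ in range(16)]

converted = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in macroblock]
y_plane = [[p[0] for p in row] for row in converted]
cb_plane = [[p[1] for p in row] for row in converted]
cr_plane = [[p[2] for p in row] for row in converted]

luma_blocks = split_into_8x8(y_plane)  # four 8x8 Y blocks
cb_block = subsample_2x2(cb_plane)     # one 8x8 Cb block
cr_block = subsample_2x2(cr_plane)     # one 8x8 Cr block

print(len(luma_blocks), "luma blocks plus two",
      f"{len(cb_block)}x{len(cb_block[0])}", "chroma blocks")
```

At 4:2:2 or 4:4:4 the chroma planes would be thinned less, so each macroblock would carry more than two chroma blocks.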
Each macroblock is
predicted from the previous or future frame based on the
amount of motion in the block during the time interval.
The three types of frames in MPEG bitstream are
designated as "I" for Intra-frame coding, "P" for
Predictive inter-frame coding and "B" for Bidirectionally
interpolated coding. I frames encode a still image, the
snapshot, and every datastream must start with an I-frame
since no prior frames can predict it. P frames are
predicted from the most recent I or P frame. When
pictures shift so fast frame prediction is impossible,
the blocks are intra-coded, as in an I frame. B frames are
predicted from the closest preceding and following I or P
frames, and cannot suffice alone.
All three methods
of frame coding are attempted at the outset, and the best
frame coding is what goes into the datastream. Pattern
strings define how frames flow in the bitstream. For
instance, in an IBBPBIBBPB sequence, the stream starts
with an I frame. All of the P and B frames reference a fore
or aft I or P frame. The string repeats until the picture
ends. This sequencing helps reduce decoding errors.
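The dependencies in such a pattern string can be traced with a short Python sketch. It treats the string as display order and assumes P frames look back to the nearest earlier I or P frame while B frames look both ways; the function name frame_references is hypothetical.

```python
def frame_references(pattern):
    """Map each frame index to the indexes of the I or P frames it is predicted from."""
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    references = {}
    for i, kind in enumerate(pattern):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if kind == "I":
            references[i] = []                         # coded on its own
        elif kind == "P":
            references[i] = earlier[-1:]               # nearest earlier anchor
        else:
            references[i] = earlier[-1:] + later[:1]   # B: nearest on both sides
    return references


refs = frame_references("IBBPBIBBPB")
for index, kind in enumerate("IBBPBIBBPB"):
    print(index, kind, "predicted from", refs[index])
```

The last B frame in a single string has only an earlier anchor here; in a repeating stream it would also reference the I frame that opens the next repetition.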
The MPEG-2
compression standard includes different tools for
different applications of increasing complexity. The
options are expressed as a matrix of profiles and levels
with complexity increasing from left to right and from
top to bottom. This very simplified chart of the matrix
specifies the MPEG-2 chrominance sampling options. The
most frequent compression scheme for television is main
profile, main level. The new 4:2:2 studio profile
compression takes place at main level.
On The
Horizon: MPEG-4 and MPEG-7
High-speed digital
transmission remains beyond the fiscal reach of many
television operations, for now, so an effort is being
made to provide reliable video compression at lower
speeds. One valuable answer may be MPEG-4, a standard
from the Moving Picture Experts Group for coding
audiovisual content at very low bitrates.
The work on MPEG-4
(ISO 14496) officially began at the MPEG meeting in
Brussels in September 1993, and the initiative has been
approved by unanimous ballot of all national bodies of
ISO/IEC JTC1. A draft specification is expected in 1997
with adoption foreseen for November 1998.
MPEG-4 requires
engineers to develop fresh solutions. According to J.
Ostermann at the University of Hannover, chairman of
regional coordinators for the MPEG organization, the
techniques considered so far have included model-based
image coding, human interaction with multimedia
environments, and low bitrate speech coding.
"When completed,"
Ostermann says, "the MPEG-4 standard will enable a whole
spectrum of new applications, including interactive
mobile multimedia communications, videophones, mobile
audio-visual communication, multimedia electronic mail,
remote sensing, electronic newspapers, interactive
multimedia databases, multimedia videotext, games,
interactive computer imagery, [and] sign language
captioning. Since the primary target for these
applications is bitrates of up to 64 kbps at good
quality, it is anticipated that new coding techniques
allowing higher compression than traditional techniques
may be necessary. This effort is in the very early
stages. Morphology, fractals, model-based techniques are
all in the offering."
MPEG-4 to date is
loosely being defined with the sampling grid having
dimensions of 176 by 144 at 10 Hz with coded rates
between 4800 bits and 64 kilobits per second. A target
application at this rate could be video conferencing or
home viewphones over POTS lines.
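These figures imply a very tight bit budget, as a quick Python calculation shows. The sketch uses only the grid size, frame rate and channel rates given above; the function names are hypothetical, and the budget ignores audio and any transport overhead.

```python
def bits_per_frame(channel_bps, frames_per_second):
    """Average number of coded bits available for each frame."""
    return channel_bps / frames_per_second


def bits_per_pixel(channel_bps, frames_per_second, width, height):
    """Average number of coded bits available for each pixel."""
    return bits_per_frame(channel_bps, frames_per_second) / (width * height)


WIDTH, HEIGHT, FRAME_RATE = 176, 144, 10

for channel_bps in (4_800, 64_000):
    per_frame = bits_per_frame(channel_bps, FRAME_RATE)
    per_pixel = bits_per_pixel(channel_bps, FRAME_RATE, WIDTH, HEIGHT)
    print(f"{channel_bps} bps: {per_frame:.0f} bits per frame, "
          f"{per_pixel:.3f} bits per pixel")
```

Even at 64 kilobits per second the coder has roughly a quarter of a bit to spend on each pixel, which is why new coding techniques are under study.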
Reflecting the kind
of thinking going into MPEG-4, an important seminar on
MPEG-4 was held in July 1994 in Grimstad, Norway. The
meeting brought together experts in media psychology,
physiological aspects of vision and hearing, music
synthesis, speech synthesis, computer graphics,
animation, computer vision, artificial and virtual
reality, plus other fields. They contributed ideas for
various applications and coding methods for MPEG-4.
Conceivably, MPEG-4
could replace CCITT H.261, the most widely used
international video compression standard for video
conferencing over switched networks. H.261 encodes data
in a hierarchical block structure format.
MPEG-4 is receiving
its most ardent support in Europe. The European Union
ACTS project developed software for several parts of
MPEG-4, including the successful development in 1995 of
software for video encoding and decoding. More recently,
the effort to develop MPEG-4 architecture, software and
hardware has shifted to a project called Emphasis.
Players in the Emphasis project include Thomson, Siemens,
Philips, Hertz Institute, France Telecom, Telenor, Ecole
Polytechnic, University of Hannover, and
others.
"The objective of
the Emphasis project," says spokesperson Paul Fellows at
Thomson Microelectronics Ltd., "is to firmly establish a
European lead in software and silicon technology suitable
for MPEG-4. The project will actively contribute to
MPEG-4 standards by delivering three key technologies:
MSDL [MPEG-4 Syntax Description
Language], software implementation of MPEG-4 tools
and algorithms, and then the specifications for processor
and co-processor architectures that meet the demands of
MPEG-4 applications."
If the MPEG-4
specification is ready by the 1998 deadline, according to
Fellows, Emphasis expects European media companies to
implement MPEG-4 as a mass market platform by "lowering
the cost of MPEG-4 technology to create a critical mass
of installed terminals."
At the last MPEG
meeting in Chicago, held September 30 to October 2, the
group approved work on a new standard entitled
"Multimedia Content Description Interface," nicknamed
MPEG-7.
A working draft is
expected in July 1998 with a committee draft in March 1999
followed by a draft international standard in July 1999
with specification of an international standard in
November 1999. If the work stays on track, MPEG-7 would
become an international standard one year after MPEG-4
attains this standing.
A point man for the
MPEG-7 initiative is Fernando Pereira of the Instituto
Superior Tecnico in Lisbon, who gave the keynote address
at the 1996 Picture Coding Symposium in Melbourne.
"Although there is still no project description for
MPEG-7," Pereira says, "it may be foreseen that this
project will standardize the tools for high level
indexing and description of MPEG-4 coded audio-visual
information."