3.4.3 Video Signal Processing
Digital video has many advantages over conventional analog video, including bandwidth compression, robustness against channel noise, interactivity and ease of manipulation. Digital-video signals come in many formats. Broadcast TV signals are digitized with the ITU-R 601 format, which has 30/25 fps, 720 pixels by 488 lines per frame, 2:1 interlacing, a 4:3 aspect ratio and 4:2:2 chroma sampling. With the advent of high-definition digital video, standardization efforts between the TV and PC industries have resulted in the approval of 18 different digital-video formats in the United States. Exchange of video signals between TVs and PCs requires effective format conversion. Some commonly used intraframe/field filters for format conversion, for example, ITU-R 601 to the Source Input Format (SIF) and vice versa, and 3:2 pull-down to display 24 Hz motion pictures in 60 Hz format, have been reviewed [3.57]. Video filters can be classified as intraframe/field (spatial), motion-adaptive and motion-compensated filters [3.58]. Spatial filters are the easiest to implement. However, they do not make use of the high temporal correlation in video signals. Motion-compensated filters require highly accurate motion estimation between successive frames. Other, more sophisticated format conversion methods include motion-adaptive field-rate doubling and deinterlacing [3.59] as well as motion-compensated frame-rate conversion [3.58].
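The 3:2 pull-down mentioned above can be illustrated with a minimal sketch. It assumes the common cadence in which film frames alternately contribute three fields and two fields, so that 24 film frames yield 60 interlaced fields. The function name `pulldown_32` and the string labels used for frames are illustrative only and do not come from the cited references.

```python
def pulldown_32(film_frames):
    """Map 24 Hz film frames to a 60 Hz field sequence with a 3:2 cadence.

    Returns a list of (frame, parity) pairs, where parity alternates
    between "top" and "bottom" as in an interlaced signal.
    """
    fields = []
    for i, frame in enumerate(film_frames):
        # Even-indexed film frames are held for 3 fields, odd-indexed for 2.
        repeats = 3 if i % 2 == 0 else 2
        for _ in range(repeats):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


# Four film frames become ten fields, that is, five interlaced video frames.
cadence = pulldown_32(["A", "B", "C", "D"])
print([frame for frame, _ in cadence])
```

Because every pair of film frames produces five fields, one second of film (24 frames) produces exactly 60 fields, which is why no frame-rate interpolation is needed for this conversion.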
Video signals suffer from several degradations and artifacts. Some of these degradations may be acceptable under certain viewing conditions. However, they become objectionable for freeze-frame or printing-from-video applications. Some filters are adaptive to scene content in that they aim to preserve spatial and temporal edges while removing noise. Examples of edge-preserving filters include median, weighted median, adaptive linear mean square error and adaptive weighted-averaging filtering [3.58]. Deblurring filters can be classified as those that require a model of the degradation process (inverse, constrained least squares and Wiener filtering) and those that do not (contrast adjustment by histogram specification and unsharp masking). Deblocking filters smooth intensity variations across block boundaries. Video contains large amounts of temporal redundancy; namely, successive frames generally have large overlaps with each other. Assuming that frames are shifted by subpixel amounts with respect to each other, it is possible to exploit this redundancy to obtain a high-resolution reference image (mosaic) of the regions covered in multiple views [3.60]. High-resolution reconstruction methods employ least-squares estimation, back projection, or projection-onto-convex-sets (POCS) methods based on a simple instantaneous camera model or a more sophisticated camera model including motion blur [3.61].
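To see why a median filter is called edge preserving, the following sketch compares a 3-tap median filter with a 3-tap averaging filter on a single scan line that contains one impulse-noise sample and one step edge. The signal values, the window size and the function names are assumptions chosen for illustration; the filters in [3.58] are more elaborate.

```python
import numpy as np


def _windows(x, size):
    """Return all length-`size` sliding windows of x, replicating the borders."""
    padded = np.pad(x, size // 2, mode="edge")
    return np.lib.stride_tricks.sliding_window_view(padded, size)


def median_filter_1d(x, size=3):
    """Replace each sample by the median of its neighborhood."""
    return np.median(_windows(x, size), axis=-1)


def mean_filter_1d(x, size=3):
    """Replace each sample by the average of its neighborhood."""
    return np.mean(_windows(x, size), axis=-1)


# Scan line: dark region, one noise spike (200), then a step edge to 90.
clean = np.array([10, 10, 10, 10, 10, 10, 90, 90, 90, 90], dtype=float)
noisy = clean.copy()
noisy[3] = 200.0

median_out = median_filter_1d(noisy)
mean_out = mean_filter_1d(noisy)
print(median_out)  # spike removed, edge position and height unchanged
print(mean_out)    # spike smeared over neighbors, edge blurred
```

In this example the median output equals the clean scan line, whereas the averaging filter spreads the spike over three samples and turns the step into a ramp.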
One of the challenges in digital video processing is to decompose a video sequence into its elementary parts (shots and objects). A video sequence is a collection of shots, a shot is a group of frames and each frame is composed of synthetic or natural visual objects. Thus, temporal segmentation generally refers to finding shot boundaries, spatial segmentation corresponds to extraction of visual objects in each frame and object tracking means establishing correspondences between the boundaries of objects in successive frames.
Temporal segmentation methods detect edit effects such as cuts, dissolves, fades and wipes. Thresholding and clustering using histogram-based similarity methods have been found effective for detection of cuts [3.62]. Detection of special effects with high accuracy requires customized methods in most cases and is a current research topic. Segmentation of objects by means of chroma keying is relatively easy and is commonly employed. However, automatic methods based on color, texture and motion similarity often fail to capture semantically meaningful objects [3.63]. Semiautomatic methods, which aim to help a human operator perform interactive segmentation by tracking the boundaries of a manual initial segmentation, are usually required for object-based video editing applications. Object-tracking algorithms, which can be classified as boundary-, region- or model-based tracking methods, can be based on 2D or 3D object representations. Effective motion analysis is an essential part of digital video processing and remains an active research topic.
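A minimal sketch of histogram-based cut detection by thresholding is given here. It assumes grayscale frames stored as NumPy arrays, a 16-bin intensity histogram, and half the L1 distance between consecutive normalized histograms as the dissimilarity measure. The threshold of 0.5 and the synthetic test frames are illustrative choices, not values taken from [3.62].

```python
import numpy as np


def gray_histogram(frame, bins=16):
    """Normalized intensity histogram of an 8-bit grayscale frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()


def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    hists = [gray_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        # Half the L1 distance lies in [0, 1]; 1 means disjoint histograms.
        distance = 0.5 * np.abs(hists[i] - hists[i - 1]).sum()
        if distance > threshold:
            cuts.append(i)
    return cuts


# Synthetic sequence: five dark frames (shot 1) followed by five bright frames (shot 2).
rng = np.random.default_rng(0)
dark_shot = [rng.integers(0, 80, size=(32, 32)) for _ in range(5)]
bright_shot = [rng.integers(150, 255, size=(32, 32)) for _ in range(5)]
cuts = detect_cuts(dark_shot + bright_shot)
print(cuts)
```

A fixed threshold works for abrupt cuts such as this one. Gradual effects such as dissolves and fades change the histogram slowly over many frames, which is one reason they call for the customized methods mentioned above.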
Storage and archiving of large volumes of digital video on shared disks and servers, browsing of such databases in real time, and retrieval across switched and packet networks pose many new challenges, one of which is efficient and effective description of content. The simplest method to index content is by manually or semiautomatically assigning the content to programs, shots and visual objects [3.64]. It is of interest to browse and search for content using compressed data because almost all video data will likely be stored in compressed format [3.65]. Video-indexing systems may employ a frame-based, scene-based or object-based video representation. The basic components of a video-indexing system are temporal segmentation, analysis of indexing features and visual summarization. The temporal-segmentation step extracts shots, scenes and/or video objects. The analysis step computes content-based indexing features for the extracted shots, scenes or objects. Content-based features may be generic or domain dependent. Commonly used generic indexing features include color histograms, type of camera motion, direction and magnitude of dominant object motion, entry and exit instances of objects of interest, and shape features for objects [3.66, 3.67]. Domain-dependent feature extraction requires a priori knowledge about the video source, such as news programs, particular sitcoms, sportscasts and particular movies. Content-based browsing can be facilitated by a visual summary of the contents of a program, much like a visual table of contents. Among the proposed visual summarization methods are story boards, visual posters and mosaic-based summaries.
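The analysis and summarization steps can be sketched for one generic feature, the color histogram. The sketch assumes that temporal segmentation has already produced shots as lists of RGB frames. It uses the mean histogram of a shot as the shot's index feature and picks the frame whose histogram is closest to that mean as the story-board key frame. This key-frame rule and the names `color_histogram`, `index_shot` and `build_storyboard` are illustrative assumptions rather than the methods of [3.66, 3.67].

```python
import numpy as np


def color_histogram(frame, bins=4):
    """Joint RGB histogram of an H x W x 3 8-bit frame, normalized to sum to 1."""
    pixels = frame.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / pixels.shape[0]


def index_shot(frames):
    """Return (feature, key_frame_index) for one shot."""
    hists = np.stack([color_histogram(f) for f in frames])
    feature = hists.mean(axis=0)  # shot-level indexing feature
    # Key frame: the frame whose histogram is nearest (L1) to the shot feature.
    key = int(np.argmin(np.abs(hists - feature).sum(axis=1)))
    return feature, key


def build_storyboard(shots):
    """One key-frame index per shot, in temporal order."""
    return [index_shot(frames)[1] for frames in shots]


# Toy shot: an all-red frame, a half-red/half-blue frame, an all-blue frame.
red = np.zeros((4, 4, 3), dtype=np.uint8)
red[..., 0] = 255
mixed = np.zeros((4, 4, 3), dtype=np.uint8)
mixed[:, :2, 0] = 255
mixed[:, 2:, 2] = 255
blue = np.zeros((4, 4, 3), dtype=np.uint8)
blue[..., 2] = 255

feature, key = index_shot([red, mixed, blue])
print(key, build_storyboard([[red, mixed, blue], [blue, blue]]))
```

The stored shot features could then serve content-based search, for example by ranking shots according to the histogram distance to a query image, while the key frames provide the visual table of contents.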