3.4 Challenges of Multimedia Information Processing
Novel communications and networking technologies are critical for a multimedia database system to support interactive dynamic interfaces. A truly integrated media system must connect with individual users and content-addressable multimedia databases. This will be a logical connection through computer networks and data transfer.
To advance the technologies of indexing and retrieval of visual information in large archives, multimedia content-based indexing would complement text-based search. Multimedia systems must successfully combine digital video and audio, text, animation, graphics and knowledge about such information units and their interrelationships in real time.
The operations of filtering, sampling, spectrum analysis and signal representation are basic to all of signal processing. Understanding these operations in the multidimensional (mD) case has been a major activity since 1975 [3.15, 3.16, 3.17]. More recent key results have been directed at the specific applications of image and video processing, medical imaging, and array processing. Unfortunately, there remains considerable cross-fertilization among the application areas.
Algorithms for processing mD signals can be grouped into four categories:
Separable algorithms that use 1D operators to process the rows and columns of a multidimensional array
Nonseparable algorithms that borrow their derivation from their 1D counterparts
mD algorithms that are significantly different from their 1D counterparts
mD algorithms that have no 1D counterparts.
Separable algorithms operate on the rows and columns of an mD signal sequentially. They have been widely used for image processing because they invariably require less computation than nonseparable algorithms. Examples of separable procedures include mD Discrete Fourier Transforms (DFTs), Discrete Cosine Transforms (DCTs) and Fast Fourier Transform (FFT)-based spectral estimation using the periodogram. In addition, separable Finite Impulse Response (FIR) filters can be used in separable filter banks, wavelet representations for mD signals, and decimators and interpolators for changing the sampling rate.
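The row/column idea can be illustrated with a minimal sketch, assuming a small real-valued array and a naive O(N^2) 1D DFT in place of an FFT. The function names (dft_1d, dft_2d_separable, dft_2d_direct) are illustrative and do not come from the text. The separable version applies the 1D transform to every row and then to every column, and the result is compared with the 2D double-sum definition.

```python
import cmath

def dft_1d(x):
    """Naive O(N^2) 1D DFT of a sequence of numbers."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]

def dft_2d_separable(a):
    """2D DFT computed as 1D DFTs over the rows, then over the columns."""
    rows = [dft_1d(row) for row in a]                 # transform each row
    cols = [dft_1d(list(col)) for col in zip(*rows)]  # transform each column
    return [list(r) for r in zip(*cols)]              # transpose back to row-major

def dft_2d_direct(a):
    """2D DFT taken directly from the double-sum definition, for comparison."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]

image = [[1, 2, 3, 4],
         [0, 1, 0, 1],
         [2, 2, 2, 2]]
sep = dft_2d_separable(image)
ref = dft_2d_direct(image)

# Largest difference between the two results (rounding error only).
max_err = max(abs(sep[u][v] - ref[u][v]) for u in range(3) for v in range(4))
```

For an M x N array the separable form needs M + N one-dimensional transforms, whereas the direct double sum touches every input sample for every output sample. That difference is where the computational saving comes from.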
The second category contains algorithms that are uniquely mD in that they cannot be decomposed into a repetition of 1D procedures. These can usually be derived by repeating the corresponding 1D derivation in an mD setting. Upsampling and downsampling are examples. As in the 1D case, bandlimited multidimensional signals can be sampled on periodic lattices with no loss of information. Most 1D FIR filtering and FFT-based spectrum analysis algorithms also generalize straightforwardly to any mD lattice [3.18]. Convolutions can be implemented efficiently using the mD DFT either on whole arrays or on subarrays. The window method for FIR filter design can be easily extended, and the FFT algorithm can be decomposed into a vector-radix form, which is slightly more efficient than the separable row/column approach for evaluating multidimensional DFTs [3.19, 3.20]. Nonseparable decimators and interpolators have also been derived that may eventually be used in subband image and video coders [3.21]. Another major area of research has been spectral estimation. Most of the modern spectral estimators, such as the maximum entropy method, require a new formulation based on constrained optimization. This is because their 1D counterparts depend on factorization properties of polynomials [3.22]. An interesting case is the maximum likelihood method, where the 2D version was developed first and then adapted to the 1D situation [3.23].
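The remark that convolutions can be implemented with the mD DFT can be checked on a small case. The sketch below assumes circular (periodic) convolution on a 4 x 4 array and uses a naive 2D DFT rather than an FFT, so it shows the principle and not the efficiency. The kernel is chosen to be nonseparable, and all names are illustrative.

```python
import cmath

def dft2(a, inverse=False):
    """Naive 2D DFT (or inverse DFT) from the double-sum definition."""
    m, n = len(a), len(a[0])
    sign = 1 if inverse else -1
    scale = 1.0 / (m * n) if inverse else 1.0
    return [[scale * sum(a[r][c] * cmath.exp(sign * 2j * cmath.pi * (u * r / m + v * c / n))
                         for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]

def circular_conv_dft(x, h):
    """Circular 2D convolution: multiply the 2D DFTs pointwise, then invert."""
    X, H = dft2(x), dft2(h)
    Y = [[X[u][v] * H[u][v] for v in range(len(x[0]))] for u in range(len(x))]
    return [[val.real for val in row] for row in dft2(Y, inverse=True)]

def circular_conv_direct(x, h):
    """Circular 2D convolution from the spatial-domain definition."""
    m, n = len(x), len(x[0])
    return [[sum(h[i][j] * x[(r - i) % m][(c - j) % n]
                 for i in range(m) for j in range(n))
             for c in range(n)]
            for r in range(m)]

x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
# Nonseparable kernel [[1, 2], [0, 1]] zero-padded to the array size.
h = [[1, 2, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]

y_dft = circular_conv_dft(x, h)
y_direct = circular_conv_direct(x, h)
max_err = max(abs(y_dft[r][c] - y_direct[r][c]) for r in range(4) for c in range(4))
```

To obtain a linear rather than circular convolution, both arrays would be zero-padded so that the wrapped-around terms vanish, exactly as in the 1D case.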
There are also mD algorithms that have no 1D counterparts, especially algorithms that perform inversion and computer imaging. One of these is the operation of recovering an mD distribution from a finite set of its projections, equivalently inverting a discretized Radon transform. This is the mathematical basis of computed tomography and positron emission tomography.
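One property that underlies Fourier-based inversion of projections is the projection-slice theorem: the 1D Fourier transform of a projection equals a slice through the origin of the 2D Fourier transform. The sketch below checks the discrete version for a single projection direction (summing down the columns) on a small array. It is an illustration of that one property, not a tomographic reconstruction, and the function names are illustrative.

```python
import cmath

def dft_1d(x):
    """Naive 1D DFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * v * k / n) for k in range(n))
            for v in range(n)]

def dft_2d(a):
    """Naive 2D DFT from the double-sum definition."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]

def project_vertical(a):
    """Projection along the vertical direction: one sum per column."""
    return [sum(col) for col in zip(*a)]

image = [[1, 2, 0],
         [0, 3, 1],
         [4, 0, 2],
         [1, 1, 1]]

projection = project_vertical(image)       # 1D projection of the 2D array
slice_from_projection = dft_1d(projection) # its 1D DFT
central_slice = dft_2d(image)[0]           # row u = 0 of the 2D DFT

max_err = max(abs(p - s) for p, s in zip(slice_from_projection, central_slice))
```

Projections taken at other angles fill in other slices of the 2D spectrum, which is why a sufficiently dense set of projections determines the distribution.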
Another imaging method, developed first for geophysical applications, is Fourier migration. Finally, signal recovery methods that have no counterpart in the 1D case are possible. mD signals with finite support can be recovered from the amplitudes of their Fourier transforms or from threshold crossings [3.24].
3.4.1 Pre- and Postprocessing
In multimedia applications, the equipment used for capturing data, such as the camera, should be cheap so that it is affordable for a large number of users. The quality of such equipment is lower than that of its more expensive, professional counterparts. A preprocessing step prior to coding is therefore needed to enhance the quality of the final pictures and to remove noise that would otherwise degrade the performance of compression algorithms. Solutions have been proposed in the field of image processing to enhance the quality of images for various applications [3.25, 3.26]. A more appropriate approach would be to take into account the characteristics of the coding scheme when designing such operators. In addition, pre- and postprocessing operators are extensively used to render the input or output images in a more appropriate format for the purpose of coding or display.
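As one concrete example of a noise-removing preprocessing operator, the sketch below applies a 3 x 3 median filter, a standard way to suppress impulse noise before coding. This is an illustrative choice and not the method of the cited references. The function name and the border handling (using only the neighbors inside the image) are assumptions.

```python
from statistics import median

def median_filter_3x3(img):
    """Replace each pixel by the median of its 3x3 neighborhood.

    Border pixels use only the neighbors that lie inside the image.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = median(window)
    return out

# Flat 5x5 image with a single impulse ("salt") noise sample in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter_3x3(noisy)
```

A filter designed with the coder in mind, as the text suggests, would go further and trade noise removal against the detail the compression scheme is able to preserve.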
Mobile communications is an important class of applications in multimedia. Terminals in such applications are usually subject to different motions, such as tilting and jitter, translating into a global motion in the scene due to the motion of the camera. This component of the motion can be extracted by appropriate methods detecting the global motion in the scene and can be seen as a preprocessing stage. Results reported in the literature show an important improvement of the coding performance when a global motion estimation is used [3.27].
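The idea of extracting a global motion component can be sketched with the simplest possible model: a single integer translation for the whole frame, found by exhaustive search over a small window using the sum of absolute differences (SAD). The sketch assumes cyclic image borders and pure translation, so tilting and other camera motions mentioned above are not modeled. The function names are illustrative and this is not the method of [3.27].

```python
def shift(img, dy, dx):
    """Cyclically shift img down by dy rows and right by dx columns."""
    h, w = len(img), len(img[0])
    return [[img[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]

def estimate_global_translation(prev, curr, search=2):
    """Return the (dy, dx) within +/-search that minimizes the SAD
    between curr and the shifted previous frame."""
    best, best_sad = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = shift(prev, dy, dx)
            sad = sum(abs(a - b)
                      for row_a, row_b in zip(cand, curr)
                      for a, b in zip(row_a, row_b))
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best

# Previous frame: a small asymmetric pattern on a flat background.
prev = [[0] * 8 for _ in range(8)]
prev[2][3], prev[3][3], prev[2][4] = 9, 5, 7

# Current frame: the same scene after a camera-induced shift.
curr = shift(prev, 1, -2)

motion = estimate_global_translation(prev, curr)
compensated = shift(prev, *motion)   # globally motion-compensated prediction
```

Once the global component is compensated, the residual motion left for the coder to describe is only that of objects moving within the scene.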
It is normal to expect a certain degree of distortion of the decoded images for very low-bit-rate applications. However, an appropriate coding scheme introduces the distortions in areas that are less annoying to the users. A postprocessing operator could be added as an additional stage to further reduce the distortion due to compression. Solutions have been proposed to reduce the blocking artifacts appearing at high compression ratios [3.28, 3.29, 3.30, 3.31, 3.32, 3.33]. The same types of approaches have been used to improve the quality of decoded signals in other coding schemes, reducing different kinds of artifacts, such as ringing, blurring and mosquito noise [3.34, 3.35].
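A very simple postprocessing operator for blocking artifacts is sketched below. It assumes 8 x 8 coding blocks and only treats vertical block boundaries, pulling the two pixels on either side of each boundary toward each other by a fixed fraction. The function name and the strength parameter are hypothetical, and this is not the method of the cited references. Practical deblocking filters adapt to local image content so that true edges are not blurred.

```python
def deblock_vertical_edges(img, block=8, strength=0.25):
    """Soften vertical block boundaries.

    For every boundary between columns b-1 and b (b a multiple of `block`),
    move the two adjacent pixels toward each other by `strength` times
    their difference.
    """
    out = [list(row) for row in img]   # work on a copy
    width = len(img[0])
    for row in out:
        for b in range(block, width, block):
            left, right = row[b - 1], row[b]
            diff = right - left
            row[b - 1] = left + strength * diff
            row[b] = right - strength * diff
    return out

# Two flat 8-pixel-wide blocks with a visible step of 20 gray levels between them.
decoded = [[100] * 8 + [120] * 8 for _ in range(4)]
smoothed = deblock_vertical_edges(decoded)

step_before = decoded[0][8] - decoded[0][7]
step_after = smoothed[0][8] - smoothed[0][7]
```

Applying the same operation to the transposed image would treat the horizontal block boundaries.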
Recently, advanced postprocessing mechanisms have been studied to improve lip synchronization in head-and-shoulder video coding at very low bit rates by using knowledge of the decoded audio to correct the positions of the speaker's lips [3.36]. Figure 3.2 shows an example of the block diagram of such a postprocessing operation.