Research Statement
Radu Balan
February 2007

Executive Summary

My research interests cover statistical estimation and modeling, with applications to sensor networks, communications, and biomonitoring systems. This document is structured in two parts. The first part describes three research directions (systems) that I would like to pursue:
• Audio-Video Sensor Networks
• Body Sensor Networks
• Statistical Modeling and Control of Wireless Local Area Networks

In part two I highlight a selected set of problems where I have made significant contributions and which remain open to further research that would advance the three directions above. More specifically, I discuss:
• Sparse Signal Estimation
• Nonlinear Signal Processing
• Sensor Fusion
• Analysis of Wireless Communication Channels
• 802.11 WLAN MAC Layer Modeling and Control
• Optimizations in Wireless Networks
• Topics of Applied Mathematics

Over the years, with help from my colleagues and students at Siemens, I developed and built some of these components. Other components are still waiting for the technology, or the theory, to mature.

Chapter 1 Systems

1.1 Heterogeneous Sensor Networks

A Heterogeneous Sensor Network is a collection of heterogeneous sensors (such as microphones, video cameras, RFID readers, and radar) that communicate with a central monitoring system over wired or wireless links. The Audio-Video Sensor Network (AVSN) is the main example considered here (see Figure 1.1).

Figure 1.1: An Audio-Video Sensor Network deployed in a wired-wireless mixed mode

Typical applications targeted by such a system include:
• Security/Surveillance Systems
• Meeting Transcription Systems
• Scene Understanding

Typically, this system includes the following components:
• Front-end preprocessing of audio/video/radar signals, e.g.
noise reduction, source separation, detection, localization, and feature extraction;
• Communication block;
• Statistically principled sensor fusion;
• Context estimation and high-level statistical inference;
• Control and feedback into the external world.

In part two of this document I describe in more detail some challenges and issues for some of these components. Here I only sketch the current status of my research and technology development for each of them.

Noise Reduction: I have worked extensively on speech enhancement and noise reduction. In part two I present two new approaches to this problem: sparse signal estimation and nonlinear signal processing.

Signal Separation: Audio source signals can be separated by algorithms such as the ones described in part two of this document. These algorithms use geometric information that is either known a priori or estimated from past audio-video information.

Pattern Detection, Classification, and Tracking: This component has been used in the Person Recognition system we developed at Siemens.

Communication block: For wireless communication, time synchronization is critical. Assuming the wireless communication is carried over 802.11 WLANs, one subject of research is achieving high accuracy in time stamping wireless data packets. In particular, 802.11p promises a new chipset design that includes a time stamp for each wireless data packet.

Postprocessing: When data is sampled at nonuniform times, resampling is required. My expertise in applied harmonic analysis is well suited to this task. I will choose an appropriate resampling algorithm depending on the available information and computational resources.

Sensor Fusion: In the Person Recognition project at Siemens, I developed an estimator of the joint posterior distribution built from the characteristics of each component. The main challenge is to have a "good" training database that covers all cases of interest.
Statistical Inference: For a specific scenario, a Dynamic Bayesian Network (DBN) is created to model the context.

I already have extensive expertise for most of these components. Some components may be developed jointly through collaborations within the EE Department, with other EE/ECE Departments, or with industry (in particular Siemens). I envision strong interest from US Government agencies (e.g. DARPA, ONR, ARL) and from industry. Two years ago I was part of a joint NSF proposal with Professors Poor and Kulkarni and Dr. Zhu from Siemens. Currently I plan to develop a joint ARL grant proposal with Radu Marculescu from CMU. Recently (January 2007), the ONR launched an SBIR BAA (N07-T039) on transferring novel signal separation technology and nonlinear methods. My joint paper with Professors Casazza and Edidin is the first reference cited in that BAA. The application is to use a microphone array to identify and localize a particular speaker in a cocktail party setting.

1.2 Body Sensor Networks

A Body Sensor Network (BSN) is a collection of in-vivo sensors that continuously monitor the electro-chemical signals of a patient. An example of such a system is pictured in Figure 1.2. Depending on the sensor type and communication mode, a BSN can exhibit the following features:
• Passive (no battery) or active (battery-powered) sensors;
• Fixed or mobile sensors;
• Wired or wireless communication between the nearby reading antenna and the monitoring system.

Such a system requires components that perform the following functions:
• In-vivo sensing;
• Sensor reading: wireless communication between in-vivo sensors and external antenna(s);
• Body surface communication between antenna(s) and the monitoring system;
• Signal detection, estimation, and sensor fusion (if multiple sensors are used).

As with the previous system, I have expertise in designing some of these components but lack expertise in others.
More specifically:

In-vivo sensors: I do not have expertise in building these sensors. I envision collaborating with bio-sensor groups that can provide them.

Wireless sensor reading: I have expertise in channel modeling for wireless communications and for acoustic environments. My applied mathematics background and strong connections with the applied mathematics community would also play a major role in solving this problem. I propose to consider several scenarios. Multiple external antennas can be used for more accurate detection and estimation (higher-SNR signals). They can also be used to localize sensors, e.g. sensors traveling through the digestive system or blood vessels.

Body surface communications: I am familiar with the IEEE 11073 standardization activity devoted to Medical Device Communications, which includes a MAC layer description of the communication protocol.

Statistical Signal Processing: I have worked on several projects and have extensive expertise in statistical signal detection and estimation. This includes signal separation, source localization, sensor fusion, pattern recognition, and abnormal mode detection (see my CV).

Figure 1.2: A Body Sensor Network with passive sensors read off by an external antenna

Due to the multidisciplinary nature of this project, a larger team should collaborate to design, develop, and test such a system. I am eager to collaborate with medical doctors and researchers in the medical sciences.

1.3 Statistical Modeling and Control of Wireless Local Area Networks

Wireless Local Area Networks (802.11 networks) have become as ubiquitous as Internet access itself. Because they are so widely used in homes and enterprise environments, 802.11 WLANs have become a hot topic of research and development. Compare WLAN control (management) theory with more traditional research areas in Electrical Engineering, such as Signal Processing, Control Theory, and Information Theory. It is still underdeveloped, but it is developing fast.
One can argue that WLAN control is a subset of cross-layer design theory. However, the standardization process leaves fewer degrees of freedom to vary, so the specifics of standardized 802.11 networks require careful attention. My specific proposal concerns physical and MAC layer modeling and control of 802.11 WLANs.

The basic unit of a WLAN is composed of one Access Point (AP) and several mobile stations (STAs), together forming a Basic Service Set (BSS). A typical WLAN setup is depicted in Figure 1.3.

Figure 1.3: A typical Basic Service Set (BSS), with one AP and several mobile stations.

Two distinct, opposite regimes characterize the behavior of such a network:
1. Deterministic Regime: transmission happens in a deterministic mode when no collisions occur and the initial MAC instance times are sufficiently far apart. This may happen when only voice stations (such as VoIP stations) or periodically transmitting stations are connected to the AP. The deterministic regime analysis gives an upper bound on system performance;
2. Stochastic Regime: once collisions happen, or the medium is detected busy when a packet arrives, a random backoff counter is generated and the contention-based mechanism kicks in.

The deterministic regime is used to compute the maximal performance of a WLAN. In this case, performance may be superior even to the PCF (Point Coordination Function) mode, where the AP acts as a transmission controller. However, collisions are quite frequent in highly loaded networks, and the stochastic regime is more likely. Several works have proposed stochastic models for this regime, each concerned with one feature or another of the network behavior. In particular, Bianchi [11] was the first to propose using a Markov chain to model the saturation regime of an 802.11 WLAN.
Since his paper, several others have modeled WLANs in both saturated and non-saturated conditions. These models grew in complexity and took into account more phenomena observed in experimental setups. However, the current state-of-the-art model is not sufficient for several reasons:
1. It does not take into account the deterministic regime, nor does its predicted performance converge to that upper bound;
2. The Markovianity assumption is not always justifiable; is it possible to introduce a deterministic-stochastic hybrid model?
3. Subsequent improvements of the standard are not yet captured by the current model; in particular, the 802.11n draft introduces new MAC enhancements that future models will have to account for.

The purpose of modeling is two-fold: to study network performance, and to design control algorithms that optimize a desired criterion. During August 2007 I will run a mentoring program at the University of Minnesota (IMA) devoted to modeling the 802.11 WLAN MAC layer. The program is aimed at second-year graduate students with a strong applied mathematics background. I plan to use ns-2 and Matlab to simulate WLAN behavior in extreme scenarios. In the second part of this document I present one problem (saturation avoidance) in more detail, together with the approach I propose to follow.

Chapter 2 Components and Problems

2.1 Statistical Signal Methods for Estimation and Modeling

2.1.1 Sparse Signal Estimation – Part I

Our research at Siemens Corporate Research during the past six years produced a rich body of papers on blind source separation of speech signals. Our breakthrough rests on a key observation: two speech signals very rarely occupy the same time-frequency point. We turned this observation into a hypothesis, called W-disjoint orthogonality. It postulates that the supports of the time-frequency representations of speech signals form disjoint sets.
We further extended this hypothesis by allowing up to N source signals to share the same time-frequency points (the so-called generalized W-disjoint orthogonality hypothesis; see [9, 17]). More specifically, the mixing model has the form

$$x(t,\omega) = A(\omega;\theta)\, s(t,\omega) + n(t,\omega) \tag{2.1}$$

The symbols are as follows:
• (t, ω) is the time-frequency point;
• A(ω; θ) is the frequency-dependent mixing matrix defined by the mixing parameters θ;
• x(t, ω) is the measurement vector;
• s(t, ω) is the vector of source signals (to be estimated);
• n(t, ω) is the noise.

Typically we assume n is a zero-mean Gaussian random variable with known covariance matrix, e.g. $R_n = \sigma^2 I$. A challenging case is when the dimension of s is larger than that of x (i.e. more sources than measurements). This is precisely the case for which our hypothesis is best suited. It allows us to estimate both the mixing parameters θ and the source signals s from a sequence of observations of x.

Under these assumptions the Maximum Likelihood (ML) estimator of s is the most appropriate statistical signal estimator. Under appropriate prior signal models (e.g. generalized Gaussians with exponent less than one), the maximum a posteriori (MAP) estimator also yields sparse signal values. However, the ML estimator requires very little information about the model: only the maximum number of simultaneously active sources. In contrast, the MAP (and similarly the MMSE) estimator requires knowledge of the source signal distributions (spectral power, exponent, etc.). A more complex source signal model may yield better performance, provided it fits the data well. However, more complex models are less robust to mismatch than simpler ones, and may perform worse on real-world data. The difficult art is to balance deterministic and statistical prior model complexity.

Temporal structure is yet another dimension that can be added to our model.
Instead of treating each time-frequency coefficient as an independent random variable, one can use dynamic models to select among possible descriptions. Given the ever-increasing computational power of today's processors, hidden Markov models (HMMs) are a popular choice. Our source signal model is of the form s(t, ω) = b(t, ω)G(t, ω): the product of a Bernoulli random variable b(t, ω) and a continuous random variable G(t, ω). However, we would like to increase the power of source separation, particularly when prior knowledge about the sources exists (see also [18], [19]). In a very recent preprint we proposed an incremental increase in source model complexity. This follows our basic belief that models should be no more complicated than what is really needed to solve the problem. To this end we allow for statistical dependencies of the source signals across time, modeling {b(t, ω); t} as a first-order Markov chain:

$$p\big(b(t,\omega)\mid b(t-1,\omega), b(t-2,\omega), \ldots, b(1,\omega)\big) = p\big(b(t,\omega)\mid b(t-1,\omega)\big) = \pi_\omega\big(b(t,\omega), b(t-1,\omega)\big)$$

where π_ω are the 2×2 transition probability matrices indexed by frequency ω. Assume G(t, ω) and θ are uniformly distributed and the noise in (2.1) is Gaussian. The posterior distribution of b(t, ω), G(t, ω), and θ then becomes

$$P\big(\{b(t,\omega), G(t,\omega); t\}, \theta \mid X\big) \propto \prod_{t=1}^{T} C \exp\Big(-\frac{1}{\sigma^2}\big\|X(t,\omega) - A_r(\omega;\theta)\,G_r(t,\omega)\big\|^2\Big)\, Q_\omega\big(b(t,\omega), b(t-1,\omega)\big) \cdot Q^0_\omega\big(b(0,\omega)\big)$$

The symbols are as follows:
• Q_ω is the transition probability between the selection vectors b(t − 1, ω) and b(t, ω), obtained simply by multiplying the corresponding component transition probabilities π_ω;
• Q^0_ω is the initial probability;
• A_r and G_r are the reduced matrix and the reduced vector, obtained by removing, respectively, the columns and the entries that correspond to null entries of b(t, ω).

The MAP estimation problem now reduces to maximizing this criterion. Given θ and {b(t, ω)}, {G(t, ω)} is easily computed by solving a least-squares problem.
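The sketch below makes the least-squares step concrete for a single time-frequency point, with θ (hence A(ω; θ)) known and no temporal prior. For each candidate selection vector with exactly N active sources, it solves for the reduced source values G_r by least squares. It then keeps the support with the smallest residual ‖X − A_r G_r‖², which under white Gaussian noise is the ML choice. This is a minimal sketch under those assumptions: the exhaustive search over supports and the name `ml_sparse_estimate` are hypothetical, not the Siemens implementation.

```python
import itertools

import numpy as np


def ml_sparse_estimate(x, A, N):
    """Hypothetical helper: ML estimate of s at one time-frequency point.

    Assumes white Gaussian noise and at most N active sources (generalized
    W-disjoint orthogonality). Supports of exactly N sources suffice: a smaller
    support spans a subspace of some size-N support, so it cannot fit better.
    x: (D,) measurement vector, A: (D, K) mixing matrix at this frequency.
    """
    _, K = A.shape
    best_res, best_s, best_support = np.inf, None, None
    for support in itertools.combinations(range(K), N):
        Ar = A[:, list(support)]                      # reduced mixing matrix A_r
        g, *_ = np.linalg.lstsq(Ar, x, rcond=None)    # least-squares G_r
        res = np.linalg.norm(x - Ar @ g) ** 2         # equals x^*(I - P_r)x
        if res < best_res:
            s = np.zeros(K, dtype=np.result_type(A, x))
            s[list(support)] = g
            best_res, best_s, best_support = res, s, support
    return best_s, best_support


# Usage: three sensors, four sources, two of them active at this point.
A = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
s_hat, support = ml_sparse_estimate(A @ np.array([2.0, 0.0, 0.0, -1.0]), A, N=2)
print(support)  # expected support: (0, 3)
```

When N equals the number of sensors, every invertible N-column subset fits the data exactly, so the residual alone cannot discriminate between supports. This is why the hypothesis is most useful when N is smaller than the number of measurements.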
Take the negative logarithm, multiply by σ², and substitute the least-squares solution $G_r = (A_r^* A_r)^{-1} A_r^* X$. The optimization becomes

$$\min_{\{b(t,\omega);t\},\,\theta} \sum_{t=1}^{T} \Big[X^*\big(I - A_r(A_r^*A_r)^{-1}A_r^*\big)X - \sigma^2 \log Q_\omega\big(b(t,\omega), b(t-1,\omega)\big)\Big] - \sigma^2 \log Q^0_\omega\big(b(0,\omega)\big)$$

where X = X(t, ω), and the reduced matrix A_r = A_r(ω; θ) depends on t through b(t, ω). The optimization is constrained by the generalized W-disjoint orthogonality hypothesis, which reads $\sum_k b_k = N$. The optimal solution should not depend much on the initial probability distribution provided T is large. The optimization can be carried out by alternating two partial optimization steps: one over the selection variables {b(t, ω)}, the other over the mixing parameters θ. The pleasant feature of the first step is that it can be carried out efficiently with Viterbi decoding. The second step reduces to a classic ML source location estimation problem. Its complexity depends heavily on the sensor array geometry and the mixing model.

The transition probabilities are learned from a training dataset. The training procedure thresholds the database signals, with a threshold proportional to the average signal spectral power. More specifically, we assume a signal model of the form S = S_critical + S_rest. The "critical" component is the information-carrying part of the signal, and the "rest" is the remainder. The prior assumption is that the critical part has a sparse distribution, whereas the rest has a Gaussian distribution. The MAP estimator of the critical component is then given by thresholding, hard or soft depending on the exponent of the prior distribution. Since the critical component has a sparse distribution, we apply the product model and thus estimate the binary selection variables. The transition probabilities can then be easily estimated, e.g. with an ML criterion.
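The training and decoding steps just described can be sketched for one frequency bin ω. The sketch has three parts:
• Estimate π_ω by counting transitions in a thresholded binary training sequence.
• Form Q_ω over selection vectors with exactly N active sources as a product of per-source probabilities.
• Run a Viterbi pass that minimizes the per-frame residual plus the −σ² log Q penalty.

This is a minimal sketch under stated assumptions:
• The per-frame residuals X*(I − P_r)X are assumed to be precomputed for every candidate selection vector.
• Add-one smoothing in the counts is an assumption, not part of the described procedure.
• The initial probability is applied to the first observed frame, which is a simplification.
• All function names are hypothetical.

```python
import itertools

import numpy as np


def estimate_transition(b):
    """Estimate pi[i, j] = P(b_t = j | b_{t-1} = i) for one source and frequency
    from a thresholded 0/1 training sequence b: transition counts with add-one
    smoothing (an assumption) so that no probability is exactly zero."""
    counts = np.ones((2, 2))
    for prev, cur in zip(b[:-1], b[1:]):
        counts[int(prev), int(cur)] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def selection_transitions(pis, N):
    """States are selection vectors with exactly N active sources out of
    K = len(pis). Q[u, v] is the product of the per-source probabilities
    pis[k][u_k, v_k], as in the text (not renormalized over the states)."""
    pis = [np.asarray(p) for p in pis]
    K = len(pis)
    states = [tuple(int(k in sup) for k in range(K))
              for sup in itertools.combinations(range(K), N)]
    Q = np.array([[np.prod([pis[k][u[k], v[k]] for k in range(K)])
                   for v in states] for u in states])
    return states, Q


def viterbi(residual, Q, Q0, sigma2):
    """Minimize sum_t residual[t, s_t] - sigma2 * log Q[s_{t-1}, s_t]
    - sigma2 * log Q0[s_0] over state sequences; residual has shape (T, S)."""
    T, S = residual.shape
    log_Q, log_Q0 = np.log(Q), np.log(Q0)
    J = residual[0] - sigma2 * log_Q0            # best cost ending in each state
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = J[:, None] - sigma2 * log_Q       # cand[i, j]: move from i to j
        back[t] = np.argmin(cand, axis=0)
        J = cand[back[t], np.arange(S)] + residual[t]
    path = [int(np.argmin(J))]
    for t in range(T - 1, 0, -1):              # backtrack through stored argmins
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With a sticky transition matrix, the decoder can keep a source active through a frame whose residual alone would switch it off. With uniform transitions, it reduces to a per-frame minimum.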
Preliminary tests with known mixing parameters (see our recent preprints) showed a gain of about 1.5 dB in separation SINR over the DUET algorithm, which corresponds to using uniform transition probabilities. Future work will address the "blind" case of the signal separation problem, namely when both the source signals and the mixing parameters are to be estimated.

2.1.2 Sparse Signal Estimation – Part II

Let us return to the model (2.1) introduced above. Another prior distribution very popular nowadays is

$$P(S) = C_{\mu,p} \exp\Big(-\mu \sum |S|^p\Big)$$

where µ and p are adjustable parameters. For p = 2 we get the Gaussian distribution, and the MAP estimator corresponds to Tikhonov regularization; the solution is obtained by solving a linear system of equations. The case p = 1 corresponds to the Laplace prior distribution, and the MAP estimator is obtained efficiently by solving a convex optimization problem. The cases p < 1 are the most interesting, since they yield sparse solutions (that is, vectors S with many vanishing components). However, the optimization problem is then no longer convex. The estimator is obtained by solving an optimization problem of the form

$$\arg\min_S \|X - AS\|^2 + \lambda \|S\|_p^p \tag{2.2}$$

with λ = µσ². In a recent paper [10], we studied this optimization problem for two values of p: p = 0 and p = 1. In particular, consider the optimizers of (2.2) for (A, λ, p) = (A_1, λ_1, 1) and for (A, λ, p) = (A_0, λ_0, 0). We showed that they have the same support for a set of input vectors X with nonempty interior, when $A_1 = (A_0)^{-T}$ and $\lambda_1 = \sqrt{\lambda_0}/a(A_0)$, with a(A_0) a function of A_0. Next I would like to explore algorithms that solve (2.2) by following a homotopy from the case p = 1 to the case p = 0. The first case we know how to solve efficiently; the second is the one we are really interested in.

2.1.3 Nonlinear Signal Processing
A longstanding paradigm of speech signal processing is that frequency-domain phase information should not be touched. Either it is not critical to the task, or the signal processor cannot improve it. More specifically, I refer to two problems: speech recognition and speech enhancement (noise reduction).

A speech recognition system typically uses Mel-frequency cepstral coefficients (MFCCs), which discard the phase information. A speech enhancement system typically works in three steps:
• convert the signal from the time domain to the time-frequency (TF) domain;
• process the modulus of the speech TF coefficients;
• reconstruct linearly back into the time domain, reusing the phase of the noisy signal.

In the former case we ask whether any information is lost by discarding the phase entirely. In the latter case the problem is to find alternative reconstruction algorithms, possibly nonlinear, that do not use phase information. We have already published several results on these questions (see [5, 4, 6]) jointly with Peter Casazza and Dan Edidin (both from the University of Missouri) and Gitta Kutyniok (Mathematical Institute, Justus-Liebig University Giessen, Germany).

In a nutshell, the abstract problem can be stated as follows. Assume F = {f_1, f_2, ..., f_n} are n vectors that span a d-dimensional Euclidean space E (R^d or C^d); hence n ≥ d. On E consider the equivalence relation x ∼ y if there is a scalar z with |z| = 1 such that y = zx. That is, x and y are the same vector up to a constant phase factor. The problem is to determine when the nonlinear map

$$M : E/\!\sim\; \to (\mathbb{R}_+)^n, \qquad M(x) = \{|\langle x, f_k\rangle|\}_{1\le k\le n}$$

is injective, and in that case to propose an inversion algorithm. So far our analysis has proved several necessary and sufficient conditions for injectivity of this map. It has also produced the following important and practical result.
Assume the real case (E = R^d) and use the following notation:

$$A = \begin{bmatrix} I & I \\ I - P & -(I - P) \end{bmatrix}, \qquad \alpha = \begin{bmatrix} a \\ 0 \end{bmatrix}, \qquad G = \big[\tilde f_1 \,|\, \cdots \,|\, \tilde f_n\big]$$

Here P is the orthogonal projection of R^n onto the range of the analysis operator x ↦ (⟨x, f_k⟩)_k. G is the d × n matrix whose columns are the canonical dual frame vectors. Let a = M(x). Then:

Theorem [6]. M^{-1}(a) contains only one point if and only if the following holds for every 0 ≤ p < 1. The optimization problem $\min_{Au=\alpha} \|u\|_p$ admits exactly two solutions u and v, independent of p, with $u = [u_1^T\ u_2^T]^T$ and $v = [u_2^T\ u_1^T]^T$. These satisfy $a = |u_1 + u_2|$, and either $x = G(u_1 - u_2)$ or $x = -G(u_1 - u_2)$.

We remark on the similarity of this statement to the equivalence principle found in [13]. In speech processing, a machine learning approach (HMM-based phase estimation) has recently been considered in [12]. Our approach has so far been purely deterministic. Perhaps combining the two approaches could benefit advanced speech enhancement techniques.

2.1.4 Sensor Fusion

One problem of system integration is how to fuse overlapping components. The challenge is to do so in a statistically principled manner. Formally, this can be achieved by estimating a joint distribution of the sensor outputs and then performing statistical inference based on this estimate. However, reliable joint estimation may be hard, due to insufficient training data or even a complete lack of training data of some type. The following scenario illustrates the latter case. Assume a sensor network that monitors a gas turbine, where the sensors measure temperature, pressure, vibration, acoustic emissions (ultrasound), etc. The task is to detect any abnormal operating regime. It is impractical to generate training data for abnormal regimes, so the decision system should use non-Bayesian techniques to perform this task. One such approach is novelty detection from machine learning. (One may argue that one-class classifiers still admit a Bayesian interpretation. I tend to be agnostic on this issue.)
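The sketch below illustrates novelty detection trained on normal data only. It fits a Gaussian model (mean and covariance) to feature vectors from the normal regime, e.g. temperature, pressure, and vibration features of the turbine. It then flags inputs whose Mahalanobis distance exceeds a high quantile of the training distances. This is a hedged, minimal sketch: the Gaussian model, the 0.99 quantile, and the class name `MahalanobisNovelty` are assumptions for illustration. It is a simple stand-in for one-class classifiers such as a one-class SVM, not the method used in the turbine project.

```python
import numpy as np


class MahalanobisNovelty:
    """Hypothetical one-class detector: models the normal regime as a Gaussian
    and flags points that are unusually far from it."""

    def fit(self, X, quantile=0.99):
        self.mu = X.mean(axis=0)
        self.prec = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance
        # Threshold: the given quantile of Mahalanobis distances on normal data.
        self.thresh = np.quantile(self._dist(X), quantile)
        return self

    def _dist(self, X):
        Z = np.atleast_2d(X) - self.mu
        return np.sqrt(np.einsum("ij,jk,ik->i", Z, self.prec, Z))

    def is_novel(self, X):
        return self._dist(X) > self.thresh


# Usage with synthetic "normal regime" features (three sensors).
rng = np.random.default_rng(0)
detector = MahalanobisNovelty().fit(rng.normal(size=(500, 3)))
print(detector.is_novel(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])))
```

Only normal-regime data enters `fit`, which is the point made above: no abnormal examples are needed to set the decision threshold.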
The particular problem I am mostly concerned with is how to assign probabilities to machine learning classifiers, in particular to (kernel) support vector machines (kSVM). To fix ideas, assume a two-class classification problem where a linear SVM produces the decision

$$x \in \mathbb{R}^n \mapsto y = \operatorname{sign}(w^T x + a), \qquad y \in \{+1, -1\}$$

The symbols are as follows:
• x ∈ R^n is the input feature vector;
• w ∈ R^n is the normal to the separating hyperplane;
• a ∈ R is an offset;
• y ∈ {+1, −1} is the binary decision.

It is plausible that the larger the argument w^T x + a, the more probable it is that the true class y_t is +1 (and similarly for the negative case). But what is the exact distribution? A popular choice is to use sigmoid-type functions:

$$P(y = 1 \mid r = w^T x + a) = \frac{e^{\alpha r}}{e^{\alpha r} + e^{-\beta r}}, \qquad P(y = -1 \mid r = w^T x + a) = \frac{e^{-\beta r}}{e^{\alpha r} + e^{-\beta r}}$$

where the parameters α and β are fit experimentally. Both expressions depend on α and β only through their sum, since $P(y = 1 \mid r) = 1/(1 + e^{-(\alpha+\beta) r})$. Assuming this model, the next question is how to combine multiple SVM classifiers and fit their parameters.

2.2 Wireless Communication Networks

2.2.1 Physical Layer: Analysis of Communication Channels

The RAKE receiver is designed to exploit the multipath diversity of the propagation medium by aligning different paths to increase the effective SNR. Similarly, the time-frequency RAKE receiver introduced by Sayeed and Aazhang in 1999 exploits the diversity of the time-frequency doubly spread communication channel and achieves a higher effective SNR. Recently, in joint work with S. Rickard, V. Poor, and S. Verdú ([7, 8]), we explored other channel models that take into account the time dilation associated with Doppler effects. We proposed two new channel models: the time-scale and the frequency-scale channel models. Consider a linear communication channel H whose time-varying impulse response is h(t, τ). For a transmit signal x(t), the received signal y(t) is given by

$$y(t) = (Hx)(t) = \int h(t, t - \tau)\, x(\tau)\, d\tau.$$
Using the spreading function formulation (or Weyl quantization), the channel takes the form

$$y(t) = \iint S(\omega, \tau)\, e^{2\pi i \omega t}\, x(t - \tau)\, d\omega\, d\tau \tag{2.3}$$

Assume the transmit signal is bandlimited to [−Ω/2, Ω/2] and the observation takes place over [0, T]. Sayeed and Aazhang proved that the received signal then admits an expansion of the form

$$y(t) = \sum_{m,n} \hat S\Big(\frac{m}{T}, \frac{n}{\Omega}\Big)\, e^{2\pi i m t/T}\, x\Big(t - \frac{n}{\Omega}\Big) \tag{2.4}$$

where the coefficients are samples of

$$\hat S(u, v) = \iint S(\omega, \tau)\, \mathrm{sinc}\big((v - \tau)\Omega\big)\, \mathrm{sinc}\big((u - \omega)T\big)\, e^{-i\pi(u - \omega)T}\, d\omega\, d\tau \tag{2.5}$$

For some channels (see [7]), the input-output relation can be rewritten as

$$y(t) = \iint L(a, b)\, \frac{1}{\sqrt{|a|}}\, x\Big(\frac{t - b}{a}\Big)\, da\, db$$

where the time-scale symbol L(a, b) replaces the spreading function (Weyl symbol) S(ω, τ). Assume, as before, that the transmit signal is bandlimited, here to [−1/(2b_0), 1/(2b_0)]. In addition, the received signal is passed through a scale-limited filter with scale band [−1/(2 ln a_0), 1/(2 ln a_0)]. Scale-band filters are linear filters similar to frequency-band filters, with the Fourier transform replaced by the Mellin transform. Then, similarly to (2.4), the output admits the expansion

$$y(t) = \sum_{m,n} \hat L(m, n)\, a_0^{-m/2}\, x\big(a_0^{-m} t - n b_0\big) \tag{2.6}$$

where

$$\hat L(m, n) = \iint L(a, b)\, \mathrm{sinc}\Big(m - \frac{\ln a}{\ln a_0}\Big)\, \mathrm{sinc}\Big(n - \frac{b}{a b_0}\Big)\, da\, db \tag{2.7}$$

Expansion (2.6) is called the canonical time-scale channel model. Similarly, (2.3) can be replaced by

$$y(t) = \iint \rho(\omega, a)\, e^{2\pi i \omega t}\, \frac{1}{\sqrt{|a|}}\, x\Big(\frac{t}{a}\Big)\, d\omega\, da$$

Consider transmit signals scale-bandlimited to [−1/(2 ln a_0), 1/(2 ln a_0)], with observation time limited to [T_1, T_2]. The received signal then admits an expansion of the form

$$y(t) = \sum_{m,n} c_{m,n}\, e^{2\pi i m t/(T_2 - T_1)}\, a_0^{-n/2}\, x\big(a_0^{-n} t\big) \tag{2.8}$$

where

$$c_{m,n} = \frac{1}{(T_2 - T_1)^2}\, e^{-i m \pi \frac{T_2 + T_1}{T_1 - T_2}} \iint \rho(\omega, a)\, e^{i n \omega (T_1 + T_2)}\, \mathrm{sinc}\Big(\frac{\omega}{\Omega} - m\Big)\, \mathrm{sinc}\Big(\frac{\ln a}{\ln a_0} - n\Big)\, da\, d\omega$$

The expansion (2.8) is called the canonical frequency-scale channel model. Next I propose several directions that ought to be studied:
• Performance of the corresponding RAKE receivers.
• Channel decompositions.
This is an operator and functional analysis problem. It is closely related to the issue of time-scale symbols of bounded operators, which I present below.
• Comparison between channel models. A given channel can be expressed in any of a set of equivalent representations (e.g. time kernel, time-frequency, time-scale, frequency-scale). The issue is to compare the corresponding RAKE receiver performance.
• Use of smoother cut-offs. The channel models obtained so far use orthogonal projections: time cut-offs, frequency cut-offs, or scale cut-offs. It would be interesting to replace these sharp cut-offs by smoother versions. This is likely to yield localized formulae for the channel coefficients, as in the oversampling case. There, reconstruction can use faster-decaying prototypes instead of sinc functions.

2.2.2 802.11 WLAN MAC Layer Modeling and Control

Several studies on stochastic modeling of the 802.11 MAC layer have been published. The seminal paper by Bianchi [11] produced many offshoots. Among them is [14], which I consider the state of the art in modeling the stochastic regime of an 802.11 WLAN. However, this model does not account for all 802.11 standard mechanisms. In particular, it models the countdown of backoff counters empirically, as a first-order feedback process with an independent return rate. Also, the non-saturation case has been modeled in a simplified manner to minimize computational complexity. To account for these shortcomings I propose the Markov chain pictured in Figure 2.1.

Figure 2.1: A Markov Chain model for the stochastic regime of a WLAN device.

For highly loaded networks, devices are well modeled by Markov chains, such as the one I propose above. However, for low load, or in some QoS scenarios, the WLAN behaves deterministically and Markov chain modeling is not appropriate.
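For reference, the sketch below implements the saturation model of Bianchi [11], on which this line of work builds. Each of n saturated stations transmits in a slot with probability τ, and a transmission collides with conditional probability p. The two are coupled through p = 1 − (1 − τ)^(n−1), with Bianchi's expression for τ(p). This is Bianchi's fixed point, not the extended chain of Figure 2.1.

A minimal sketch under stated assumptions:
• W = 32 and m = 5 (CWmin = 31, CWmax = 1023) are illustrative defaults.
• The fixed point is solved by bisection, which works because the fixed-point equation has a unique root.
• Function names are hypothetical.

```python
def bianchi_tau(p, W, m):
    """Per-slot transmission probability of a saturated station in Bianchi's
    model, written as 2 / (W + 1 + p*W*sum_{k<m} (2p)^k). This equals
    2(1-2p) / ((1-2p)(W+1) + p*W*(1-(2p)^m)) but has no 0/0 at p = 1/2."""
    return 2.0 / (W + 1 + p * W * sum((2 * p) ** k for k in range(m)))


def bianchi_fixed_point(n, W=32, m=5, iters=100):
    """Solve p = 1 - (1 - tau(p))^(n-1) for n saturated stations by bisection.
    g(p) = 1 - (1 - tau(p))^(n-1) - p is decreasing, g(0) >= 0 and g(1) < 0."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = 1 - (1 - bianchi_tau(mid, W, m)) ** (n - 1) - mid
        if g > 0:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, bianchi_tau(p, W, m)


for n in (1, 5, 10, 20):
    p, tau = bianchi_fixed_point(n)
    print(f"n={n:2d}  collision prob p={p:.4f}  transmit prob tau={tau:.4f}")
```

With a single station there are no collisions (p = 0), and τ reduces to 2/(W + 1).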
One problem I intend to study is a heterogeneous model. At one limit it behaves deterministically. At the other limit it behaves randomly, as described by the Markov chain depicted in Figure 2.1.

Another problem of interest is saturation detection and network control. From the AP perspective, during the transmission of a voice packet the AC[0] instance (Voice Access Category) spends its time as follows:
• Counting down for a total of n slot times T_σ, where n is the sum of all randomly generated backoff counters (including the value of the post-backoff counter, if applicable);
• Holding while the medium is busy;
• Waiting for T_AIFS each time after a busy-medium event (a countdown deferral).

Let us denote by T the total transmission time, by T_idle the time the medium was idle during transmission, and by T_busy the time the medium was busy during transmission. Denote further by m the number of deferrals. Then

$$T = T_{idle} + T_{busy}, \qquad T_{idle} = n T_\sigma + m T_{AIFS}.$$

Let R denote the number of retransmissions of this packet. Let b be a Boolean variable recording whether the last transmission was successful: b = 0 if successful, b = 1 if unsuccessful. Let Q denote the number of unsuccessful transmissions, D the number of collisions, and E the number of unsuccessful transmissions due to a bad channel, all referring to the same packet. Then

$$Q = R + b = D + E.$$

From these equations we notice that, given (T, T_busy, m, R, b), we can compute (m, n, Q), namely $n = (T - T_{busy} - m T_{AIFS})/T_\sigma$ and Q = R + b. In the following we assume (m, n, Q) are known for each packet. We then derive an estimator for p, the probability of an unsuccessful transmission due to collisions. If (D, E) were known for each packet, this probability could easily be estimated by

$$\hat p = \frac{\sum_k D_k}{\sum_k (D_k + E_k)}.$$

Our input data is therefore the sequence (m_k, n_k, Q_k)_k, indexed by k. Consider the compressed time $t' = t - T_{busy}(t) - m(t) T_{AIFS}$.
Here $T_{busy}(t)$ denotes the total time the channel was busy up to time $t$, and $m(t)$ the number of deferrals up to time $t$. Essentially, $t'$ increases in increments of the slot time $T_\sigma$. We now make the following assumption: over-the-air packet transmissions occur as a Poisson arrival process with rate $\lambda$ in the compressed time $t'$ (see Figure 2.2).

Figure 2.2: Packet transmission time diagrams, and compressed time $t'$.

During the current packet transmission, the medium was busy due to other transmissions $M = m + D$ times. Since $0 \le D \le Q$, $M$ is a random variable taking one of the $Q + 1$ values $\{m, m+1, \ldots, m+Q\}$. $M$ represents the number of arrivals of our Poisson process during the time $T_{idle} - m T_{AIFS} = n T_\sigma$. Our task is then to write the likelihood of the data $(m, n, Q)$ given the parameters $p$ and $\lambda$. First,
$$P(m, n, Q \mid p, \lambda) = \sum_{d=0}^{Q} \binom{Q}{d} p^d (1-p)^{Q-d}\, \mathrm{Prob}(D = d \mid \lambda),$$
where the last factor is the probability of having exactly $d$ collisions, hence $m + d$ arrivals. Under the Poisson hypothesis,
$$\mathrm{Prob}(D = d \mid \lambda) = \frac{(\lambda n T_\sigma)^{m+d}}{(m+d)!}\, e^{-\lambda n T_\sigma}.$$
Putting these two together and denoting $\Lambda = \lambda T_\sigma$, we obtain
$$P(m, n, Q \mid p, \Lambda) = \sum_{d=0}^{Q} \binom{Q}{d} p^d (1-p)^{Q-d}\, \frac{(\Lambda n)^{m+d}}{(m+d)!}\, e^{-\Lambda n}.$$
In this likelihood we can further relate the two parameters $p$ and $\Lambda$. We observe that $p$ represents the probability of a packet transmission, other than one's own, in the next slot time. This implies
$$p = 1 - e^{-\Lambda}, \quad \text{or} \quad \Lambda = -\log(1-p),$$
so that $e^{-\Lambda n} = (1-p)^n$. Given a sequence of observations $(m_k, n_k, Q_k)_k$ we obtain
$$P[\text{Observation} \mid p] = \prod_{k} \left[ \sum_{d=0}^{Q_k} \binom{Q_k}{d} p^d (1-p)^{Q_k - d + n_k}\, \frac{\bigl(-n_k \log(1-p)\bigr)^{m_k+d}}{(m_k+d)!} \right].$$
The maximum likelihood estimator (MLE) of $p$ is
$$\hat p_{MLE} = \arg\max_p P[\text{Observation} \mid p].$$
Once such an estimate has been obtained, we can further estimate
$$\mathrm{Prob}[d \text{ collisions in } Q \text{ unsuccessful transmissions}] = \binom{Q}{d} \hat p^{\,d} (1-\hat p)^{Q-d}.$$

2.2.3 Optimizations in Wireless Networks

This research direction concerns the optimization of utility-like measures used in wireless communications (see [15, 16]). Consider a multiple access (MA) scenario where a base station receives signals from multiple mobile stations. One defines a utility function for each user as the ratio of the user's goodput to its transmission power,
$$u_k = \frac{T_k}{P_k},$$
where the goodput $T_k$ is the number of successfully transmitted bits per second of user $k$, and $P_k$ is its transmission power. In the MA scenario described above, $T_k = R_k f(\gamma_k)$ is the product of the transmission rate $R_k$ and the probability of successful transmission $f(\gamma_k)$, which depends on the SINR $\gamma_k$. Since
$$\gamma_k = \frac{P_k}{\sigma^2 + \sum_{j \ne k} P_j},$$
it follows that
$$u_k = C\, \frac{f(\gamma_k)}{\gamma_k}, \qquad C = \frac{R_k}{\sigma^2 + \sum_{j \ne k} P_j},$$
where $C$ depends only on $R_k$ and on the other users' powers. Fixing the other users' transmit powers and rates, the utility-maximizing strategy for user $k$ (which, played by all users simultaneously, yields the Nash equilibrium) is given by the solution of the constrained maximization
$$\max_{\gamma_k} u_k \quad \text{subject to fixed } \gamma_j,\ \forall j \ne k.$$
Given the derivation above, each user's optimum power is obtained by maximizing $f(\gamma_k)/\gamma_k$ independently. Two conclusions can be drawn:
(i) at the optimum, each user achieves the same received SINR;
(ii) each user's optimum power can be computed independently by that user, provided the base station informs the user of the total received power and of the user's gain.
These conclusions are well known for the non-cooperative game described above. In [15, 16] we extended this problem by taking QoS constraints into account. First we considered an M/G/1 queue type service with automatic-repeat-request (ARQ) transmission for each user. The QoS constraint is expressed as a limit on the average waiting time $\bar W$.
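For the unconstrained game, the reduction to maximizing $f(\gamma_k)/\gamma_k$ can be checked numerically. The sketch below is not the method of [15, 16]; it assumes a hypothetical sigmoidal success function $f(\gamma) = (1 - e^{-\gamma})^M$ with $M$ bits per packet, finds the common optimal SINR $\gamma^*$ by golden-section search, and converts it into a transmit power under a hypothetical channel gain, in the spirit of conclusion (ii). The names `success_prob`, `optimal_sinr` and `best_response_power` are illustrative only.

```python
import math


def success_prob(gamma: float, M: int = 80) -> float:
    """Hypothetical packet-success function f(gamma) = (1 - exp(-gamma))**M."""
    return (1.0 - math.exp(-gamma)) ** M


def utility_shape(gamma: float, M: int = 80) -> float:
    """f(gamma)/gamma, i.e. u_k up to the constant C fixed by the other users."""
    return success_prob(gamma, M) / gamma


def optimal_sinr(M: int = 80, lo: float = 1e-3, hi: float = 50.0, tol: float = 1e-9) -> float:
    """Golden-section search for argmax f(gamma)/gamma on [lo, hi].

    Assumes f(gamma)/gamma is unimodal on the bracket, which holds for the
    sigmoidal f assumed above.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if utility_shape(c, M) > utility_shape(d, M):
            b, d = d, c                      # maximum lies in [a, d]
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d                      # maximum lies in [c, b]
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def best_response_power(gamma_star: float, interference: float, gain: float, noise_var: float) -> float:
    """Power reaching gamma_star under the assumed model SINR = gain * P / (noise_var + interference)."""
    return gamma_star * (noise_var + interference) / gain


if __name__ == "__main__":
    g = optimal_sinr(M=80)
    print(f"gamma* = {g:.4f}")
    print("power:", best_response_power(g, interference=0.2, gain=0.5, noise_var=0.01))
```

For this particular $f$, setting the derivative of $f(\gamma)/\gamma$ to zero gives $M\gamma = e^{\gamma} - 1$, so the search result can be checked against that equation. Because $\gamma^*$ depends only on $f$, every user targets the same SINR, as conclusion (i) states; only the power needed to reach it depends on the user's gain and received interference.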
With the QoS constraint included, the problem for user $k$ becomes
$$\max_{R_k, P_k} u_k \quad \text{subject to fixed } \gamma_j,\ \forall j \ne k, \quad \text{and} \quad \bar W_k \le \tau_k.$$
In this framework we studied the existence of Nash equilibria, and analyzed different network characteristics (e.g. total goodput, maximum number of users that satisfy the QoS constraint). Some problems remain to be studied further. One issue concerns the utility function defined above: the sharp delay constraint can be "softened" by using a fixed cost multiplier $\lambda_k \ge 0$, which turns the problem into
$$\max_{R_k, P_k} \; u_k - \lambda_k (\bar W_k - \tau_k) \quad \text{subject to fixed } \gamma_j,\ \forall j \ne k.$$
Other variants can be imagined. Another open issue concerns contention-based communication mechanisms, such as 802.11. For these networks the service mechanism is not accurately modeled by an M/G/1 queue, and it will be interesting to see how WLAN-type communication can be modeled. Once such a model is obtained, one can look at optimal non-cooperative strategies as before. The interesting question is what minimal information a user needs in order to compute its own optimal strategy. In the MA scenario the optimization decouples once the user knows the total interfering power and its transmission gain. Would such a distributed optimization hold for other service models?

2.3 Topics of Applied Mathematics

2.3.1 Frames: Redundancy, Density and Measure Theory

Frames are redundant sets of vectors in a Hilbert space. While redundancy and excess are straightforward notions for a finite frame, the corresponding concepts for infinite frames are not so well understood. My collaboration with Z. Landau, and the set of papers written jointly with him, P. Casazza and C. Heil ([1, 2, 3]), are important steps toward a better understanding of these concepts. The key observation (and belief) is that, for a frame $\mathcal{F} = \{f_i, i \in I\}$, a measure of redundancy is governed by the partial averages
$$a(J) = \frac{1}{|J|} \sum_{i \in J} \langle f_i, \tilde f_i \rangle, \qquad J \subset I \text{ finite},$$
where $\tilde{\mathcal{F}} = \{\tilde f_i, i \in I\}$ is the canonical dual frame.
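In finite dimensions the role of these averages is easy to see. A minimal sketch, assuming a real frame of $n$ vectors in $\mathbb{R}^d$ stored as the rows of a NumPy array (a hypothetical setup, not the infinite-dimensional setting of [1, 2, 3]): the canonical dual is $\tilde f_i = S^{-1} f_i$ with frame operator $S = \sum_i f_i f_i^T$, and the helper names below are illustrative.

```python
import numpy as np


def canonical_dual(F: np.ndarray) -> np.ndarray:
    """Rows of F are frame vectors f_i in R^d; returns rows S^{-1} f_i with S = sum_i f_i f_i^T."""
    S = F.T @ F                          # frame operator (d x d, symmetric positive definite)
    return np.linalg.solve(S, F.T).T     # canonical dual vectors as rows


def diagonal_terms(F: np.ndarray) -> np.ndarray:
    """<f_i, f~_i> for every frame vector (real case)."""
    return np.einsum("ij,ij->i", F, canonical_dual(F))


def partial_average(F: np.ndarray, J) -> float:
    """a(J) = |J|^{-1} * sum over i in J of <f_i, f~_i>."""
    return float(diagonal_terms(F)[list(J)].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.standard_normal((12, 4))      # 12 random vectors in R^4 (a frame with probability 1)
    print(partial_average(F, range(12)))  # equals 4/12 up to rounding
    print(partial_average(F, [0, 1, 2]))  # a local average over a subset
```

Since $\sum_i \langle f_i, \tilde f_i \rangle = \mathrm{tr}(S^{-1} S) = d$, the average over the whole index set is $d/n$, the reciprocal of the redundancy $n/d$, while each individual term lies in $[0, 1]$. Local averages over subsets can deviate from $d/n$; this is the finite-dimensional counterpart of linking the averages to densities.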
In the aforementioned papers we linked these averages to densities of the labels and made them computationally feasible. To give the "flavor" of our results, I state some of them for Gabor frames only; however, the statements we proved are much more general. Consider a Gabor frame $\mathcal{G} = \{g_\lambda = U_\lambda g \,;\, \lambda \in \Lambda\}$ with canonical dual frame $\tilde{\mathcal{G}} = \{\tilde g_\lambda \,;\, \lambda \in \Lambda\}$, where $\Lambda \subset \mathbb{R}^{2d}$ is the set of time-frequency parameters and $U_\lambda g(x) = e^{i\omega x} g(x - t)$ is the time-frequency shift with parameter $\lambda = (t, \omega)$. We let $S_R(c)$ denote the box of size $R$ centered at $c$ in the phase space $\mathbb{R}^{2d}$, $S_R(c) = \{\lambda \mid \|\lambda - c\| \le R\}$. For a set $I$, we let $|I|$ denote its cardinality. Define
$$a(R, c) = \frac{1}{|\Lambda \cap S_R(c)|} \sum_{\lambda \in \Lambda \cap S_R(c)} \langle g_\lambda, \tilde g_\lambda \rangle \quad \text{and} \quad D(R, c) = \frac{|\Lambda \cap S_R(c)|}{\mathrm{vol}(S_R(c))},$$
where $\mathrm{vol}(K)$ is the volume of the set $K$. The Beurling densities are $D^+(\Lambda) = \limsup_{R \to \infty} \sup_c D(R, c)$ and $D^-(\Lambda) = \liminf_{R \to \infty} \inf_c D(R, c)$. I also recall the modulation space
$$M^1 = \Bigl\{ f \in L^2(\mathbb{R}^d) \,;\, \int |\langle \gamma_\lambda, f \rangle| \, d\lambda < \infty \Bigr\},$$
where $\gamma(x) = \exp(-x^2/2)$ is the Gaussian window and $\gamma_\lambda = U_\lambda \gamma$.

Theorem. Let $\mathcal{G}$ be a Gabor frame for $L^2(\mathbb{R}^d)$ with canonical dual $\tilde{\mathcal{G}}$.
1. Let $(R_n, c_n)$ be a sequence such that $D_0 = \lim_n D(R_n, c_n)$ exists. Then
$$\lim_n a(R_n, c_n) = \frac{1}{D_0}. \qquad (2.9)$$
2. If $g \in M^1$, then $\tilde g_\lambda \in M^1$ for all $\lambda \in \Lambda$, and there is an envelope $F \in L^1(\mathbb{R}^{2d})$ such that $|\langle \gamma_\mu, \tilde g_\lambda \rangle| \le F(\mu - \lambda)$.
3. Assume $D^- > 1$ and $g \in M^1$. Then there is a subset $\Sigma \subset \Lambda$ of positive uniform measure, that is $D^+(\Sigma) = D^-(\Sigma) > 0$, such that $\mathcal{G}' = \{g_\lambda \,;\, \lambda \in \Lambda \setminus \Sigma\}$ is a frame for $L^2(\mathbb{R}^d)$.
4. Assume $D^+ > 1$ and $g \in M^1$. Then there is a subset $\Sigma \subset \Lambda$ such that $\mathcal{G}' = \{g_\lambda \,;\, \lambda \in \Lambda \setminus \Sigma\}$ is a frame and $D^+(\Lambda \setminus \Sigma) < D^+(\Lambda)$. □

The method applies only to Gabor or Gabor-like frames. It would be interesting to explore whether and how these methods extend to other frames, in particular to wavelet sets. Even for Gabor frames, it remains an open problem whether one can remove subsets of positive density so that the remaining set is still a frame with Beurling densities arbitrarily close to one.
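For the lattices most often used in Gabor analysis, the Beurling densities can be computed directly. A minimal sketch, assuming $d = 1$, a rectangular lattice $\Lambda = \alpha\mathbb{Z} \times \beta\mathbb{Z}$ with hypothetical parameters $\alpha, \beta$, and reading the "box of size $R$" as the sup-norm ball $[c - R, c + R]^2$: it evaluates $D(R, c)$ over a few centers, and both the minimum and the maximum approach $1/(\alpha\beta)$ as $R$ grows.

```python
import math


def count_1d(step: float, center: float, R: float) -> int:
    """Number of integers k with |step * k - center| <= R."""
    return max(0, math.floor((center + R) / step) - math.ceil((center - R) / step) + 1)


def local_density(alpha: float, beta: float, c, R: float) -> float:
    """D(R, c) for the lattice alpha*Z x beta*Z in R^2, using the box [c - R, c + R]^2."""
    points = count_1d(alpha, c[0], R) * count_1d(beta, c[1], R)
    return points / (2.0 * R) ** 2


def density_range(alpha: float, beta: float, R: float, centers):
    """Min and max of D(R, c) over the given centers: finite-R proxies for D^- and D^+."""
    values = [local_density(alpha, beta, c, R) for c in centers]
    return min(values), max(values)


if __name__ == "__main__":
    centers = [(0.37 * i, 0.11 * j) for i in range(5) for j in range(5)]
    for R in (10.0, 100.0, 1000.0):
        print(R, density_range(0.5, 1.0, R, centers))  # both approach 1/(alpha*beta) = 2
```

By (2.9), for a Gabor frame over such a lattice the averages $a(R, c)$ then tend to $\alpha\beta$, so the redundancy $1/(\alpha\beta)$ can be read either from point counts or from the diagonal terms $\langle g_\lambda, \tilde g_\lambda \rangle$.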
2.3.2 Algebras of Time-Frequency/Time-Scale Shift Operators

Consider the set of time-frequency shift operators. It naturally forms a group, and by taking arbitrary linear combinations with absolutely summable coefficients it gives rise to a Banach algebra with involution:
$$\mathcal{A}_v = \Bigl\{ T = \sum_\lambda c_\lambda U_\lambda \,;\, \|T\|_{\mathcal{A}_v} := \sum_\lambda v(\lambda) |c_\lambda| < \infty \Bigr\}, \qquad (2.10)$$
where $U_\lambda f(x) = e^{i\omega x} f(x - t)$ is the time-frequency shift by $\lambda = (t, \omega)$, and $v$ is an admissible weight (e.g. of polynomial growth). Note that we do not assume that the support of $c$, $\mathrm{supp}(c) = \{\lambda \in \mathbb{R}^{2d} \,;\, c_\lambda \ne 0\}$, has a lattice structure. In general the support is a countable subset of $\mathbb{R}^{2d}$, possibly dense. The closure of $\mathcal{A}_v$ with respect to the operator norm produces a noncommutative C*-algebra denoted by $\mathcal{C}$. The closure of $\mathcal{A}_v$ (or $\mathcal{C}$) with respect to the weak (or strong) operator topology is the full algebra $B(L^2(\mathbb{R}^d))$ of bounded operators on $L^2(\mathbb{R}^d)$. Regarding these algebras, I proved the following results in a recent preprint.

Theorem.
1. The algebra $\mathcal{A}_v$ is inverse closed: if $T \in \mathcal{A}_v$ and $T$ is invertible in $B(L^2(\mathbb{R}^d))$, then $T^{-1} \in \mathcal{A}_v$.
2. For any $T \in \mathcal{A}_v$, its spectral radius with respect to $\mathcal{A}_v$ is the same as its spectral radius with respect to $B(L^2(\mathbb{R}^d))$.
3. Assume $T = \sum_{\lambda \in \Lambda} c_\lambda U_\lambda$ with $|\Lambda| = N < \infty$ and $R_0 = \max_{\lambda \in \Lambda} \|\lambda\|$. Assume $T$ is invertible in $B(L^2(\mathbb{R}^d))$, and hence in $\mathcal{A}_v$ as well. Denote $A = \|T^{-1}\|^2_{B(L^2(\mathbb{R}^d))}$, $B = \|T\|^2_{B(L^2(\mathbb{R}^d))}$ and $\rho = \max(1, 2R_0)$, and assume the weight is polynomial, $v(\lambda) = C(1 + \|\lambda\|)^m$ for some $C > 0$ and $m \in \mathbb{N}$. Then $\|T^{-1}\|_{\mathcal{A}_v}$ obeys an explicit upper bound (2.11), expressed through $(m+N)!$, $C\rho^m \|T\|_{\mathcal{A}_v}$, $A$ and $B$.

Furthermore, these algebras admit a faithful tracial state, namely
$$T = \sum_\lambda c_\lambda U_\lambda \longmapsto \gamma(T) := c_0. \qquad (2.12)$$
This state is given explicitly by the following result.

Theorem. Consider now a Gabor frame $\mathcal{G} = \{g_{m,n;\alpha,\beta} := U_{\beta n, 2\pi\alpha m}\, g \mid m, n \in \mathbb{Z}^d\}$ for $L^2(\mathbb{R}^d)$, with $\alpha, \beta > 0$, $\alpha\beta \le 1$, and a dual Gabor frame (not necessarily the canonical dual frame) $\tilde{\mathcal{G}} = \{\tilde g_{m,n;\alpha,\beta} := U_{\beta n, 2\pi\alpha m}\, \tilde g \mid m, n \in \mathbb{Z}^d\}$.
Then for any $T \in \mathcal{C}$,
$$\gamma(T) = \frac{1}{(\alpha\beta)^d} \lim_{M, N \to \infty} \frac{1}{(2M+1)^d (2N+1)^d} \sum_{|m| \le M} \sum_{|n| \le N} \langle T g_{m,n;\alpha,\beta}, \tilde g_{m,n;\alpha,\beta} \rangle \qquad (2.13)$$
is the faithful tracial state (2.12) on $\mathcal{C}$, independent of the choice of the Gabor frame $\mathcal{G}$. □

Putting all these elements together, I was recently able to prove a special case of the Heil-Ramanathan-Topiwala conjecture (linear independence of finitely many time-frequency shifts of an $L^2$ function), namely:

Theorem. For any finite $\Lambda \subset \mathbb{R}^{2d}$ and complex scalars $(c_\lambda)_{\lambda \in \Lambda}$, the operator $T = \sum_{\lambda \in \Lambda} c_\lambda U_\lambda$ has no eigenvalue of finite multiplicity. Hence the pure point spectrum, if it exists, can only contain eigenvalues of infinite multiplicity, or eigenvalues that also belong to the continuous part of the spectrum. □

I plan to study further properties of these and other similar algebras. More specifically:
• I would like to investigate how to extend the eigenspectrum theorem to the infinite multiplicity case.
• Another case of interest is furnished by dilation operators. Time and scale shift operators are closely related to wavelet sets. There is also interest in the full wave packet group containing time, frequency, and scale shifts.
• An application of this theory is the channel equalization problem. More specifically, the question is to invert an operator $T \in \mathcal{A}_v$ that has finite support. Our norm estimates suggest how to approximate the inverse using finitely many coefficients.

2.3.3 Time-Scale Symbols of Bounded Operators

An offshoot of the project on communication channel analysis ([7, 8]) is the study of integral operators whose kernels act through time-scale shifts. More specifically, the class of operators we are interested in is given by
$$T f(x) = \int\!\!\int L(a, b)\, \frac{1}{\sqrt{|a|}}\, f\Bigl(\frac{x - b}{a}\Bigr)\, da\, db,$$
where $L(a, b)$ is its kernel. It turns out that an object of interest for designing a RAKE receiver is the "sandwich" of operators $PTQ$, where $P$ and $Q$ are some orthogonal projectors.
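Operators in this class can be explored numerically before any decomposition is attempted. A minimal sketch, assuming the kernel $L$ is available as a vectorized function sampled on a uniform $(a, b)$ grid that avoids $a = 0$ (the function and argument names are hypothetical): the double integral is replaced by a Riemann sum over grid cells.

```python
import numpy as np


def time_scale_operator(f, L, x, a_grid, b_grid):
    """Riemann-sum approximation of (T f)(x) = double integral of L(a, b) |a|^{-1/2} f((x - b)/a) da db.

    f, L   : vectorized callables (L is the kernel L(a, b))
    x      : 1-D array of output points
    a_grid : uniformly spaced scales, none equal to zero
    b_grid : uniformly spaced shifts
    """
    da = a_grid[1] - a_grid[0]
    db = b_grid[1] - b_grid[0]
    A, B = np.meshgrid(a_grid, b_grid, indexing="ij")           # (na, nb) grid of (a, b)
    weights = L(A, B) / np.sqrt(np.abs(A)) * da * db            # kernel, normalization, cell area
    args = (x[:, None, None] - B[None, :, :]) / A[None, :, :]  # (x - b)/a for every x and node
    return np.sum(weights[None, :, :] * f(args), axis=(1, 2))


if __name__ == "__main__":
    a_grid = np.array([1.0, 2.0])
    b_grid = np.array([0.0, 1.0])
    # Kernel concentrated on the single grid cell (a, b) = (2, 1).
    L = lambda A, B: np.where((A == 2.0) & (B == 1.0), 1.0, 0.0)
    x = np.linspace(-3.0, 3.0, 7)
    print(time_scale_operator(np.cos, L, x, a_grid, b_grid))  # cos((x - 1)/2) / sqrt(2)
```

A kernel concentrated on the single cell $(a, b) = (2, 1)$ reproduces the dilation-translation $f \mapsto 2^{-1/2} f((x - 1)/2)$, and one concentrated at $(1, 0)$ reproduces the identity; general kernels superpose such time-scale shifts.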
For particular choices of $P$ and $Q$, we were able to prove that $PTQ$ admits a decomposition into a convergent series of the type $PTQ = \sum_{m,n} c_{m,n} P U^m V^n Q$, where $U$ and $V$ are some unitary operators. Define the set
$$\mathcal{A} = \Bigl\{ T = \sum_{m,n} c_{m,n} P U^m V^n Q \,;\, \|T\|_{\mathcal{A}} := \sum_{m,n} |c_{m,n}| < \infty \Bigr\}.$$
$\mathcal{A}$ is a Banach space, a subspace of $B(\mathrm{Ran}\,Q, \mathrm{Ran}\,P)$, the space of bounded operators from $\mathrm{Ran}\,Q$ to $\mathrm{Ran}\,P$. Of interest are the cases when $P, Q, U, V$ are chosen so that $PU = UP$, $QV = VQ$, and there are $e_0, f_0 \in L^2$ such that $\{U^m e_0 \,;\, m \in \mathbb{Z}\}$ is an orthonormal basis of $\mathrm{Ran}\,P$ and $\{V^n f_0 \,;\, n \in \mathbb{Z}\}$ is an orthonormal basis of $\mathrm{Ran}\,Q$. Denote
$$a_{m,n} = \langle V^m f_0, U^n e_0 \rangle, \qquad h_{m,n} = \langle PTQ V^m f_0, U^n e_0 \rangle,$$
$$A(z_1, z_2) = \sum_{m,n} a_{m,n} z_1^m z_2^n, \qquad H(z_1, z_2) = \sum_{m,n} h_{m,n} z_1^m z_2^n.$$
So far I have proved the following result.

Theorem. Assume $PQ$ and $PTQ$ are Hilbert-Schmidt operators.
1. The sequence $a = (a_{m,n})$ is in $l^2(\mathbb{Z}^2)$. Hence $A(z_1, z_2)$ is a function in $L^2(\mathbb{T}^2)$. The same holds for $h = (h_{m,n})$ and $H(z_1, z_2)$.
2. Assume further that for some $a_0 > 0$ and $a_1 < \infty$,
$$a_0 \le |A(e^{2\pi i\theta_1}, e^{2\pi i\theta_2})| \le a_1.$$
Then
$$c_{m,n} = \int_{-1/2}^{1/2}\!\int_{-1/2}^{1/2} \frac{H(z_1, z_2)}{A(z_1, z_2)}\, z_1^{n} z_2^{-m} \Big|_{z_1 = e^{2\pi i\theta_1},\, z_2 = e^{2\pi i\theta_2}} d\theta_1\, d\theta_2 \qquad (2.14)$$
is in $l^2(\mathbb{Z}^2)$ and the series $\sum_{m,n} c_{m,n} P U^m V^n Q$ converges strongly to $PTQ$. □

The typical cases where we applied this result are given by the translation, modulation, and dilation operators. However, the combination dilation-translation does not yield a Hilbert-Schmidt operator. I plan to explore such decompositions further for other pairs of operators, and to understand the operator algebras they generate.

2.3.4 Machine Learning: Data Embeddings into Higher Dimensional Linear Spaces

A redundant set of vectors $\{f_k\}_{1 \le k \le n}$ in a Euclidean space performs a linear embedding of the space into the higher dimensional space of coefficients: $x \mapsto \{\langle x, f_k \rangle\}_{1 \le k \le n}$. A more complex embedding is given by the map to the absolute values of the frame coefficients, $x \mapsto \{|\langle x, f_k \rangle|\}_{1 \le k \le n}$, considered in the Nonlinear Signal Processing problem presented above.
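Both maps are easy to state concretely. A minimal sketch, assuming a real frame stored as the rows of a NumPy array (a hypothetical setup; the helper names are illustrative): the linear embedding is inverted by the canonical dual frame, which for a finite real frame is applied through the Moore-Penrose pseudo-inverse, whereas the absolute-value map is nonlinear and cannot distinguish $x$ from $-x$.

```python
import numpy as np


def linear_embedding(x: np.ndarray, F: np.ndarray) -> np.ndarray:
    """x -> (<x, f_k>)_k, with the frame vectors f_k stored as rows of F."""
    return F @ x


def abs_embedding(x: np.ndarray, F: np.ndarray) -> np.ndarray:
    """x -> (|<x, f_k>|)_k, the absolute values of the frame coefficients."""
    return np.abs(F @ x)


def reconstruct_linear(coeffs: np.ndarray, F: np.ndarray) -> np.ndarray:
    """Invert the linear embedding with the canonical dual frame, i.e. pinv(F) = S^{-1} F^T."""
    return np.linalg.pinv(F) @ coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F = rng.standard_normal((9, 3))   # 9 vectors in R^3: redundancy 3
    x = rng.standard_normal(3)
    print(np.allclose(reconstruct_linear(linear_embedding(x, F), F), x))  # True
    print(np.allclose(abs_embedding(x, F), abs_embedding(-x, F)))         # True: the sign is lost
```

In the real case the global sign is the least ambiguity the absolute-value map can have; reconstruction up to this sign is the subject of [4, 5].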
Nonlinear embeddings given by reproducing kernel Hilbert spaces (RKHS) are of high interest in classification problems, e.g. kernel support vector machines (KSVMs). I propose to consider nonlinear embeddings suggested by the RKHS associated with Gabor and wavelet analysis. I would like to explore the mathematical foundations of these embeddings in a series of lectures as an advanced seminar, or develop a new curriculum on this topic.

Bibliography

[1] R. Balan, P. Casazza, C. Heil, and Z. Landau. Deficits and Excesses of Frames. Advances in Computational Mathematics, 18:93–116, 2003.
[2] R. Balan, P. Casazza, C. Heil, and Z. Landau. Excesses of Gabor Frames. Appl. Comput. Harmon. Anal., 14:87–106, 2003.
[3] R. Balan, P. Casazza, C. Heil, and Z. Landau. Excess of Parseval frames. In Proceedings of SPIE Wavelets XI, August 2005.
[4] R. Balan, P.G. Casazza, and D. Edidin. On signal reconstruction from absolute value of frame coefficients. In Proceedings of SPIE Wavelets XI, August 2005.
[5] R. Balan, P.G. Casazza, and D. Edidin. On Signal Reconstruction without Phase. Appl. Comput. Harmon. Anal., 20:345–356, 2006.
[6] R. Balan, P.G. Casazza, and D. Edidin. Equivalence of Reconstruction from the Absolute Value of the Frame Coefficients to a Sparse Representation Problem. IEEE Sig. Proc. Letters, May 2007.
[7] R. Balan, H.V. Poor, S. Rickard, and S. Verdú. Canonical time-frequency, time-scale, and frequency-scale representations of time-varying channels. J. of Comm. in Infor. Syst., 5(5):1–30, 2005.
[8] R. Balan, H.V. Poor, S. Rickard, and S. Verdú. Frequency and Time-Scale Canonical Representations of Doubly Spread Channels. In Proceedings of EUSIPCO 2004, Vienna, Austria, September 2004.
[9] R. Balan, J. Rosca, and S. Rickard. Non-square Blind Source Separation under Coherent Noise by Beamforming and Time-Frequency Masking. In Proc. ICA, 2003.
[10] R. Balan, J. Rosca, and S. Rickard.
Equivalence Principle for Optimization of Sparse versus Low-Spread Representations for Signal Estimation in Noise. International Journal of Imaging Systems and Technology, 15(1):10–17, 2005.
[11] G. Bianchi. Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications, 18(3):535–547, 2000.
[12] K. Achan, S.T. Roweis, and B.J. Frey. Probabilistic inference of speech signals from phaseless spectrograms. In Proceedings of Neural Information Processing Systems (NIPS03), volume 16, 2003.
[13] D.L. Donoho and X. Huo. Uncertainty principles and ideal atomic decomposition. IEEE Trans. IT, 47(7):2845–2862, 2001.
[14] P.E. Engelstad and O.N. Østerbø. Non-saturation and saturation analysis of IEEE 802.11e EDCA with starvation prediction. In Proc. 8th ACM Int. Symp. on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM'05), 2005.
[15] F. Meshkati, H.V. Poor, S.C. Schwartz, and R.V. Balan. Energy-Efficient Power and Rate Control with QoS Constraints: A Game-Theoretic Approach. In Proc. Int. Comm. and Mobile Comp. Conf., 2006.
[16] F. Meshkati, H.V. Poor, S.C. Schwartz, and R.V. Balan. Energy-Efficient Resource Allocation in Wireless Networks with Quality-of-Service Constraints. To appear in IEEE Trans. on Comm., 2006.
[17] J. Rosca, C. Borss, and R. Balan. Generalized sparse signal mixing model and application to noisy blind source separation. In Proc. ICASSP, 2004.
[18] S.T. Roweis. One microphone source separation. In Neural Information Processing Systems 13 (NIPS), pages 793–799, 2000.
[19] P.J. Wolfe, S.J. Godsill, and W.J. Ng. Bayesian variable selection and regularization for time-frequency surface estimation. J. R. Statist. Soc. B, 66(Part 3):575–589, 2004.
