Dr. Georgios B. Giannakis, ADC Chair in Wireless Telecommunications in Electrical and Computer Engineering
University of Minnesota
The information explosion propelled by the advent of personal computers, the Internet, and global-scale communications has rendered statistical learning from data increasingly important for analysis and processing. The ability to mine valuable information from unprecedented volumes of data will facilitate preventing or limiting the spread of epidemics and diseases, identifying trends in global financial markets, protecting critical infrastructure including the smart grid, and understanding the social and behavioral dynamics of emergent social-computational systems. Large volumes of data contain not only data that adhere to postulated models but also data that do not -- the so-termed outliers.
In this talk I will touch upon several issues that pertain to resilience against outliers, a fundamental aspect of statistical inference tasks such as estimation, model selection, prediction, classification, tracking, and dimensionality reduction, to name a few. The recent upsurge of research toward compressive sampling and parsimonious signal representations hinges on signals being sparse, either naturally or after projecting them onto a proper basis. I will start by introducing a neat link between the seemingly unrelated notions of sparsity and robustness against outliers, a link that holds even when the signals involved are not sparse. It will be argued that controlling sparsity of model residuals leads to statistical learning algorithms that are computationally affordable and universally robust to outlier models. I will highlight a few relevant application domains that include preference measurement for consumer utility function estimation in marketing, and load curve cleansing -- a critical task in power systems engineering and management.
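One way to make the sparsity-robustness link concrete is the following minimal sketch. It is an illustration under stated assumptions, not necessarily the formulation presented in the talk: each observation is modeled as a linear fit plus an explicit outlier term, and an l1 penalty keeps the vector of outlier terms sparse. The function names, the penalty weight `lam`, and the alternating-minimization solver are all hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(r, tau):
    """Shrink each entry of r toward zero by tau (proximal map of the l1 norm)."""
    return np.sign(r) * np.maximum(np.abs(r) - tau, 0.0)

def outlier_sparse_regression(X, y, lam, n_iter=200):
    """Alternating minimization of ||y - X @ theta - o||^2 + lam * ||o||_1.

    Assumed model (for illustration only): y = X @ theta + o + noise,
    where o is nonzero only at the outlying observations.
    """
    n, p = X.shape
    o = np.zeros(n)
    theta = np.zeros(p)
    for _ in range(n_iter):
        # With o fixed, theta is an ordinary least-squares fit to the cleaned data.
        theta, *_ = np.linalg.lstsq(X, y - o, rcond=None)
        # With theta fixed, o is the soft-thresholded residual.
        o = soft_threshold(y - X @ theta, lam / 2.0)
    return theta, o
```

Observations whose entry in `o` ends up nonzero are the ones flagged as outliers. Larger values of `lam` flag fewer observations, and for very large `lam` the fit reduces to ordinary least squares, so the regression coefficients themselves need not be sparse for the approach to apply.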
In the second part of the talk, I will switch focus toward robust principal component analysis (PCA) algorithms, which are capable of extracting the most informative low-dimensional structure from (grossly corrupted) high-dimensional data. Beyond its ties to robust statistics, the developed outlier-aware PCA framework is versatile enough to accommodate scalable algorithms that: i) track the low-rank signal subspace as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes when used to identify aberrant responses in personality assessment surveys, unveil communities in social networks, and detect intruders in video surveillance data.
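The same idea can be sketched for PCA. The batch example below is only an assumption-laden illustration, not the speaker's algorithm: each data row is modeled as a mean plus a rank-q component plus a row-wise outlier vector, and a group (row-wise l2) penalty drives most outlier vectors to exactly zero. The function names, the weight `lam`, and the alternating SVD/thresholding scheme are hypothetical; the real-time subspace tracking and kernel (feature-space) variants mentioned above are not covered.

```python
import numpy as np

def row_soft_threshold(R, tau):
    """Shrink the l2 norm of each row of R by tau; rows with norm <= tau become zero."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * R

def outlier_aware_pca(Y, q, lam, n_iter=100):
    """Alternating minimization of ||Y - mean - L - O||_F^2 + lam * sum_i ||O[i]||_2
    subject to rank(L) <= q (assumed formulation, for illustration only).

    Returns the estimated mean, the q principal directions (as rows),
    and the outlier matrix O, whose nonzero rows mark flagged samples.
    """
    Y = np.asarray(Y, dtype=float)
    O = np.zeros_like(Y)
    for _ in range(n_iter):
        # With O fixed: ordinary PCA (mean plus truncated SVD) on the cleaned data.
        Z = Y - O
        mean = Z.mean(axis=0)
        U, s, Vt = np.linalg.svd(Z - mean, full_matrices=False)
        L = (U[:, :q] * s[:q]) @ Vt[:q]
        # With mean and L fixed: row-wise shrinkage of the residuals.
        O = row_soft_threshold(Y - mean - L, lam / 2.0)
    return mean, Vt[:q], O
```

Because of the rank constraint the problem is not convex, so this scheme is only guaranteed to reach a stationary point, and the result can depend on the all-zero initialization of `O` and on the choice of `lam`.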