Dr. James Cox (Sr. Development Manager, SAS Text Miner) & Dr. Zheng Zhao
SAS Institute Inc.
General Approaches for working with large, sparse feature sets
Dr. James Cox
Many application areas use large, sparse feature sets, including text, genomic SNPs, image processing, recommendation engines, sensor data, web logs, and item-purchase data. Because most machine learning assumes an observation-by-variable format, researchers working with such feature sets often end up inventing their own techniques, and those techniques remain confined to a single application area. I propose that we develop a general toolkit for working with such data, and I will present some ideas of what it might look like.
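To make the observation-by-variable point concrete, here is a minimal sketch of one common sparse layout: each observation stored as a mapping from feature index to nonzero value, so operations touch only the stored entries. The row format and function names here are my own illustration, not anything proposed in the talk.

```python
def sparse_dot(row_a, row_b):
    """Dot product of two sparse observations.

    Each row is a dict {feature_index: value}; only indices present
    in both rows contribute, so cost scales with the smaller row,
    not with the (possibly huge) total feature count.
    """
    if len(row_a) > len(row_b):
        row_a, row_b = row_b, row_a  # iterate over the shorter row
    return sum(v * row_b[j] for j, v in row_a.items() if j in row_b)


def to_dense(row, n_features):
    """Expand a sparse observation to a dense list of length n_features."""
    dense = [0.0] * n_features
    for j, v in row.items():
        dense[j] = v
    return dense
```

For example, two documents sharing only feature 5 yield `sparse_dot({0: 2.0, 5: 1.0}, {5: 3.0, 7: 4.0}) == 3.0`, regardless of whether the vocabulary has a thousand features or a million.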
Massively Parallel Feature Selection: An Approach Based on Variance Preservation
Dr. Zheng Zhao
We present a novel large-scale feature selection algorithm based on variance analysis. The algorithm selects features by evaluating each feature's ability to explain the variance in the data. It was implemented as a SAS High-Performance Analytics procedure, which can read data in distributed form and perform parallel feature selection in both symmetric multiprocessing (SMP) and massively parallel processing (MPP) modes.
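The idea of selecting features by how much data variance they explain can be sketched with a simple greedy procedure: pick the centered feature with the largest variance, project it out of the remaining features, and repeat. This is my own illustrative approximation of the variance-preservation principle, not the parallel algorithm presented in the talk.

```python
def select_by_variance(X, k):
    """Greedy variance-preserving feature selection (illustrative sketch).

    X is an n-observations by d-features matrix (list of lists).
    Returns the indices of k features chosen so that each new feature
    explains the most variance left unexplained by those already chosen.
    """
    n, d = len(X), len(X[0])
    # Center each feature (column) so sums of squares measure variance.
    cols = []
    for j in range(d):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        cols.append([v - mean for v in col])

    selected = []
    for _ in range(min(k, d)):
        # Pick the remaining feature with the largest residual variance.
        best = max((j for j in range(d) if j not in selected),
                   key=lambda j: sum(v * v for v in cols[j]))
        selected.append(best)
        b = cols[best]
        norm_sq = sum(v * v for v in b)
        if norm_sq == 0.0:
            continue
        # Deflate: remove the chosen feature's direction from the rest,
        # so redundant (correlated) features lose their residual variance.
        for j in range(d):
            if j in selected:
                continue
            coef = sum(cols[j][i] * b[i] for i in range(n)) / norm_sq
            cols[j] = [cols[j][i] - coef * b[i] for i in range(n)]
    return selected
```

With a feature that is an exact copy of another, the copy explains no additional variance after deflation, so the greedy step skips it in favor of an independent feature.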
James A. Cox has been the development manager for SAS Text Miner ever since its inception twelve years ago. Jim holds a Ph.D. in Cognitive Psychology and Computer Science from UNC-Chapel Hill.
Zheng Zhao received the B.Eng. and M.Eng. degrees in computer science and engineering from Harbin Institute of Technology (HIT) and the Ph.D. degree in computer science and engineering from Arizona State University (ASU).