2016 INFORMS Annual Meeting Program
TB01
INFORMS Nashville – 2016
Tuesday, 10:00AM - 10:50AM
Tuesday Plenary
Davidson Ballroom-MCC
Big Data and Big Decisions
Plenary Session
Chair: Shabbir Ahmed, Georgia Tech, shabbir.ahmed@isye.gatech.edu

1 - Big Data And Big Decisions
Suvrajeet Sen, University of Southern California, 3715 McClintock Ave, Los Angeles, CA, 90089, United States, s.sen@usc.edu
Over the past decade, the world of Statistical and Machine Learning has made dramatic inroads into some of the more challenging AI problems, ranging from speech recognition and natural language processing to bio and health informatics. Both supervised and unsupervised learning methods have exploded in daily use for applications covering business analytics, e-commerce, educational/tutoring systems, and others. In many cases, new models and algorithms have been developed so that the results of learning are also easier to interpret (for a human decision maker). The partnership between AI and human cognition is not new, but its widespread success in recent years has transformed the way we do business today. The combination of modern informatics and high-dimensional statistics has often been credited with this transformation. This lecture will not only highlight some successes of Big Data, but also explore settings where human cognition may not provide the best test of decision quality. This new class of problems involves not only Big Data, but also Big Decisions. This lecture will explore the continuum between Big Data and Big Decisions.

Tuesday, 11:00AM - 12:30PM

TB01
101A-MCC
Clustering Methods in Data Mining
Sponsored: Data Mining
Sponsored Session
Chair: Majeed Simaan, RPI, 231 Congress Street, Troy, NY, 12180, United States, simaam@rpi.edu

1 - Parable: A Parallel Random Partition Based Hierarchical Clustering Algorithm For The Mapreduce Framework
Haimonti Dutta, University at Buffalo, 325P Jacobs Management Center, Buffalo, NY, 14260, United States, haimonti@buffalo.edu
Large datasets, on the order of terabytes and petabytes, are prevalent in many scientific domains. To effectively store, query and analyze these gigantic repositories, parallel and distributed architectures are popular. Apache Hadoop is one such parallel framework for supporting data-intensive applications. In this paper, we present a PArallel, RAndom-partition Based hierarchicaL clustEring algorithm for the MapReduce framework on Hadoop. It proceeds in two steps: local hierarchical clustering on nodes, and integration of the results by a novel dendrogram alignment technique. Empirical results indicate that significant scalability benefits can be obtained while maintaining good cluster quality.

2 - Clustering The Traffic Data Errors Using K-mean Clustering Method
Amin Ariannezhad, Graduate Research Assistant, University of Arizona, 1209 E. 2nd Street, Tucson, AZ, 85719, United States, ariannezhad@email.arizona.edu, Yao-Jan Wu
This study aims to identify meaningful patterns in the errors observed in traffic data collected from dual loop detectors in Phoenix, Arizona. A set of data quality control criteria was implemented to calculate the percentage of each type of error observed during each day of data for each loop detector. The k-means clustering method was then used to cluster the 15 possible error categories detected in each loop detector on a daily basis. Seven significant patterns were found in these errors based on the relationships between them. Findings from a field visit revealed that the clustering method could successfully find different meaningful patterns in data errors.

3 - A New Optimization Model For Supervised Biclustering Problem In Biomedical Dataset Classification
Cem Iyigun, Associate Professor, Middle East Technical University, Inonu Blvd, Endustri Muhendisligi, Ankara, 06800, Turkey, iyigun@ie.metu.edu.tr, Saziye Deniz Oguz Arikan
Biclustering groups samples and features simultaneously in a given data set. We focus on a supervised biclustering problem leading to unsupervised feature selection for binary class and multi-class problems. For this problem, we propose a new supervised biclustering optimization model which aims to maximize classification accuracy by selecting almost all features.

4 - Large Scale Spectral Partitioning By Simulated Mixing
Shahzad Bhatti, University of Illinois at Urbana Champaign, 104 S Mathews Ave, Urbana, IL, 61801, United States, bhatti2@illinois.edu, Carolyn Beck, Angelia Nedich
Several problems, such as data clustering, graph partitioning, community detection, and image segmentation, can be cast as spectral partitioning problems. However, the computational complexity of eigenvalue decompositions has handicapped the application of spectral partitioning to large scale problems. Several recent algorithms focus on accelerating spectral partitioning; however, they sacrifice accuracy to gain speed. Our algorithm, on the other hand, does not require an eigenvalue decomposition. Rather, it recursively bi-partitions the data by finding an approximate linear combination of eigenvectors of the normalized adjacency matrix of the underlying graph.

TB02
101B-MCC
Methods for Analysis of Next-Generation Sequencing Data
Sponsored: Data Mining
Sponsored Session
Chair: Paul Brooks, Virginia Commonwealth Univ, Richmond, VA, United States, jpbrooks@vcu.edu

1 - Quality Control For Microbiome Experiments
David Edwards, Virginia Commonwealth University, Richmond, VA, United States, dedwards7@vcu.edu, Paul Brooks
Microbiome studies aim to understand the role of bacterial communities in physiology and disease. The primary goal of the Vaginal Microbiome Consortium is to develop methods to facilitate the discovery of patterns in 16S rRNA data and extensive clinical and demographic data as they relate to women’s health. Maintaining internal consistency and understanding measurement variation in microbiome experiments are key to identifying and avoiding batch effects. In this talk, we discuss and illustrate how statistical quality control techniques (and related visualizations) are useful for assessing data consistency across time via positive and negative controls.

2 - Characterizing The Vaginal Microbiome Based On A Large Observational Study
Victoria Pokhilko, Virginia Commonwealth University, Richmond, VA, United States, pokhilkovv@vcu.edu, Paul Brooks, David Edwards
We conducted an analysis of 16S rRNA surveys of the vaginal microbiome based on samples from over 6,000 women. Vaginal microbiome profiles are typically dominated by a single bacterium, leading to a classification of samples into groups that we call vagitypes. Vagitype classifications facilitate the discovery of relationships between microbiome profile and clinical data. The presence or absence of Lactobacillus species and a diagnosis of bacterial vaginosis have been shown to play an important role in the reproductive health of a woman. Our analysis provides information about these patterns and suggests roles for other bacteria in health and dysbiosis.

3 - Longitudinal Data Analysis Techniques For Analyzing Microbiome Data
Eugenie Jackson, University of Wyoming, ejacks20@uwyo.edu
Microbiome data is characterized by a high degree of sparseness, a number of observations much smaller than the number of taxa, and often a small set of taxa that dominates the data. Goals of analysis include identifying and characterizing microbiome profiles, discovering relationships between microbial populations and health states, and understanding interdependencies among taxa. Changes in human microbial communities and their respective hosts across time are of fundamental interest. We present an overview of recent longitudinal analysis techniques for microbiome data. We discuss their respective strengths and uses, open problems, and directions for future work.
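As a rough illustration of the k-means step mentioned in the TB01 traffic-data talk (talk 2), the sketch below clusters daily error-percentage vectors with plain Lloyd's algorithm. It is not the authors' implementation. The data are made up, only three hypothetical error categories are used instead of 15, k is set to 2 rather than 7, and the initial centroids are simply the first k rows so that the run is deterministic.

```python
def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm. Returns (centroids, labels).

    Simplifying assumption: the first k points serve as initial centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical input: one row per detector-day, one column per error
# category, each value the percentage of records flagged with that error.
detector_days = [
    [0.0, 1.0, 0.5],
    [0.5, 0.0, 1.0],
    [1.0, 0.5, 0.0],
    [40.0, 35.0, 2.0],
    [42.0, 30.0, 1.0],
    [38.0, 33.0, 0.0],
]

centroids, labels = kmeans(detector_days, k=2)
for day, label in zip(detector_days, labels):
    print(label, day)
```

On this toy input the three nearly clean rows end up in one cluster and the three error-heavy rows in the other. With real detector data, the choice of k and the initialization would need more care than this sketch gives them.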