2015 INFORMS Annual Meeting

INFORMS Philadelphia – 2015

SD30
30-Room 407, Marriott
“Speed Networking” Coordination of Subdivisions’ Interests
Sponsor: CPMS
Sponsored Session
Chair: Doug Samuelson, InfoLogix, Inc., 8711 Chippendale Court, Annandale, VA, 22003, United States of America, samuelsondoug@yahoo.com
1 - “Speed Networking” Coordination of Subdivisions’ Interests
Doug Samuelson, InfoLogix, Inc., 8711 Chippendale Court, Annandale, VA, 22003, United States of America, samuelsondoug@yahoo.com
We imitate “speed networking” events in which couples spend ten minutes conversing, then switch partners, allowing for eight or nine such meetings. This allows subdivision officers to learn about other subdivisions with similar interests, promote coordination of sessions, reduce schedule conflicts, and possibly collaborate outside the annual meeting. All subdivision officers are encouraged to attend and participate. The organizer will arrange pairings, following participants’ preferences.

SD31
31-Room 408, Marriott
Data Analytics and Statistical Learning
Sponsor: Data Mining
Sponsored Session
Chair: Shouyi Wang, Assistant Professor, University of Texas at Arlington, 3105 Birch Ave, Grapevine, TX, 76051, United States of America, shouyiw@uta.edu
1 - Co-clustering Based Dual Prediction for Cargo Pricing Optimization
Yada Zhu, Research Staff Member, IBM, Thomas J. Watson Research Center, 1101 Route 134 Kitchawan Rd, Yorktown Heights, NY, 10598, United States of America, yzhu@us.ibm.com
In the air cargo business, given the features associated with an origination-destination pair, how can we simultaneously predict both the optimal price for the bid stage and the outcome of the transaction (win rate) in the decision stage? In this paper, we propose a probabilistic framework and a COCOA algorithm to simultaneously construct dual predictive models and uncover the co-clusters of originations and destinations.
2 - An Efficient Orthogonal-polynomial-based Approach for Time Series Representation and Prediction
Shouyi Wang, Assistant Professor, University of Texas at Arlington, 3105 Birch Ave, Grapevine, TX, 76051, United States of America, shouyiw@uta.edu
We present a new efficient time series representation and prediction framework, called the orthogonal-polynomial-based variant-nearest-neighbor (OPVNN) approach, for complex and highly nonlinear time series data. The proposed approach achieved the most robust prediction performance compared to state-of-the-art time series modeling and prediction methods on the challenging respiratory motion prediction problem. It has great potential to handle complex time series data streams efficiently.
3 - Online Social Network (OSN) Fake Account Detection System with Cluster Level Features
Danica Xiao, PhD Candidate, University of Washington, Seattle, 3900 Northeast Stevens Way, Seattle, WA, 98195, United States of America, xiaoc@uw.edu
Most online social networks (OSNs) face users with undesired activities during the network’s growth and expansion. Most of them are malicious. Many malicious activities start with fake account (aka “sybil account”) attacks. This paper presents a supervised-learning-based system to address this challenge.

4 - Unsupervised Data Mining for Medical Fraud Detection
Tahir Ekin, Assistant Professor, Texas State University, 601 University Dr. McCoy Hall 411, San Marcos, TX, 78666, United States of America, t_e18@txstate.edu, Greg Lakomski, Rasim Muzaffer Musal
U.S. governmental agencies report that three to ten percent of annual health care spending is lost to fraud, waste and abuse. These fraudulent transactions have direct cost implications for taxpayers, in addition to diminishing the quality of medical services. This talk discusses the use of unsupervised data mining approaches such as latent Dirichlet allocation for medical fraud detection. Our main objective is to identify billing behaviors and find providers that are outliers.

SD32
32-Room 409, Marriott
Computational and Statistical Challenges in Big Data Genomics
Cluster: Big Data Analytics in Computational Biology/Medicine
Invited Session
Chair: Li-San Wang, Associate Professor, University of Pennsylvania, 423 Guardian Drive, 1424 Blockley Hall, Philadelphia, PA, 19104, United States of America, lswang@upenn.edu
1 - Big Data Analyses Reveal Many New Short Non-coding RNAs in Health and Disease
Isidore Rigoutsos, Professor, Computational Medicine Center, Jefferson Medical College, Thomas Jefferson University, 1020 Locust Street, Suite #M81, Philadelphia, PA, 19108, United States of America, isidore.rigoutsos@jefferson.edu
By analyzing transcriptomic datasets from healthy individuals and patients, we have uncovered numerous novel regulatory non-coding RNAs. These molecules include novel microRNAs, isoforms of microRNAs, fragments of transfer RNAs (tRNAs), and others. Importantly, we find that these molecules’ composition and abundances depend on an individual’s race, population, and gender as well as on tissue, tissue state and disease subtype.
2 - Awsomics: A Knowledge Discovery Infrastructure Based on Annotated Genomic Data
Zhe Zhang, Bioinformatics Scientist, Children’s Hospital of Philadelphia, 3535 Market Street, Suite 1067, Philadelphia, PA, 19104, United States of America, zhangz@email.chop.edu
Knowledge discovery is lagging behind data and information generation in the field of genomic research. To help biomedical researchers digest the overwhelming amount of genomic data, we developed a system based on Amazon Web Services. It includes an archive of curated data and results, various methods supporting integrative analysis, and a web-based toolbox. It will be a valuable resource for biomedical researchers seeking novel insights into complicated biological systems.
3 - Quality Control of Whole Genome and Exome Data in a Large Sequencing Study of Alzheimer Disease
Adam Naj, Instructor, Department Of Biostatistics And Epidemiology, University of Pennsylvania, 423 Guardian Drive, 229 Blockley Hall, Philadelphia, PA, 19104, United States of America, adamnaj@mail.med.upenn.edu
The Alzheimer’s Disease (AD) Sequencing Project (ADSP) is an NIH project to sequence 578 familial genomes and 10,692 unrelated exomes of cases and controls to identify causal genomic variants. Here we describe extensive bioinformatics applications in a multi-center quality control effort: performing genotype calling, integrating data from multiple calling pipelines, filtering low-quality samples, and incorporating external annotation to facilitate identifying rare variants affecting AD risk.
