INFORMS Annual Meeting 2017

TB71

INFORMS Houston – 2017

TB70


371E
Data Mining
Contributed Session
Chair: Gianluca Gazzola, Rutgers University, Piscataway, NJ, United States, ggazzola@rutgers.edu

1 - Deep Causal Inference
Yuanyuan Shen, Stanford Graduate School of Business, 10 Comstock Circle, Apt 325, Stanford, CA, 94305, United States, yyshen@stanford.edu
Kiva is an online non-profit microfinance platform that raises funds for the poor. To raise funds as fast as possible, borrowers have the option to form groups and post loan requests in the name of groups. Group loans generally pose less risk for investors than individual loans do; we study whether this is also the case in a philanthropic online marketplace. We measure the effect of group loans on funding time, controlling for loan size and other factors. Because loan descriptions play an important role in lenders’ decision process on Kiva, we exploit this information using deep learning methods for natural language processing. We find that, on average, forming group loans shortens funding time by at least two days.

2 - Uber Demand Modeling
Amir Mousavi, George Washington University, 475 K.St NW, Apt 419, Washington, DC, 20001, United States, ahmn00@gmail.com
Uber has undoubtedly become one of the most significant startups of the 21st century. Despite its simple and user-friendly interface, Uber runs complex processes behind the scenes. It is crucial to understand demand, and more specifically the demand distribution, in order to calculate the optimal price. This project uses historical data to determine the demand distribution at given times and locations.

3 - Analysis of Feature Extraction and Feature Selection Impact on Angiographic Disease Diagnosis Accuracy
Amirhossein Koneshloo, Texas Tech University, Lubbock, TX, United States, amir.koneshloo@ttu.edu
This study investigates the importance of appropriate Feature Extraction (FE) and Feature Selection (FS) for improving angiographic disease diagnosis accuracy with multivariate statistical analysis such as kernel PCA.
The data for this analysis come from the Cleveland Clinic Foundation. Widely used techniques, namely Support Vector Machine and cross-validation, are applied for data classification and performance evaluation, respectively.

4 - Frequent Temporal Pattern Mining with Extended Lists
Anton Kocheturov, Research Assistant, Department of Industrial and Systems Engineering, UF, 2600 SW. Williston Road, Apt 707, Gainesville, FL, 32608-3949, United States, antrubler@gmail.com
In this paper we consider Temporal Pattern Mining (TPM) for extracting predictive class-specific patterns from multivariate time series. We suggest a new approach that extends the use of the apriori property, which requires a more complex pattern to appear only at places where all its subpatterns appear as well. The approach is based on tracking the positions of a pattern inside records in a greedy manner. We demonstrate that it outperforms the previous version of TPM on several real-life data sets, regardless of how the temporal pattern is defined.

5 - Incremental Feature Importance Ranking for Mixed Classification Tasks
Alaleh Razmjoo, PhD Student, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL, 32816, United States, alaleh.razmjoo@knights.ucf.edu
Online data mining methods are normally adaptations of classical data mining techniques for learning in an incremental fashion. In this paper, we present a method for online feature importance ranking in online classification tasks. In this method, the merit of a feature is based on its impact on the classification outcome. Unlike available algorithms, our method can be applied to both continuous and categorical features, without the need to discretize the input. To evaluate the merits of the proposed method, we performed experiments on benchmark datasets. Preliminary experiments show promising results compared with available methods.
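The apriori property invoked in talk 4 above (Frequent Temporal Pattern Mining with Extended Lists) can be illustrated with a minimal sketch. It is not the authors' extended-list algorithm; it only shows the pruning idea that a longer pattern needs to be searched for only in records where every one of its subpatterns was already found. The record ids and the function name are hypothetical.

```python
def candidate_records(subpattern_hits):
    """Records where a longer pattern may occur.

    By the apriori property, this is the intersection of the record
    sets in which each of its subpatterns occurs.
    """
    sets = [set(hits) for hits in subpattern_hits]
    return set.intersection(*sets) if sets else set()


# Hypothetical record ids in which each two-state subpattern was found.
hits_ab = {1, 2, 5, 7}
hits_bc = {2, 3, 5}
hits_ac = {2, 5, 7}

# Only these records need to be scanned for the three-state pattern.
to_scan = candidate_records([hits_ab, hits_bc, hits_ac])
```

Keeping per-pattern position lists in this way trades memory for fewer scans of the raw time series.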

6 - Clustering-conditional-importance Variable Selection with Random Forests
Gianluca Gazzola, Rutgers University, 100 Rockafeller Road, 5th floor, Piscataway, NJ, 08854, United States, ggazzola@rutgers.edu
We present a variable selection algorithm that employs a novel variant of the random-forest conditional permutation importance measure. This measure permutes observations within groups obtained by clustering, via a criterion defined by the structure of dependencies existing among variables, in order to eliminate unimportant and/or redundant predictors. The algorithm outperforms several other permutation-importance variable selection algorithms on both artificial and real-world data sets.
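The within-group permutation step mentioned in talk 6 can be sketched as follows. This is an illustration only, assuming cluster labels are already available; the clustering criterion, the random forest, and the importance computation from the abstract are not reproduced, and the function name and data are hypothetical.

```python
import random
from collections import defaultdict


def permute_within_groups(values, groups, rng):
    """Shuffle `values` only among observations sharing a group label."""
    idx_by_group = defaultdict(list)
    for i, g in enumerate(groups):
        idx_by_group[g].append(i)

    out = list(values)
    for idx in idx_by_group.values():
        shuffled = [values[i] for i in idx]
        rng.shuffle(shuffled)
        # Write the shuffled values back to the same group's positions.
        for i, v in zip(idx, shuffled):
            out[i] = v
    return out


values = [0.1, 0.4, 0.3, 2.0, 2.2, 1.9]  # one predictor's observations
groups = [0, 0, 0, 1, 1, 1]              # hypothetical cluster labels
permuted = permute_within_groups(values, groups, random.Random(0))
```

Because values never leave their cluster, the permuted predictor keeps its relationship with whatever the clusters encode, which is the point of a conditional (rather than marginal) permutation.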

371F
Black-box or Simulation Optimization
Sponsored: Optimization, Global Optimization
Sponsored Session
Chair: Zelda B. Zabinsky, University of Washington, Seattle, WA, 98195-2650, United States, zelda@u.washington.edu

1 - A Two-time-scale Random Search Algorithm for Global Optimization
Qi Zhang, Stony Brook University, SUNY, Stony Brook, NY, 11794, United States, qi.zhang.1@stonybrook.edu, Jiaqiao Hu
We propose a random search algorithm for solving black-box optimization problems. The algorithm iteratively finds improved solutions by modifying and sampling from a probability distribution over the solution space. In contrast to existing algorithms in the class, which are mostly population-based, our approach employs a two-time-scale stochastic approximation idea and uses only a single candidate solution per iteration. We establish global convergence of the algorithm and present numerical results to illustrate its performance.

2 - An Epsilon-constraint Method for Integer-ordered Bi-objective Simulation Optimization
Kalyani S. Nagaraj, Oklahoma State University, 322 Engineering North, Industrial Engineering & Mgmt, Stillwater, OK, 74078, United States, kalyani.nagaraj@okstate.edu, Kyle Cooper, Susan R. Hunter
Consider the context of integer-ordered bi-objective simulation optimization, in which the feasible region is a finite subset of the integer lattice. We propose a retrospective approximation (RA) framework to identify a local Pareto set that involves solving a sequence of sample-path bi-objective optimization problems at increasing sample sizes. We apply the epsilon-constraint method to each sample-path bi-objective optimization problem, thus solving a sequence of constrained single-objective problems in each RA iteration. Our algorithm displays promising numerical performance.
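The epsilon-constraint method named in talk 2 can be illustrated on a small deterministic problem. This sketch assumes two made-up objectives on a 5 x 5 integer grid and uses plain enumeration; it omits the simulation noise and the retrospective approximation framework described in the abstract, and every name in it is hypothetical.

```python
from itertools import product


def f1(x):
    return (x[0] - 3) ** 2 + (x[1] - 1) ** 2


def f2(x):
    return (x[0] - 1) ** 2 + (x[1] - 3) ** 2


# Finite subset of the integer lattice (assumed for illustration).
feasible = list(product(range(5), repeat=2))


def epsilon_constraint(eps):
    """Minimize f1 over points with f2(x) <= eps; None if none qualify."""
    admissible = [x for x in feasible if f2(x) <= eps]
    if not admissible:
        return None
    # Break ties on f1 by f2 so the returned point is non-dominated.
    return min(admissible, key=lambda x: (f1(x), f2(x)))


# Sweep the bound on f2 and collect the resulting solutions.
solutions = {epsilon_constraint(eps) for eps in range(0, 9)}
solutions.discard(None)
```

Each value of the bound turns the bi-objective problem into a constrained single-objective one, and sweeping the bound traces out points of the Pareto set.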

3 - Parallel Simultaneous Perturbation Optimization
Atiye Alaeddini, Institute for Disease Modeling, 3150 139th Ave SE, Bellevue, WA, 98005, United States, aalaeddini@idmod.org
Optimization is a common problem in stochastic simulations. The objective function, however, is time-intensive to evaluate and cannot be directly measured, because individual realizations of the model are corrupted by noise. We consider the problem of optimizing the expected value of an expensive black-box function, from which observations are corrupted by Gaussian noise. We present Parallel Simultaneous Perturbation Optimization (PSPO), which takes advantage of parallel computing resources, such as high-performance cloud computing. The PSPO algorithm takes fewer time-consuming iterations to converge, automatically chooses the step size, and can vary the error tolerance by step.

4 - Sampling within Adaptive Level Set Approximations for Black-box Global Optimization
David Desmond Linz, University of Washington, 5246 NE 15th Avenue, Apt 101, Seattle, WA, 98105, United States, ddlinz@gmail.com, Zelda B. Zabinsky
Several random search approaches (e.g., pure adaptive search) have been shown to be effective for global optimization with expected performance that increases linearly with dimension. We explore a practical implementation that uses models in order to sample from a series of nested level set approximations. The paper describes a method for adjusting quantile levels and parameters to achieve a probability of successfully sampling within each level set. We explore the application of this framework to various model-based sampling methods. We discuss theoretical bounds for convergence and pair the results with numerical examples.
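For readers unfamiliar with simultaneous perturbation, as referenced in talk 3 above, the following is a minimal sketch of the standard simultaneous-perturbation gradient estimate on a noisy objective. It is not the PSPO algorithm from the abstract: the objective, the fixed step size, and the idea of averaging several independent estimates (which could be evaluated in parallel) are all assumptions made for illustration.

```python
import random


def noisy_loss(x, rng):
    """Hypothetical objective: quadratic bowl plus Gaussian noise."""
    return sum(v * v for v in x) + rng.gauss(0.0, 0.01)


def sp_gradient(x, c, rng):
    """Gradient estimate from two noisy evaluations, any dimension."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]  # random +/-1 directions
    plus = noisy_loss([v + c * d for v, d in zip(x, delta)], rng)
    minus = noisy_loss([v - c * d for v, d in zip(x, delta)], rng)
    return [(plus - minus) / (2.0 * c * d) for d in delta]


def averaged_gradient(x, c, rng, n_estimates):
    """Average independent estimates; each could run on its own worker."""
    estimates = [sp_gradient(x, c, rng) for _ in range(n_estimates)]
    return [sum(col) / n_estimates for col in zip(*estimates)]


rng = random.Random(0)
x = [2.0, -1.5, 0.5]
for _ in range(200):
    g = averaged_gradient(x, c=0.1, rng=rng, n_estimates=4)
    x = [v - 0.05 * gi for v, gi in zip(x, g)]  # fixed step size (assumed)

final_loss = sum(v * v for v in x)  # noise-free loss at the final iterate
```

The appeal of the estimate is that it costs two function evaluations regardless of dimension, which matters when each evaluation is an expensive simulation run.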
