INFORMS Annual Meeting Phoenix 2018
2 - Asymptotically Optimal Multi-armed Bandit Policies Under Side Constraints
Odysseas Kanavetas, Koc University, Koc University Campus, Istanbul, Turkey, Apostolos Burnetas, Michael N. Katehakis
We develop asymptotically optimal policies for the multi-armed bandit (MAB) problem under side constraints. Such models are applicable in situations where each sample (or activation) from a population (bandit) incurs known bandit-dependent costs. We consider the class of feasible uniformly fast (UF) convergent policies that satisfy the constraints sample-path-wise. We first establish a necessary asymptotic lower bound on the rate of increase of the regret function of UF policies. Then we provide conditions under which a simple class of UF policies attains this lower bound and is asymptotically optimal within the class of UF policies.
3 - Generative Networks for Data Modeling in Sequential Change Detection
George Moustakides, Professor, Rutgers University, 110 Frelinghuysen Road, Hill 359, Piscataway, NJ, 08854, United States
One of the most important problems in applications is data modeling. When data are time-dependent, we often limit ourselves to linear AR or ARMA models, thereby silently assuming that the data are Gaussian. Recently, in machine learning, nonlinear neural networks known as Generative Networks were introduced that can be properly trained to capture non-Gaussian behavior. In this work we extend this idea to represent nonlinear time dependencies instead of the simple independent realizations that are current practice. Our method is then applied to the problem of sequential change detection for the rapid detection of changes in the statistical behavior of observed processes.
4 - Reinforcement Learning: Connections Between MDPs and MAB Problems
Michael N. Katehakis, Distinguished Professor, Rutgers University, 100 Rockafeller Road, Piscataway, NJ, 08854, United States, Wesley Cowan, Daniel Pirutinsky
This talk considers a basic reinforcement learning model dealing with adaptively controlling an unknown Markov Decision Process (MDP) in order to maximize the long-term expected average value. We show how a factored representation of the MDP problem allows it to be decoupled into a set of individual MAB problems on a state-by-state basis. Additionally, i) we show the construction of a simple UCB-type MDP policy, dramatically simplifying an earlier proof of its optimality, and ii) we discuss extensions to other MAB policies, e.g., Thompson Sampling.

n TB39
North Bldg 226A
Online Optimization and Probability
Sponsored: Applied Probability
Sponsored Session
Chair: Alessandro Arlotto, Duke University, Durham, NC, 27708, United States
1 - On Policies for Single-leg Revenue Management with Limited Demand Information
Will Ma, Google, Cambridge, MA, United States, David Simchi-Levi, Chung-Piaw Teo
In this paper we study the single-leg revenue management problem with no information given about the demand trajectory over time. The competitive ratio for this problem has been previously established under the assumption of independent demand, i.e., demand from higher fare classes does not spill over to lower fare classes. We extend these results to general demand by using a price-skimming technique. That is, we derive price-skimming policies that stochastically increase their price distributions as inventory decreases, in a way that yields the best-possible competitive ratio. A key technical ingredient in our paper is a new "valuation tracking" subroutine.
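The index-type rules behind the UCB-style policies mentioned in the abstracts above can be pictured with a minimal sketch. The following is a generic UCB1 index policy on an invented Bernoulli bandit instance; it illustrates the upper-confidence-bound idea only and is not the constrained or MDP policy of these talks, and the arm means and horizon are made up for the example.

```python
import math
import random

def ucb1(pull, n_arms, horizon, seed=0):
    """Generic UCB1 index policy: after playing each arm once, always
    play the arm maximizing (sample mean + sqrt(2 ln t / pulls))."""
    random.seed(seed)
    counts = [0] * n_arms     # number of pulls per arm
    sums = [0.0] * n_arms     # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # initialization: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

# Toy Bernoulli bandit: arm 1 has the highest mean reward.
means = [0.3, 0.7, 0.5]
counts = ucb1(lambda a: 1.0 if random.random() < means[a] else 0.0,
              n_arms=3, horizon=5000)
```

Over a long horizon the pull counts concentrate on the best arm, which is the source of the logarithmic-regret behavior that asymptotic lower bounds of this kind are measured against.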
2 - Adaptive Policies for the Dynamic and Stochastic Knapsack Problem with Equal Values
Xinchang Xie, Duke University, 100 Fuqua Drive, Durham, NC, 27708, United States, Alessandro Arlotto
We study a dynamic and stochastic knapsack problem in which a decision maker is sequentially presented with n items and needs to select which items to include in a knapsack with fixed capacity. Arriving items have non-negative, independent sizes with a common continuous distribution, and every decision is terminal upon arrival. The goal is to maximize the expected number of selections under the capacity constraint. Under mild regularity conditions on the distribution, we propose adaptive online policies that are within O(log n) of optimal. Moreover, the limiting distribution of the number of selections under such policies is the same as the distribution one obtains by implementing an optimal policy.
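The flavor of an adaptive online policy for this knapsack problem can be sketched as follows: accept an arriving item only if its size falls below a threshold recomputed from the remaining capacity and the remaining horizon. The sqrt(2c/n) threshold below is a plausible heuristic for Uniform(0,1) sizes, used purely for illustration; it is not claimed to be the policy of the talk, and the instance parameters are invented.

```python
import math
import random

def adaptive_knapsack(sizes, capacity):
    """Adaptive threshold policy for the dynamic and stochastic knapsack
    problem with equal values: accept an arriving item iff its size is at
    most sqrt(2 * remaining_capacity / remaining_items), a heuristic
    calibrated to Uniform(0,1) sizes."""
    selected = 0
    remaining = capacity
    n = len(sizes)
    for i, s in enumerate(sizes):
        threshold = math.sqrt(2.0 * remaining / (n - i))
        if s <= min(threshold, remaining):   # accept only if it still fits
            selected += 1
            remaining -= s
    return selected

random.seed(1)
n = 10_000
sizes = [random.random() for _ in range(n)]
k = adaptive_knapsack(sizes, capacity=100.0)
```

Recomputing the threshold from the remaining capacity is what makes the policy adaptive: early over- or under-spending is corrected as the horizon shrinks, rather than committing to a fixed acceptance rule up front.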
3 - Uniformly Bounded Regret in the Multi-secretary Problem
Alessandro Arlotto, Duke University, 100 Fuqua Drive, Durham, NC, 27708, United States, Itai Gurvich
In the secretary problem of Cayley (1875) and Moser (1956), n non-negative, independent random variables with common distribution are sequentially presented to a decision maker who decides when to stop and collect the most recent realization. The goal is to maximize the expected value of the collected element. In the k-choice variant, the decision maker is allowed to make k ≤ n selections to maximize the expected total value of the selected elements. Assuming that the values are drawn from a known distribution with finite support, we prove that the regret, the expected gap between the optimal online policy and its offline counterpart, is uniformly bounded in the number of candidates n and the budget k.
4 - The Bayesian Prophet Framework for Low Regret Online Decision-making
Siddhartha Banerjee, Cornell University, 229 Rhodes Hall, Ithaca, NY, 14853, United States
We propose a new framework for online policies for packing problems with stochastic arrivals, based on access to an oracle that provides statistical information regarding the offline optimal solution. Our policy, which we call the Bayes Selector, is a simple greedy policy for general online decision-making problems based on such an oracle. We provide a general technique for deriving expected regret bounds for this policy, and prove that in any online packing problem with a discrete type space, the Bayes Selector achieves an expected regret that is independent of the number of arrivals and the starting resource levels.

n TB40
North Bldg 226B
Research and Funding Trends in Decision Analysis
Sponsored: Decision Analysis
Sponsored Session
Chair: Christian Wernz, Virginia Commonwealth University-VCU, P.O. Box 980203, Richmond, VA, 23298, United States
Moderator: Christian Wernz, Virginia Commonwealth University-VCU, P.O. Box 980203, Richmond, VA, 23298, United States
Panelists: Robin Dillon-Merrill, National Science Foundation, 517 Hariri Building, McDonough School of Business, Washington, DC, 20057, United States
Georgia-Ann Klutke, National Science Foundation, 4201 Wilson Boulevard, Arlington, VA, 22230, United States
Irina Dolinskaya, National Science Foundation, 2415 Eisenhower Ave, Alexandria, VA, 22314, United States

n TB41
North Bldg 226C
Environmental Decision Analysis
Sponsored: Decision Analysis
Sponsored Session
Chair: Melissa A. Kenney, University of Maryland, College Park, MD, 20740, United States
Co-Chair: Sayanti Mukherjee, Purdue University, West Lafayette, IN, United States
1 - A Comparison of Spatial and Random Failures when Simulating Network Outages
Benjamin Rachunok, Purdue University, 315 N. Grant Street, West Lafayette, IN, 47906, United States
In network-based analyses of infrastructure failures and recovery, small perturbations in initial outages can cascade through the system, causing wildly differing end-stage outage patterns. Frequently in these analyses, random nodes are selected for removal to simulate the degradation of the system due to tropical storms. Here we evaluate this random node selection technique against outages that are spatially motivated by the actual outage patterns caused by tropical storms. The resulting outage and recovery patterns are analyzed for differences.
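The comparison described in the last abstract, random versus spatially clustered node failures, can be prototyped on a toy grid network. This sketch uses only the standard library; the grid size, failure count, and "storm center" are invented for illustration and do not reflect real outage data. It removes the same number of nodes either uniformly at random or clustered around a random center, then compares the largest surviving connected component.

```python
import random
from collections import deque

def grid_neighbors(x, y, n):
    """4-neighborhood of (x, y) on an n-by-n grid."""
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield (x + dx, y + dy)

def largest_component(alive, n):
    """Size of the largest connected component among surviving nodes (BFS)."""
    seen, best = set(), 0
    for node in alive:
        if node in seen:
            continue
        seen.add(node)
        queue, comp = deque([node]), 0
        while queue:
            x, y = queue.popleft()
            comp += 1
            for nb in grid_neighbors(x, y, n):
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, comp)
    return best

def simulate(n=30, n_fail=200, spatial=False, seed=0):
    """Fail n_fail nodes, either uniformly at random or clustered
    around a randomly chosen 'storm center', and return the LCC size."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    if spatial:
        cx, cy = rng.choice(nodes)  # storm center
        nodes.sort(key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        failed = set(nodes[:n_fail])  # nodes nearest the center fail
    else:
        failed = set(rng.sample(nodes, n_fail))
    alive = {(x, y) for x in range(n) for y in range(n)} - failed
    return largest_component(alive, n)

random_lcc = simulate(spatial=False)
spatial_lcc = simulate(spatial=True)
```

On this toy instance the clustered failure leaves the rest of the grid intact, while random failures tend to strand isolated nodes, which is one concrete way the two degradation models produce different end-stage outage patterns.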