Informs Annual Meeting Phoenix 2018

2 - A Finite Time Analysis of Temporal Difference Learning
Dan Russo, Columbia University, New York, NY, 10027, United States
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement learning, its theoretical analysis has proved challenging and few guarantees on its statistical efficiency are available. In this work, we provide a simple and explicit finite time analysis of temporal difference learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature.

3 - Ensemble Sampling
Xiuyuan Lu, Stanford University, Stanford, CA, United States, Benjamin Van Roy
Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems. In its basic form, the algorithm requires computing and sampling from a posterior distribution over models, which is tractable only for simple special cases. In this talk, I will give a brief introduction to ensemble sampling, a heuristic that aims to approximate Thompson sampling while maintaining tractability even in the face of complex models such as neural networks. Ensemble sampling dramatically expands the range of applications for which Thompson sampling is viable. I will talk about recent progress as well as some applications in digital marketing.

4 - Thompson Sampling with Information Relaxation Penalties
Seungki Min, Columbia University, New York, NY, United States, Costis Maglaras, Ciamac Cyrus Moallemi
We propose a general framework for improving Thompson sampling by applying penalties.
At each decision epoch, instead of choosing the action myopically optimized for the sampled values of the unknown parameters, we incorporate an additional penalty to compensate for the information relaxation that arises from assuming these parameters are known. We illustrate the flexibility and performance of our method with examples.

n MA38 North Bldg 225B
APS Session Subramanian
Sponsored: Applied Probability
Sponsored Session
Chair: Vijay Subramanian, University of Michigan, Ann Arbor, MI, 48109, United States

1 - Mode-suppression: A Simple and Provably Stable Chunk-sharing Algorithm for Peer-to-peer Networks
Srinivas Shakkottai, Texas A&M University, College Station, TX, United States, Vamseedhar Reddyvari, Parimal Parag
The ability of a P2P network to scale up throughput with increasing peer arrival rate depends on the chunk-sharing policy. Some policies can result in low frequencies of a chunk, leading to system instability. Recent efforts have focused on carefully boosting chunks to mitigate this issue. We take a complementary view and instead recommend preventing sharing of the most frequent chunk(s). We refer to this policy as mode suppression. We prove system stability using Lyapunov techniques, and also show that the total download time under our policy does not depend on the peer arrival rate. Finally, we design distributed heuristics, and show numerically that mode suppression outperforms all other policies.

2 - On Information State and Sequential Decomposition in Decision Problems with Strategic and Non-strategic Agents
Hamidreza Tavafoghi, University of California-Berkeley, Berkeley, CA, United States, Ouyang Yi, Demosthenis Teneketzis
We study dynamic multi-agent decision problems with asymmetric information. We propose the notion of sufficient information that provides a compression of agents’ private and common information in a mutually consistent manner.
We show that restriction to sufficient information-based (SIB) strategies is without loss of generality when agents are not strategic (i.e., teams), and provide a sequential decomposition of dynamic teams over time. When agents are strategic (i.e., games), we show that the class of SIB strategies is closed under the best response map. Consequently, we propose a notion of SIB equilibrium and provide a sequential decomposition of dynamic games over time.

3 - On Communication in Multi-agent Reinforcement Learning
Parinaz Nagizadeh, Purdue University, West Lafayette, IN, United States, Maria Gorlatova, Andrew Lan, Mung Chiang
We study the role of communication in a multi-agent reinforcement learning problem, where agents aim to learn to coordinate their actions and maximize their rewards through repeated interactions with an unknown, partially observable environment. We assume agents can communicate their partial state observations, and identify two benefits of this information sharing when agents’ observation quality is heterogeneous: it facilitates coordination among them, and it further enhances the learning rate of both better-informed and less-informed agents. We show, however, that these benefits depend on the communication timing, in that delayed information sharing may be preferred in certain scenarios.

4 - Local Weak Convergence Applied to a Local Preferences Based Random Graph Model
Vijay Subramanian, University of Michigan, 1301 Beal Avenue, #4112 EECS, Ann Arbor, MI, 48109, United States, Mehrdad Moharrami, Mingyan Liu, Rajesh Sundaresan
Random graph models are used to model individual preferences innate to graph construction. Each model proposes a unique construction that captures some properties of real-world networks. We study the asymptotics of a new model that uses the self-optimizing behavior of individuals, where the cost/benefit of a link factors into its existence: we prove that the model locally weakly converges to the rooted tree of a branching process, the Erlang Weighted Tree (EWT), and study the EWT. This allows us to study phase transitions to giant components of the graph family when direct and well-studied methods (first moment and second moment) do not apply, owing to the dependency structure begot by self-optimization.

n MA39 North Bldg 226A
Modern Scheduling
Sponsored: Applied Probability
Sponsored Session
Chair: Alan Scheller-Wolf, Tepper School of Business, Pittsburgh, PA, 15213, United States
Co-Chair: Mor Harchol-Balter, Carnegie Mellon University, Pittsburgh, PA, 15213, United States

1 - SOAP: One Clean Analysis of All Age-based Scheduling
Mor Harchol-Balter, Carnegie Mellon University, 5000 Forbes Avenue, Computer Science Dept, Pittsburgh, PA, 15213, United States, Ziv Scully, Alan Scheller-Wolf
We consider an extremely broad class of M/G/1 scheduling policies called SOAP: Schedule Ordered by Age-based Priority. The SOAP policies include most scheduling policies in the literature as well as an infinite number of variants that have never been analyzed, or perhaps never even conceived. SOAP policies range from classic policies, like class-based priority, Foreground-Background (FB), and Shortest Remaining Processing Time (SRPT), to much more complicated scheduling rules, such as the famously complex Gittins index policy and Shortest Expected Remaining Processing Time (SERPT), which have resisted analysis. We present a universal analysis of response time for all SOAP policies.

2 - Shortest Remaining Processing Time for Multiserver Systems
Isaac Grosof, Carnegie Mellon University, Pittsburgh, PA, United States, Mor Harchol-Balter, Ziv Scully
The Shortest Remaining Processing Time (SRPT) scheduling policy and its variants have been extensively studied. While beautiful results are known for single-server SRPT, much less is known for multiserver SRPT. In particular, stochastic analysis of the M/G/k under multiserver SRPT is entirely open. We give the first stochastic analysis bounding mean response time of the M/G/k under multiserver SRPT. Using our response time bound, we show that multiserver SRPT has asymptotically optimal mean response time in the heavy-traffic limit. Beyond SRPT, we prove similar response time bounds and optimality results for several other multiserver scheduling policies.

3 - Optimal Scheduling and Exact Response Time Analysis for Multistage Jobs
Ziv Scully, Carnegie Mellon University, 5000 Forbes Avenue, GHC 7121, Pittsburgh, PA, 15213, United States, Mor Harchol-Balter, Alan Scheller-Wolf
Scheduling to minimize mean response time of the M/G/1 queue is usually addressed in one of two cases: that of known job sizes, in which SRPT is optimal, and that of unknown job sizes, in which the more complex Gittins index policy is optimal. However, in real systems we often have partial job size information. We introduce a new model of multistage jobs to capture this case. A multistage job has multiple stages of unknown size, but the scheduler always knows which stage is in progress. We give an optimal algorithm for scheduling multistage jobs and exactly analyze its mean response time. As a special case, we obtain a closed-form expression for the mean response time of the Gittins index policy.

4 - On Optimal Scheduling in Input-queued Switches
Mark S. Squillante, IBM Research, Yorktown Heights, NY, 10598, United States, Yingdong Lu, Siva Theja Maguluri, Tonghoon Suk
We consider issues related to optimal scheduling in input-queued switches, which are widely used in modern computer and communication networks. Theoretical properties are established, and computational experiments are presented to demonstrate the benefits of our approach.
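The single-server scheduling theme running through this session can be illustrated with a small simulation. The sketch below is not code from any of the talks; it is a minimal event-driven comparison of preemptive SRPT against FCFS in an M/M/1 queue, where the function name `simulate_mg1`, the arrival rate, and the exponential job-size distribution are all assumptions chosen for the example.

```python
import heapq
import random

def simulate_mg1(policy="SRPT", lam=0.7, n_jobs=50000, seed=1):
    """Simulate a single-server queue with Poisson arrivals (rate lam) and
    exponential job sizes (mean 1) under preemptive SRPT or FCFS.
    Returns the mean response time (completion time minus arrival time)."""
    rng = random.Random(seed)
    heap = []  # entries: (priority, job_id, arrival_time, remaining_work)
    clock = 0.0
    total_response = 0.0
    for i in range(n_jobs):
        arrival = clock + rng.expovariate(lam)
        # Serve queued work in priority order until the next arrival.
        while heap and clock < arrival:
            prio, jid, arr, rem = heapq.heappop(heap)
            quantum = min(rem, arrival - clock)
            clock += quantum
            rem -= quantum
            if rem > 1e-12:
                # Interrupted by the arrival; SRPT re-keys on remaining work.
                key = rem if policy == "SRPT" else prio
                heapq.heappush(heap, (key, jid, arr, rem))
            else:
                total_response += clock - arr
        clock = arrival
        size = rng.expovariate(1.0)
        key = size if policy == "SRPT" else i  # FCFS: priority = arrival order
        heapq.heappush(heap, (key, i, arrival, size))
    # Drain the jobs still queued after the last arrival.
    while heap:
        _, jid, arr, rem = heapq.heappop(heap)
        clock += rem
        total_response += clock - arr
    return total_response / n_jobs
```

At load 0.7, the SRPT run reports a noticeably lower mean response time than FCFS (whose mean response time in an M/M/1 queue is 1/(1 - 0.7) ≈ 3.33), consistent with SRPT's optimality for mean response time in the single-server setting discussed above.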

