INFORMS 2021 Program Book
INFORMS Anaheim 2021
2 - Fair Exploration via Axiomatic Bargaining
Jackie W. Baek, MIT, Cambridge, MA, 02139-4301, United States, Vivek Farias
Exploration is often necessary to maximize long-term reward in online learning, but it comes at the cost of reducing immediate reward. We develop the Nash bargaining solution in the context of 'grouped' bandits, in which each time step is associated with a group from some finite set of groups. The utility gained by a group under a policy is naturally viewed as the reduction in that group's regret relative to the regret the group would have incurred 'on its own'. We derive policies that yield the Nash bargaining solution, and we show that the 'price of fairness' under such policies is limited, whereas regret-optimal policies are arbitrarily unfair under generic conditions. Our theoretical development is complemented by a case study on contextual bandits for warfarin dosing, where we are concerned with the cost of exploration across multiple races and age groups.

3 - Fair Intervention Bundle Design
Elisabeth Paulson, Stanford University, Stanford, CA, 02141-1437, United States
This work introduces the Fair Product Line Design Problem (FPLDP), in which a service provider must determine the optimal number and set of product/service bundles to offer its users in order to minimize cost while meeting an individual-level fairness constraint. The fairness constraint ensures that each user's resulting utility from their chosen (or assigned) bundle is above a prespecified threshold. This problem arises in settings such as healthcare and public policy (where services can be thought of as interventions or treatments), as well as retail settings in which fair outcome guarantees are desirable. We formulate FPLDP as a mixed-integer non-linear program and develop a class of approximation algorithms for this problem whose solutions correspond to different trade-offs between robustness and cost.
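The Nash bargaining solution in the axiomatic-bargaining abstract above maximizes the product of the groups' utility gains over the feasible set. As a toy illustration only (the utility curves below are invented stand-ins, not the paper's grouped-bandit model), a grid search over how an exploration budget is split between two groups:

```python
# Minimal sketch of a Nash bargaining solution over an exploration budget.
# The utility curves u1, u2 are hypothetical stand-ins, not the paper's model:
# each maps a group's share of exploration to its regret reduction.

def nash_bargaining_share(u1, u2, steps=10_000):
    """Grid-search the split x maximizing the Nash product u1(x) * u2(1 - x)."""
    best_x, best_val = None, float("-inf")
    for k in range(1, steps):
        x = k / steps
        val = u1(x) * u2(1 - x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

# Example: group 2 benefits twice as fast from exploration as group 1.
x = nash_bargaining_share(lambda s: s, lambda s: 2 * s)  # → 0.5
```

Note that the split is 0.5 despite the asymmetric utilities: the Nash solution is invariant to rescaling any one group's utility, one of Nash's bargaining axioms.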
4 - Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization with Bandit Feedback
Jad Salem, Georgia Institute of Technology, Atlanta, GA, 30318-5608, United States, Swati Gupta, Vijay Kamble
Prices generated by automated price experimentation often display erratic fluctuations, which can be perceived as unfair and may erode a customer's trust. To address this concern, we propose demand learning under a monotonicity constraint on the sequence of prices. We give the first known sublinear-regret algorithms for monotonic price experimentation for smooth and strongly concave revenue functions under bandit and first-order feedback. Our key innovation is to use conservative gradient estimates to adaptively tailor the degree of caution to local gradient information. Importantly, we show that our algorithms achieve best-possible regret bounds up to logarithmic factors. This is joint work with Swati Gupta and Vijay Kamble.

SD34
CC Room 209B
In Person: Simulation and Reinforcement Learning
General Session
Chair: Ankit Shah

1 - Boosted Nonparametric Hazards with Time-dependent Covariates
Donald Lee, Associate Professor, Emory University, Atlanta, GA, United States, Ningyuan Chen, Hemant Ishwaran
Survival analysis permeates all fields of science, and in operations it manifests itself in the context of reliability analysis and queuing transition rates. This talk introduces a rigorous solution to a central problem in survival analysis: estimating hazard functions nonparametrically in the presence of high-dimensional, time-dependent covariates. This is particularly relevant to healthcare analytics, given the availability of high-frequency data-capture systems embedded within EHRs and wearables. We illustrate the performance of this technique using an open-source implementation called BoXHED.
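For readers wanting a concrete baseline for the nonparametric hazard estimation discussed above: the classical Nelson-Aalen cumulative hazard estimator is a far simpler relative of BoXHED's boosted estimator (no covariates, no time dependence) but conveys the core nonparametric idea. This is a textbook sketch, not the talk's method:

```python
# Nelson-Aalen cumulative hazard for right-censored data: a classical
# nonparametric baseline, far simpler than BoXHED's boosted estimator
# (no covariates, no time-dependence).

def nelson_aalen(times, events):
    """times: observed times; events: 1 if failure, 0 if censored.
    Returns [(t, H(t))] at each distinct failure time,
    where H(t) = sum over failure times t_i <= t of d_i / n_i."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    H, out, i = 0.0, [], 0
    while i < len(order):
        t = times[order[i]]
        deaths = n_at_t = 0
        while i < len(order) and times[order[i]] == t:  # group ties at time t
            deaths += events[order[i]]
            n_at_t += 1
            i += 1
        if deaths:
            H += deaths / at_risk
            out.append((t, H))
        at_risk -= n_at_t
    return out

# Four subjects: failures at 1, 2, 5; one censored at 3.
curve = nelson_aalen([2, 1, 3, 5], [1, 1, 0, 1])
```

Each increment divides the number of failures at a time by the number still at risk just before it, so censored observations shrink the risk set without adding a jump.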
2 - Dynamic Vulnerability Prioritization Using Deep Reinforcement Learning
Soumyadeep Hore, University of South Florida, Tampa, FL, 33613-4728, United States, Ankit Shah
There has been a steep increase in the number of cyber vulnerabilities reported in the National Vulnerability Database. Meanwhile, the vulnerability mitigation strategies employed by cybersecurity operations centers (CSOCs) have remained static and rule-based. In addition, due to the uncertainty in the arrivals of new vulnerabilities and their respective mitigation times, the CSOC is unable to optimally identify and prioritize critical vulnerabilities. There also exists a potential temporal threat associated with each vulnerability instance, which current methods fail to capture. In this talk, we describe a deep reinforcement learning (DRL) approach to triage cyber vulnerabilities, individualized for a CSOC. Results show that the DRL agent can make accurate decisions by training in a simulated environment, which is powered by real-world vulnerability datasets.
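The talk above trains a deep RL agent in a simulated CSOC environment. As a drastically simplified stand-in (states, actions, dynamics, and rewards below are all invented for illustration, and tabular Q-learning replaces the deep network), the same learn-to-triage loop looks like this:

```python
import random

# Toy tabular Q-learning sketch of learning a triage policy -- a stand-in
# for the deep RL agent in the talk. Everything here is hypothetical:
# state 0/1 = low/high-severity vulnerability at the head of the queue,
# action 0/1 = defer / mitigate now.

random.seed(0)
Q = [[0.0, 0.0], [0.0, 0.0]]          # Q[state][action]
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount, exploration

def step(state, action):
    """Invented dynamics: mitigating a high-severity item pays off,
    deferring it incurs a penalty (the unaddressed threat grows)."""
    if state == 1:
        reward = 1.0 if action == 1 else -0.5
    else:
        reward = 0.0
    next_state = random.randint(0, 1)  # next vulnerability arrives at random
    return reward, next_state

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < eps:
        action = random.randint(0, 1)
    else:
        action = max((0, 1), key=lambda a: Q[state][a])
    reward, nxt = step(state, action)
    # standard Q-learning update
    Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
    state = nxt

# After training, the greedy policy mitigates high-severity items immediately.
```

The deep version in the talk replaces the table `Q` with a neural network so the state can carry rich vulnerability features rather than a single severity bit.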
SD32
CC Room 208B
In Person: Platform Operations
General Session
Chair: Pnina Feldman, Boston University, Boston, MA, 02215, United States

1 - Entropy as a Driver of Engagement in Online Discussion Platforms
Joseph Carlstein, University of Pennsylvania, Philadelphia, PA, 19104, United States, Gad Allon, Yonatan Gur
With the rise of remote work and remote learning, it has become increasingly imperative for firms and educators to facilitate discussions in a clear and organized fashion. These discussions can serve many possible objectives, depending on the situation: identifying a correct answer to a question, building consensus, or sparking debate. In this presentation, however, we focus on determining the key drivers of engagement in a group discussion on a closed online platform, and on how the platform can leverage comment-level and discussion-level engagement drivers to design practical recommendation algorithms that direct traffic to different parts of the discussion in order to maximize user engagement.

2 - Managing Customer Search: Assortment Planning for a Subscription Box Service
Fernando Bernstein, Duke University, Durham, NC, 27708-9972, United States, Yuan Guo
We consider subscription box services in which the provider selects assortments of products to match customers' needs and preferences. Customers choose between actively searching stores and subscribing to the box service. We use a cross-nested logit framework to model the impact of the overlap of products between the two channels on customer choice. We find that the box should include a collection of popular subsets of store products for customers experiencing either low or high search costs. We further explore box service strategies regarding exclusive brands and multiple product categories.
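The subscription-box talk above models channel choice with a cross-nested logit. As a much simpler illustration of the logit family it belongs to (this is plain multinomial logit, not the cross-nested variant, and the utilities are invented), choice probabilities follow the familiar softmax form:

```python
import math

# Plain multinomial logit choice probabilities -- a simpler cousin of the
# cross-nested logit in the talk, which additionally lets products belong,
# fractionally, to overlapping channel "nests".

def mnl_probabilities(utilities):
    """P(i) = exp(V_i) / sum_j exp(V_j), with a max-shift for stability."""
    m = max(utilities)
    weights = [math.exp(v - m) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities: subscribe to the box, search the store, or neither.
probs = mnl_probabilities([1.0, 0.5, 0.0])
```

The cross-nested model relaxes multinomial logit's independence-of-irrelevant-alternatives property, which is exactly what is needed when the same product appears in both the box and the store.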
3 - Strategic Choices and Routing within Service Networks: Modeling and Estimation Using Machine Learning
Kenneth Moon, University of Pennsylvania, Philadelphia, PA, 19104-6340, United States
Service networks with open routing by self-interested customers have drawn attention in the theoretical literature. However, these networks, which range from shopping centers to amusement parks, remain challenging to explore empirically. Large-scale trajectory datasets offer new opportunities to understand customer motivations and behaviors but are complex to analyze. We develop structural empirical methods to recover customer demand preferences and congestion sensitivities from diverse trajectory patterns using machine learning. We employ adversarial neural networks to handle the high-dimensional space of (combinatorially many) trajectory types, collapse the dynamics of customer trajectory choices into static trajectory market shares, and derive theoretically efficient incentive-compatibility bounds on customers' preferences.

4 - Contextual Pareto Bandit under Covariate Shift
Apurv Shukla, Columbia University, New York, NY, 10025-1868, United States
We consider the contextual bandit problem under covariate shift and vectorial rewards. We propose a tree-based policy that separately discretizes the action and covariate spaces. For vectorial feedback, we use contextual Pareto regret as the performance metric for the proposed policy. We establish an upper bound on the performance of the proposed policy for multiple models of covariate shift, including single, multiple, and smoothly varying context distributions. Finally, the efficacy of the proposed policy is demonstrated on a suite of numerical experiments.
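Pareto regret, the metric in the abstract above, measures distance to the Pareto front of the arms' mean reward vectors. A minimal sketch of identifying that front (the mean vectors below are invented for illustration):

```python
# Identifying the Pareto front of arms with vector-valued mean rewards.
# Pareto regret, as in the talk, is measured against this front; the
# mean-reward vectors below are invented for illustration.

def dominates(u, v):
    """u dominates v if u >= v coordinatewise and u > v in some coordinate."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(arms):
    """Indices of arms not dominated by any other arm."""
    return [i for i, u in enumerate(arms)
            if not any(dominates(v, u) for j, v in enumerate(arms) if j != i)]

arms = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.8), (0.4, 0.4)]
front = pareto_front(arms)  # → [0, 1, 2]; arm 3 is dominated by arm 1
```

In the contextual version, a separate front exists for each region of the covariate space, which is why the policy discretizes contexts and actions separately.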
SD33
CC Room 209A
In Person: Fairness in Data-Driven Operations
General Session
Chair: Vivek Farias, MIT, Cambridge, MA, 02142-1508, United States

1 - Stateful Offline Contextual Policy Evaluation and Learning
Angela Zhou, Cornell University ORIE, 206 Rhodes Hall, Ithaca, NY, 14853-3801, United States
We study off-policy evaluation and learning from sequential data that arise from repeated interactions with exogenous context arrivals, with unknown individual-level responses to agent actions that induce known transitions. This model is an offline generalization of contextual bandits with resource constraints. We adapt single-timestep doubly-robust estimation to this setting so that a state-dependent policy can be learned even from a single timestep's worth of data. We study uniform convergence for off-policy learning, which can be viewed as a model-based approach in the marginal MDP.
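The single-timestep doubly-robust estimator that the abstract above builds on is standard in contextual bandits: it combines a direct model estimate with an importance-weighted correction, and remains consistent if either component is correct. A sketch with invented data and models:

```python
# Single-timestep doubly-robust off-policy value estimate, the standard
# contextual-bandit form the talk builds on. Data and models are invented:
# each log entry is (x, a, r, mu) with mu = P(a | x) under the logging policy.

def doubly_robust_value(logs, target_prob, q_hat, actions):
    """DR = avg of  E_{a'~target}[q_hat(x, a')] + w * (r - q_hat(x, a)),
    where w = target_prob(a, x) / mu is the importance weight."""
    total = 0.0
    for x, a, r, mu in logs:
        direct = sum(target_prob(ap, x) * q_hat(x, ap) for ap in actions)
        w = target_prob(a, x) / mu
        total += direct + w * (r - q_hat(x, a))
    return total / len(logs)

# Toy check: logging policy is uniform (mu = 0.5), reward equals the action,
# q_hat is exact, and the target policy always plays action 1.
logs = [(0, 0, 0.0, 0.5), (0, 1, 1.0, 0.5)]
value = doubly_robust_value(logs,
                            target_prob=lambda ap, x: 1.0 if ap == 1 else 0.0,
                            q_hat=lambda x, a: float(a),
                            actions=[0, 1])  # → 1.0
```

Because `q_hat` is exact here, the correction term vanishes and the estimate recovers the target policy's true value of 1.0 exactly.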