What recommendation algorithms are optimal in (MAB) settings when they compete, and evaluates the intertemporal welfare effect of competition

Lead Research Organisation: University of Warwick
Department Name: Economics

Abstract

Which consumers gain when recommendation services like Trivago compete? What recommendation strategy should an entrant pick? This research project explores what recommendation algorithms are optimal in multi-armed bandit (MAB) settings when they compete, and evaluates the intertemporal welfare effect of competition.

MABs are a popular tool to model explore-exploit trade-offs - from medical trials to the internet economy. A recommender tries to gather information on payoff distributions of different products by persuading consumers to choose the desired good. When a recommender acts in isolation, for example, Google's ad-revenue-maximization algorithm, the problem is well studied. However, little attention has been given to situations where agents are faced with recommendations from multiple algorithms. In other words, the existing literature has explored the MAB problem when customers choose between products rather than product-recommender pairs.

When algorithms compete for customers, two effects impact consumer utility, (i) for a given set of algorithms, information acquisition is slower as each algorithm observes a strict subset of the available information, and (ii) different algorithms may be chosen in case of no competition. Smarter algorithms will reduce cumulative regret from persistent suboptimal choices but have a lower expected utility for earlier customers due to information acquisition. They are also computationally costlier. All of the above raises interesting questions about the intertemporal distribution of consumer utility.

My research question relates directly to two recent papers. In 'Competing Bandits: Learning Under Competition', Mansour et al. (2018) derive some analytical solutions under very restrictive ad hoc assumptions. Aridor et al. (2019) consider a more natural setting but use simulations to approximate the optimal algorithm in 'The Perils of Exploration under Competition: A Computational Modeling Approach'. A common feature of their models is that customers only receive one recommendation. To model setups like the internet economy, where checking multiple websites is effectively costless, I instead assume that each consumer observes a recommendation from each algorithm and chooses a product thereafter. Competition arises because the consumer only reveals their experience to one algorithm.

I plan to solve for the optimal algorithm analytically. Which algorithm is optimal will depend heavily on what customers know. In the Mansour et al. (2018) setup, the basic greedy algorithm wins, but in Aridor et al. (2019), for long enough horizons, more sophisticated algorithms beat the greedy one. Similarly, assumptions on the information principals can observe are extremely important. The more principals can observe, the greedier I expect the optimal algorithm to be in any non-cooperative equilibrium. I plan to supplement the theoretical findings with simulations where appropriate. The research has important implications for policymakers and regulators of online markets.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
ES/P000711/1 01/10/2017 30/09/2027
2570600 Studentship ES/P000711/1 01/10/2021 31/03/2026 Nick Scholz