Gastronomic lessons from a two-armed bandit

The two-armed bandit is a nice theoretical framework for a large class of problems we encounter in everyday life. You face a strange slot machine with two (or more) arms. You can play for free by pulling a lever... you know each lever will pay you a random amount following a fixed - but unknown - distribution. You start pulling levers, but you realize that unfortunately, you only have an hour to squeeze the most money out of this machine... how should you play?

The exact answer depends on your prior on the payoff distribution of each arm, on your risk aversion, and on your time preference. Finding the optimal solution is rarely doable; however, most solutions follow the same pattern... you start by pulling both arms alternately - to gather information about the distribution of each arm - then you start pulling mostly the arm that provides you with the best risk/reward return.

Unfortunately, your time is scarce... you have to allocate your hour between two tasks: information gathering and money making. This is called the exploration / exploitation trade-off.
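
To make the pattern concrete, here is a minimal Python sketch of one very simple strategy, usually called "explore-then-commit": pull the arms in turn for a fixed number of rounds, then spend every remaining pull on the arm with the best observed average. It is not the optimal solution mentioned above, and it is stricter than "pulling mostly" the best arm, since it never looks at the other arm again. The Gaussian payoffs, the function names and the budget of 1,000 pulls are all assumptions made up for illustration.

```python
import random


def gaussian_arm(mean, stddev):
    """Hypothetical arm: pays a normally distributed amount (an assumption)."""
    return lambda rng: rng.gauss(mean, stddev)


def explore_then_commit(arms, total_pulls, explore_per_arm, rng):
    """Sample every arm a fixed number of times, then commit to the best one.

    Returns the index of the arm chosen for exploitation and total winnings.
    """
    if explore_per_arm < 1:
        raise ValueError("each arm must be sampled at least once")
    n_explore = explore_per_arm * len(arms)
    if n_explore > total_pulls:
        raise ValueError("exploration budget exceeds total pulls")

    totals = [0.0] * len(arms)
    # Exploration: pull the arms in turn to sample each payoff distribution.
    for pull in range(n_explore):
        i = pull % len(arms)
        totals[i] += arms[i](rng)

    # Exploitation: commit to the arm with the best observed average payoff.
    best = max(range(len(arms)), key=lambda i: totals[i] / explore_per_arm)
    winnings = sum(totals)
    for _ in range(total_pulls - n_explore):
        winnings += arms[best](rng)
    return best, winnings


rng = random.Random(0)  # fixed seed so a run can be repeated
arms = [gaussian_arm(1.0, 1.0), gaussian_arm(1.5, 1.0)]
best, winnings = explore_then_commit(arms, total_pulls=1000, explore_per_arm=20, rng=rng)
print("committed to arm", best, "and won", round(winnings, 2))
```

Raising `explore_per_arm` makes it less likely that you commit to the wrong arm, but every exploratory pull on the worse arm costs money you will not get back, which is the trade-off in a nutshell.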

This trade-off is very common.

  • Should patients be given new experimental treatments or more mature treatments?
  • What should you study in college?
  • Should you stay with your girlfriend?
  • Should you quit your job?

All of these involve the decision to forgo a known payoff to discover a potentially greater unknown payoff.
I was recently surprised to realize I was playing this game very poorly when it comes to picking a restaurant or picking a dish on a menu. There are easily more than 10,000 restaurants in Manhattan, yet I found myself going to the same places over and over to order the same dishes over and over. I am not the only one. I observed many people displaying this behavior. The way people pick restaurants is generally reminiscent of a meta-heuristic called stochastic diffusion search. Everyone picks a few restaurants at random and from there discovers new restaurants when they are invited by friends who made a different initial pick. This method doesn't work so well since people tend to have lunch/dinner with the same other people.

Generally speaking, the dish-picking or restaurant-picking behavior of most people seems to imply a ridiculously high risk aversion (I want a guaranteed lunch experience) or time preference (the benefits of discovering a better restaurant will mostly extend in the future where I will make more informed choices).

Why is this? I think we have a strong conservative bias when it comes to food. Most food is marketed as "old-fashioned", using "traditional recipes". Menus from Chinese restaurants feature pagodas, not the Shanghai skyline, and menus from pizzerias often come with Renaissance illustrations. You don't see Intel marketing its CPUs as made in the time-tested tradition of silicon wafer artisans.

One possible reason is that food conservatism used to be required for survival. Maybe the red berries are slightly tastier than the black berries; maybe they'll kill me... I think I'll stick with the black berries. This form of conservatism is still alive today, when people favor organic food for example. That may or may not be a rational thing to do; however, when it extends to picking a specific dish on a menu, I'm pretty sure it's an undesirable bias.

Picking a dish is always a difficult experience for me. I am tempted by many dishes, but I always fear I will make a wrong decision and forgo the opportunity to have the most delicious dish. Of course, I could always go back to the restaurant, and I often do, but every time feels like it is the last opportunity for me to have the most likely best dish on the menu.

In the multi-armed bandit setting, it means I am always favoring exploitation over exploration. I recently decided to strongly favor exploration. I decided to pick restaurants solely based on customer ratings, not on previous experience. Should I go back to a restaurant I knew, I committed to always try a dish I didn't try before. While I had a few disappointments, I did discover that many of my previous restaurant choices and dish choices were sub-optimal. I have experienced a lot of new restaurants, and very often have I had the feeling "this place is great, I should come back here!", only to realize this is the kind of thinking that led me to avoid the place in the first place. So whenever I like a place, I make a commitment not to come back there for some time.

Can you think of activities where you strongly favor exploitation vs. exploration or the opposite?

Great post. You should send it to Tyler Cowen; it's right up his alley as an economist and adventurous food critic.

I too suffer from the two-armed bandit culinary problem. I'm not sure if it's rational behavior, but it stems from my past experience of only liking one or two dishes on a given menu, and being disappointed by the rest. I'm willing to try new things, but when I'm really hungry and the restaurant is expensive, I don't feel like taking the risk. I wish I could be more adventurous, because my usual options are really limited.

Great entry

I learned something. On reflection, I think there is more going on. I am familiar with many restaurants in my area, and I am aware that some of them have better food than others, but despite my knowledge I choose to go to nearby convenient restaurants. This is not a case of exploitation versus exploration because I've already explored the restaurants that I am rejecting in favor of the nearby convenient restaurant. It may be force of habit. It may be a decision to minimize the time and money and mental focus I spend on food.

Oh, and to answer your question, one area where I favor exploration over exploitation is when playing World of Warcraft. I can't seem to settle on playing a single class or playing on a single server. I am a chronic reroller.

Sex. Porn.

I think men favor exploration over exploitation. I wonder if this comment is going to get caught in the filter. [edit - guess not]

We have a filter?

hell yeah we do

Blogs cannot survive in this day without spam filters.

Yes, or at least did

The filter somehow assesses my mood, possibly by textual analysis. If on a certain day I'm particularly eager to have my comment show up quickly for whatever reason (maybe a momentarily inflated self-assessment), the filter catches my comment and displays the message that my comment has been held for questioning and will be released when the gods of eleutheria are good and ready, possibly as early as tomorrow if they're feeling particularly indulgent.

I also agree that this was a great post...

...and with the following ideas:

  • Arthur should post more often.
  • Whoever inserted that picture must be a sexy mofo.
  • Sex, from the male point of view, seems like an arena in which exploration is preferred over exploitation.

I'm also an exploiter rather than explorer when it comes to food. I'm too paranoid about not getting my money's worth and go for the sure thing.

That stochastic diffusion method is exactly how my friends and I learned about the restaurants in the small city in which I attended med school. Initially, we tried the restaurants independently. As we became friends, we'd introduce each other to the restaurants we knew and liked. Over 4 years, we had tried every place in the city.

Eventually, we got sick of the food there. We'd start "banning" places for various reasons. For example, I got food poisoning at one of the Chinese restaurants, so I banned it. When the friends got together, it was excluded from the list of potential places to eat. Another friend had a horrible service experience at the Chili's in town. So when the time came to pick where to eat and someone suggested Chili's, "Yeah, but Tim banned it" was the response. This seems like stochastic diffusion in reverse. Eventually we were thoroughly disgusted with just about every place there. Luckily we soon graduated and moved.

Second row, fourth column of that graphic... orange juice and a deli sandwich?