Talk:Multi-armed bandit/Archives/2016

Whittle and the war

There was a recent request to verify what was/is attributed to Whittle (1979). What he says is:

"As I said the problem is a classic one; it was formulated during the war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage."

So this corresponds to what is said in the article. Melcombe (talk) 10:40, 3 August 2011 (UTC)

I've seen the same story, but with the "weighing pennies" problem as the timewaster. This account seems more plausible, since that problem is simple enough for lots of people to spend time on it, and it's a pure distraction. I'll edit it to attribute the claim to Whittle. JQ (talk) 21:01, 14 May 2012 (UTC)
I accidentally removed this claim because I hadn't read the talk page (oops!) and the reference was incorrect (it pointed to the initial Gittins paper, which clearly did not make the claim). I am going to restore it now. Giuseppe Burtini (talk) 23:37, 11 December 2014 (UTC)
An excerpt from my M.Sc. thesis follows for further discussion on this attribution and the origin of the multi-armed bandit problem:
"Thompson (1933) provides an answer to a related question: how to identify the probability of a distribution being better than all others from a set of distributions, and has thusly been sometimes credited as the origin of the multi-armed bandit.
Even more confounding on the origins of the multi-armed bandit, Dr. Peter Whittle said in review of the 1979 paper of Gittins [67] the following:
'As I said, the problem is a classic one; it was formulated during the war, and efforts to solve it so sapped the energies and minds of Allied analysts that the suggestion was made that the problem be dropped over Germany, as the ultimate instrument of intellectual sabotage. In the event, it seems to have landed on Cardiff Arms Park. And there is justice now, for if a Welsh Rugby pack scrumming down is not a multi-armed bandit, then what is?'
As World War II ended in 1945, this provides evidence that the problem was under discussion at least privately by the military if not elsewhere prior to the Robbins (1952) paper. Robbins (1952) is the first indexed paper to call the problem the multi-armed bandit and provides a formulation similar to the formulation used to date." Giuseppe Burtini (talk) 03:21, 6 August 2015 (UTC)

As noted above, this quote comes from Peter Whittle's review of John Gittins's paper at the Royal Society and can be found at http://www.eecs.berkeley.edu/~russell/classes/cs294/s11/readings/Gittins:1979.pdf on page 165 (page 19 of the PDF file). Pychron (talk) 08:49, 18 March 2016 (UTC)

Markovian Setting

The text states that the Gittins Index is defined for a Markovian setting of the MAB. In fact, the Gittins Index is also defined for non-Markovian settings; it is only its calculation that has so far been devised solely for a Markovian setting (both fully and partially observable). Pychron (talk) 08:49, 18 March 2016 (UTC)

Reward distributions

Reading this page and learning about bandits for the first time. I'm confused about the regret distributions  , as they relate to the regret bounds stated later (for example   and regret bounds stated in terms of   and   in other articles). These bounds don't make sense unless the reward distributions are bounded - I strongly suspect in  . Is that correct? Should this bound be stated in the article? Even the notation   seems to vaguely imply this, but I can't find it stated anywhere. — Preceding unsigned comment added by Emdeefive (talk • contribs) 13:06, 3 May 2016 (UTC)

UCB description

In my opinion, this article should give some basic overview of the UCB method. It mentions it in a couple of generalizations, but doesn't even explain what it is. Dlougach (talk) 13:29, 12 October 2016 (UTC)