Guess the M&Ms: a data analysis

by John Browne, on Nov 25, 2019 10:55:18 AM

Microsoft Ignite is Microsoft's largest event by attendance, and we were there with a big jar of Peanut M&Ms®.



You can see it here a little better:



And you can also see that Keri (left) is actually paying attention and DeeDee (right) is looking extremely happy. Why?

Because M&Ms.

The candy was there for a contest: guess the number of peanut M&Ms in the jar to win a kick-ass 49-inch (124.46 cm), 4K, 120 Hz gaming monitor. The closest guess to the actual number without going over wins. Guess the number of red M&Ms for a shot at a drone.
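Under a "closest without going over" rule, an overshoot of 1 loses to an undershoot of 100. Here is a minimal Python sketch of picking a winner that way, assuming the slips have been transcribed into (name, guess) pairs. `pick_winners` is a hypothetical helper, not something we actually used (we read paper slips), and returning every tied entrant is an assumption about how ties are handled.

```python
def pick_winners(guesses: list[tuple[str, int]], actual: int) -> list[str]:
    """Return every entrant whose guess is closest to `actual` without going over.

    Guesses above `actual` are disqualified. If several entrants share the
    best eligible guess, all of them are returned (an assumed tie rule).
    """
    eligible = [(name, guess) for name, guess in guesses if guess <= actual]
    if not eligible:
        return []  # everyone went over, so nobody wins
    # The closest guess that does not exceed `actual` is the largest eligible one.
    best = max(guess for _, guess in eligible)
    return [name for name, guess in eligible if guess == best]


# Andy's 1257 against the actual 1260; the other two entrants are made up.
print(pick_winners([("Andy", 1257), ("Entrant A", 835), ("Entrant B", 1300)], 1260))
# prints ['Andy']
```

Entrant B is closer in absolute terms than Entrant A but is disqualified for going over.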

It was a good contest, and a good idea, with one potentially fatal flaw:

DeeDee Walsh (aka @ddskier on Twitter).

To say DeeDee has a thing for candy is like saying Saudi Arabia has a thing for oil. Imelda Marcos and shoes. Me and...well, let's not get into that, ok?

The fact is, once that jar of M&Ms appeared, DeeDee spent the next 4 days contemplating how to get into it to eat the M&Ms. I mean, look here:



Zoomed in:



That is the face of someone trying their best not to look at many, many peanut M&Ms.



Dangerous, amiright?

So it was a big event.



And that's just the line for the men's room.

Lots of people stopped by to guess the number of M&Ms.






People wrote their guesses (total M&Ms and just red M&Ms) on slips of paper and put them in a fishbowl.

Then the counting began:



We sorted them by color first, then counted the total of each color.



The counting game

Throughout the process we had to restrain DeeDee from eating them before they could be counted. I won't equate this with getting Neil Armstrong onto the lunar surface, but it wasn't an easy mission either. Here's what the color distribution looked like:


Remember, our second-place prize (a drone!) was for guessing the number of red M&Ms alone. You can see from our sample that the colors are not evenly distributed.* (NB: getting that 20% wedge to be green, the 17% wedge to be yellow, and so on was a lot harder than it should have been. Shame on you, Excel.)

Well, at the end of the event we had 564 readable entries. Before we get to the total M&Ms, let's look at how many red M&Ms there were. Being a nerd, I looked at a box of contest entries and thought, "Ooh! A data set!" So of course I had to do some analysis.

Looking above at the jar of M&Ms, I think it's reasonable to assume most entrants started with their best guess of the total number of candies, then took a SWAG at the number of red ones by assuming some kind of ratio. If we take all 564 entries, express each red guess as a percentage of that entry's total guess, and plot the distribution as a histogram, here's what our entrants thought:

Image: histogram of red guesses as a percentage of total guesses
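Computing each entry's red-to-total ratio and bucketing it takes only a few lines. This is a minimal Python sketch, assuming each slip has been transcribed into a (total, red) pair; `red_share_histogram` and the 1-point bin width are hypothetical choices, and this is not necessarily how the charts in this post were made.

```python
from collections import Counter


def red_share_histogram(entries: list[tuple[int, int]], bin_width: float = 1.0) -> Counter:
    """Bucket each entry's red guess as a percentage of its total guess.

    `entries` holds (total_guess, red_guess) pairs. Entries with a total of
    zero or less are skipped, since the ratio is undefined for them.
    Returns a Counter mapping each bin's lower edge (in percent) to a count.
    """
    bins: Counter = Counter()
    for total, red in entries:
        if total <= 0:
            continue
        pct = 100.0 * red / total
        bins[(pct // bin_width) * bin_width] += 1
    return bins


# Andy's guess (127 red of 1257) plus the real jar (102 red of 1260) as a reference point.
print(red_share_histogram([(1257, 127), (1260, 102)]))
```

Andy's guess lands in the 10% bin and the real jar in the 8% bin.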

Now, since I already showed you that the actual ratio is 8%, it looks like the majority of the guesses are pretty good: they fall between 5% and 10% red/total. Drilling down on just that region shows this:

Image: the same histogram, zoomed in

Discarding guesses below 4% and above 10%, you can see that, at least in this case, the wisdom of the crowd was a little off: the most common guess was between 9% and 10%.
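Finding the most common bin inside a window is a filter followed by a count. A minimal Python sketch, with made-up percentages; `modal_bin` is a hypothetical name, and treating the window as half-open, [4%, 10%), is an assumption that keeps every bin exactly one point wide (it drops a guess of exactly 10%).

```python
from collections import Counter


def modal_bin(percentages: list[float], low: float = 4.0, high: float = 10.0,
              width: float = 1.0) -> tuple[float, int]:
    """Return (lower edge, count) of the most common `width`-wide bin.

    Only values in the half-open window [low, high) are kept; everything
    else is discarded before binning.
    """
    kept = [p for p in percentages if low <= p < high]
    if not kept:
        raise ValueError("no values fall inside the window")
    counts = Counter((p // width) * width for p in kept)
    edge, count = counts.most_common(1)[0]
    return edge, count


# Made-up red/total percentages, for illustration only.
print(modal_bin([9.2, 9.8, 8.1, 5.0, 12.0, 2.5]))  # prints (9.0, 2)
```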

Crowdsourcing

Speaking of crowd wisdom, let me tell you about Colin, one of our entrants (sorry, man, you didn't win). Colin bet me that the crowdsourced estimate would be closer to the actual number than the winning guess. His idea was that if we averaged all the guesses, the result would be closer to the truth than any single individual's guess. So I agreed to check it out. Let's see what we find, shall we?

Here's a histogram of the guesses for the total count of candies in the jar:

Image: histogram of the guesses for the total count of candies in the jar

I capped this at 3,000 because we had some wild guesses in excess of 12,000, and those data points just clutter up the histogram. You can see two prominent spikes at around 500 and 700, and a slightly smaller one at around 800. Then it tapers off fairly consistently, with another outlying spike at 1,900. Let's dig into the data from, say, 300 to 1,500:

Image: distribution of guesses between 300 and 1,500 M&Ms
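The capping described above amounts to dropping outliers before binning. A minimal Python sketch, assuming 100-wide bins (the post doesn't state the bin width) and made-up guesses; `capped_histogram` is a hypothetical helper.

```python
from collections import Counter


def capped_histogram(guesses: list[int], cap: int = 3000,
                     width: int = 100) -> tuple[Counter, int]:
    """Bin guesses into `width`-wide buckets, ignoring any guess above `cap`.

    Returns (counts keyed by each bucket's lower edge, number of guesses dropped).
    """
    kept = [g for g in guesses if g <= cap]
    counts = Counter((g // width) * width for g in kept)
    return counts, len(guesses) - len(kept)


# Made-up guesses; 12500 stands in for the wild ones and gets dropped.
counts, dropped = capped_histogram([500, 520, 700, 810, 1900, 12500])
print(sorted(counts.items()), dropped)
# prints [(500, 2), (700, 1), (800, 1), (1900, 1)] 1
```

Returning the dropped count alongside the bins makes it easy to report how many outliers the cap hid.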

Filtering our dataset on that range yields 353 guesses between 300 and 1,500, with an average of 835. Here's an easier look:

Image: distribution of guesses between 300 and 1,500 M&Ms
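The filter-then-average step looks like this in Python. It's a minimal sketch: `windowed_mean` is a hypothetical name, inclusive bounds are an assumption, and the sample guesses are made up.

```python
def windowed_mean(guesses: list[int], low: int = 300, high: int = 1500) -> tuple[int, float]:
    """Return (count, mean) of the guesses that fall within [low, high], inclusive."""
    kept = [g for g in guesses if low <= g <= high]
    if not kept:
        raise ValueError("no guesses fall inside the window")
    return len(kept), sum(kept) / len(kept)


# Made-up guesses: 250 and 12000 fall outside the window and are ignored.
n, avg = windowed_mean([250, 500, 700, 1300, 12000])
print(n, round(avg, 1))  # prints 3 833.3
```

Trimming the window first keeps a single 12,000 guess from dragging the average far upward.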

If the crowdsourced average is the most accurate, then we'd expect the actual count of all the M&Ms to be pretty close to 835, right?

Nope. The actual count was 1,260. Not only was the crowdsourced guess wrong, it wasn't even close.

But somebody was. Andy Mikkalson guessed 1,257 and beat out a couple of very close contenders to score the 49-inch gaming monitor. His red M&M guess was pretty close at 127, but Kim White and John Sophy nailed the exact number, 102, so each will be getting a drone.
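Colin's bet comes down to comparing two absolute errors. Here is a quick Python check using the numbers in this post (the filtered crowd average of 835, Andy's winning 1,257, and the actual 1,260); `settle_bet` is just an illustrative name.

```python
def settle_bet(crowd_average: float, winning_guess: int, actual: int) -> tuple[float, float]:
    """Return the absolute errors (crowd, winner) relative to the actual count."""
    return abs(actual - crowd_average), abs(actual - winning_guess)


crowd_error, winner_error = settle_bet(835, 1257, 1260)
print(crowd_error, winner_error)  # prints 425 3
print("Colin wins" if crowd_error < winner_error else "Colin loses")  # prints Colin loses
```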

Congratulations to our three winners, and come see us at next year's Ignite for another chance at something (TBD)!


*Naturally, one Googles "color distribution in peanut M&Ms" and gets both graphs and articles. This is an interesting one; it appears our samples were manufactured in Tennessee, although our percentages don't match exactly.






Subscribe to Mobilize.Net Blog