In this post, I am looking at a bins and balls variation of the classic problem. The analysis may seem high-school level for some but I feel it is a cute and small problem for my first post. So the problem is this:

You have $ { k}$ balls and $ { n }$ bins. $ { k \leq n }$ . If you throw the balls in bins uniformly at random, how many bins will have at least one ball? In other words, how many bins will remain empty.

Let $ { X }$ be the random variable indicating the number of bins with at least one ball. Consider indicator random variable $ {X_i}$ as follows,

$ \displaystyle X_i = \begin{cases} 1 & \text{if } i^{th} \text{bin has at least one ball} \\ 0 & \text{if } \text{otherwise} \end{cases} $

Therefore, we can express $ {X}$ as the sum of indicator random variables.

$ {X = \sum_{i=1}^{n} X_i}$

Taking expectation, $ { E[X] = \sum_{i=1}^{n} E[X_i] }$, which basically means we now have to find out the the probability that a bin $ {i}$ has at least one ball. This is equal to, 1 - probability that bin i gets no ball in all the $ {k}$ trials which is,

$ {1 - (1- \frac{1}{n} )^k}$

There is an interesting inequality which says that

$ { (1 - \frac{1}{n})^{n-1} \geq e^{-1} }$

Using this inequality, we can claim that the probability that bin $ {i}$ has at least one ball is greater than $ { 1 - e^{ \frac{-k}{n-1} } }$. Using this value in the expectation expression earlier, the expected number of bins with at least one ball is

$ {n( 1 - e^{ \frac{-k}{n-1} } ) }$

Now this wasn’t too difficult right? So we make the problem a bit more interesting and ask, suppose you have $ {n}$ unique marbles in a box and you get to pick $ {k}$ of them uniformly at random with replacement. When you draw these out, you colour each of them red, so that you know they have been drawn at least once. You keep repeating this experiment till all marbles are colored red with high probability. What is the number of times you repeat this experiment?

By the above calculation, we know now that when you pick $ {k}$ marbles in any trial, you are actually picking just $ {n( 1 - e^{ \frac{-k}{n-1} } ) }$ distinct marbles. Now suppose that, there is one marble, lets name it $ {p}$. What is the probability that the $ {p}$ will be coloured red in one trial….that will be greater than,

$ {\frac{n( 1 - e^{ \frac{-k}{n-1} } )}{n}}$

Therefore, the probability of not selecting $ {p}$ in one trial is,

$ {1 - ( 1 - e^{ \frac{-k}{n-1} }) = e^{ \frac{-k}{n-1} }}$

Suppose, you decide to conduct these trials for $ { \frac{(n-1) \log n}{k} }$. Then the probability that $ {p}$ is not selected in any of the trials is

$ { e^{ \frac{-k}{n-1} \cdot \frac{(n-1) \log n}{k}} = \frac{1}{n}}$

So basically, we have proven that the probability that any marble $ {p}$ is not coloured red for $ { \frac{(n-1) \log n}{k} }$ consecutive rounds is extremely small. So there it is, if you perform the trials for $ { O( \frac{n \log n}{k} )}$ rounds, with high probability every node will be coloured red.

This is all for today folks. Please let me know if there is some mistake or any suggestion of doing it in a better way.

Updated: