Let's do a project on an interesting statistical problem called the birthday paradox. We travel back to the Renaissance and randomly select famous artists, scientists, and mathematicians from that era. In this project, we will work through the mathematics of the birthday paradox in a fun way. Along the way, you will also learn about Renaissance figures and their contributions to humanity.
The objective of this project is to show that in a group of 23 randomly selected people, there is about a 50-50 chance that at least 2 of them share a birthday, and that when we increase the group to 75 people, the chance rises to about 99.9%.
Introduction:
According to Wikipedia, the birthday paradox (also known as the birthday problem) states that in a random group of 23 people, there is about a 50% chance that two people have the same birthday. In a room of 75, there is even a 99.9% chance of two people matching. By the pigeonhole principle, the probability reaches 100% when the number of people reaches 367, since there are only 366 possible birthdays (counting February 29).
Some trivia:
Real-world applications of the birthday problem include a cryptographic attack called the birthday attack, which uses this probabilistic model to reduce the complexity of finding a collision for a hash function. The same model is used to estimate the risk that a hash collision exists among a given number of hashes.
The result has been attributed to Harold Davenport; however, a version of what is considered today to be the birthday problem was proposed earlier by Richard von Mises.
Before starting the project, let's go through some basics.
Let's calculate the probability of 2 people sharing the same birthday.
Suppose we have person X and person Y. For simplicity, we ignore leap years, so there are 365 possible birthdays.
Person X can be born on any day of the year, so the probability for X is 1 (365/365). Person Y must then be born on the same day as X, which has a probability of 1/365. Since we want both events to occur, we multiply their probabilities.
P(2 people share the same birthday) = (365/365)*(1/365) = 1/365 ≈ 0.00274
For larger groups, counting every way a match can happen is tedious, so we use the complement rule and calculate the probability that no one shares a birthday instead. "At least 2 people share a birthday" and "no one shares a birthday" are complementary events: exactly one of them must happen, so together they cover all possible cases.
Hence these 2 probabilities sum to 1, and we can write:
P(at least 2 people share a birthday) = 1 - P(no one shares a birthday)
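The complement rule takes only a few lines of code. Below is a minimal Python sketch; the language and the function names `p_no_shared` and `p_shared` are our own choices, not part of the project. Like the text, it assumes 365 equally likely birthdays. It computes P(no one shares) as a product: each new person must avoid the birthdays already taken.

```python
from math import prod

DAYS = 365  # leap years ignored, as in the text


def p_no_shared(n: int, days: int = DAYS) -> float:
    """Probability that n people all have different birthdays."""
    if n > days:
        return 0.0  # pigeonhole principle: a match is guaranteed
    # Person k (counting from 0) must avoid the k birthdays already taken.
    return prod((days - k) / days for k in range(n))


def p_shared(n: int, days: int = DAYS) -> float:
    """Complement rule: P(at least 2 share) = 1 - P(no one shares)."""
    return 1.0 - p_no_shared(n, days)


print(f"{p_shared(2):.5f}")   # 0.00274, the 2-person result above
print(f"{p_shared(23):.3f}")  # 0.507
```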
Requirement:
Notepad
Pen
Graph Paper
Computer with Internet Access
Steps:
1. Create a dataset, i.e. a simple spreadsheet or a CSV file. Use Google to research scientists, artists, and mathematicians from the Renaissance period, and note down your favorite people in the format below.
Artist Name, Birthday
Leonardo da Vinci, Apr 15
Nicolaus Copernicus, Feb 19
and so on.
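If you keep the dataset as a CSV file, a short script can read it and turn each birthday into a (month, day) pair that is easy to compare. The Python sketch below is one way to do this. The sample text is embedded so the example runs on its own; for your real file you would pass something like `open("renaissance.csv")` instead. That file name and the helper `load_birthdays` are hypothetical.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the "Artist Name, Birthday" format from step 1.
SAMPLE_CSV = """Artist Name, Birthday
Leonardo da Vinci, Apr 15
Nicolaus Copernicus, Feb 19
"""


def load_birthdays(f):
    """Return a list of (name, (month, day)) tuples from a CSV file object."""
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip the header row
    people = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        name, birthday = row[0].strip(), row[1].strip()
        # Append a leap year so "%b %d %Y" also accepts "Feb 29".
        date = datetime.strptime(f"{birthday} 2000", "%b %d %Y")
        people.append((name, (date.month, date.day)))
    return people


people = load_birthdays(io.StringIO(SAMPLE_CSV))
print(people)  # [('Leonardo da Vinci', (4, 15)), ('Nicolaus Copernicus', (2, 19))]
```

Storing birthdays as (month, day) tuples drops the year, which is exactly what the birthday problem compares.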
2. Let's work through an example with 30 artists: randomly select 30 artists along with their birthdays.
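3. Calculate the probability that all 30 artists have different birthdays. Take the artists one at a time; each one must avoid the birthdays already taken by the artists before them:
P (1st artist has a new birthday) = 365/365
P (2nd artist has a new birthday) = 364/365
P (3rd artist has a new birthday) = 363/365
P (4th artist has a new birthday) = 362/365
.
.
P (28th artist has a new birthday) = 338/365
P (29th artist has a new birthday) = 337/365
P (30th artist has a new birthday) = 336/365
Multiplying these together:
P (no one shares the same birthday) = (365/365)(364/365)(363/365)(362/365)......(336/365)
Since 365 * 364 * ... * 336 = 365!/335!, we can rewrite the above product using factorials:
P (no one shares the same birthday) = 365! / (335! * 365^30) ≈ 0.29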
4. Use the complement rule to get the probability that at least 2 of the 30 artists share a birthday:
P (at least 2 artists share the same birthday) = 1 - 0.29 = 0.71
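To check the arithmetic for 30 artists, the Python sketch below computes the no-match probability twice: once as the step-by-step product and once with the factorial formula. It then applies the complement rule. Python's integers are exact, so `factorial(365) // factorial(335)` is computed without overflow before dividing by 365^30.

```python
from math import factorial

n, days = 30, 365

# Product form: (365/365)(364/365)...(336/365)
p_product = 1.0
for k in range(n):
    p_product *= (days - k) / days

# Factorial form: 365! / (335! * 365^30)
p_factorial = factorial(days) // factorial(days - n) / days**n

print(round(p_product, 4), round(p_factorial, 4))  # 0.2937 0.2937
print(round(1 - p_product, 4))                     # 0.7063
```

Both forms agree, and rounding to two decimals gives the 0.29 and 0.71 used above.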
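5. Repeat the procedure for at least 5-10 sets of 23 randomly selected artists. The formula gives the same value (about 0.51) for every set of 23, so for each set also note down whether it actually contains a shared birthday. The fraction of sets with a match is your experimental estimate of the probability.
6. Repeat the procedure for at least 5-10 sets of 75 randomly selected artists, again noting down the probability and whether each set contains a match.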
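With only 5-10 sets, the fraction of sets that contain a match will vary a lot from one try to the next. The Python sketch below repeats the experiment many times, so you can see the fraction settle near the formula's value. It uses randomly generated birthdays as a stand-in for real artists. This assumes birthdays are uniform across 365 days, which real birthdays only approximate. The helper names are hypothetical. To use your own data, pass the birthdays of each randomly chosen set to `has_shared_birthday`.

```python
import random


def has_shared_birthday(birthdays):
    """True if any birthday appears more than once in the group."""
    return len(set(birthdays)) < len(birthdays)


def estimate(group_size, trials=10_000, days=365, seed=0):
    """Fraction of simulated groups that contain at least one shared birthday.

    Assumption: each birthday is drawn uniformly at random from `days` days,
    as a stand-in for randomly selected Renaissance figures.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    hits = 0
    for _ in range(trials):
        group = [rng.randrange(days) for _ in range(group_size)]
        if has_shared_birthday(group):
            hits += 1
    return hits / trials


for n in (23, 75):
    print(n, estimate(n))
```

Expect roughly 0.5 for groups of 23 and a value very close to 1 for groups of 75. The exact digits depend on the seed.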
7. Repeat the procedure and find the probabilities for the following sample sizes:
0 Renaissance figures
10 Renaissance figures
20 Renaissance figures
30 Renaissance figures and so on, up to 120 Renaissance figures
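To get theoretical values to compare with your results for these sample sizes, the Python sketch below applies the complement rule to groups of 0, 10, 20, ..., 120. It assumes 365 equally likely birthdays.

```python
from math import prod


def p_shared(n, days=365):
    """P(at least two of n people share a birthday), via the complement rule."""
    if n > days:
        return 1.0  # pigeonhole principle
    return 1.0 - prod((days - k) / days for k in range(n))


for n in range(0, 121, 10):
    print(f"{n:>3} figures: {p_shared(n):.4f}")
```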
8. Compute the probability of at least two people sharing a birthday versus the number of people, and plot this probability against group size for up to 120 Renaissance figures. Take a look at this graph example. You will find that after about 75 people the curve flattens out: the probability keeps increasing, but it is already above 99.9%, so further gains are tiny.
Image from Wikipedia
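The points for this graph can be computed rather than worked out by hand. The Python sketch below builds a hypothetical `curve` list of (group size, probability) pairs for 0 to 120 people, assuming 365 equally likely birthdays. You can plot these points on graph paper or with a plotting library such as matplotlib. The sketch also finds where the curve first reaches 50% and shows that it is still rising, slightly, between 75 and 120 people.

```python
from math import prod


def p_shared(n, days=365):
    """P(at least two of n people share a birthday), via the complement rule."""
    if n > days:
        return 1.0  # pigeonhole principle
    return 1.0 - prod((days - k) / days for k in range(n))


# One (group size, probability) point per group size from 0 to 120.
curve = [(n, p_shared(n)) for n in range(121)]


def first_n_reaching(target):
    """Smallest group size in `curve` whose probability is at least `target`."""
    return next(n for n, p in curve if p >= target)


print(first_n_reaching(0.5))  # 23
# The curve keeps rising after 75 people, just by very small amounts.
print(f"p(75)  = {curve[75][1]:.6f}")
print(f"p(120) = {curve[120][1]:.9f}")
```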
9. Create a detailed analysis report. Make it interesting by adding context about the Renaissance figures and their inventions and discoveries.
10. Find the pairs of figures who share a birthday, and share them with us, along with any interesting facts about them.
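Finding the pairs is a grouping problem: collect names by birthday, then list every pair within each group. The Python sketch below does this. The placeholder names and dates are hypothetical, so substitute your own dataset of (name, (month, day)) entries. If three people share a day, they produce three pairs.

```python
from collections import defaultdict
from itertools import combinations


def shared_birthday_pairs(people):
    """Return every pair of names that share a (month, day) birthday."""
    by_day = defaultdict(list)
    for name, birthday in people:
        by_day[birthday].append(name)
    pairs = []
    for names in by_day.values():
        pairs.extend(combinations(names, 2))  # 3 people on one day -> 3 pairs
    return pairs


# Hypothetical placeholder data, not real birthdays.
sample = [("Figure A", (4, 15)), ("Figure B", (2, 19)), ("Figure C", (4, 15))]
print(shared_birthday_pairs(sample))  # [('Figure A', 'Figure C')]
```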