10 Days of Statistics
Summary: solutions to the HackerRank "10 Days of Statistics" challenges, one set of problems per day.
Check this blog on github.
- 10 Days of Statistics
- Day 0: Mean, Median, and Mode
- Day 0: Weighted Mean
- Day 1: Quartiles
- Day 1: Interquartile Range
- Day 1: Standard Deviation
- Day 2: Basic Probability
- Day 2: More Dice
- Day 2: Compound Event Probability
- Day 3: Conditional Probability
- Day 3: Cards of the Same Suit
- Day 3: Drawing Marbles
- Day 4: Binomial Distribution I
- Day 4: Binomial Distribution II
- Day 4: Geometric Distribution I
- Day 4: Geometric Distribution II
- Day 5: Poisson Distribution I
- Day 5: Poisson Distribution II
- Day 5: Normal Distribution I
- Day 5: Normal Distribution II
- Day 6: The Central Limit Theorem I
- Day 6: The Central Limit Theorem II
- Day 6: The Central Limit Theorem III
- Day 7: Pearson Correlation Coefficient I
- Day 7: Spearman’s Rank Correlation Coefficient
- Day 8: Least Square Regression Line
- Day 8: Pearson Correlation Coefficient II
- Day 9: Multiple Linear Regression
Day 0: Mean, Median, and Mode
https://www.hackerrank.com/challenges/s10-basic-statistics/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
def getMean(arr):
    len_arr = len(arr)
    acc = 0
    for i in arr:
        acc += i
    result = acc / len_arr
    return result

def getMedian(arr):
    len_arr = len(arr)
    arr = sorted(arr)
    # Use integer division: a float index raises TypeError in Python 3
    if len_arr % 2 == 1:
        return arr[len_arr // 2]
    else:
        return (arr[len_arr // 2 - 1] + arr[len_arr // 2]) / 2

def getMode(arr):
    arr = sorted(arr)
    count = {}
    result = None
    for i in arr:
        if i in count:
            count[i] += 1
        else:
            count[i] = 1
        # Keep the most frequent value; on ties keep the smallest one
        if (result is None) or (count[i] > count[result]):
            result = i
        elif (count[i] == count[result]) and (i < result):
            result = i
    return result

n=int(input())
x=[int(val) for val in input().split()]
print(getMean(x))
print(getMedian(x))
print(getMode(x))
Day 0: Weighted Mean
https://www.hackerrank.com/challenges/s10-weighted-mean/problem?h_r=next-challenge&h_v=zen
def weighted_mean(values, weight):
    numerator = 0
    denominator = 0
    for idx in range(len(values)):
        numerator += values[idx] * weight[idx]
        denominator += weight[idx]
    return numerator / denominator

n=int(input())
x=[int(val) for val in input().split()]
w=[int(weit) for weit in input().split()]
print(round(weighted_mean(x,w),1))
Day 1: Quartiles
https://www.hackerrank.com/challenges/s10-quartiles/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
#Day 1: Quartiles
def getMedian(arr, start_index, end_index):
    len_index = end_index - start_index + 1
    m = start_index + len_index//2
    med = 0
    if (len_index%2) == 0:
        med = (arr[m-1] + arr[m])/2
    else:
        med = arr[m]
    return med

def getQuantile(arr):
    arr = sorted(arr)
    len_arr = len(arr)
    m = len_arr//2
    q2 = getMedian(arr, 0, len_arr-1)
    if (len_arr%2) == 0:
        q1 = getMedian(arr, 0, m-1)
        q3 = getMedian(arr, m, len_arr-1)
    else:
        # Odd length: the middle element belongs to neither half
        q1 = getMedian(arr, 0, m-1)
        q3 = getMedian(arr, m+1, len_arr-1)
    return q1, q2, q3

n = int(input())
x = [int(i) for i in input().split()]
result = map('{:g}'.format, getQuantile(x))
print(*result, sep="\n")
# q1, q2, q3 = getQuantile(x)
# print(q1)
# print(q2)
# print(q3)
Day 1: Interquartile Range
https://www.hackerrank.com/challenges/s10-interquartile-range/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 1: Interquartile Range
def getMedian(arr, start_index, end_index):
    len_index = end_index - start_index + 1
    m = start_index + len_index//2
    med = 0
    if (len_index%2) == 0:
        med = (arr[m-1] + arr[m])/2
    else:
        med = arr[m]
    return med

def getQuantile(arr):
    arr = sorted(arr)
    len_arr = len(arr)
    m = len_arr//2
    q2 = getMedian(arr, 0, len_arr-1)
    if (len_arr%2) == 0:
        q1 = getMedian(arr, 0, m-1)
        q3 = getMedian(arr, m, len_arr-1)
    else:
        q1 = getMedian(arr, 0, m-1)
        q3 = getMedian(arr, m+1, len_arr-1)
    return q1, q2, q3

def interquartile_range(x, f):
    len_x = len(x)
    s = []
    # Expand the data set: repeat each value x[i] exactly f[i] times
    for i in range(len_x):
        x_ele = x[i]
        f_ele = f[i]
        for j in range(f_ele):
            s.append(x_ele)
        # equivalent one-liner: s.extend([x_ele] * f_ele)
    s = sorted(s)
    qt = getQuantile(s)
    return float(qt[2] - qt[0])

n = int(input())
x = [int(i) for i in input().split()]
f = [int(i) for i in input().split()]
print(interquartile_range(x, f))
Day 1: Standard Deviation
https://www.hackerrank.com/challenges/s10-standard-deviation/problem?h_r=next-challenge&h_v=zen
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 1: Standard Deviation
def std_dev(arr):
    len_arr = len(arr)
    mean = sum(arr)/len_arr
    result = 0
    for val in arr:
        result += (val - mean)**2
    # Population standard deviation: divide by N, not N-1
    result = (result/len_arr)**0.5
    return result

n = int(input())
x = [int(token) for token in input().split()]
print(round(std_dev(x), 1))
Day 2: Basic Probability
https://www.hackerrank.com/challenges/s10-mcq-1/problem
Task:
In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that their sum will be at most 9.
Solution:
Count the complement, i.e. the outcomes whose sum is greater than 9:
- sum equal to 10: (4,6), (6,4), (5,5)
- sum equal to 11: (5,6), (6,5)
- sum equal to 12: (6,6)
answer: 1-(3+2+1)/(36) = 30/36 = 5/6
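The count can be double-checked by brute force. The following is a minimal sketch that enumerates all 36 outcomes; the variable names are illustrative and not part of the HackerRank task.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count the outcomes whose sum is at most 9.
favourable = sum(1 for a, b in outcomes if a + b <= 9)

p_at_most_9 = Fraction(favourable, len(outcomes))
print(p_at_most_9)  # 5/6
```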
Day 2: More Dice
https://www.hackerrank.com/challenges/s10-mcq-2/problem
Task:
In a single toss of 2 fair (evenly-weighted) six-sided dice, find the probability that the values rolled by each die will be different and the two dice have a sum of 6.
Solution:
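The outcomes with a sum of 6 and different values are (1,5), (5,1), (2,4), (4,2); (3,3) is excluded because both values are equal.
answer: 4/36 = 1/9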
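The same enumeration idea confirms this answer. This is a small sketch with illustrative names that lists the qualifying pairs explicitly.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Keep only pairs with different values that sum to 6.
favourable_pairs = [(a, b) for a, b in outcomes if a != b and a + b == 6]

p_diff_sum_6 = Fraction(len(favourable_pairs), len(outcomes))
print(favourable_pairs)
print(p_diff_sum_6)  # 1/9
```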
Day 2: Compound Event Probability
https://www.hackerrank.com/challenges/s10-mcq-3/problem
Task:
There are 3 urns labeled X, Y, and Z.
- Urn X contains 4 red balls and 3 black balls.
- Urn Y contains 5 red balls and 4 black balls.
- Urn Z contains 4 red balls and 4 black balls.
One ball is drawn from each of the 3 urns. What is the probability that, of the 3 balls drawn, 2 are red and 1 is black?
Solution:
| Urn | p(red) | p(black) |
|---|---|---|
| X | 4/7 | 3/7 |
| Y | 5/9 | 4/9 |
| Z | 1/2 | 1/2 |
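Exactly one of the three draws is black, which gives three disjoint cases:
XYZ -> red,red,black -> 4/7 * 5/9 * 1/2 = 20/126
XYZ -> red,black,red -> 4/7 * 4/9 * 1/2 = 16/126
XYZ -> black,red,red -> 3/7 * 5/9 * 1/2 = 15/126
The sum is 51/126 = 17/42.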
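As a check on the arithmetic, the three cases can be summed with exact fractions. This is a sketch only; the dictionary `p_red` and the loop structure are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Probability of drawing a red ball from each urn (black is the complement).
p_red = {"X": Fraction(4, 7), "Y": Fraction(5, 9), "Z": Fraction(4, 8)}

total = Fraction(0)
for colours in product(("red", "black"), repeat=3):
    # Only colour assignments with exactly two red balls count.
    if colours.count("red") != 2:
        continue
    prob = Fraction(1)
    for urn, colour in zip("XYZ", colours):
        prob *= p_red[urn] if colour == "red" else 1 - p_red[urn]
    total += prob

print(total)  # 17/42
```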
Day 3: Conditional Probability
https://www.hackerrank.com/challenges/s10-mcq-4/problem
Task:
Suppose a family has 2 children, one of which is a boy. What is the probability that both children are boys?
Solution:
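Of the four equally likely families (BB, BG, GB, GG), three have at least one boy, and only one of those has two boys:
P(BB | at least one boy) = |{BB}| / |{BG, GB, BB}| = 1/3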
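The conditional probability can be verified by listing the sample space. A minimal sketch, assuming each child is independently a boy or a girl with equal probability:

```python
from fractions import Fraction
from itertools import product

# The four equally likely two-child families: BB, BG, GB, GG.
families = list(product("BG", repeat=2))

# Condition on the event "at least one child is a boy".
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

p_both_boys = Fraction(len(both_boys), len(at_least_one_boy))
print(p_both_boys)  # 1/3
```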
Day 3: Cards of the Same Suit
https://www.hackerrank.com/challenges/s10-mcq-5/problem
Task:
You draw 2 cards from a standard 52-card deck without replacing them. What is the probability that both cards are of the same suit?
Solution:
Total ways to choose 2 cards: $C_{52}^{2}$ = C(52,2) = 1326;
ways to choose one of the four suits: C(4,1) = 4;
ways to choose two cards of that suit: C(13,2) = 78;
answer: C(4,1) * C(13,2) / C(52,2) = 312/1326 = 12/51 = 4/17
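The combination counts are easy to confirm with the standard library. This sketch assumes Python 3.8 or later, where `math.comb` is available.

```python
from fractions import Fraction
from math import comb

# Pick one of 4 suits, then 2 of its 13 cards, out of all 2-card hands.
p_same_suit = Fraction(comb(4, 1) * comb(13, 2), comb(52, 2))
print(p_same_suit)  # 4/17, which equals 12/51
```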
Day 3: Drawing Marbles
https://www.hackerrank.com/challenges/s10-mcq-6/problem
Task:
A bag contains 3 red marbles and 4 blue marbles. Then, 2 marbles are drawn from the bag, at random, without replacement. If the first marble drawn is red, what is the probability that the second marble is blue?
Solution:
After the first (red) marble is drawn, 6 marbles remain, 4 of which are blue, so the probability that the second marble is blue is 4/6 = 2/3.
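For completeness, the same result follows from enumerating every ordered pair of draws and conditioning on a red first marble. The names below are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

bag = ["R"] * 3 + ["B"] * 4

# Every ordered pair of distinct marbles (drawn without replacement).
draws = list(permutations(range(len(bag)), 2))

# Condition on the first marble being red, then count blue second draws.
first_red = [(i, j) for i, j in draws if bag[i] == "R"]
second_blue = [(i, j) for i, j in first_red if bag[j] == "B"]

p_blue_given_red = Fraction(len(second_blue), len(first_red))
print(p_blue_given_red)  # 2/3
```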
Day 4: Binomial Distribution I
https://www.hackerrank.com/challenges/s10-binomial-distribution-1/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 4: Binomial Distribution I
def factorial(n):
    if n==0 or n==1:
        return 1
    else:
        return n*factorial(n-1)

def nCr(n,r):
    return factorial(n)/(factorial(r) * factorial(n-r))

n1, n2 = list(map(float, input().split(" ")))
p=n1/(n1+n2)
q=n2/(n1+n2)
sum1=0
# P(at least 3 of 6): sum the binomial terms for i = 3..6
for i in range(3,7):
    temp=nCr(6,i)
    temp1=(p**i)*(q**(6-i))
    sum1+= temp*temp1
print(round(sum1,3))
Day 4: Binomial Distribution II
https://www.hackerrank.com/challenges/s10-binomial-distribution-2/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 4: Binomial Distribution II
def factorial(n):
    if n==0 or n==1:
        return 1
    else:
        return n*factorial(n-1)

def binomial(n,r,p):
    return (factorial(n)/(factorial(r) * factorial(n-r)))*p**r*(1-p)**(n-r)

p, N = list(map(int, input().split(" ")))
p=p/100
# sum1=0
# for i in range(0,3):
#     sum1 += binomial(N,i,p)
# sum2=0
# for i in range(2,N+1):
#     sum2 += binomial(N,i,p)
# print(round(sum1,3))
# print(round(sum2,3))
print(round(sum([binomial(N,r,p) for r in range(0,3)]),3))
print(round(sum([binomial(N,r,p) for r in range(2,N+1)]),3))
Day 4: Geometric Distribution I
https://www.hackerrank.com/challenges/s10-geometric-distribution-1/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 4: Geometric Distribution I
numerator, denominator = list(map(float, input().split(" ")))
n=int(input())
p=numerator/denominator
result = ((1-p)**(n-1))*p
print(round(result, 3))
Day 4: Geometric Distribution II
https://www.hackerrank.com/challenges/s10-geometric-distribution-2/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 4: Geometric Distribution II
numerator, denominator = list(map(float, input().split(" ")))
n=int(input())
p=numerator/denominator
result=0
for i in range(1,n+1):
    result += (1-p)**(i-1)*p
print(round(result, 3))
Day 5: Poisson Distribution I
https://www.hackerrank.com/challenges/s10-poisson-distribution-1/tutorial
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 5: Poisson Distribution I
from math import exp, factorial
lamd = float(input())
x= int(input())
pois=((lamd**x)*exp(-lamd))/factorial(x)
print(round(pois, 3))
Day 5: Poisson Distribution II
https://www.hackerrank.com/challenges/s10-poisson-distribution-2/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 5: Poisson Distribution II
meanA, meanB = list(map(float, input().split(" ")))
Ca = 160 + 40*(meanA + meanA**2)
Cb = 128 + 40*(meanB + meanB**2)
print(round(Ca, 3))
print(round(Cb, 3))
Day 5: Normal Distribution I
https://www.hackerrank.com/challenges/s10-normal-distribution-1/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 5: Normal Distribution I
import math
mean, sd = list(map(float, input().split(" ")))
y = float(input())
y1, y2 = list(map(float, input().split(" ")))
def cumm_norm_prob(mean, sd, x):
    return 0.5*(1 + math.erf((x-mean)/(sd * (2**0.5))))
print(round(cumm_norm_prob(mean, sd, y), 3))
print(round(cumm_norm_prob(mean, sd, y2)-cumm_norm_prob(mean, sd, y1), 3))
Day 5: Normal Distribution II
https://www.hackerrank.com/challenges/s10-normal-distribution-2/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 5: Normal Distribution II
import math
mean, sd = list(map(float, input().split(" ")))
y1 = float(input()) #80
y2 = float(input()) #60
def cumm_norm_prob(mean, sd, x):
    return 0.5*(1 + math.erf((x-mean)/(sd * (2**0.5))))
print(round((1-cumm_norm_prob(mean, sd, y1))*100, 2))
print(round((1-cumm_norm_prob(mean, sd, y2))*100, 2))
print(round(cumm_norm_prob(mean, sd, y2)*100, 2))
Day 6: The Central Limit Theorem I
https://www.hackerrank.com/challenges/s10-the-central-limit-theorem-1/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 6: The Central Limit Theorem I
import math
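# input() returns strings, so convert to float before doing arithmetic
max_weight, n, mean, sd = float(input()), float(input()), float(input()), float(input())

def cumm_norm_prob(max_weight, n, mean, sd):
    # The sum of n i.i.d. values has mean n*mean and standard deviation sqrt(n)*sd
    new_mean = n * mean
    new_sd = n**0.5 * sd
    return round(0.5*(1 + math.erf((max_weight-new_mean)/(new_sd * (2**0.5)))), 4)

print(cumm_norm_prob(max_weight, n, mean, sd))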
Day 6: The Central Limit Theorem II
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 6: The Central Limit Theorem II
import math
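# input() returns strings, so convert to float before doing arithmetic
all_tickets, student, mean, sd = float(input()), float(input()), float(input()), float(input())

def cumm_norm_prob(all_tickets, student, mean, sd):
    # The sum over all students has mean student*mean and standard deviation sqrt(student)*sd
    new_mean = student * mean
    new_sd = student**0.5 * sd
    return round(0.5*(1 + math.erf((all_tickets-new_mean)/(new_sd * (2**0.5)))), 4)

print(cumm_norm_prob(all_tickets, student, mean, sd))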
Day 6: The Central Limit Theorem III
Learn Z-Score: Definition, Formula and Calculation.
Z-score:
$z = \dfrac{x-\mu}{\sigma / \sqrt{n}}$
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 6: The Central Limit Theorem III
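# input() returns strings, so convert to float before doing arithmetic
n, mean, sd, per, z_score = float(input()), float(input()), float(input()), float(input()), float(input())
# Standard error of the sample mean: sigma / sqrt(n)
sd_error = sd / n**0.5
print(round(mean - sd_error * z_score, 2))
print(round(mean + sd_error * z_score, 2))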
Day 7: Pearson Correlation Coefficient I
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 7: Pearson Correlation Coefficient I
n = int(input())
x_arr = [float(i) for i in input().split()]
y_arr = [float(i) for i in input().split()]
def mean(x):
    return sum(x)/len(x)

def sd(x, x_mean):
    data = [(val - x_mean)**2 for val in x]
    return (sum(data) / len(data))**0.5

def pearson_corr(n, x_arr, y_arr):
    x_mean = mean(x_arr)
    x_sd = sd(x_arr, x_mean)
    y_mean = mean(y_arr)
    y_sd = sd(y_arr, y_mean)
    numerator = sum((x_arr[i]-x_mean) * (y_arr[i]-y_mean) for i in range(n))
    denominator = n * x_sd * y_sd
    return round(numerator/denominator, 3)

print(pearson_corr(n, x_arr, y_arr))
Day 7: Spearman’s Rank Correlation Coefficient
https://www.hackerrank.com/challenges/s10-spearman-rank-correlation-coefficient/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 7: Spearman's Rank Correlation Coefficient
n = int(input())
x = [float(i) for i in input().split()]
y = [float(i) for i in input().split()]
def rank(arr):
    # Maps each value to its 1-based rank; assumes the values are unique
    rank_of_arr = {}
    arr_sorted = sorted(arr)
    for i in arr:
        rank_of_arr[i] = arr_sorted.index(i) + 1
    return rank_of_arr

def spearman_rank_cc(n, x, y):
    rank_of_x = rank(x)
    rank_of_y = rank(y)
    d_sqr = []
    for i in range(n):
        temp = (rank_of_x[x[i]] - rank_of_y[y[i]])**2
        d_sqr.append(temp)
    numerator = 6*sum(d_sqr)
    denominator = n*(n**2-1)
    return round(1-numerator/denominator, 3)

print(spearman_rank_cc(n, x, y))
Day 8: Least Square Regression Line
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 8: Least Square Regression Line
maths, stats = [], []
for i in range(5):
    m, s = map(int, input().split())
    maths.append(m)
    stats.append(s)

def LSR(x,y):
    n = len(x)
    xy = [x[i]*y[i] for i in range(n)]
    x_sqr = [i**2 for i in x]
    # Slope b and intercept a of the least squares line y = a + b*x
    numerator = n*sum(xy) - sum(x)*sum(y)
    denominator = n*sum(x_sqr) - sum(x)**2
    b = numerator/denominator
    a = sum(y)/n - b*sum(x)/n
    return a, b

a, b = LSR(maths, stats)
print(round(a+b*80, 3))
Day 8: Pearson Correlation Coefficient II
Task:
The regression line of y on x is 3x+4y+8=0, and the regression line of x on y is 4x+3y+7=0. What is the value of the Pearson correlation coefficient?
Solution:
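Rewrite the two lines in slope-intercept form:
y = -2 + (-3/4) * x
x = -7/4 + (-3/4) * y
The slope of the regression of y on x is $b_1=\rho \cdot \dfrac{\sigma_Y}{\sigma_X}$, and the slope of the regression of x on y is $b_2=\rho \cdot \dfrac{\sigma_X}{\sigma_Y}$.
Solving each for the Pearson correlation coefficient $\rho$:
- $\rho = b_1 \cdot \dfrac{\sigma_X}{\sigma_Y}$
- $\rho = b_2 \cdot \dfrac{\sigma_Y}{\sigma_X}$
Multiplying these two equations cancels the standard deviations:
$\rho^2 = b_1 b_2 = (-3/4)(-3/4) = 9/16$
Both slopes are negative, so $\rho$ must be negative too. The answer is -3/4.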
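The derivation above can be turned into a short numeric check. This is a sketch under the assumption that both regression slopes share the sign of the correlation; `b_yx` and `b_xy` are illustrative names for the two slopes.

```python
from math import sqrt

# Slopes read off the two regression lines:
# 3x + 4y + 8 = 0  ->  y = -2 - (3/4) x
# 4x + 3y + 7 = 0  ->  x = -7/4 - (3/4) y
b_yx = -3 / 4
b_xy = -3 / 4

# rho^2 equals the product of the two slopes.
rho = sqrt(b_yx * b_xy)

# The correlation has the same sign as the slopes.
if b_yx < 0:
    rho = -rho

print(rho)  # -0.75
```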
Day 9: Multiple Linear Regression
https://www.hackerrank.com/challenges/s10-multiple-linear-regression/problem
# Enter your code here. Read input from STDIN. Print output to STDOUT
# Day 9: Multiple Linear Regression
from sklearn.linear_model import LinearRegression
import numpy as np
m, n = map(int, input().split())
x, y = [], []
for i in range(n):
    # map() is lazy in Python 3; list() makes the row sliceable
    xy_val = list(map(float, input().split()))
    x.append(xy_val[:-1])
    y.append(xy_val[-1])
lm = LinearRegression()
lm.fit(x, y)
# q = int(input())
# test = []
# for i in range(q):
#     vals = list(map(float, input().split()))
#     test.append(vals)
# pred = lm.predict(test)
# for i in pred:
#     print(i)
a = lm.intercept_
b = lm.coef_
for _ in range(int(input())):
    # np.float was removed in NumPy 1.24; the built-in float works as a dtype
    f = np.array(input().split(), dtype=float)
    pred = a + np.sum(f * b)
    print(np.ceil(pred * 100) / 100)