Hello coders, today we are going to solve Day 9: Multiple Linear Regression, a HackerRank challenge that is part of the 10 Days of Statistics series.
Objective
In this challenge, we practice using multiple linear regression.
Task
Andrea has a simple equation:
Y = a + b1 * f1 + b2 * f2 + . . . + bm * fm
for (m + 1) real constants (a, b1, b2, b3, . . . , bm ). We can say that the value of Y depends on m features. Andrea studies this equation for n different feature sets ( f1, f2, f3, . . . , fm ) and records each respective value of Y. If she has q new feature sets, can you help Andrea find the value of Y for each of the sets?
Note: You are not expected to account for bias and variance trade-offs.
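The task does not say how the constants should be estimated. A common choice, and the one assumed in this sketch, is ordinary least squares. Writing the i-th observed feature set as f_{i1}, ..., f_{im} and stacking the n observations into a matrix X (a notation introduced here, not part of the problem statement), the fit can be written as:

```latex
\[
Y_i = a + \sum_{j=1}^{m} b_j f_{ij}, \qquad i = 1, \dots, n
\]
\[
X =
\begin{pmatrix}
1 & f_{11} & \cdots & f_{1m} \\
\vdots & \vdots & & \vdots \\
1 & f_{n1} & \cdots & f_{nm}
\end{pmatrix},
\qquad
\beta = (a, b_1, \dots, b_m)^{\top},
\qquad
y = (Y_1, \dots, Y_n)^{\top}
\]
\[
\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert^{2}
\quad\Longrightarrow\quad
X^{\top} X \, \hat{\beta} = X^{\top} y
\]
```

The leading column of ones in X is what carries the intercept a. The last equation has a unique solution only when the columns of X are linearly independent.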
Input Format
The first line contains 2 space-separated integers, m (the number of observed features) and n (the number of feature sets Andrea studied), respectively.
Each of the n subsequent lines contains m + 1 space-separated decimals; the first m elements are features ( f1, f2, f3, . . . , fm ), and the last element is the value of Y for the line’s feature set.
The next line contains a single integer, q, denoting the number of feature sets Andrea wants to query for.
Each of the q subsequent lines contains m space-separated decimals describing the feature sets.
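As a rough illustration of the input layout described above, the sketch below parses the whole input from a string rather than from standard input. The function name parse_input and its return shape are hypothetical choices made for this example, not part of the challenge.

```python
def parse_input(text):
    """Split the challenge input into features, targets, and queries.

    Hypothetical helper: assumes the input is well-formed as described
    in the Input Format section.
    """
    lines = [line for line in text.strip().splitlines() if line.strip()]
    m, n = map(int, lines[0].split())            # feature count, row count
    rows = [list(map(float, line.split())) for line in lines[1:1 + n]]
    features = [row[:m] for row in rows]         # first m values per row
    targets = [row[m] for row in rows]           # last value per row is Y
    q = int(lines[1 + n])                        # number of query rows
    queries = [list(map(float, line.split()))
               for line in lines[2 + n:2 + n + q]]
    return m, features, targets, queries


SAMPLE = """2 7
0.18 0.89 109.85
1.0 0.26 155.72
0.92 0.11 137.66
0.07 0.37 76.17
0.85 0.16 139.75
0.99 0.41 162.6
0.87 0.47 151.77
4
0.49 0.18
0.57 0.83
0.56 0.64
0.76 0.18
"""

m, features, targets, queries = parse_input(SAMPLE)
print(m, len(features), len(queries))  # prints: 2 7 4
```

Parsing from a string keeps the example easy to try outside the HackerRank editor; on the platform the same logic would read from input() instead.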
Constraints
- 1 <= m <= 10
- 5 <= n <= 100
- 0 <= xi <= 1
- 0 <= Y <= 10^6
- 1 <= q <= 100
Scoring
For each feature set in one test case, we will compute the following:
- d’i = |Computed value of Y – Expected value of Y| / Expected value of Y
- di = max(d’i – 0.1, 0). We will permit up to a ±10% margin of error.
- si = max(1.0 – di, 0)
The normalized score for each test case will be: S = Σ si / q. If the challenge is worth C points, then your score will be S x C.
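The scoring rule can be sketched in a few lines. This is an illustrative reading of the formulas above, not HackerRank's actual grader; the function name normalized_score and the margin parameter are assumptions made for this example.

```python
def normalized_score(computed, expected, margin=0.1):
    """Return S = (sum of s_i) / q for one test case.

    Illustrative only: assumes every expected value is non-zero,
    since the relative error divides by it.
    """
    if len(computed) != len(expected):
        raise ValueError("computed and expected must have the same length")
    total = 0.0
    for y_hat, y_true in zip(computed, expected):
        rel_err = abs(y_hat - y_true) / y_true   # relative error before the margin
        penalty = max(rel_err - margin, 0.0)     # error beyond the 10% margin
        total += max(1.0 - penalty, 0.0)         # per-query score s_i
    return total / len(expected)


# One answer is 50% off, the other is exact: s = (0.6 + 1.0) / 2
print(normalized_score([150.0, 100.0], [100.0, 100.0]))
```

Any answer within 10% of the expected value earns full credit for that query, which is why the solution does not need to reproduce the sample output digit for digit.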
Output Format
For each of the q feature sets, print the value of Y on a new line (i.e., you must print a total of q lines).
Sample Input
2 7
0.18 0.89 109.85
1.0 0.26 155.72
0.92 0.11 137.66
0.07 0.37 76.17
0.85 0.16 139.75
0.99 0.41 162.6
0.87 0.47 151.77
4
0.49 0.18
0.57 0.83
0.56 0.64
0.76 0.18
Sample Output
105.22
142.68
132.94
129.71
Explanation
We’re given m = 2, so Y = a + b1 * f1 + b2 * f2. We’re also given n = 7, so we determine that Andrea studied the following feature sets:
- a + 0.18 * b1 + 0.89 * b2 = 109.85
- a + 1.0 * b1 + 0.26 * b2 = 155.72
- a + 0.92 * b1 + 0.11 * b2 = 137.66
- a + 0.07 * b1 + 0.37 * b2 = 76.17
- a + 0.85 * b1 + 0.16 * b2 = 139.75
- a + 0.99 * b1 + 0.41 * b2 = 162.6
- a + 0.87 * b1 + 0.47 * b2 = 151.77
We use the information above to find the values of a, b1, and b2. Then, we find the value of Y for each of the q feature sets.
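With seven equations and only three unknowns, the system above generally has no exact solution, so the constants are fitted rather than solved for. The sketch below assumes an ordinary least-squares fit and computes it in plain Python by forming the normal equations and solving them with Gauss-Jordan elimination. This is a stand-in for a library routine, and the names fit_least_squares and predict are hypothetical.

```python
def fit_least_squares(features, targets):
    """Return [a, b1, ..., bm] minimizing the squared error.

    Sketch only: solves the normal equations (X^T X) beta = X^T y.
    """
    rows = [[1.0] + list(f) for f in features]   # leading 1 carries the intercept a
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, targets))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; no unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coeffs, query):
    """Evaluate Y = a + b1 * f1 + ... + bm * fm for one feature set."""
    return coeffs[0] + sum(b * f for b, f in zip(coeffs[1:], query))


features = [[0.18, 0.89], [1.0, 0.26], [0.92, 0.11], [0.07, 0.37],
            [0.85, 0.16], [0.99, 0.41], [0.87, 0.47]]
targets = [109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77]
queries = [[0.49, 0.18], [0.57, 0.83], [0.56, 0.64], [0.76, 0.18]]

coeffs = fit_least_squares(features, targets)
predictions = [predict(coeffs, q) for q in queries]
for value in predictions:
    print(round(value, 2))
```

The printed values should land close to the Sample Output, well inside the 10% scoring margin. Solving the normal equations directly is fine for at most 10 features; for larger or ill-conditioned problems a library least-squares routine is usually the safer choice.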
Solution – Day 9: Multiple Linear Regression
Python
from sklearn import linear_model

first = list(map(int, str.split(input(), " ")))
m, n = first[0], first[1]
data = [list(float(x) for x in input().split()) for i in range(n)]
x = [[item[i] for i in range(m)] for item in data]
y = [item[-1] for item in data]
lm = linear_model.LinearRegression()
lm.fit(x, y)
a = lm.intercept_
b = lm.coef_
for i in range(int(input())):
    data = list(map(float, input().split()))
    ans = [b[j]*data[j] for j in range(m)]
    print(a+sum(ans))
Disclaimer: The above problem (Day 9: Multiple Linear Regression) was created by HackerRank, but the solution is provided by CodingBroz. This tutorial is for educational and learning purposes only.