Day 9: Multiple Linear Regression | 10 Days Of Statistics | HackerRank Solution

Hello coders, today we are going to solve the Day 9: Multiple Linear Regression HackerRank problem, which is part of the 10 Days Of Statistics series.

Day 9: Multiple Linear Regression

Objective

In this challenge, we practice using multiple linear regression.

Task

Andrea has a simple equation:

Y = a + b1 * f1 + b2 * f2 + ... + bm * fm

for (m + 1) real constants (a, b1, b2, ..., bm). We can say that the value of Y depends on m features. Andrea studies this equation for n different feature sets (f1, f2, ..., fm) and records each respective value of Y. If she has q new feature sets, can you help Andrea find the value of Y for each of the sets?

Note: You are not expected to account for bias and variance trade-offs.
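To make the model concrete, here is a tiny illustrative snippet. The coefficient values and feature values below are made up for the example and are not part of the problem; it just shows that Y is the intercept plus a weighted sum of the features.

# Illustrative only: evaluate Y = a + b1*f1 + ... + bm*fm
# for made-up coefficients and a made-up feature set.
def predict(a, b, features):
    return a + sum(bi * fi for bi, fi in zip(b, features))

# Example with a = 1.5, b = [2.0, 3.0], features = [0.4, 0.6]:
# Y = 1.5 + 2.0*0.4 + 3.0*0.6 = 4.1
print(predict(1.5, [2.0, 3.0], [0.4, 0.6]))  # 4.1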

Input Format

The first line contains 2 space-separated integers, m (the number of observed features) and n (the number of feature sets Andrea studied), respectively.
Each of the n subsequent lines contains m + 1 space-separated decimals; the first m elements are the features (f1, f2, ..., fm), and the last element is the value of Y for the line’s feature set.
The next line contains a single integer, q, denoting the number of feature sets Andrea wants to query for.
Each of the q subsequent lines contains m space-separated decimals describing a feature set.
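Before fitting anything, the input has to be read in exactly this shape. A minimal parsing sketch (reading from standard input, as the final solution does; the variable names are ours) could look like this:

# Minimal input-parsing sketch for the format described above.
m, n = map(int, input().split())

X, Y = [], []
for _ in range(n):
    row = list(map(float, input().split()))
    X.append(row[:m])   # the m feature values
    Y.append(row[-1])   # the observed value of Y

q = int(input())
queries = [list(map(float, input().split())) for _ in range(q)]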

Constraints

  • 1 <= m <= 10
  • 5 <= n <= 100
  • 0 <= xi <= 1
  • 0 <= Y <= 10^6
  • 1 <= q <= 100

Scoring

For each feature set in one test case, we will compute the following:

  • d’i = |Computed value of Y – Expected value of Y| / Expected value of Y
  • di = max(d’i – 0.1, 0). We will permit up to a 10% margin of error.
  • si = max(1.0 – di, 0)

The normalized score for each test case will be: S = Σ si / q. If the challenge is worth C points, then your score will be S x C.
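For intuition only, the scoring rule above can be written out as a small Python sketch. The function and variable names here are ours and not part of the actual checker.

# Sketch of the scoring rule described above (names are illustrative).
def test_case_score(computed, expected, C):
    total = 0.0
    for y_hat, y in zip(computed, expected):
        d_prime = abs(y_hat - y) / y   # relative error d'_i
        d = max(d_prime - 0.1, 0.0)    # allow a 10% margin of error
        total += max(1.0 - d, 0.0)     # per-query score s_i
    S = total / len(expected)          # normalized score for the test case
    return S * C                       # points earned if the challenge is worth C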

Output Format

For each of the q feature sets, print the value of Y on a new line (i.e., you must print a total of q lines).

Sample Input

2 7
0.18 0.89 109.85
1.0 0.26 155.72
0.92 0.11 137.66
0.07 0.37 76.17
0.85 0.16 139.75
0.99 0.41 162.6
0.87 0.47 151.77
4
0.49 0.18
0.57 0.83
0.56 0.64
0.76 0.18

Sample Output

105.22
142.68
132.94
129.71

Explanation

We’re given m = 2, so Y = a + b1 * f1 + b2 * f2. We’re also given n = 7, so we determine that Andrea studied the following feature sets:

  • a + 0.18 * b1 + 0.89 * b2 = 109.85
  • a + 1.0 * b1 + 0.26 * b2 = 155.72
  • a + 0.92 * b1 + 0.11 * b2 = 137.66
  • a + 0.07 * b1 + 0.37 * b2 = 76.17
  • a + 0.85 * b1 + 0.16 * b2 = 139.75
  • a + 0.99 * b1 + 0.41 * b2 = 162.6
  • a + 0.87 * b1 + 0.47 * b2 = 151.77

We use the information above to find the values of a, b1, and b2. Then, we find the value of Y for each of the q feature sets.
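One way to carry out this step on the sample data, using NumPy's least-squares solver rather than the scikit-learn fit used in the solution below, is sketched here. It is an illustration of the idea, not the grader's method.

import numpy as np

# Solve the over-determined system [1 f1 f2] @ [a, b1, b2] ~= Y by least squares.
F = np.array([
    [0.18, 0.89], [1.0, 0.26], [0.92, 0.11], [0.07, 0.37],
    [0.85, 0.16], [0.99, 0.41], [0.87, 0.47],
])
Y = np.array([109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77])

A = np.hstack([np.ones((len(F), 1)), F])        # prepend the intercept column
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)  # coeffs = [a, b1, b2]
print(coeffs)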

Solution – Day 9: Multiple Linear Regression

Python

from sklearn import linear_model

# Read m (number of features) and n (number of observed feature sets)
m, n = map(int, input().split())

# Each of the next n lines holds m feature values followed by the observed Y
data = [list(map(float, input().split())) for _ in range(n)]

x = [row[:m] for row in data]   # feature matrix
y = [row[-1] for row in data]   # observed Y values

# Fit the multiple linear regression model Y = a + b1*f1 + ... + bm*fm
lm = linear_model.LinearRegression()
lm.fit(x, y)
a = lm.intercept_
b = lm.coef_

# Predict Y for each of the q queried feature sets
for _ in range(int(input())):
    features = list(map(float, input().split()))
    print(a + sum(b[j] * features[j] for j in range(m)))
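If scikit-learn is not available, a NumPy-only variant of the same idea is sketched below. It follows the same input format; rounding to two decimals matches the sample output, and the 10% scoring margin means unrounded values also pass.

import numpy as np

m, n = map(int, input().split())
data = np.array([list(map(float, input().split())) for _ in range(n)])

# Fit [a, b1, ..., bm] by least squares on the columns [1, f1, ..., fm].
A = np.hstack([np.ones((n, 1)), data[:, :m]])
coeffs, *_ = np.linalg.lstsq(A, data[:, m], rcond=None)

q = int(input())
for _ in range(q):
    f = np.array(list(map(float, input().split())))
    print(round(coeffs[0] + coeffs[1:] @ f, 2))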

Disclaimer: The above problem (Day 9: Multiple Linear Regression) is generated by HackerRank, but the solution is provided by CodingBroz. This tutorial is only for educational and learning purposes.
