Day 9: Multiple Linear Regression | 10 Days Of Statistics

Hello coders, today we are going to solve Day 9: Multiple Linear Regression, a HackerRank challenge that is part of the 10 Days Of Statistics series.

Objective

In this challenge, we practice using multiple linear regression.

Task

Andrea has a simple equation:

Y = a + b₁ * f₁ + b₂ * f₂ + . . . + b_m * f_m

for (m + 1) real constants (a, b₁, b₂, b₃, . . . , b_m). We can say that the value of Y depends on m features. Andrea studies this equation for n different feature sets ( f₁, f₂, f₃, . . . , f_m ) and records each respective value of Y. If she has q new feature sets, can you help Andrea find the value of Y for each of the sets?
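
To make the equation concrete, here is a minimal sketch of how Y is evaluated once the constants are known. The function name predict_y and the numbers in the usage line are illustrative assumptions, not part of the challenge.

```python
def predict_y(a, b, f):
    """Evaluate Y = a + b_1*f_1 + ... + b_m*f_m for one feature set.

    a is the intercept, b the list of m coefficients, and f the list of
    m feature values. (Hypothetical helper, for illustration only.)
    """
    if len(b) != len(f):
        raise ValueError("need exactly one coefficient per feature")
    return a + sum(b_j * f_j for b_j, f_j in zip(b, f))


# Made-up constants: a = 1, b = (2, 3), features = (0.5, 0.5).
print(predict_y(1.0, [2.0, 3.0], [0.5, 0.5]))  # 3.5
```

The challenge is the reverse direction: the constants are unknown and have to be estimated from the n observed feature sets.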

Note: You are not expected to account for bias and variance trade-offs.

Input Format

The first line contains 2 space-separated integers, m (the number of observed features) and n (the number of feature sets Andrea studied), respectively.
Each of the n subsequent lines contains m + 1 space-separated decimals; the first m elements are features ( f₁, f₂, f₃, . . . , f_m ), and the last element is the value of Y for the line’s feature set.
The next line contains a single integer, q, denoting the number of feature sets Andrea wants to query for.
Each of the q subsequent lines contains m space-separated decimals describing the feature sets.
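
The format above can be parsed in one pass over the whitespace-separated tokens. The sketch below assumes the whole input is available as a single string; parse_input is a hypothetical helper name, and the tiny input in the usage lines is made up.

```python
def parse_input(text):
    """Split the raw input into training features, targets, and queries."""
    tokens = text.split()
    m, n = int(tokens[0]), int(tokens[1])
    pos = 2

    features, targets = [], []
    for _ in range(n):
        # m feature values followed by the observed Y.
        row = [float(t) for t in tokens[pos:pos + m + 1]]
        features.append(row[:m])
        targets.append(row[m])
        pos += m + 1

    q = int(tokens[pos])
    pos += 1

    queries = []
    for _ in range(q):
        # Query lines carry only the m feature values.
        queries.append([float(t) for t in tokens[pos:pos + m]])
        pos += m

    return features, targets, queries


# Made-up input with m = 2, n = 2, q = 1.
features, targets, queries = parse_input("2 2\n0.1 0.2 3.0\n0.4 0.5 6.0\n1\n0.7 0.8\n")
print(features, targets, queries)
```

Reading tokens rather than lines keeps the parser tolerant of trailing spaces and blank lines in the input.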

Constraints

1 <= m <= 10
5 <= n <= 100
0 <= f_i <= 1
0 <= Y <= 10^6
1 <= q <= 100

Scoring

For each feature set in one test case, we will compute the following:

d'_i = |Computed value of Y - Expected value of Y| / Expected value of Y
d_i = max(d'_i - 0.1, 0). We will permit up to a ±10% margin of error.
s_i = max(1.0 - d_i, 0)

The normalized score for each test case will be: S = (Σ s_i) / q. If the challenge is worth C points, then your score will be S × C.

Output Format

For each of the q feature sets, print the value of Y on a new line (i.e., you must print a total of q lines).

Sample Input

2 7
0.18 0.89 109.85
1.0 0.26 155.72
0.92 0.11 137.66
0.07 0.37 76.17
0.85 0.16 139.75
0.99 0.41 162.6
0.87 0.47 151.77
4
0.49 0.18
0.57 0.83
0.56 0.64
0.76 0.18

Sample Output

Explanation

We’re given m = 2, so Y = a + b₁ * f₁ + b₂ * f₂. We’re also given n = 7, so we determine that Andrea studied the following feature sets:

a + 0.18 * b1 + 0.89 * b2 = 109.85
a + 1.0 * b1 + 0.26 * b2 = 155.72
a + 0.92 * b1 + 0.11 * b2 = 137.66
a + 0.07 * b1 + 0.37 * b2 = 76.17
a + 0.85 * b1 + 0.16 * b2 = 139.75
a + 0.99 * b1 + 0.41 * b2 = 162.6
a + 0.87 * b1 + 0.47 * b2 = 151.77

We use the information above to find the values of a, b1, and b2. Then, we find the value of Y for each of the q feature sets.
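
With seven equations and only three unknowns, the system generally has no exact solution, so the constants are chosen to minimize the squared error. One way to do that, sketched below in plain Python, is to solve the normal equations (XᵀX)β = XᵀY, where each row of X is a feature set with a leading 1 for the intercept. This is an illustrative alternative and not the approach used in the solution further down; the helper names are assumptions, and the sketch assumes XᵀX is invertible.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(matrix)
    # Augment the matrix so row operations update both sides together.
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def fit_least_squares(features, targets):
    """Return [a, b_1, ..., b_m] minimizing the sum of squared errors."""
    X = [[1.0] + list(row) for row in features]  # leading 1 multiplies a
    cols = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(cols)] for i in range(cols)]
    XtY = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(cols)]
    return solve_linear_system(XtX, XtY)


# The seven feature sets and four queries from the sample input.
features = [[0.18, 0.89], [1.0, 0.26], [0.92, 0.11], [0.07, 0.37],
            [0.85, 0.16], [0.99, 0.41], [0.87, 0.47]]
targets = [109.85, 155.72, 137.66, 76.17, 139.75, 162.6, 151.77]
queries = [[0.49, 0.18], [0.57, 0.83], [0.56, 0.64], [0.76, 0.18]]

a, *b = fit_least_squares(features, targets)
for query in queries:
    print(round(a + sum(b_j * f_j for b_j, f_j in zip(b, query)), 2))
```

For the small sizes allowed here (m at most 10) this direct solve is adequate; a library routine is usually the more robust choice when features are strongly correlated.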

Solution – Day 9: Multiple Linear Regression

Python

from sklearn import linear_model

# First line: m (number of features) and n (number of observed feature sets).
m, n = map(int, input().split())

# Each of the next n lines holds m feature values followed by the observed Y.
data = [[float(v) for v in input().split()] for _ in range(n)]
x = [row[:m] for row in data]
y = [row[-1] for row in data]

# Fit Y = a + b1*f1 + ... + bm*fm by ordinary least squares.
lm = linear_model.LinearRegression()
lm.fit(x, y)
a = lm.intercept_
b = lm.coef_

# Read q query feature sets and print the predicted Y for each one.
for _ in range(int(input())):
    query = list(map(float, input().split()))
    print(a + sum(b[j] * query[j] for j in range(m)))

Disclaimer: The above problem (Day 9: Multiple Linear Regression) was created by HackerRank, but the solution is provided by CodingBroz. This tutorial is for educational and learning purposes only.
