# xav bigdata homework 3

You need a working knowledge of Python.

Begin by summarizing the questions you attempted.

Q1: ISLR Textbook chapter review questions

Chapter 3 – Linear Regression. Review the following sections in Python:

- Load Datasets
- 3.1 Simple Linear Regression
- 3.2 Multiple Linear Regression
- 3.3 Other Considerations in the Regression Model

Q2: Textbook Theory Questions (3.7 Exercises)

1. Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.
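For reference, the model behind Table 3.4 and the hypotheses tested by its p-values can be written roughly as below. The numbering of the coefficients is an assumption made here for illustration; interpreting the result in terms of sales and the three media is still the point of the question.

```latex
\[
\text{sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{radio}
             + \beta_3\,\text{newspaper} + \epsilon
\]
\[
H_0^{(j)} : \beta_j = 0
\quad \text{versus} \quad
H_a^{(j)} : \beta_j \neq 0,
\qquad j = 0, 1, 2, 3
\]
\[
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)}
\]
```

Each p-value in the table is the probability of seeing a t-statistic at least as extreme as $t_j$ if the corresponding $H_0^{(j)}$ were true.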

Q3: Applied Textbook Questions with Python (3.7 Exercises)

Hint – several GitHub sites have the complete solution in Python, e.g.

https://github.com/mscaudill/IntroStatLearn

https://botlnec.github.io/islp/

8. This question involves the use of simple linear regression on the Auto data set.

(a) Perform a simple linear regression with mpg as the response and horsepower as the predictor. Print the results. Comment on the output. For example:

i. Is there a relationship between the predictor and the response?

ii. How strong is the relationship between the predictor and the response?

iii. Is the relationship between the predictor and the response positive or negative?

iv. Predictions

- What is the predicted mpg associated with a horsepower of 98? What are the associated 95% confidence and prediction intervals?
- Plot the response and the predictor, display the least squares regression line.
- Produce diagnostic plots of the least squares regression fit. Comment on any problems you see with the fit.
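The fit and the two intervals asked for in question 8 can be sketched with NumPy alone. This is a minimal illustration, not the textbook's or the linked repositories' solution: the data are synthetic stand-ins for `horsepower` and `mpg`, the helper names `simple_ols` and `predict_with_intervals` are hypothetical, and the intervals use a normal quantile in place of the exact t quantile (a close approximation only when there are a few hundred rows).

```python
import numpy as np
from statistics import NormalDist


def simple_ols(x, y):
    """Least squares fit of y = b0 + b1 * x, with the pieces needed for intervals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_bar, y_bar = x.mean(), y.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y_bar)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = y - (b0 + b1 * x)
    rss = np.sum(resid ** 2)
    tss = np.sum((y - y_bar) ** 2)
    sigma = np.sqrt(rss / (n - 2))  # residual standard error
    return {"b0": b0, "b1": b1, "r2": 1 - rss / tss,
            "sigma": sigma, "n": n, "x_bar": x_bar, "sxx": sxx}


def predict_with_intervals(fit, x0, level=0.95):
    """Point prediction at x0 with approximate confidence and prediction intervals."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation to the t quantile
    y_hat = fit["b0"] + fit["b1"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    se_mean = fit["sigma"] * np.sqrt(leverage)      # uncertainty in the mean response
    se_pred = fit["sigma"] * np.sqrt(1 + leverage)  # adds the noise of one new car
    ci = (y_hat - z * se_mean, y_hat + z * se_mean)
    pi = (y_hat - z * se_pred, y_hat + z * se_pred)
    return y_hat, ci, pi


# Synthetic stand-in for the Auto data (assumption: not the real values).
rng = np.random.default_rng(0)
horsepower = rng.uniform(50, 200, size=300)
mpg = 40 - 0.15 * horsepower + rng.normal(0, 4, size=300)

fit = simple_ols(horsepower, mpg)
y_hat, ci, pi = predict_with_intervals(fit, 98)
print(f"slope={fit['b1']:.3f}  R^2={fit['r2']:.3f}")
print(f"prediction at 98: {y_hat:.2f}, CI={ci}, PI={pi}")
```

The sign of the slope answers (iii) and R² speaks to (ii). The prediction interval is always wider than the confidence interval because it also covers the scatter of an individual observation. With the real file, check the `horsepower` column for non-numeric entries before fitting, since some copies of Auto.csv mark missing values with `?`.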

9. This question involves the use of multiple linear regression on the Auto data set.

(a) Produce a scatterplot matrix which includes all of the variables in the data set.

(b) Compute the matrix of correlations between the variables.

(c) Perform a multiple linear regression with mpg as the response and all other variables except name as the predictors and print the results. Comment on the output. For instance:

i. Is there a relationship between the predictors and the response?

ii. Which predictors appear to have a statistically significant relationship to the response?

iii. What does the coefficient for the year variable suggest?

(d) Produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

(e) Fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

(f) Try a few different transformations of the variables, such as $\log(X)$, $\sqrt{X}$, $X^2$. Comment on your findings.

Optional
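Parts (b), (c), (e) and (f) of question 9 share one pattern: build a design matrix, fit it by least squares, and compare models. The sketch below assumes synthetic data with only two predictors, a hypothetical helper `fit_ols`, and weight expressed in thousands of pounds to keep the matrix well conditioned. It is an illustration of the pattern, not a worked answer for the Auto data.

```python
import numpy as np


def fit_ols(columns, y):
    """Fit y on the named columns plus an intercept.

    `columns` maps a predictor name to a 1-D array. Returns coefficients,
    standard errors, t-statistics and R^2.
    """
    names = ["intercept"] + list(columns)
    y = np.asarray(y, dtype=float)
    X = np.column_stack(
        [np.ones(len(y))] + [np.asarray(v, dtype=float) for v in columns.values()]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof              # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the estimates
    se = np.sqrt(np.diag(cov))
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return {"names": names, "beta": beta, "se": se, "t": beta / se, "r2": r2}


# Synthetic stand-in data (assumption: not the real Auto values).
rng = np.random.default_rng(1)
n = 300
horsepower = rng.uniform(50, 200, n)
weight = rng.uniform(1.5, 5.0, n)  # thousands of pounds, hypothetical scale
mpg = (60 - 0.2 * horsepower - 8 * weight
       + 0.03 * horsepower * weight + rng.normal(0, 2, n))

# (b) correlation matrix: one column per variable
corr = np.corrcoef(np.column_stack([mpg, horsepower, weight]), rowvar=False)

# (c) main-effects model
main = fit_ols({"horsepower": horsepower, "weight": weight}, mpg)

# (e) same model plus an interaction column
inter = fit_ols({"horsepower": horsepower, "weight": weight,
                 "horsepower:weight": horsepower * weight}, mpg)

# (f) a transformed predictor
logged = fit_ols({"log(horsepower)": np.log(horsepower), "weight": weight}, mpg)

print(np.round(corr, 2))
for label, model in [("main", main), ("interaction", inter), ("log", logged)]:
    print(label, dict(zip(model["names"], np.round(model["t"], 2))),
          "R^2 =", round(model["r2"], 3))
```

A t-statistic well beyond about 2 in absolute value on the interaction column is the usual sign that the interaction matters. Adding a column to a model can never lower R², so compare the t-statistics rather than R² alone.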

15. This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

(a) For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

(b) Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis $H_0 : \beta_j = 0$?

(c) How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

(d) Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$.
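The comparison in (c) and the cubic fits in (d) can be sketched as follows. Everything here is assumed for illustration: the three predictors `x1`, `x2`, `x3` are synthetic rather than Boston columns, and `ols_coefs` and `rss_poly` are hypothetical helper names.

```python
import numpy as np

# Synthetic stand-in data (assumption: not the Boston data set).
rng = np.random.default_rng(2)
n = 400
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # deliberately correlated with x1
x3 = rng.normal(size=n)
predictors = {"x1": x1, "x2": x2, "x3": x3}
# x2 has no direct effect on y; x3 acts only through a squared term.
y = 1.0 + 2.0 * x1 + 0.5 * x3 ** 2 + rng.normal(0, 1, n)


def ols_coefs(X, y):
    """Least squares coefficients with an intercept prepended (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# (a) one simple regression per predictor: keep only the slope
simple = {name: ols_coefs(x, y)[1] for name, x in predictors.items()}

# (b) one multiple regression on all predictors at once
multi_beta = ols_coefs(np.column_stack(list(predictors.values())), y)
multiple = dict(zip(predictors, multi_beta[1:]))

# (c) one point per predictor: (simple slope, multiple-regression slope)
pairs = [(name, simple[name], multiple[name]) for name in predictors]
for name, s, m in pairs:
    print(f"{name}: simple={s:.2f}  multiple={m:.2f}")


# (d) compare a straight-line fit with a cubic fit for each predictor
def rss_poly(x, y, degree):
    """Residual sum of squares of a polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))


for name, x in predictors.items():
    print(f"{name}: RSS linear={rss_poly(x, y, 1):.1f}  cubic={rss_poly(x, y, 3):.1f}")
```

In this synthetic setup `x2` gets a large slope on its own but a slope near zero in the multiple regression, because its apparent effect comes from its correlation with `x1`. The `pairs` list holds the coordinates for the scatter plot described in (c). A clear drop in RSS from the linear to the cubic fit hints at non-linearity, though a formal answer would test the higher-order coefficients.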

Credit.csv

Boston.csv

Auto.csv

Advertising.csv

HW03.docx

Link to the textbook:

https://learning.oreilly.com/library/view/data-science-and/9781118876138/