The assignment is due on April 24, 2020, by 17:00.
Submit your assignment via Blackboard as a Word document.

Part I: Conceptual questions

  1. Understanding Linear Regression: In class, we discussed the idea behind a linear regression analysis. An OLS estimation is a linear fit to the data.

a. What does this fit mean?

b. Why do we use a linear fit rather than a nonlinear alternative?

c. What do we mean by the parameters of a regression model, and how do we estimate these parameters? Tip: Remember why we discussed the error terms!

  1. Figure 1 shows the association between the number of Corona deaths (log) and population size (log) across the world up to April 15.
\label{fig:OLS_1}

In class, I introduced the linear form of a bivariate regression as follows:

\[ y = \beta_0 + \beta_1 x + \epsilon \]
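As a reminder of how such a model can be estimated in \(\mathcal{R}\), here is a minimal sketch on simulated data (not the Corona dataset). The sample size, the variable names, and the "true" values \(\beta_0 = 1\) and \(\beta_1 = 0.5\) are assumptions made up purely for illustration.

```r
# Simulated data: every number below is hypothetical and chosen for illustration only
set.seed(123)                           # make the random draws reproducible
n <- 200                                # number of simulated observations
x <- rnorm(n, mean = 10, sd = 2)        # explanatory variable
epsilon <- rnorm(n, mean = 0, sd = 1)   # error term
y <- 1 + 0.5 * x + epsilon              # assumed model: beta_0 = 1, beta_1 = 0.5

fit <- lm(y ~ x)                        # OLS fit of y on x
coef(fit)                               # estimated intercept and slope
```

The estimates returned by `coef(fit)` should be close to, but not exactly equal to, the assumed values, because the error term adds noise to every observation.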

a. Based on the information provided in the plot, explain what \(y\) and \(x\) represent.

b. Use the information in the plot to approximate \(\beta_0\) and \(\beta_1\). Explain how you arrived at these numbers.

c. Now, I limit the data to countries with a relatively large population (more than 15 million) and repeat the above analysis. The new results are reported in Figure 2. How do the estimated parameters \(\beta_0\) and \(\beta_1\) change? Based on the fitted models in Figures 1 and 2, do you think the effect of population on the number of deaths differs between small and large countries? Explain your answer.

\label{fig:OLS_2}

Part II: \(\mathcal{R}\) and GitHub questions

  1. In addition to regression analysis, \(\mathcal{R}\) can help you automate work procedures that follow a fixed logic and need to be repeated. This is a very helpful feature that can make our lives easier, as you will see later. There are different ways to do this, but one of the best known is the \({\tt for}\) loop. You can read more about the \({\tt for}\) loop here.

Below, I use the \({\tt for}\) loop to print all the years of the first two decades of the 21st century.

# Vector of years from 2000 to 2020, one year apart
Years <- seq(2000, 2020, by = 1)

# Visit each year in turn and print it
for (i in Years) {
  print(i)
}
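A loop can also store its results instead of printing them. The following is a minimal sketch of that idea; the vector name `years_since_2000` is a hypothetical name chosen for this illustration.

```r
Years <- seq(2000, 2020, by = 1)

# Pre-allocate an empty numeric vector with one slot per year
years_since_2000 <- numeric(length(Years))

# Loop over the positions 1, 2, ..., length(Years) and fill each slot
for (k in seq_along(Years)) {
  years_since_2000[k] <- Years[k] - 2000
}

years_since_2000
```

Looping over positions with `seq_along()` rather than over the values themselves is what makes it possible to write each result into the matching slot of the output vector.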

Now, use a for loop to say hello to five of your friends, one by one! Feel free to Google how it can be done!

  1. GitHub is a popular cloud platform among software engineers and scholars for documenting their projects and collaborating on them. Because it offers free cloud-based repositories, you can also use it to document your own projects. For example, I will never ask you to submit your dataset to me; instead, I will teach you how to read data from cloud repositories such as GitHub and Dropbox.

a. Watch this video and create a GitHub account.

b. Watch this video. You do not need to set up a desktop version, so ignore that part of the video. Now, create a new repository and name it “LU_QA_2020”.

c. Download the Corona dataset that we analyzed in question 2 from the Dropbox link here. Create a new folder under “LU_QA_2020” and name it “Datasets”. Upload the Corona dataset to your “Datasets” folder.

As the answer to this question, share with me the link to your GitHub profile. For example, mine is: https://github.com/babakrezaee. I expect to see a GitHub page, with a repository named “LU_QA_2020”, and with a folder named “Datasets”, which includes the Corona dataset.


  1. You will submit your next assignment as an \(\mathcal{R}\) Markdown file, after we cover it in our next \(\mathcal{R}\) workshop.