Linear Regression Model

  1. Start from an exact linear relationship: \(Y = \text{linear}(X)\)
  2. Allow for discrepancies \(Y = \beta_1 + \beta_2X + u\)
    • \(\hat \beta\) is the estimate for \(\beta\)
    • \(u\) is the “disturbance term” or “error term”: the gap between the actual values and the true regression line (which we are trying to estimate).
    • \(\hat u\) (or \(e\)) are the residuals: the gap between the actual values and the fitted values of \(Y\)
  3. [Figure] (Dagnelie, p. 3) (pdf)
    • For each observation \(X_i\), the actual value of \(Y\) is \(\hat Y + \hat u\)
  4. Fitted, actual and residual graph.
  5. Conditional mean intuition
  6. Matrix form of the estimation
  7. Ordinary Least Squares.
    • Sum of Sq. of residuals \(SSR(\hat\beta)=\hat{u}'\hat{u}\)
    • F.O.C.: \(\dfrac{\partial SSR}{\partial \hat\beta}=0\)
    • If \(X\) has full rank, \(\hat\beta = (X'X)^{-1}X'y\) (derivation sketched below)
    • S.O.C.: \(2X'X\) must be positive definite
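    • Sketch of the skipped algebra (standard derivation, filling in what the notes jump over): expand the SSR and set its gradient to zero to obtain the normal equations,
      \[
      SSR(\hat\beta) = (y - X\hat\beta)'(y - X\hat\beta), \qquad
      \frac{\partial SSR}{\partial \hat\beta} = -2X'y + 2X'X\hat\beta = 0
      \;\Rightarrow\; X'X\hat\beta = X'y
      \;\Rightarrow\; \hat\beta = (X'X)^{-1}X'y .
      \]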
  8. Proof that \(X'X\) is positive definite (sketch below; the vector \(v\) is introduced just for the proof)
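    • Sketch of the argument (standard, assuming \(X\) has full column rank): for any vector \(v \neq 0\),
      \[
      v'(X'X)v = (Xv)'(Xv) = \lVert Xv \rVert^2 \geq 0 ,
      \]
      with equality only if \(Xv = 0\); full column rank means \(Xv = 0\) forces \(v = 0\), so \(X'X\) is positive definite.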
  9. Algebraic aspects: the residuals are orthogonal to the column vectors \(x_k\) of \(X\), i.e. \(X'\hat u = 0\) (numerical check after this list)
    1. The residuals sum to zero (when an intercept is included), so OLS does not systematically overestimate or underestimate the response variable.
    2. The regression hyperplane always passes through the mean of the data. This means that if you average all your input variables and plug them into the regression equation, you will get the mean of \(y\).
    3. The average predicted (fitted) value always equals the average observed value.
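    • A minimal numerical check of the orthogonality condition and these three properties, on simulated data (variable names and numbers are made up for illustration; assumes numpy is available):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])        # design matrix with an intercept column
    y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # OLS: solve X'X b = X'y
    y_hat = X @ beta_hat
    u_hat = y - y_hat

    print(X.T @ u_hat)                               # ~0: residuals orthogonal to every column of X
    print(u_hat.sum())                               # ~0: residuals sum to zero (intercept included)
    print(np.array([1, x1.mean(), x2.mean()]) @ beta_hat, y.mean())  # hyperplane passes through the means
    print(y_hat.mean(), y.mean())                    # mean fitted value = mean observed value
    ```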
  10. Gauss-Markov Theorem: Under certain conditions OLS estimators are unbiased and BLUE (Best Linear Unbiased Estimators).
    1. H1: Model must be linear in parameters (Being linear in parameters means that the model can be written as a linear combination of the parameters \(\beta\), not in terms of \(\beta^2,\beta^3\) etc)
    2. H2: Strict Exogeneity: \(E(u|X) = 0\). This condition ensures that:
      • The errors must not be correlated with \(X\).
      • No variable that systematically explains \(y\) should be excluded from \(X\): otherwise the effect of the omitted variable is absorbed into \(u_i\) and the estimates are biased.
    3. H3: No perfect multi-collinearity: \(\text{rank}(X) = k\) (must have full rank), there should be no EXACT linear dependence between the columns of \(X\).
      • Otherwise, we can’t calculate the matrix \((X'X)^{-1}\) needed for the OLS formula (see the small demo after this list)
      • Also, we won’t be able to estimate unique coefficients.
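    • A small demo of what goes wrong under perfect multicollinearity, on made-up data (assumes numpy):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    x1 = rng.normal(size=n)
    x2 = 3 * x1                                  # exact linear dependence: H3 is violated
    X = np.column_stack([np.ones(n), x1, x2])

    print(np.linalg.matrix_rank(X))              # 2 < 3: X does not have full column rank
    print(np.linalg.cond(X.T @ X))               # enormous condition number: X'X is (numerically) singular,
                                                 # so (X'X)^{-1} does not exist and the coefficients are not identified
    ```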
  11. Under the above 3 assumptions, \(\hat \beta\) is an unbiased estimator of \(\beta\). \(E(\hat \beta | X) = \beta\)
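    • One-line proof sketch (standard; substitute \(y = X\beta + u\) and use \(H_2\)):
      \[
      \hat\beta = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u
      \quad\Rightarrow\quad
      E(\hat\beta | X) = \beta + (X'X)^{-1}X'E(u|X) = \beta .
      \]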
  12. H4: Homoscedasticity and serial independence. \(Var(u|X) = \sigma^2I_n\)
    1. \(Var(u_t|X) = \sigma^2\), \(t = 1,2,\dots,n\)
    2. \(Cov(u_t,u_s|X) = 0\), for all \(t \neq s\)
  13. \(Var(\hat \beta | X) = \cdots = \sigma^2(X'X)^{-1}\) (derivation sketched below)
    1. \(\beta\) is the true value, with zero variance
    2. \(Var(Au)=AVar(u)A'\): a property of the variance for a non-random matrix \(A\)
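    • The skipped steps (standard; uses \(\hat\beta = \beta + (X'X)^{-1}X'u\) and the property in point 2):
      \[
      Var(\hat\beta | X) = (X'X)^{-1}X'\,Var(u|X)\,X(X'X)^{-1}
      = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1} .
      \]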
  14. Takeaways
    1. Gauss-Markov: OLS is BLUE
    2. Any other linear unbiased estimator \(\tilde\beta\) has \(Var(\tilde\beta) \geq \sigma^2(X'X)^{-1} = Var(\hat\beta)\) under homoskedasticity
    3. “Linear” in BLUE means the estimator is a linear function of \(y\)
  15. The theorem reformulated
    1. No linear unbiased estimator can have a variance matrix smaller than \(\sigma^2(X'X)^{-1}\)
    2. The OLS estimator attains exactly this variance, so no linear unbiased estimator has lower variance than OLS
    3. So, OLS is efficient (within the class of linear unbiased estimators)
  16. An unbiased estimator of the error variance \(\sigma^2\) can be written as follows (numerical check below):
    1. \(\hat\sigma^2 = \dfrac{\hat{u}'\hat{u}}{(n-k)}\)
    2. \(k\) is the number of parameters (intercept/constant included)
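    • A quick numerical check of \(\hat\sigma^2\) and the resulting standard errors against statsmodels, on simulated data (all names and numbers are illustrative; assumes numpy and statsmodels):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n, k = 200, 3
    X = sm.add_constant(rng.normal(size=(n, k - 1)))     # n x k design matrix, intercept included
    y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    sigma2_hat = u_hat @ u_hat / (n - k)                 # u'u / (n - k)
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

    res = sm.OLS(y, X).fit()
    print(sigma2_hat, res.mse_resid)                     # same estimate of sigma^2
    print(se, res.bse)                                   # same standard errors
    ```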
  17. \(H_5\) Normality of errors: \((u_t | X) \sim N(0,\sigma^2)\) i.i.d
    1. Theorem 5: Assumptions \(H_1\) to \(H_5\): \(\hat\beta \sim N(\beta,\sigma^2(X'X)^{-1})\)
    2. \(Var(\hat\beta)\) attains the Cramér-Rao lower bound for variance-covariance matrices, so \(\hat\beta\) is the unbiased estimator of minimal variance; under normality, \(\hat\beta\) is also the MLE of \(\beta\)
  18. Theorem 6: Assumptions \(H_1\) to \(H_5\)
    1. \(\dfrac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k}, j = 1,\dots,k\)
    2. Numerator follows a normal distribution
    3. The denominator is based on \(\hat\sigma^2\), and \((n-k)\hat\sigma^2/\sigma^2\) follows a \(\chi^2\) distribution with \(n-k\) d.f.
    4. The ratio of the two follows a Student \(t\) distribution with \(n-k\) d.f. (numerical check below)
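    • A numerical check of the \(t\) ratio and its p-value against statsmodels, on simulated data (the data are made up; assumes numpy, scipy and statsmodels):

    ```python
    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n, k = 100, 2
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    y = 1.0 + 0.8 * x + rng.normal(size=n)

    res = sm.OLS(y, X).fit()
    j = 1                                              # slope coefficient
    t_stat = res.params[j] / res.bse[j]                # (beta_hat_j - 0) / se(beta_hat_j) under H0: beta_j = 0
    p_val = 2 * stats.t.sf(abs(t_stat), df=n - k)      # two-sided p-value, t distribution with n-k d.f.
    print(t_stat, res.tvalues[j])                      # matches statsmodels
    print(p_val, res.pvalues[j])
    ```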
  19. Linear Regression Inference: Hypothesis test, p-value
  20. Hypotheses:
    1. Correct Model: \(E(u_i) = 0 \to\) consistent estimator
    2. Exogeneity: \(Corr(x_i,u_i) = 0\) \(\to\) consistent estimator
    3. No (perfect) collinearity
    4. Homoskedasticity: \(Var(u_i) = \sigma^2\) \(\to\) efficiency
      • Serial independence (no autocorrelation) \(Corr(u_i,u_j) = 0 \to\) efficiency
    5. Normality of residuals: \(u_i \sim N \to\) needed for exact inference in small samples
  21. Interpretation of a regression:
    1. \(E(y|X_1,X_2) = \beta_0 +\beta_1X_1 +\beta_2X_2\) gives us the conditional mean of \(y\) given fixed values of the regressors
    2. The \(\hat\beta_j\) represent the separate, individual marginal effects on \(y\), holding all other regressors constant.
  22. “If you increase the number of years of education by 1, hourly earnings increase by 1.88 USD,” (Dagnelie, p. 24) (pdf)
  23. Stargazer of regression output
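    • The notes refer to R’s stargazer; a rough Python analogue (assuming statsmodels is available) is summary_col, sketched here on made-up wage data:

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.iolib.summary2 import summary_col

    rng = np.random.default_rng(4)
    n = 300
    educ = rng.normal(12, 2, size=n)                     # hypothetical years of education
    exper = rng.normal(10, 5, size=n)                    # hypothetical years of experience
    wage = 1.0 + 1.9 * educ + 0.2 * exper + rng.normal(size=n)

    m1 = sm.OLS(wage, sm.add_constant(educ)).fit()                            # column 1: SLR
    m2 = sm.OLS(wage, sm.add_constant(np.column_stack([educ, exper]))).fit()  # column 2: MLR
    print(summary_col([m1, m2], stars=True, model_names=["(1) SLR", "(2) MLR"]))
    ```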
  24. Explanation of (25)
    1. Column 1: simple linear regression (SLR)
    2. Column 2: multiple linear regression (MLR); the comparison illustrates omitted-variable bias
    3. Regression on deviations from the mean: \(X^* = X - \bar{X}\)
      1. \(\beta_0^*=\bar{y}\)
      2. \(\beta_1^* = \dfrac{\sigma_{yS^*}}{\sigma^2_{S^*}}\); but de-meaning changes neither the covariance nor the variance (\(\sigma_{yS^*}=\sigma_{yS}\), \(\sigma^2_{S^*}=\sigma^2_S\)), so \(\beta_1^* = \dfrac{\sigma_{yS}}{\sigma^2_S}\)
      3. Similarly, \(\beta_2^* = \dfrac{\sigma_{yEXP}}{\sigma^2_{EXP}}\)
      4. Standardized coefficients: \(\breve{\beta_0}=0\), \(\breve{\beta_1} = \dfrac{\hat\sigma_S}{\hat\sigma_y}\beta_1\) and \(\breve{\beta_2} = \dfrac{\hat\sigma_{EXP}}{\hat\sigma_y}\beta_2\) (zero means and unit variances; numerical check below)
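    • A numerical check of the de-meaned regression and the standardized coefficients above, on simulated schooling/experience data (names and numbers hypothetical; assumes numpy): de-meaning leaves the slopes unchanged and turns the intercept into \(\bar y\), and standardizing rescales each slope by \(\hat\sigma_{x_j}/\hat\sigma_y\).

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n = 500
    S = rng.normal(12, 2, size=n)          # hypothetical years of schooling
    EXP = rng.normal(10, 5, size=n)        # hypothetical years of experience
    y = 2.0 + 1.5 * S + 0.3 * EXP + rng.normal(size=n)

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    b = ols(np.column_stack([np.ones(n), S, EXP]), y)

    # Regression on deviations from the mean: intercept = mean of y, slopes unchanged
    b_star = ols(np.column_stack([np.ones(n), S - S.mean(), EXP - EXP.mean()]), y)
    print(b_star[0], y.mean())
    print(b_star[1:], b[1:])

    # Standardized coefficients: slope_j * (sd of x_j / sd of y), intercept = 0
    yz = (y - y.mean()) / y.std()
    Xz = np.column_stack([(S - S.mean()) / S.std(), (EXP - EXP.mean()) / EXP.std()])
    print(ols(Xz, yz))
    print(b[1] * S.std() / y.std(), b[2] * EXP.std() / y.std())
    ```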
  25. “Frisch Waugh Lovell” (Dagnelie, p. 30) (pdf)
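    • A minimal illustration of the Frisch-Waugh-Lovell theorem on simulated data (variable names made up; assumes numpy): the coefficient on \(x_2\) from the full regression equals the coefficient from regressing the residuals of \(y\) (after partialling out the constant and \(x_1\)) on the residuals of \(x_2\).

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    n = 400
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)          # regressors are correlated
    y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    ones = np.ones(n)
    beta_full = ols(np.column_stack([ones, x1, x2]), y)   # full regression: y on (1, x1, x2)

    # FWL: partial the constant and x1 out of both y and x2, then regress residual on residual
    X1 = np.column_stack([ones, x1])
    r_y = y - X1 @ ols(X1, y)                             # residuals of y on (1, x1)
    r_x2 = x2 - X1 @ ols(X1, x2)                          # residuals of x2 on (1, x1)
    beta_fwl = ols(r_x2.reshape(-1, 1), r_y)

    print(beta_full[2], beta_fwl[0])                      # identical coefficient on x2
    ```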