Proof that \(v = X'X\) is positive definite (the symbol \(v\) is used only within this proof)
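A sketch of the argument (assuming, as in H3 below, that \(X\) has full column rank \(k\)): for any nonzero vector \(a \in \mathbb{R}^k\),
\[
a'(X'X)a = (Xa)'(Xa) = \lVert Xa \rVert^2 \ge 0,
\]
and \(\text{rank}(X) = k\) means \(Xa \neq 0\) whenever \(a \neq 0\), so the inequality is strict. Hence \(X'X\) is positive definite and, in particular, invertible.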
Algebraic aspects: the residuals \(\hat{u}\) are orthogonal to every column \(x_k\) of \(X\), i.e. \(X'\hat{u} = 0\)
The residuals sum to zero (when the model includes an intercept), so OLS does not systematically overestimate or underestimate the response variable.
The regression hyperplane always passes through the mean of the data. This means that if you average all your input variables and plug them into the regression equation, you will get the mean of \(y\).
The average predicted (fitted) value always equals the average observed value (a quick numerical check of these properties follows below).
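The check mentioned above, sketched in Python/NumPy on simulated data (the design, coefficients and sample size here are made up for illustration):

```python
import numpy as np

# Minimal numerical check of the algebraic properties of OLS on simulated data.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # intercept + 2 regressors
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: (X'X)^{-1} X'y
y_hat = X @ beta_hat
u_hat = y - y_hat

print(X.T @ u_hat)                           # ~0: residuals orthogonal to every column of X
print(u_hat.mean())                          # ~0: residuals balance out (model has an intercept)
print(y_hat.mean(), y.mean())                # equal: mean fitted value = mean observed value
print(X.mean(axis=0) @ beta_hat, y.mean())   # hyperplane passes through the point of means
```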
Gauss-Markov Theorem: Under the conditions H1 to H4 below, the OLS estimators are unbiased and BLUE (Best Linear Unbiased Estimators).
H1: The model must be linear in parameters (i.e., it can be written as a linear combination of the parameters \(\beta\), not of \(\beta^2\), \(\beta^3\), etc.)
H2: Strict exogeneity: \(E(u|X) = 0\). This condition requires that:
The errors \(u\) must not be correlated with (any column of) \(X\).
No variable that systematically explains \(y\) should be excluded from \(X\); otherwise the effect of the omitted variable is absorbed into \(u\) and the estimates are biased (omitted-variable bias).
H3: No perfect multicollinearity: \(\text{rank}(X) = k\) (full column rank); there must be no exact linear dependence between the columns of \(X\).
Otherwise, the matrix \((X'X)^{-1}\) needed for the OLS formula cannot be computed.
Also, we cannot estimate unique coefficients (a short numerical illustration follows below).
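A minimal illustration of why exact collinearity breaks the calculation, again on made-up data (the factor of 2 in the duplicated column is arbitrary):

```python
import numpy as np

# Illustration of H3: with an exactly collinear column, X'X is singular
# and (X'X)^{-1} does not exist.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, 2 * x1])   # third column = 2 * second column

print(np.linalg.matrix_rank(X))        # 2 < k = 3: not full column rank
print(np.linalg.det(X.T @ X))          # ~0: X'X is singular
# np.linalg.inv(X.T @ X) would raise LinAlgError or give numerically meaningless values
```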
Under the above three assumptions (H1 to H3), \(\hat\beta\) is an unbiased estimator of \(\beta\): \(E(\hat \beta | X) = \beta\)
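The one-line argument, obtained by substituting \(y = X\beta + u\) into the OLS formula:
\[
\hat\beta = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'u
\quad\Rightarrow\quad
E(\hat\beta | X) = \beta + (X'X)^{-1}X'\,E(u | X) = \beta,
\]
using H3 for the existence of \((X'X)^{-1}\) and H2 for \(E(u|X) = 0\).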
H4: Homoscedasticity and serial independence. \(Var(u|X) = \sigma^2I_n\)
a. \(Var(u_t|X) = \sigma^2\), \(t = 1,2,\dots,n\)
b. \(Cov(u_t,u_s|X) = 0\) for all \(t \neq s\)
\(Var(\hat \beta|X) = \cdots = \sigma^2(X'X)^{-1}\) (the intermediate steps are written out after the variance property below)
\(\beta\) is the true value and has zero variance (it is a fixed constant, not a random variable)
\(Var(Au|X) = A\,Var(u|X)\,A'\) for a matrix \(A\) that is constant given \(X\); this is a general property of variance matrices
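Writing out the steps hidden in the \(\cdots\), with \(A = (X'X)^{-1}X'\) and \(\hat\beta - \beta = Au\):
\[
Var(\hat\beta | X) = A\,Var(u|X)\,A' = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1},
\]
using H4, \(Var(u|X) = \sigma^2 I_n\), and the symmetry of \((X'X)^{-1}\).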
Takeaways
Gauss-Markov: OLS is BLUE
Under homoskedasticity, \(\sigma^2(X'X)^{-1}\) is a lower bound: \(Var(\tilde\beta|X) \succeq \sigma^2(X'X)^{-1}\) (in the positive semi-definite sense) for any linear unbiased estimator \(\tilde\beta\)
"Linear" in BLUE means linear in \(y\): the estimator can be written as \(\hat\beta = Ay\) with \(A\) depending only on \(X\)
Reformulation of the theorem:
No linear unbiased estimator can have a variance matrix smaller than \(\sigma^2(X'X)^{-1}\)
The OLS estimator attains exactly this variance, so no linear unbiased estimator has a lower variance than OLS
So, OLS is efficient within the class of linear unbiased estimators
An unbiased estimator of the error variance \(\sigma^2\) can be written as:
\(\hat\sigma^2 = \dfrac{\hat{u}'\hat{u}}{(n-k)}\)
\(k\) is the number of parameters (intercept/constant included); a quick numerical check of this estimator follows below
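The check mentioned above, sketched with NumPy on simulated data where the true \(\sigma^2\) is known (all values are illustrative):

```python
import numpy as np

# Unbiased estimator of the error variance: sigma_hat^2 = u_hat'u_hat / (n - k).
rng = np.random.default_rng(2)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta, sigma = np.array([1.0, 0.5, -2.0]), 2.0
y = X @ beta + rng.normal(scale=sigma, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k)               # divide by n - k, not n

print(sigma2_hat, sigma**2)                          # close to the true sigma^2 = 4
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)   # estimated Var(beta_hat)
print(np.sqrt(np.diag(var_beta_hat)))                # standard errors of the coefficients
```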
H5: Normality of errors: \(u_t | X \sim N(0,\sigma^2)\), i.i.d.
Theorem 5: Under assumptions \(H_1\) to \(H_5\): \(\hat\beta | X \sim N(\beta,\sigma^2(X'X)^{-1})\)
\(Var(\hat\beta)\) attains the Cramér-Rao lower bound for variance-covariance matrices, so \(\hat\beta\) is the unbiased estimator of minimal variance; under normality it also coincides with the MLE of \(\beta\) (a small simulation check of Theorem 5 follows below)
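The simulation check mentioned above, a rough Monte Carlo sketch with a fixed design matrix and made-up parameter values:

```python
import numpy as np

# Monte Carlo sketch of Theorem 5: with normal errors, beta_hat is normally
# distributed around beta with variance sigma^2 (X'X)^{-1}. X is held fixed
# across replications.
rng = np.random.default_rng(3)
n, reps = 100, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, sigma = np.array([1.0, 2.0]), 1.0

draws = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(draws.mean(axis=0))                  # ~beta: unbiasedness
print(np.cov(draws, rowvar=False))         # ~sigma^2 (X'X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```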
Serial independence (no autocorrelation): \(Corr(u_i,u_j) = 0\) for \(i \neq j\) \(\to\) needed for efficiency
Normality of the errors: \(u_i \sim N\) \(\to\) needed for exact inference in small samples
Interpretation of a regression:
\(E(y|X_1,X_2) = \beta_0 +\beta_1X_1 +\beta_2X_2\) gives the mean of \(y\) conditional upon the fixed values of the regressors
The coefficients \(\hat\beta_j\) represent the separate, individual marginal effects of each regressor on \(y\), holding all other variables constant.
"If you increase the number of years of education by 1, hourly earnings increase by 1.88 USD." (Dagnelie, p. 24)
Stargazer table of the regression output
Explanation of (25)
Col 1: SLR (simple linear regression)
Col 2: MLR (multiple linear regression); the comparison with Col 1 illustrates omitted-variable bias (see the simulated sketch below)
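The sketch referenced above: an SLR omitting a regressor that is correlated with the included one, versus the full MLR (variable names S and EXP follow these notes; the data-generating values are invented):

```python
import numpy as np

# Sketch of the Col 1 vs Col 2 comparison: the SLR of y on S alone suffers
# omitted-variable bias because the omitted regressor EXP is correlated with S;
# the MLR including both recovers the true coefficients.
rng = np.random.default_rng(4)
n = 1000
S = rng.normal(12, 2, size=n)                      # e.g. years of schooling
EXP = 20 - 0.8 * S + rng.normal(scale=2, size=n)   # correlated with S by construction
y = 1.0 + 1.5 * S + 0.4 * EXP + rng.normal(scale=3, size=n)

X_slr = np.column_stack([np.ones(n), S])           # Col 1: SLR, EXP omitted
X_mlr = np.column_stack([np.ones(n), S, EXP])      # Col 2: MLR

print(np.linalg.lstsq(X_slr, y, rcond=None)[0])    # slope on S biased: ~1.5 + 0.4*(-0.8)
print(np.linalg.lstsq(X_mlr, y, rcond=None)[0])    # ~[1.0, 1.5, 0.4]
```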
Regression in deviations from the mean: \(X^* = X - \bar{X}\)
\(\beta_0^*=\bar{y}\)
\(\beta_1^* = \dfrac{\sigma_{yS^*}}{\sigma^2_{S^*}}\); since \(S^* = S-\bar{S}\), \(\sigma_{yS^*}=\sigma_{yS}\) and \(\sigma^2_{S^*}=\sigma^2_S\), so \(\beta_1^* = \dfrac{\sigma_{yS}}{\sigma^2_S} = \hat\beta_1\): demeaning changes only the intercept, not the slope
Standardized coefficients: \(\breve{\beta}_0 = 0\), \(\breve{\beta}_1 = \dfrac{\hat\sigma_S}{\hat\sigma_y}\hat\beta_1\) and \(\breve{\beta}_2 = \dfrac{\hat\sigma_{EXP}}{\hat\sigma_y}\hat\beta_2\) (all variables rescaled to zero mean and unit variance); a quick numerical check follows below
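The check mentioned above, a sketch showing that rescaling \(\hat\beta_j\) by \(\hat\sigma_{x_j}/\hat\sigma_y\) matches the slopes from a regression on z-scored variables (data and parameter values are made up):

```python
import numpy as np

# Standardized ("beta") coefficients: equal to the slopes from the regression
# on z-scored variables, or equivalently (sigma_x / sigma_y) times the original slopes.
rng = np.random.default_rng(5)
n = 500
S = rng.normal(12, 2, size=n)
EXP = rng.normal(10, 5, size=n)
y = 1.0 + 1.5 * S + 0.4 * EXP + rng.normal(scale=3, size=n)

X = np.column_stack([np.ones(n), S, EXP])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Rescaling route: breve_beta_j = (sigma_xj / sigma_y) * beta_hat_j
breve = np.array([b[1] * S.std(ddof=1) / y.std(ddof=1),
                  b[2] * EXP.std(ddof=1) / y.std(ddof=1)])

# Direct route: regress z-scores on z-scores (the intercept is then 0)
z = lambda v: (v - v.mean()) / v.std(ddof=1)
Xz = np.column_stack([z(S), z(EXP)])
breve_direct = np.linalg.lstsq(Xz, z(y), rcond=None)[0]

print(breve, breve_direct)   # the two routes agree
```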