"[n]o causation without manipulation" Holland (1986)
"[c]orrelation can sometimes provide pretty good evidence of a casual relation, even when the variable of interest has not been manipulated by a researcher or experimenter". (Angrist and Pischke 2008)
$$E(y|X)=X\beta$$
$$y=X\beta+\epsilon$$
Remember that one of the assumptions of OLS is that the regressors and the error term are independent, $cov(X,\epsilon)=0$. This is also known as the orthogonality assumption, $X \perp \epsilon$.
Sometimes this assumption is violated in our empirical analysis (in fact, it is violated most of the time). What happens if this assumption is violated?
OLS estimator: $$ \hat{\beta}=(X^TX)^{-1}(X^Ty)~~~(1) $$ and $$ y=X\beta+\epsilon~~~(2) $$
Replacing $y$ from (2) in (1):
$$ \hat{\beta}=(X^T X)^{-1}X^T(X\beta+\epsilon) $$
$$ \hat{\beta}=(X^T X)^{-1}(X^T X)\beta+(X^T X)^{-1}X^T\epsilon$$
Given that $(X^T X)^{-1}(X^T X)=I$:
$$ \hat{\beta}=\beta+(X^T X)^{-1}X^T\epsilon$$
If $cov(X,\epsilon)=0$, then $E(\hat{\beta})=\beta$. However, if $cov(X,\epsilon) \neq 0$, then $E(\hat{\beta})\neq \beta$. Thus, $\hat{\beta}$, the OLS estimator, is biased.
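A minimal numpy sketch of this result (illustrative numbers, not from the text): we build a regressor that shares a common shock with the error term, so $cov(x,\epsilon) \neq 0$, and check that the OLS estimate drifts away from the true $\beta$.

```python
import numpy as np

# Illustrative simulation: a regressor correlated with the error term
# makes the OLS estimator biased.
rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)              # common shock
x = rng.normal(size=n) + u          # regressor, correlated with the error
eps = rng.normal(size=n) + u        # error term, cov(x, eps) = var(u) = 1
beta = 2.0
y = beta * x + eps

X = x.reshape(-1, 1)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: (X'X)^{-1} X'y
print(beta_hat)   # noticeably above the true beta of 2
```

The bias matches the formula above: $\hat{\beta} \approx \beta + cov(x,\epsilon)/var(x)$.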
This is one form of the endogeneity problem. How can we solve it? Below, I explain how IV models can help mitigate this problem. But before that, let's look at the other types of endogeneity problem that IV can address.
Another assumption of OLS models is that causality runs only from $x$ to $y$. In the real world, however, simultaneous causality is common (can you name a few examples?). When $y$ also affects $x$, we can no longer argue that the estimated coefficient on the independent variable, $X$, captures the causal effect of $X$ on the outcome variable, $y$. This, too, creates an endogeneity problem and biases the estimated coefficient: if $y$ increases, then both $x$ and $\epsilon$ change, meaning $cov(x,\epsilon) \neq 0$.
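A minimal sketch of simultaneous causality (illustrative structural equations and numbers, not from the text): $y$ depends on $x$, but $x$ also depends on $y$, so $x$ ends up correlated with the error term and OLS misses the true $\beta$.

```python
import numpy as np

# Illustrative simultaneity: y = beta*x + eps and x = gamma*y + v.
rng = np.random.default_rng(3)
n = 100_000
beta, gamma = 0.5, 0.4
eps = rng.normal(size=n)
v = rng.normal(size=n)
# Reduced form: substitute y into the x equation and solve for x.
x = (gamma * eps + v) / (1 - gamma * beta)
y = beta * x + eps

cov_x_eps = np.cov(x, eps)[0, 1]        # nonzero by construction
ols = np.cov(y, x)[0, 1] / np.var(x)    # OLS slope, does not recover beta
print(cov_x_eps, ols)
```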
The last type of endogeneity problem that can be mitigated using IV models is measurement error. OLS assumes the independent variable is measured without error; in practice, we may only observe $x=x^*+\eta$, where $x^*$ is the true value and $\eta$ is the measurement error. If the measurement error is not correlated with $x$, i.e. $cov(x,\eta)=0$, then the estimated coefficient $\hat{\beta}$ is inefficient but unbiased.
However, if the measurement errors are correlated with the observed regressor, as under classical measurement error where $cov(x_1,\eta)=\sigma^2_\eta \neq 0$, the estimated OLS coefficient is biased. Writing the model as $y=\beta_0+\beta_1 x_1^*+\epsilon$ with $x_1^*=x_1-\eta$:
$$\hat{\beta_1}=\frac{cov(y,x_1)}{var(x_1)} $$
$$\hat{\beta_1}=\frac{cov(\beta_0+\beta_1(x_1-\eta)+\epsilon,x_1)}{var(x_1)} $$
$$\hat{\beta_1}=\frac{\beta_1 var(x_1)-\beta_1cov(x_1,\eta)}{var(x_1)} $$
$$\hat{\beta_1}=\beta_1 (1-\frac{\sigma^2_\eta}{var(x_1)}) $$
$$\hat{\beta_1}=\beta_1 (1-\frac{\sigma^2_\eta}{\sigma^2_{x_1^*}+\sigma^2_{\eta}}) $$
$$\hat{\beta_1}=\beta_1 (\frac{\sigma^2_{x_1^*}}{\sigma^2_{x_1^*}+\sigma^2_{\eta}}) $$
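A minimal sketch of the attenuation result derived above (illustrative numbers, not from the text): classical measurement error in $x$ shrinks the OLS slope by the factor $\sigma^2_{x_1^*}/(\sigma^2_{x_1^*}+\sigma^2_\eta)$.

```python
import numpy as np

# Illustrative attenuation bias: regress y on the mismeasured x.
rng = np.random.default_rng(1)
n = 200_000
beta1 = 3.0
x_star = rng.normal(scale=1.0, size=n)   # true regressor, variance 1
eta = rng.normal(scale=1.0, size=n)      # measurement error, variance 1
x_obs = x_star + eta                     # what we actually observe
y = beta1 * x_star + rng.normal(size=n)

slope = np.cov(y, x_obs)[0, 1] / np.var(x_obs)
print(slope)   # close to 3 * 1/(1+1) = 1.5 rather than 3
```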
A way to address the problems above is to use an instrument $Z$ for $X$. The IV estimator is as follows:
$$\hat{\beta}_{IV}=(X^TP_ZX)^{-1}(X^TP_ZY) $$
where $P_Z=Z(Z^TZ)^{-1}Z^T$.
$\hat{\beta}_{IV}$ is consistent when $Z$ satisfies two conditions:
1) $Z$ is uncorrelated with $\epsilon$ (exogeneity);
2) $Z$ is correlated with $X$ (relevance).
In the two-stage least squares (2SLS) interpretation, the first stage regresses $X$ on $Z$:
$$\hat{\Pi}=(Z^TZ)^{-1}Z^TX$$
The predicted value of $X$ is:
$$ \hat{X}=Z\hat{\Pi}=Z(Z^TZ)^{-1}Z^TX=P_ZX$$
The second stage then regresses $Y$ on $\hat{X}$:
$$Y=\hat{X}\beta+\epsilon$$
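A minimal numpy sketch of the estimator above (illustrative data-generating process, not from the text): $x$ is endogenous through an unobserved shock $u$, while the instrument $z$ is correlated with $x$ but not with the error, so $\hat{\beta}_{IV}=(X^TP_ZX)^{-1}(X^TP_Zy)$ recovers $\beta$ where OLS does not.

```python
import numpy as np

# Illustrative IV estimation via the projection matrix P_Z.
rng = np.random.default_rng(2)
n = 2000
u = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument: exogenous and relevant
x = z + u + rng.normal(size=n)         # cov(x, eps) != 0, cov(z, eps) = 0
eps = u + rng.normal(size=n)
beta = 1.5
y = beta * x + eps

X = x.reshape(-1, 1)
Z = z.reshape(-1, 1)
Pz = Z @ np.linalg.inv(Z.T @ Z) @ Z.T              # projection onto Z
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)       # biased upward
beta_iv = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)
print(beta_ols, beta_iv)
```

Forming the $n \times n$ matrix $P_Z$ is fine at this sample size but wasteful in general; in practice one runs the two stages (or uses a packaged 2SLS routine) instead.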
Although IV is considered a strong tool, it is difficult to find "good instruments", which are usually generated by real or natural experiments.