# Statistical Rethinking
The book contains lots of strong, opinionated comments, which I enjoyed every bit as much as learning the Bayesian statistics itself.
## 1. Chapter 1: The Golem of Prague
Chapter 1 makes a philosophical case for "statistical rethinking", i.e. thinking about uncertainty in the most natural way instead of the idealized way forced upon us by textbooks, a legacy of the constraints on computation and theoretical proof before the era of computers.
"Golems" are Jewish clay dolls.According to a legend a Rabbi was able to "animate it with truth" to defend Jewish people. However due to unintented consequences it led to death of people. It is cultural symbolism for a false prophet or a good intention leading to bad result. Statisticians make golems (i.e models) that produce similarly dangerous and silly behavior. We learn a set of tools (t-test, anova, chi-square,etc. ) and are provided with a rubric on when to use what tool. Like the jewish golems they can run wild and cause destruction because they are used for unintended purposes.
A lot can go wrong with statistical inference; chief among the dangers is overfitting, and it is easy to overfit. A more flexible model will almost always appear to outperform a linear model on the data it was fit to, precisely because it is overfitting.
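To see the point, here is a minimal sketch of my own (not the book's code, which is in R): a degree-9 polynomial beats a straight line on the data it was fit to, but typically does worse on fresh data from the same process.

```python
# Minimal overfitting sketch (my own illustration): a flexible polynomial
# "wins" in-sample against a line, then loses on new data from the same process.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)          # truth is linear + noise

x_new = np.linspace(-1, 1, 20)                             # fresh data, same process
y_new = 2.0 * x_new + rng.normal(scale=0.3, size=x_new.size)

for degree in (1, 9):
    coefs = np.polyfit(x, y, deg=degree)
    train_mse = np.mean((np.polyval(coefs, x) - y) ** 2)        # in-sample error
    new_mse = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)  # out-of-sample error
    print(f"degree {degree}: train MSE {train_mse:.3f}, new-data MSE {new_mse:.3f}")
# The degree-9 golem fits the sample better and predicts the new data worse.
```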
>[!quote] The point isn't that statistical tools are specialized. Of course they are. The point is that classical tools are not diverse enough to handle many common research questions.
>[!quote] Understanding any individual golem is not enough. We need some statistical epistemology, an appreciation of how statistical models relate to hypotheses and natural mechanisms of interest.
>[!quote] The greatest obstacle that I encounter among students and colleagues is the tacit belief that the proper objective of statistical inference is to test null hypotheses. The proper objective, the thinking goes, because Karl Popper argued that science advances by falsifying hypotheses. ...The above kind of folk Popperism is common among scientists, but not among philosophers of science. In fact, deductive falsification is impossible in nearly every scientific context... because 1) Hypotheses are not models 2) Measurement matters (measurements vary)
>[!quote] 3) Falsification is consensual, not logical. In light of the real problems of measurement error and the continuous nature of natural phenomena, scientific communities argue towards consensus about the meaning of evidence.
The above point is a very powerful insight. There is no single p-value that magically falsifies a hypothesis. Conclusions of science are deemed "valid" after being agreed upon by the scientific community. They may be contested and argued over by some. They are sometimes superseded by an alternative conclusion if the scientific community agrees. An example is the geocentric theory of the universe, which at one point was the scientific consensus in western Europe.
>[!quote] It may hurt the public by exaggerating the definitiveness of scientific knowledge.
Scientists tend to drink too much of their own Kool-Aid, and the "public" contains a sufficient distribution of idiots and paranoids to question any scientific belief. Exaggerating the definitiveness of scientific knowledge is hence common.
>[!quote] Nothing in the world... is truly random. Presumably, if we had enough information, we could exactly predict anything. We use randomness to describe uncertainty in the face of incomplete knowledge.
For example, we might say a coin toss is random. But if we knew the exact inputs to a coin toss (force, angle of launch, weight, CG, turbulence, etc.) we might be able to predict the exact outcome every time.
The author suggests cross-validation as a key tool for validating models.
>[!quote] Future data will not be exactly like past data, and any model that is unaware of this fact tends to make worse predictions than it could.
>[!quote] Fitting is easy. Predicting is hard
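A quick leave-one-out cross-validation sketch of my own (same toy setup as above, not the book's code): score each model only on points it never saw during fitting.

```python
# Leave-one-out cross-validation sketch: refit the model with each point held
# out, and measure how well it predicts that held-out point.
import numpy as np

def loo_cv_mse(x, y, degree):
    """Mean squared error when each point is predicted from a fit on the others."""
    errs = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i                       # hold out point i
        coefs = np.polyfit(x[keep], y[keep], deg=degree)
        errs.append((np.polyval(coefs, x[i]) - y[i]) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 20)
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)

for degree in (1, 3, 9):
    print(f"degree {degree}: LOO-CV MSE {loo_cv_mse(x, y, degree):.3f}")
# "Fitting is easy. Predicting is hard": the flexible fits lose out-of-sample.
```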
Multi-level models
>[!quote] One reason to be interested in multi-level models is because they help us deal with overfitting. Cross validation and information criteria measure overfitting risk and help us to recognize it. Multilevel models actually do something about it. They exploit an amazing trick known as partial pooling that pools information across units in the data in order to produce better estimates for all units
>[!quote] Multi-level regression deserves to be the default form of regression.
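Here is a rough sketch of the partial pooling idea, my own simplification rather than a real multilevel model: each unit's estimate gets pulled toward the grand mean, and small or noisy units get pulled harder.

```python
# Crude partial-pooling sketch (not the book's model): shrink each group mean
# toward the grand mean, weighted by how much data the group has.
import numpy as np

rng = np.random.default_rng(2)
sizes = [3, 3, 5, 5, 10, 10, 30, 30]                        # unequal sample sizes
true_means = rng.normal(loc=50.0, scale=5.0, size=len(sizes))
groups = [rng.normal(loc=m, scale=10.0, size=n) for m, n in zip(true_means, sizes)]

grand_mean = np.mean(np.concatenate(groups))                # complete pooling
sigma2 = np.mean([np.var(g, ddof=1) for g in groups])       # within-group variance
# Crude between-group variance estimate (moment-based, just for illustration).
tau2 = max(np.var([g.mean() for g in groups], ddof=1) - sigma2 / np.mean(sizes), 1e-3)

print("truth   no-pooling   partial-pooling")
for g, truth in zip(groups, true_means):
    no_pool = g.mean()                                      # this unit's data alone
    w = (len(g) / sigma2) / (len(g) / sigma2 + 1.0 / tau2)  # more data -> less shrinkage
    partial = w * no_pool + (1.0 - w) * grand_mean          # shrunk toward grand mean
    print(f"{truth:5.1f}   {no_pool:10.1f}   {partial:15.1f}")
# Small groups get pulled hard toward the grand mean; large groups barely move.
```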
Causal models
>[!quote] A statistical model is never sufficient for inferring cause.
>[!quote] Causal inference requires a causal model that is different from a statistical model
>[!quote] For **statistical models** to provide scientific insight, they require additional **scientific(causal) models**
>[!quote] The reasons for a statistical analysis are not found in the data themselves but rather in the causes of the data
>[!quote] The causes of the data cannot be extracted from the data alone. No causes in, no causes out.
i.e. you need a causal model to extract a cause... duh!
DAGs (Directed Acyclic Graphs) help to:
- **justify** scientific effort
- **expose** it to useful critique
- **connect** theories to golems
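A toy illustration of my own (not from the book): the same regression golem gives a spurious answer unless a causal model, here an assumed little DAG with a common cause Z of X and Y, tells it what to condition on.

```python
# Assumed DAG for this toy example: Z -> X and Z -> Y, with NO effect of X on Y.
# A statistical model alone finds an "effect" of X; the DAG says to adjust for Z.
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(size=n)                   # common cause (confounder)
x = z + rng.normal(size=n)               # "treatment", driven by z
y = z + rng.normal(size=n)               # outcome, driven by z, not by x

# Statistical golem alone: regress y on x and find a convincing-looking slope.
naive_slope = np.polyfit(x, y, deg=1)[0]

# With the causal model in hand, we know to condition on z as well.
X = np.column_stack([np.ones(n), x, z])
adjusted_slope = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"slope of y on x, no causal model: {naive_slope: .3f}  (spurious, about 0.5)")
print(f"slope of y on x, adjusting for z: {adjusted_slope: .3f}  (near the true 0)")
```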
## 2. Chapter 2: Small Worlds Large Worlds
When Columbus set sail, he thought, based on what he knew, that the world was smaller than it actually is.
This is an analogy for modelling: the small world is the limited data (and assumptions) you use to build and test the model; the large world is where the productionized model runs, making predictions on live new data. Models that do well in the small world don't necessarily do well in the large world.
>[!quote] In the large world there may be events that were not even imagined in the small world
>[!quote] In order to make good inferences about what actually happened, it helps to consider everything that could have happened. A Bayesian analysis is a garden of forking data, in which alternative sequences of events are cultivated. As we learn about what did happen, some of these alternative sequences are pruned. In the end, what remains is only what is logically consistent with our knowledge.
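A small sketch of the chapter's counting logic (the marble example, redone in Python by me): a bag holds four marbles, each blue or white; we observe blue, white, blue drawn with replacement, and count how many forking paths each conjecture about the bag could have taken.

```python
# Garden of forking data: count the paths through the garden that are
# consistent with the observed draws, for each conjecture about the bag.
observed = ["blue", "white", "blue"]     # draws, with replacement

# Conjectures: the bag of 4 marbles holds 0, 1, 2, 3 or 4 blue ones.
conjectures = {n_blue: {"blue": n_blue, "white": 4 - n_blue} for n_blue in range(5)}

ways = {}
for n_blue, counts in conjectures.items():
    paths = 1
    for draw in observed:
        paths *= counts[draw]            # only forks matching this draw survive
    ways[n_blue] = paths                 # sequences not pruned by the data

total = sum(ways.values())
for n_blue, w in ways.items():
    print(f"{n_blue} blue: {w} ways, relative plausibility {w / total:.2f}")
# 0, 3, 8, 9, 0 ways -> plausibilities 0.00, 0.15, 0.40, 0.45, 0.00
```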