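anecdotal evidence: using an extreme individual case to draw a conclusion about the whole. For example, citing "my uncle smokes three cigarettes a day and is in great health" to support the claim that "smoking does no harm to the body".
type of data: before processing the data any further, first think about what type it is: qualitative (ordered or unordered) or quantitative (continuous or discrete).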
Correlation does not imply causation
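An observation gives us correlation (advanced methods can also yield causation); an experiment gives us causation.
Studies are divided into observational studies and experiments. An observational study can usually only establish association (correlation), while an experiment can establish causation.
Example: determining the effect of workOut on energyLevel.
- obs: pick one group of people who work out and one group who do not, then compare their energyLevel. This gives a correlation, but the difference in energyLevel is not necessarily caused by workOut; other uncontrolled factors (called confounding variables) may be responsible.
- exp: draw subjects from the population and randomly assign them to two groups, have one group work out and the other not, then measure energyLevel. In this respect it resembles the "control variables" method.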
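The difference between the two study types can be illustrated with a small simulation. This is only a toy sketch: the confounder (sleep quality), the probabilities, and the energy values are all made-up assumptions, and the model deliberately gives working out no effect at all on energy.

```python
import random
import statistics

def simulate(n, randomized, rng):
    """Return mean energy of the workout group minus the non-workout group."""
    workout_energy, rest_energy = [], []
    for _ in range(n):
        sleeps_well = rng.random() < 0.5  # hypothetical confounding variable
        if randomized:
            # experiment: a coin flip decides who works out
            works_out = rng.random() < 0.5
        else:
            # observation: good sleepers choose to work out more often
            works_out = rng.random() < (0.8 if sleeps_well else 0.2)
        # toy model: energy depends only on sleep, never on working out
        energy = (7.0 if sleeps_well else 4.0) + rng.gauss(0, 1)
        if works_out:
            workout_energy.append(energy)
        else:
            rest_energy.append(energy)
    return statistics.mean(workout_energy) - statistics.mean(rest_energy)

rng = random.Random(0)
obs_gap = simulate(20_000, randomized=False, rng=rng)
exp_gap = simulate(20_000, randomized=True, rng=rng)
print(f"observational gap: {obs_gap:.2f}")
print(f"experimental gap:  {exp_gap:.2f}")
```

Under these assumptions the observational comparison shows a clear gap even though working out does nothing, while random assignment balances the confounder across groups and the gap is close to zero.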
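sample bias
- convenience sample: only easily accessible cases are sampled
- non-response: only a part of the randomly sampled cases actually responds
- voluntary response: the sample consists of people who choose to respond, so the result depends on who is willing to take part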
sample methods
- simple random sample (SRS): each case is equally likely to be selected
- stratified sample: divide the population into homogeneous strata, then randomly sample from within each stratum
- cluster sample: divide the population into clusters, randomly sample a few clusters, then sample all observations within these clusters
- multistage sample: like cluster sampling, but randomly sample within the chosen clusters (e.g. when surveying a city, divide it into districts; this avoids having to visit every district)
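The four sampling methods can be sketched with the standard `random` module. The population, the district labels, and all function names below are hypothetical; for brevity the same district grouping serves as both the strata and the clusters.

```python
import random

rng = random.Random(42)

# Hypothetical population: 12 residents spread over 4 districts (3 each).
population = [(f"person{i}", f"district{i % 4}") for i in range(12)]

def group_by_district(people):
    """Map each district name to the list of people living in it."""
    groups = {}
    for person in people:
        groups.setdefault(person[1], []).append(person)
    return groups

def simple_random_sample(people, k):
    # every case is equally likely to be selected
    return rng.sample(people, k)

def stratified_sample(people, per_stratum):
    # sample from within every stratum
    groups = group_by_district(people)
    return [p for members in groups.values()
            for p in rng.sample(members, per_stratum)]

def cluster_sample(people, n_clusters):
    # pick a few clusters, then keep everyone inside them
    groups = group_by_district(people)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [p for name in chosen for p in groups[name]]

def multistage_sample(people, n_clusters, per_cluster):
    # pick a few clusters, then sample within each chosen cluster
    groups = group_by_district(people)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [p for name in chosen
            for p in rng.sample(groups[name], per_cluster)]

srs = simple_random_sample(population, 4)
strat = stratified_sample(population, per_stratum=1)
clus = cluster_sample(population, n_clusters=2)
multi = multistage_sample(population, n_clusters=2, per_cluster=2)
```

Note how the stratified sample touches every district, while the cluster and multistage samples only touch the districts that were drawn.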
principles of experimental design
1. control: compare the treatment of interest to a control group
2. randomize: randomly assign subjects to treatments
3. replicate: collect a sufficiently large sample, or replicate the entire study
4. block: block for variables known or suspected to affect the outcome
more on blocking
Design an experiment investigating whether energy gels help you run faster.
- treatment: energy gel
- control: no energy gel
- block: energy gel might affect pro and amateur athletes differently
Block for pro status:
1. divide the sample into pro and amateur
2. within each block, randomly assign the athletes to treatment and control groups
3. pro and amateur athletes are equally represented in both groups
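The blocking steps above can be sketched as follows. The athlete list, the group sizes, and the function name `blocked_assignment` are illustrative assumptions, not part of the original notes.

```python
import random

def blocked_assignment(subjects, block_of, rng):
    """Randomly split each block in half between treatment and control."""
    # step 1: divide the sample into blocks
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)

    # step 2: randomize separately inside every block
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

# Hypothetical sample: 6 pro and 10 amateur athletes.
athletes = [("pro", i) for i in range(6)] + [("amateur", i) for i in range(10)]
assignment = blocked_assignment(athletes, block_of=lambda a: a[0],
                                rng=random.Random(1))

for status in ("pro", "amateur"):
    treated = sum(1 for a, g in assignment.items()
                  if a[0] == status and g == "treatment")
    print(status, "in treatment:", treated)
```

Because the shuffle happens within each block, half of the pros and half of the amateurs end up in each group (step 3), which a single shuffle over the whole sample would not guarantee.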
experimental terminology
1. placebo: fake treatment, often used as the control group in medical studies
2. placebo effect: showing change despite being on the placebo (subjects believe they are receiving the treatment; the cause is psychological)
3. blinding: experimental units don't know which group they're in
4. double-blind: both the experimental units and the researchers don't know the group assignment
random sampling and random assignment
1. random sampling: in an observational study, randomly sample from the population.
2. random assignment: in an experiment, randomly assign subjects to the treatment and control groups.
3. random sampling happens first, then random assignment.
4. only a study using both random sampling and random assignment can be both causal and generalizable.
modality
1. unimodal
2. bimodal
3. uniform
4. multimodal
robust statistics
- center: median, not mean
- spread: IQR, not SD or range
Robust statistics are good at describing skewed data with extreme observations.
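A quick way to see why the median and IQR are called robust is to add one extreme observation to a small data set and compare how much each statistic moves. The numbers below are made up for illustration, and `iqr` is a hypothetical helper built on `statistics.quantiles` (Python 3.8+, default "exclusive" method).

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

clean = [2, 3, 3, 4, 4, 5, 5, 6, 7]
with_outlier = clean + [100]  # one extreme observation

for name, data in (("clean", clean), ("with outlier", with_outlier)):
    print(name)
    print("  mean  :", round(statistics.mean(data), 2))
    print("  median:", statistics.median(data))
    print("  SD    :", round(statistics.stdev(data), 2))
    print("  IQR   :", iqr(data))
```

The mean and SD shift substantially once the outlier is added, while the median and IQR barely move.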
transformation
1. (natural) log transformation: often applied when much of the data clusters near zero (relative to the larger values in the data set) and all observations are positive. For example, taking the log of right-skewed data makes it less skewed and its extreme values less extreme.
2. square root
3. inverse
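The effect of the log and square root transformations on skew can be checked numerically. This is a minimal sketch assuming synthetic right-skewed data drawn from a log-normal distribution; the `skewness` helper (standardized third moment) is a hypothetical convenience function, not a standard-library call.

```python
import math
import random
import statistics

def skewness(values):
    """Standardized third moment; about 0 for symmetric data, > 0 for right skew."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(7)
# Synthetic positive, right-skewed data with most values near zero.
raw = [rng.lognormvariate(0, 1) for _ in range(5000)]
logged = [math.log(v) for v in raw]   # natural log transformation
rooted = [math.sqrt(v) for v in raw]  # square root transformation

print("skew raw   :", round(skewness(raw), 2))
print("skew sqrt  :", round(skewness(rooted), 2))
print("skew log   :", round(skewness(logged), 2))
```

For this kind of data the square root reduces the skew and the log reduces it further, bringing the distribution close to symmetric.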
goals of transformations
1. see the data structure differently
2. reduce skew to assist in modeling
3. straighten a nonlinear relationship in a scatterplot