Inferential Statistics

Inferential statistics means drawing conclusions about a population based on data collected from a sample of individuals from that population.

• Which predictors are associated with the response? It is often the case that only a small fraction of the available predictors are substantially associated with Y. Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.

• What is the relationship between the response and each predictor? Some predictors may have a positive relationship with Y, in the sense that increasing the predictor is associated with increasing values of Y. Other predictors may have the opposite relationship. Depending on the complexity of f, the relationship between the response and a given predictor may also depend on the values of the other predictors.

• Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated? Historically, most methods for estimating f have taken a linear form. In some situations, such an assumption is reasonable or even desirable. But often the true relationship is more complicated, in which case a linear model may not provide an accurate representation of the relationship between the input and output variables.

For instance, consider a company that is interested in conducting a direct-marketing campaign. The goal is to identify individuals who will respond positively to a mailing, based on observations of demographic variables measured on each individual. In this case, the demographic variables serve as predictors, and response to the marketing campaign (either positive or negative) serves as the outcome. The company is not interested in obtaining a deep understanding of the relationships between each individual predictor and the response; instead, the company simply wants an accurate model to predict the response using the predictors. This is an example of modeling for prediction.

In contrast, consider the Advertising data. One may be interested in answering questions such as:

• Which media contribute to sales?

• Which media generate the biggest boost in sales?

• How much increase in sales is associated with a given increase in TV advertising?

This is an example of modeling for inference.
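The question about TV advertising can be made concrete with a small sketch. The numbers below are made up purely for illustration (they are not the Advertising data), and the names tv_budget and sales are assumptions. A least-squares line is fitted by hand, and its slope is read as the change in sales associated with one extra unit of TV advertising.

```python
# Hypothetical data for illustration only; not the real Advertising data.
tv_budget = [10, 20, 30, 40, 50]      # TV advertising spend (arbitrary units)
sales = [6.1, 6.4, 7.2, 7.4, 8.0]     # sales (arbitrary units)

n = len(tv_budget)
mean_x = sum(tv_budget) / n
mean_y = sum(sales) / n

# Simple linear regression by least squares:
# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tv_budget, sales))
s_xx = sum((x - mean_x) ** 2 for x in tv_budget)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

print(f"sales ~ {intercept:.2f} + {slope:.3f} * tv_budget")
```

For inference, the interest is in the slope itself (its sign and size) rather than in the predictions the fitted line produces.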

  1. Population - our entire group of interest.

  2. Parameter - numeric summary about a population

  3. Sample - subset of the population

  4. Statistic - numeric summary about a sample

Ex:
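As a rough sketch of the four terms above, the following Python snippet simulates a population and draws a sample from it. The heights, the population size, and the sample size are all hypothetical values chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Population: heights (cm) of 10,000 simulated individuals (hypothetical).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numeric summary of the whole population.
parameter = statistics.mean(population)

# Sample: a random subset of the population.
sample = rng.sample(population, 100)

# Statistic: the same numeric summary, computed on the sample only.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

In practice the parameter is usually unknown, and the statistic computed from the sample is used to draw a conclusion about it.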


Simpson's paradox

Simpson's paradox (or Simpson's reversal, Yule–Simpson effect, amalgamation paradox, or reversal paradox)[1] is a phenomenon in probability and statistics, in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.

It is possible to draw two opposite conclusions from the same data depending on how you divide it up.

Example:

When comparing the overall test scores of two schools, one school appears to do better than the other.

However, when we compare the scores by ethnicity:

The other school actually performs better. Sadly, ethnicity has an impact on how well students score on tests because of differences in socioeconomic background.

So yes, one school scores better overall, but that is because it has more socioeconomically advantaged students, not because it provides a better education.

So to know which conclusion is right, you need more context about what the statistic actually means. You can't find the answer with more statistics.
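The reversal can be reproduced with a few made-up numbers. In the sketch below, the school names, group sizes, and mean scores are all hypothetical, and the two groups are labeled by socioeconomic advantage rather than ethnicity to keep the illustration simple. School B has the higher mean score in every group, yet School A has the higher overall mean because most of its students belong to the higher-scoring group.

```python
# Hypothetical numbers: (number of students, mean score) per school and group.
scores = {
    "School A": {"advantaged": (80, 85.0), "disadvantaged": (20, 65.0)},
    "School B": {"advantaged": (20, 90.0), "disadvantaged": (80, 70.0)},
}

def overall_mean(groups):
    """Mean over all students: group means weighted by group size."""
    total_students = sum(n for n, _ in groups.values())
    total_points = sum(n * mean for n, mean in groups.values())
    return total_points / total_students

overall = {school: overall_mean(groups) for school, groups in scores.items()}

for school, groups in scores.items():
    by_group = {g: mean for g, (_, mean) in groups.items()}
    print(school, "overall:", overall[school], "by group:", by_group)
```

The overall means are weighted by group size, so a school's mix of students can outweigh how well it does within each group.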
