What is the interpretation of a P-value of a hypothesis test in inferential statistics? How is it commonly misunderstood? amzn.to/3rjDOoA (Probability and Statistics with Applications: A Problem Solving Text, by Asimow and Maxwell)
#PValue #Statistics #DataScience
Links and resources
🔴 Subscribe to Bill Kinney Math: www.youtube.com/user/billkinneymath
🔴 Subscribe to my Math Blog, Infinity is Really Big: infinityisreallybig.com/
🔴 Follow me on Twitter: twitter.com/billkinneymath
🔴 Follow me on Instagram: www.instagram.com/billkinneymath/
🔴 You can support me by buying "Infinite Powers, How Calculus Reveals the Secrets of the Universe", by Steven Strogatz, or anything else you want to buy, starting from this link: amzn.to/3eXEmuA.
🔴 Check out my artist son Tyler Kinney's website: www.tylertkinney.co/
As an Amazon Associate I earn from qualifying purchases.
The p-value is always the probability of observing a test statistic as extreme as, or more extreme than, what you actually observed, under the assumption that the null hypothesis is true. That's a mouthful, but it's worth writing down, and you should know it for the exam; it may even show up as a multiple-choice question. That is always what the p-value is, in any situation. Of course, that means you've got to know a test statistic.
And then there's the logic of making a decision with p-values. This is really key as well; maybe I'll ask you a multiple-choice question about what I'm about to say. The logic is this: the p-value should be small, really small. How small is "really small"? It's a matter of opinion. And is the five percent standard that most people default to really the best thing to default to? Not necessarily; it's just habit and tradition.
If the p-value is really small, then you have observed something rare, assuming the null is true; therefore, we think the null is false. Don't ever, ever think that the p-value is the probability that the null is true. That's the common misconception: "the p-value is the probability that the null is true; it's small, therefore we think the null is false." No, that's not what it is. Nobody can figure out the probability that the null is true; nobody can do such a thing. It's impossible; it's not even a well-defined question.
Practically speaking, if you're thinking about two-tailed tests, the probability that the null is true, even though it can't really be found in any ordinary sense, you might say is zero: for a two-tailed test, the probability that mu equals 20 exactly is pretty much nil. But that's not the philosophy of hypothesis testing. The philosophy is that you give the null the benefit of the doubt until proven otherwise, so to speak. These are important issues, and it's important to help your future work colleagues understand them. They do have practical ramifications in terms of misunderstanding statistics; people use p-values all the time.
But they really don't understand them. The p-value is the probability of observing a test statistic (like a z-statistic or a t-statistic) as extreme as, or more extreme than, what you actually observed (greater than or equal to it, or less than or equal to it), under the assumption that the null is true. So when it's small, you've observed something rare if the null is true; therefore, you think the null is false.
How is this different from alpha? Alpha is the probability of a Type I error: the probability of incorrectly rejecting the null, assuming the null is true. It's when the p-value is less than alpha, when you get a p-value and it happens to be less than alpha, that you say: okay, I'm going to reject the null, and therefore my probability of a Type I error is alpha. I'll say that again... actually, let's just go ahead and compute it for an example.
Let's go ahead and suppose we get an x-bar that I know ahead of time is going to be in the rejection region in terms of x-bar. Suppose the observed value of x-bar is 18.1.
The p-value of the test is then the probability of getting x-bar less than or equal to 18.1, given that the null is true. Usually, when we compute a p-value, we don't write it in terms of x-bar; we pretty much just compute z first, assuming the null is true: subtract 20 to get -1.9, then divide by 0.8. The observed value of the z-statistic is -2.375.
So usually I don't write that first line I just wrote, but it's not a problem to write it. And it is a "less than or equal to" here because it's a left-tailed test: "less than or equal to" is what "as extreme or more extreme" means in this case. It's always with regard to the direction of the alternative hypothesis; with a left-tailed test, we want less than or equal to.
But by what I've just computed, that's the same as the probability that z is less than or equal to -2.375. And by what I've done already, that's going to be a little less than 0.01: for z = -2.375, I get approximately 0.009. So the p-value is 0.009, slightly less than alpha.
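The computation just described can be sketched in a few lines of Python (a minimal sketch using only the standard library; the numbers n = 25, sigma = 4, mu0 = 20, and x-bar = 18.1 are the example's values):

```python
from statistics import NormalDist

# Setup from the example: H0: mu = 20, left-tailed test,
# sample size n = 25, population standard deviation sigma = 4.
mu0 = 20.0
sigma = 4.0
n = 25
se = sigma / n ** 0.5          # standard error of x-bar: 4 / 5 = 0.8

x_bar = 18.1                   # observed sample mean

# Observed z-statistic: (18.1 - 20) / 0.8 = -2.375
z = (x_bar - mu0) / se

# Left-tailed p-value: P(Z <= z) assuming H0 is true
p_value = NormalDist().cdf(z)

print(round(z, 3))             # -2.375
print(round(p_value, 3))       # 0.009, slightly less than alpha = 0.01
```

Note that `NormalDist().cdf` plays the role of the z-table here: it gives the area to the left of the observed z-statistic.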
To decide to reject the null, you don't have to think about alpha. You could just say: hey, I got a small p-value; I've observed something rare when the null is true; therefore, I reject the null.
You don't have to think about alpha. However, in the classical approach, you do decide alpha ahead of time, and you do decide the rejection region; you could reject the null just by looking at the z-statistic and seeing that it's in the rejection region, without ever figuring out the p-value. The newer approach is to figure out the p-value: to specify exactly what the probability of observing a test statistic as extreme or more extreme is, if the null is true.
If you want to relate it to the classical approach, you say: it's less than alpha, so I reject. This is equivalent to z being in the rejection region. Because the p-value is less than alpha, you are rejecting the null, and therefore your probability of a Type I error is alpha. Though we get into subtleties there, too, because technically speaking, you've already done your random sampling.
So again, probabilities are really best interpreted before the fact. If you decide alpha is going to be 0.01, that's best thought of as a probability before you actually do your sampling. You've set things up: a sample size of 25 and a standard deviation of 4 for this hypothesis test. "I'm about to do the test; what are the chances that I'm going to make a Type I error?" That is the best interpretation of the 0.01. I happened to get a p-value smaller than that, so I do reject; but saying that my probability of a Type I error is now 0.01 is an after-the-fact probability, which is not quite, technically, the best interpretation of probability.
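This "before the fact" reading of alpha can be checked by simulation (an added illustration, not from the video; the values n = 25, sigma = 4, mu0 = 20, and alpha = 0.01 come from the example). With the null actually true, the alpha = 0.01 rejection rule should reject about 1% of the time in the long run:

```python
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is reproducible

mu0, sigma, n = 20.0, 4.0, 25
se = sigma / n ** 0.5                      # 0.8
alpha = 0.01
z_crit = NormalDist().inv_cdf(alpha)       # about -2.326 for a left-tailed test

trials = 50_000
rejections = 0
for _ in range(trials):
    # Simulate the sampling distribution of x-bar when H0 is true:
    # x-bar ~ Normal(mu0, sigma / sqrt(n))
    x_bar = random.gauss(mu0, se)
    z = (x_bar - mu0) / se
    if z <= z_crit:                        # the alpha = 0.01 rejection rule
        rejections += 1

type1_rate = rejections / trials
print(type1_rate)                          # close to 0.01
```

The simulated Type I error rate hovers around alpha, which is exactly the pre-sampling interpretation described above.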
Most people don't worry about it. But don't ever say the p-value is the probability that the null is true. It's not; there's no way anybody can find that.
Maybe, instead of emphasizing that the null is mu = 20 exactly, you could start emphasizing that the null is "mu is 20, with a little bit of cushion, a margin of error or something," but we're not going to get into that either; we want to keep things simple. I mention these things so that you realize you may have had misconceptions, and plenty of other people do too. Sometimes the technically bad way of thinking about it doesn't have any real effect: thinking of this as the probability of a Type I error after the fact is not a big deal, really, but it is an after-the-fact probability, which is technically not correct. It's kind of like confidence levels with confidence intervals.
If this had been a two-tailed test, with mu not equal to 20 in the alternative hypothesis, you should be able to handle it. As far as the p-value goes, you just double the one-tailed p-value. The rejection region is actually a bit trickier: you have to find two tails whose areas add up to 0.01, so the area of each would be half of that, 0.005. So with the p-value approach, just double the one-tailed p-value (for symmetric distributions, at least); with the rejection-region approach, it's a bit trickier, because you've got to split your rejection region into two pieces.
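Sticking with the example's numbers, the two-tailed version can be sketched as well (an added illustration; doubling the one-tailed p-value is valid here because the normal distribution is symmetric):

```python
from statistics import NormalDist

z_obs = -2.375                       # observed z from the example

# p-value approach: double the one-tailed tail area.
p_one_tailed = NormalDist().cdf(-abs(z_obs))
p_two_tailed = 2 * p_one_tailed
print(round(p_two_tailed, 3))        # 0.018, now bigger than alpha = 0.01

# Rejection-region approach: split alpha = 0.01 into two tails of 0.005 each.
alpha = 0.01
z_lower = NormalDist().inv_cdf(alpha / 2)    # about -2.576
z_upper = -z_lower                            # about +2.576
print(abs(z_obs) >= z_upper)                  # False: z = -2.375 is not in the two-tailed region
```

This also shows why the distinction matters: the same observed z = -2.375 rejects at alpha = 0.01 in the left-tailed test but fails to reject in the two-tailed test.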
A p-value is NOT an error rate, but alpha IS an error rate. By directly comparing the two values in a hypothesis test, it's easy to think they're both error rates. This misconception leads to the most common misinterpretations of p-values.

What is a misinterpretation of the p-value?
A common misinterpretation of a single p-value: "the p-value is the probability that the test hypothesis is true; for example, if a test of the null hypothesis gave p = 0.01, the null hypothesis has only a 1% chance of being true, and if instead it gave p = 0.40, the null hypothesis has a 40% chance of being true." Both statements are wrong.

How can p-values be misleading?
A low or high p-value does not prove anything with regard to the effectiveness of an intervention: a p-value of 0.001 does not reflect a larger effect than a p-value of 0.04. Judgements on the clinical importance of a result should be based on the size of the effect seen rather than the p-value.

Why do scientists often misinterpret the p-value as the probability of the hypothesis?
One reason scientists may misinterpret the p-value as the probability of the hypothesis is that the terminology used in statistical hypothesis testing can be confusing, particularly for those without a strong background in statistics.

What is the biggest problem with using p-values?
The p-value does not indicate the size or importance of the observed effect. A small p-value can be observed for an effect that is not meaningful or important. In fact, the larger the sample size, the smaller the minimum effect needed to produce a statistically significant p-value (see effect size).

Why are p-values not reliable?
P-values do not tell how two groups are different; the degree of difference is referred to as the "effect size." Statistical significance is not equal to scientific significance. Smaller p-values do not imply the presence of a more important effect, and larger p-values do not imply a lack of importance.

What are common p-value mistakes?
- "There is 2.9% probability the means are the same, and 97.1% probability they are different." We don't know that at all. ...
- "The p-value is low, which indicates there's an important difference in the means." ...
- "The low p-value shows the alternative hypothesis is true."
If the p-value is less than or equal to the specified significance level α, the null hypothesis is rejected; otherwise, the null hypothesis is not rejected. In other words, if p ≤ α, reject H0; otherwise, if p > α, do not reject H0.

What should you say if the p-value is not significant?
A p-value > 0.05 would be interpreted by many as "not statistically significant," meaning that there was not sufficiently strong evidence to reject the null hypothesis and conclude that the groups are different.

What are some issues that often occur with interpretation of the p-value?
Unfortunately, p-values are frequently misinterpreted. A common mistake is thinking they represent the likelihood of rejecting a null hypothesis that is actually true (a Type I error). The idea that p-values are the probability of making a mistake is WRONG!
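The p ≤ α decision rule stated above can be sketched as a tiny helper (a hypothetical function written for illustration; the name `decide` is not from any library):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p <= alpha decision rule.

    Note: "fail to reject" is NOT the same as "H0 is true,"
    and alpha, not the p-value, is the Type I error rate.
    """
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.009, alpha=0.01))   # reject H0
print(decide(0.04, alpha=0.01))    # fail to reject H0
```

The second call makes the point from the surrounding Q&A concrete: p = 0.04 is "small" by the 0.05 convention but does not reject at alpha = 0.01.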
Misleading statistics refers to the misuse of numerical data, either intentionally or by error. The results provide deceiving information that creates false narratives around a topic. Misuse of statistics often happens in advertisements, politics, news, media, and elsewhere.

How reliable are p-values?
A p-value does not give you the probability that your null hypothesis is true; it measures the evidence against it. A small p-value (less than or equal to 0.05) indicates strong evidence against the null hypothesis. A large p-value (greater than 0.05) indicates weak evidence against the null hypothesis.

Is the p-value the probability of being wrong?
No. The p-value is sometimes described as the probability of making a Type I error (rejecting the null hypothesis when it is true), but that role belongs to alpha, not the p-value. The smaller the p-value, the stronger the evidence against the null hypothesis; it is not the probability that your rejection is mistaken.

Why are p-values problematic as a measure of confidence in data?
The p-value is not easily interpretable when the tested hypothesis is defined after data dredging, when a statistically significant outcome has already been observed. If undisclosed to the reader of a scientific report, such post-hoc testing is considered scientific misconduct.

Is there enough evidence if the p-value is greater than the significance level?
A p-value greater than the significance level (typically p > 0.05) is not statistically significant and indicates insufficient evidence against the null hypothesis. This means we fail to reject the null hypothesis; it does not show that the null hypothesis is true.

What factors affect p-values?
- Effect size. It is a usual research objective to detect a difference between two drugs, procedures or programmes. ...
- Size of sample. The larger the sample, the more likely a difference is to be detected. ...
- Spread of the data.
Study design elements which can impact a P value
Multiple study design elements can have an impact on the calculated P value. These include sample size, magnitude of the relationship and error. Each of these elements may work independently or in concert to invalidate study results.
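The sample-size effect described above can be illustrated numerically (a sketch with made-up numbers: the same observed mean difference of 0.5, with sigma = 4, tested at n = 25 versus n = 400):

```python
from statistics import NormalDist

def two_sided_p(diff: float, sigma: float, n: int) -> float:
    """Two-sided one-sample z-test p-value for an observed mean difference."""
    z = diff / (sigma / n ** 0.5)
    return 2 * NormalDist().cdf(-abs(z))

# Identical effect size, different sample sizes:
p_small = two_sided_p(0.5, sigma=4.0, n=25)    # z = 0.625 -> p about 0.53
p_large = two_sided_p(0.5, sigma=4.0, n=400)   # z = 2.5   -> p about 0.012
print(round(p_small, 3), round(p_large, 3))    # 0.532 0.012
```

Nothing about the effect changed between the two calls; only the sample size did, which is why statistical significance alone says nothing about the size or importance of an effect.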
All of the following statements are common misinterpretations, and every one of them is false: "P > 0.05 is the probability that the null hypothesis is true"; "1 minus the p-value is the probability that the alternative hypothesis is true"; "a statistically significant test result (P ≤ 0.05) means that the test hypothesis is false or should be rejected"; "a p-value greater than 0.05 means that no effect was observed."

What is the p-value in layman's terms?
The p-value is the probability, computed assuming the null hypothesis is true, that random chance generated data as extreme as, or more extreme than, what was observed.

What happens if the p-value is less than the significance level?
If your p-value is less than the chosen significance level, then you reject the null hypothesis, i.e., accept that your sample gives reasonable evidence to support the alternative hypothesis.

Non-significance in statistics means that the null hypothesis cannot be rejected. In layman's terms, this usually means that we lack statistical evidence that the difference between groups is real rather than due to chance.

How do you explain "not statistically significant"?
Results are considered "statistically non-significant" if the analysis shows that differences as large as (or larger than) the observed difference would be expected to occur by chance more than one out of twenty times (p > 0.05).

How do you report an insignificant p-value?
In general, p-values larger than 0.01 should be reported to two decimal places, and those between 0.01 and 0.001 to three decimal places; p-values smaller than 0.001 should be reported as p < 0.001.

What is the relationship between error and p-value?
The p-value is instance-specific and conditional. It does not tell you the probability of a type I error, either before you sample (it can't tell you that, since it depends on the sample), or after: If p≥α then the chance you made a Type I error is zero.What is an example of misrepresentation of statistical data? ›
In 2007, toothpaste company Colgate ran an ad stating that 80% of dentists recommend their product. Based on the promotion, many shoppers assumed Colgate was the best choice for their dental health. But this wasn't necessarily true. In reality, this is a famous example of misleading statistics.

What are examples of misinterpretation?
Misinterpretation is a case of misunderstanding something. You tried to assemble a set of bookshelves, but your misinterpretation of the directions meant you ended up with a table instead!

What does "misinterpret" mean?
To form an understanding that is not correct of something that is said or done: "My speech has been misinterpreted by the press."

How do you know if a p-value is correct?
The p-value can be perceived as an oracle that judges our results. If the p-value is 0.05 or lower, the result is trumpeted as significant, but if it is higher than 0.05, the result is non-significant and tends to be passed over in silence.

When should p-values not be used?
P-values should not be used in clinical decision making by medical physicists or engineers, because p-values are often misused and misunderstood, and there are few, if any, practical uses for them in clinical practice.

Is the p-value the standard error?
No. The standard error of the mean lets the researcher construct a confidence interval in which the population mean is likely to fall; with a 95% confidence level, the interval-building procedure captures the population mean in about 95% of repeated samples. The p-value is a different quantity entirely.
If your one-way ANOVA p-value is less than your significance level, you know that some of the group means are different, but not which pairs of groups.

Should the p-value be greater than the significance level for you to reject the null hypothesis?
No, the opposite: if the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

What happens when the p-value is equal to the significance level?
When a p-value is less than or equal to the significance level, you reject the null hypothesis; a p-value exactly equal to the significance level still leads to rejection.

Why are p-values controversial?
The controversy exists because p-values are being used as decision rules, even though they are data-dependent and hence cannot be formal decision rules. Incorrectly using p-values as decision rules effectively eliminates the idea of a valid decision rule from a test, and therefore invalidates the decision.

What are the common misuses of the p-value?
Arguably the most common misinterpretation is interpreting p > 0.05 as meaning that the null hypothesis is true, and then concluding that, say, there is no association between a risk factor and a disease such as cancer, or that there is no treatment-effect difference between two interventions that are compared in a cancer ...

What error does the p-value tell you about?
The p-value only tells you how likely the data you have observed are to have occurred under the null hypothesis. If the p-value is below your threshold of significance (typically p < 0.05), then you can reject the null hypothesis, but this does not necessarily mean that your alternative hypothesis is true.

Does the p-value rule out bias?
Consequently, the p-value measures the compatibility of the data with the null hypothesis, not the probability that the null hypothesis is correct. Statistical significance does not take into account the evaluation of bias and confounding.