Talk:Pearson correlation coefficient

Latest comment: 6 months ago by Glenbarnett in topic Meaning of the term "product moment"

Pronunciation

The article presently says that the pronunciation of "Pearson" is /ˈpɪərsən/. This is surely incorrect. In my dialect, the natural way to say it would be more like /ˈpiəɹsən/, where the first schwa is debatable; notice the change to the first vowel and the rhotic. Indeed, this is how I have heard it pronounced throughout my personal experience.

Can we get a citation on the proper pronunciation or something? IbexNu (talk) 23:45, 28 August 2021 (UTC)

Concur. I would just "edit aggressively", in good WP fashion, if I were you, IbexNu. Jmacwiki (talk) 20:18, 29 January 2023 (UTC)

Wrong standard error formula

The formula given is wrong. There should be no square root over 1-r^2. This can be seen from the sources given: Bowley, A. L. (1928). "The Standard Deviation of the Correlation Coefficient". Journal of the American Statistical Association. 23 (161): 31–34. doi:10.2307/2277400. ISSN 0162-1459. JSTOR 2277400; and "Derivation of the standard error for Pearson's correlation coefficient". Cross Validated. Retrieved 30 July 2021. 95.104.184.242 (talk) 04:57, 20 December 2024 (UTC)

It looks like this has been corrected. Glenbarnett (talk) 10:19, 10 November 2025 (UTC)
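For anyone who wants to check the point about the square root numerically, here is a rough simulation sketch. It assumes bivariate normal data and compares the empirical standard deviation of r with two candidate large-sample expressions, (1 − ρ²)/√n and √(1 − ρ²)/√n. Both expressions, the function names and the parameter values are illustrative assumptions, not quotations from the article or from the sources cited above.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Sample Pearson correlation computed from centred sums."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_sd_of_r(rho, n, reps, seed=0):
    """Empirical SD of r over `reps` bivariate normal samples of size n."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    rs = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y = rho*x + sqrt(1 - rho^2)*noise has correlation rho with x
        ys = [rho * x + c * rng.gauss(0.0, 1.0) for x in xs]
        rs.append(pearson_r(xs, ys))
    return statistics.stdev(rs)


# Illustrative values only (assumptions, not taken from the article).
rho, n = 0.8, 50
empirical = simulate_sd_of_r(rho, n, reps=4000)
without_root = (1 - rho ** 2) / math.sqrt(n)          # no root over 1 - rho^2
with_root = math.sqrt(1 - rho ** 2) / math.sqrt(n)    # root over 1 - rho^2

print("empirical SD of r:", round(empirical, 4))
print("(1 - rho^2)/sqrt(n):", round(without_root, 4))
print("sqrt(1 - rho^2)/sqrt(n):", round(with_root, 4))
```

The sketch plugs in the true ρ rather than an estimated r, and it only speaks to the large-sample normal case; it is a sanity check, not a derivation.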

Meaning of the term "product moment"

The text seems to indicate that the expression "product moment" doesn't mean a "multiplication of moments". However, it is a weird sentence and it is hard to understand what it means. Can an editor who has the required knowledge of the subject edit that sentence and replace it with a clearer and more meaningful one? What does "product" mean there? What does "moment" mean there? And what does "product moment" ultimately mean? 78.162.44.128 (talk) 17:50, 9 June 2025 (UTC)

The term product-moment is standard statistical terminology. Both 'product' and 'moment' take their usual mathematical meanings; while it is not a product of moments, it is a moment of a product. The population Pearson correlation comes from standardizing the expectation of the product of the two random variables, μ₁₁ = E(XY) (so in turn the correlation, ρ, is itself literally the first moment of the product of standardized variables), and similarly for sample equivalents, mutatis mutandis. The first use of product moment in the correlation context that I am aware of is Pearson, K., Mathematical Contributions to the Theory of Evolution. III. Regression, Heredity, and Panmixia, Philosophical Transactions of the Royal Society of London. Series A, 1896, vol. 187, pp. 253–318, but there he borrowed the term "product moment" from the same place "moment" itself comes from: physics. It is a pretty direct analogue of the physics concept of a product-moment. As Pearson puts it (p. 265): "S(xy) corresponds to the product-moment of dynamics, as S(x²) to the moment of inertia". The discussion under Definition that's in the article now seems clearer than the phrasing you discuss, so I presume it was fixed, and I think it looks quite reasonable as it stands. Glenbarnett (talk) 11:05, 10 November 2025 (UTC)
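To make the "moment of a product" reading concrete, here is a minimal sketch for the sample case. It assumes standardization with the divide-by-n standard deviation, and it checks that the mean of the products of standardized values agrees with the usual centred-sums formula for r. The function names and the data are made up for illustration.

```python
import math


def standardize(values):
    """Centre and scale using the divide-by-n (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def r_as_product_moment(xs, ys):
    """First moment (mean) of the product of the standardized variables."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def r_centred_sums(xs, ys):
    """The usual formula: S(xy) / sqrt(S(x^2) * S(y^2)) on centred data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [2.0, 1.0, 5.0, 6.0, 12.0]

r_moment = r_as_product_moment(xs, ys)
r_usual = r_centred_sums(xs, ys)
print(r_moment, r_usual)  # the two values agree up to rounding error
```

The factors of n cancel, which is why the two forms coincide; with the divide-by-(n − 1) standard deviation the averaging would need to divide by n − 1 as well.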

Last sentence of "Testing using Student's t-distribution" is wrong

The sentence I'm referring to presently says:

In the case where the underlying variables are not normal, the sampling distribution of Pearson's correlation coefficient follows a Student's t-distribution but the degrees of freedom are reduced

- this broad claim is demonstrably false in general.

It is a simple matter to produce examples where neither variable is marginally normal but the usual exact small-sample t-distribution applies with no loss of d.f. in the t, and on the other hand it's also quite simple to produce examples where (i) neither variable, (ii) exactly one variable, or (iii) both variables are marginally normal but the test statistic demonstrably does not have a t-distribution.

It seems likely the author of that sentence at least partly misunderstood the reference they give. I can't see the entire paper in the link from ref 29 but from what is viewable at the link it's clear that (i) the article is talking about *filtered* time series; (ii) that it is dealing with non-independence between pairs of values (not just correlation within-pairs under different marginal distributions as the quoted sentence suggests); (iii) some loss of d.f. arises as a result of that filtering operation (I anticipate that there may also be some d.f. loss resulting from the original serial dependence, but that's not explicit in what I could make out from the link). It also looks like in the general case it's not actually t-distributed, but rather is relying on a large-sample approximation by a t-distribution. With suitable corrections and caveats the paper's conclusions (that an approximate large-sample t-test for a null of zero cross-series correlation in filtered dependent time series can be obtained but has lower d.f. than the usual test) could perhaps fit in a subsection relating to serial dependence or time series, but it does not seem to belong in a paragraph talking about non-normality in general, at least not in anything very like its present form.

Some suitable references discussing non-normal cases are given in this stackexchange answer: https://stats.stackexchange.com/a/196806/805
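As a rough illustration of the first claim above (non-normal marginals with the usual t-distribution and no loss of d.f.), here is one possible construction, offered as an assumption rather than as the example the comment had in mind. X is i.i.d. exponential; Y is a vector of i.i.d. standard normals multiplied by a single random scale shared across the sample, so each Y value is marginally t-distributed with 3 d.f. Because r is unchanged by rescaling Y, the statistic t = r√(n − 2)/√(1 − r²) should behave exactly as in the normal case. Note that the Y values within a sample are uncorrelated but not independent in this construction.

```python
import math
import random

N = 10               # sample size, so the test has N - 2 = 8 d.f.
T_CRIT_DF8 = 2.306   # two-sided 5% critical value of Student's t with 8 d.f.


def pearson_r(xs, ys):
    """Sample Pearson correlation computed from centred sums."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejection_rate(reps, nu=3.0, seed=1):
    """Share of samples where |t| exceeds the 5% critical value under the null."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # X: i.i.d. exponential, clearly non-normal.
        xs = [rng.expovariate(1.0) for _ in range(N)]
        # One shared scale per sample: 1 / sqrt(chi-square_nu / nu).
        scale = 1.0 / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
        # Each y is marginally t with nu d.f.; the ys share the same scale.
        ys = [scale * rng.gauss(0.0, 1.0) for _ in range(N)]
        r = pearson_r(xs, ys)
        t = r * math.sqrt((N - 2) / (1.0 - r * r))
        if abs(t) > T_CRIT_DF8:
            rejections += 1
    return rejections / reps


rate = rejection_rate(20000)
print("rejection rate at nominal 5%:", rate)
```

A rejection rate close to the nominal 5% is what the exact t result predicts here. The sketch does not cover the converse cases (i) to (iii), nor the filtered time-series setting of the cited paper.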