Talk:Cross-correlation


Old comments

This page is very difficult to follow when you expect a form of cross-correlation that isn't signal processing. Why not start with a general description? —Preceding unsigned comment added by Quatch (talk • contribs) 16:14, 2 September 2009 (UTC)

I have a more basic problem with this entry than all of these other entries!
The author has described a CROSS CORRELATION as having a peak at zero shift.
THAT, MY FRIENDS, is an AUTO CORRELATION FUNCTION!!!
The CROSS CORRELATION FUNCTION has a peak at the time (or time shift) that is the DIFFERENCE in time betwixt the arrival times of two nominally identical signals received at differing times - the Cross Correlation function measures THE DELAY TIME betwixt the arrival times of two nominally identical signals.
Long ago, while employed at National Technical Systems (NTS), I wrote software on an HP 5451C mainframe-based Fourier Analyzer/data acquisition system for determining this and all manner of other typical signal processing functions.
This was decades before Matlab and similar applications!
We were on contract from DOD to determine if one could tell if a missile (can't remember which, too long ago!) was present in a launcher, or if there was a dummy present, using remote sensing techniques.
I used Bendat and Piersol's tomes on these subjects to write my programs.
I'm just say'n.
Somebody needs to fix this article. Not me, and not tonight. Pcmacd (talk) 02:24, 17 November 2022 (UTC)

building up text to add once verified..

given a reference signal and an input signal,
sref = 01011010010110000010111101111001001011010010111000100101101111
sinp = 01111011100100111000000111110011110010011100100111100001001110
the cross-correlation of the reference signal with the input signal reaches its maximum of 0.61 when the input signal is rotated to the left 5 places (Δt = -5).
 

Waveguy 06:10, 28 Jan 2005 (UTC)
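
One way the figure above could be checked, as a minimal sketch only: the snippet below rotates the input signal and computes a Pearson-style correlation against the reference at each rotation. The normalization is an assumption (the comment does not say how 0.61 was obtained), and the function name `circular_xcorr` is hypothetical, so the printed result should be compared against the quoted value rather than taken as confirmation of it.

```python
import numpy as np

def circular_xcorr(ref_bits: str, inp_bits: str) -> np.ndarray:
    """Pearson correlation between ref and every left rotation of inp."""
    ref = np.array([int(b) for b in ref_bits], dtype=float)
    inp = np.array([int(b) for b in inp_bits], dtype=float)
    if ref.size != inp.size:
        raise ValueError("signals must have equal length")
    # Standardize both signals (zero mean, unit population std).
    ref = (ref - ref.mean()) / ref.std()
    inp = (inp - inp.mean()) / inp.std()
    # np.roll(inp, -k) rotates inp to the left by k places.
    return np.array([np.mean(ref * np.roll(inp, -k)) for k in range(ref.size)])

sref = "01011010010110000010111101111001001011010010111000100101101111"
sinp = "01111011100100111000000111110011110010011100100111100001001110"

scores = circular_xcorr(sref, sinp)
best = int(np.argmax(scores))
print("best left rotation:", best, "score:", round(float(scores[best]), 2))
```

A different normalization (for example, counting matching bits) would give different numbers, which is part of why the text should be verified before it goes into the article.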


Things to cover:

  • Different variations, like the above binary signals, "regular old" digital signals like PCM audio, 2D cross-correlation of images, etc.
  • Circular cross-correlation
  • Faster calculation with the use of FFTs

- Omegatron 04:41, Mar 20, 2005 (UTC)

Move to Cross covariance

Please look at cross covariance article.

I moved the original definition (without dividing by the sigmas) to the cross covariance page. I know there is a lot of disagreement on the difference between covariance and correlation, or whether there is a difference, but it seems to be the consensus of the relevant pages that correlation does involve dividing by the sigmas, while covariance does not. See Covariance matrix for a short discussion. So, since the new stuff added did not divide by the sigmas, I reverted back to the original. Here is a table I have been using for the relevant pages.

NO SIGMAS                         | WITH SIGMAS
Covariance                        | Correlation
Cross covariance                  | Cross correlation (see ext)
Autocovariance                    | Autocorrelation
Covariance matrix                 | Correlation matrix
Estimation of covariance matrices |

PAR 02:35, 10 July 2005 (UTC)

Discrete-Time Signal Processing by Oppenheim, Schafer, and Buck, which is the definitive textbook for DSP, defines the cross-correlation of two signals without dividing by any sigma. Numerical Recipes in C by Press et al. also defines it without dividing by sigma.
What you are calling the "cross correlation", dividing by sigma, is called the "linear-correlation coefficient" in the statistics text I happen to have on my shelf (Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson.)
Perhaps there is a difference in usage between the statistics and signal-processing/engineering communities. Even if so, it is not Wikipedia's place to anoint one usage as the "right" one.
—Steven G. Johnson 19:13, July 10, 2005 (UTC)

I'm not anointing here, I'm just trying to clarify things. Looking at the table above, the cross-correlation was the only article that was in conflict with every other article in the table as far as the sigmas were concerned, so I changed it. If you have a better idea, let's do it. PAR 01:52, 11 July 2005 (UTC)

Please realize that the comments here apply to every other correlation article in the table. I think that the articles should list forms both with and without sigma, probably merging the covariance/correlation articles to avoid duplication, explain the context for the different usages in signal processing and statistics, and explain the impact of the sigma. As it stands, Wikipedia is anointing one particular usage as the correct one, which is wrong. —Steven G. Johnson 15:55, July 11, 2005 (UTC)
It seems that the definition is ambiguous. Either we need to find the dominant definition and go with that, or we have to present both. Cburnett 19:23, July 11, 2005 (UTC)

After checking 7 different statistics books, the following is unanimous:

  • The covariance of two different random variates X and Y is
Cov(X,Y)=E( (X-E(X)) (Y-E(Y)) ) where E(X) is expectation value of X.
  • The (linear) correlation coefficient is
R(X,Y) = Cov(X,Y)/(S(X) S(Y)) where S(X) is the std. deviation of X.
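
To make the two textbook definitions above concrete, here is a minimal sketch that estimates them from samples. It assumes population (divide-by-N) estimates in place of the expectation E, and the function names `covariance` and `correlation_coefficient` are only illustrative.

```python
import numpy as np

def covariance(x, y):
    """Sample version of Cov(X,Y) = E((X - E(X)) (Y - E(Y)))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.mean((x - x.mean()) * (y - y.mean()))

def correlation_coefficient(x, y):
    """Sample version of R(X,Y) = Cov(X,Y) / (S(X) S(Y))."""
    # np.std defaults to the population standard deviation (ddof=0),
    # which matches the divide-by-N covariance above.
    return covariance(x, y) / (np.std(x) * np.std(y))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]
print(covariance(x, y), correlation_coefficient(x, y))
```

The only difference between the two quantities is the division by the standard deviations, which is the "with sigmas / no sigmas" distinction discussed in this thread.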

Oppenheim et al. are the only ones to define cross-covariance and cross-correlation, and they do it in a very consistent way:

cross correlation  
cross covariance  
autocorrelation  
autocovariance  

I think it's clear that my moving cross-correlation to cross-covariance was wrong. It's not the sigmas that distinguish correlation from covariance, it's the subtraction of the means. The division by the sigmas is another issue. I would like to alter all articles to conform with Oppenheim's definition, and add very clearly the alternate definitions. There will be no conflict with the 7 books I mentioned, but there will be a clear conflict with the autocorrelation article as it stands. I understand that we do not want to favor a particular set of definitions if there is any controversy, but it seems that the controversy is not so bad, and we do want some clarity and predictability to these articles, rather than conflicting or missing definitions. I will make these changes in a day or two unless there is an objection. PAR 00:18, 12 July 2005 (UTC)
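
For readers following the thread, the four quantities named above can be sketched as follows for random sequences x_n and y_m. The symbols (φ, γ, and the means m) are an assumption chosen to follow common DSP textbook style; they are not a transcription of the formulas that were originally posted here.

```latex
\begin{align}
\phi_{xy}[n,m]   &= E\{x_n\, y_m^{*}\}                          && \text{(cross-correlation)} \\
\gamma_{xy}[n,m] &= E\{(x_n - m_{x_n})(y_m - m_{y_m})^{*}\}     && \text{(cross-covariance)} \\
\phi_{xx}[n,m]   &= E\{x_n\, x_m^{*}\}                          && \text{(autocorrelation)} \\
\gamma_{xx}[n,m] &= E\{(x_n - m_{x_n})(x_m - m_{x_m})^{*}\}     && \text{(autocovariance)}
\end{align}
% with m_{x_n} = E\{x_n\}, so that
% \gamma_{xy}[n,m] = \phi_{xy}[n,m] - m_{x_n} m_{y_m}^{*}.
```

Written this way, the correlation and covariance forms differ only by the subtraction of the means, and they coincide whenever either mean is zero.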

I have also seen "correlation function" used in physical contexts for the subtracted-mean version. The difference is also blurred because in many important cases the mean is zero. I would prefer if auto-correlation, auto-covariance, cross-correlation, and cross-covariance were all defined on a single page (with appropriate redirects). Mathematically, they are so closely related that it hardly makes sense to me to have separate pages. (I'm not sure what to do about the dividing-by-sigma variant, since I'm not so familiar with that). —Steven G. Johnson 01:37, July 12, 2005 (UTC)

OK - how about this: A page entitled "covariance and correlation" or something. It explains that there are conflicting definitions. Then it adopts Oppenheim's definitions for the sake of clarity, not because they are best, but because we need to settle on some consistent way of speaking about things. Then it redirects to the various pages, each of which is rewritten consistent with these definitions, including the important alternate definitions. They are also rewritten as if they might be subsections of the main page. If after this is all done, they look like they ought to go in the main article, we can do that. That way there's no big lump of work that needs to be done, it can all be done slowly, in pieces. PAR 02:07, 12 July 2005 (UTC)

Sounds good to me. —Steven G. Johnson 02:45, July 12, 2005 (UTC)

Ok - I put up a "starter page" at Covariance and correlation. PAR 04:06, 12 July 2005 (UTC)

By the way, I don't think there is anything wrong with an editorial policy that enforces consistent terminology. This is not at odds with NPOV: alternative definitions should be mentioned, but for the sake of coherence and consistency a common set of terms should be used. See for example groupoid vs. magma (algebra) for a precedent. --MarkSweep 16:58, 12 July 2005 (UTC)

some example

I need examples of mean, median, mode, variability, range, variance, correlation, standard deviation, and skewness related to marketing.

f*?

The article mentions f* in many equations but doesn't define it. What's f*? —Ben FrantzDale 20:07, 19 December 2006 (UTC)

A superscript asterisk indicates the complex conjugate. --Abdull 21:06, 20 May 2007 (UTC)

Zero-Lag?

Could someone please add information on the zero-lag? We also need to mention that cross-correlation operation is associative, distributive but not commutative.

Cross-correlation and convolution

I don't see yet why  . Let's simplify things to f(t) and g(t) being real functions, therefore  .

As  , what does   look like expressed in integral form?

Besides, if convolution is commutative and cross-correlation is not commutative why can you say   at all? --Abdull 21:23, 20 May 2007 (UTC)Reply

answer to the commutativity question:

 

 

 

Confusion with definition on convolution page

The definition used on the convolution page is inconsistent with the one used here, so it is unclear how to derive the first property of the cross-correlation starting from the definition of convolution currently being used. Would it be such a bad thing to show the entire derivation here, i.e. just fill in the 2-3 steps between the correlation of f and g and the convolution of f*(-t) and g? Alternatively, one could define the convolution in terms of the covariance, since it would not require a definition that's not listed on the page and would create consistency. --Kdmckale (talk) 13:17, 18 October 2015 (UTC)
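
The missing steps requested above might look like the following sketch. It assumes the definitions (f ⋆ g)(τ) = ∫ f*(t) g(t+τ) dt for cross-correlation and (f ∗ g)(τ) = ∫ f(t) g(τ−t) dt for convolution; if the article uses a different sign or conjugation convention, the steps change accordingly. The helper function h is introduced only for this derivation.

```latex
\begin{align}
h(t) &:= \overline{f(-t)} \\
(h * g)(\tau)
  &= \int_{-\infty}^{\infty} h(u)\, g(\tau - u)\, du
   = \int_{-\infty}^{\infty} \overline{f(-u)}\, g(\tau - u)\, du \\
  &= \int_{-\infty}^{\infty} \overline{f(t)}\, g(\tau + t)\, dt
   \qquad (t = -u) \\
  &= (f \star g)(\tau)
\end{align}
% Swapping the arguments under the same definitions gives
% (g \star f)(\tau) = \overline{(f \star g)(-\tau)},
% which is why cross-correlation is not commutative even though convolution is.
```

Under these assumptions the identity involves no extra minus sign in front of the conjugated function; only its argument is reversed.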

Corrected a mistake in the definition of cross-correlation as convolution:

  •  

Added property that:

  •  

Sergio Rescia (talk) 17:06, 9 February 2018 (UTC)

Dangling citation

The in-text citation "(Campbell, Lo, and MacKinlay 1996)" isn't all that useful without the actual reference. Could whoever put that in please add the full citation in a "references" section?

appropriate integration limits

The article currently does not describe what the appropriate integration limits are. Can someone who knows the answer please add them? Ngchen 14:58, 7 October 2007 (UTC)

As an example of (to me) confusing integration limits, consider the convolution of two distributions X ~ Gamma( kx, tx ) and Y ~ Gamma( ky, ty ) (where ki is always >0) -- the goal being to derive a distribution for the difference d=X-Y. It took me most of an evening to realize that the answer is piecewise, with different formulas for d<0 and d>0 (because you have to make sure the arguments of the Gamma distributions are positive -- I think). I eventually figured it out :), but perhaps someone with a better understanding could clarify? Anon.

Possible split

I think that the usage and terminology in "signal processing" and in "statistics" are so different that a split into articles specific to each is required. Melcombe (talk) 11:47, 26 March 2008 (UTC)

Things to add (but not by me)

Can someone please add some words about cross-correlating in the frequency domain, a basic description, its advantages (computing economy), disadvantages (only circular correlation unless periods are lengthened and high-pass is used), etc.

Can someone also add a few words about multidimensional correlation and how it reduces to sets of linear correlation?

Fleem (talk) 11:13, 26 September 2008 (UTC)

Another thing that might be included here or in a disambiguation page is the computing definition of the term. Here is a brief definition: a cross-correlator is a generic name for a process that compares (usually) two input streams and usually produces 'summing' or 'difference' streams. All streams will follow their own defined rules, and 'summing' and 'difference' are generic labels rather than mathematical terms. The centre of the definition is that it is about feature comparison. Common examples might include text searching algorithms in word processors or database analysis, or various levels in image processing. Lucien86 (talk) 16:22, 13 May 2009 (UTC)

Confusing Definition

For some reason, when working with cross-correlations, the notation for the time lag "t" and the time variable "tau" is switched from the usual convention of little "t" as time and "tau" as a temporal shift. I think the definition could be substantially improved if the time lag is stated explicitly as "t", as is done in "Numerical Recipes in C", 2nd Ed, on page 498, Eq 12.0.10. I will not change the page, however, because I am not in the habit of changing pages before I get feedback from others.

Bjf624 (talk) 17:07, 18 February 2009 (UTC)

I agree, usually "tau" is used for the lag, and "t" for the dummy variable in the integration. --98.201.96.169 (talk) 23:29, 30 January 2011 (UTC)

Help please

Is there any accepted definition of when a cross-correlation can be rated as "good"? Assuming any given normalized cross-correlation, a correlation coefficient of 1/-1 points to identical functions / inverted identical functions, while a coefficient of 0 clearly defines a "non-correlation" (or orthogonal correlation). But what about the values in between? Is 0.8 a good correlation, or would 0.9 do the job? Is there a general classification of coefficients in terms of correlation quality? Or does that very much depend on the subject of application? I understand that the whole approach of cross-correlation is a statistical one. Hence there may even be research into that field, describing correlation quality (cross-correlation coefficients) by statistical values such as standard deviation? Could someone knowledgeable maybe give some references and include information in the article? Thank you! --194.246.46.15 (talk) 16:13, 1 July 2009 (UTC)

No. Sometimes an apple is just an apple. --Drizzd (talk) 11:55, 4 July 2009 (UTC)
Yes, there is more that could be said about that. It depends a lot on your signal-to-noise ratio and on the frequency of your expected signal. If you have zero noise, then 1 means perfect alignment and anything else is imperfect alignment. If you have band-limited signals (no infinite spatial frequencies), then values near 1.0 mean near-perfect alignment. Noise means you can never expect 1.0 for perfect alignment, and the worse the SNR, the lower the maximum expected correlation value. So yes, the value means something, but no, it isn't directly interpretable. E.g., a maximum correlation of 0.99 doesn't mean there is a 99% chance that that is the correct correlation. You have to think about what could cause less-than-perfect correlation values. —Ben FrantzDale (talk) 19:42, 6 July 2009 (UTC)
Intermediate values of correlation have an interpretation in terms of how well one series can be predicted from another, in the sense that one minus the square of the correlation says what proportion of the variance of one series would be left as an error variance if a simple shift and scale is applied optimally from one series to the other. But in signal processing you could hope to do better than this by using an optimally filtered version of one series to predict the other. Ideas about prewhitening both series could come in here. But even a small amount of correlation could be important depending on what is being predicted ... even a small improvement in predictive power might be important. Otherwise it may be worth looking at other related articles such as correlation function and correlation. Melcombe (talk) 16:03, 7 July 2009 (UTC)
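
The "one minus the square of the correlation" interpretation can be illustrated numerically. The following is a rough sketch with made-up data (the coefficients 0.8 and 0.6 and the seed are arbitrary assumptions): it fits the best scale and offset from one series to the other by least squares and compares the leftover variance fraction with 1 - r^2.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
x = rng.normal(size=1000)
y = 0.8 * x + 0.6 * rng.normal(size=1000)

r = np.corrcoef(x, y)[0, 1]

# Optimal scale and offset (ordinary least squares) for predicting y from x.
slope, intercept = np.polyfit(x, y, 1)
residual = y - (slope * x + intercept)

# Fraction of the variance of y left over as error variance.
fraction_left = residual.var() / y.var()
print(round(r, 3), round(fraction_left, 3), round(1 - r**2, 3))
```

The last two printed numbers agree, so a correlation of 0.8 still leaves about 36% of the variance unexplained, which is one reason a single threshold for a "good" correlation is hard to give.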

Incorrect formulation of normalised cross-correlation

The formula given for the normalised cross-correlation is wrong/misleading. The normalisation varies over the correlation and is recalculated for each template position. Refer to Lewis's paper (http://scribblethink.org/Work/nvisionInterface/nip.pdf) for more information. --StefanVanDerWalt (talk) 09:50, 6 December 2009 (UTC)
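
A one-dimensional sketch of what "normalisation recalculated for each template position" means is given below. It is a plain loop written for clarity, not the fast method from the cited paper, and the function name `normalized_xcorr` and the sample data are illustrative assumptions.

```python
import numpy as np

def normalized_xcorr(signal, template):
    """Normalised cross-correlation with per-position (local) normalisation."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    n = template.size
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))
    out = np.zeros(signal.size - n + 1)
    for k in range(out.size):
        window = signal[k:k + n]
        w = window - window.mean()                 # local mean under the template
        denom = t_norm * np.sqrt(np.sum(w ** 2))   # local energy under the template
        if denom > 0:                              # flat windows are left at 0
            out[k] = np.sum(w * t) / denom
    return out

template = [1.0, 3.0, 2.0]
# The template appears at index 2, scaled by 5 and offset by 10.
signal = [4.0, 4.0, 15.0, 25.0, 20.0, 4.0, 4.0]
scores = normalized_xcorr(signal, template)
print(int(np.argmax(scores)), scores.round(2))
```

Because the mean and energy are taken from the window under the template, the score at the matching position is 1 even though the embedded copy has a different gain and offset; a single global normalisation would not give that.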

Known properties of cross-correlation should be added

I would suggest that all useful properties, such as distributivity etc., be listed in a way analogous to how it is done in the convolution article. Furthermore, the notation with "h(-)" as used in the properties section was new to me. I didn't find a definition anywhere. Cheers, Jo. —Preceding unsigned comment added by 129.132.71.148 (talk) 07:40, 6 April 2010 (UTC)

different asterisks

>> If either f or g is Hermitean, then f*g=f*g.

At first glance, this seems to be a trivial identity which holds for _any_ f and g. At a closer look, you notice the difference between the fivefold and the sixfold asterisk. I think this can easily lead to confusion. Should we use a different notation? DrBaumeister (talk) 01:31, 24 April 2011 (UTC)

x(n)={1,2,3,4,8,-1,-7,-6} find

  a)auto correlation of x[n]
  b)cross correlation of x[n] with {-6,-7,-1,8,4,3,2,1}  — Preceding unsigned comment added by 14.139.128.15 (talk) 10:48, 12 July 2011 (UTC)
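
For what it is worth, both quantities can be computed with NumPy as in the sketch below. Note the assumption: `np.correlate` uses its own lag and conjugation convention, which may differ from the one in the article or in a given textbook, so the ordering of the output along the lag axis should be checked against the definition being used.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 8, -1, -7, -6])
y = np.array([-6, -7, -1, 8, 4, 3, 2, 1])   # this is x reversed

# "full" mode returns all len(x) + len(y) - 1 lags.
auto = np.correlate(x, x, mode="full")
cross = np.correlate(x, y, mode="full")
lags = np.arange(-(len(y) - 1), len(x))     # lag axis for the "full" output

print(dict(zip(lags.tolist(), auto.tolist())))
print(dict(zip(lags.tolist(), cross.tolist())))
```

Since the second sequence is the first one reversed, correlating against it is the same as convolving x with itself, which is a handy cross-check.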

Implementation of Circular Cross-Correlation via FFTs

When implementing a cross-correlation in digital logic, it is often useful to implement the algorithm as a circular cross-correlation through the use of FFTs.

In the coming days, I plan to produce an Algorithms section, including a description and an implementation of the FFT-based approach.

I will also include an implementation of the "shift, multiply and sum" approach. Does anyone know if there is a formal name for what I am calling the "shift, multiply, and sum" approach?

Also: does anyone know of any other algorithms?

Kyle.drerup (talk) 05:26, 25 January 2012 (UTC)

Since cross-correlation is equivalent to convolution with a sign flip in the function argument, every fast convolution algorithm gives a fast correlation algorithm. See Convolution#Fast convolution algorithms. (Unfortunately, that section needs a lot of work. It doesn't even mention the Karatsuba algorithm or the Toom-Cook algorithm, and those algorithms in turn are described in their articles as "multiplication algorithms" when in fact they can be used for any convolution problem.) — Steven G. Johnson (talk) 14:41, 25 January 2012 (UTC)

Vitya 28.10.2015

"Does anyone know if there is a formal name for what I am calling the "Shift, multiply, and sum" approach?"
I have read in a few places that it is called the "brute-force algorithm".
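
As a starting point for the proposed Algorithms section, the two approaches discussed in this thread could be sketched as follows. The convention c[k] = sum over n of conj(f[n]) g[(n+k) mod N] is an assumption, and the function names are illustrative only.

```python
import numpy as np

def circular_xcorr_direct(f, g):
    """Shift, multiply and sum: c[k] = sum_n conj(f[n]) * g[(n + k) mod N]."""
    f = np.asarray(f)
    g = np.asarray(g)
    # np.roll(g, -k)[n] equals g[(n + k) mod N].
    return np.array([np.sum(np.conj(f) * np.roll(g, -k)) for k in range(f.size)])

def circular_xcorr_fft(f, g):
    """Same quantity via the DFT: C[m] = conj(F[m]) * G[m]."""
    return np.fft.ifft(np.conj(np.fft.fft(f)) * np.fft.fft(g))

rng = np.random.default_rng(0)
f = rng.normal(size=16)
g = np.roll(f, 3)                      # g is f delayed circularly by 3 samples

direct = circular_xcorr_direct(f, g)
fast = circular_xcorr_fft(f, g)
print(np.allclose(direct, fast), int(np.argmax(direct.real)))
```

The direct version costs O(N^2) operations and the FFT version O(N log N). For a linear (non-circular) correlation, both inputs would first be zero-padded to at least len(f) + len(g) - 1 samples so that the wrap-around terms vanish.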

Hermitian properties

>> If either f or g is Hermitian, then:   

The same statement can also be found in the Hermitian page.

Is that true? I think only if   is Hermitian then  . Can anyone show the proof of   if   is Hermitian? -Clarafly (talk) 07:23, 17 June 2012 (UTC)

I agree that this statement is incorrect. Counterexample: let f be anything not Hermitian and g the Dirac impulse. Then   and  . --Drizzd (talk) 13:59, 17 June 2012 (UTC)
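
The counterexample above can be checked numerically with a discrete stand-in. This sketch assumes the cross-correlation is computed as the convolution of the conjugated, time-reversed f with g, and that each three-sample sequence is centred on n = 0 so that reversing the array corresponds to f(-n); the helper name `xcorr` is hypothetical.

```python
import numpy as np

def xcorr(f, g):
    """Cross-correlation as conj(f(-n)) convolved with g (sequences centred on n = 0)."""
    return np.convolve(np.conj(f[::-1]), g)

delta = np.array([0, 1, 0])               # unit impulse at n = 0 (Hermitian)
f_plain = np.array([1, 2, 3])             # real but not even, so not Hermitian
f_herm = np.array([1 - 2j, 5, 1 + 2j])    # f(-n) = conj(f(n)), so Hermitian

differs = not np.array_equal(xcorr(f_plain, delta), np.convolve(f_plain, delta))
agrees = np.allclose(xcorr(f_herm, delta), np.convolve(f_herm, delta))
print(differs, agrees)   # → True True
```

So with g Hermitian (the impulse) but f not, correlation and convolution differ, while they agree once f itself is Hermitian.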

Visual comparison graphs are offset

The "Visual comparison" picture is a great idea, but the lines that represent f*g, f♦g and f♦f (where I'm using "♦" for correlation since I can't find a star symbol) are offset. Assuming f(t) and g(t) use the same time scale, f*g should be shifted right and f♦g and f♦f should be shifted left. To be a bit more precise, and for the sake of simplicity assuming f(t) and g(t) are nonzero between t=0 and t=1 (i.e., defining t=0 at the leading edges of f and g, and t=1 at the falling edge of f and where g(t) goes to zero again), f*g is zero for t<0 and maximum when t=1, whereas f♦g and f♦f are zero for t>1 and maximum when t=0. It's also really easy to read the wrong thing into the graphs of the overlapping functions, but on the other hand there's a lot of intuitive value in them as well, so I'm not sure I would change that. But if there's a way to change the picture to shift the black lines, it would be a lot clearer. — Preceding unsigned comment added by Gdlong (talk • contribs) 20:56, 15 November 2013 (UTC)


I too have noticed this. The cross-correlation figure is wrong!! — Preceding unsigned comment added by Ahmedrashed00 (talk • contribs) 11:58, 7 July 2015 (UTC)

Problem with Nonlinear Section

"This problem arises because some moments can go to zero and this can incorrectly suggest that there is little correlation between two signals when in fact the two signals are strongly related by nonlinear dynamics."

If the moments go to zero, and the correlation becomes zero, then it isn't incorrect to suggest that there is little correlation between two signals. The writer seems to be confusing "correlation" with "connection" or some other word that means a relationship exists. The problem arises because correlation is a measure of linear dependence, so it makes sense for nonlinear dependencies to circumvent the measurement. A Wikipedia article itself describes this for random variables (and stochastic processes are simply a series of random variables, so it applies just as well): http://en.wikipedia.org/wiki/Correlation_and_dependence — Preceding unsigned comment added by 71.80.79.67 (talk) 09:40, 6 February 2014 (UTC)
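
A small numerical illustration of this point, offered as a sketch with made-up data: y below is completely determined by x, yet the linear correlation coefficient is zero because the dependence is purely nonlinear and x is symmetric about zero.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)   # symmetric about zero
y = x ** 2                        # fully dependent on x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(abs(r) < 1e-9)              # correlation is zero up to rounding error
```

So a zero correlation here is a correct statement about linear dependence; it just says nothing about whether some other relationship exists.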

Error in Time series Analysis Section

The writer says, correctly, that in time series analysis the cross-correlation is the normalized covariance function (i.e. Pearson's correlation coefficient). The article then shows, in LaTeX, the definition of the regular covariance function without normalization by the variances of the two random variables. — Preceding unsigned comment added by 71.80.79.67 (talk) 05:10, 9 February 2014 (UTC)

When variables f and g are normalized, the cross-correlation is identical to Pearson's r.
Cross-correlation = SUM[f(i)*g(i-lag)]/N_overlap, where normalizing the raw data time series F and G means: f(i)=[F(i)-F_mean]/F_stdev and g(i)=[G(i)-G_mean]/G_stdev
This is exactly the same as:
Cross-correlation=S_FG/Sqrt(S_FF*S_GG)
where:
S_FG=SUM[(F(i)-F_mean)*(G(i-lag)-G_mean)]
S_FF=SUM[(F(i)-F_mean)^2] for all i
S_GG=SUM[(G(i)-G_mean)^2] for all i
85.221.95.150 (talk) 19:19, 6 April 2024 (UTC)
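
The equivalence with Pearson's r can be checked with a short sketch. One assumption needs stating loudly: here the means and standard deviations are computed over the overlapping samples only. In that case the match with Pearson's r is exact at every lag; if they are instead computed once over all i, the two expressions agree exactly at lag 0 and only approximately at other lags. The function name `xcorr_at_lag` and the test data are illustrative.

```python
import numpy as np

def xcorr_at_lag(F, G, lag):
    """SUM[f(i) * g(i - lag)] / N_overlap, standardising on the overlapping samples."""
    F = np.asarray(F, dtype=float)
    G = np.asarray(G, dtype=float)
    if lag >= 0:
        a, b = F[lag:], G[:len(G) - lag]      # pairs F(i) with G(i - lag)
    else:
        a, b = F[:len(F) + lag], G[-lag:]
    f = (a - a.mean()) / a.std()              # population std (divide by N)
    g = (b - b.mean()) / b.std()
    return np.mean(f * g)

rng = np.random.default_rng(1)
F = rng.normal(size=50)
G = 0.5 * F + rng.normal(size=50)

for lag in (0, 2, -3):
    a, b = (F[lag:], G[:50 - lag]) if lag >= 0 else (F[:50 + lag], G[-lag:])
    print(lag, np.isclose(xcorr_at_lag(F, G, lag), np.corrcoef(a, b)[0, 1]))
```

Both series are assumed to have the same length, as in the formulas above.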

Zero Lag Peak Statement is Slightly Incorrect

"there will always be a peak at a lag of zero unless the signal is a trivial zero signal."

Peak here means "highest point in the signal", so even for the trivial zero signal, there is still a peak at a lag of 0 (equal to 0). — Preceding unsigned comment added by 96.38.109.155 (talk) 13:11, 14 March 2014 (UTC)

Visual explanation needed

The article for convolutions has an extremely helpful section for understanding what the convolution function does (https://en.wikipedia.org/wiki/Convolution#Derivations). It unpacks the terms in the function, the steps in the algorithm, and animates it.

Someone should do the same for this article! Richard☺Decal (talk) 04:32, 26 February 2015 (UTC)

Missing or misleading content in the introduction

(1) The definition given in the section entitled "Time-delay analysis" is different from both of the definitions given in the introduction. Yet it is a good, mature definition, and deserves to be mentioned in the introduction.

The "signal processing" definition in the introduction refers only to a time-lagged single inner product, not a random variable. On the other hand, the "probability and statistics" definition in the introduction does not have the structure of a time series or time-dependent signal; in particular it has no concept of "time lag".

(2) I also have trouble seeing how the time-lagged inner product of the introduction is useful for signals. Don't signals last forever, so they are not in L^2, so the convolution integral would be ill-defined or infinite?

I feel that is explained by "It is commonly used for searching a long signal for a shorter, known feature." An example is a matched filter for detecting the arrival of radar pulse returns. It cross-correlates the streaming signal with the finite-length pulse shape, producing peaks at the points that pulses occur.
--Bob K (talk) 13:31, 20 November 2016 (UTC)
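
The matched-filter idea described above can be sketched in a few lines. The pulse shape, signal length and pulse positions are made up for illustration, and the example is noise-free so that the peaks are exact.

```python
import numpy as np

pulse = np.array([1.0, 3.0, 2.0, -1.0])   # known, finite-length feature
signal = np.zeros(30)                     # stand-in for a longer stream
signal[5:9] += pulse                      # first return starts at sample 5
signal[20:24] += pulse                    # second return starts at sample 20

# "valid" mode slides the short pulse along the long signal without padding:
# corr[k] = sum_n signal[n + k] * pulse[n]
corr = np.correlate(signal, pulse, mode="valid")
peaks = np.flatnonzero(np.isclose(corr, corr.max()))
print(peaks.tolist())   # → [5, 20]
```

Because the pulse has finite length, each output sample is a finite sum, which also addresses the concern about signals that are not in L^2: only the window under the pulse contributes.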

I'm sure that for true signals, the integration is done over a finite time-window, yet this is not mentioned in the article.

178.38.161.142 (talk) 14:07, 25 May 2015 (UTC)

Questionable property

Regarding the last entry in the properties section:

I cannot reproduce the given equation  . According to my calculations:  

Can you give a reference for this property?

Best regards — Preceding unsigned comment added by 194.166.51.226 (talk) 08:45, 20 November 2016 (UTC)

I get the same result as you. Using this relationship three times:
 
it follows that:
 
--Bob K (talk) 22:51, 23 November 2016 (UTC)

Nice! — Preceding unsigned comment added by 213.162.68.159 (talk) 08:44, 26 November 2016 (UTC)

Bad property

That minus sign before the f* in the first property is just flat out wrong. — Preceding unsigned comment added by 188.174.57.61 (talk) 13:58, 14 August 2017 (UTC)

Reduction in animated .gif size

The animated .gif on this page (link here) is quite large at 7.13MB. It uses 20 fps for 40s (800 frames total). I suggest reducing the frame rate to get the .gif under 2 MB.

Would anyone have an objection to this? I cannot contact the original uploader of this .gif, but it has a CC 4.0 license.

Davidjessop (talk) 00:16, 24 February 2019 (UTC)

No objection. Frankly, I don't find it helpful. You can delete it completely, as far as I'm concerned. --Bob K (talk) 05:04, 24 February 2019 (UTC)

asterisks

There is discussion in talk:convolution about the notation of the operators for convolution and cross-correlation. It seems that this article uses five and six pointed asterisks to distinguish them. Is there a reference for these being the WP:COMMONNAME (well, except that they aren't names) for them? After about 2 seconds, I forget which one is which. Gah4 (talk) 21:39, 7 June 2019 (UTC)

Also, List_of_mathematical_symbols_by_subject suggests that rho is used for cross-correlation. Asterisks with any number of points are not mentioned. Gah4 (talk) 21:42, 7 June 2019 (UTC)