After the tech bubble burst in 2000, many investors fled to real estate. The strategy made sense because the two markets had shown little correlation, and it worked for several years. Then the financial crisis of 2008 hit most world economies. The accompanying housing collapse caused a meltdown of mortgage-backed securities, the equity market, insurance companies, and investment and commercial banks, and it froze credit. Ultimately, the collapse of the real estate market in the United States was felt everywhere.
Markets that economists and financial practitioners thought had little overlap were suddenly joined at the hip. The oversight was a misinterpretation of the correlation coefficient, which, when calculated over a long period (one year or more), shows little or no synchronization between these markets. As many have said, in times of crisis, all correlations go to one. Analysts once again disregarded the well-established point that a lack of correlation between variables over one specific time frame does not imply that the variables are independent on all time frames.
The S&P 500 is the accepted measure of the U.S. equity markets. This index and the Shanghai Stock Exchange (SSE) composite, which the SSE launched in 2004 as a reflection of performance of the most influential stocks in the Shanghai market, are good examples of indexes that have little correlation over longer time periods but demonstrate high correlation over shorter periods, implying possible profitable trading opportunities.
The S&P 500 and the SSE are two different markets for which the trading times of the cash equities do not overlap (see “Mixing markets,” below). When stocks are trading on the Shanghai market, the U.S. market is closed, and vice versa. A stock trader in the United States can act on news of market moves in China on the same day, but with a time lag of hours. A stock trader in China, on the other hand, has to wait until the next day to act on news of market moves in the United States. But can the news in each country have a ripple effect on the markets of both countries? A correlation coefficient helps answer this question.
The first step is to explore the statistical properties of both markets. We used daily returns for the S&P 500 and the SSE for the period from Jan. 3, 2007, to July 20, 2010. From the daily cash market values, we calculated the continuously compounded daily returns shown in “Return comparison” (below).
The formula for the continuously compounded daily return is:
$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)$$
- $r_t$ is the continuously compounded one-period return
- $P_t$ is the value of the index at time $t$
- $P_{t-1}$ is the value of the index at time $t - 1$
- $\ln$ is the natural log function
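The return formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the function name `log_returns` and the index levels are made up for the example.

```python
import math

def log_returns(prices):
    """Continuously compounded one-period returns, ln(P_t / P_{t-1})."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical index levels, for illustration only (not the article's data).
closes = [100.0, 102.0, 99.96, 101.0]
returns = log_returns(closes)

# Log returns add up across periods: their sum equals ln(last / first).
total = sum(returns)
```

One reason to prefer log returns here is that they are additive over time, so a multi-day return is simply the sum of the daily values.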
Summary statistics give us a quick look at the data. Both indexes have a mean and median close to zero. While the SSE appears more volatile than the S&P 500, the latter index had the larger one-day gains and losses. In addition to the summary statistics, we tested the returns for normality, because standard significance tests of the correlation coefficient assume normally distributed variables. We also calculated the correlation coefficient between the two return series.
Summary statistics (returns)

| | S&P 500 | SSE |
|---|---|---|
| Min | -0.14 | -0.13 |
| Mean | -0.00031 | -0.00007 |
| Median | 0.00082 | 0.0014 |
| Max | 0.11 | 0.09 |
| Std Dev | 0.018 | 0.023 |
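Figures like those in the summary table can be produced with Python's standard `statistics` module. The sketch below is illustrative only: the helper name `summarize` and the sample returns are hypothetical, and it assumes the sample (n - 1) standard deviation, which the article does not specify.

```python
import statistics

def summarize(returns):
    """Return the five summary figures used in the table."""
    return {
        "min": min(returns),
        "mean": statistics.mean(returns),
        "median": statistics.median(returns),
        "max": max(returns),
        # Assumption: sample standard deviation (divides by n - 1).
        "std_dev": statistics.stdev(returns),
    }

# Hypothetical daily returns, for illustration only.
sample = [-0.02, 0.01, 0.005, -0.01, 0.03]
stats = summarize(sample)
```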
The correlation coefficient between SSE and S&P 500 returns is 0.081. A plot of SSE returns vs. S&P 500 returns confirms the small correlation between them (see “Plotted,” below).
The next step is to test for autocorrelation, which looks for relationships within data that have been shifted through time. Lagging each index and comparing it with itself shows that the SSE and the S&P 500 have little or no internal autocorrelation. However, lagging only the S&P 500 returns by one day and comparing them with the SSE returns yields a correlation coefficient of 0.1726, which is significant and more than twice the same-day correlation of 0.081 (see “Time shifts,” below).
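A lagged correlation of this kind can be sketched as follows. The names `pearson` and `lagged_correlation` are hypothetical, the data are toy values, and the convention that the first series leads the second by `lag` days is an assumption about how the shift was applied, not a detail confirmed by the article.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(leader, follower, lag=1):
    """Correlate leader[t - lag] with follower[t]; lag=0 is the same-day case."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])

# Toy series in which `follower` simply repeats `leader` one day later.
leader = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
follower = [0.0] + leader[:-1]

same_day = lagged_correlation(leader, follower, lag=0)
one_day = lagged_correlation(leader, follower, lag=1)

# Passing the same series twice gives its autocorrelation at that lag.
autocorr = lagged_correlation(leader, leader, lag=1)
```

In this toy case the one-day lag recovers a perfect relationship that the same-day figure hides, which is the effect the lag test is designed to detect.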
These results raise several new questions that deserve exploring:
- If lagging the S&P 500 produces a significant correlation over a window of more than three years, what will the correlations be over smaller windows: one year, one month or five days?
- If we find a high correlation, does it hold for at least two days?
- If it holds for at least two days, can we take advantage of it?
Testing the lag
We first calculated the yearly correlation between the returns of the S&P 500 and the one-day lagged returns of the SSE. The results were comparable to those for the full period: 0.164 in 2007, 0.161 in 2008 and 0.183 in 2009. We also calculated the monthly correlations, which ranged from -0.31 to 0.71 across 43 observations, with a mean of about 0.166 and a median of about 0.173.
Three observations stand out: the correlation is not uniformly weak through time but ranges from weak to strong; the mean of the monthly values is similar to the three-year and one-year values; and the correlations tilt toward the positive (see “Skewed,” below).
These results encourage a look at whether five-day correlations reveal anything further.
Assume that the window is only five days, and let’s have the S&P 500 lag the SSE by one day. We’ll calculate a moving correlation of five days and see if time windows exist that result in significant correlation, and whether the high correlation persists over many days thereafter, implying a predictable and tradable relationship.
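A five-day moving correlation can be sketched as below. It assumes the two series have already been aligned so that each pair of values holds the lagged return of one index and the matching return of the other. The function names and data are hypothetical, and a window with no variation returns NaN rather than raising an error.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation; NaN if either input has no variation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else float("nan")

def rolling_correlation(xs, ys, window=5):
    """Correlation over each run of `window` consecutive aligned observations."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]

# Toy data: the series move together for five days, then in opposite directions.
xs = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.005, 0.012, -0.008, 0.003]
ys = xs[:5] + [-x for x in xs[5:]]
moving = rolling_correlation(xs, ys, window=5)
```

With only five points per window, each coefficient is noisy, which is consistent with the wide spread of values the article reports for the moving correlations.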
Summary table (moving correlations)
Std Dev 0.50
No. of Observations 850
The summary statistics show that:
- Half of these correlations are either higher than 0.554 (a significant value for a correlation) or lower than -0.245, with a quarter of the values in each tail.
- The median and the mean are still positive, and they are comparable to the means of the monthly and yearly correlations. A histogram of these values shows that they tend to be positive.
- The correlations span almost the full range of possible values for a correlation coefficient (see “Wide range,” below).
The key, however, is whether we uncover pockets of high correlation and if those correlations hold through time long enough for us to exploit them. In other words, can we expect multiple days of high correlations?
To check whether high correlations tend to persist, we looked for clusters of consecutive days on which the correlation was higher than 0.554 (a threshold exceeded by 25% of values). In “Extended relationships” (below), we observe a high count of clusters of two or three consecutive days with high correlations (25%), and most clusters last seven days or fewer. Three clusters lasted nine or more consecutive days, each occurring only once among the 850 correlations. Still, understanding the occurrence of such “rare” events can help create a trading strategy.
For the 850 five-day correlations we found:
- There are 46 clusters of two or more consecutive days of high correlation, covering 190 days.
- Clusters of two or more days make up 67% of all high-correlation clusters.
- There are 23 clusters consisting of a single high-correlation day, or 33% of all clusters.
- The median number of consecutive days with high correlation is two if one-day clusters are included.
- The median number of consecutive days with high correlation is three-and-a-half if one-day clusters are excluded.
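The cluster counts above amount to a run-length analysis of the moving correlations. A minimal sketch, assuming a "high" day is one whose correlation is strictly above 0.554; the function name `high_runs` and the toy values are hypothetical.

```python
from statistics import median

def high_runs(correlations, threshold=0.554):
    """Lengths of each run of consecutive values strictly above `threshold`."""
    runs, current = [], 0
    for value in correlations:
        if value > threshold:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # close a run that reaches the end of the data
        runs.append(current)
    return runs

# Hypothetical moving correlations, for illustration only.
toy = [0.6, 0.7, 0.1, 0.9, -0.3, 0.8, 0.8, 0.8, 0.2]
runs = high_runs(toy)                       # [2, 1, 3]
multi_day = [r for r in runs if r >= 2]     # clusters of two or more days
median_all = median(runs)
median_multi = median(multi_day)
```

Applied to the article's figures, the same bookkeeping explains why 46 multi-day clusters covering 190 days plus 23 one-day clusters account for 213 high-correlation days.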
We also created a transition matrix, which shows the counts and probabilities of moving from a low correlation (low) to a low or a high correlation (high), and vice versa. From the table, the joint probability of a high followed by a high is 0.17 (144/849), while the overall probability of seeing a high is 0.25. What interests us, however, is the conditional probability: given that we saw a high today, the probability of seeing another high tomorrow is 0.68, or 144/213.
| | Low (t+1) | High (t+1) | Total |
|---|---|---|---|
| Low (t) | 567 (0.89) | 69 (0.11) | 636 (0.75) |
| High (t) | 69 (0.32) | 144 (0.68) | 213 (0.25) |
| Total | 636 (0.75) | 213 (0.25) | 849 |
In other words, given a high at time t, the probability of seeing another high at time t+1 is 0.68, a significant probability to trade with.
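The difference between the joint and conditional probabilities can be checked directly from the counts in the transition table. The sketch below uses those published counts. The helper names are hypothetical, and `transition_counts` shows one way such a table could be built from a sequence of low/high labels.

```python
from collections import Counter

def transition_counts(states):
    """Count (today, tomorrow) pairs in a sequence of 'low'/'high' labels."""
    return Counter(zip(states, states[1:]))

def conditional_probability(counts, today, tomorrow):
    """P(tomorrow | today) = count(today -> tomorrow) / count(today -> anything)."""
    row_total = sum(c for (src, _), c in counts.items() if src == today)
    return counts[(today, tomorrow)] / row_total

# Counts taken from the article's transition table.
article_counts = Counter({
    ("low", "low"): 567, ("low", "high"): 69,
    ("high", "low"): 69, ("high", "high"): 144,
})

# Conditional: 144 of the 213 high days were followed by another high.
p_high_given_high = conditional_probability(article_counts, "high", "high")

# Joint: 144 of all 849 day-to-day transitions were high followed by high.
p_joint_high_high = article_counts[("high", "high")] / sum(article_counts.values())
```

The conditional figure is the one relevant to a trader who has already observed a high-correlation day, since it conditions on information available at the time of the decision.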
While the correlation coefficient between stocks on the Shanghai market and those representing the largest U.S. companies may be low, we should not discard it as insignificant. We showed that while the SSE and S&P 500 appear to have little or, in some instances, no correlation over certain periods of time, the correlation between these two indexes is significant enough when sliced differently to encourage trading over shorter time windows. We found that even monthly correlations returned a broad range of values, from high to negative to no correlation.
When we investigated the five-day moving correlations, we found that they passed through the whole spectrum of possible values. Interestingly, 25% of these correlations were higher than 0.554. The probability that a high-correlation day will be followed by another high-correlation day jumps to 68%. In some cases, we observed five or more consecutive days of high correlation.
All of these observations provide a critical understanding of the relationship of these two markets that can guide traders of both the individual equities and, at least in the U.S. with the S&P 500, the indexes themselves. Understanding why and when these cases occur may provide traders with short-term opportunities.
The correlation coefficient measures the synchronization, or the degree of synchronized change, between two or more sets of variables over a given period of time. This coefficient helps to reveal a predictive relation that can be exploited by market makers because it suggests a possible causal or mechanical dependency between the variables at stake.
The value of the coefficient ranges between -1.00 and 1.00. A negative coefficient implies that the two variables are negatively correlated; they move in opposite directions. A positive coefficient implies that they are in sync; they move in the same direction. The further the number is from zero, the stronger the correlation. A coefficient of 0.00 implies that the two variables have no synchronization over the period measured, meaning that each variable appears to move independently of the other.
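The formula to calculate the sample correlation coefficient is:
$$r_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
In the above, we have a series of n measurements of variables X and Y, written as $x_i$ and $y_i$, where $i = 1, 2, \ldots, n$. The symbols $\bar{x}$ and $\bar{y}$ are the sample means of X and Y. The symbols $s_x$ and $s_y$ are the sample standard deviations of X and Y.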
Outliers and nonlinear relationships between two variables can distort the results of this correlation calculation. A plot of the two variables against each other can help confirm and validate the results from the correlation coefficient.
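The sample correlation coefficient can be written directly in terms of the sample means and sample standard deviations described in this sidebar. The sketch below uses the standard textbook form and made-up data. The function name `sample_correlation` is hypothetical. The second example illustrates how a single outlier can flip the sign of the coefficient.

```python
import statistics

def sample_correlation(xs, ys):
    """r = 1/(n-1) * sum of standardized x deviations times standardized y deviations."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample (n - 1) versions
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Two variables that move in exactly opposite directions.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]
r_clean = sample_correlation(xs, ys)

# The same data with one extreme point added to both series.
r_outlier = sample_correlation(xs + [20], ys + [20])
```

Here the five original points are perfectly negatively correlated, yet the single added point is enough to make the coefficient positive, which is why a scatter plot is a useful check.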
Dalila Benachenhou can be reached at firstname.lastname@example.org.