The U.S. Census Bureau publishes data on new home sales on roughly the 17th day of every month (www.census.gov/const/newressales). Data are presented both seasonally adjusted and in raw form. (Periodically, the Department of Housing changes the seasonal adjustment, which affects the reported number and may introduce a degree of error in this analysis.) The housing market, as represented by these data, is a major source of demand for the lumber market. It stands to reason, then, that the price of lumber is linked to new home sales.
Indeed, in looking at “New home sales & inventory”, which covers 2002 through the present, we’re vaguely reminded of the futures market represented in “Lumber prices”. Whether that correlation is strong enough for one to predict the other, however, will require more extensive analysis.
Today, both predictive directions are relevant — home sales as predictive of lumber prices and vice versa. It is, after all, possible to trade lumber prices or exchange-traded funds (ETFs) based upon the real estate index, home builders, real estate prices, etc. However, as traders, we’re only interested if that relationship is measurable and tradable. To begin such analysis, consider the following questions:
• Is there a correlation between sales of new homes and lumber prices?
• Is there a linear regression equation that could estimate lumber prices based upon new home sales with any confidence?
• Is there a time series adjustment that could project home sales — that is, can this month’s lumber prices predict next month’s home sales?
• Conversely, is there a time series adjustment that could project lumber prices — that is, can this month’s home sales project next month’s lumber price?
Before we embark on such analyses, we must consider whether there is a fundamental reason why this should work. Statisticians are aware that a correlation is nothing more than a comparison of two sets of data. A positive correlation indicates that as one set of data increases, the other set tends to increase as well, all else being equal. It does not mean the first set causes the second set to increase; thus the statistical maxim “correlation does not imply causation.” All too often, traders make the mistake of confusing correlation with causation. If we have reason to conclude there may be a causal relationship between our data sets, it will give us more confidence in any predictive ability of our results.
In this case, the causality is obvious. Lumber’s primary use is in new home construction. If sales of new homes decrease, builders are less likely to begin new homes and the demand for lumber will decrease. Because it is assumed the supply of lumber is relatively constant, absent some global catastrophe, it logically follows that lumber prices are more dependent upon demand than supply factors.
THE EYE HAS IT
A good way to begin even complex data analysis is with simple graphs. The first challenge is to construct a time series overlay of lumber and home sales. The easiest way to compare the two is to normalize the data. We can do this by figuring the difference between the data point and the mean divided by the standard deviation of the respective data set. This is shown in “Lumber vs. home sales: Normalized” (page 38).
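The normalization described here is a standard z-score. A minimal Python sketch follows; the function name `normalize` is ours, and the use of the population rather than the sample standard deviation is an assumption, since the article does not say which was used. The single-point check uses the means and standard deviations the article reports for its data set, with home sales expressed in thousands.

```python
from statistics import mean, pstdev

def normalize(series):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot normalize a constant series")
    return [(value - mu) / sigma for value in series]

# Toy series, for illustration only (not the article's data).
toy_z = normalize([1.0, 2.0, 3.0])

# Single-point check using the article's reported summary statistics.
lumber_z = (196.10 - 278.37) / 64.94   # July 2009 lumber close
homes_z = (384.0 - 930.49) / 283.49    # June 2009 new home sales, in thousands

print(toy_z)
print(round(lumber_z, 2), round(homes_z, 2))
```

Both single-point values come out negative, meaning each reading sits below its own period average. Putting the two series on this common scale is what allows them to share one chart.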
A cursory inspection of this chart suggests some hypotheses worth testing: One, that there is a correlation between the two data sets and two, that lumber prices appear to lead home sales.
The data set consisted of 103 points: the monthly close of lumber prices from January 2001 through July 2009 and the corresponding seasonally adjusted new home sales data in the same period. The average lumber price was $278.37 with a standard deviation of $64.94. The average number of homes sold was 930,490 with a standard deviation of 283,490. The period analyzed is essentially one full economic cycle.
The correlation (Pearson) between the two data sets is 76.87%. A linear regression model for predicting lumber prices based upon home sales data is:
Lumber Price = $114.522 + 0.17609 * Home Sales (000s)
This model has an F statistic of 145.84 and a p-value less than 0.001. Both the fitted constant and the home sales variable are significant. Interestingly, the model projects $114 per thousand board feet (TBF) as the minimum price for lumber, even if no homes sell.
If new home sales returned to 600,000, the model projects that lumber prices would close that month between $136 and $303, with a mean of $220, at 95% confidence. This has some predictive utility because the home sales report is issued about 13 days prior to the close of the month.
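For readers who want to reproduce this step on their own data, the correlation and fitted line can be computed in closed form. The sketch below is ours, not the author's worksheet: `fit_line` and `predict_lumber` are hypothetical names, and it assumes home sales are entered in thousands, the convention under which the published coefficients reproduce the reported averages.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, pearson_r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy / sqrt(sxx * syy)

# Coefficients as published in the article (home sales in thousands).
INTERCEPT, SLOPE = 114.522, 0.17609

def predict_lumber(home_sales_thousands):
    """Point estimate of the monthly lumber close from the published model."""
    return INTERCEPT + SLOPE * home_sales_thousands

estimate_at_600k = predict_lumber(600.0)    # the article's 600,000 scenario
estimate_at_mean = predict_lumber(930.49)   # average sales over the period
print(round(estimate_at_600k, 2), round(estimate_at_mean, 2))
```

The 600,000 scenario works out to roughly $220, matching the mean quoted above. Plugging in the average sales figure returns approximately the average lumber price, as a least-squares line with a constant must.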
TIME TO TRADE
Having addressed our first two questions, we can turn our attention to the next two. We shall start by offsetting the lumber price by one month. That is, we will use June’s price to estimate July’s home sales. (This will result in a trivial change in the mean and standard deviation of the lumber data set, so we deem the two sets virtually equivalent and any correlation comparisons remain valid.)
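The offset amounts to pairing each month's lumber price with the following month's home sales figure. A minimal sketch, with `lagged_pairs` as a hypothetical helper name and placeholder labels standing in for real observations:

```python
def lagged_pairs(predictor, target, lag=1):
    """Pair predictor[t] with target[t + lag], dropping unmatched end points."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(predictor) != len(target):
        raise ValueError("series must be the same length")
    usable = len(predictor) - lag
    return list(zip(predictor[:usable], target[lag:]))

# Placeholder labels only; substitute the monthly series being tested.
lumber = ["lumber_May", "lumber_Jun", "lumber_Jul"]
sales = ["sales_May", "sales_Jun", "sales_Jul"]
pairs = lagged_pairs(lumber, sales, lag=1)
print(pairs)
```

Each one-month offset costs one observation, so a 103-month data set yields 102 usable pairs. That is why the mean and standard deviation shift only trivially.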
The correlation falls slightly to 75.04%. However, if we force the constant to 0, so that projected home sales are 0 when lumber prices are 0 (which makes some logical sense), the correlation jumps to 98.14%. The linear regression equation for sales in month x+1 based upon lumber price in month x is:
Home Sales (x + 1) = 3.3846 * Lumber Price (x)
The model has an F statistic of 2,670 and a p-value less than 0.001 and is
Now, let's offset home sales data using June’s data to predict July’s lumber price. Recall that the actual data point released in July by the Census Bureau for new homes sold is June’s number. So, backing up July’s number by one month is placing June’s number in the month of June and brings the data more in line with reality.
The correlation between the two sets is 77.56% if we fit a constant. If we force the line through the origin (0, $0), the correlation climbs to 98.37%. The two linear regression equations are:
Lumber Price (x + 1) = $110.056 + 0.1804 * Homes (x)
F statistic: 150.2
Lumber Price (x + 1) = 0.28847 * Homes (x)
F statistic: 3,019
Assuming June 17’s home sale data release (end of May data) is 600,000, then the prediction for July’s closing price of lumber is $173 using the second model, ranging from $70 to $275, with 95% confidence in our estimated price.
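Forcing the line through the origin changes the slope formula and, in many statistics packages, the way goodness of fit is measured: the fit statistic is then typically computed against zero rather than against the mean of the dependent variable, which by itself tends to produce higher readings. The sketch below illustrates this under that assumption. The article does not say which software or definition produced its figures, `fit_through_origin` is a hypothetical name, and the toy numbers are made up purely for illustration.

```python
def fit_through_origin(x, y):
    """Least squares for y = b*x (no constant).

    Returns (slope, r2_uncentered, r2_centered).
    """
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    sse = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    mean_y = sum(y) / len(y)
    r2_uncentered = 1 - sse / sum(yi * yi for yi in y)           # measured against zero
    r2_centered = 1 - sse / sum((yi - mean_y) ** 2 for yi in y)  # measured against the mean
    return slope, r2_uncentered, r2_centered

# Made-up toy data (NOT the article's): y sits well above zero and barely varies.
toy_x = [1.0, 2.0, 3.0]
toy_y = [11.0, 12.0, 13.0]
slope, r2_unc, r2_cen = fit_through_origin(toy_x, toy_y)
print(round(slope, 4), round(r2_unc, 4), round(r2_cen, 4))

# Point estimate from the article's no-constant model, home sales in thousands.
PUBLISHED_SLOPE = 0.28847
estimate = PUBLISHED_SLOPE * 600.0
print(round(estimate, 2))
```

On the toy data the no-constant line fits poorly, yet the against-zero statistic still comes out high, so part of the jump from roughly 77% to 98% may reflect the change in yardstick rather than a better fit. The last line reproduces the $173 point estimate for a reading of 600,000.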
The model improves even more if we offset by two months, allowing May’s data point to predict July’s closing price.
Lumber Price (x + 2) = 0.28840 * Homes (x)
F statistic: 3,316
A reading of 600,000 on May 17 (April’s sales) estimates a price of $173 with a slightly narrower range of $74
Because we now see that we can use the data point from two months back, we can test the addition of two more variables to the model: home sales from last month and the lumber price from last month, both of which are also known to us. In other words, to project July’s lumber price, we shall use the home sales data from May and June and the lumber closing price from June.
Running various regression models, we find the following to produce the best adjusted correlation with all the data set forth:
Lumber Price (x + 1) = 0.83515 * Lumber Price (x) + 0.04435 * Homes (x-1)
F statistic: 5,389
Or, by way of example, on Oct. 17, 2005, the Census Bureau released the seasonally adjusted new home sales number as 1,346,000 for the end of September 2005, and the closing price of lumber on Oct. 31, 2005 was $310.50. Using the above model, we estimate the closing price of lumber for November to be $323.72 with a 95% confidence interval for price estimate ranging from $269 to $378. The actual closing price was $326.50. Our estimate proved to be less than $3 low. We also tested a three-month moving average of lumber price and found this was of no assistance to the model.
This model was developed in early August 2009. We can use the first available post-development monthly closing price, August’s close, as a quick test. In July 2009, the home sales data released on July 27 was 384,000 for June 2009; July’s closing lumber price was $196.10. Based on this, our model estimates a closing price for lumber in August 2009, at $181.29, with the price range of $127 to $236 with 95% confidence. If we reduce the confidence level to 90%, the range contracts to $136 to $227.
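The two-variable equation and the change of confidence level can both be checked with a few lines of Python. This is a minimal sketch using the published (rounded) coefficients with home sales in thousands; `project_next_close` and `rescale_interval` are hypothetical names. Two assumptions are ours: the sales figure quoted in each example is the one fed to the Homes (x-1) term, and normal quantiles are used in place of the regression's t quantiles when rescaling the interval.

```python
from statistics import NormalDist

# Published (rounded) coefficients; home sales are in thousands.
LUMBER_COEF, HOMES_COEF = 0.83515, 0.04435

def project_next_close(lumber_close, home_sales_thousands):
    """Next month's lumber close from the two-variable, no-constant model."""
    return LUMBER_COEF * lumber_close + HOMES_COEF * home_sales_thousands

def rescale_interval(low, high, old_conf, new_conf):
    """Shrink or widen a symmetric interval to a new confidence level.

    Assumption: normal quantiles are an adequate stand-in for the
    regression's t quantiles at this sample size.
    """
    center = (low + high) / 2
    half_width = (high - low) / 2
    z_old = NormalDist().inv_cdf(0.5 + old_conf / 2)
    z_new = NormalDist().inv_cdf(0.5 + new_conf / 2)
    new_half = half_width * z_new / z_old
    return center - new_half, center + new_half

nov_2005 = project_next_close(310.50, 1346.0)   # October 2005 inputs
aug_2009 = project_next_close(196.10, 384.0)    # July 2009 inputs
low_90, high_90 = rescale_interval(127.0, 236.0, 0.95, 0.90)
print(round(nov_2005, 2), round(aug_2009, 2))
print(round(low_90), round(high_90))
```

With the inputs quoted in the text, the rounded coefficients give about $319 for November 2005 and about $181 for August 2009, close to but not identical to the published $323.72 and $181.29. The gap presumably reflects rounding or a different sales figure in the Homes (x-1) slot, which the article does not specify. Narrowing the 95% range of $127 to $236 to 90% returns roughly $136 to $227, in line with the figures above.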
One way to exploit this information — supplemented by additional analysis that maximizes timing and risk management — would be with options. In the case of the August prediction, perhaps an option trader could elect to write a $225 call and a $135 put, and with a high degree of certainty both would expire worthless. If trading futures, a lumber trader could watch prices during August, and if they suddenly fell to $154, go long one contract with a reasonable likelihood of seeing prices climb back by $30 before month’s end, while assuming the downside risk was not more than another $18. Most would consider this an acceptable risk/reward ratio. For the record, either approach would have worked. Lumber closed August at $176.80, around $5 off our predicted price.
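The arithmetic behind both trade ideas can be checked in a few lines. This is only a sketch of the outcome at expiration and of the reward-to-risk ratio; it ignores premiums, commissions, margin and contract size, none of which the article specifies, and the function names are ours.

```python
def strangle_expires_worthless(close, put_strike, call_strike):
    """True if both short options finish out of (or exactly at) the money."""
    return put_strike <= close <= call_strike

def reward_to_risk(entry, target, stop):
    """Ratio of potential gain to potential loss for a long position."""
    return (target - entry) / (entry - stop)

august_close = 176.80
both_worthless = strangle_expires_worthless(
    august_close, put_strike=135.0, call_strike=225.0
)

# Long at $154, looking for a $30 rebound, risking another $18.
ratio = reward_to_risk(entry=154.0, target=154.0 + 30.0, stop=154.0 - 18.0)
print(both_worthless, round(ratio, 2))
```

The ratio works out to about 1.7 to 1, and with August's close of $176.80 sitting between the two strikes, both written options would have expired worthless.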
We set out to discover if there was a predictive relationship between home sales and lumber prices. Ultimately, we built a model incorporating the home sales data refined by the prior month’s lumber price to achieve a statistically significant regression equation for the following month’s close. The reader may wish to explore other similar variables, such as new home starts or average home sale prices as other potential independent variables.
Arthur M. Field has a Ph.D. in management science from Clemson and a J.D. from Rutgers. He is a former commodity broker and was co-editor of Fidelity’s Pacific Fund and in-house commodity fund. He wrote “The Magic Eight: The Only 8 Indicators You Ever Need to Make Millions.” E-mail him at TheMagicEight@hotmail.com
Go to next page for underlying data for this piece.
| New Home Sales (000s) | Closing lumber ($/TBF) | Projection ($/TBF) | Error |