EOD price gaps?

josephjah

Hi everyone. I am new to the forum (and trading in general), but I plan on using the hell out of this forum... and hopefully reach a point where I can contribute a little back... (y)

My first question deals with EOD data, specifically gaps in the dates. For example:

*Data provided by Yahoo*

2008-07-14,539.00,540.06,515.45,521.62,4424800,521.62
2008-07-11,536.50,539.50,519.43,533.80,4981400,533.80
2008-07-10,545.00,549.50,530.72,540.57,4331700,540.57
2008-07-09,550.76,555.68,540.73,541.55,4154000,541.55
2008-07-08,545.99,555.19,540.00,554.53,4932400,554.53
2008-07-07,542.30,549.00,535.60,543.91,4255200,543.91
2008-07-03,530.88,539.23,527.50,537.00,2400500,537.00
2008-07-02,536.51,540.38,526.06,527.04,4223000,527.04
2008-07-01,519.58,536.72,517.00,534.73,4959900,534.73
2008-06-30,532.47,538.00,523.06,526.42,3765300,526.42


Note how, even though I have requested EOD data, it sometimes skips 1, 2, or even 3 days between points. I have tried a couple of different sources and they each have this problem, and I can't find an explanation anywhere as to why this is the case...

Ideally I would like to have consistent data for my back-testing, but if worst comes to worst I figure I might just average the surrounding points to fill the gaps.

At this point, I am proficient in C++, I am actively researching statistics, and I have developed a couple of useful algorithms for EOD analysis. But I am only starting out, so... any other information would be very helpful!

Thanks in advance, Joe.
 
That simple eh? Wow... This problem seriously had me stumped.

Thank you so much dcraig1!
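
For anyone who wants to check their own data the same way, here is a minimal sketch that lists every calendar day missing from the sample above together with its weekday. The names `present` and `missing_days` are made up for this illustration; only the dates come from the Yahoo sample.

```python
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Dates that appear in the Yahoo sample quoted in the first post.
present = {
    date(2008, 6, 30), date(2008, 7, 1), date(2008, 7, 2), date(2008, 7, 3),
    date(2008, 7, 7), date(2008, 7, 8), date(2008, 7, 9), date(2008, 7, 10),
    date(2008, 7, 11), date(2008, 7, 14),
}

def missing_days(present_dates):
    """Return (date, weekday name) for each calendar day absent from the set,
    scanning from the earliest to the latest date present."""
    first, last = min(present_dates), max(present_dates)
    gaps = []
    day = first
    while day <= last:
        if day not in present_dates:
            gaps.append((day, WEEKDAYS[day.weekday()]))
        day += timedelta(days=1)
    return gaps

gaps = missing_days(present)
for day, name in gaps:
    print(day.isoformat(), name)
```

For this sample the missing days are Saturdays and Sundays plus Friday 2008-07-04, the US Independence Day holiday, so nothing is actually missing from the series.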

But it does raise the issue of missing bars, which you need to think about at a very early stage when designing data structures for time series. You can have missing data for a number of reasons. For example, in higher-frequency series (e.g. 1 min) there will often be "missing" bars for thinly traded instruments, simply because no trade took place during that minute. If you wanted to calculate, say, the correlation between two such series, how would you handle the "missing" bars?
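
One possible answer to the correlation question, sketched under assumptions: keep only the timestamps that both series have (an inner join) and correlate what is left. The function names and the 1-minute closes below are invented for illustration, and padding the thinner series instead is an equally valid design with different trade-offs.

```python
from math import sqrt

def align_common(a, b):
    """Keep only timestamps present in both series (an inner join).
    Each series is a dict mapping timestamp -> value."""
    common = sorted(a.keys() & b.keys())
    return [a[t] for t in common], [b[t] for t in common]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical 1-minute closes keyed by "HH:MM"; series_b has no 09:32 bar
# because no trade took place in that minute.
series_a = {"09:30": 10.0, "09:31": 10.2, "09:32": 10.1, "09:33": 10.4}
series_b = {"09:30": 20.0, "09:31": 20.4, "09:33": 20.8}

xs, ys = align_common(series_a, series_b)
r = pearson(xs, ys)
print(len(xs), "common bars, r =", round(r, 4))
```

The inner join throws away the 09:32 bar of the first series, so no data is invented, but the sample shrinks. In practice you would probably correlate returns rather than raw prices; that step is left out here to keep the alignment visible.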
 
Some analysis packages, such as AmiBroker, allow you to pad and align the data using a separate reference symbol (e.g. an index). Also be aware of the implications of padding for indicators such as stochastics, which look at the relationships between highs and lows: with any padded bar you have actually "created" data out of no data, so watch out.
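
A rough sketch of what padding against a reference symbol can look like, so the side effect on highs and lows is visible. This is not AmiBroker's implementation; the `Bar` type, `pad_to_reference`, and the prices are all hypothetical. Each padded bar carries the previous close forward with zero volume and a `padded` flag that indicator code can test.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    date: str            # ISO "YYYY-MM-DD", so string order is date order
    open: float
    high: float
    low: float
    close: float
    volume: int
    padded: bool = False  # True when the bar was synthesised, not traded

def pad_to_reference(bars, reference_dates):
    """Return one bar per reference date. Dates with no real bar get a flat
    bar at the previous close; dates before the first real bar are skipped."""
    by_date = {b.date: b for b in bars}
    out = []
    last = None
    for d in sorted(reference_dates):
        bar = by_date.get(d)
        if bar is None:
            if last is None:
                continue  # nothing to carry forward yet
            c = last.close
            bar = Bar(d, c, c, c, c, 0, padded=True)
        out.append(bar)
        last = bar
    return out

# Hypothetical thinly traded instrument with no bar on 2008-07-02.
bars = [
    Bar("2008-07-01", 10.0, 10.8, 9.9, 10.5, 1200),
    Bar("2008-07-03", 10.6, 11.0, 10.4, 10.9, 900),
]
reference = ["2008-07-01", "2008-07-02", "2008-07-03"]  # e.g. an index's dates

padded = pad_to_reference(bars, reference)
for b in padded:
    print(b.date, b.high, b.low, b.close, b.volume, b.padded)
```

The padded bar has high equal to low, which is exactly the kind of "created" data that can distort a stochastic or any range-based indicator. Keeping the flag lets you decide per indicator whether to include or skip such bars.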

Some securities don't trade because they are in a trading halt or suspension. You probably don't want to pad during these times either.

Also note that the NYSE actually traded on Saturdays until September 29, 1952, so don't discard weekends on or before this date.
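
To tie the last two points together, here is a small sketch of a check that decides whether a missing bar on a given day is a candidate for padding. The function name, the `holidays` and `halted` parameters, and the constant name are invented for this example; the cutoff date is simply the one quoted above and is not verified independently here.

```python
from datetime import date

# Cutoff taken from the post above (an assumption, not checked against NYSE records).
SATURDAY_CUTOFF = date(1952, 9, 29)

def should_pad(day, holidays=frozenset(), halted=frozenset()):
    """Return True if a missing bar on `day` is a candidate for padding.

    Sundays are never padded, Saturdays only on or before the cutoff,
    and exchange holidays or halt/suspension days are left as gaps."""
    weekday = day.weekday()  # Monday = 0 ... Sunday = 6
    if weekday == 6:
        return False
    if weekday == 5 and day > SATURDAY_CUTOFF:
        return False
    if day in holidays or day in halted:
        return False
    return True

holidays = {date(2008, 7, 4)}   # US Independence Day
halted = {date(2008, 7, 8)}     # hypothetical halt for one security

print(should_pad(date(2008, 7, 7)))                      # ordinary Monday
print(should_pad(date(2008, 7, 5)))                      # modern Saturday
print(should_pad(date(1950, 1, 7)))                      # Saturday before the cutoff
print(should_pad(date(2008, 7, 4), holidays=holidays))   # holiday
print(should_pad(date(2008, 7, 8), halted=halted))       # halted day
```

The holiday and halt sets have to come from somewhere else, such as an exchange calendar or the reference symbol's own dates; this sketch only shows where they plug in.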

Cheers,
Richard.
 