Right, this analysis (sorry about the essay, let me know if you have any questions).
You're going to need a Bloomberg or Reuters connection for starters. This is the sort of thing Reuters is better for but I don't use it so I'm going to explain what I'd do with bbg. You'll want the Excel addin for bbg also, so I'm just going to assume you have access to this. Attached is a sheet you can use as a starting point, containing the 500 companies in SPX and their market caps; it links to bbg so you should open it on a machine with bloomy if possible, otherwise tell excel not to update the data if you open it on yours.
If it's across various sizes and sectors, then you want to be using the members of the S&P 500 as a broad list of companies to look at. There's no need to look at the specific companies he's traded in since the analysis should be generally applicable rather than specific to what he's already done. I'd further divide it down into size groups to check whether it's something that happens more in companies of a certain size - possible that it could be more exagerated in small caps since they have higher vol. Maybe divide the list attached into groups of 50 to see this.
The data you're going to need to get is:
1) the open price (easy enough from Bloomberg using the wizard)
2) the average price in the first 30 mins of trading
3) the average price in the second 30 mins of trading
4) VWAP for the same day as the opening price (easy enough from Bloomberg using the wizard)
Ideally I'd recommend getting each series for at least 1 year of trading days, preferably longer (the longer the better from an accuracy perspective). The volume of data is going to be massive, and it's probably not going to fit on a single worksheet.
So you'll have 4 data series for each company, which you'll then want to average to get:
1) the average open price over the last year
2) the average of the average prices in the first 30 mins of trading over the last year
3) the average of the average prices in the second 30 mins of trading over the last year
4) the average VWAP for the same day as the opening price over the last year
At this point you need to rebase the figures to avoid distortion from the difference between 1 share price being 100 and the other being 10. To do this, for each company divide each of the 4 series by the opening price. Now you'll have 4 averages for each company - if you've done it correctly then the 1st data series (openning price) will be "1" for every company.
Once you've done the rebased averages for each company that's going to leave you with 4 data series, which represent the market as a whole. you then need to do the averages of the averages, which will tell you when of the 4 options is best to execute a buy or sell order as presumably you'll see something like in the second half an hour of trading the price is generally X% higher.
So, you'll be able to draw some initial conclusions - the final thing I would recommend doing is measure the strength of relationship of each data series. To do this just calculate the standard deviation for the differences between series 2, 3 and 4 and series 1. (so for each of the 3 create a new series which is =(series X - series 1). The higher the standard deviation the weaker your relationship. For reference 1 st dev each side contains about 60% of observations. Once you have the St dev you can do a hypothesis test, which will tell you: Statistically you can say with XX% certainty that series X will be higher/lower than series 1".
Once you've got the standard deviations I can help you with that. Use the =STDEV function to calc the standard devs. Make sure you do all the calcs using the rebased series otherwise it'll give you a load of gobbledeegook.
The hardest thing is going to be pulling the 30 minute interval series out of bbg. You want to have a look in the help file for the excel addin to see how to do that. I know it can be done I just don't know the exact formula.
Use the wizard for the bbg stuff, otherwise it'll be a headache. In terms of getting it all to fit in excel, it might not be a bad idea to do the following:
- don't use the entire sp500, instead take the 50 biggest, the 50 smallest and the 100 middle range. That will reduce the companies to 200, which means you can fit them horizontally across the page (excel max = 255) so that the data series flow downwards
- then use 1 worksheet for each data series, which means all your raw data will fit onto 4 sheets
- use a 5th sheet for the analytics
Probably the neatest way of doing it.