Struggles on model simulation

vs1 · Jul 26, 2011

Hello all,

I am part of a team that developed a prediction algorithm that has been prototyped in Matlab and is being tested these days.

We have run some preliminary tests on a portfolio of US-traded stocks, using market orders only on 1-minute intraday data, under the assumption that the Close price of the last minute will still be available in the next minute. We are aware that this does not reflect real market conditions and would like to run further, more valid tests.

One approach we are considering is to keep the same data format (1-minute intraday) but use mainly limit orders. To verify order execution, we would check the High and Low of the current minute: if the Low is lower than our buy price, we conclude that our buy order has been executed, and similarly, if the High is higher than our sell price, that our sell order has been executed.
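A minimal Python sketch of such a bar-based check follows. It assumes the usual convention that a buy limit can only fill if the bar traded at or below the limit (so the Low is the relevant field) and a sell limit only if it traded at or above the limit (the High). The `Bar` class, the `limit_filled` function, the `require_trade_through` flag, and the 0.01 tick size are illustrative assumptions, not part of any prototype discussed here.

```python
from dataclasses import dataclass

EPS = 1e-9  # tolerance for floating-point price comparisons


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def limit_filled(side: str, limit: float, bar: Bar,
                 require_trade_through: bool = True,
                 tick: float = 0.01) -> bool:
    """Heuristic fill test for a limit order resting during `bar`.

    With require_trade_through=True the price must go at least one tick
    beyond the limit, which is the more conservative assumption. With
    False, merely touching the limit counts as a fill (optimistic).
    """
    margin = tick if require_trade_through else 0.0
    if side == "buy":
        return bar.low <= limit - margin + EPS
    if side == "sell":
        return bar.high >= limit + margin - EPS
    raise ValueError("side must be 'buy' or 'sell'")


bar = Bar(open=37.48, high=37.50, low=37.45, close=37.48)
touch_only = limit_filled("buy", 37.45, bar, require_trade_through=False)
strict = limit_filled("buy", 37.45, bar)
print(touch_only, strict)
```

A touch at the limit price does not prove a fill, because other orders may sit ahead in the queue, so the stricter trade-through test is the default here. Even a trade-through remains an estimate rather than proof on 1-minute bars.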

Another approach is to test the prediction system on tick data (with second resolution only, no milliseconds). To estimate whether our order is executed, we would track the Trades field in this data, taking into account both time delays and the estimated quantity of stock that is traded just before our (slow) order is executed.
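The tick-based idea can be sketched roughly as below, for a buy limit order only. This is a simplified model under stated assumptions: the order becomes live `latency_s` seconds after submission, `queue_ahead` shares are assumed to be resting ahead of it, and every print at or below the limit first consumes that queue before filling the order. The `Trade` record and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

EPS = 1e-9  # tolerance for floating-point price comparisons


@dataclass
class Trade:
    time: int     # seconds since midnight
    price: float
    size: int     # shares traded in this print


def simulate_buy_limit(trades: Iterable[Trade], submit_time: int,
                       limit: float, qty: int,
                       latency_s: int = 1,
                       queue_ahead: int = 0) -> Optional[int]:
    """Return the time of complete fill, or None if never fully filled.

    `trades` is assumed to be sorted by time.
    """
    live_from = submit_time + latency_s  # order reaches the venue late
    ahead = queue_ahead                  # shares assumed ahead of us
    remaining = qty
    for t in trades:
        if t.time < live_from or t.price > limit + EPS:
            continue                     # not live yet, or print above limit
        size = t.size
        used = min(ahead, size)          # volume that goes to the queue
        ahead -= used
        size -= used
        remaining -= min(remaining, size)
        if remaining == 0:
            return t.time
    return None


tape = [
    Trade(100, 37.46, 500),
    Trade(101, 37.45, 300),
    Trade(102, 37.45, 400),
    Trade(103, 37.47, 1000),
    Trade(104, 37.45, 200),
]
fill_time = simulate_buy_limit(tape, submit_time=100, limit=37.45,
                               qty=200, latency_s=1, queue_ahead=500)
print(fill_time)
```

Raising `queue_ahead` or `latency_s` makes the simulation more pessimistic, which is a cheap way to see how sensitive the results are to being the slowest participant.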

I would be thankful if you could share your opinion on what the best next steps are from where I stand.

Thanks!!
vs1

alexander · Jul 26, 2011

What does "an assumption that the Close price of the last minute is going to be available in the next minute" mean?

With intraday data, the close of one bar is the open of the next bar, unless something happens in a fraction of a nanosecond.

To do this correctly you need to use bid/ask data.

However, I have a more basic problem with your question. How can you claim that you developed a prediction algorithm when you in the first place have such basic backtesting issues to resolve?

Maybe you want to be more specific and I am sure some experts here will come forward.

vs1 · Jul 26, 2011

Thank you Alexander.

I will reply and explain tomorrow

wackypete2 · Jul 26, 2011

alexander said:
What does "an assumption that the Close price of the last minute is going to be available in the next minute" mean?

With intraday data, the close of one bar is the open of the next bar, unless something happens in a fraction of a nanosecond.

This is incorrect. The close price of any bar does not have to equal the open of the next bar. They are 2 separate prints.

Peter

vs1 · Jul 27, 2011

alexander said:
What does "an assumption that the Close price of the last minute is going to be available in the next minute" mean?

With intraday data, the close of one bar is the open of the next bar, unless something happens in a fraction of a nanosecond.

To do this correctly you need to use bid/ask data.

However, I have a more basic problem with your question. How can you claim that you developed a prediction algorithm when you in the first place have such basic backtesting issues to resolve?

Maybe you want to be more specific and I am sure some experts here will come forward.

--------------
Hi Alexander,
Thanks again for your reply.
I first need to address your ‘more basic problem’: I am not misleading anyone or making any untrue claims.

I did not say that I developed the method by myself. I am partnered with two mathematicians: one of them developed the method, and the other built the prototype and runs the tests.

We (my partners) have good mathematical capabilities, but we are new to the stock market, and I can only convey my sincere thanks and appreciation to anybody in this forum who is willing to help.

With regard to this ‘check if the Low is lower than our buy price’ approach to verifying order execution, I received the insights below from somebody.

He says that this approach, on one-minute intervals, does not verify execution, because:

If we placed a buy order at 98, the Low of that minute can be 98, but there were many orders at that price and we cannot know whether ours was one of those filled. (That was clear to me.)

If we placed a buy order at 98 and the Low of that minute is 97, we cannot know either, because:

A. There are several channels of communication with the trading venues. It is possible (and, on a 1-minute interval, can be frequent) that in one of the channels 97 was already hit and the Low of that minute was updated accordingly, while our order is in another channel in which 97 has not been hit yet. (Remember that we are the slowest.)

B. Market makers' activity often distorts the reported Highs and Lows. This is illegal, but they do it (and usually don't get caught) because it serves their interests in options trading.

Do you find these insights correct?

Our prediction ‘sub-system’ does not make any use of bids and asks. That being so, it was suggested to me to acquire tick data (with second resolution only) and track the trades, while taking into account practical delays in execution and the quantity of stock traded before my orders, since I am more or less the slowest trader.

If this simulation on tick data is indeed essential, I will have to learn how best to estimate which of the trades in the sequence are ‘mine’, if any, since I understand that with limit orders execution is not guaranteed.

Thanks!

vs1 · Jul 27, 2011

wackypete2 said:
This is incorrect. The close price of any bar does not have to equal the open of the next bar. They are 2 separate prints.

Peter

---------

Thanks wackypete2 !

HowardCohodas · Jul 27, 2011

You appear to be quite new to developing and testing trading systems. With your strong background in mathematics, I would suggest you read "The Evaluation and Optimization of Trading Strategies" by Robert Pardo.

Back testing has many traps. The two most common are "curve fitting" and unintentionally using future knowledge to make your trade. Curve fitting occurs when you over optimize your rules to the test data set. This can be ameliorated (but not eliminated) by having a separate set of data that the algorithm has never seen to compare with the results from the training set.
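One common way to keep unseen data separate is a walk-forward split, where rules are fitted on one window and evaluated only on the window that follows it in time. The sketch below is an illustration of that idea, not necessarily the method used in any framework mentioned in this thread; `walk_forward_windows` and its parameters are hypothetical names.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test window starts right after its training window, so the
    rules are never evaluated on bars they were fitted to.
    """
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


for train, test in walk_forward_windows(n_bars=10, train_size=4, test_size=2):
    print(list(train), list(test))
```

Only the results on the test windows should be read as an estimate of live performance; results on the training windows are optimistic by construction.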

There is a maxim in programming that is worthy of note: "In programming you are always off by 1." Even though I am a software engineer, I have found myself accidentally using future results on more occasions than I am comfortable thinking about. Sometimes these errors are quite subtle, so going through several iterations of your rules by hand using your test data is advised.
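A toy illustration of how an off-by-one index leaks future knowledge: the signal for bar `i` is assumed to be computed from data up to and including that bar's close, so it can only be traded on the following bar. The function name, the price series, and the trading rule are all made up for the example.

```python
def strategy_pnl(closes, signals, lag):
    """Sum of position times bar-to-bar price change.

    signals[i] is assumed to use data up to and including closes[i].
    lag=1 applies the signal to the NEXT bar's move (no look-ahead).
    lag=0 applies it to the very move that produced it (look-ahead bug).
    """
    total = 0.0
    for i in range(1, len(closes)):
        j = i - lag  # index of the signal used for the move into bar i
        if j < 0:
            continue
        total += signals[j] * (closes[i] - closes[i - 1])
    return total


closes = [10, 11, 10, 11, 10]
# Hypothetical rule: long after an up bar, short after a down bar.
signals = [0, 1, -1, 1, -1]

biased = strategy_pnl(closes, signals, lag=0)  # uses future knowledge
honest = strategy_pnl(closes, signals, lag=1)  # tradable in practice
print(biased, honest)
```

On this series the misaligned version looks profitable while the correctly lagged version loses, which is why stepping through a few bars by hand is worth the effort.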

There is lots of other stuff to think about, but the three items mentioned above should get you started in the fascinating journey into system development.

intradaybill · Jul 27, 2011

wackypete2 said:
This is incorrect. The close price of any bar does not have to equal the open of the next bar. They are 2 separate prints.

Peter

I tend to agree with that, but on second thought it would all depend on how the data was aggregated into 1-min intervals. If it is aggregated based on prints, then you are correct. But if it is aggregated based on time, then Alexander is correct.

For example, if at 10:00 AM last price is 1325.25 then this is the close of the 09:59 - 10:00 minute bar. The next bar starts at 10:00+ and finishes at 10:01. If the charting program or data vendor aggregates data based on time then the open price will also be 1325.25. If the next print is at 1325.50, then this will be the open.
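The two conventions can be compared with a small sketch that builds 1-minute bars from ticks either way. It is only an illustration: `one_minute_bars` and the `carry_prev_close` flag are hypothetical, and real vendors may handle bar boundaries differently.

```python
def one_minute_bars(ticks, carry_prev_close=False):
    """Aggregate (seconds_since_midnight, price) ticks into 1-minute bars.

    Returns {minute_index: [open, high, low, close]}. `ticks` is assumed
    to be sorted by time.
    carry_prev_close=False: open is the first print inside the minute.
    carry_prev_close=True:  open is the previous bar's close.
    """
    bars = {}
    prev_price = None
    for t, price in ticks:
        minute = t // 60
        if minute not in bars:
            if carry_prev_close and prev_price is not None:
                open_ = prev_price
            else:
                open_ = price
            bars[minute] = [open_, max(open_, price), min(open_, price), price]
        else:
            bar = bars[minute]
            bar[1] = max(bar[1], price)
            bar[2] = min(bar[2], price)
            bar[3] = price
        prev_price = price
    return bars


# Last print before 10:00:00 is 1325.25; the next print is 1325.50.
ticks = [(35995, 1325.00), (35999, 1325.25), (36002, 1325.50), (36030, 1325.75)]
by_print = one_minute_bars(ticks)
by_time = one_minute_bars(ticks, carry_prev_close=True)
print(by_print[600][0], by_time[600][0])
```

With these ticks the first convention opens the 10:00 bar at 1325.50 and the second at 1325.25, matching the two cases described above.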

Does anyone know which is the case? I never bothered with 1-min data because that is "noise trading".

HowardCohodas · Jul 27, 2011

When I aggregated data to explore multiple time frames from the same basic set, I chose the open for the bar in the higher time frame as the open of the following bar in the lower time frame (seconds to minutes for example).

vs1 · Jul 27, 2011

intradaybill said:
I tend to agree with that, but on second thought it would all depend on how the data was aggregated into 1-min intervals. If it is aggregated based on prints, then you are correct. But if it is aggregated based on time, then Alexander is correct.

For example, if at 10:00 AM last price is 1325.25 then this is the close of the 09:59 - 10:00 minute bar. The next bar starts at 10:00+ and finishes at 10:01. If the charting program or data vendor aggregates data based on time then the open price will also be 1325.25. If the next print is at 1325.50, then this will be the open.

Does anyone know which is the case? I never bothered with 1-min data because that is "noise trading".

This is the format of the data

"Date","Time","Open","High","Low","Close","Volume"
06/28/2004,0931,37.49,37.50,37.45,37.46,1049200
06/28/2004,0932,37.48,37.50,37.45,37.48,450700
06/28/2004,0933,37.48,37.50,37.46,37.49,493700
06/28/2004,0934,37.49,37.50,37.47,37.47,756100
06/28/2004,0935,37.48,37.49,37.46,37.48,309000
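As a quick check on the question above, the sample rows can be scanned for differences between each bar's Open and the previous bar's Close. This is a small Python sketch using only the five rows shown; the function name `close_open_gaps` is made up for the example.

```python
import csv
import io

SAMPLE = "\n".join([
    '"Date","Time","Open","High","Low","Close","Volume"',
    "06/28/2004,0931,37.49,37.50,37.45,37.46,1049200",
    "06/28/2004,0932,37.48,37.50,37.45,37.48,450700",
    "06/28/2004,0933,37.48,37.50,37.46,37.49,493700",
    "06/28/2004,0934,37.49,37.50,37.47,37.47,756100",
    "06/28/2004,0935,37.48,37.49,37.46,37.48,309000",
])


def close_open_gaps(text):
    """Return (time, open minus previous close) for each bar after the first."""
    rows = list(csv.DictReader(io.StringIO(text)))
    gaps = []
    for prev, cur in zip(rows, rows[1:]):
        gap = round(float(cur["Open"]) - float(prev["Close"]), 2)
        gaps.append((cur["Time"], gap))
    return gaps


for time, gap in close_open_gaps(SAMPLE):
    print(time, gap)
```

In these rows the 0932 and 0935 bars open at a different price from the preceding close, so at least in this file the Open is not simply the previous bar's Close.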

Thanks,

vs1 · Jul 27, 2011

Thank you HowardCohodas.

HowardCohodas · Jul 27, 2011

vs1 said:
Thank you HowardCohodas.

You are quite welcome.

When it comes to methods of avoiding curve fitting in back testing, I am a fanatic. When you are ready, give me a shout and I'll share some of the methods I now have in my testing framework.

vs1 · Jul 27, 2011

HowardCohodas said:
You are quite welcome.

When it comes to methods of avoiding curve fitting in back testing, I am a fanatic. When you are ready, give me a shout and I'll share some of the methods I now have in my testing framework.

Appreciated Sir.
