Walk Forward Analysis - the only logical successor to backtesting [DISCUSS]

Darwin-FX · Nov 30, 2013

First of all, I do not come from a trading background but from a coding background, and I started AI programming before I even knew what forex was.
So I might have had a different point of view from the beginning 🙂

That said, cross-validation of computer models ("walk forward analysis" in the widest sense) is a common approach to determining the robustness of such models.

So it is not so much about finding ideal parameters (that is just a nice side effect); it is about having as many (more or less independent) "past=>future relationship" test cases as possible, because conventional backtesting only gives you one, which is of course not enough to draw meaningful conclusions.
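A minimal sketch of the window layout behind the "many past=>future test cases" idea, assuming bars indexed from 0 and fixed in-sample (IS) and out-of-sample (OOS) lengths; the function and parameter names are illustrative and not taken from any particular tool. A conventional backtest corresponds to a single such pair, while rolling the pair forward by one OOS block yields many.

```python
from typing import List, Tuple

def walk_forward_windows(n_bars: int, is_len: int, oos_len: int) -> List[Tuple[range, range]]:
    """Return consecutive (in-sample, out-of-sample) index ranges.

    Each OOS block directly follows its IS block, and the whole pair is
    rolled forward by oos_len bars, so OOS blocks never overlap.
    """
    if is_len <= 0 or oos_len <= 0:
        raise ValueError("window lengths must be positive")
    windows = []
    start = 0
    while start + is_len + oos_len <= n_bars:
        in_sample = range(start, start + is_len)
        out_of_sample = range(start + is_len, start + is_len + oos_len)
        windows.append((in_sample, out_of_sample))
        start += oos_len  # roll forward by one OOS block
    return windows

if __name__ == "__main__":
    for is_r, oos_r in walk_forward_windows(n_bars=100, is_len=40, oos_len=20):
        print(f"optimise on {is_r.start}-{is_r.stop - 1}, trade {oos_r.start}-{oos_r.stop - 1}")
```

With 100 bars, 40 IS bars and 20 OOS bars this yields three windows whose OOS blocks (40-59, 60-79, 80-99) do not overlap, so each forward result comes from data that was not used to pick its parameters.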

If you are a very good trader, the significance of a backtest does not come from the testing method itself (a single data point is more or less irrelevant and a product of chance) but from the knowledge you put into the trading system.

Now, if you want to trade using algorithms (which are dumb), or you are not a highly skilled trader and need to rely on the significance of your test results, the conventional techniques are not enough and you need something better, like a WFA.

Hope that answers your question. If not or if I overlooked something, please ask me again 🙂

-Darwin

trendie · Nov 30, 2013

Darwin-FX said:
.....
Hope that answers your question. If not or if I overlooked something, please ask me again 🙂

-Darwin

Honestly, I am none the wiser, as your answer addressed none of my points.

I will await the WFA.

Darwin-FX · Dec 4, 2013

It's released as a 0.1 alpha version, feel free to test it and then ask me again 🙂

-Darwin

Purple Brain · Dec 4, 2013

Robert Pardo is another 'expert' that has made his name writing quality gobbledegook. WFO (or WFA) is simply back-testing: optimising on only the data up to the current test point in the historical test data, testing on the historical data that follows it, and then acting surprised when one of the 10,000 optimised tests shows a positive correlation. It's a dynamic curve fit. All of these techniques (back-testing, data mining, curve fitting, walk forward) only work on historical data.

But it's a relatively new name, able to masquerade as something brand new, different and 'scientific', and there are only a dozen or so commercial WFO/WFA offerings out there at the moment. So good luck, Darwin, for jumping on the WF bandwagon. You'll likely make more from your work when you go commercial than you would from trading it.

numbertea · Dec 4, 2013

Purple Brain said:
Robert Pardo is another 'expert' that has made his name writing quality gobbledegook. WFO (or WFA) is simply back-testing: optimising on only the data up to the current test point in the historical test data, testing on the historical data that follows it, and then acting surprised when one of the 10,000 optimised tests shows a positive correlation. It's a dynamic curve fit. All of these techniques (back-testing, data mining, curve fitting, walk forward) only work on historical data.

But it's a relatively new name, able to masquerade as something brand new, different and 'scientific', and there are only a dozen or so commercial WFO/WFA offerings out there at the moment. So good luck, Darwin, for jumping on the WF bandwagon. You'll likely make more from your work when you go commercial than you would from trading it.

Exactly.

Cheers

adinfinite · Dec 4, 2013

Interesting... I am going to do a detailed write-up on it and come back to post it here.

avano · Dec 8, 2013

A) What do you do with trades at cross-points in WFO?

B) This method breaks down in a big way when markets change (e.g. 2007 top to 2008 bust). You may lose more by using parameters determined during strong uptrends when markets plunge than when you use parameters that were determined over the whole history.

C) As someone else said, WFO is dynamic backtesting and AFAIK there is no rigorous justification for it. On the contrary, when using smaller samples the significance of the results is degraded compared with using a large sample.

D) Just develop the system in first 50% of the data and do an out-of-sample in the other 50%. If the profit factor is no lower than 50% and max DD is the same or less then you may have something. The rest is for vendors to keep people playing with their platforms hoping they get something. Every system developer should read this article. It also applies to WFO: if you reuse the data many times and you change the back and forward period many times you will finally get your fitted system.
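avano's rule of thumb in D) can be written down directly. A minimal sketch, assuming each half of the data has already produced a list of per-trade profits in account currency; profit factor is gross profit divided by gross loss, and max drawdown is measured on the cumulative trade-by-trade equity. avano's numbers are read literally here (OOS profit factor at least 50% of the IS one, OOS drawdown no larger than IS); the function names are hypothetical.

```python
from typing import Sequence

def profit_factor(trades: Sequence[float]) -> float:
    """Gross profit divided by gross loss (inf if there are no losing trades)."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss

def max_drawdown(trades: Sequence[float]) -> float:
    """Largest peak-to-trough drop of the cumulative equity curve (>= 0)."""
    equity = peak = worst = 0.0
    for t in trades:
        equity += t
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def passes_half_split(is_trades: Sequence[float], oos_trades: Sequence[float],
                      min_pf_ratio: float = 0.5) -> bool:
    """OOS profit factor >= min_pf_ratio * IS profit factor and OOS max DD <= IS max DD."""
    pf_ok = profit_factor(oos_trades) >= min_pf_ratio * profit_factor(is_trades)
    dd_ok = max_drawdown(oos_trades) <= max_drawdown(is_trades)
    return pf_ok and dd_ok
```

The 0.5 default is exactly the kind of hand-picked constant that Darwin questions in his reply; it is exposed as a parameter so it can be varied rather than assumed.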

Darwin-FX · Dec 11, 2013

Purple Brain said:
WFO (or WFA) is simply back-testing: optimising on only the data up to the current test point in the historical test data, testing on the historical data that follows it, and then acting surprised when one of the 10,000 optimised tests shows a positive correlation. It's a dynamic curve fit.

That's wrong.
First of all, it's not about finding one positive correlation; it's about looking at all the samples.

Second, and this is the important part, WFA is a cross-validation method.
Cross-validation is a widespread method to evaluate computer models, not just in trading, but in all scientific fields that use such models.

You could say it's not the end of the road and I would agree, but saying that it is the same as back-testing just because it uses the same data is wrong, and everybody who has ever used computer models will agree with that 🙂

You could also say that systems can still be overfitted or crappy, and I would agree, too, but saying that WFA produces overfitting by its nature is incorrect. 😉

Don't forget: this is an interdisciplinary method that has proven to benefit robust models.
After all, it's just a tool, like backtesting, but one that is a lot more powerful.

Purple Brain said:
All of these techniques (back-testing, data mining, curve fitting, walk forward) only work on historical data.

That is also wrong. Don't judge from your own experience or the experience most traders have had with EAs.
Just because 99% of all EAs out there are crappy does not mean that the underlying concepts cannot work.

It's more because people do not know how to do it right, as most of them come from a trading background and know only some MQL basics.
That will not work, of course.
But with solid algorithms and state-of-the-art AI techniques one should be able to write working code. Don't forget that more and more trading, even at huge banks and hedge funds, is being taken over by algorithms. So the question is not whether it can work, but how it works!

Though, I do not claim I know how this is done yet. I just know that it CAN work and that I am well on the way towards a working solution, but I am still learning.

Btw, my private algotrading framework is already ~30,000 lines of code, so please do not compare "real" algorithms with EAs. 😛

Purple Brain said:
But it's a relatively new name, able to masquerade as something brand new, different and 'scientific', and there are only a dozen or so commercial WFO/WFA offerings out there at the moment. So good luck, Darwin, for jumping on the WF bandwagon. You'll likely make more from your work when you go commercial than you would from trading it.

Don't get me wrong, I am not using WFA myself; I have a set of private algorithms that are able to analyse about 2500x as much data per system as a WFA.
For example, this diagram shows 250,000 single optimisation/forward-trading pairs:

[Diagram: X-axis: profit in-sample; Y-axis: profit in forward trading]

You see the clear trend? That is an in-depth view of a trading system that no manual trader can EVER get, as only algorithms are capable of generating and analysing such an amount of data.
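The "clear trend" in such a scatter plot can be summarised as one number. A minimal sketch, assuming the optimisation/forward-trading pairs are available as (in-sample profit, forward profit) tuples; Pearson correlation is one reasonable summary, not necessarily what Darwin's tool computes.

```python
import math
from typing import Sequence, Tuple

def pearson(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation between in-sample and forward (out-of-sample) profit."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the series is constant")
    return sxy / math.sqrt(sxx * syy)
```

A value near zero over hundreds of thousands of pairs would mean in-sample profit says little about forward profit. A rank correlation (Spearman) is a more robust alternative when a few extreme optimisation results dominate the plot.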

Also, it takes only one click to show a parameter (like "moving average period") on the X-axis, so I can clearly spot the parameters the system works best with.

So, what I want to say is: though algorithms are not easy to code (and all the EAs out there have nothing in common with real algotrading), they are still the most powerful tools a trader can work with, as long as they are seen as tools and not as a replacement for humans.

avano said:
A) What do you do with trades at cross-points in WFO?

Sophisticated algorithms will wait until the last trade closes and then start the new WF window afterwards. But that requires changes to the MQL scripts, so the public WFanalyzer cannot do this.
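The "wait until the last trade closes" rule is simple to express once the trades are known. A minimal sketch, assuming trades are (entry_bar, exit_bar) pairs opened with the previous window's parameters, and assuming the old parameter set stops opening new trades at the nominal boundary and only manages existing ones; the function name is hypothetical.

```python
from typing import Sequence, Tuple

def next_window_start(nominal_start: int, trades: Sequence[Tuple[int, int]]) -> int:
    """Delay a walk-forward boundary until no trade from the old window is open.

    trades: (entry_bar, exit_bar) pairs opened with the previous parameters.
    A trade is still open at the boundary if it entered before it and exits
    at or after it; the new window then starts on the bar after its exit.
    """
    start = nominal_start
    for entry, exit_bar in trades:
        if entry < nominal_start <= exit_bar:
            start = max(start, exit_bar + 1)
    return start
```

For example, with a nominal boundary at bar 100 and open trades exiting on bars 104 and 102, the new window starts on bar 105.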

avano said:
B) This method breaks down in a big way when markets change (e.g. 2007 top to 2008 bust). You may lose more by using parameters determined during strong uptrends when markets plunge than when you use parameters that were determined over the whole history.

Don't limit your imagination by my beginner's write-up 😉
Again, sophisticated algorithms will implement "emergency stops" to stop trading in such events.
Also, and this is even more important, they will not even trade in timespans where the underlying, analysed inefficiency is not really present in the markets.

So you have a trend-following system and the current markets are ranging? Then there is no trading!
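Darwin does not say how "the market is ranging" is detected. One common choice, swapped in here for illustration, is Kaufman's efficiency ratio: the net price change over a lookback divided by the sum of absolute bar-to-bar changes, close to 1 in a clean trend and close to 0 in chop. A minimal sketch; the lookback of 20 and the 0.3 threshold are arbitrary placeholders, exactly the kind of numbers the thread argues should be evaluated rather than guessed.

```python
from typing import Sequence

def efficiency_ratio(closes: Sequence[float], lookback: int) -> float:
    """Kaufman efficiency ratio over the last `lookback` bar-to-bar changes."""
    if len(closes) < lookback + 1:
        raise ValueError("not enough bars")
    window = closes[-(lookback + 1):]
    net = abs(window[-1] - window[0])
    path = sum(abs(b - a) for a, b in zip(window, window[1:]))
    return 0.0 if path == 0 else net / path

def trend_system_may_trade(closes: Sequence[float], lookback: int = 20,
                           threshold: float = 0.3) -> bool:
    """Hypothetical gate: let a trend-following system trade only in trends."""
    return efficiency_ratio(closes, lookback) >= threshold
```

A steadily rising series scores 1.0 and passes the gate; a series that alternates between two prices scores 0.0 and is blocked.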

These mechanisms are quite easy to implement when you look at the whole parameter space of a system (with ~250,000 single test cases, such trends become very obvious).

Let's take the diagram I posted above as an example. You see that the EA in question made most of its live-trading profit when the in-sample profits were very high.
So when the profit during optimisation is not "very high", the system would not trade at all.

avano said:
C) As someone else said, WFO is dynamic backtesting and AFAIK there is no rigorous justification for it. On the contrary, when using smaller samples the significance of the results is degraded compared with using a large sample.

When you have one large sample and 100 smaller samples (that cover the same amount of data as the large sample), I cannot see a reason why the significance of the second sample set should be less than that of the first, provided of course that you look at all of them, not only at some.

avano said:
D) Just develop the system in first 50% of the data and do an out-of-sample in the other 50%. If the profit factor is no lower than 50% and max DD is the same or less then you may have something. The rest is for vendors to keep people playing with their platforms hoping they get something. Every system developer should read this article. It also applies to WFO: if you reuse the data many times and you change the back and forward period many times you will finally get your fitted system.

The second part is very, very right. Thanks for pointing that out 🙂 Yes, if you do this, then of course you will get overfitted systems in the end.
As I said, WFA is just a tool, and it does not protect a trader from their own lack of knowledge 😉

Though, the first part is questionable. Did you ever do an in-depth analysis of that? Why is it 50% of the data, not 45 or 60?
Why look at the profit factor, not the expected payoff? Or would the drawdown/profit be better?
Why "not lower than 50%"? Why not 75% or 30%?

You see? That is one of the problems: relying on intuitively chosen numbers will not work very well.
But algorithms could be used to determine the exact numbers, and that's what algo trading is all about: having very powerful tools and code to evaluate every single aspect of a system based on tons of data.
It is NOT about "starting a script and waiting until I get rich".

For example, my above-mentioned private evaluation algorithm is 100% parameterless; it has only 3 inputs:
a) The EA file to analyse (of course)
b) The market/timeframe (also obvious)
c) The timespan to analyse (e.g. 2000-2012)

Every single other aspect regarding the EA, from optimisation and trading timespans to parameter ranges and everything you can imagine, is evaluated. There is no "best guess" included in a solid algorithm.
And if it is, then this best guess is verified based on a proper evaluation before it is used within the system.

-Darwin

EDIT: Regarding the image above: that is just one way to look at the data. You can also average all trades on a per-day basis and get something like an equity curve, if that's what you want:

So the purpose of WFA and the more sophisticated concepts is to harvest as much data on a single system as possible, whereas a backtest only gives you one point of view.
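The per-day averaging from the edit could look like this. A minimal sketch, assuming each forward-trading run yields (close_date, profit) records that are pooled together; all trades closed on the same day are averaged, and the daily means are accumulated into a curve. Averaging per trade (rather than per run, counting days without trades as zero) is an assumption.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple

def averaged_equity_curve(trades: Iterable[Tuple[date, float]]) -> List[Tuple[date, float]]:
    """Average all forward-trading profits per day, then accumulate them.

    trades: (close_date, profit) records pooled from every
    optimisation/forward-trading pair.
    """
    per_day: Dict[date, List[float]] = defaultdict(list)
    for day, profit in trades:
        per_day[day].append(profit)
    days = sorted(per_day)
    daily_means = [sum(per_day[d]) / len(per_day[d]) for d in days]
    return list(zip(days, accumulate(daily_means)))
```

Two trades of +10 and -4 on one day followed by +6 and +2 the next give a curve of 3.0 and then 7.0.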

meanreversion · Jan 17, 2014

Purple Brain said:
Robert Pardo is another 'expert' that has made his name writing quality gobbledegook. WFO (or WFA) is simply back-testing: optimising on only the data up to the current test point in the historical test data, testing on the historical data that follows it, and then acting surprised when one of the 10,000 optimised tests shows a positive correlation. It's a dynamic curve fit. All of these techniques (back-testing, data mining, curve fitting, walk forward) only work on historical data.

But it's a relatively new name, able to masquerade as something brand new, different and 'scientific', and there are only a dozen or so commercial WFO/WFA offerings out there at the moment. So good luck, Darwin, for jumping on the WF bandwagon. You'll likely make more from your work when you go commercial than you would from trading it.

Is your point that any form of back-testing is redundant? If you think that back-testing has some merit, then WF is just an attempt to prevent over-fitting, that's all.

Darwin-FX · Jan 24, 2014

Hi meanreversion,

I would not say it is _just_ an attempt to prevent overfitting, it also generates a different type of data.

Backtesting generates data about a system's performance; WFA generates data about the correlation between a system's performance in the (known) past and in the (unknown) relative future.

-Darwin

drmark27 · Feb 6, 2014

I learned a lot about WFA from Howard Bandy through his books and message boards. However, in studying his words, something critical seems to break down.

What ratio of IS:OOS data should you use? Do you use two years IS, two months OOS? Do you use three years IS, six months OOS?

I asked Bandy this question a number of times and never got a good answer. He basically said "use what works." To me, that means you're curve-fitting precisely the tool you are using (WFA) in an effort to prevent curve-fitting! That is not going to give me the confidence to trade any backtested (with WFA) system live.

danielfppps · Feb 7, 2014

Walk forward optimization is just a more complicated back-testing route where data-mining bias is harder to determine. Here are a few things worth considering:

There is no concrete evidence that a system with successful WFO has a better chance at live trading success than a system optimized traditionally for the whole back-testing period. There are many instances where a system developed with successful WFO fails going forward.
There is no such thing as an "out-of-sample" period when using historical data. If you use WFO you will always go back to the drawing board (changing parameters, WFO variables, windows, optimization ranges, etc) until you get something that works in WFO. It is a harder fitting exercise - more degrees of freedom - but if you try for long enough you can always mine/find something that works on a WFO analysis. The OS periods are not "unknown" they are actually known because the WFO testing process is repeated as many times as needed. This repetition process introduces hindsight that is as bad as for any historical testing process (as I say above you simply modify things until they work in the WFO).
The data-mining bias is really hard to measure in WFO because the degrees of freedom are in fact increased by the WFO process (adds complexity).

I have developed many systems that have achieved profit without the need for WFO and have traded systems with successful WFO tests that failed in live trading as well. This technique gives no real advantage (larger probability of live trading success) over traditional back-testing optimization where the data-mining bias has been properly determined and accounted for. My advice is simply to use your whole data for back-testing/optimization and use randomly generated data to determine the data-mining bias for your strategy.
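The "randomly generated data" suggestion can be sketched as a Monte Carlo test: run the same optimisation on many random-walk series that contain no edge by construction, and see how often their *best* in-sample result matches the real one. A minimal sketch, assuming Gaussian random-walk prices with unit steps (real use would match the volatility and length of the actual data) and a toy long/short SMA system standing in for the real strategy; the periods, bar count and trial count are placeholders.

```python
import random
from itertools import accumulate
from typing import List, Sequence

def sma_system_profit(closes: Sequence[float], period: int) -> float:
    """Toy always-in-market system: long above SMA(period), short below it."""
    prefix = [0.0, *accumulate(closes)]  # prefix[i] = sum of closes[:i]
    profit = 0.0
    for i in range(period - 1, len(closes) - 1):
        sma = (prefix[i + 1] - prefix[i + 1 - period]) / period
        position = 1.0 if closes[i] > sma else -1.0
        profit += position * (closes[i + 1] - closes[i])
    return profit

def best_in_sample(closes: Sequence[float], periods: Sequence[int]) -> float:
    """Optimise over the parameter grid and keep only the best result."""
    return max(sma_system_profit(closes, p) for p in periods)

def random_walk(n: int, rng: random.Random, start: float = 100.0) -> List[float]:
    """Gaussian random-walk prices: no exploitable edge by construction."""
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices

def data_mining_bias_pvalue(real_best: float, n_bars: int, periods: Sequence[int],
                            trials: int = 200, seed: int = 0) -> float:
    """Share of random series whose *optimised* profit reaches real_best."""
    rng = random.Random(seed)
    hits = sum(best_in_sample(random_walk(n_bars, rng), periods) >= real_best
               for _ in range(trials))
    return (hits + 1) / (trials + 1)  # add-one so the estimate is never exactly 0
```

A small value suggests the real optimised profit is hard to obtain by optimisation alone; a large one means the same grid search could have produced it on noise. The real and random runs must use the same parameter grid and the same number of bars, otherwise the comparison says nothing.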

drmark27 · Feb 7, 2014

Hi Daniel,

I want to state up front that nothing of my response is meant to pick on you personally. What I say about you goes for anyone else including me. We're just having a theoretical discussion about trading system development here.

I'll start out by claiming of course there is no concrete evidence of anything pertaining to real trading/investing performance for a few different reasons. First, many of those who succeed trade on their own or with a big firm and do not advertise their results to the public. Second, many of those who fail walk quietly into the night with their tails between their legs. They do not advertise their results (failure) to the public, either. Third, many of those who claim success are snake oil salesmen (and women) of one sort or another and cannot be trusted anyway.

In other words, don't try and argue against any trading approach because "there is no concrete evidence."

Similarly, don't try and argue against any trading approach based on your supposed personal experience. We haven't seen your brokerage records, bank accounts, tax returns, etc., so we ultimately have no idea what you have really done. You may be a snake oil salesman yourself. These supporting elements are therefore irrelevant.

You say there is no OOS period when using historical data once it has been used. I think there's a common misconception that once data has been used it is corrupted and any further use leads to some sort of bias. I don't believe that is what causes the potential harm. I believe the problem is the failure to assess whether profitable results in backtesting are a matter of luck, curve-fitting, or likely Edge. To determine this, optimize parameters over a range and explore that parameter space. Spike regions are likely fluke, in my opinion, and not suggestive of future success. High plateau regions are what we want to see.
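One simple way to prefer plateaus over spikes in a one-dimensional parameter sweep is to score each parameter value by the average result of itself and its neighbours, and to pick the best neighbourhood rather than the best single value. A minimal sketch, assuming the results are ordered by parameter value; the radius is a placeholder, and multi-parameter spaces would need a multi-dimensional neighbourhood.

```python
from typing import List, Sequence

def neighbourhood_means(profits: Sequence[float], radius: int = 1) -> List[float]:
    """Average each result with its neighbours (window clipped at the edges)."""
    n = len(profits)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(profits[lo:hi]) / (hi - lo))
    return out

def plateau_choice(profits: Sequence[float], radius: int = 1) -> int:
    """Index whose neighbourhood (not the single value) performs best."""
    smoothed = neighbourhood_means(profits, radius)
    return max(range(len(profits)), key=smoothed.__getitem__)
```

For profits [10, 12, 11, 13, 12, 0, 20, 0, 1], the raw maximum is the isolated 20 at index 6, while plateau_choice returns index 2, inside the 10-13 plateau.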

I agree with you that a problem of WFO is constant returning to the drawing board until you find something that works if you are referring to changing up the ratio of IS:OOS time intervals. I wonder, though, if this can be optimized over a range as discussed above and inspected for spike vs. plateau regions as well.


meanreversion · Feb 10, 2014

I use a 4 year IS period then a 1 year OOS period. I don't recall the precise reason I use that, but I remember researching it at the time and deciding that was a good mix.

Even then, you still need a way to determine whether the system is any good or not. Walk forward analysis definitely poses a number of fresh questions.

avano · Feb 11, 2014

WFA is just another form of optimization on shorter periods. The system will fail regardless if new conditions arise not covered by WFA.

drmark27 · Feb 11, 2014

I don't know whether or not WFO works because I haven't tried it [yet?]. Better yet, I don't know whether or not WFO works because I haven't tried it on a sufficiently large sample size of systems. I say this just to state my bias here: I have none. I don't know.

I believe WFO is based on the idea that what has happened more recently is a better guide to what will happen in the immediate future. I also think this is obvious.

Optimization is only a bad word if one uses it to pick out the best--thinking what was best in the past will be best in the future.

I believe optimization is necessary to see how the system performs over a range of parameter values. Based on this idea, if you don't optimize then you're much less likely to succeed. It's not about determining what worked in the past. It's about seeing something that worked in the past and having some sort of idea whether that was fluke.

Finally, I don't see why a system will necessarily fail if new conditions arise "not covered by WFO." It seems entirely possible that the "new conditions" may find a match with a particular set of parameter values and if the new condition was a gradual development then a WFO process may well be sufficient to adapt.

Darwin-FX · Feb 14, 2014

drmark27 said:
I learned a lot about WFA from Howard Bandy through his books and message boards. However, in studying his words, something critical seems to break down.

What ratio of IS:OOS data should you use? Do you use two years IS, two months OOS? Do you use three years IS, six months OOS?

I asked Bandy this question a number of times and never got a good answer. He basically said "use what works." To me, that means you're curve-fitting precisely the tool you are using (WFA) in an effort to prevent curve-fitting! That is not going to give me the confidence to trade any backtested (with WFA) system live.

He cannot tell you because he does not know, and he does not know because WFA is flawed.
Well, I am currently writing a new article which should clarify this point, but for now, know that WFA tries to judge a system with a parameter space in the billions (as most of them actually have) by taking about 100 samples. That cannot and will not produce reliable results!

Also, WFA is flawed because every IS:OOS data size you want to test needs a new WFA procedure, and each one takes time.

My private algorithm (DATFRA - Darwin's Algorithmic Trading Framework 😀) is able to solve both issues. First, it can generate about 8 million samples in ~24 hours. Second, it can analyse as many IS:OOS data sizes as one wants in only one analysis procedure.
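Running a walk-forward analysis for several IS:OOS sizes in one pass is straightforward once per-window results can be computed. A minimal sketch, assuming a caller-supplied function that optimises on one index range and returns the profit of the chosen parameters on the following range; this is a plain loop over combinations, not a reconstruction of how DATFRA shares work between them. As drmark27 points out, picking the single best combination from such a table is itself a fitting step, so it is better read for stability across combinations.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Evaluate = Callable[[range, range], float]  # (in-sample, out-of-sample) -> OOS profit

def wfa_grid(n_bars: int, is_lengths: Sequence[int], oos_lengths: Sequence[int],
             evaluate: Evaluate) -> Dict[Tuple[int, int], Tuple[int, float]]:
    """Run a walk-forward analysis for every (IS, OOS) length combination.

    Returns (is_len, oos_len) -> (number of windows, mean OOS profit).
    """
    results = {}
    for is_len, oos_len in product(is_lengths, oos_lengths):
        profits = []
        start = 0
        while start + is_len + oos_len <= n_bars:
            profits.append(evaluate(range(start, start + is_len),
                                    range(start + is_len, start + is_len + oos_len)))
            start += oos_len  # roll forward by one OOS block
        if profits:
            results[(is_len, oos_len)] = (len(profits), mean(profits))
    return results
```

The number of windows per combination is reported alongside the mean because long IS periods leave few windows, and a mean over two windows says much less than one over twenty.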

---------------------------------------------------------------------------------------------------------------------------------

danielfppps said:
Walk forward optimization is just a more complicated back-testing route where data-mining bias is harder to determine. Here are a few things worth considering:

There is no concrete evidence that a system with successful WFO has a better chance at live trading success than a system optimized traditionally for the whole back-testing period. There are many instances where a system developed with successful WFO fails going forward.

There is no such thing as an "out-of-sample" period when using historical data. If you use WFO you will always go back to the drawing board (changing parameters, WFO variables, windows, optimization ranges, etc) until you get something that works in WFO. It is a harder fitting exercise - more degrees of freedom - but if you try for long enough you can always mine/find something that works on a WFO analysis. The OS periods are not "unknown" they are actually known because the WFO testing process is repeated as many times as needed. This repetition process introduces hindsight that is as bad as for any historical testing process (as I say above you simply modify things until they work in the WFO).

The data-mining bias is really hard to measure in WFO because the degrees of freedom are in fact increased by the WFO process (adds complexity).

I have developed many systems that have achieved profit without the need for WFO and have traded systems with successful WFO tests that failed in live trading as well. This technique gives no real advantage (larger probability of live trading success) over traditional back-testing optimization where the data-mining bias has been properly determined and accounted for. My advice is simply to use your whole data for back-testing/optimization and use randomly generated data to determine the data-mining bias for your strategy.

You are right because of the reasons I just mentioned.

The thing is, when I first experimented with DATFRA, I noticed something:
I did a test just like a WFA (a WFA would have analysed ~50 IS->OOS data points for this test), but I heavily increased the amount of data.
But how much is enough? Well, 25,000 still was not enough (for a simple system with 3 parameters and a parameter space size of 10,000); the data looked like random noise:

Data only began to show clear trends when I looked at about 500,000 single in-sample->out-of-sample tests:

(Tests were done with the default Moving Average Expert Advisor that comes with every new MT4 installation, on EURUSD H4 for 2005-2013)
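How many IS->OOS pairs are "enough" depends on how weak the relationship is. A rough gauge, assuming the pairs were independent (overlapping windows make them correlated, so the real uncertainty is larger), is the Fisher-z confidence interval for a Pearson correlation, whose width shrinks with 1/sqrt(n - 3). The correlation of 0.01 in the example is purely hypothetical and not taken from Darwin's results.

```python
import math
from typing import Tuple

def correlation_ci(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% confidence interval for a Pearson correlation r from n pairs."""
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and |r| < 1")
    z = math.atanh(r)          # Fisher transform
    se = 1 / math.sqrt(n - 3)  # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

if __name__ == "__main__":
    for n in (50, 25_000, 500_000):
        lo, hi = correlation_ci(0.01, n)
        print(f"n={n:>7}: r=0.01 -> [{lo:+.4f}, {hi:+.4f}]")
```

Under these assumptions a correlation of 0.01 is indistinguishable from zero at 50 or 25,000 pairs, and only separates from zero at around 500,000 pairs, which is consistent with the experience described above.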

---------------------------------------------------------------------------------------------------------------------------------

drmark27 said:
In other words, don't try and argue against any trading approach because "there is no concrete evidence."

Similarly, don't try and argue against any trading approach based on your supposed personal experience. We haven't seen your brokerage records, bank accounts, tax returns, etc., so we ultimately have no idea what you have really done. You may be a snake oil salesman yourself. These supporting elements are therefore irrelevant.

Right, analysis methods can only be discussed based on theory, because as soon as you try to prove or disprove them by using them with some EAs:
a) you would only analyse some EAs with them; you would have to analyse hundreds or thousands to get reliable results
b) if a WFA shows good results and the strategy works, that does not mean the WFA has anything to do with it. Perhaps the trader designing the strategy was just very good
c) you would have to forward trade the strategies for many years to get reliable results per test, imo

---------------------------------------------------------------------------------------------------------------------------------

drmark27 said:
[...] To determine this, optimize parameters over a range and explore that parameter space. Spike regions are likely fluke, in my opinion, and not suggestive of future success. High plateau regions are what we want to see.

Well, it still depends on how you look at the data. You always have to put it in context, so analyse parameters and fitness in optimisation <-> fitness in forward trading / in the relative future.

Then you might get the real insight. Let me bring up DATFRA as an example again:

As you see, you can analyse the relationship between up to 4 things with it, which makes it very easy to see which parameters work if one wants high forward-trading profit.
Disclaimer: I implemented this 3D/4D plotting yesterday and I have the feeling it is still buggy, so the plot shown above might be wrong, but it's enough to get the concept.

---------------------------------------------------------------------------------------------------------------------------------

meanreversion said:
Even then, you still need a way to determine whether the system is any good or not. Walk forward analysis definitely poses a number of fresh questions.

Don't worry, I think I have solved this too 😛
Remember when I said I do not just analyse ~100 IS->OOS data points but 500,000 to a few million? Well, with all this you can do some things...

First, you can empirically see how well a given parameter should work in live trading:

You can also do this based on optimisation profit etc. If a system tends not to behave like it did in the past, based on millions of measurements, it might be broken.
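The per-parameter view can be sketched as a simple group-by over the database of tests. A minimal sketch, assuming each record holds the parameter value chosen in optimisation and the profit that choice made in the following out-of-sample period; the record layout and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

def oos_profit_by_parameter(records: Iterable[Tuple[int, float]]) -> Dict[int, Tuple[int, float]]:
    """Map parameter value -> (number of tests, mean out-of-sample profit)."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for param, oos_profit in records:
        groups[param].append(oos_profit)
    return {p: (len(v), mean(v)) for p, v in sorted(groups.items())}
```

The count matters as much as the mean: a parameter value that was chosen only a handful of times says little. Recomputing the same table on new live periods and comparing it with the historical one is one way to flag a system that no longer behaves like it did in the past.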

But there is more you can do with all this data: you can train artificial neural networks to forecast the profitability of your next live trading period:

And it will give you a number between 0 and 100% for the probability of a successful live trading period. This is also a new feature and I am currently experimenting with it, but first results were very, very good. So if forecast and reality differ too much and too often, a system might be broken.
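Darwin's neural networks are not described in any detail, so the following swaps in the simplest model of the same kind: logistic regression (a network with no hidden layer), trained by plain gradient descent to output a probability that the next forward period is profitable. A minimal sketch; the features (for example, scaled in-sample profit or drawdown of the chosen candidate), the learning rate and the epoch count are assumptions for illustration only.

```python
import math
from typing import List, Sequence

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(next OOS period profitable) for feature vector x; w[0] is the bias."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def train_logistic(features: Sequence[Sequence[float]], labels: Sequence[int],
                   lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(features[0]) + 1)
    n = len(features)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = predict_proba(w, x) - y  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w
```

Whatever model is used, its probabilities only mean something if they are checked on periods it was not trained on; holding out the latest windows and comparing forecasts with realised outcomes is the "forecast versus reality" check described above.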

But DATFRA is new, so there might be more sophisticated methods to detect broken systems, but I know the answer lies in the data generated by it!

---------------------------------------------------------------------------------------------------------------------------------

drmark27 said:
Better yet, I don't know whether or not WFO works because I haven't tried it on a sufficiently large sample size of systems.

And I thought nobody would ever mention that; too many people just want to see it tested on one or a few systems 😉

---------------------------------------------------------------------------------------------------------------------------------

drmark27 said:
Optimization is only a bad word if one uses it to pick out the best--thinking what was best in the past will be best in the future.
[...]
Finally, I don't see why a system will necessarily fail if new conditions arise "not covered by WFO." It seems entirely possible that the "new conditions" may find a match with a particular set of parameter values and if the new condition was a gradual development then a WFO process may well be sufficient to adapt.

Well, the second picture in this post actually analysed the correlation showing how good your live trading performance will be when, for example, always picking the best candidate. The thing is, in the few systems I have analysed until now (still busy coding this tool...), one thing is obvious: it is always the best idea to pick the best result from optimisation.
But this only becomes clear when you look at hundreds of thousands of cases, so in a traditional WFA it is pure random luck which system you draw.
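Whether "always take the best optimisation result" beats the alternatives is itself something the database can answer. A minimal sketch, assuming that for each walk-forward window the database holds every candidate's (in-sample profit, out-of-sample profit); it compares the forward profit of the top in-sample candidate with the average forward profit of all candidates in the same window. The names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Window = Sequence[Tuple[float, float]]  # (in-sample profit, out-of-sample profit)

def best_pick_vs_average(windows: Sequence[Window]) -> Dict[str, float]:
    """Mean OOS profit of the best-IS candidate vs the mean OOS of all candidates."""
    best_oos: List[float] = []
    avg_oos: List[float] = []
    for candidates in windows:
        best_oos.append(max(candidates, key=lambda c: c[0])[1])
        avg_oos.append(mean(oos for _, oos in candidates))
    return {"best_is_pick": mean(best_oos), "all_candidates": mean(avg_oos)}
```

Over a handful of windows either number is dominated by luck; the comparison only becomes informative over very many windows, which is the point made above.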

Your second part is also true, I guess. What I have found is that "new conditions" do not just arise; often, when your system is going to make less profit in forward trading, you can already see it in your optimisation results. You can spot it with neural networks as mentioned above, or simply by noticing "well, the best profit in optimisation was <5000$, and in the database most candidates with such a bad IS profit also had bad OOS profit".

So it is not about fearing unknown conditions, but about measuring the likelihood of a system still making a profit.
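The "<5000$" example is a conditional frequency looked up in the database. A minimal sketch, assuming (in-sample profit, out-of-sample profit) pairs and hand-picked bucket edges; for each in-sample bucket it returns how often the following forward period was profitable. A live system could then skip periods whose best in-sample result falls into a bucket with a poor historical hit rate; the edges and any cut-off are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

def oos_hit_rate_by_is_bucket(pairs: Sequence[Tuple[float, float]],
                              edges: Sequence[float]) -> Dict[int, Tuple[int, float]]:
    """Return bucket -> (count, share of pairs with OOS profit > 0).

    Bucket 0 holds IS < edges[0]; bucket k holds edges[k-1] <= IS < edges[k];
    the last bucket holds IS >= edges[-1]. edges must be sorted ascending.
    """
    counts: Dict[int, List[int]] = {}
    for is_profit, oos_profit in pairs:
        k = bisect_right(edges, is_profit)
        c = counts.setdefault(k, [0, 0])
        c[0] += 1
        if oos_profit > 0:
            c[1] += 1
    return {k: (n, wins / n) for k, (n, wins) in sorted(counts.items())}
```

With a single edge at 5000, pairs are split into "best IS profit below 5000" and "at or above 5000", and the two hit rates can be compared directly.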

-Darwin

PS: If you are interested in DATFRA and are a professional trader, please add me on Skype (Darwin-FX); I am happy to discuss with you to get new input for my development. But no, you cannot buy this yet. Perhaps in the future, perhaps only a few copies to keep it private, perhaps for everyone... I have not decided yet 🙂 But I am sure that if you helped me with cool input in the form of discussions, it won't hurt you 😉
