Why "Walk Forward Analysis" is still unreliable and useless! [DISCUSS]

Darwin-FX · Feb 26, 2014

Hello, I am Darwin and today I want to talk about the limitations that walk forward analysis suffers from. This is my 3rd article, so if you do not know how WFA works, please read the other 2 (you can find them here on the forum).

The target audience of this article is everybody who deals with Expert Advisors and Backtesting / Walk Forward Analysis.

Some of you might already have seen a few of my posts about the research I am doing in the field of Trading System Analysis (in the course of writing a meta-algorithm that can build, analyse and trade strategies on its own). The goal is to write an algorithm so powerful that it can take any EA and, through in-depth analysis, tell you how and when to trade it in order to make a profit, no matter how good or bad the underlying EA is 🙂

So here is a new article in which I would like to lay out some insights I gained while writing this algorithm (DATFRA - Darwin's Algorithmic Trading Framework).

Well, let's begin. My first concern is that the design of Walk Forward Analysis is, by its nature, unrewarding and not the kind of analysis a trader wants.

Also, I claim that the results of a WFA are more or less random, and if a system works well after a successful WFA, it is not because the test was successful but because the trader who designed the system did a good job.

In this article I do not yet want to show how these problems can be solved; I just want to demonstrate that they exist. In my next article I will explain how I think they can all be solved in an elegant way.

The fundamental design problem

Walk Forward Analysis is designed to evaluate a trading construct you give to it.
This construct consists of:

* Trading System (e.g. an Expert Advisor)
* Market/Timeframe (e.g. "EURUSD / H4")
* System's Parameter ranges (e.g. "Moving Average Period from 50-150")
* Optimisation (In Sample) Timespan (e.g. "Optimise on 2 years of data")
* Forward Trading (Out of Sample) Timespan (e.g. "Forward Trade for one month")
* Preferred characteristic (e.g. "forward trade the candidate with highest profit").
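As a rough illustration, the construct can be written down as a plain Python data structure. This is a minimal sketch: the class and field names are hypothetical and do not come from any particular WFA tool; it only shows how many inputs have to be fixed before a WFA can even start.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradingConstruct:
    """Everything a WFA needs before it can run (hypothetical field names)."""
    system: str                        # e.g. the name of an Expert Advisor
    market: str                        # e.g. "EURUSD"
    timeframe: str                     # e.g. "H4"
    param_ranges: dict[str, range]     # e.g. MA period 50-150
    in_sample_days: int                # optimisation timespan
    out_of_sample_days: int            # forward-trading timespan
    selection_rule: str                # e.g. "highest_profit"


example = TradingConstruct(
    system="ExampleEA",
    market="EURUSD",
    timeframe="H4",
    param_ranges={"ma_period": range(50, 151)},  # 50-150 inclusive
    in_sample_days=2 * 365,                      # "optimise on 2 years"
    out_of_sample_days=30,                       # "forward trade one month"
    selection_rule="highest_profit",
)
```

Everything except the system, market and timeframe is a choice the trader has to make up front.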

So all of this has to be predetermined by the trader, out of intuition, and not based on hard facts and data. But god, these are the most important decisions, so how is one supposed to "guess" them?!

And then, WFA will only be able to tell you whether this construct would have worked in the past or not, that's it.

So in order to find the best trading construct, you have to use trial and error and repeat the WFA multiple times. Step by step, this leads to the worst case: your "unseen" out-of-sample tests slowly become "known" in-sample data, and the whole advantage of WFA over backtesting fades away completely.

These design-related problems already show that WFA cannot be the end of the road in terms of system analysis.

In a perfect world, you would give the analysis algorithm only the trading system and the market/timeframe, no other parameters. The algorithm should then tell you the best choices for all the other parts of the trading construct, based on data and facts, not the other way round.

Side Note: it should NOT just tell you how to trade your systems, it should also give you the possibility to look into the system's characteristics on your own. You should never be forced to trust an algorithm without being able to check its findings!

This is very, very important. There is not much value in evaluating a single trading construct, but it is a game changer if you can look into your strategies in a way that lets you just "see" how they work and which trading construct will work best (more on this in my next article).

Even worse: Unreliable results because of lacking data

Ok, so even if a trader could come up with a good trading construct out of intuition/knowledge, WFA would still be a more or less random thing. But first, let's make a rough calculation:

An example trading system and a rough estimate of its parameter-space size

So, a system that enters trades based on a Moving Average crossover and the RSI indicator, and exits them using a different Moving Average crossover, has at least 5 parameters (2x2 for the MA periods + the RSI threshold). It's 6 if you take the StopLoss into account.

Let's say the "fast" Moving Average periods can be 10-50 and the "slow" ones 50-250, the RSI threshold can be 1-100 and the StopLoss 50-150 pips (this is not a real system, just an example!)

So this system can already be traded in 40*200*40*200*100*100 different ways. That is 640 billion (640.000.000.000), which is a huge number.
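The multiplication can be checked in a few lines of Python. The parameter names are hypothetical labels; the value counts follow the text, which counts each range as upper bound minus lower bound (counting both ends inclusively would give slightly more).

```python
from math import prod

# Number of values per parameter, as counted in the example above.
values_per_parameter = {
    "entry_fast_ma": 50 - 10,    # 40
    "entry_slow_ma": 250 - 50,   # 200
    "exit_fast_ma": 50 - 10,     # 40
    "exit_slow_ma": 250 - 50,    # 200
    "rsi_threshold": 100,        # 1-100
    "stop_loss_pips": 150 - 50,  # 100
}

# Every parameter is independent, so the sizes multiply.
combinations = prod(values_per_parameter.values())
print(f"{combinations:,}")  # 640,000,000,000
```

Each additional parameter multiplies this total by its own number of values, which is why the count grows exponentially with the number of parameters.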

One might question my exact example strategy, but one cannot question the millions or billions of possible parameter combinations, even for small systems.

Thankfully, since a lot of these parameter combinations would behave very similarly, we do not need to evaluate them all, but we do need at least a meaningful sample of them, like a few hundred thousand or a few million.
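One simple way to get such a sample without enumerating the full space is to draw each parameter uniformly at random. This is a sketch under that assumption (the article does not say how the sample should be drawn); the parameter names and `sample_parameter_space` are hypothetical, the bounds are the example ranges above treated as inclusive, and a fixed seed makes the draw reproducible.

```python
import random

# Hypothetical parameter ranges for the example system (inclusive bounds).
PARAM_RANGES = {
    "entry_fast_ma": (10, 50),
    "entry_slow_ma": (50, 250),
    "exit_fast_ma": (10, 50),
    "exit_slow_ma": (50, 250),
    "rsi_threshold": (1, 100),
    "stop_loss_pips": (50, 150),
}


def sample_parameter_space(n: int, seed: int = 42) -> list[dict[str, int]]:
    """Draw n parameter combinations uniformly at random.

    Draws are independent, so duplicates are possible but very rare
    in a space of hundreds of billions of combinations.
    """
    rng = random.Random(seed)
    return [
        {name: rng.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
        for _ in range(n)
    ]


# The text suggests ~500.000; a small n keeps the demo quick.
sample = sample_parameter_space(1_000)
```

Smarter schemes (e.g. Latin hypercube sampling) spread the points more evenly, but plain random sampling is the easiest to reason about.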

So keep this huge number in mind, even for small systems, because with every new dimension of our optimisation problem's solution space (every new parameter), the number of possible parameter combinations grows exponentially.

Walk Forward Analysis - missed data during optimisation

Ok, now let's look at the first step of WFA, and the first problem: missed data because of inefficient algorithm design and computing-time concerns.

During the optimisation step of WFA, the algorithm should, in a perfect world, evaluate all 640 billion combinations in order to determine which of them work best. Of course this is not possible, but a "meaningful" sample (let's say 500.000) would be feasible, and it is _needed_ if we want to see the "real" picture.

The problem is, due to limitations of WFA algorithms, optimisation has to be done in every single Walk Forward Window.

Let's say we do a WFA on 10 years of data and our Forward Trading Timespan is 2 weeks: that makes roughly 240 Walk Forward Windows (about two per month). That means 500.000 tested parameter combinations per window would require 120.000.000 single simulations.
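The window arithmetic can be sketched as follows. The first part reproduces the article's estimate (two forward windows per month); the second part, `rolling_windows`, is a hypothetical helper that generates the actual window boundaries of a rolling WFA, assuming a two-year in-sample span and day-based indexing.

```python
# Article's estimate: 10 years with a two-week forward window, ~2 windows per month.
YEARS = 10
WINDOWS_PER_MONTH = 2
CANDIDATES_PER_WINDOW = 500_000   # the "meaningful sample" from the text

windows = YEARS * 12 * WINDOWS_PER_MONTH
simulations = windows * CANDIDATES_PER_WINDOW   # optimisation reruns in every window
print(windows, f"{simulations:,}")  # 240 120,000,000


def rolling_windows(total_days, is_days, oos_days):
    """Yield (is_start, is_end, oos_end) day indices of a rolling WFA.

    Each window optimises on [is_start, is_end) and forward trades on
    [is_end, oos_end); the whole window then slides forward by oos_days.
    """
    start = 0
    while start + is_days + oos_days <= total_days:
        yield start, start + is_days, start + is_days + oos_days
        start += oos_days


# Exact 14-day steps after a 2-year first in-sample span.
print(len(list(rolling_windows(3650, 730, 14))))  # 208
```

Counted this way, the first two years are only used for optimisation, so the number of windows comes out somewhat lower than 240; either way it is in the hundreds, and the simulation count stays in the order of 100 million.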

And remember that WFA relies on a trial-and-error principle, so you will most likely have to do this a few times.

You see? Evaluating the "real" picture would take a very, very long time. Because optimisation has to be done in every single WF window, the whole parameter space (or a meaningful sample of it) cannot be evaluated in a reasonable amount of time, so most WFA implementations are forced to evaluate only a heavily cropped fraction of it.

This means WFA most likely does not evaluate 500.000 parameter combinations per window, but only 10.000 or 50.000 or so. So we already lose 90% or more of all data in this step.

This problem could be solved if the trader has lots of time for the analysis (which is not likely, especially with the trial-and-error method), or with a more efficient design of these algorithms. Nevertheless, in practice, this problem is ever-present.

For comparison: DATFRA, my private research project, only has to do one single simulation per parameter combination, no matter how many WF windows it analyses. In the above example, that alone would decrease the computing time by a factor of 240.
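The article does not describe how DATFRA does this internally. One plausible way is to simulate each parameter combination once over the full history and then read the in-sample and out-of-sample profit of every window from that single result. The sketch assumes a per-day profit series for each combination and that trading is not reset at window boundaries (a position open across a boundary is simply counted on the days its profit falls); `window_profits` is a hypothetical name.

```python
from itertools import accumulate


def window_profits(
    daily_pnl: list[float], windows: list[tuple[int, int, int]]
) -> list[tuple[float, float]]:
    """Return (in_sample_profit, out_of_sample_profit) for every window.

    daily_pnl: per-day profit of ONE parameter combination, produced by a
               single simulation over the full history.
    windows:   (is_start, is_end, oos_end) day indices, end-exclusive.
    """
    # prefix[i] = total profit of days 0..i-1, so any range sum is O(1).
    prefix = [0.0, *accumulate(daily_pnl)]
    return [
        (prefix[is_end] - prefix[is_start], prefix[oos_end] - prefix[is_end])
        for is_start, is_end, oos_end in windows
    ]


pnl = [1.0, -0.5, 2.0, 0.5, -1.0, 1.5]
print(window_profits(pnl, [(0, 3, 4), (1, 4, 5)]))  # [(2.5, 0.5), (2.0, -1.0)]
```

With a prefix sum, each window costs O(1) regardless of its length, so adding more windows is nearly free once the single simulation is done.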

Aside: what kind of data do we look at when analysing trading systems, and what is a "datapoint"?

I will talk about "datapoints" and "data" quite frequently in this article and in my posts, so here is an explanation.

When analysing systems, it is always about a trinity of information. Remember how WFA works:

So a datapoint, of which one is generated per Walk Forward window, consists of:
* The performance in the RED optimisation window
* The performance in the GREEN forward trading window
* The used parameter-combination for this specific test
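In code, one such datapoint could look like this. It is a minimal sketch: the field names and example values are hypothetical, and profit is assumed as the performance measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datapoint:
    """One optimisation => forward-trade observation (hypothetical field names)."""
    in_sample_profit: float              # performance in the optimisation (RED) window
    out_of_sample_profit: float          # performance in the forward-trading (GREEN) window
    params: tuple[tuple[str, int], ...]  # the parameter combination that was tested


dp = Datapoint(
    in_sample_profit=120.0,
    out_of_sample_profit=35.0,
    params=(("fast_ma", 20), ("slow_ma", 100)),
)
```

A classic WFA keeps one of these per window (the winner of the optimisation); the argument here is that every tested candidate in every window could produce one.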

So, in our example, a WFA would generate 240 of them, whereas 120 million (500k * 240) would be possible for our example system. That alone should give you a headache.

Walk Forward Analysis - tons of missed data during forward trading

Ok, now let's look at the second step of WFA, and the second problem: missed data because of _wrong_ algorithm design and computing-time concerns.

Now remember, a meaningful sample of our trading system's parameter space would be 500.000, and we have 240 WF windows. That makes a total of 120.000.000 optimisation candidates. Out of this huge amount, a WFA algorithm takes only the very best per window, 240 in this example.

That is 0,0002% of all the datapoints we could use to describe/analyse this system and its ability to produce good forward-trading results from good optimisation results.
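The percentage follows directly from the article's assumed figures; this only re-does the arithmetic.

```python
candidates_per_window = 500_000
windows = 240

possible = candidates_per_window * windows   # 120,000,000 possible datapoints
used_by_wfa = windows                        # WFA keeps only the best per window
share = used_by_wfa / possible

print(f"{share:.6%}")  # 0.000200%
```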

And then WFA takes these few datapoints and claims that they give a somehow realistic view of a trading system's performance / robustness.

That's nonsense! You would not judge a picture's colour by looking at a single pixel, would you?

A word about fluctuations and why the "very best" parameter combination is not meaningful

You could argue that it does not matter whether we forward trade all 500.000 candidates per window, because we are only interested in the top performers, as they are the ones we trade in reality.

Well, this argument would _only_ work if:

* We ignored the ~90% of data lost in the optimisation step
* The very best candidates were meaningful, which means that all the candidates following them (like the next 10, 20 or 50, which is not much compared to 500.000) would behave in much the same way.

But reality is different: the performance of the top candidates per window fluctuates considerably, and taking the "very best" therefore leads to more or less random outcomes.
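A toy simulation shows why picking the single best candidate is fragile. This is not the author's experiment and its numbers are synthetic: it assumes the extreme case in which every candidate has the same true edge, and both in-sample and out-of-sample results are that edge plus independent Gaussian noise. The window count, candidate count and `top_n` are arbitrary choices, and `simulate_window` is a hypothetical name.

```python
import random
import statistics


def simulate_window(rng, n_candidates=1_000, true_edge=0.0, noise=1.0, top_n=50):
    """One toy WF window in which every candidate has the SAME true edge.

    Each candidate's in-sample and out-of-sample results are true_edge plus
    independent Gaussian noise, so the in-sample ranking is pure luck.
    Returns (OOS result of the single best candidate, mean OOS of the top_n).
    """
    candidates = [
        (true_edge + rng.gauss(0, noise), true_edge + rng.gauss(0, noise))
        for _ in range(n_candidates)
    ]
    # Rank by in-sample result, as the optimisation step would.
    candidates.sort(key=lambda c: c[0], reverse=True)
    best_oos = candidates[0][1]
    top_n_oos = statistics.fmean(c[1] for c in candidates[:top_n])
    return best_oos, top_n_oos


rng = random.Random(7)  # fixed seed so the run is reproducible
results = [simulate_window(rng) for _ in range(200)]
best = [r[0] for r in results]
top = [r[1] for r in results]

# Spread of the forward result across windows: best-only vs. top-50 average.
print(f"best-only OOS spread:   {statistics.stdev(best):.2f}")
print(f"top-50 mean OOS spread: {statistics.stdev(top):.2f}")
```

Under these assumptions, the forward result of the in-sample winner scatters as widely as any single random candidate, while the average over the top 50 is far steadier. Real candidates do differ in quality, so real results will be less extreme, but the mechanism is the same.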

Experiment 1

Here are some examples. I plotted the forward-trading performance of the best candidate (left) and the next 4 candidates for some random strategies I created and evaluated with DATFRA. Most of the analysed WF windows looked like these:

These were just a few examples to illustrate my point; I could show hundreds or thousands of them.

So, for the real picture, you would AT LEAST need to evaluate a few hundred of the top candidates, not just one, because the single best does not show the "real" picture. Its performance is more or less random!

A perfect analysis algorithm would evaluate every single candidate that made at least 1$ profit during optimisation. That would give the real picture, with most likely 1.000 to 10.000 times as many datapoints as a WFA gives.
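A sketch of that idea, assuming the (in-sample profit, out-of-sample profit) pairs of all candidates in all windows have already been collected; `forward_stats` and the chosen summary statistics are hypothetical illustrations, with the 1$ threshold from the text as the default.

```python
from statistics import fmean


def forward_stats(datapoints, min_in_sample_profit=1.0):
    """Summarise the forward results of every candidate that made money in-sample.

    datapoints: iterable of (in_sample_profit, out_of_sample_profit) pairs,
                collected over all candidates in all windows.
    Returns (number of candidates kept, their mean OOS profit,
             share of them with a positive OOS profit).
    """
    kept = [oos for is_profit, oos in datapoints if is_profit >= min_in_sample_profit]
    if not kept:
        return 0, 0.0, 0.0
    positive = sum(1 for oos in kept if oos > 0)
    return len(kept), fmean(kept), positive / len(kept)


# Hypothetical datapoints; (-3.0, 4.0) and (0.5, 1.0) fail the 1$ filter.
data = [(10.0, 2.0), (5.0, -1.0), (-3.0, 4.0), (0.5, 1.0), (20.0, 3.0)]
count, mean_oos, share_positive = forward_stats(data)
```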

Experiment 2

Here are some more examples. This time I plotted the overall WFE (red) and the WFE of single windows (green) for some random strategies I created and evaluated with DATFRA.

WFE (Walk Forward Efficiency) is a measure that compares in-sample and out-of-sample performance, and it is used as THE statistic for system robustness in WFA (google it if you want to know more).
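For reference, one common definition of WFE is annualised out-of-sample profit divided by annualised in-sample profit. The sketch assumes that definition and a 365-day year; other tools may use different variants, and `walk_forward_efficiency` is a hypothetical name.

```python
def walk_forward_efficiency(is_profit, is_days, oos_profit, oos_days):
    """WFE as annualised out-of-sample profit over annualised in-sample profit.

    Values near 1 mean forward trading kept pace with optimisation; values
    far below 1 (or negative) mean the optimised result did not carry forward.
    """
    if is_profit <= 0:
        raise ValueError("WFE is undefined for a non-positive in-sample profit")
    annual_is = is_profit * 365 / is_days
    annual_oos = oos_profit * 365 / oos_days
    return annual_oos / annual_is


print(walk_forward_efficiency(2000, 730, 50, 73))  # 0.25
```

Here 2000$ over 730 in-sample days and 50$ over a 73-day forward period become 1000$ vs. 250$ per year. Because the forward window is short, a few trades can swing the per-window value a lot.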

This clearly shows the fluctuating nature of the results a WFA generates, and that the end result does not really tell you much about your expected live-trading performance.

Btw: to keep the plot scale within limits, I mapped all points > 2.5 to 2.5 and all points < -2.5 to -2.5, so reality is even a lot worse. That is also why the second image in the second row does not look "right".

A word about feasibility

Please do not think this is just grey theory "because it is not possible to run this kind of simulation in a short enough time anyway".

If the algorithm is designed well, it needs neither a single extra simulation to determine forward-trading profit nor a new optimisation procedure for each WF window.

So for the example used here, DATFRA can generate 34.000.000 "Optimisation=>Forward Trade datapoints" in ~24 hours on a mid-range PC (8 GB RAM, quad-core 3 GHz).

Still not 120 million, sure, but compared to 240, I think it's a very good result.
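To put the quoted figures in proportion (this only re-does the arithmetic on the author's numbers, it does not verify them):

```python
datapoints = 34_000_000
seconds = 24 * 60 * 60

per_second = datapoints / seconds          # throughput on the quoted hardware
share_of_ideal = datapoints / 120_000_000  # vs. the 120 million possible datapoints
vs_wfa = datapoints // 240                 # vs. the 240 datapoints of a classic WFA

print(round(per_second))        # 394
print(f"{share_of_ideal:.0%}")  # 28%
print(f"{vs_wfa:,}")            # 141,666
```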

So it IS feasible to analyse a system with such a level of insight, even on today's hardware.

Some afterthoughts

To everyone claiming that backtesting strategies does not work: well, in its current form it does not, but if you look at enough data, it does, and it can help a trader make well-founded decisions.

To everyone using backtests/WFA: it does not work the way you use it, you can never rely on your analysis results, and if your EAs/trading strategies make a profit, it is not because of the good tests, but because you did a very good job designing them!

In about 1-2 weeks I will post my next article, in which I will explain how an ultimate state-of-the-art system analysis algorithm works and what can be done with it. You will be stunned, I promise! 🙂

"Are you just trying to sell stuff?"

People keep asking me this whenever I post stuff.

No, I post this because I want to discuss my concepts and thoughts with other advanced traders. The side benefit is the educational effect for everyone that is willing to learn more about algotrading.

And yes, I am developing an algorithm that is based on the concepts I explain in my articles (especially the next one) and that is able to solve the issues discussed here. Well, basically I have already developed it; it is in its first alpha version at the moment and works great.

But I am developing this for my private use, so no, I am not here trying to sell you stuff, as most of the people reading this will not have the chance to purchase it.

It will only be sold to a few people, just enough so that I can fund my own trading accounts (I am young and therefore need the money 😛).

Most likely I will limit the number of copies sold, or only sell to expert traders, or only to companies, or charge enough money that most people won't want it, or sell copies in silent auctions, or... Well, I do not know yet how it will work out. I can just say that I will keep it private to a small circle of happy few, so please do not read this article with the bias of "this guy just wants to sell me stuff", thanks.

-Darwin

PS: As always, just add me on Skype if you want to discuss further and/or want more information about DATFRA: Darwin-FX is my Skype ID.

PPS: I know I could have hidden the part about someday selling something, but hidden agendas are cowardly, so live with the truth, as it does not make my arguments any less valid 😉

LauraRomans · Mar 2, 2014

Hi Darwin - Agreed, backtesting 2 years is insufficient data. How many years/decades would you backtest the e-mini?

Not trying to get your program, just your ideas. Sounds like you have a firm foot in the game. Hoping the best for your trading program and capital collection.

Darwin-FX · Mar 3, 2014

It does not matter how long you backtest your system, you will never be able to tell if your results are due to overfitting, luck or because of a solid edge.

WFA makes this a bit better, but if you read the article, you should know that it is still very unreliable.

If you want to use WFA nevertheless, use at least 5-6 years for HTF systems (use all the data you have for D1 systems). You can use my WFanalyzer if you want; it's free and here on the forum.

-Darwin

Darwin-FX · Mar 4, 2014

Come on guys, does nobody have an opinion on this?
There were some pretty cool arguments on my other threads in this forum, so come on 😛

-Darwin

Darwin-FX · Mar 4, 2014

I see you know nothing about algotrading. So no, WFA is way better than backtesting (though far from perfect).

Let me make this clearer: does nobody who knows at least a bit about algotrading have an opinion on this topic? 🙂

-Darwin

LauraRomans · Mar 4, 2014

Hi Darwin,

Thank you for helping in my education. I got my "version" of Algo Trading from Wikipedia. But this is for Algorithmic trading not algotrading. And, you are right that I know nothing about WFA.

"Algorithmic trading, also called automated trading, black-box trading, or algo trading, is the use of electronic platforms for entering trading orders with an algorithm which executes pre-programmed trading instructions whose variables may include timing, price, or quantity of the order, or in many cases initiating the order by a "robot", without human intervention"

This is how I trade. Thank you for correcting me on the use of the name. I do backtest my program that operates in the blackbox.

At this point, due to your recommendation, I will be a learner rather than a participator in this discussion.

Darwin-FX · Mar 4, 2014

Algotrading == Algorithmic Trading 🙂

You should read about WFA (I have written an article about it, it's here on the forum), because the problem in algotrading (where it is the most important problem) and in automated trading (where it is also very important) is the need to verify your trading programs, so that they run correctly and according to your tests, even in the future.

I would differentiate between algotrading and automated trading.
Algotrading is, imo, if the trading strategies used are developed by or with heavy usage of algorithms.
Automated trading is if the strategies are mostly developed by a human, based on observation, knowledge and experience, which are then just put into a script so they can be executed without further interaction.

-Darwin

Darwin-FX · Mar 4, 2014

How about some reasons, punctuation, spelling and grammar?
And once you have that fixed, we can start on why you clearly do not have the slightest clue what you are talking about.

Also, I released a FREE Walk Forward Analyzer here on the forum...

And last but not least, my whole thread is about why WFA is NOT GOOD, so god damnit, please use some other thread to spread your unfounded bull****, thanks.

-Darwin

LauraRomans · Mar 4, 2014

Thanks, Darwin. How true it is to test, test and re-test. My first automated version sank me umpteen times because of glitches. And I'm a programmer. Ha! It was a disconnect between the programming language and the trading platform. Not a "What you see is what you get". If it were easy to program, Microsoft and Apple would not need to release patches.

Sonicscooter · Mar 5, 2014

You are without doubt, brain dead.

Darwin-FX · Mar 5, 2014

It's like arguing with a child... so ok, you are for sure the coolest kid around. Go play somewhere else, come back when you have something intelligent to say.

-Darwin
