Why backtests are useless, EAs are flawed and their parameters are bad [DISCUSS]

Darwin-FX

Junior member
35 0
Imo this does not tackle the problem, because if you look hard enough you can always find a system that will pass WFA and fail in forward trading. Just do a neural network search and you will see what I mean. WFA also does not work if new conditions develop in the markets that make a system unprofitable in WFA but profitable in forward trading. It is a very conservative approach that will basically reject 99.99% of systems, and you will end up trading none. Just my 2 cents.

If you use the WFA in the wrong way, then of course you will find systems that are "overfitted" towards the WFA approach and will fail in live trading.
If you have a normal system-development process and just use WFA as the last step, it will be hard to overfit. The thing is, if the WFA gives you "bad" results, you should in the best case just throw the EA away or make only minor changes; but as soon as you optimise the system towards the WFA results, you might overfit, yes.

Also, a good WFA (or better, the trader that uses the WFA) should be able to detect if completely new conditions develop for which the EA is not suited, and then simply stop trading the EA. Though, my new algo will have ways to determine this automatically :)

It's as conservative as you want it to be. To make it very non-conservative, just use the last few years for analysis (instead of 13) and tune your parameter ranges accordingly :)

It's just a tool after all, not doing your work as a trader ;)

-------------------------------------------------------------------------------------------------------------------------------------


Therefore, the million dollar question is, how do we know whether the results of our testing represent real "relationships" or are just random results?

Hey, thanks for the whole writeup. Yeah, I know of the random-walk experiments, and yes, markets tend to move in that random way most of the time.

The problem you describe is curve-fitting, right?
Well, the way we want to make sure we have real relationships is the following:

First we optimise our parameters on some data, to find the best match for the "relationships" we want to prove that they exist.

Then, if these parameters hold "in the future" and still describe the relationship well, we can conclude that it is a real relationship: if we had instead optimised our parameters on some random data/relationship and then used these parameters "in the future" (which is also random), this could not work in the average test case.

So, if we then repeat this test 100-150 times, always just looking at the past=>future relationship, and most of them are "valid", chances are high that we have a real, non-overfitted relationship.

My new algo will be able to measure up to 100,000 of these past=>future relationships, giving us the whole picture, but for the moment 150 should be a good start! :)
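The repeated past=>future test described above can be sketched as a rolling walk-forward loop. This is a hypothetical, much-simplified illustration: `returns_for` is a made-up stand-in for whatever backtest produces per-trade returns, and the window logic is the bare minimum of the idea.

```python
def walk_forward(data, in_len, out_len, param_grid, returns_for):
    """For each rolling window pair, optimise on the in-sample window and
    record how the chosen parameters perform on the adjacent out-of-sample
    window. Returns a list of (best_params, oos_profit) tuples."""
    results = []
    start = 0
    while start + in_len + out_len <= len(data):
        in_sample = data[start:start + in_len]
        out_sample = data[start + in_len:start + in_len + out_len]
        # pick the parameter set that maximises in-sample profit
        best = max(param_grid, key=lambda p: sum(returns_for(p, in_sample)))
        results.append((best, sum(returns_for(best, out_sample))))
        start += out_len  # slide both windows forward
    return results
```

Each tuple in the result is one past=>future observation; counting how many have a positive `oos_profit` is the "most of them are valid" check from the post.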

-------------------------------------------------------------------------------------------------------------------------------------


Regarding the discussion about the null hypothesis: do it without me, I am not good at math ;)


-Darwin
 

Shakone

Senior member
2,458 665
My problem with a Null Hypothesis: Using my coin tossing example I formulate a null hypothesis that “longing after tossing 4 heads” does not generate profits over the ensuing 432 coin tosses. But the test shows that the “longing after 4 heads” theory was profitable, therefore the null hypothesis is falsified. As there is no middle ground possible, my original theory must be true. But it clearly isn’t – it is random coin tosses. In a nutshell, the null hypothesis method can’t cope with randomness.

If you can spot an error in my logic I’d be delighted to be enlightened.

Ok, logical flaws:

1) Excel does not do a great job of generating random numbers (unless it has updated its RNG in a recent version that I haven't used), but it should be sufficient for the purposes here, so not a big problem.

2) You haven't actually carried out a hypothesis test, have you? It's not about whether a set of 20 trades ends up profitable. It's about whether the results are statistically significant. And for testing that, you would want a lot more than 21 trades, especially when they're so easy to generate, and you would want to choose a sensible null hypothesis. Do you need help on this point?

3) On your sheet, 15 wins, 6 losses. If we recalculate the cells, then on my first recalculation it has become a losing system, with one more loss than win. The next recalculation has 7 wins and 11 losses, and so on. It should be obvious that the results vary quite a bit, that you can't attach such significance to your 15 wins and 6 losses result, and furthermore that you must apply some statistical test of significance, which is what I suggested.

It's not that the null hypothesis method can't cope with randomness, it's great for exactly this type of thing.
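A minimal version of the significance test being suggested here (all names are mine, purely illustrative): simulate the coin tosses, count how often the "long after 4 heads" rule wins, and compute an exact two-sided binomial p-value instead of eyeballing a single 21-trade sample.

```python
import math
import random

def binom_p_value(wins, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    at least as unlikely as the observed win count under H0 (fair coin)."""
    pmf = lambda k: math.comb(n, k) * p**k * (1 - p)**(n - k)
    observed = pmf(wins)
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed + 1e-12)

random.seed(1)
tosses = [random.randint(0, 1) for _ in range(5000)]  # 1 = heads
signals = [i for i in range(4, len(tosses))
           if tosses[i - 4:i] == [1, 1, 1, 1]]        # "long after 4 heads"
wins = sum(tosses[i] for i in signals)                # next toss heads = win
p_value = binom_p_value(wins, len(signals))
```

On random data this p-value will usually sit far above any sensible significance level, which is exactly the null hypothesis doing its job: it refuses to let a lucky 15-6 run count as evidence.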
 

Solas0077

Active member
236 14
My problem with a Null Hypothesis: Using my coin tossing example I formulate a null hypothesis that “longing after tossing 4 heads” does not generate profits over the ensuing 432 coin tosses. But the test shows that the “longing after 4 heads” theory was profitable, therefore the null hypothesis is falsified. As there is no middle ground possible, my original theory must be true. But it clearly isn’t – it is random coin tosses. In a nutshell, the null hypothesis method can’t cope with randomness.

If you can spot an error in my logic I’d be delighted to be enlightened.

I don't understand how the test shows that. Each time I hit F9 I get a different result. You may have to do that something like 100,000 times to calculate a statistic and see if the null can be rejected. Am I missing something?
 

meanreversion

Senior member
3,398 537
Darwin - good thread, well done!

I use Amibroker for testing strategies... the walk-forward analysis is excellent and is just a click of a button, like the backtest.

There is a decent book on forward testing by Robert Pardo, you may have come across this. What is not entirely satisfactory (to me at least!) about his approach is that his analysis of the results themselves is a little qualitative, focusing primarily on raw profit numbers (without regard to drawdown).

Would you care to share with us how you process walk forward data? In other words, which output are you looking to maximise, or what would lead you to believe a system is robust - is this something you determine quantitatively?
 

Darwin-FX

Junior member
35 0
Darwin - good thread, well done!

I use Amibroker for testing strategies... the walk-forward analysis is excellent and is just a click of a button, like the backtest.

There is a decent book on forward testing by Robert Pardo, you may have come across this. What is not entirely satisfactory (to me at least!) about his approach is that his analysis of the results themselves is a little qualitative, focusing primarily on raw profit numbers (without regard to drawdown).

Would you care to share with us how you process walk forward data? In other words, which output are you looking to maximise, or what would lead you to believe a system is robust - is this something you determine quantitatively?

Thanks for the nice feedback!

Well, with this algorithm it's up to the user to define a desired characteristic, like profit or drawdown or profit_per_drawdown or something, that is then used to pick a parameter set for live trading. (The one I am currently developing can seek the best characteristics on its own, btw :) )
So it's not really the output that is maximised, but the "how to choose parameters for live trading" methodology that is tested.

But then, to see if a system is robust, I assemble all out-of-sample trades into one backtest, not just a few meaningless numbers :) So you get an equity curve, "overall statistics", statistics split by short/long, trade-return distributions, monthly-return distributions and also some diagrams that show you the fluctuation of your fitness values (like profit or profit factor etc.) on a per-year and a per-month basis :)

So, where the initial WFA approach only gives you a "WFE" number (which is useless, btw, as far as my experience shows), I give you an in-depth analysis of the system :)
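Stitching the out-of-sample trades into one backtest, as described above, amounts to concatenating each window's OOS trade P&Ls in time order and taking the running sum. A simplified sketch (the function name, starting balance and numbers are made up):

```python
from itertools import accumulate

def stitched_equity(oos_trade_lists, start_balance=10_000.0):
    """Concatenate per-window out-of-sample trade P&Ls (in time order)
    into one combined trade list and return the running equity curve."""
    all_trades = [pnl for window in oos_trade_lists for pnl in window]
    return list(accumulate(all_trades, initial=start_balance))

# e.g. three walk-forward windows' out-of-sample trades:
curve = stitched_equity([[120.0, -40.0], [80.0], [-30.0, 55.0]])
# curve -> [10000.0, 10120.0, 10080.0, 10160.0, 10130.0, 10185.0]
```

All the per-trade and per-month statistics mentioned in the post can then be computed from this one combined curve rather than from each window in isolation.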

-Darwin
 

Tyger

Junior member
20 1
Shakone – thanks for your response. The coin-tossing example was meant to be a simplified version to explain my concern, which is that even with a large sample size we cannot distinguish between patterns that are due to randomness and patterns which are due to genuine relationships. However, having thought about your response for a couple of days, I conclude that we can apportion a likelihood of such a pattern being random or not. Thus, we never know if an individual test or strategy is showing randomness or relationships, but we can determine that, say, 9 times out of 10 the results are due to relationships. Have I got this now?

Darwin – Surely past observations can be explained by an infinite number of hypotheses. The only way to decide which ones are better is by seeing how well they can predict observations whose outcomes are not yet known, ie the future. Can you give us an idea of how well your approach is working? (eg average ROC over 12 months with max 50% drawdown in any one year).

Also, I thought that financial markets were non-stationary, ie the statistical characteristics change over time (in mathematical terms it is a multi-agent system/complex adaptive system). So how does your approach deal with this issue?
 

Darwin-FX

Junior member
35 0
Hey Tyger :)

Darwin – Surely past observations can be explained by an infinite number of hypotheses. The only way to decide which ones are better is by seeing how well they can predict observations whose outcomes are not yet known, ie the future.

Yes, the solution space is infinite, right.
As stated in my new article (http://www.trade2win.com/boards/tra...uccessor-backtesting-discuss.html#post2230130) WFA tries to tackle the problem by just taking into account the relative future for system evaluation.

Of course, all the data used to do this is still just the past, so we cannot be sure that the one trading system we have chosen, out of the infinite number of trading systems, will be profitable in the future. But we can at least analyse whether the performance in the past and the performance in the relative future are somehow correlated.

But still, it's up to the trader to see when a traded inefficiency no longer exists and the system is "broken".

I have 1 or 2 ideas to solve this automatically and 100% algorithmically (let systems trade less frequently as they get worse and worse, and vice versa), and I will run some experiments on this topic once my new algo is working at its core, but that is still a bit down the road.

Can you give us an idea of how well your approach is working? (eg average ROC over 12 months with max 50% drawdown in any one year).

No, first because I am still coding and researching. But the main problem with these statistics is that they depend on the particular trading system, not on the analysis method.
So even if I had years of live trading, these statistics would just prove that the EA I traded was good, but couldn't say much about the approach itself (for that we would need to test many, many EAs for many, many years).


Also, I thought that financial markets were non-stationary, ie the statistical characteristics change over time (in mathematical terms it is a multi-agent system/complex adaptive system). So how does your approach deal with this issue?

As long as the statistical characteristics don't change too rapidly and the underlying inefficiency is still intact (for example, there is still a course correction after a heavy trend, but the length of the correction has changed or something), WFA can, by definition, handle this by adjusting the parameters of the strategy.

If the underlying inefficiency no longer exists, we have the problem I described in the first part of this post.


-Darwin
 

meanreversion

Senior member
3,398 537
Well, with this algorithm it's up to the user to define a desired characteristic, like profit or drawdown or profit_per_drawdown or something, that is then used to pick a parameter set for live trading. (The one I am currently developing can seek the best characteristics on its own, btw :) )
So it's not really the output that is maximised, but the "how to choose parameters for live trading" methodology that is tested.

But then, to see if a system is robust, I assemble all out-of-sample trades into one backtest, not just a few meaningless numbers :) So you get an equity curve, "overall statistics", statistics split by short/long, trade-return distributions, monthly-return distributions and also some diagrams that show you the fluctuation of your fitness values (like profit or profit factor etc.) on a per-year and a per-month basis :)

So, where the initial WFA approach only gives you a "WFE" number (which is useless, btw, as far as my experience shows), I give you an in-depth analysis of the system :)

-Darwin

Yes, Amibroker does all this as well.. it assembles the OOS simulations together to give an overall OOS backtest.

Nonetheless, you still need to decide the output which determines whether you think the system is "robust" or not. Typically, I will look for return/drawdown of greater than 1, but other factors such as % win rate and profit factor are also relevant.
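The screening numbers mentioned here can all be computed from a plain list of trade P&Ls. A rough sketch (the function name and layout are mine, not Amibroker's):

```python
def robustness_stats(trade_pnls):
    """Compute return/drawdown, win rate and profit factor from a list
    of per-trade profit/loss values (one backtest or stitched OOS run)."""
    equity, peak, max_dd = 0.0, 0.0, 0.0
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)           # running equity high-water mark
        max_dd = max(max_dd, peak - equity)  # deepest drop below the peak
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p < 0]
    return {
        "return_over_drawdown": equity / max_dd if max_dd else float("inf"),
        "win_rate": len(wins) / len(trade_pnls),
        "profit_factor": sum(wins) / sum(losses) if losses else float("inf"),
    }
```

The "return/drawdown greater than 1" screen from the post is then just `robustness_stats(trades)["return_over_drawdown"] > 1`.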

Are you going to give us an example of your work?
 

numbertea

Well-known member
257 9
Nice discussion D-FX!
I'm sure your product will be enlightening to all who experience it. Best of luck to you and thank you management for simplifying the thread for those who might otherwise have been confused!

Cheers
 

Darwin-FX

Junior member
35 0
Nonetheless, you still need to decide the output which determines whether you think the system is "robust" or not. Typically, I will look for return/drawdown of greater than 1, but other factors such as % win rate and profit factor are also relevant.

Are you going to give us an example of your work?

Ah, now I understand. Well, I can only give you an analysis report, and then it's up to the trader to decide if the system is good enough or not.
Though, a cloud-based evaluation of analysis reports or something would be cool, but that's too far down the road to even think about yet.

Well, yes, sure, here is an example: http://85.214.116.235/btana_example.html
That was done on the default "Moving Average"-EA that is shipped with every MT4 installation, so don't expect anything good.

This analysis report was from the first alpha; I think it's not the current state, but it will give you an idea of what to expect. Also, I will add more and more stuff to it as I feel the need for it.

Also, I will release the tool tomorrow on some forums, and beginning next week on all the others, so you can soon form your own opinion :)

Nice discussion D-FX!
I'm sure your product will be enlightening to all who experience it. Best of luck to you and thank you management for simplifying the thread for those who might otherwise have been confused!

Cheers

Thank you very much for this feedback, I also hope so haha :)
Also, if someone is still confused, hopefully they'll post here so I can un-confuse them ;)

-Darwin
 

Darwin-FX

Junior member
35 0
@meanreversion:

Well, I have to revise my statement above.
There IS a fixed number that indicates robustness.

If, on a huge number of samples (e.g. 100,000), the chance of a profitable trade window is > 50% (and the average trade window is > $0), that would mean it does not matter which parameter set you trade; as long as you trade the EA with any settings, you will make a profit in the long run :)

I think that is how robust strategies are defined, isn't it?
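One hypothetical reading of that robustness definition in code (my own interpretation, not a standard metric): pool the out-of-sample window results across all sampled parameter sets and require both the > 50% profitable share and a positive average.

```python
def looks_robust(oos_profits_by_paramset):
    """Robustness check as described above: across all sampled parameter
    sets, more than half of the out-of-sample trade windows are profitable
    AND the average window result is positive."""
    windows = [p for profits in oos_profits_by_paramset for p in profits]
    share_profitable = sum(1 for p in windows if p > 0) / len(windows)
    avg_profit = sum(windows) / len(windows)
    return share_profitable > 0.5 and avg_profit > 0
```

With 100,000 sampled windows, `share_profitable` becomes a fairly stable estimate rather than a lucky draw.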

-Darwin
 
 