3rd generation NN, deep learning, deep belief nets and Restricted Boltzmann Machines

basic tests

Then I tried another combination: 1200 bars for training and 775 for OOS.

From the results you can see that:

for buy orders - the system doesn't generate any buy signals :clap: very good!!
for sell orders - the system generates a continuous sell signal :clap: very good!!
 

Attachments

  • 1200IS_775OOS_BUY.jpg (25.9 KB)
  • 1200IS_775OOS_SELL.jpg (26.9 KB)
Basic tests

Then for the final tests I used the first 500 bars of the downtrend for training and the last 500 for OOS, so the system is trained on a downtrend and trading on a downtrend.

From the results you can see:

for buy orders - no buy orders generated :clap: very good
for sell orders - it started generating sell orders, stopped, and resumed again
 

Attachments

  • 500IS_500OOS_BUY.jpg (28 KB)
  • 500IS_500OOS_SEL.jpg (28.6 KB)
conclusion

So from all the posts above it is possible to conclude that the basic functionality of TradeFX for sell/buy orders is working; however, to get good results the test patterns must be included in the training pattern set, otherwise TradeFX gets confused. I used the PipMaximizer strategy for this test.

Let's see if DBN and RBM will do better than SVM.
 
CRBM and 'features of features'

Here are two papers: a comparative analysis of the performance of DBN versus SVM for document classification, and the CRBM net used by Taylor applied to character recognition. Reading these two papers and their conclusions, two questions come to my mind.

1) Does the FOREX series contain 'features of features'? In my opinion yes, due to different traders trading on different TFs.

2) Is the FOREX series a continuous process which can be treated like e.g. human motion? In my opinion no; it's well known that a price move is very often started by the opening of a trading session or by news. So maybe the data should be partitioned somehow?

Any opinions?

Conclusion & Suggestions for Improvement

Deep belief networks show promise for certain applications in the future, given the theory behind them. Much information has been published on the expressive power of DBNs in [5] and [6].

However, the results of my experiments show that DBNs are perhaps more
sensitive to having an appropriate implementation for a given application than other
learning algorithms/classifiers. Even though the same binary training data was presented
to the DBN, the SVM, and the NB classifiers, the SVM and NB classifiers
outperformed the DBN on the document classification application. As mentioned
previously, the most likely reason for this is that DBNs iteratively learn
“features-of-features” in each level’s RBM. If the network has an appropriate
implementation for the task at hand, this can lead to a very high-accuracy
classifier (in [10] Hinton describes how a DBN can be built to outperform other types of
classifiers on the task of digit recognition). However, if the network and the data do not
work well together, this feature-of-feature learning can lead to
recursively learning features that do not appropriately model the training data. This is
what I believe to be the case in the experiments described in this paper, since the DBN
performed far better than random guessing but was outperformed by the relatively stock SVM and NB classifiers.
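For context on the "relatively stock" baselines mentioned above, a Bernoulli Naive Bayes over binary feature vectors fits in a few lines. This is a minimal sketch with made-up toy data, not the paper's actual setup or code:

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes on binary feature vectors X, labels y."""
    classes = sorted(set(y))
    n_features = len(X[0])
    model = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        prior = math.log(len(rows) / len(X))
        # Laplace-smoothed probability that each feature is 1 given class c
        p = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
             for j in range(n_features)]
        model[c] = (prior, p)
    return model

def predict_nb(model, x):
    """Return the class with the highest log-posterior for binary vector x."""
    best, best_score = None, -math.inf
    for c, (prior, p) in model.items():
        score = prior + sum(
            math.log(pj) if xj else math.log(1 - pj) for xj, pj in zip(x, p))
        if score > best_score:
            best, best_score = c, score
    return best

# toy binary "documents": feature j = word j present
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0]]
y = ['sell', 'sell', 'sell', 'buy', 'buy', 'sell']
model = train_bernoulli_nb(X, y)
print(predict_nb(model, [1, 0, 1]))  # -> sell
```

A classifier this simple has no "features-of-features" machinery at all, which is exactly the point of the comparison in the quote.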

==========================================================================

5.2 Results and Conclusions
The 2-level CRBM models, whose training was performed by mini-batch gradient updates
that are computed using the Contrastive Divergence rules, clearly do not perform
well on the character sequences dataset. Additionally, the mAR models outperform the
2-level CRBM on the specific dataset. An interesting observation made here, is that
when trained on the simpler character datasets, the mAR models generally perform
much better, whereas the CRBM models exhibit slightly better results in certain cases
only, and are still incapable of capturing the structure of the input space, even when
trained on samples from single characters.

In the mAR models, a longer history typically leads to better generative capabilities. When
trained on the 20 characters dataset, the mAR models need to consider a history that
roughly corresponds to the length of the observed waveforms in order to perform relatively
well. When trained on smaller datasets, an even shorter history - roughly 1/4-th of
the observed waveforms - is adequate. On the other hand, the 2-level CRBM models
generate a contractive system when trained on a very short history and an expanding
system when trained on a very long history. The appropriate length of history - the one
that corresponds to the “best” performing model - depends on the difficulty of the learning
task. An indication of this is the fact that in the case of the 20 characters dataset
the “19-6” averaging scheme is the one that corresponds to the best model, whereas
in the case of the “a-d” characters dataset the “19-6” model generates an expanding
system [1].

From our analysis, a possible reason why the mAR models outperform the 2-level
CRBM, when trained on the characters dataset, is the fact that the CRBM models try
to capture a more complex structure, but at the same time expend the representational
resources of their hidden layers in only a few hidden states, which leads to an inefficient
exploitation of this added complexity. As a result, the much simpler structure captured
by the mAR models leads to generations that are more closely related to handwritten
characters.

On the other hand, when trained on the motion dataset, the 2-level CRBM model reveals
its representational power. It clearly outperforms the mAR models, as it is able to
discover a pattern for each attribute, which is close to the main trend of the corresponding
observed values over time, and it is able to reproduce this pattern by effectively using
its hidden layers in order to encode the different states of the system. On the other
hand, the mAR model is initially able to accurately generate synthetic motion, but fails
to reproduce the same pattern over time and thus results in a contractive system.

The comparative analysis of the 2-level CRBM with the mAR on the motion dataset
additionally reveals a potential difficulty of the learning task on the characters dataset.
The attributes of the motion dataset consist of small repeated sub-sequences, of roughly
30 time frames length. Therefore the observed process that they define can be seen as
a 49-dimensional “harmonic” waveform that is continuous in time. On the other hand,
the character sequences are not continuous in time. Each process has its start and ending,
and in between, since the three attributes that define the process are in derivative
space, we have a waveform that makes approximately two cycles.
This property of the data, combined with the fact that training in the 2-level CRBM
is performed by considering individual subsequences in a random order and, more importantly,
the fact that the training algorithm does not do smoothing, suggests that the
bad performance of the 2-level CRBM on the characters data may be at least partially
attributed to the training procedure currently used.

[1] This is also the case for the experiments with the single character datasets.
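For reference, in the univariate case the mAR idea reduces to an ordinary least-squares AR(p) fit, which can then be rolled forward to generate a sequence. A minimal sketch on a toy sine wave (not either paper's data or code; an AR(2) model captures a pure sinusoid exactly):

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of AR(p): x[t] ~ a1*x[t-1] + ... + ap*x[t-p]."""
    x = np.asarray(series, dtype=float)
    # design matrix: row for target x[t] holds [x[t-1], ..., x[t-p]]
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def generate(coeffs, seed, n):
    """Roll the fitted model forward to generate n new samples."""
    hist = list(seed)
    for _ in range(n):
        hist.append(sum(c * v for c, v in zip(coeffs, hist[::-1])))
    return hist[len(seed):]

# toy "waveform": a sine; the identity sin(wt) = 2cos(w)sin(w(t-1)) - sin(w(t-2))
# means AR(2) reproduces it exactly
t = np.arange(200)
sine = np.sin(0.2 * t)
a = fit_ar(sine, 2)          # a ~ [2*cos(0.2), -1]
pred = generate(a, sine[:2], 50)
```

The "length of history" discussion above corresponds to the choice of p here; a real mAR is the multivariate version of the same least-squares fit.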
 

Attachments

  • IM080611_TS_DBN_CRBN.pdf (1.3 MB)
  • Document_classification_using_DBN.pdf (120.8 KB)
SVM, RBM and DBN results

Here are some results for EURUSD 30min obtained for SVM, RBM net and DBN net.

The experiment was done for 2 strategies, instantpip and pipmaximizer, training on 1000 and 2500 bars. The number of OOS bars in both cases was 500.

The SVM used a linear kernel; the RBM used 500 neurons and was trained for 500 epochs; the DBN net had configuration 500-500-500 and was also trained for 500 epochs.

Results were classified by whether the equity curve ended positive, ended negative, or produced no trades.

pipmaximizer

no trade - 8
positive - 1 RBMsell2500
negative - 3

instantpip

no trade - 1
positive - 5 DBNsell1000, RBMsell100 (final eq=0), RBMsell2500, SVMsell100 (final eq=0), SVMsell2500
negative - 6

So clearly the instantpip strategy seems to extract series features much better than pipmaximizer.

Then 2 questions come to my mind:

1) Are we tricked by randomness here? I don't think so. TradeFX tries to trade on each bar,
so we have 500 real OOS samples, but......

2) Are those features constant for a certain TF and currency pair, or do they change in time?

Perhaps some WF test should be done to have more data.

Any ideas?
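One rough way to approach question 1 is a binomial sign test on the OOS bar outcomes. This is a sketch with made-up win/loss counts; note that consecutive bars are autocorrelated, so the 500 samples are not really independent and this test overstates significance:

```python
from math import comb

def sign_test_p(wins, losses):
    """Two-sided binomial sign test: probability of a win/loss split at
    least this lopsided under a fair coin. Assumes independent outcomes,
    which autocorrelated bar-by-bar trades violate."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# e.g. 280 winning bars out of 500 OOS bars (made-up numbers)
p = sign_test_p(280, 220)
print(round(p, 4))
```

A walk-forward test, as suggested above, would give several such samples instead of one and make the randomness question much easier to settle.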
 

Attachments

  • instantpip.zip (340 KB)
  • pipmaximizer.zip (206.7 KB)
Re: some basic tests

After some nice posts, let's return to reality and check if TradeFX actually works and how it behaves under stress conditions, i.e. trading against patterns which it didn't learn.

First I applied TradeFX to a simple sine series. It has quite a long period, but the training included three full periods.

From the screenshots it is clear that it works well. Both buy and sell signals were generated at the bottoms and tops of the sine wave.

So far so good:)
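A test signal like this can be generated in a few lines. Below is a sketch of a sine "price" series and the ideal buy/sell labels at its troughs and peaks (function names and parameters are illustrative, not TradeFX's actual API):

```python
import math

def make_sine_signal(n_bars, period, amplitude=1.0, base=1.30):
    """Synthetic 'price' series: a sine wave around a base price."""
    return [base + amplitude * math.sin(2 * math.pi * t / period)
            for t in range(n_bars)]

def ideal_signals(prices):
    """Label each interior bar: 'buy' at local minima, 'sell' at local maxima."""
    out = []
    for prev, cur, nxt in zip(prices, prices[1:], prices[2:]):
        if cur < prev and cur < nxt:
            out.append('buy')
        elif cur > prev and cur > nxt:
            out.append('sell')
        else:
            out.append(None)
    return out

prices = make_sine_signal(600, period=200)   # three full periods
labels = ideal_signals(prices)
print(labels.count('buy'), labels.count('sell'))  # prints "3 3"
```

Comparing a classifier's OOS signals against these ideal labels gives an objective score for the sanity check described above.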

A while back I have tested SVMs on noisy cycles; I got terrible results. Could you confirm that with your setup?
 
Re: basic tests

Then I tried another combination: 1200 bars for training and 775 for OOS.

From the results you can see that:

for buy orders - the system doesn't generate any buy signals :clap: very good!!
for sell orders - the system generates a continuous sell signal :clap: very good!!

Could you please show the full equity curve including the trend transition?
 
Re: Basic tests

Then for the final tests I used the first 500 bars of the downtrend for training and the last 500 for OOS, so the system is trained on a downtrend and trading on a downtrend.

From the results you can see:

for buy orders - no buy orders generated :clap: very good
for sell orders - it started generating sell orders, stopped, and resumed again

SVMs are classifiers, so it is not surprising. You need at least two separate classes for them to discriminate well. I don't think your test is conclusive, because learning only from a trend is a one-class problem. The SVM will try to separate this class, which explains the bad results you got.
 
Re: some basic tests

A while back I have tested SVMs on noisy cycles; I got terrible results. Could you confirm that with your setup?

I think it depends on whether you denoise the input data to the SVM. I just provided output from different indicators, so the data was not denoised.
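For reference, the simplest denoising step would be a moving average over the indicator outputs before they reach the SVM. A sketch of one option, not what TradeFX does internally:

```python
def moving_average(series, window):
    """Smooth a series with a trailing moving average; early values use
    whatever history is available so the output has the same length."""
    out = []
    for i in range(len(series)):
        lo = max(0, i - window + 1)
        chunk = series[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

noisy = [1, 3, 2, 4, 3, 5, 4, 6]
print(moving_average(noisy, 3))
```

A trailing window avoids looking into the future, which matters here given the future-leak issues discussed elsewhere in this thread.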

Could you please show the full equity curve including the trend transition?

See posts 60 and 61. All those posts show only the OOS part.

SVM are classifiers so it is not surprising. You need at least two separate classes for them to discriminate well. I don’t think your test is conclusive because learning only from a trend is a one class problem. SVM will try to separate this class explaining the bad results you got.

The purpose of those tests was to see if the basic functionality of TradeFX is working
and to see which trading signal would be generated when the OOS pattern doesn't match
the training pattern, and the result was as expected. RBM and DBN show the same behaviour.

Krzysztof
 
The way forward

Obviously the results from post 65 were not very encouraging. But it's not the end of the world.

The following ways forward come to my mind:

1) Use strategies with numeric values. The current strategies use binary values,
so it's just 7 and 9 bits of information. That may not be enough to extract market info.

In this case classifiers based on RBM and DBN cannot be used;
classifiers based on CRBM and FCRBM should be used instead.

2) Use ensemble methods, i.e. multiple classifiers. There are a lot of ensemble methods, like:

* AdaBoostM1
* Bagging
* Up Sampling
* Down Sampling
* Stacking / Hierarchical Classification
* Stacking with Feature Subspaces
* Majority Voting
* Multi-Class Classification Wrapper using Output Coding
* Multi-Label Classification Wrapper

For more info see the book 'Pattern Classification Using Ensemble Methods'.

3) Design other strategies which will hopefully extract more info from the market.

4) Use other classifiers (e.g. tree-based).

Any other ideas?
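As a flavour of the simplest entry in the list above, majority voting combines already-trained classifiers by counting their votes. A sketch where the "classifiers" are just placeholder callables returning a label:

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine trained classifiers by simple majority voting: each one
    predicts a label for x, and the most common label wins."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# three toy 'classifiers' that may disagree on a feature vector
clfs = [
    lambda x: 'buy' if x[0] > 0 else 'sell',
    lambda x: 'buy' if x[1] > 0 else 'sell',
    lambda x: 'sell',                      # a pessimistic base learner
]
print(majority_vote(clfs, [1.0, 2.0]))     # 'buy' wins 2-1
```

Voting only helps when the base learners make reasonably uncorrelated errors, which is the design question behind all the fancier schemes in the list (bagging, boosting, stacking).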
 
Re: some basic tests

I think it depends on whether you denoise the input data to the SVM. I just provided output from different indicators, so the data was not denoised.

See posts 60 and 61. All those posts show only the OOS part.

The purpose of those tests was to see if the basic functionality of TradeFX is working
and to see which trading signal would be generated when the OOS pattern doesn't match
the training pattern, and the result was as expected. RBM and DBN show the same behaviour.

Krzysztof

Your post 58 has clean sinusoids. As the first part of the equity lines seems to indicate (your posts 60, 61), I'd like to have confirmation that SVMs don't work in noisy (market) conditions.

I'd like to see the full equity line including the transition to the downtrend; your post 62 does not show it. Just curious to see whether the SVM could be wrong in detecting the change in trend for various training periods (the start or end dates vary).
 
Re: some basic tests

Your post 58 has clean sinusoids. As the first part of the equity lines seems to indicate (your posts 60, 61), I'd like to have confirmation that SVMs don't work in noisy (market) conditions.

I'd like to see the full equity line including the transition to the downtrend; your post 62 does not show it. Just curious to see whether the SVM could be wrong in detecting the change in trend for various training periods (the start or end dates vary).

You see it in post 60, and yes, the SVM is wrong. After the transition to the downtrend it generates a continuous buy signal, so it doesn't adapt at all. But when it's trained on both the uptrend and the downtrend, it is OK (post 61).
 
Re: some basic tests

You see it in post 60, and yes, the SVM is wrong. After the transition to the downtrend it generates a continuous buy signal, so it doesn't adapt at all. But when it's trained on both the uptrend and the downtrend, it is OK (post 61).

Your post 61 shows OOS results on a downtrend only. I want to see OOS results on data that has a change in trend! The questions are: are SVMs wrong in detecting a change in trend even though a change in trend was learned from the training samples? What happens if the change in trend varies? Do SVMs need to learn an infinite number of changes in trend of varying amplitude to correctly detect them? If true, SVMs don't generalize.

I am not trying to be cantankerous here. Just want to draw a final conclusion about SVMs.
 
I believe you should read the posts more carefully; post 60 shows the equity curve in the case of a changing trend.

775 bars training and 1200 OOS bars; the signal is 2000 bars, 1000 up and 1000 down, so in post 60 you have 225 bars up, a switch to downtrend, and 1000 bars down.

If you want to see more results you are welcome to participate more: download TradeFX from the link in the 1st post, generate some test signals, and draw your own conclusions.

Krzysztof
 
I believe you should read the posts more carefully; post 60 shows the equity curve in the case of a changing trend.

775 bars training and 1200 OOS bars; the signal is 2000 bars, 1000 up and 1000 down, so in post 60 you have 225 bars up, a switch to downtrend, and 1000 bars down.

If you want to see more results you are welcome to participate more: download TradeFX from the link in the 1st post, generate some test signals, and draw your own conclusions.

Krzysztof

Your posts don't show an equity curve with both a change in trend in the training set and a change in trend in the testing set. Anyway, I might look into it myself. I don't think I'll test TradeFX though; the future leaks you discovered were not bugs, they were coded intentionally, so I doubt its utility even after your fix.
 
Re: Ensemble learning

For interested people some info about Ensemble learning

http://www.scholarpedia.org/article/Ensemble_learning

http://videolectures.net/bootcamp07_saffari_ieb/

and enclosed is an extended version of the Scholarpedia article

Ensemble learning in general, and boosting in particular, are appealing because they tend not to overfit when the learners are well chosen. I tried to boost adaptive neural nets; unfortunately, bad initial results did not encourage me to pursue that direction. I believe the reason is its sensitivity to label noise. Boosting is supervised or semi-supervised, meaning that new bars need to be labeled ("up" or "down", for the sake of example) so that the weak learners know whether they are correct or wrong. That is OK as long as we do the classification after the close of each range so that the labels are correct. This has limited use though, because classifiers do not generalize unless they learn from all possible price patterns, which we know is not possible. As far as my test is concerned, my predicted labels were far too noisy to make it work. I believe this is a real showstopper.
 
Re: Ensemble learning

As far as my test is concerned, my predicted labels were far too noisy to make it work. I believe this is a real showstopper.


Here is a link to Amir Saffari page

http://www.ymer.org/amir/2010/04/18/online-multi-class-lpboost-code/
http://www.ymer.org/amir/2010/11/03/miforest-source-code/

He recently released the code for 'Multiple Instance Learning with Random Forests' and 'Online Multi-Class LPBoost'. Both of them are online algos which are supposed to be superior to AdaBoost. Maybe you can try these.

Do you know anything about the Learn++ algorithm from Polikar?

http://users.rowan.edu/~polikar/RESEARCH/index_files/Ensemble.html

Krzysztof
 
Feature extraction

Did anybody try to extract features for financial time series?

According to this paper, the error from a pruned feature set is lower.

There is also a very good video lecture about it:

http://videolectures.net/mmdss07_guyon_fsf/

Krzysztof
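The simplest filter-style feature selection is to rank features by absolute correlation with the target and prune the tail. A sketch with toy data; the attached paper's SVM-specific method is more elaborate:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    vy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (vx * vy) if vx and vy else 0.0

def rank_features(X, y):
    """Rank feature columns by |correlation| with the target, best first -
    a basic filter method; keep the top-k columns and drop the rest."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), j))
    return [j for _, j in sorted(scores, reverse=True)]

# toy data: feature 0 tracks the label, feature 1 is noise
X = [[1, 5], [2, 1], [3, 4], [4, 2], [5, 5]]
y = [1, 2, 3, 4, 5]
print(rank_features(X, y))  # feature 0 ranked first
```

Filter methods like this ignore feature interactions; wrapper methods (retraining the classifier on candidate subsets) catch those at a much higher computational cost.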
 

Attachments

  • Feature selection for SVM in FTS.pdf (725.7 KB)