3rd generation NN, deep learning, deep belief nets and Restricted Boltzmann Machines

0007 · Aug 31, 2011

rathcoole_exile said:
or, to summarise,

"this is all a crock of shyte"

rathcoole's ability to sum up a 98 post thread in just a few words is wonderful!

Krzysiaczek99 · Apr 6, 2012

result snapshot

Let's revive this thread. Here is a snapshot of OOS results for 8 different algos
for Dukas 1min EURUSD on 2011.09.15. Results are split into BUYs and SELLs.

It is clear that on this day none of the algos was able to generate a profit on both the buy and the sell side, which points to very difficult data.

I wonder if anybody is able to show the results of any automatic strategy which would be profitable on this day?

Krzysztof

EURUSD1dukas_2011.09.15_BUY_J48__IS_bars=20000_OOS_bars=1440_PP=37.2832_SL=15_TP=30_AC=0.67639_Profit=-1765_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_BUY_LIBLINEAR_-s 1 -c 1 -w1 1000_IS_bars=20000_OOS_bars=1440_PP=20.6897_SL=15_TP=30_AC=0.72569_Profit=-1650_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_BUY_PegasosLR__IS_bars=20000_OOS_bars=1440_PP=NaN_SL=15_TP=30_AC=0.7375_Profit=0_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_BUY_PegasosSVM__IS_bars=20000_OOS_bars=1440_PP=NaN_SL=15_TP=30_AC=0.7375_Profit=0_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_BUY_Ridor__IS_bars=20000_OOS_bars=1440_PP=100_SL=15_TP=30_AC=0.74028_Profit=326_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_BUY_Ripper__IS_bars=20000_OOS_bars=1440_PP=29.6754_SL=15_TP=30_AC=0.55486_Profit=-18810_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_BUY_Tlogistic_cg__IS_bars=18560_OOS_bars=1440_PP=NaN_SL=15_TP=30_AC=0.7375_Profit=0_epoch=50_instantPip170
EURUSD1dukas_2011.09.15_BUY_Tlogistic_sgd__IS_bars=18560_OOS_bars=1440_PP=NaN_SL=15_TP=30_AC=0.7375_Profit=0_epoch=1000_instantPip170

EURUSD1dukas_2011.09.15_SELL_J48__IS_bars=20000_OOS_bars=1440_PP=24.1844_SL=15_TP=30_AC=0.25764_Profit=-94637_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_SELL_LIBLINEAR_-s 1 -c 1 -w1 1000_IS_bars=20000_OOS_bars=1440_PP=23.697_SL=15_TP=30_AC=0.2375_Profit=-98744_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_SELL_PegasosLR__IS_bars=20000_OOS_bars=1440_PP=29.7662_SL=15_TP=30_AC=0.45069_Profit=-54194_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_SELL_PegasosSVM__IS_bars=20000_OOS_bars=1440_PP=30.4762_SL=15_TP=30_AC=0.47847_Profit=-49844_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_SELL_Ridor__IS_bars=20000_OOS_bars=1440_PP=13.6364_SL=15_TP=30_AC=0.28542_Profit=-84595_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_SELL_Ripper__IS_bars=20000_OOS_bars=1440_PP=13.3978_SL=15_TP=30_AC=0.39514_Profit=-70665_epoch=5_instantPip170
EURUSD1dukas_2011.09.15_SELL_Tlogistic_cg__IS_bars=18560_OOS_bars=1440_PP=NaN_SL=15_TP=30_AC=0.76319_Profit=0_epoch=50_instantPip170
EURUSD1dukas_2011.09.15_SELL_Tlogistic_sgd__IS_bars=18560_OOS_bars=1440_PP=17.6471_SL=15_TP=30_AC=0.75556_Profit=-1380_epoch=1000_instantPip170
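
For anyone who wants to tabulate these names instead of reading them by eye, here is a minimal Python sketch that splits one result name into fields. It is not part of the original MATLAB system: the function name parse_result_name and the field names are made up for illustration, and reading PP as precision (% profitable) and AC as accuracy is an assumption based on how those terms are used elsewhere in the thread.

```python
import re

# Pattern for the result names listed above. The group names are illustrative.
_RESULT_NAME = re.compile(
    r"^(?P<data>[^_]+)_(?P<date>[\d.]+)_(?P<side>BUY|SELL)_(?P<algo>.+?)_+"
    r"IS_bars=(?P<is_bars>\d+)_OOS_bars=(?P<oos_bars>\d+)_PP=(?P<pp>[^_]+)"
    r"_SL=(?P<sl>\d+)_TP=(?P<tp>\d+)_AC=(?P<ac>[^_]+)"
    r"_Profit=(?P<profit>-?\d+)_epoch=(?P<epoch>\d+)"
)


def parse_result_name(name):
    """Split one result name into a dict of typed fields."""
    match = _RESULT_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised result name: {name!r}")
    raw = match.groupdict()
    return {
        "data": raw["data"],
        "date": raw["date"],
        "side": raw["side"],
        "algo": raw["algo"],             # may include option strings
        "is_bars": int(raw["is_bars"]),
        "oos_bars": int(raw["oos_bars"]),
        "pp": float(raw["pp"]),          # "NaN" parses to a float NaN
        "sl": int(raw["sl"]),
        "tp": int(raw["tp"]),
        "ac": float(raw["ac"]),
        "profit": int(raw["profit"]),
        "epoch": int(raw["epoch"]),
    }


example = (
    "EURUSD1dukas_2011.09.15_BUY_J48__IS_bars=20000_OOS_bars=1440"
    "_PP=37.2832_SL=15_TP=30_AC=0.67639_Profit=-1765_epoch=5_instantPip170"
)
fields = parse_result_name(example)
print(fields["algo"], fields["side"], fields["profit"])
```

Once every name is parsed into a dict like this, sorting by profit or filtering out the rows where PP is NaN takes one line each.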

fralo · Apr 9, 2012

Glad to see you're still around. You think maybe this horse is dead?😀

Krzysiaczek99 · Apr 9, 2012

I don't think it's dead. On a good day with equity curve trading I was able to obtain something like this OOS:

http://www.trade2win.com/boards/met...ator-mt4-using-neuroshell-92.html#post1751632

see also this

http://www.trade2win.com/boards/met...tor-mt4-using-neuroshell-101.html#post1790068

Besides this, just yesterday I found a bug in the rescaling of the data, so perhaps I will have to recalculate some results. Let's see....

Krzysiaczek99 · Apr 9, 2012

next result snapshot

Here is another result snapshot, from 18.04. It's clear that this day was much friendlier to the AI algos than the one in the previous snapshot: on the SELL side most of the algos were profitable.

In this snapshot there are 3 types of deep nets:
1 layer RBM with 170 neurons, trained for 5 epochs (RBM)
5 layer DBN built by stacking 5 RBM layers, trained for 5 epochs (DBN)
5 layer DBN with pretraining, finetuning and validation, trained for 5 and 20 epochs (TDBN). This net is kind of 'state of the art' in deep net research of recent years.

None of those nets was able to converge properly on FOREX data and generate positive classes.

Krzysiaczek99 · Apr 11, 2012

some conclusions and way forward

After playing with this project for almost 1.5 years (with breaks of course) I would like to share some conclusions.

It's pretty clear to me that the parameter which most determines performance is not really the algorithm but the data itself and its bias (up/down). The next most important parameter is the value of SL/TP, which determines the label.
Using a fixed SL/TP may actually be one of the biggest weaknesses of this system; perhaps an adaptive SL/TP should be used, e.g. ATR based or Sharpe ratio based.
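
A minimal Python sketch of what an ATR based SL/TP could look like. This is only an illustration, not code from the system: it uses a plain average of the last true ranges (not Wilder's smoothed ATR), and the function names and the multipliers 1.5 and 3.0 are arbitrary placeholders that simply keep the 1:2 SL:TP ratio of the fixed 15/30 setting.

```python
def average_true_range(highs, lows, closes, period=14):
    """Plain mean of the last `period` true ranges (no Wilder smoothing)."""
    if not (len(highs) == len(lows) == len(closes)):
        raise ValueError("highs, lows and closes must have the same length")
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 bars")
    true_ranges = []
    for i in range(len(closes) - period, len(closes)):
        prev_close = closes[i - 1]
        true_ranges.append(max(
            highs[i] - lows[i],            # range of the bar itself
            abs(highs[i] - prev_close),    # gap up from the previous close
            abs(lows[i] - prev_close),     # gap down from the previous close
        ))
    return sum(true_ranges) / period


def adaptive_sl_tp(atr, sl_mult=1.5, tp_mult=3.0):
    """Return (stop_loss, take_profit) distances as multiples of the ATR."""
    return sl_mult * atr, tp_mult * atr


highs = [2.0, 3.0, 4.0]
lows = [1.0, 2.0, 3.0]
closes = [1.5, 2.5, 3.5]
atr = average_true_range(highs, lows, closes, period=2)
print(atr, adaptive_sl_tp(atr))
```

With distances tied to recent volatility, the label produced for a bar changes with market conditions instead of depending on one fixed pip value.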

Changing the length of the training range didn't have much impact; however, it did stabilize the results.

Deep nets didn't show better performance than e.g. SVM; instead they showed big difficulties in training.

Possible ways forward are to e.g. create some synthetic pair and try this method on that data, or to use e.g. data segmentation, and also to introduce other SL/TP calculation methods. Maybe multiple labeling should also be considered.

Another possible way forward is to introduce some cut-off mechanism which would disable a losing algorithm. I tried this using the equity curve (in MQL) with some success.
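
One common form of such a cut-off is to keep a paper equity curve of all signals and take real trades only while that curve sits at or above its own moving average. The Python sketch below illustrates that idea only. The MQL version mentioned above is not shown in the thread, so the rule, the function name and the window length here are assumptions.

```python
def equity_curve_filter(trade_profits, window=3):
    """For each trade, decide whether it would have been taken.

    A trade is allowed when the paper equity curve built from all earlier
    trades is at or above its own `window`-trade moving average. Until
    `window` trades of history exist, every trade is allowed.
    """
    equity = []      # paper equity after each trade, taken or not
    total = 0.0
    allowed = []
    for profit in trade_profits:
        if len(equity) < window:
            allowed.append(True)
        else:
            recent = equity[-window:]
            allowed.append(equity[-1] >= sum(recent) / window)
        total += profit
        equity.append(total)
    return allowed


profits = [10, 10, -30, -30, 20, 20, 20]
mask = equity_curve_filter(profits, window=3)
filtered = sum(p for p, ok in zip(profits, mask) if ok)
print(mask)
print("unfiltered:", sum(profits), "filtered:", filtered)
```

The filter always reacts one trade late, so it helps only when losing trades tend to come in runs.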

This post concludes this thread from my side; I think continuing with this type of research goes well beyond the scope of this thread.

Krzysztof

Krzysiaczek99 · Sep 17, 2012

GPUMLib library

GPUMLib | Free software downloads at SourceForge.net

The GPUMLib library seems to have been extended with new net types. All of them use CUDA and come with an example, including an MS VS project ready to compile.

Krzysztof

Features

Multiple Back-Propagation
Back-Propagation
Radial Basis Functions
Non-Negative Matrix Factorization
Autonomous Training System
Restricted Boltzmann Machines
Deep Belief Networks

Krzysiaczek99 · Dec 30, 2012

new DeepLearning toolbox for MATLAB

https://github.com/rasmusbergpalm/DeepLearnToolbox

Krzysztof

surfeur · May 21, 2013

Hi Krzysiaczek99,

Very good thread 🙂

Have you stayed with this trading method? Have you tested or forward tested it in live mode?

Thanks.
Regards

Krzysiaczek99 · Nov 14, 2013

I got inspired by this blog http://dekalogblog.blogspot.com and decided to revive this thread. Dekalog seems to be working pretty hard and not giving up!
The subject seems to be pretty heavy and difficult; a lot of money is involved and can be made or lost. The market seems to be constantly changing, adapting to the new strategies and methods of market participants. I'm going to stick to low time frames (1min) as this guarantees enough trades to be more sure about the results. Besides this, I'm not buying the story that low TFs are more noisy. Personally I believe it is quite the opposite, i.e. high TFs, besides providing less information (due to down sampling, unless you are able to collect e.g. 100 000 daily bars), just introduce the additional risk that some event can happen and suddenly the strategy model will not work anymore.

So this is what I'm going to do:

Introduce automatic Walk Forward testing to be able to collect more OOS trades

Instead of a fixed SL/TP, try to use adaptive ones

Try to use some more advanced inputs, e.g. from digital filters, SVD components etc., to see if that improves classifier performance

Try some feature extraction to see if it helps. This can be done manually in Weka after converting the MATLAB data to WEKA format

Try to use an ensemble of non-correlated signals (e.g. ARIMA and a signal from an AI algo)

Additionally I will try to add some Markov Regime Switching Models to see if that helps.

Finally I will try to decompose the original time series and train a classifier or fit a predictor on a subset of the original data.
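
The walk forward item in the list above can be sketched in a few lines of Python. This is a generic rolling-window scheme, not the implementation that was eventually used; the function name and the choice to step forward by one test window at a time are assumptions.

```python
def walk_forward_splits(n_bars, train_len, test_len):
    """Yield (train_range, test_range) pairs that roll forward in time.

    Each step trains on `train_len` bars and tests on the `test_len` bars
    that follow, then moves both windows forward by `test_len` bars, so
    the out-of-sample segments never overlap.
    """
    start = 0
    while start + train_len + test_len <= n_bars:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len


# 20000 training bars and 1440 OOS bars per step, as used in this thread,
# over a history long enough for three steps.
for train, test in walk_forward_splits(20000 + 3 * 1440, 20000, 1440):
    print("train", train.start, train.stop, "test", test.start, test.stop)
```

Collecting the predictions from every test range gives one long OOS record instead of a single day.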

If anybody has some more ideas let me know

Krzysztof

Krzysiaczek99 · Dec 13, 2013

2 new algorithms

I added 2 new algorithms to my system: Stacked Denoising Autoencoder and Simple logistic. Additionally I added the possibility of outputting in-sample results (OOS_bars=0) to see if those algos are able to converge during training.

The data sets are the same as for the other algos, i.e. 20000 training 1min EURUSD bars from Dukas and 1440 1min bars for trading (OOS) on 2011.04.18 and 2011.09.15.

The results seem to be consistent with the results from the other algos for those days.
SDAE looks like it is underperforming, simply because Simple logistic didn't make any trades on the BUY side, so it generated just the 0 (no trade) class.

I also realized that evaluation using just accuracy and precision (% profitable)
is not good enough, so I added the kappa statistic and the Matthews correlation coefficient.
For comparing the classifiers I use the McNemar test. See the screenshot and the tutorial
from N. Japkowicz

http://www.site.uottawa.ca/~nat/#Tutorials
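
For reference, the three measures can be computed from the 0/1 labels and predictions as in the Python sketch below. These are the textbook formulas (Cohen's kappa, the Matthews correlation coefficient, and McNemar's chi-square with continuity correction), written out in plain Python; the function names are illustrative and this is not the code used in the system.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between labels and predictions, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return 0.0 if expected == 1.0 else (observed - expected) / (1.0 - expected)


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC, defined here as 0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcnemar_statistic(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square (1 degree of freedom)."""
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    if only_a + only_b == 0:
        return 0.0
    return (abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)


# A classifier that never trades: 75% accuracy, but kappa and MCC are 0.
counts = confusion_counts([1, 0, 0, 0], [0, 0, 0, 0])
print(cohen_kappa(*counts), matthews_corrcoef(*counts))
```

A classifier that only ever outputs the 0 (no trade) class can still score a high accuracy on imbalanced labels, while kappa and MCC come out as 0, which is why they are useful next to accuracy and precision.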

Krzysztof

Krzysiaczek99 · Jan 7, 2014

rebinning of data

Hi,

It's time to post some results. Over the last weeks I was working to find out whether rebinning the data improves the results. The idea of rebinning is pretty simple: it is just down sampling, where each new data point is the mean of the preceding data points.

I took the rebinner code from this site

http://www.quantatrisk.com/2013/03/22/rebinning-of-financial-time-series/

In my case I used a factor of 5 for rebinning, so e.g. 20000 1min EURUSD bars were rebinned to 4000. It's more or less like switching from a 1min to a 5min chart, and the classifiers were trained on the rebinned data (so like 5min).
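
A minimal Python sketch of rebinning by block averaging, as described above. It is not the code from the linked site; the function name is illustrative, and dropping an incomplete block at the end is an assumption.

```python
def rebin(values, factor):
    """Average consecutive non-overlapping blocks of `factor` points.

    Points left over at the end that do not fill a whole block are dropped.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_blocks = len(values) // factor
    return [
        sum(values[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


print(rebin([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 5))
print(len(rebin(list(range(20000)), 5)))
```

Note that a block mean is not the same thing as a 5min close: it smooths the series, so features computed on it will differ from those on a real 5min chart.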

The questions I wanted to answer were

1) Does rebinning improve the results?
2) What training length is optimal for the algos?

So I trained 17 algos on 10000, 20000 and 50000 1min bars and their rebinned equivalents (2k, 4k, 10k). Trading days were 2011.04.18 and 2011.09.15, so 1440 1min bars each, stop loss 15 pips, take profit 30 pips. The algos used were tree algos (J48, LMT), rule learners (Ripper, Ridor), SVMs like LIBSVM, LIBLINEAR and Pegasos, deep nets (DBN, RBM) and a few others.

Additional performance measures were added, the kappa statistic and the Matthews correlation coefficient, on top of precision, accuracy and profit.

The Excel sheet contains the results. From the ALL OOS sheet it seems that the average profit and precision (% profitable) are similar for both rebinned and non-rebinned data, i.e. 33% and a loss of 11k, but the situation changes if we look only at the most precise algos. I selected the algos with precision >= 40%: for normal data the average profit is 35727 and for rebinned data it is 27385 (sheet most precise 1). Additionally, on the sheet most precise you can compare the different training lengths, and you can see that shorter is better.... Hmmmm, I thought it should be the opposite (longer = more examples to train on), but that seems not to be the case.

Feel free to sort this sheet as you want and share your conclusions.

Krzysztof

fabwa · Jan 9, 2014

Hey Krzysztof,
I have been skimming through your thread over the last day.
To quickly introduce myself, I graduated in AI and am now doing a PhD in ML at a European university. Your stats are interesting. Could you give a few lines' summary of how your system is operating right now? Throughout the posts I lost the big picture a little bit: where you stand and what exactly you are using right now. I am happy to give my bits on your approach and, if time allows, contribute to this thread!

greetings

fabwa

Krzysiaczek99 · Jan 9, 2014

The system uses FOREX 1min data. It tries to buy and to sell on each bar; if the trade is successful (i.e. take profit is hit) it generates class 1, if not successful (stop loss is hit) class 0. So for every trading day it generates 2 testing data sets, one on the BUY side and one on the SELL side. I have a script which combines the results from those 2 into one total result.
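
The labeling rule can be sketched in Python as below. Only the rule itself (class 1 if take profit is hit, class 0 if stop loss is hit) comes from the description above. Entering at the bar's close, ignoring the spread, counting a bar that touches both levels as a loss, and labeling trades still open at the end of the data as 0 are all assumptions made to keep the sketch short, as are the function and parameter names.

```python
def label_bars(highs, lows, closes, side, sl_pips=15, tp_pips=30, pip=0.0001):
    """Label each bar 1 if a trade opened there reaches TP before SL, else 0."""
    if side not in ("BUY", "SELL"):
        raise ValueError("side must be 'BUY' or 'SELL'")
    labels = []
    for t in range(len(closes)):
        entry = closes[t]                      # assumed entry price
        if side == "BUY":
            tp_price = entry + tp_pips * pip
            sl_price = entry - sl_pips * pip
        else:
            tp_price = entry - tp_pips * pip
            sl_price = entry + sl_pips * pip
        label = 0                              # default: unresolved or loss
        for k in range(t + 1, len(closes)):
            if side == "BUY":
                hit_sl = lows[k] <= sl_price
                hit_tp = highs[k] >= tp_price
            else:
                hit_sl = highs[k] >= sl_price
                hit_tp = lows[k] <= tp_price
            if hit_sl:                         # checked first: pessimistic
                break
            if hit_tp:
                label = 1
                break
        labels.append(label)
    return labels


# Toy prices in whole "pips" (pip=1) so the arithmetic is easy to follow.
highs = [100, 131, 100]
lows = [100, 99, 100]
closes = [100, 100, 100]
print(label_bars(highs, lows, closes, "BUY", pip=1))
print(label_bars(highs, lows, closes, "SELL", pip=1))
```

Running it once with side "BUY" and once with "SELL" gives the two label sets per day mentioned above.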

The best way to see how it performs is to sort the Excel sheet from my previous post.
You can see that a lot depends on the day, e.g. some algos were very successful
on 04.18 and unsuccessful on 09.15. The system is MATLAB based, with interfaces to WEKA, THEANO and other MATLAB toolboxes.

Clearly some algos perform well and others worse; however, the kappa statistic and the Matthews correlation coefficient are on average only very little above the random level.

In my last attempt I tried to rebin the data to see if it improves the results;
now I will try some other things like under/oversampling, removing some aliasing
noise from the data etc., trying to improve precision and stability.
At the moment it is clear to me that it cannot work on its own, but rather as a part
giving a trading signal.

Krzysztof

deskipper · Jan 9, 2014

SALE! SALE! SALE!

Only until January 13: Angry Beaver at the super price of $999.
Don't miss your chance! http://mql5.com/1fbv

Only till January 13th, 2014, Angry Beaver just for $999.
Do not lose your chance to get my Christmas offer! http://mql5.com/1zl5

Krzysiaczek99 · Jan 14, 2014

feature extraction volunteers

Here is a link to the data I'm using during my tests, in WEKA format. If somebody is interested in helping, they can try feature extraction (input elimination) or another preprocessing method with the goal of improving the results,
i.e. better precision, kappa, MCC and accuracy.

Weka has a lot of different feature extraction and preprocessing methods built in,
so it is just a matter of trying; no programming or MATLAB skills are necessary for this.

http://www.4shared.com/rar/rLgys0neba/Arff.html

Krzysztof

fabwa · Jan 15, 2014

I will have time soon to do some testing, but first let's be clear about the input parameters / features you are using. I have in-depth knowledge of feature selection methods and imbalanced learning. Yet the input space is very important in any classification task. If I understood correctly, you are just using binary inputs? Furthermore, testing 2 random days has nearly no statistical significance...

Krzysiaczek99 · Jan 15, 2014

features

2 random days = 2x1440 bars = 2880x2 labels (buy and sell side), i think is quite many

Anyway, I connected my PCs as a cluster and am trying to get more data.

As far as imbalanced learning is concerned: sometimes the data is imbalanced, sometimes not. E.g. when the trend is down, the 'buy' data set will be imbalanced, as there will be only a few
successful buys against the trend, and it will be the opposite on the sell side. So the model must be really robust.

The features are both binary and real valued. The list is below.

cond(1) = within(upBol_1(t),price(t)); % within 5 pips of upper Bollinger band
cond(2) = within(upBol_15(t),price(t)); % within 5 pips of upper Bollinger band
cond(3) = within(upBol_20(t),price(t)); % within 5 pips of upper Bollinger band
cond(4) = within(upBol_25(t),price(t)); % within 5 pips of upper Bollinger band
cond(5) = within(upBol_30(t),price(t)); % within 5 pips of upper Bollinger band

cond(6) = within(lwBol_1(t),price(t)); % within 5 pips of lower Bollinger band
cond(7) = within(lwBol_15(t),price(t)); % within 5 pips of lower Bollinger band
cond(8) = within(lwBol_20(t),price(t)); % within 5 pips of lower Bollinger band
cond(9) = within(lwBol_25(t),price(t)); % within 5 pips of lower Bollinger band
cond(10) = within(lwBol_30(t),price(t)); % within 5 pips of lower Bollinger band

cond(11) = upBol_1(t);
cond(12) = lwBol_1(t);
cond(13) = upBol_15(t);
cond(14) = lwBol_15(t);
cond(15) = upBol_20(t);
cond(16) = lwBol_20(t);
cond(17) = upBol_25(t);
cond(18) = lwBol_25(t);
cond(19) = upBol_30(t);
cond(20) = lwBol_30(t);

% Price trend

cond(21) = trend(price,t,'DOWN',2); % duration 2 bars
cond(22) = trend(price,t,'DOWN',3); % duration 3 bars
cond(23) = trend(price,t,'DOWN',4); % duration 4 bars
cond(24) = trend(price,t,'DOWN',5); % duration 5 bars
cond(25) = trend(price,t,'DOWN',6); % duration 6 bars

cond(26) = trend(price,t,'UP',2); % duration 2 bars
cond(27) = trend(price,t,'UP',3); % duration 3 bars
cond(28) = trend(price,t,'UP',4); % duration 4 bars
cond(29) = trend(price,t,'UP',5); % duration 5 bars
cond(30) = trend(price,t,'UP',6); % duration 6 bars

if t > 2
cond(31) = high(t) < high(t-1) && low(t) < low(t-1);
cond(32) = high(t) < high(t-1) && low(t) > low(t-1);
cond(33) = high(t) > high(t-1) && low(t) < low(t-1);
cond(34) = high(t) > high(t-1) && low(t) > low(t-1);
end

cond(35) = price(t) < open(t);

% RSI

cond(36) = trend(RSI8,t,'DOWN',2);
cond(37) = trend(RSI8,t,'DOWN',3);
cond(38) = trend(RSI8,t,'DOWN',4);
cond(39) = trend(RSI8,t,'DOWN',5);
cond(40) = trend(RSI8,t,'DOWN',6);

cond(41) = trend(RSI8,t,'UP',2);
cond(42) = trend(RSI8,t,'UP',3);
cond(43) = trend(RSI8,t,'UP',4);
cond(44) = trend(RSI8,t,'UP',5);
cond(45) = trend(RSI8,t,'UP',6);

cond(46) = trend(RSI14,t,'DOWN',2);
cond(47) = trend(RSI14,t,'DOWN',3);
cond(48) = trend(RSI14,t,'DOWN',4);
cond(49) = trend(RSI14,t,'DOWN',5);
cond(50) = trend(RSI14,t,'DOWN',6);

cond(51) = trend(RSI14,t,'UP',2);
cond(52) = trend(RSI14,t,'UP',3);
cond(53) = trend(RSI14,t,'UP',4);
cond(54) = trend(RSI14,t,'UP',5);
cond(55) = trend(RSI14,t,'UP',6);

cond(56) = trend(RSI50,t,'DOWN',2);
cond(57) = trend(RSI50,t,'DOWN',3);
cond(58) = trend(RSI50,t,'DOWN',4);
cond(59) = trend(RSI50,t,'DOWN',5);
cond(60) = trend(RSI50,t,'DOWN',6);

cond(61) = trend(RSI50,t,'UP',2);
cond(62) = trend(RSI50,t,'UP',3);
cond(63) = trend(RSI50,t,'UP',4);
cond(64) = trend(RSI50,t,'UP',5);
cond(65) = trend(RSI50,t,'UP',6);

cond(66) = trend(RSI200,t,'DOWN',2);
cond(67) = trend(RSI200,t,'DOWN',3);
cond(68) = trend(RSI200,t,'DOWN',4);
cond(69) = trend(RSI200,t,'DOWN',5);
cond(70) = trend(RSI200,t,'DOWN',6);

cond(71) = trend(RSI200,t,'UP',2);
cond(72) = trend(RSI200,t,'UP',3);
cond(73) = trend(RSI200,t,'UP',4);
cond(74) = trend(RSI200,t,'UP',5);
cond(75) = trend(RSI200,t,'UP',6);

cond(76) = RSI8(t);
cond(77) = RSI14(t);
cond(78) = RSI50(t);
cond(79) = RSI200(t);

% CCI

cond(80) = trend(CCI5,t,'DOWN',2);
cond(81) = trend(CCI5,t,'DOWN',3);
cond(82) = trend(CCI5,t,'DOWN',4);
cond(83) = trend(CCI5,t,'DOWN',5);
cond(84) = trend(CCI5,t,'DOWN',6);

cond(85) = trend(CCI5,t,'UP',2);
cond(86) = trend(CCI5,t,'UP',3);
cond(87) = trend(CCI5,t,'UP',4);
cond(88) = trend(CCI5,t,'UP',5);
cond(89) = trend(CCI5,t,'UP',6);

cond(90) = trend(CCI10,t,'DOWN',2);
cond(91) = trend(CCI10,t,'DOWN',3);
cond(92) = trend(CCI10,t,'DOWN',4);
cond(93) = trend(CCI10,t,'DOWN',5);
cond(94) = trend(CCI10,t,'DOWN',6);

cond(95) = trend(CCI10,t,'UP',2);
cond(96) = trend(CCI10,t,'UP',3);
cond(97) = trend(CCI10,t,'UP',4);
cond(98) = trend(CCI10,t,'UP',5);
cond(99) = trend(CCI10,t,'UP',6);

cond(100) = trend(CCI21,t,'DOWN',2);
cond(101) = trend(CCI21,t,'DOWN',3);
cond(102) = trend(CCI21,t,'DOWN',4);
cond(103) = trend(CCI21,t,'DOWN',5);
cond(104) = trend(CCI21,t,'DOWN',6);

cond(105) = trend(CCI21,t,'UP',2);
cond(106) = trend(CCI21,t,'UP',3);
cond(107) = trend(CCI21,t,'UP',4);
cond(108) = trend(CCI21,t,'UP',5);
cond(109) = trend(CCI21,t,'UP',6);

cond(110) = trend(CCI35,t,'DOWN',2);
cond(111) = trend(CCI35,t,'DOWN',3);
cond(112) = trend(CCI35,t,'DOWN',4);
cond(113) = trend(CCI35,t,'DOWN',5);
cond(114) = trend(CCI35,t,'DOWN',6);

cond(115) = trend(CCI35,t,'UP',2);
cond(116) = trend(CCI35,t,'UP',3);
cond(117) = trend(CCI35,t,'UP',4);
cond(118) = trend(CCI35,t,'UP',5);
cond(119) = trend(CCI35,t,'UP',6);

cond(120) = CCI5(t);
cond(121) = CCI10(t);
cond(122) = CCI21(t);
cond(123) = CCI35(t);

% MAs

cond(124) = price(t) < MA5(t);
cond(125) = price(t) < MA15(t);
cond(126) = price(t) < MA30(t);
cond(127) = price(t) < MA70(t);
cond(128) = price(t) < MA150(t);

cond(129) = price(t) < EMA10(t);
cond(130) = price(t) < EMA20(t);
cond(131) = price(t) < EMA50(t);
cond(132) = price(t) < EMA100(t);
cond(133) = price(t) < EMA200(t);

cond(134) = MA5(t);
cond(135) = MA15(t);
cond(136) = MA30(t);
cond(137) = MA70(t);
cond(138) = MA150(t);

cond(139) = EMA10(t);
cond(140) = EMA20(t);
cond(141) = EMA50(t);
cond(142) = EMA100(t);
cond(143) = EMA200(t);

% stochastics

cond(144) = stochK143(t);
cond(145) = stochK215(t);
cond(146) = stochK3610(t);
cond(147) = stochK5021(t);

cond(148) = stochD143(t);
cond(149) = stochD215(t);
cond(150) = stochD3610(t);
cond(151) = stochD5021(t);

cond(152) = DPO10(t);
cond(153) = DPO20(t);
cond(154) = DPO50(t);
cond(155) = DPO200(t);

cond(156) = m(t);
cond(157) = d(t);
cond(158) = h(t); % hour
cond(159) = mn(t);% minute
cond(160) = day(t);

cond(161) = open(t);
cond(162) = high(t);
cond(163) = low(t);
cond(164) = close(t);

cond(165) = lag1(t);
cond(166) = lag2(t);
cond(167) = lag3(t);
cond(168) = lag4(t);
cond(169) = lag5(t);
cond(170) = lag6(t);
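
The helpers within and trend are used throughout the list but their definitions are not shown in the thread. The Python sketch below is one guess at what they do, based only on the comments in the list ("within 5 pips", "duration N bars"). The exact semantics, for example whether a trend must be strictly monotonic, are assumptions.

```python
def within(level, price, pips=5, pip=0.0001):
    """True if `price` is no more than `pips` pips away from `level`."""
    return abs(level - price) <= pips * pip


def trend(series, t, direction, duration):
    """True if `series` moved strictly in `direction` for `duration` bars up to t.

    Uses 0-based indexing (the MATLAB original is 1-based). Returns False
    when there is not enough history before index t.
    """
    if direction not in ("UP", "DOWN"):
        raise ValueError("direction must be 'UP' or 'DOWN'")
    if t < duration:
        return False
    window = series[t - duration:t + 1]   # duration + 1 points, duration moves
    steps = list(zip(window, window[1:]))
    if direction == "UP":
        return all(later > earlier for earlier, later in steps)
    return all(later < earlier for earlier, later in steps)


prices = [5, 4, 3, 2]
print(trend(prices, 3, "DOWN", 3), trend(prices, 3, "UP", 2))
print(within(1.3000, 1.3003), within(1.3000, 1.3010))
```

Under this reading, most of the cond entries are binary flags, while entries such as cond(11) to cond(20) pass raw indicator values through unchanged.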

fabwa · Jan 16, 2014

Thanks for the list.
Make sure you don't have any absolute values, as that does not make much sense (better choose relative values like log returns for these features).

Make sure you add some volume features as well!

There was a post before with a valid point:

As long as you stick to technical analysis (w/o fundamentals), all your features are computed from open/close/high/low, volume and time. These 6 values over time already contain all the information that all the other features are constructed from. In theory classifiers are able to learn these dependencies wrt the output label themselves. Of course learning gets easier when some noise is removed (but maybe also valuable information? 🙂). Just a side note.

ps: "2 random days = 2x1440 bars = 2880x2 labels (buy and sell side), i think is quite many"

What makes you think this is enough? Moreover, who tells you that these 2 days are representative of any other day? Just keep in mind that the market is so complex that any bias from particularity should be avoided.

Krzysiaczek99 · Jan 16, 2014

Here are the features with absolute values

cond(11) = upBol_1(t);
cond(12) = lwBol_1(t);
cond(13) = upBol_15(t);
cond(14) = lwBol_15(t);
cond(15) = upBol_20(t);
cond(16) = lwBol_20(t);
cond(17) = upBol_25(t);
cond(18) = lwBol_25(t);
cond(19) = upBol_30(t);
cond(20) = lwBol_30(t);

cond(134) = MA5(t);
cond(135) = MA15(t);
cond(136) = MA30(t);
cond(137) = MA70(t);
cond(138) = MA150(t);

cond(139) = EMA10(t);
cond(140) = EMA20(t);
cond(141) = EMA50(t);
cond(142) = EMA100(t);
cond(143) = EMA200(t);

cond(156) = m(t);
cond(157) = d(t);
cond(158) = h(t); % hour
cond(159) = mn(t);% minute
cond(160) = day(t);

cond(161) = open(t);
cond(162) = high(t);
cond(163) = low(t);
cond(164) = close(t);

Features 156-160 are time information, i.e. month, day of the week etc.
All features are normalized to 0-1 later. Do you think I should remove them or replace them with their log returns or lags?
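
To make the two options concrete, here is a small Python sketch of log returns next to 0-1 (min-max) scaling. It is only an illustration with made-up function names, not the normalization code of the system.

```python
import math


def log_returns(prices):
    """Log return between consecutive prices: ln(p[t] / p[t-1])."""
    return [math.log(later / earlier) for earlier, later in zip(prices, prices[1:])]


def min_max_scale(values):
    """Scale values linearly to the 0-1 range of this particular sample."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two price paths with the same relative moves at different price levels.
low_level = [1.0, 2.0, 4.0]
high_level = [10.0, 20.0, 40.0]
print(log_returns(low_level) == log_returns(high_level))
print(min_max_scale(low_level))
```

Log returns do not depend on the price level, whereas min-max scaling of absolute prices depends on the minimum and maximum of the training window, so out-of-sample prices can fall outside 0-1.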

Anyway, I tried some feature selection and none of the methods from WEKA improved the results, except maybe PCA in some cases (but I didn't try all of them!).
I know that maybe 100 features are not necessary, maybe just the lags are enough;
the results are kind of similar even if half of the features are removed, so it confirms the theory that classifiers are able to learn this information anyway.

But what would you suggest as preprocessing for this data?

what makes you think this is enough?

Of course it's not enough; even if it is a lot of trades, they are strongly correlated,
so it does not prove anything. So I'm now trying to get results for more days.
That's the only way to draw some conclusions, in my opinion.

Krzysztof
