Vrothdar
I'm in the process of downloading forex tick data from the GAIN Capital Rate Data Archive with a view to cleaning it up and using it to test mechanical trading systems.
Now, before I can do that, I need a little help. First, I'd like some confirmation of what each of the columns is (see attached). I can check the timezone by comparing against another data source, but if anybody can confirm that it's EST, that would help. Any guidance on the other two columns would be greatly appreciated, as I haven't a clue!
Secondly, as much for a confidence boost as anything else, I could do with a little advice on cleaning the data up. From what I've been able to find out, I need to identify and correct the following:
- Overlapping (repeated) data
- Erroneous tick sizes (causing a massive, say 500-tick, spike lasting one tick)
- Gaps in the data
1) Overlapping data: should I be looking for data points with the same timestamp, down to the second? If so, this shouldn't be too much trouble.
2) Erroneous tick sizes should be dead easy: flag any tick that differs by more than a single pip from either the tick before or the tick after, then delete it.
3) My first thought for tracking down gaps in the data would be to look at the data points flagged when looking for erroneous tick sizes: if there is more than a single pip of movement, there must be at least one missing tick between the two.
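The three checks above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a tested cleaning pipeline: the `Tick` record (timestamp, bid, ask) is a hypothetical layout, since the real column meanings are still unconfirmed, and the pip size and spike threshold are parameters you would pick per currency pair. It departs from the rules as written in two places. A tick counts as repeated only if every field matches, because matching on the timestamp alone would also throw away distinct ticks that share a second (though this version will still merge two genuine ticks that happen to share both the second and the prices). A spike is flagged only when the price jumps away from both neighbours while the neighbours agree with each other, which matches the one-tick spike described above; the literal "either neighbour" rule would also flag the first tick after any genuine jump.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tick:
    # HYPOTHETICAL record layout: the real column meanings are still unconfirmed.
    timestamp: datetime
    bid: float
    ask: float


def drop_repeated(ticks):
    """1) Overlap: keep the first copy of each tick whose fields all match exactly."""
    seen = set()
    kept = []
    for tick in ticks:
        if tick not in seen:
            seen.add(tick)
            kept.append(tick)
    return kept


def find_spikes(prices, threshold):
    """2) Spikes: indices where the price jumps away from BOTH neighbours by more
    than `threshold` while the neighbours stay within `threshold` of each other."""
    spikes = []
    for i in range(1, len(prices) - 1):
        before, here, after = prices[i - 1], prices[i], prices[i + 1]
        if (abs(here - before) > threshold
                and abs(here - after) > threshold
                and abs(after - before) <= threshold):
            spikes.append(i)
    return spikes


def find_price_gaps(prices, pip_size):
    """3) Gaps: (index, estimated_missing) for each move of more than one pip.

    Assumes, as in the post, that consecutive ticks move by at most one pip,
    so a jump of n pips implies at least n - 1 ticks missing before `index`.
    """
    gaps = []
    for i in range(1, len(prices)):
        pips = round(abs(prices[i] - prices[i - 1]) / pip_size)
        if pips > 1:
            gaps.append((i, pips - 1))
    return gaps


if __name__ == "__main__":
    t0 = Tick(datetime(2010, 1, 4, 9, 0, 0), 1.4400, 1.4403)
    t1 = Tick(datetime(2010, 1, 4, 9, 0, 1), 1.4401, 1.4404)
    print(len(drop_repeated([t0, t1, t0, t1])))  # 2

    bids = [1.3000, 1.3001, 1.3500, 1.3002, 1.3003]
    spikes = find_spikes(bids, threshold=0.0010)
    print(spikes)  # [2]
    to_drop = set(spikes)
    cleaned = [p for i, p in enumerate(bids) if i not in to_drop]

    print(find_price_gaps([1.3000, 1.3001, 1.3004], pip_size=0.0001))  # [(2, 2)]
```

Working in floats is fine for a sketch; converting prices to integer pips up front would avoid rounding surprises in a real pipeline.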
I've also been thinking about how to deal with clock changes for DST. When the clocks go back, would I expect to see two consecutive hours that apparently have the same timestamps, and when they go forward, a "missing" hour? Are the timestamps actually adjusted for DST at all? And how are the open and close handled when DST comes into effect?
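One way to handle the DST question, assuming the archive stamps ticks in US Eastern local clock time (`America/New_York`, EST in winter and EDT in summer), which is exactly what still needs confirming, is to convert every timestamp to UTC before doing anything else. The sketch below uses Python's standard `zoneinfo` module (Python 3.9+, which needs the system time zone database or the `tzdata` package) to classify a local time as normal, repeated (the hour when the clocks go back) or nonexistent (the hour they skip), and to convert a time-ordered stream to UTC. The rule that a backwards step inside the repeated hour marks the start of the second pass is an assumption of this sketch, not something the archive documents.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# ASSUMPTION: timestamps are naive US Eastern local clock time (EST/EDT).
EASTERN = ZoneInfo("America/New_York")


def classify_local_time(naive):
    """Return 'normal', 'repeated' (clocks went back) or 'nonexistent' (clocks went forward)."""
    first = naive.replace(tzinfo=EASTERN, fold=0)
    second = naive.replace(tzinfo=EASTERN, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "normal"
    # Two candidate offsets: a round trip through UTC only reproduces the
    # original wall-clock time if that time actually occurred.
    round_trip = first.astimezone(timezone.utc).astimezone(EASTERN)
    if round_trip.replace(tzinfo=None) == naive:
        return "repeated"
    return "nonexistent"


def to_utc(stamps):
    """Convert time-ordered naive Eastern timestamps to aware UTC datetimes.

    Inside the repeated hour, a step backwards in the clock is taken to mean
    the second pass (fold=1) has started. Nonexistent times are converted
    with fold=0 (the pre-transition offset) and are worth flagging for review.
    """
    result = []
    fold = 0
    previous = None
    for stamp in stamps:
        if classify_local_time(stamp) == "repeated":
            if previous is not None and stamp < previous:
                fold = 1  # the clock went back: we are now in the second pass
        else:
            fold = 0
        result.append(stamp.replace(tzinfo=EASTERN, fold=fold).astimezone(timezone.utc))
        previous = stamp
    return result


if __name__ == "__main__":
    print(classify_local_time(datetime(2010, 3, 14, 2, 30)))  # nonexistent
    print(classify_local_time(datetime(2010, 11, 7, 1, 30)))  # repeated
    ticks = [datetime(2010, 11, 7, 1, 50), datetime(2010, 11, 7, 1, 10)]
    for utc in to_utc(ticks):
        print(utc.isoformat())
    # 2010-11-07T05:50:00+00:00  (first pass, EDT)
    # 2010-11-07T06:10:00+00:00  (second pass, EST)
```

If the archive turns out to use a fixed offset with no DST adjustment, neither repeated nor missing hours will appear, and a single constant offset is enough.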
Thanks in advance for any advice.