How can I reduce time wasted on data cleaning?

CSData

Newbie
2 1
I'm fairly new to the trading scene. I started in 2019 with TA and this year moved into more quant-based strategies. My problem is that my models constantly run into issues because timestamps aren't synchronized (so I can't join the data sets) or because the data has gaps. I've noticed that for every 10 hours I spend on modelling, I spend another 5 just cleaning the data.

This topic is a bit general, but as I'm quite new to this I'd like to know how to speed this up, and whether it's normal to spend this much time preparing data. Any suggestions are welcome. I use Python/Jupyter, and I usually get the data from free online sources and/or fetch it from exchanges.
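
To give a concrete example of the timestamp problem: below is a rough pandas sketch (with made-up feeds and column names, not my actual data) of putting two mismatched feeds onto a common grid so they can be joined, and flagging the gaps instead of silently filling them.

```python
import pandas as pd
import numpy as np

# Toy stand-ins for two feeds whose timestamps don't line up
# (in practice these would come from exchange/CSV dumps).
idx_a = pd.date_range("2024-01-01 00:00", periods=6, freq="1min")
idx_b = pd.date_range("2024-01-01 00:00:30", periods=4, freq="90s")
feed_a = pd.DataFrame({"price_a": np.arange(6.0)}, index=idx_a)
feed_b = pd.DataFrame({"price_b": np.arange(4.0)}, index=idx_b)

# Resample both onto the same 1-minute grid so they can be joined.
grid_a = feed_a.resample("1min").last()
grid_b = feed_b.resample("1min").last()
merged = grid_a.join(grid_b, how="outer")

# Gaps (e.g. dropped connections) show up as NaNs; flag them rather
# than forward-filling everything blindly.
print(merged[merged.isna().any(axis=1)])
```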
 

Trader333

Moderator
8,599 931
What datasets are you having to clean, how do you clean them, and where are you getting them from?
 

CSData

Newbie
2 1
What datasets are you having to clean, how do you clean them, and where are you getting them from?
The data comes directly from exchange websocket connections. Because we're collecting data from multiple exchanges, the symbols, timestamps, etc. are all different, so to build a normalised feed we have to sanitize everything and convert it into a common format. Also, the connection sometimes drops randomly, which leaves gaps in the data.
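
To give an idea, the normalisation step looks roughly like the sketch below. The symbol map, field names, and the 5-second gap threshold are just illustrative; real exchange payloads differ.

```python
from datetime import datetime, timezone

# Per-exchange symbol aliases mapped to one common name
# (example entries only, not a complete map).
SYMBOL_MAP = {
    ("binance", "BTCUSDT"): "BTC/USDT",
    ("coinbase", "BTC-USD"): "BTC/USD",
}

def normalise_tick(exchange, raw):
    """Convert one raw websocket tick into a common record layout.

    Assumes the raw message carries a millisecond epoch timestamp and
    a symbol field; this is only the shape of the idea, not any
    particular exchange's schema.
    """
    return {
        "exchange": exchange,
        "symbol": SYMBOL_MAP.get((exchange, raw["symbol"]), raw["symbol"]),
        # Store everything as timezone-aware UTC so feeds can be merged.
        "ts": datetime.fromtimestamp(raw["ts_ms"] / 1000, tz=timezone.utc),
        "price": float(raw["price"]),
        "size": float(raw["size"]),
    }

def find_gaps(ticks, max_silence_s=5.0):
    """Flag spots where consecutive ticks are further apart than expected,
    which usually means the connection dropped."""
    gaps = []
    for prev, cur in zip(ticks, ticks[1:]):
        if (cur["ts"] - prev["ts"]).total_seconds() > max_silence_s:
            gaps.append((prev["ts"], cur["ts"]))
    return gaps

# Example with a fake raw tick:
raw = {"symbol": "BTCUSDT", "ts_ms": 1_700_000_000_000,
       "price": "35000.5", "size": "0.01"}
print(normalise_tick("binance", raw))
```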
 
 