How can I reduce time wasted on data cleaning?

CSData

I'm fairly new to the trading scene: I started in 2019 with technical analysis and this year moved into more quant-based strategies. My problem is that my models constantly run into issues because timestamps aren't synchronized across datasets (so I can't join them) or because the datasets have gaps. For every 10 hours I spend on modelling, roughly 5 go into cleaning data.

This question is a bit general, but since I'm quite new to this I'd like to know how to speed this up, and whether it's normal to spend this much time preparing data. Any suggestions are welcome. I use Python/Jupyter, and the data usually comes from free online sources or is fetched directly from exchanges.
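For context, a lot of that cleaning time typically goes into aligning sources onto a common time grid before joining them. Below is a minimal pandas sketch of that step; the frames and timestamps are made up purely for illustration.

```python
import pandas as pd

# Two hypothetical price series whose timestamps are close but not identical.
a = pd.DataFrame(
    {"price": [100.0, 101.0, 102.0]},
    index=pd.to_datetime(
        ["2024-01-01 00:00:00", "2024-01-01 00:01:02", "2024-01-01 00:02:01"]
    ),
)
b = pd.DataFrame(
    {"price": [99.5, 100.5, 101.5]},
    index=pd.to_datetime(
        ["2024-01-01 00:00:01", "2024-01-01 00:01:00", "2024-01-01 00:02:03"]
    ),
)

# Snap both series onto a common 1-minute grid, then they join cleanly.
a_min = a.resample("1min").last()
b_min = b.resample("1min").last()
merged = a_min.join(b_min, lsuffix="_a", rsuffix="_b")

# Gaps show up as NaN; forward-fill them (or flag them, depending on the model).
merged = merged.ffill()
print(merged)
```

Doing this once in a shared helper, instead of ad hoc in each notebook, is usually where the time savings come from.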
 
What datasets are you having to clean, how do you clean them, and where are you getting them from?
 
> What datasets are you having to clean, how do you clean them, and where are you getting them from?
The data comes directly from exchange websocket connections. Because we collect from multiple exchanges, the symbols, timestamps, etc. differ, so to produce a normalised feed we have to sanitise everything into a common format. The connection also breaks at random, which leaves gaps in the data.
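That normalisation step can be centralised in one small function instead of being repeated per dataset. A sketch of the idea, with an invented symbol map and made-up trade records (real exchange tickers and payload fields will differ):

```python
import pandas as pd

# Illustrative symbol map; real exchanges use different tickers for the same pair.
SYMBOL_MAP = {"XBTUSD": "BTC/USD", "BTC-USD": "BTC/USD"}

def normalise(records):
    """Convert raw trade dicts from any exchange into one common schema."""
    df = pd.DataFrame(records)
    # Map exchange-specific tickers to a canonical symbol, keep unknowns as-is.
    df["symbol"] = df["symbol"].map(SYMBOL_MAP).fillna(df["symbol"])
    # Coerce epoch-millisecond timestamps to tz-aware UTC datetimes.
    df["ts"] = pd.to_datetime(df["ts"], unit="ms", utc=True)
    return df.set_index("ts").sort_index()

def find_gaps(df, max_gap="5s"):
    """Flag intervals where the feed went silent for longer than max_gap."""
    deltas = df.index.to_series().diff()
    return deltas[deltas > pd.Timedelta(max_gap)]

raw = [
    {"symbol": "XBTUSD", "ts": 1704067200000, "price": 42000.0},
    {"symbol": "XBTUSD", "ts": 1704067201000, "price": 42001.0},
    {"symbol": "XBTUSD", "ts": 1704067210000, "price": 42010.0},  # 9 s silence
]
df = normalise(raw)
gaps = find_gaps(df)
print(gaps)
```

Logging the detected gaps (rather than silently filling them) also makes it obvious whether a bad backtest is a model problem or a feed problem.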
 