C prog to clean Disk Trading data

blackcab

If anyone can make use of it, this program takes a historical intraday data file in Disk Trading's format (date time O H L C U D) and "cleans" it, i.e. writes a new file containing only:

- market-hours data (set to 9.30-16.00 EST in the program but you can change it)

- complete days, i.e. days with no missing bars at all. It works on 1/5/15/30/60-minute files; you set the interval in the program. For YM almost no 1-min days are complete, but nearly all 5-minute and higher days are.
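To make "complete" concrete, here is a minimal sketch of how many bars a full session should hold, assuming (as the program does) that both the opening bar and the closing bar are present. The function name and its minutes-since-midnight parameters are hypothetical and not part of the program.

```c
/* Number of bars in a complete session, counting both the opening and
 * the closing bar. Times are minutes since midnight.
 * Returns -1 if the session is not a whole number of intervals long;
 * this sketch does not guess how such a file would be laid out. */
int expected_bars(int open_min, int close_min, int interval)
{
    int span = close_min - open_min;

    if (interval <= 0 || span < 0 || span % interval != 0)
        return -1;
    return span / interval + 1;
}
```

For the default 9.30-16.00 session (570 and 960 minutes), expected_bars(570, 960, 5) gives 79 and the 1-minute case gives 391. A 60-minute interval does not divide the 390-minute session evenly, so the sketch returns -1 for it.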

It also converts US mm/dd/yyyy dates to dd/mm/yyyy. It expects the source file to be called 'in.csv' and it writes to 'out.csv'. I wrote it so that charts would only show high-volume periods with no missing data, and for designing mechanical systems that would get messed up by missing bars. I do everything in code and Excel and don't know if charting programs can do this stuff already. If so it's reinventing the wheel, but never mind.

/* Read a Disk Trading data file, extract only complete days with no
* missing lines, convert US to UK date format, and write to a new file.
*
* Works on a 1/5/15/30/60-minute file, not a tick file or a daily file
* (the processing is irrelevant for them).
*
* First line:
*
* "Date","Time","O","H","L","C","U","D" 0x0D 0x0A
*
* Subsequent lines:
*
* 2*[0-9] - month (leading zeroes, eg 05. 0x30-0x39)
* / - slash (0x2F)
* 2*[0-9] - day (leading zeroes, eg 22)
* / - slash
* 4*[0-9] - year (eg 2004)
* , - comma (0x2C)
* 4*[0-9] - time (leading zeroes, eg 0930)
* , - comma
* ?*[0-9] - open (variable no. of chars, eg 9500)
* , - comma
* ?*[0-9] - high (variable no. of chars, eg 9510)
* , - comma
* ?*[0-9] - low (variable no. of chars, eg 9500)
* , - comma
* ?*[0-9] - close (variable no. of chars, eg 9510)
* , - comma
* ?*[0-9] - up ticks (variable no. of chars, eg 11)
* , - comma
* ?*[0-9] - down ticks (variable no. of chars, eg 7)
* 0x0D - line end
* 0x0A - line end (also last byte in file)
*
* First verify the entire file conforms to the above spec. If it doesn't,
* report which line fails and exit. Look at the line (row) in Excel or editor
* if line>65536. Note: didn't do this, assumed data is ok.
*
* Processing is as follows:
* 1. Remove out-of-hours lines. All Disk Trading times are EST, so remove
* all lines outside 9.30-16.00.
* 2. Remove days that have any missing lines. For YM 2003 that's nearly all
* days for 1-min, 14 for 5-min and 4 for higher.
*
* Method:
* 1. Create array to hold n strings where n is enough to hold 24 hrs x 1-min
* 2. Record state: 0=first line, 1=waiting for next market open,
* 2=recording lines in array as, so far, the day is good (no missing lines)
* 3. Read line
* 4. If time is in market hours and time=last time+n where n=1/5/15/30/60 then
* add line to array
* 5. Else set state to 1 as this day is now not worth recording
* 6. If time=1600 and state=2 then write out complete array to output
* file as the whole day was valid
* 7. Continue until end of file
*/

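#include <stdio.h>
#include <string.h>

#define INTERVAL 5              /* bar interval in minutes: 1/5/15/30/60 */
#define MAX_LINE_LENGTH 50
#define TIME_OFFSET 11          /* hhmm starts after "mm/dd/yyyy," */
#define MARKET_OPEN "0930"
#define MARKET_CLOSE "1600"

char line[24*60][MAX_LINE_LENGTH];
FILE *fin,*fout;

int mins(const char *ptr);
int timeis(const char *ptr,const char *hhmm);
void swapdaymonth(char *ptr);

int main(void) {
    int state,lineptr,i;
    fin=fopen("in.csv","r");
    fout=fopen("out.csv","w");
    if (fin==NULL||fout==NULL) {
        fprintf(stderr,"can't open in.csv or out.csv\n");
        return 1;
    }
    lineptr=0;
    state=0;
    //stop when fscanf can't read another line; feof() only becomes true
    //after a read has failed. %49s keeps input within MAX_LINE_LENGTH
    while (fscanf(fin,"%49s",line[lineptr])==1) {
        switch (state) {
        case 0: //first time through loop, ie on the header line
            fprintf(fout,"%s\n",line[lineptr]);
            state=1;
            break;
        case 1: //not recording, waiting for the next market open
            if (timeis(line[lineptr],MARKET_OPEN)) {
                swapdaymonth(line[lineptr]);
                lineptr++;
                state=2;
            }
            break;
        case 2: //recording a so-far good day
            swapdaymonth(line[lineptr]);
            if (strncmp(line[lineptr],line[lineptr-1],10)!=0||
                mins(&line[lineptr][TIME_OFFSET])!=
                mins(&line[lineptr-1][TIME_OFFSET])+INTERVAL) {
                //missing bar or change of date: discard the day so far
                if (timeis(line[lineptr],MARKET_OPEN)) {
                    //this line opens a new day, so start recording from it
                    strcpy(line[0],line[lineptr]);
                    lineptr=1;
                } else {
                    lineptr=0;
                    state=1;
                }
            } else if (timeis(line[lineptr],MARKET_CLOSE)) {
                //reached the close with no gaps: write out the whole day
                for (i=0;i<=lineptr;i++) {
                    fprintf(fout,"%s\n",line[i]);
                }
                lineptr=0;
                state=1;
            } else {
                lineptr++;
            }
            break;
        }
    }
    fclose(fin);
    fclose(fout);
    return 0;
}

//minutes since midnight for a 4-digit hhmm string
int mins(const char *ptr) {
    return ((ptr[0]-'0')*10+(ptr[1]-'0'))*60+(ptr[2]-'0')*10+(ptr[3]-'0');
}

//true if the time field of a data line equals hhmm
int timeis(const char *ptr,const char *hhmm) {
    return strncmp(ptr+TIME_OFFSET,hhmm,4)==0;
}

//swap the two-digit month and day in place: mm/dd/yyyy becomes dd/mm/yyyy
void swapdaymonth(char *ptr) {
    char tmp;
    tmp=ptr[3];
    ptr[3]=ptr[0];
    ptr[0]=tmp;
    tmp=ptr[4];
    ptr[4]=ptr[1];
    ptr[1]=tmp;
}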
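
The header comment describes a verification pass over the file format that the program itself skips. Below is a minimal sketch of what a per-line check might look like, assuming the line ending has already been stripped (as fscanf's %s does). The names digits, literal and valid_line are hypothetical and not part of the program. It only checks the shape of a line (digit counts and separators), not whether the month, day or time values are in range.

```c
#include <ctype.h>
#include <stddef.h>

/* Consume exactly n digits (n > 0) or one or more digits (n == 0).
 * Returns a pointer just past them, or NULL on mismatch. */
static const char *digits(const char *s, int n)
{
    int count = 0;

    if (s == NULL)
        return NULL;
    while (isdigit((unsigned char)*s)) {
        s++;
        count++;
    }
    if (count == 0 || (n > 0 && count != n))
        return NULL;
    return s;
}

/* Consume one expected character, or return NULL if it is absent. */
static const char *literal(const char *s, char c)
{
    return (s != NULL && *s == c) ? s + 1 : NULL;
}

/* Returns 1 if s looks like mm/dd/yyyy,hhmm,O,H,L,C,U,D with
 * digit-only fields and nothing after the last field, else 0. */
int valid_line(const char *s)
{
    int field;

    s = digits(s, 2);                       /* month */
    s = literal(s, '/');
    s = digits(s, 2);                       /* day */
    s = literal(s, '/');
    s = digits(s, 4);                       /* year */
    s = literal(s, ',');
    s = digits(s, 4);                       /* time, hhmm */
    for (field = 0; field < 6; field++) {   /* O, H, L, C, U, D */
        s = literal(s, ',');
        s = digits(s, 0);
    }
    return s != NULL && *s == '\0';
}
```

One way to use it would be to call valid_line on every line after the header, keep a running line count, and report the count and exit on the first line that returns 0.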
 