Best place to start my crawlers

gemmawright

Hello!

After reading an article about someone who built a system that predicts the Dow Jones fairly accurately by analysing the general 'mood' of the Twitter population (looking for key phrases such as "I feel..." and "I am feeling..."), I decided to try something similar with the FTSE.

Rather than analysing Twitter, I've decided to write a web crawler that reads about 10,000 pages an hour, starting from 5 roots that I picked at random, e.g. world business, finance and political news from the Financial Times (FT.com), BBC News - Business, etc. (I'll come back to this in a minute.)
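For anyone trying the same thing, a crawl like this is usually a breadth-first walk outward from the root URLs. The sketch below is a minimal version under stated assumptions: `crawl` and `fetch_links` are hypothetical names, and `fetch_links` stands in for whatever downloads a page and extracts its links. If the rate of about 10,000 pages an hour holds, the 20-minute crawl time given in the post covers roughly 3,300 pages, which is where the assumed default for `max_pages` comes from.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def crawl(roots: Iterable[str],
          fetch_links: Callable[[str], list[str]],
          max_pages: int = 3300,
          allowed_domains: set[str] | None = None) -> list[str]:
    """Breadth-first crawl from the root URLs, visiting each URL at most once.

    fetch_links is a hypothetical callable that downloads one page and returns
    the absolute URLs it links to (a real one would also store the page text).
    """
    frontier = deque(dict.fromkeys(roots))  # de-duplicated, order kept
    seen = set(frontier)
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            # Optionally stay on the sites the crawl started from.
            if allowed_domains is not None and urlparse(link).netloc not in allowed_domains:
                continue
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

Passing a fake link graph such as `lambda u: graph.get(u, [])` lets you test the traversal offline before pointing it at real sites. Restricting `allowed_domains` keeps the crawl on financial news rather than drifting off into the wider web, at the cost of missing outside sources.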

I picked the top 100 words that came up in the crawls (strong, exit, credit, latest, best, buy and first were the top few) and then set up a server to collect stats on these words every 30 minutes during trading hours (each crawl takes 20 minutes).
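Picking the top words is essentially a frequency count over the crawled text. Here is a minimal sketch assuming each page has already been reduced to plain text; `top_words` is a hypothetical name, and the tiny stop-word list is an assumption (without one, words like "the" would dominate the top 100).

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real one would be longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}


def top_words(pages: list[str], n: int = 100) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-words across all crawled page texts."""
    counts: Counter[str] = Counter()
    for text in pages:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)
```

Running the same counter over each 30-minute crawl, restricted to the chosen 100 words, gives the per-word stats for that snapshot.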

I've only had the system running for a few days, but if I assign, e.g., +1 to 'good' words and -1 to 'bad' words and have the system output a 'score' every 30 minutes, I can already see the score fluctuating along with the FTSE (although it is far too early to confirm anything!).
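The +1/-1 scoring can be sketched as below. The word lists are purely hypothetical placeholders, since the post doesn't say which of the top words count as 'good' or 'bad', and `mood_score` is an illustrative name.

```python
import re

# Hypothetical word lists for illustration only; the post's real lists aren't given.
GOOD_WORDS = {"gain", "rally", "upbeat"}
BAD_WORDS = {"loss", "slump", "gloomy"}


def mood_score(text: str) -> int:
    """+1 for each 'good' word and -1 for each 'bad' word in one snapshot of crawled text."""
    score = 0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in GOOD_WORDS:
            score += 1
        elif word in BAD_WORDS:
            score -= 1
    return score
```

One design point worth considering: a raw sum grows with the amount of text crawled, so dividing by the number of words in the snapshot may make scores from different 30-minute windows more comparable.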

My aim is to use the stats for each word to train a neural net against what the FTSE does over the next 1 hour, 1 day, 3 days (the horizon where Twitter worked best) and 1 week.
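Before any net can be trained, each 30-minute snapshot needs a target: the FTSE move over each horizon. A minimal sketch, assuming FTSE prices are sampled at the same snapshot times; `forward_returns` and `HORIZONS` are hypothetical names. Using "first price at or after t + horizon" means overnight and weekend gaps roll forward to the next available quote.

```python
from __future__ import annotations

from bisect import bisect_left
from datetime import datetime, timedelta

# The four horizons from the post.
HORIZONS = {"1h": timedelta(hours=1), "1d": timedelta(days=1),
            "3d": timedelta(days=3), "1w": timedelta(weeks=1)}


def forward_returns(times: list[datetime], prices: list[float],
                    horizon: timedelta) -> list[float | None]:
    """For each snapshot, the fractional FTSE change to the first price at or
    after time + horizon; None if the series doesn't reach that far yet."""
    out: list[float | None] = []
    for t, p in zip(times, prices):
        j = bisect_left(times, t + horizon)  # times must be sorted ascending
        out.append(prices[j] / p - 1.0 if j < len(times) else None)
    return out
```

The word stats at each snapshot would be the inputs and these returns (or just their sign) the targets; rows with `None` have to be dropped until enough time has passed.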

OK, so my question! At the moment I am using the following 'roots' for my crawl, which I picked purely by Googling things like 'FTSE news':

"http://www.telegraph.co.uk/finance/markets",
"http://www.ft.com",
"http://www.bbc.co.uk/news/business/",
"http://www.guardian.co.uk/business/marketforceslive",
"http://www.londonstockexchange.com/home/homepage.htm"

What I want to know is: which roots should I be using for optimal data? Which sites do you think really affect the FTSE, i.e. ones that reflect, or more importantly influence, the mood of investors and traders?

If anyone is working on similar systems or wants updates please let me know.

Thanks
 
Forget those sites. Trawl the financial bulletin board sites like this one; I think there are one or two others too…

Identify ‘trends’ (no really, no pun, OK, well, maybe just a little one) and then FADE THAT MUTHA’…
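One way to read "identify the trend, then fade it" in code is to flag when the latest sentiment score is unusually far from its recent average and take the opposite side. This is a minimal sketch, not anyone's actual system; `fade_signal`, the 6-snapshot window and the 2-standard-deviation threshold are all assumptions.

```python
from statistics import mean, pstdev


def fade_signal(scores: list[float], window: int = 6, threshold: float = 2.0) -> int:
    """Contrarian signal from a sentiment series: -1 (sell), +1 (buy) or 0 (no trade)."""
    if len(scores) <= window:
        return 0  # not enough history yet
    history = scores[-window - 1:-1]  # the `window` scores before the latest one
    sd = pstdev(history)
    if sd == 0:
        return 0  # flat history: no meaningful z-score
    z = (scores[-1] - mean(history)) / sd
    if z > threshold:
        return -1  # crowd unusually bullish: fade it
    if z < -threshold:
        return 1  # crowd unusually bearish: fade it
    return 0
```

The same idea covers waiting until a message is in "overload mode": a high threshold only fires when sentiment is far more one-sided than usual.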
 
Very interesting. Could you incorporate data from share tipsters and financial channel gurus (high inverse correlation, I should imagine)?
 
I'd suggest at least looking at the following to add to your news crawl sources:

marketwatch.com

moneyam.com

iii.co.uk

advfn.com
 
Surely the challenge here is getting the information ahead of its impact on the FTSE: the media reports what has just happened, not what is about to happen. So the correlation isn't surprising, but I suspect it is happening after the event.

Derwent Capital ran a hedge fund based on Twitter sentiment; they quietly shut it down after a month, once it had blown itself up....
 

Or wait until the same message is out there in overload mode, and then go contrarian to the message.

N
 
I don't really understand the motivation for people to post trade calls on social media (other than pump-and-dump schemes, or vendors running the usual con tricks that these platforms facilitate).

However, if there is evidence that some people are posting legitimate calls, then why not create your own platform, something along the lines of stocktweets or one of those trade-audit sites? The basic idea would be to identify the best- and worst-performing subscribers in some category, then follow the best performers and fade the worst. The big problem with this approach, however, is differentiating those with a genuine edge from those who are currently on a lucky streak, or whose systems happen to be in phase with current market conditions. This problem is hard enough as a system developer with all of the relevant information available. I honestly can't see how you'd handle it as someone who only sees the calls and their final results, without understanding the mechanics of those trades.
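A partial answer to the luck-versus-edge problem is a one-sided binomial test: how likely is a caller's hit rate if every call were a coin flip? A minimal sketch, assuming each caller's record is just (correct calls, total calls); `luck_p_value` and `rank_callers` are hypothetical names and the 0.05 cut-off is an assumption.

```python
from __future__ import annotations

from math import comb


def luck_p_value(wins: int, calls: int, p: float = 0.5) -> float:
    """Probability of at least `wins` correct calls out of `calls` by pure chance."""
    return sum(comb(calls, k) * p**k * (1 - p)**(calls - k)
               for k in range(wins, calls + 1))


def rank_callers(records: dict[str, tuple[int, int]],
                 alpha: float = 0.05) -> tuple[list[str], list[str]]:
    """Split callers into ones to follow (unusually right) and fade (unusually wrong)."""
    follow: list[str] = []
    fade: list[str] = []
    for name, (wins, calls) in records.items():
        if luck_p_value(wins, calls) < alpha:
            follow.append(name)
        elif luck_p_value(calls - wins, calls) < alpha:  # unusually many losses
            fade.append(name)
    return follow, fade
```

This doesn't address systems that are merely in phase with current conditions, and screening many callers at once will flag some lucky ones anyway (at a 0.05 cut-off, roughly 1 in 20 pure-chance callers), so the flagged lists would need re-checking on fresh calls.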
 