BSDCan2007 - Confirmed Schedule

BSDCan 2007
The Technical BSD Conference

Speakers
Robert Krten
Schedule
Day 3
Room SITE B0138
Start time 15:00
Duration 01:00
Info
ID 9
Event type Lecture
Track User
Language English

Getting, Managing, and Analyzing Stock Market information with FreeBSD

Using a combination of open source and custom tools for fun and profit

This presentation describes how to download a wide variety of equity and option data from various sources on the Internet, how to manage the data (parsing, archiving, etc.), and finally how to present the data to applications with a focus on efficiency and access speed. Public domain and open source tools like curl and lynx are highlighted, as well as the author's own custom tools. The entire database schema is presented, and then mmap() is shown as the means of giving applications fast, direct access to the database.

While getting stock market data for a few stocks is very simple, the problem rapidly becomes complicated when data needs to be fetched from different data providers (some free, some subscription based) and in different data formats (equities, options, automated recommendations). Storage of the data is analyzed as well, both for efficiency and for archival purposes. Two passes of data parsing are required: the first takes the data in its Internet format (be it a set of comma-separated values from an FTP site, or deeply buried HTML tags from a web site) and translates it into an archivable text format; the second takes the text database files and converts them into a binary database that is fast and simple to use for all downstream applications.
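As an illustration of the first pass, here is a minimal sketch in C that turns one comma-separated quote line into one line of archival text. The field layout (symbol, date, open, high, low, close, volume), the `struct quote` type, and the function names are all assumptions made for this example; the abstract does not specify the author's actual formats.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record: one equity quote for one trading day. */
struct quote {
    char   symbol[16];
    char   date[11];              /* YYYY-MM-DD */
    double open, high, low, close;
    long   volume;
};

/*
 * Pass 1, step 1: parse one CSV line as a provider might deliver it.
 * Assumed layout: symbol,date,open,high,low,close,volume
 * Returns 0 on success, -1 if the line does not match that layout.
 */
int parse_csv_quote(const char *line, struct quote *q)
{
    int n = sscanf(line, "%15[^,],%10[^,],%lf,%lf,%lf,%lf,%ld",
                   q->symbol, q->date,
                   &q->open, &q->high, &q->low, &q->close, &q->volume);
    return n == 7 ? 0 : -1;
}

/*
 * Pass 1, step 2: emit the record in an assumed archival text format
 * (space-separated fields, one record per line).
 * Returns what snprintf returns: the length of the formatted line.
 */
int format_archive_line(const struct quote *q, char *buf, size_t len)
{
    return snprintf(buf, len, "%s %s %.4f %.4f %.4f %.4f %ld",
                    q->symbol, q->date,
                    q->open, q->high, q->low, q->close, q->volume);
}
```

A caller would read the fetched file line by line, pass each line to `parse_csv_quote()`, and append the output of `format_archive_line()` to the text database. Keeping the archive as plain text means it can be inspected and re-parsed later, even if the binary format changes.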

This presentation will show the use of curl and lynx as the basic transports for fetching the data, and then show the custom tools the author has created to parse it. Then we'll look at the format of the text database and the rationale behind it. Finally, the database schema is presented, and a description of the data fields is given, to illustrate how all of the different forms of data (equities, options, recommendations) are stored in a logical and consistent manner.
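To make the fetch step concrete, the following is a minimal sketch in C of driving curl from a program. The function name `fetch_url` and the choice of fork/exec are assumptions for this example, not the author's tooling; the curl options used (`-sS` for quiet operation with errors still shown, `-f` to fail on HTTP errors, `-o` to name the output file) are standard curl flags.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run "curl -sS -f -o out_path url" and wait for it to finish.
 * Returns curl's exit status (0 on success), 127 if curl could not
 * be executed, or -1 if fork()/waitpid() failed or curl was killed
 * by a signal.
 */
int fetch_url(const char *url, const char *out_path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: exec curl directly, so no shell quoting is needed. */
        execlp("curl", "curl", "-sS", "-f", "-o", out_path, url,
               (char *)NULL);
        _exit(127);               /* only reached if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For data that is only available as HTML, the same pattern could run lynx with `-dump` (rendered text) or `-source` (raw HTML) instead, with the child's standard output redirected to the output file before the exec.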

Code will be presented throughout, with a particular focus on the parsing and the mmap() database access, as well as the API for accessing the stock database.
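A minimal sketch of what mmap()-based access to a binary database can look like is given below. The record layout (`struct db_rec`), the `struct db` handle, and the `db_*` function names are hypothetical and stand in for the author's schema and API, which the abstract does not detail. The idea illustrated is that a file of fixed-size records, once mapped, can be read as an ordinary in-memory array with no per-record read calls or parsing.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size binary record; not the author's schema. */
struct db_rec {
    char   symbol[16];
    int    yyyymmdd;
    double close;
    long   volume;
};

struct db {
    const struct db_rec *recs;    /* mapped file contents */
    size_t               count;   /* number of records */
    size_t               bytes;   /* length of the mapping */
};

/* Write records to a flat binary file. Returns 0 on success. */
int db_write(const char *path, const struct db_rec *recs, size_t count)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t n = fwrite(recs, sizeof *recs, count, fp);
    int rc = fclose(fp);
    return (n == count && rc == 0) ? 0 : -1;
}

/* Map the file read-only. Returns 0 on success, -1 on any error. */
int db_open(const char *path, struct db *db)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
        (size_t)st.st_size % sizeof(struct db_rec) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping remains valid */
    if (p == MAP_FAILED)
        return -1;
    db->recs  = p;
    db->bytes = (size_t)st.st_size;
    db->count = db->bytes / sizeof(struct db_rec);
    return 0;
}

/* Linear search by symbol; returns NULL if the symbol is absent. */
const struct db_rec *db_find(const struct db *db, const char *symbol)
{
    for (size_t i = 0; i < db->count; i++)
        if (strncmp(db->recs[i].symbol, symbol,
                    sizeof db->recs[i].symbol) == 0)
            return &db->recs[i];
    return NULL;
}

/* Unmap the database and clear the handle. */
void db_close(struct db *db)
{
    munmap((void *)db->recs, db->bytes);
    db->recs = NULL;
    db->count = db->bytes = 0;
}
```

Because the records are written as raw structs, a file produced this way depends on the byte order, padding, and type sizes of the machine that wrote it. That trade-off is reasonable when the binary database can always be regenerated from the archival text files.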

The downstream data consumers, such as the automated trading system and the quotes and option selection system, will also be discussed.