Building a Neural Network for Trading

Neural networks, and machine learning in general, are a fascinating field. For this project I wanted to implement and train a neural network in Python and use it as a signal generator for my trading algorithm. In order to train my neural network, I needed some data. The input will be a mixture of company fundamentals and pricing data (i.e. returns).

I chose not to use any of the existing Python libraries such as TensorFlow or PyTorch to implement my neural network, as I wanted to really grasp and understand how they work. Instead, I am using only NumPy and pandas for basic matrix multiplication and data manipulation.
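To give a flavour of what "plain NumPy" means here, below is a minimal sketch of a single dense layer forward pass. The layer sizes, activation and random data are placeholders for illustration only, not the actual architecture of my network.

```python
import numpy as np

def forward(x, W, b):
    """One dense layer: affine transform followed by a sigmoid activation."""
    z = x @ W + b
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical shapes: batch of 2 samples, 8 input features, 4 hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W = rng.standard_normal((8, 4))
b = np.zeros(4)

print(forward(x, W, b).shape)  # (2, 4)
```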

In this part of the series I discuss the landscape of the project and go into some of the architectural details of the system implemented to fetch the data that will be used for training my model. The application that fetches the data is called Turbine and the source code is available here.

I am fetching fundamental data directly from the SEC and historical stock prices from Yahoo Finance.
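The post does not spell out which endpoints Turbine calls, but as an illustration, the SEC exposes fundamentals through its XBRL "company concept" API. The ticker, tag and JSON layout below are an assumption based on that public API, not Turbine's actual code.

```python
import requests

# Hypothetical example: fetch Apple's long-term debt from the SEC XBRL API.
# The SEC asks for a descriptive User-Agent on requests to data.sec.gov.
HEADERS = {"User-Agent": "your-name your-email@example.com"}

url = ("https://data.sec.gov/api/xbrl/companyconcept/"
       "CIK0000320193/us-gaap/LongTermDebtNoncurrent.json")

resp = requests.get(url, headers=HEADERS, timeout=30)
resp.raise_for_status()
facts = resp.json()["units"]["USD"]
print(facts[-1]["end"], facts[-1]["val"])  # most recently reported value
```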

The landscape is illustrated below; in this part we are going to focus on the green portion of it.

[Diagram: Turbine landscape overview]

The main workers of the framework are the Poller, Parser and the Sender.

The DirectoryWatcher watches the input directory for any incoming request.json files. These files contain information about what data Turbine should fetch. This can be fundamental data such as long term debt or historical pricing data.
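The exact schema of these request files is not documented in the post, so the fields below are purely hypothetical, but a request dropped into the watched directory might look something like this:

```python
import json
from pathlib import Path

# Hypothetical request format: the real Turbine schema may differ.
request = {
    "type": "fundamental",      # or "price" for historical quotes
    "ticker": "AAPL",
    "field": "LongTermDebt",
    "start": "2015-01-01",
    "end": "2021-08-28",
}

Path("input").mkdir(exist_ok=True)
Path("input/request.json").write_text(json.dumps(request, indent=2))
```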

Poller threads connect to the source and poll for data. The source is the SEC for fundamental data and Yahoo Finance for stock prices. The data is cached on disk as a JSON or CSV file and then passed on to the Parser for further processing. Pollers first check whether the requested data is already cached before trying to fetch it from the original source.
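The cache-first behaviour boils down to a simple pattern. This is only a sketch: the cache location, file naming and the `fetch_remote` callable are assumptions, not Turbine's actual poller interface.

```python
import json
from pathlib import Path

CACHE_DIR = Path("cache")

def poll(request: dict, fetch_remote) -> dict:
    """Return cached data if present, otherwise fetch it and cache it.

    `fetch_remote` stands in for whatever hits the SEC or Yahoo Finance.
    """
    CACHE_DIR.mkdir(exist_ok=True)
    cache_file = CACHE_DIR / f"{request['ticker']}_{request['type']}.json"

    if cache_file.exists():                     # cache hit: skip the network
        return json.loads(cache_file.read_text())

    data = fetch_remote(request)                # cache miss: go to the source
    cache_file.write_text(json.dumps(data))     # persist for next time
    return data
```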

Parsers get the data files polled by the Pollers. They extract data and make it available for persistence. For data extraction they dynamically load extractors (Python classes implementing a specific interface) from disk to handle the different data formats. The results are then passed on to the Senders.
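One common way to load such classes from disk in Python is via `importlib`. The sketch below assumes, hypothetically, that each extractor module exposes an `Extractor` class with an `extract(raw)` method; Turbine's real interface may look different.

```python
import importlib.util
from pathlib import Path

def load_extractors(directory: str = "extractors"):
    """Load every extractor module found on disk and instantiate its class."""
    extractors = []
    for path in Path(directory).glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)          # execute the module's code
        extractors.append(module.Extractor())    # assumed interface
    return extractors
```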

Sender threads are responsible for persisting data into an SQLite database.
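With Python's built-in `sqlite3` module, persisting a batch of rows is straightforward. The table and column names below are illustrative only; the actual Turbine schema is not described in the post.

```python
import sqlite3

def persist(rows, db_path="turbine.db"):
    """Insert extracted rows into a (hypothetical) fundamentals table."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS fundamentals (
               ticker TEXT, field TEXT, period TEXT, value REAL
           )"""
    )
    con.executemany(
        "INSERT INTO fundamentals (ticker, field, period, value) VALUES (?, ?, ?, ?)",
        rows,
    )
    con.commit()
    con.close()

# Example usage with a dummy row
persist([("AAPL", "LongTermDebt", "2021-06-30", 105752000000.0)])
```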

For maximum performance Turbine can be configured (using a .ini file) to spawn multiple poller, parser and sender threads, since different data requests can be handled in parallel. It can also be configured to listen on a specified port. That way one can use the built-in client to remotely connect to it for monitoring purposes or to stop Turbine.
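Reading such a file is a natural fit for Python's `configparser`. The section and option names below are made up for the sake of the example; the real Turbine .ini keys are not documented in the post.

```python
import configparser

# Hypothetical turbine.ini layout:
# [threads]
# pollers = 4
# parsers = 2
# senders = 1
#
# [network]
# port = 9000

config = configparser.ConfigParser()
config.read("turbine.ini")

n_pollers = config.getint("threads", "pollers", fallback=1)
port = config.getint("network", "port", fallback=9000)
print(n_pollers, port)
```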

Any errors that may have occurred, or files that could not be processed, are logged by a transaction handler for later investigation.

The fetched data is persisted into an SQLite relational database. This data will then be read and fed into my neural network in order to train it. I will describe this process in a later part.

Written on August 28, 2021