How to Get (Almost) Free Tick Data

Access to high-quality, cost-effective market data is a continuing problem for retail traders. I was recently told about the startup brokerage “Alpaca”. The gentleman I spoke with said its API gives access to tick data for thousands of stocks every day, at no cost.

I thought it was too good to be true, but I recently took a little time to investigate.

In this article I will describe the basic process for accessing the tick data, along with some basic code I have been experimenting with. I may turn this into a short series, because eventually you will want to automate the process and store the data for later analysis.

First things first, you have to create an account with Alpaca. There are two account levels to consider. You can sign up completely free and get API keys that allow you to access IEX data. However, I don’t believe this data is offered at the tick level, and the API restrictions are stricter.

To access the tick-level data via Alpaca’s integration with Polygon.io, you must create a trading account and fund it. Because you have to fund the account, I call this almost free, but at the time of this writing there was no minimum deposit requirement, so for all intents and purposes it is free. Once you fund the account, make sure to generate new API keys.

Overall the process was relatively simple. Anyway, here’s the code I’m experimenting with.

from pathlib import Path

import pandas as pd
import numpy as np
import psycopg2 as pg
import alpaca_trade_api as ata
import alpaca_api_config as api_cfg
import datetime as dt
from pytz import timezone

The alpaca_api_config module is just a Python file that contains my API keys.
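For reference, a minimal sketch of what such a config file could look like. The attribute names live_api_key and live_secret_key match the ones used later in this article. Reading them from the environment variables APCA_API_KEY_ID and APCA_API_SECRET_KEY is my assumption here, not necessarily how the original file was written. It simply keeps the keys out of source control.

```python
# alpaca_api_config.py (illustrative sketch, not the original file)
import os

# Assumption: the keys were exported as environment variables beforehand.
# An empty string is used as a fallback so that importing never raises.
live_api_key = os.environ.get("APCA_API_KEY_ID", "")
live_secret_key = os.environ.get("APCA_API_SECRET_KEY", "")
```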

# =====================================================
# this is the main function to download trade data


def download_data(
    symbol: str, date: str, start: str, ticks: int, cond: bool
) -> pd.DataFrame:
    """

    Parameters
    ----------
    symbol : asset symbol
    date : YYYY-MM-DD
    start : HH:MM:SS in US/Eastern time, ex. "9:30:00"
    ticks : number of ticks to request, max 50,000
    cond : whether to return the trade condition codes

    Returns
    -------
    df: contains the tick data

    REF:
        https://polygon.io/docs/#get_v2_ticks_stocks_trades__ticker___date__anchor
        https://medium.com/automation-generation/easily-get-tick-data-in-python-with-alpaca-api-c79564a97c9
    """
    full_date = date + " " + start
    st = dt.datetime.strptime(full_date, "%Y-%m-%d %H:%M:%S")
    st = timezone("US/Eastern").localize(st)
    st = int(st.timestamp()) * 1000
    trades = API.polygon.historic_trades_v2(symbol, date, timestamp=st, limit=ticks)
    trades.df.reset_index(level=0, inplace=True)
    # convert exchange numeric codes to names for readability
    exchanges = API.polygon.exchanges()
    ex_lst = [[e.id, e.name, e.type] for e in exchanges]
    dfe = pd.DataFrame(ex_lst, columns=["exchange", "exch", "excode"])
    trades.df["exchange"] = trades.df["exchange"].astype(int)
    df = pd.merge(trades.df, dfe, how="left", on="exchange")
    df = df[df.exchange != 0]
    df.drop("exchange", axis=1, inplace=True)
    # add symbol column for easier aggregation (the scalar is broadcast to every row)
    df = df.assign(symbol=symbol)

    if cond:
        # convert sale condition numeric codes to names for readability
        conditions = API.polygon.condition_map()
        C = conditions.__dict__["_raw"]
        df[[f"condition_{i}" for i in range(1, 5, 1)]] = (
            df.conditions.astype(str)
            # regex=False: "[" and "]" are literal brackets, not regex patterns
            .str.replace("[", "", regex=False)
            .str.replace("]", "", regex=False)
            .str.split(",", expand=True)
        )
        df["condition_1"] = df["condition_1"].map(C)
        df["condition_2"] = df["condition_2"].map(C)
        df["condition_3"] = df["condition_3"].map(C)
        df["condition_4"] = df["condition_4"].map(C)
        df.drop(["conditions"], axis=1, inplace=True)
    else:
        df.drop(["conditions"], axis=1, inplace=True)
    return df

The above code is the main download function. It downloads the trade data for the specified date, calls the API to gather the exchange code data, and merges that with the trade data. Finally, it maps human-readable text to the condition codes that apply to each trade. You can run it like the following:
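To make the merge and mapping steps easier to follow without an API key, here is a small sketch on toy data. Every value in it (prices, exchange ids, exchange names, condition names) is invented for illustration and does not come from Alpaca or Polygon. It also expands the condition lists directly instead of going through string replacement, which is a swapped-in technique and not what download_data does.

```python
import pandas as pd

# Toy stand-ins for the API responses; all values are made up.
trades = pd.DataFrame(
    {
        "price": [100.0, 100.5, 101.0],
        "size": [10, 200, 5],
        "exchange": [4, 11, 0],
        "conditions": [[12, 37], [14], None],
    }
)
exchanges = pd.DataFrame(
    {"exchange": [4, 11], "exch": ["Exchange A", "Exchange B"]}
)
cond_map = {"12": "Condition X", "14": "Condition Y", "37": "Condition Z"}

# Left-join the exchange names, then drop rows with exchange id 0.
df = trades.merge(exchanges, how="left", on="exchange")
df = df[df["exchange"] != 0].drop(columns="exchange")


def condition_names(codes, width=4):
    """Map a list of condition codes to names, padded with None to `width`."""
    codes = list(codes) if isinstance(codes, (list, tuple)) else []
    names = [cond_map.get(str(code)) for code in codes][:width]
    return names + [None] * (width - len(names))


# Expand the per-trade lists into condition_1 .. condition_4 columns.
cols = [f"condition_{i}" for i in range(1, 5)]
expanded = pd.DataFrame(
    df["conditions"].apply(condition_names).tolist(),
    index=df.index,
    columns=cols,
)
df = pd.concat([df.drop(columns="conditions"), expanded], axis=1)
print(df)
```

Padding each row to four entries keeps the column count fixed even on days when no trade carries four condition codes.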

# =====================================================
# initialize the API

api_key = api_cfg.live_api_key
secret_key = api_cfg.live_secret_key

API = ata.REST(key_id=api_key, secret_key=secret_key)

# =====================================================
# download some sample data

ticker = "NVDA"
date = pd.to_datetime("today").date().strftime("%Y-%m-%d")

start = "9:30:00"
conditions = True
ticks = 50000
df = download_data(ticker, date, start, ticks, cond=conditions)
print(df)

EXAMPLE OUTPUT

As I mentioned before, this code is still experimental. It uses version 2 of the API, and some of the docs don’t fully explain what every parameter does. That may be because it is really the Polygon.io API under the hood, and the docs are based on what Polygon has written.

I also noticed some issues that I still have to resolve. For example, I’m not sure how to use pagination to request more tick data from the day. The timestamps aren’t intuitive either: I expected to see the NYC timezone, but the first timestamp is around 4:00 and the last is around 10:50, and that didn’t change no matter what start timestamp I passed in. If anyone is more familiar with the API output, let me know.
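One possible explanation, which I have not verified: if I am reading the Polygon docs correctly, the v2 trades endpoint works with nanosecond Unix timestamps, and the timestamp parameter is an offset used for pagination, where the timestamp of the last tick of one page becomes the offset of the next. The download function above passes milliseconds, which could be why the start time appeared to be ignored. The sketch below works under those assumptions. The names to_epoch_ns, fetch_all, and fetch_page are hypothetical helpers, and the tick fields "t" (timestamp) and "q" (sequence number) are assumed field names. It uses zoneinfo from the standard library in place of pytz.

```python
import datetime as dt
from typing import Callable, Dict, List
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

NS_PER_SECOND = 1_000_000_000


def to_epoch_ns(date: str, start: str, tz: str = "America/New_York") -> int:
    """Convert a local 'YYYY-MM-DD' date and 'HH:MM:SS' time to epoch nanoseconds."""
    local = dt.datetime.strptime(f"{date} {start}", "%Y-%m-%d %H:%M:%S")
    local = local.replace(tzinfo=ZoneInfo(tz))
    return int(local.timestamp()) * NS_PER_SECOND


def fetch_all(
    fetch_page: Callable[[int, int], List[Dict]],
    start_ns: int,
    page_size: int = 50_000,
    max_pages: int = 20,
) -> List[Dict]:
    """Collect ticks page by page, using the last timestamp as the next offset.

    fetch_page(offset_ns, limit) is a hypothetical wrapper around the real
    API call. It must return ticks sorted by time, as dicts with "t" and "q".
    """
    ticks: List[Dict] = []
    seen = set()
    offset = start_ns
    for _ in range(max_pages):
        page = fetch_page(offset, page_size)
        # The tick at the offset may be returned again, so skip known ticks.
        new = [tick for tick in page if (tick["t"], tick["q"]) not in seen]
        seen.update((tick["t"], tick["q"]) for tick in new)
        ticks.extend(new)
        # Stop on a short page or when a page adds nothing new.
        if len(page) < page_size or not new:
            break
        offset = page[-1]["t"]
    return ticks


# Stand-in for the API so the sketch runs offline: 25 fake ticks.
FAKE_TICKS = [{"t": 100 + i, "q": i} for i in range(25)]


def fake_fetch_page(offset_ns: int, limit: int) -> List[Dict]:
    return [tick for tick in FAKE_TICKS if tick["t"] >= offset_ns][:limit]


result = fetch_all(fake_fetch_page, start_ns=105, page_size=10)
print(len(result), result[0]["t"], result[-1]["t"])
```

To use this against the real API, fetch_page would need to wrap the trades call and convert its result into a list of dicts. I have not tested that part, so treat it as a starting point only.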

Besides the aforementioned issues, the overall process was relatively straightforward, and I always try to support those who provide high-quality data at reasonable prices.
