Exploring Our Scraped Options Data Bid-Ask Spreads

BlackArbs Admin

Post Outline

  • The Objective
  • The Data
  • Basic Data Analysis
  • Bid-Ask Spread Analysis
    • How Do Aggregate Bid-Ask Spreads Vary with Days To Expiration?
    • How Do Bid-Ask Spreads Vary with Volume?
    • How Do Bid-Ask Spreads Vary with Volatility?
  • Summary Conclusions

The Objective

Compared to the equity market, the options market is a level up in complexity. For each symbol there are multiple expiration dates, multiple strike prices for each expiration date, and implied volatilities, and that's before we get to the option Greeks.

The increased complexity presents us with more opportunity. More complexity means less ground truth, more errors, more gaps, and more structural asymmetries. Consider that the dominant factor underlying options pricing, implied volatility, cannot be directly measured, only estimated. Estimating it requires other observable factors and a pricing model. We already know that "all models are wrong, but some are useful," so there are opportunities to exploit the errors of others. Doing that requires a better understanding of the market than our competitors have, which is why we begin our study of the options market here.
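To make the "estimated, not measured" point concrete, here is a minimal sketch of backing an implied volatility out of an option price. It assumes the Black-Scholes model for a European call on a non-dividend-paying asset and uses simple bisection; the model choice, the function names, and the example quote are all illustrative assumptions, not part of the scraped data or the dashboard code.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_price(spot, strike, years, rate, vol):
    """Black-Scholes price of a European call (no dividends assumed)."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)

def implied_vol(price, spot, strike, years, rate, lo=1e-4, hi=5.0, tol=1e-8):
    """Find the volatility that reproduces `price` by bisection.

    Works because the call price is increasing in volatility.
    """
    low_price = bs_call_price(spot, strike, years, rate, lo)
    high_price = bs_call_price(spot, strike, years, rate, hi)
    if not low_price <= price <= high_price:
        raise ValueError("price is outside the range the model can produce")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, years, rate, mid) < price:
            lo = mid  # model price too low, so volatility must be higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical quote: the bid and the ask imply two different volatilities.
bid, ask = 5.40, 5.80
iv_bid = implied_vol(bid, spot=100.0, strike=100.0, years=0.5, rate=0.01)
iv_ask = implied_vol(ask, spot=100.0, strike=100.0, years=0.5, rate=0.01)
print(f"IV at bid: {iv_bid:.4f}, IV at ask: {iv_ask:.4f}")
```

Note that the answer depends on which price you feed in (bid, ask, or mid) and on the model itself, which is one reason the bid-ask spread matters for anything built on implied volatility.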

This is the next step in the series on developing an options trading dashboard using Python and Python-based tools. Thus far I have demonstrated two methods (1, 2) of scraping the necessary data. Now that the data has been collecting for a while, we can begin some initial exploratory analysis. As this is a purpose-driven process, we should set an objective for our study.

In this particular article I want to focus on exploring bid-ask spreads as that data is often unavailable for free.

The Data

The data is a cleaned HDF5 (.h5) file comprising daily options data collected from 05/17/2017 to 07/24/2017. By cleaned I mean that I aggregated the daily data into one set, removed some unnecessary columns, cleaned up the data types, and added the underlying ETF prices from Yahoo. I make no claims about the accuracy of the data itself, and I present it as is. It is approximately 1 GB in size, and I have made it available for download at the following link:

Options Data

To import the data into your Python environment (pandas needs the PyTables package installed to read HDF5 files): import pandas as pd; data = pd.read_hdf('option_data_2017-05-17_to_2017-07-24.h5', key='data')

Basic Data Analysis

First the package imports.

Some convenience functions...

Let's import the data and view some basic info...
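As a rough sketch of what a first look at the data could involve, the helper below summarizes row count, column count, number of distinct symbols, and missing values. The column names ("Symbol", "Type", "Bid", "Ask") and the helper name `basic_info` are assumptions for illustration; check them against the actual columns in the file. A toy frame stands in for the real data so the snippet runs on its own.

```python
import pandas as pd

def basic_info(df: pd.DataFrame, symbol_col: str = "Symbol") -> pd.Series:
    """Return a few headline facts about an options DataFrame."""
    return pd.Series({
        "rows": len(df),
        "columns": df.shape[1],
        "symbols": df[symbol_col].nunique(),
        "missing_values": int(df.isna().sum().sum()),
    })

# Toy stand-in for the real data; column names are hypothetical.
toy = pd.DataFrame({
    "Symbol": ["SPY", "SPY", "QQQ"],
    "Type": ["call", "put", "call"],
    "Bid": [1.00, 0.95, None],   # one missing bid on purpose
    "Ask": [1.05, 1.00, 0.60],
})

info = basic_info(toy)
print(info)
```

On the real file you would call `basic_info(data)` after loading it, alongside the built-in `data.info()` and `data.head()`.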

Bid-Ask Spread Analysis

How Do Aggregate Bid-Ask Spreads Vary with Days To Expiration?

Let's define a convenience function to plot the data.

A few things stand out. From ~250 through ~600 days to expiration, bid-ask spreads for both calls and puts are compressed towards zero. Put spreads also appear to show less dispersion overall.
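One way to check an observation like this numerically, rather than by eye, is to bucket contracts by days to expiration and compare spread statistics per bucket. The sketch below is an assumption-laden illustration: the column names ("Type", "DTE", "Bid", "Ask"), the bucket edges, and the function name are all hypothetical and not taken from the original plotting code.

```python
import pandas as pd

def spread_by_dte_bucket(df, edges=(0, 30, 90, 250, 600, 1000)):
    """Median, std, and count of bid-ask spreads per option type and DTE bucket."""
    out = df.copy()
    out["spread"] = out["Ask"] - out["Bid"]
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    # Rows with DTE outside the edges get NaN here and drop out of the groupby.
    out["dte_bucket"] = pd.cut(out["DTE"], bins=list(edges), labels=labels,
                               include_lowest=True)
    grouped = out.groupby(["Type", "dte_bucket"], observed=True)["spread"]
    return grouped.agg(["median", "std", "count"])

# Toy data; values are made up purely to exercise the function.
toy = pd.DataFrame({
    "Type": ["call", "call", "call", "put", "put"],
    "DTE": [10, 300, 400, 20, 500],
    "Bid": [1.00, 2.00, 2.00, 1.00, 3.00],
    "Ask": [1.20, 2.05, 2.05, 1.10, 3.02],
})

result = spread_by_dte_bucket(toy)
print(result)
```

The `count` column is worth reading alongside the median: a bucket with very few contracts can look compressed simply because there is little data in it.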

We can look at a few select ETFs.

A convenience plotting function for boxplots.

Looking at these plots we see further evidence of bid-ask spreads showing less dispersion across puts vs calls. Also it's surprising to see DIA options having such a wide range of values compared to SPY and QQQ; this is especially true for the call options.
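The dispersion a boxplot shows can also be tabulated directly. The following sketch computes the quartiles and interquartile range (IQR) of spreads per symbol and option type; it assumes hypothetical column names ("Symbol", "Type", "Bid", "Ask") and uses made-up numbers, so treat it as an illustration of the comparison rather than a reproduction of the plots.

```python
import pandas as pd

def spread_dispersion(df, symbols):
    """Quartiles and IQR of bid-ask spreads for the selected symbols."""
    sub = df[df["Symbol"].isin(symbols)].copy()
    sub["spread"] = sub["Ask"] - sub["Bid"]
    stats = sub.groupby(["Symbol", "Type"])["spread"].describe()
    stats["iqr"] = stats["75%"] - stats["25%"]  # height of the boxplot's box
    return stats[["count", "25%", "50%", "75%", "iqr"]]

# Toy data with deliberately wide DIA spreads and tight SPY spreads.
toy = pd.DataFrame({
    "Symbol": ["DIA"] * 3 + ["SPY"] * 3 + ["XYZ"],
    "Type": ["call"] * 7,
    "Bid": [1.0] * 7,
    "Ask": [1.1, 1.5, 1.9, 1.01, 1.02, 1.03, 1.5],
})

result = spread_dispersion(toy, symbols=["DIA", "SPY"])
print(result)
```

Comparing the `iqr` column across rows gives a single number per symbol and option type to back up what the boxes suggest visually.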

How Do Bid-Ask Spreads Vary with Volume?

A convenience function for plotting...

Again we see put bid-ask spreads squeezed towards zero even as volume increases. We also see SPY and USO with small spreads as both volume and open interest increase. This suggests there are symbols/contracts with higher relative trading capacity.
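A simple numerical companion to these plots is to split contracts into volume quantile bins and look at the median spread in each. The sketch below assumes hypothetical column names ("Type", "Volume", "Bid", "Ask") and a hypothetical helper name; the `by` parameter is there so the same function could be pointed at an open interest column, whatever it is called in the real data.

```python
import pandas as pd

def median_spread_by_activity(df, by="Volume", n_bins=4):
    """Median bid-ask spread per quantile bin of `by`, split by option type."""
    out = df[df[by] > 0].copy()  # contracts with no activity are excluded
    out["spread"] = out["Ask"] - out["Bid"]
    # duplicates="drop" merges bins when many contracts share the same value.
    out["bin"] = pd.qcut(out[by], q=n_bins, duplicates="drop")
    table = out.groupby(["bin", "Type"], observed=True)["spread"].median()
    return table.unstack("Type")

# Toy data: spreads shrink as volume grows (made-up numbers).
volumes = [1, 2, 5, 10, 50, 100, 500, 1000]
call_spreads = [0.40, 0.35, 0.30, 0.25, 0.10, 0.08, 0.03, 0.02]
put_spreads = [0.20, 0.18, 0.15, 0.12, 0.05, 0.04, 0.02, 0.01]
toy = pd.DataFrame({
    "Type": ["call"] * 8 + ["put"] * 8,
    "Volume": volumes * 2,
    "Bid": [1.0] * 16,
    "Ask": [1.0 + s for s in call_spreads + put_spreads],
})

result = median_spread_by_activity(toy)
print(result)
```

Quantile bins are used here instead of fixed-width bins because option volume is heavily skewed; equal-width bins would put almost every contract in the first bin.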

How Do Bid-Ask Spreads Vary with Volatility?

A few notes. DIA again appears to have the highest dispersion in bid-ask spreads for both calls and puts. GLD is also notable. It is somewhat surprising that, for these selected ETFs, higher volatility does not appear to coincide with wider bid-ask spreads.
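To put a number on whether spreads widen with volatility, one option is a per-symbol rank (Spearman) correlation between implied volatility and spread. The sketch below computes it as the Pearson correlation of the ranks, which avoids any dependency beyond pandas. The column names ("Symbol", "IV", "Bid", "Ask"), the function name, and the toy symbols are assumptions for illustration only.

```python
import pandas as pd

def spread_iv_rank_corr(df):
    """Spearman rank correlation between IV and bid-ask spread, per symbol."""
    out = df.copy()
    out["spread"] = out["Ask"] - out["Bid"]
    corr = {}
    for symbol, grp in out.groupby("Symbol"):
        # Pearson correlation of the ranks is the Spearman correlation.
        corr[symbol] = grp["IV"].rank().corr(grp["spread"].rank())
    return pd.Series(corr, name="rank_corr_iv_vs_spread")

# Toy data: "AAA" spreads rise with IV, "BBB" spreads fall with IV.
toy = pd.DataFrame({
    "Symbol": ["AAA"] * 4 + ["BBB"] * 4,
    "IV": [0.1, 0.2, 0.3, 0.4] * 2,
    "Bid": [1.0] * 8,
    "Ask": [1.01, 1.02, 1.03, 1.04, 1.04, 1.03, 1.02, 1.01],
})

result = spread_iv_rank_corr(toy)
print(result)
```

A value near zero for a symbol would be consistent with the observation that higher volatility does not come with wider spreads; a rank correlation says nothing about dispersion, so it complements the plots rather than replacing them.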

Summary Conclusions

  • Put options have less overall dispersion in bid-ask spreads than calls relative to days to expiration, volume, and volatility.
  • Bid-ask spreads show a major compression range between ~250 and ~600 days to expiration, where they appear smaller than in all other buckets.
  • Bid-ask spreads show greater dispersion at lower levels of implied volatility.
  • DIA in particular shows the greatest variability in bid-ask spreads of the selected ETFs.
  • SPY and USO show high capacity as bid-ask spreads remain near zero even at elevated volume and open interest levels.
