Hi! I love your posts! :-)
Two questions:
1) How do you mathematically determine the underlying prob. distributions? I am not able to find this in the code. Also, did you compare that to results obtained from Breeden & Litzenberger? Right now, I find it hard to get a feeling for how well your algorithms perform.
2) It would be interesting to see how the change in implied vols in the original trades affected the price. Did you look at the historical prices, or do you know where we could get them from?
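For context on the Breeden & Litzenberger comparison: their result says the risk-neutral density is the discounted second derivative of the call price with respect to strike, f(K) = e^{rT} * d²C/dK². The sketch below is only an illustration of that formula, not the method used in the repository. It assumes evenly spaced strikes, uses synthetic Black-Scholes prices rather than real quotes, and the function names and parameter values are made up for the example.

```python
import math

def bs_call(spot, strike, rate, vol, t):
    """Black-Scholes call price, used here only to generate synthetic quotes."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return spot * cdf(d1) - strike * math.exp(-rate * t) * cdf(d2)

def implied_density(strikes, calls, rate, t):
    """Breeden-Litzenberger via a central second difference in strike.

    f(K) ~= e^{rT} * (C(K-h) - 2*C(K) + C(K+h)) / h^2
    Assumes the strikes are evenly spaced by h.
    """
    h = strikes[1] - strikes[0]
    growth = math.exp(rate * t)
    return [
        (strikes[i], growth * (calls[i - 1] - 2 * calls[i] + calls[i + 1]) / h ** 2)
        for i in range(1, len(strikes) - 1)
    ]

# Hypothetical inputs: spot 220, 5% rate, 30% vol, 3 months to expiry.
spot, rate, vol, t = 220.0, 0.05, 0.30, 0.25
strikes = [100.0 + 1.0 * i for i in range(301)]  # 100, 101, ..., 400
calls = [bs_call(spot, k, rate, vol, t) for k in strikes]

density = implied_density(strikes, calls, rate, t)
total_probability = sum(p for _, p in density) * (strikes[1] - strikes[0])
most_likely_strike = max(density, key=lambda pair: pair[1])[0]
```

With real quotes the second difference is very sensitive to bid/ask noise, so in practice the call prices (or implied vols) are usually smoothed across strikes before differentiating.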
Boeing might be a good one to try this on.
Good call! I just ran it a moment ago and it looks like the market is pricing in a normal distribution, with the expectation that price will be below $220 at every expiration until 02/02/2024. From there, the market is pricing in a good probability of the price being above $220 and recovering.
Btw, to find this, I ran the "implied-probability-production.py" file in the repository.
Thanks!
Hi! I just noticed on the GitHub page containing the code to this project, there are API keys filled in for the variables on some of the scripts. I don’t know if that was intentional. Just wanted to let you know:)
Hi Troy,
So, I started leaving the API keys in there because I wanted to make the experiments as accessible as possible to readers who may not be willing/able to purchase licenses (data is expensive haha!)
But thank you for looking out! I keep things like the database credentials obfuscated to prevent accidental deletions and things, but I’ll try to keep the API keys in there (for as long as I can).
Thanks!
I love that! You’re right, I’ve been trying to follow your articles, but as a college student, it’s hard to get expensive data. Thank you for all you do, your articles are amazing and inspiring!
I'm so glad you're enjoying them, thank you for the kind words! We're only just getting started :)
hi, thanks for another great article. i'm curious why you use "ticker_call_contracts" and not puts too?
also, when i run the option-probability-distribution.py file, i can only see 'last_quote.last_updated' probably 9 times but i don't get any other output. i can generate the "ticker_call_contracts", "call", "put" dataframes, when i call the respective names, but nothing else. just wondering if the api has something to do with it? thanks for your help
Hi there, apologies for the delay!
So, the ticker_call_contracts line is just to get the available strikes of that ticker. Some stocks have strike increments of 2.5 (e.g., 192.5, 195), and some are single (e.g., 192, 193). Every strike has a call and a put, so if we take the 10 appropriate strikes from the calls, they will be the same as the 10 appropriate strikes for the puts (e.g., the 110 strike has both a 110 call and a 110 put).
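As a rough sketch of that idea, here is one way to pull 10 strikes out of the call list and reuse them for the puts. It assumes "appropriate" means the strikes closest to the current share price, which may not be the rule the script actually uses. The helper name and the sample numbers are made up for illustration.

```python
def nearest_strikes(strikes, spot, n=10):
    """Return the n strikes closest to spot, sorted ascending."""
    unique = sorted(set(strikes))
    # Sort by distance to spot; break ties by preferring the lower strike.
    closest = sorted(unique, key=lambda k: (abs(k - spot), k))[:n]
    return sorted(closest)

# Hypothetical call strikes in 2.5 increments: 180, 182.5, ..., 220.
call_strikes = [180 + 2.5 * i for i in range(17)]
selected = nearest_strikes(call_strikes, spot=196.0)

# The same strike list can be reused for the puts, since each strike
# lists both a call and a put.
put_strikes = list(selected)
```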
And you likely received no output because of the date parameter defined at the start of the code. It uses snapshots of the most recent trading day, so that parameter should be set to the last trading day. Currently, it takes today's date and subtracts 2 days from it because the post was released on Sunday, so for the people running it that day, the most recent trading day was Friday, 2 days prior. If you ran it yesterday, the date would be set to Saturday which has no data.
If you're running this during the week and/or on a trading day, simply comment out the first date variable and un-comment the second one like this:
original code:
date = (datetime.today() - timedelta(days=2)).strftime('%Y-%m-%d')
# date = datetime.today().strftime('%Y-%m-%d')
weekday code:
# date = (datetime.today() - timedelta(days=2)).strftime('%Y-%m-%d')
date = datetime.today().strftime('%Y-%m-%d')
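If you'd rather not toggle comments by hand, a small helper can step back to the most recent weekday automatically. This is only a sketch and not part of the repository. The function name is hypothetical, and it ignores market holidays, so a holiday Monday would still come back as a "trading day".

```python
from datetime import date, timedelta

def last_weekday(today=None):
    """Most recent Monday-Friday date on or before `today` (holidays ignored)."""
    d = today or date.today()
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

# Same string format the script's date parameter uses.
date_str = last_weekday().strftime('%Y-%m-%d')
```

For example, last_weekday(date(2024, 1, 14)) (a Sunday) steps back to Friday, 2024-01-12, which matches the 2-day offset in the original code.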
Hope that helped!
thanks a lot for replying. oddly enough i could obtain the data when i entered "calls" or "puts" in my jupyter notebook.
No problem, if anything else is tripping you up just let me know!
Hi, thanks again for the detailed article. I appreciate you sharing the knowledge. I just want to probe your thoughts on this a little more. I am trying to leverage this framework to see if we can track the imbalance on an intraday basis and measure its impact on short-duration price direction. Do you have any opinions/thoughts/suggestions on adapting it for short-duration price direction?
Hi there,
Seeing if the imbalances have an effect intraday can definitely be an interesting test! 1-minute data might be noisy, but having the imbalance per hour might show some predictability.
There might also be some edge on a higher-frequency basis where, for example, if a big option trade comes in on SPY, the MM may try to hedge through shares instantly (<5 seconds), etc.
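A minimal sketch of the per-hour bucketing, under loud assumptions: "imbalance" here is defined as (call volume - put volume) / total volume, which may differ from the measure used in the article, and the trades are made-up sample data rather than real prints.

```python
from collections import defaultdict
from datetime import datetime

def hourly_imbalance(trades):
    """trades: iterable of (timestamp, side, size), where side is 'call' or 'put'.

    Returns {hour_start: (call_volume - put_volume) / total_volume}.
    """
    buckets = defaultdict(lambda: {"call": 0, "put": 0})
    for ts, side, size in trades:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour][side] += size

    result = {}
    for hour, vol in sorted(buckets.items()):
        total = vol["call"] + vol["put"]
        result[hour] = (vol["call"] - vol["put"]) / total if total else 0.0
    return result

# Made-up trades for illustration only.
trades = [
    (datetime(2024, 1, 12, 9, 35), "call", 300),
    (datetime(2024, 1, 12, 9, 50), "put", 100),
    (datetime(2024, 1, 12, 10, 5), "put", 400),
    (datetime(2024, 1, 12, 10, 40), "call", 100),
]
imbalance = hourly_imbalance(trades)
```

From there, each hourly value could be compared against the next hour's return in the shares to check for any predictability.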
Another great article. Thanks!
A few questions for ya. I'm assuming we're looking for option strikes around the red line, the implied least likely strike?
If I understood the code correctly, I think I saw an API call for all strike prices for 2 days being used for the model.
Does this model only require end-of-day (historical) data, meaning no live data is needed?
How much data is being pulled from Polygon per ticker used?
Trying to get an idea of whether the data being pulled with the API stays within the free plan with Polygon.
Is Polygon the only place to get the options data, or is it just easier to get it from Polygon? I'm considering other sources like Yahoo, Databento, OpenBB, or brokers with an API that may be free or lower cost than Polygon, since I may need data for several tickers.
Hi TJ, thanks!
The code calls Polygon’s snapshot endpoint, which only has data for the last updated timestamp (usually instant). The reason for going 2 days back is that when we pass in the date, it needs to be a trading day, and if you're running on a Sunday, Friday is the last one, 2 days back.
The reason I don’t think this specific experiment can extend to other data vendors is that it uses a snapshot as opposed to OHLCV/quotes. If an option wasn't traded on that specific day, there are no OHLCV points for it anywhere, but a snapshot shows what the entire option chain looked like most recently, as if you were looking at it through a broker. Being able to get those most recent quotes on the otherwise unavailable small, quiet stocks is highly important since the goal is to be the one who makes the first trades in those options. Outside of big names like SPY and QQQ, reliable option data can be tricky to get otherwise.
According to the pricing page, it’s available on the starter plan, and Polygon doesn’t charge by usage, so there are no limits. What’s important is that whatever API you end up choosing, you pick one that you won’t outgrow. yfinance (the scraper package, not Yahoo) and OpenBB, while free, don’t have reseller agreements with OPRA and the exchanges, so they won’t be good for long-term, serious operations outside of the basic stuff.
And yes, the red line implies what the market thinks is the least likely strike (where investors are bidding the least), so if your thesis is that the share price has a higher likelihood of ending there, then you would know that you’re legitimately betting against what’s priced in. This is also true for areas that are in the far tail ends of the curve.
That makes sense. Thanks for the response.