In CFRM505 we learned about a sequential investment strategy: one that deals with random variables that depend on an event occurring. In this case the random variable is the return of a given asset, and the event is its earnings announcement. The percent return will differ depending on the outcome of earnings. To keep things simple, the earnings outcome is classified as either a 1 or a 0 (beat or miss). Conditional on that event, the return of the asset is generated through Monte Carlo simulation. To account for this we use the composition approach, which allows us to break the simulated forecast into parts. Earnings data was gathered for a given asset and each release classified as to whether it was a beat or a miss. The asset's reaction to each classification can then be described by its own distribution (we use the normal). The beats distribution will typically have a positive \(\mu\) while the misses will have a negative \(\mu\), with both having a very large \(\sigma\), as expected during an earnings event. This is overly simplified of course, but it's a start. With the data classified into beats and misses, we can estimate a probability of occurrence for each outcome. That probability is used in the Monte Carlo simulation section, as it determines which distribution (beats or misses) each sampled return is drawn from. The overall returns aggregate samples from the two distributions, weighted by the probability of each outcome, to describe the forecasted return for that asset on earnings day.
For a given stock, earnings data is fairly sparse (it occurs only four times a year). The data source used goes back to around 2014, plus some scattered earnings from the late 1990s (which we decided to use). We use the data we are able to obtain (the process is the important part). The earnings estimate, the actual earnings, and whether the report comes before or after market hours on the earnings date are all given. An example of some cleaned-up data is shown below.
## date Estimate Actual Before_After pct_change beat_or_not
## 0 2023-10-26 0.58 0.94 AfterMarket 6.832818 1
## 1 2023-08-03 0.35 0.65 AfterMarket 8.269335 1
## 2 2023-04-27 0.21 0.31 AfterMarket -3.979239 1
## 3 2023-02-02 0.18 0.25 AfterMarket -8.431494 1
## 4 2022-10-27 0.22 0.17 AfterMarket -6.804254 0
## 5 2022-07-28 0.14 0.18 AfterMarket 10.361465 1
## 6 2022-04-28 0.42 0.37 AfterMarket -14.049441 0
## 7 2022-02-03 0.18 0.29 AfterMarket 13.535909 1
## 8 2021-10-28 0.45 0.31 AfterMarket -2.151124 0
## 9 2021-07-29 0.62 0.76 AfterMarket -7.564890 1
## 10 2021-04-29 0.48 0.79 AfterMarket -0.112061 1
## 11 2021-02-02 0.36 0.70 AfterMarket -1.996154 1
## 12 2020-10-29 0.37 0.62 AfterMarket -5.445639 1
## 13 2020-07-30 0.07 0.52 AfterMarket 3.696082 1
## 14 2020-04-30 0.31 0.25 AfterMarket -7.597413 0
## 15 2020-01-30 0.20 0.32 AfterMarket 7.379135 1
## 16 2019-10-24 0.23 0.21 AfterMarket -1.092218 0
## 17 2019-07-25 0.28 0.26 AfterMarket -1.558906 0
## 18 2019-04-25 0.24 0.35 AfterMarket 2.543304 1
## 19 2019-01-31 0.28 0.30 AfterMarket -5.381881 1
## 20 2018-10-25 0.16 0.29 AfterMarket -7.819681 1
## 21 2018-07-26 0.13 0.25 AfterMarket 0.512721 1
## 22 2018-04-26 0.06 0.16 AfterMarket 3.600885 1
## 23 2018-02-01 0.09 0.11 AfterMarket 2.874101 1
## 24 2017-10-26 0.03 0.03 AfterMarket 13.216375 0
## 25 2017-07-27 0.07 0.02 AfterMarket -2.481836 0
## 26 2017-04-27 0.05 0.07 AfterMarket 0.719746 1
## 27 2017-02-02 0.07 0.08 AfterMarket -3.541877 1
## 28 2016-10-27 0.04 0.03 AfterMarket -5.137103 0
## 29 2016-07-28 0.06 0.09 AfterMarket 0.823800 1
## 30 2016-04-28 0.03 0.05 AfterMarket 9.566445 1
## 31 2016-01-28 0.08 0.05 AfterMarket -7.609979 0
## 32 2015-10-22 -0.01 0.01 AfterMarket 6.227944 1
## 33 2015-07-23 -0.01 0.01 AfterMarket 9.797171 1
## 34 2015-04-23 -0.01 -0.01 AfterMarket 14.131132 0
## 35 2015-01-29 0.01 0.02 AfterMarket 13.711592 1
## 36 2014-10-23 -0.04 -0.05 AfterMarket -8.340252 0
## 37 1998-01-22 -0.04 -0.03 BeforeMarket 1.261830 1
## 38 1997-10-27 -0.03 -0.03 BeforeMarket -15.041783 0
## 39 1997-07-10 -0.03 -0.02 BeforeMarket 10.335917 1
## Beat Estimates: 0.675
## Missed Estimates: 0.325
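The two probabilities above are just the empirical frequencies of the `beat_or_not` column; a minimal sketch, rebuilding only that column from the 27 beats and 13 misses counted in the table above:

```python
import pandas as pd

# Rebuild just the beat_or_not column: 27 beats and 13 misses,
# matching the 40-row table above
df = pd.DataFrame({"beat_or_not": [1] * 27 + [0] * 13})

prob_beat = df["beat_or_not"].mean()  # fraction of rows that beat estimates
prob_miss = 1 - prob_beat
print(f"Beat Estimates: {prob_beat:.3f}")    # Beat Estimates: 0.675
print(f"Missed Estimates: {prob_miss:.3f}")  # Missed Estimates: 0.325
```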
You as an investor are waiting on an event \(I\). Here \(I\) can take the form of either beating or missing estimates: \(I \in \{1,0\}\). Depending on the outcome of \(I\), the return \(R_j\) (a random variable) is sampled from the distribution \(F_j\). So it is a sequential process: the Monte Carlo simulation draws from a distribution that depends on the outcome of the event (earnings 1 or 0). Simply put, the sampling (random variable generation \(R_j\)) depends on the distribution tied to the event (earnings beat/miss), and sampling occurs in a sequential manner. \[ P(X\leq x) = \sum_{j=1}^{\infty} P(X \leq x \mid I=j)P(I=j) \] \[ = \sum_{j=1}^{\infty} P(R_j \leq x)P(I=j) \] \[ = \sum_{j=1}^{\infty} F_j(x)p_j = F(x) \]
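The composition identity \(F(x)=\sum_j F_j(x)p_j\) can be sanity-checked numerically: sample sequentially (event first, then the return), then compare the empirical CDF against the analytic two-part mixture. The beat/miss parameters below are illustrative assumptions, not the fitted values.

```python
import math
import numpy as np

def norm_cdf(x, mu, sigma):
    # Closed-form normal CDF via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

rng = np.random.default_rng(0)
n = 200_000

# Illustrative beat/miss parameters (assumptions, not fitted to the table)
mu_beat, std_beat = 3.0, 6.0    # F_1: returns after a beat
mu_miss, std_miss = -4.0, 7.0   # F_0: returns after a miss
p = 0.675                       # P(I = 1)

# Sequential (composition) sampling: draw the event, then the return
I = rng.random(n) < p
R = np.where(I,
             rng.normal(mu_beat, std_beat, n),
             rng.normal(mu_miss, std_miss, n))

# Check F(x) = p*F_1(x) + (1-p)*F_0(x) at a few points
for x in (-5.0, 0.0, 5.0):
    F_mix = p * norm_cdf(x, mu_beat, std_beat) + (1 - p) * norm_cdf(x, mu_miss, std_miss)
    F_emp = (R <= x).mean()
    print(f"x={x:5.1f}  mixture CDF={F_mix:.4f}  empirical={F_emp:.4f}")
```

With 200,000 draws the empirical and analytic CDFs should agree to about three decimal places.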
Once the data is cleaned it can be sorted in terms of whether the earnings estimate was beat or missed. With that classification done, the reaction (price movement) of the asset is also known. A normal distribution is then created from the mean and standard deviation of each of the two classifications. Knowing the number of beats and misses through classification, the probabilities of the two events are also known. We can now create the Monte Carlo simulation portion as shown below.
n = 100000
U1 = np.random.rand(n)  # uniform draws that decide the event
r_beat = np.random.normal(loc=mu_beat, scale=std_beat, size=n)  # returns given a beat
r_not = np.random.normal(loc=mu_not, scale=std_not, size=n)     # returns given a miss
p = prob_beat
R = (U1 < p)*r_beat + (U1 >= p)*r_not  # pick the beat or miss sample for each draw
The above code samples returns from both the beat and miss distributions. The overall return R consists of n sample points, each chosen according to the asset's probability of beating or missing estimates. This is the method-of-composition Monte Carlo simulation using historical earnings. The output is the expected stock price reaction, in percent, on earnings day.
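The mean and variance of \(R\) can also be checked analytically with the laws of total expectation and variance; a sketch with illustrative (not fitted) parameters:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500_000

# Illustrative parameters (assumptions, not the fitted values)
mu_beat, std_beat, mu_not, std_not, p = 2.5, 5.0, -3.0, 6.0, 0.675

# Same composition sampling as in the text
U1 = rng.random(n)
r_beat = rng.normal(mu_beat, std_beat, n)
r_not = rng.normal(mu_not, std_not, n)
R = (U1 < p)*r_beat + (U1 >= p)*r_not

# Law of total expectation: E[R] = p*mu_beat + (1-p)*mu_not
mean_theory = p * mu_beat + (1 - p) * mu_not
# Var[R] = E[R^2] - E[R]^2, with E[R^2] mixed across the two components
second_moment = p * (std_beat**2 + mu_beat**2) + (1 - p) * (std_not**2 + mu_not**2)
var_theory = second_moment - mean_theory**2

print(mean_theory, R.mean())  # should agree to about two decimals
print(var_theory, R.var())
```

This is a quick way to confirm the simulation is sampling from the intended mixture before trusting the forecasted moments.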
Below are a few histograms for JPM, C, and BAC (earnings out tomorrow, 1/12/24), with the expected return and standard deviation both in percent.
The table below shows the outcomes after earnings and whether our forecasts are relevant. The expected return is the forecasted mean from our composition Monte Carlo; the range low and high are the expected return plus/minus one standard deviation of the percent move in the stock. As all three banking stocks reported earnings before market open on 1/12/24, we take into account the closing prices on 1/11/24 and 1/12/24. If the closing price on 1/12/24 is within the range low/high prices implied by our forecast, we deem it a success. All three assets show relevant forecasts in this particular example. However, analysis across many more stocks would be needed to show whether this method is truly relevant; still, this is a good start, especially for gauging whether certain call/put options are worth the premium tagged on come earnings time.
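The success criterion can be sketched as follows, with hypothetical prices and forecast numbers standing in for the actual JPM/C/BAC values:

```python
# Hypothetical inputs for one stock (not the actual JPM/C/BAC values):
prev_close = 100.0   # close on 1/11/24, the day before earnings
exp_ret_pct = 0.5    # composition-MC expected return, in %
std_pct = 4.0        # composition-MC standard deviation, in %

# Range = expected return +/- one standard deviation, applied to the prior close
range_low = prev_close * (1 + (exp_ret_pct - std_pct) / 100)
range_high = prev_close * (1 + (exp_ret_pct + std_pct) / 100)

close_after = 102.3  # hypothetical close on 1/12/24
success = range_low <= close_after <= range_high
print(f"{range_low:.2f} {range_high:.2f} {success}")  # 96.50 104.50 True
```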
import numpy as np
import pandas as pd
import eod_ohlc_pull as eod_ohlc
import eod_fundamental as eod_fundamental
import pandas_market_calendars as mcal
from bokeh.plotting import figure, show, output_file, gridplot, save
from bokeh.layouts import column
from bokeh.models import Span, Label
def get_price(date):
    # Create a market calendar for NYSE
    nyse = mcal.get_calendar('NYSE')
    # Define your date
    date = pd.Timestamp(date)
    # Get the full trading days around the date
    trading_days = nyse.schedule(start_date=date - pd.DateOffset(years=1), end_date=date + pd.DateOffset(years=1))
    # Find the trading day before and after the specified date
    before = trading_days[trading_days.index < date].iloc[-1].name.date()
    after = trading_days[trading_days.index > date].iloc[0].name.date()
    return str(before), str(after)
def histogram(R, title):
    # Define mu and std
    mu = R.mean()
    std = R.std()
    # Create a histogram
    hist, edges = np.histogram(R, bins=100, density=True)
    # Create a Bokeh plot (title arrives already formatted by the caller)
    p = figure(title=title, tools='', background_fill_color="#fafafa", width=1200, height=600)
    p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="navy", line_color="white", alpha=0.5)
    p.xaxis.axis_label = "pct change %"
    p.yaxis.axis_label = "Frequency"
    # Add a line for the mean
    mean_line = Span(location=mu, dimension='height', line_color='black', line_dash='dashed', line_width=3)
    p.add_layout(mean_line)
    # Add a label for the mean
    mean_label = Label(x=mu, y=max(hist), x_offset=20, y_offset=-450, text='mu: {:.3f}'.format(mu), border_line_color="black", text_font_style='bold', background_fill_color='white')
    p.add_layout(mean_label)
    # Add a label for the standard deviation
    std_label = Label(x=std, y=max(hist), x_offset=-85, y_offset=-470, text='std: {:.2f}'.format(std), text_font_style='bold', background_fill_color='white', border_line_color='black')
    p.add_layout(std_label)
    return p
def earnings_distribution(symbol):
    data = eod_fundamental.fundamental_data(symbol)
    e_dates = [i for i in data['Earnings']['History'].keys()]  # dates available
    earnings = {'date': [], 'Estimate': [], 'Actual': [], 'Before_After': []}
    for i in e_dates:
        earnings['date'].append(data['Earnings']['History'][i]['reportDate'])
        earnings['Estimate'].append(data['Earnings']['History'][i]['epsEstimate'])
        earnings['Actual'].append(data['Earnings']['History'][i]['epsActual'])
        earnings['Before_After'].append(data['Earnings']['History'][i]['beforeAfterMarket'])
    df = pd.DataFrame.from_dict(earnings)
    # Drop rows with missing session flags or missing estimate/actual EPS
    df_ = df[df['Before_After'].notnull()]
    df_ = df_.reset_index(drop=True)
    df_ = df_[df_['Estimate'].notnull() & df_['Actual'].notnull()]
    df_ = df_.reset_index(drop=True)
    df_['pct_change'] = None
    df_['beat_or_not'] = 0
    for i in range(len(df_)):
        if df_['Actual'][i] > df_['Estimate'][i]:
            df_.loc[i, 'beat_or_not'] = 1  # .loc avoids chained-assignment issues
        if df_['Before_After'][i] == "AfterMarket":
            before, after = get_price(df_['date'][i])
            price = eod_ohlc.ohlc(before, after, symbol)
            df_.loc[i, 'pct_change'] = (price[0]['adjusted_close']/price[1]['adjusted_close'] - 1)*100
        if df_['Before_After'][i] == "BeforeMarket":
            before, after = get_price(df_['date'][i])
            price = eod_ohlc.ohlc(before, after, symbol)
            df_.loc[i, 'pct_change'] = (price[1]['adjusted_close']/price[-1]['adjusted_close'] - 1)*100
    # Fit each classification's normal distribution and the event probability
    beat = df_[df_['beat_or_not'] == 1]
    not_beat = df_[df_['beat_or_not'] == 0]
    mean_beat = beat['pct_change'].mean()
    std_beat = beat['pct_change'].std()
    mean_not = not_beat['pct_change'].mean()
    std_not = not_beat['pct_change'].std()
    prob_beat = len(beat)/len(df_)
    sample_points = len(df_)
    return mean_beat, std_beat, mean_not, std_not, prob_beat, sample_points
symbol = "BAC"
mu_beat, std_beat, mu_not, std_not, prob_beat, sample_points = earnings_distribution(symbol)
print(mu_beat,std_beat,mu_not,std_not, sample_points)
n = 100000
U1 = np.random.rand(n)
r_beat = np.random.normal(loc=mu_beat, scale=std_beat, size=n)
r_not = np.random.normal(loc=mu_not, scale=std_not, size=n)
p = prob_beat
R = (U1<p)*r_beat + (U1>=p)*r_not
#compare against normal historical returns
start = "2022-01-01"
end = "2024-01-01"
stock = eod_ohlc.ohlc(start, end, symbol)
stock = pd.DataFrame.from_dict(stock)
stock['date'] = pd.to_datetime(stock['date'])
# Sort the DataFrame by the date column in ascending order
stock.sort_values(by='date', ascending=True, inplace=True)
# Reset the index of the DataFrame
stock.reset_index(drop=True, inplace=True)
stock['pct_change'] = stock['adjusted_close'].pct_change() * 100 # The result is in percentage
stock['pct_change'].fillna(0, inplace=True)
x_stock = np.random.normal(loc=stock['pct_change'].mean(), scale=stock['pct_change'].std(), size=n)
a = histogram(R,"Composition MC Earnings {}".format(symbol))
b = histogram(x_stock,"Historical Returns {}".format(symbol))
plot = column(a,b)
output_file("earnings_{}.html".format(symbol))
save(plot)
show(plot)