Introduction

In CFRM505 we learned about sequential investment strategies, which deal with random variables that depend on an event occurring. In this case the random variable is the return of a given asset, and the event is its earnings announcement: the percent return will differ depending on the earnings outcome. To keep things simple, the earnings outcome is classified as either a 1 or a 0 (beat or miss). Conditional on that event, the asset's return is generated through Monte Carlo simulation, and to account for the conditioning we use the composition approach, which lets us dissect the simulated forecast into parts.

Earnings data was gathered for a given asset and each report was classified as a beat or a miss. The asset's reaction to each classification can then be described by its own distribution (we use the normal). The beat distribution will typically have a positive \(\mu\) and the miss distribution a negative \(\mu\), with both having a very large \(\sigma\), as expected during an earnings event. This is overly simplified of course, but it's a start. With the data classified into beats and misses, we can estimate the probability of each outcome. That probability is used in the Monte Carlo simulation section, as it dictates which distribution (beat or miss) a given return is sampled from. The overall returns are the aggregate of sampling from the two distributions, weighted by the probability of each outcome, and describe the forecasted return for that asset on earnings day.

To sum it up, we follow the steps below for a chosen stock.

  1. We get the historical earnings beats/misses.
  2. We then calculate the stock movement (percent change in price) associated with each beat/miss classification.
  3. Knowing the stock movement, we create two normal distributions, one for beats and one for misses.
  4. With both distributions at hand, along with the probability of each earnings outcome, a price forecast is generated via composition Monte Carlo.
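
Steps 1–3 can be sketched on a toy earnings table (the EPS and percent-move values below are made up for illustration, not real data):

```python
import pandas as pd

# Hypothetical toy earnings data: estimated vs. actual EPS and the
# earnings-day percent move (illustrative values only)
df = pd.DataFrame({
    "Estimate":   [0.20, 0.35, 0.10, 0.25, 0.15],
    "Actual":     [0.30, 0.30, 0.15, 0.40, 0.10],
    "pct_change": [5.2, -6.1, 3.4, 8.0, -3.0],
})

# Steps 1-2: classify each report as a beat (1) or miss (0)
df["beat_or_not"] = (df["Actual"] > df["Estimate"]).astype(int)

# Step 3: fit a normal distribution to each class of price moves
beats = df.loc[df["beat_or_not"] == 1, "pct_change"]
misses = df.loc[df["beat_or_not"] == 0, "pct_change"]
mu_beat, std_beat = beats.mean(), beats.std()
mu_miss, std_miss = misses.mean(), misses.std()

# Probability of each outcome, estimated from the historical counts
prob_beat = len(beats) / len(df)
print(prob_beat)  # 0.6
```

The two fitted pairs \((\mu, \sigma)\) and `prob_beat` are all that the composition simulation in step 4 needs.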

Cleaning the data

For a given stock, earnings data is fairly sparse (it only occurs 4 times a year). The data source used goes back to around 2014, plus a few stray reports from the late 1990s (which we decided to keep). We use whatever data we are able to obtain (the process is the important part). The earnings estimate, the actual earnings, and whether the report came before or after market hours are all given for each earnings date. An example of the cleaned-up data is shown below.

##           date  Estimate  Actual  Before_After  pct_change  beat_or_not
## 0   2023-10-26      0.58    0.94   AfterMarket    6.832818            1
## 1   2023-08-03      0.35    0.65   AfterMarket    8.269335            1
## 2   2023-04-27      0.21    0.31   AfterMarket   -3.979239            1
## 3   2023-02-02      0.18    0.25   AfterMarket   -8.431494            1
## 4   2022-10-27      0.22    0.17   AfterMarket   -6.804254            0
## 5   2022-07-28      0.14    0.18   AfterMarket   10.361465            1
## 6   2022-04-28      0.42    0.37   AfterMarket  -14.049441            0
## 7   2022-02-03      0.18    0.29   AfterMarket   13.535909            1
## 8   2021-10-28      0.45    0.31   AfterMarket   -2.151124            0
## 9   2021-07-29      0.62    0.76   AfterMarket   -7.564890            1
## 10  2021-04-29      0.48    0.79   AfterMarket   -0.112061            1
## 11  2021-02-02      0.36    0.70   AfterMarket   -1.996154            1
## 12  2020-10-29      0.37    0.62   AfterMarket   -5.445639            1
## 13  2020-07-30      0.07    0.52   AfterMarket    3.696082            1
## 14  2020-04-30      0.31    0.25   AfterMarket   -7.597413            0
## 15  2020-01-30      0.20    0.32   AfterMarket    7.379135            1
## 16  2019-10-24      0.23    0.21   AfterMarket   -1.092218            0
## 17  2019-07-25      0.28    0.26   AfterMarket   -1.558906            0
## 18  2019-04-25      0.24    0.35   AfterMarket    2.543304            1
## 19  2019-01-31      0.28    0.30   AfterMarket   -5.381881            1
## 20  2018-10-25      0.16    0.29   AfterMarket   -7.819681            1
## 21  2018-07-26      0.13    0.25   AfterMarket    0.512721            1
## 22  2018-04-26      0.06    0.16   AfterMarket    3.600885            1
## 23  2018-02-01      0.09    0.11   AfterMarket    2.874101            1
## 24  2017-10-26      0.03    0.03   AfterMarket   13.216375            0
## 25  2017-07-27      0.07    0.02   AfterMarket   -2.481836            0
## 26  2017-04-27      0.05    0.07   AfterMarket    0.719746            1
## 27  2017-02-02      0.07    0.08   AfterMarket   -3.541877            1
## 28  2016-10-27      0.04    0.03   AfterMarket   -5.137103            0
## 29  2016-07-28      0.06    0.09   AfterMarket    0.823800            1
## 30  2016-04-28      0.03    0.05   AfterMarket    9.566445            1
## 31  2016-01-28      0.08    0.05   AfterMarket   -7.609979            0
## 32  2015-10-22     -0.01    0.01   AfterMarket    6.227944            1
## 33  2015-07-23     -0.01    0.01   AfterMarket    9.797171            1
## 34  2015-04-23     -0.01   -0.01   AfterMarket   14.131132            0
## 35  2015-01-29      0.01    0.02   AfterMarket   13.711592            1
## 36  2014-10-23     -0.04   -0.05   AfterMarket   -8.340252            0
## 37  1998-01-22     -0.04   -0.03  BeforeMarket    1.261830            1
## 38  1997-10-27     -0.03   -0.03  BeforeMarket  -15.041783            0
## 39  1997-07-10     -0.03   -0.02  BeforeMarket   10.335917            1
## Beat Estimates:  0.675
## Missed Estimates:  0.325

Theory behind Composition Monte Carlo

You as an investor are waiting on an event \(I\). Here \(I\) is the earnings outcome, beating or missing estimates, so \(I \in \{1,0\}\). Depending on the realized value of \(I\), the return \(R_j\) (a random variable) is sampled from the corresponding distribution \(F_j\). The process is sequential: the Monte Carlo simulation first draws the event (earnings 1 or 0) and then samples a return from the distribution tied to that outcome. Simply put, the random variable generation for \(R_j\) is conditioned on the event (earnings beat/miss), which is exactly what the composition method exploits: \[ P(X\leq x) = \sum_{j=1}^{\infty} P(X \leq x \mid I=j)\,P(I=j) = \sum_{j=1}^{\infty} P(R_j \leq x)\,P(I=j) = \sum_{j=1}^{\infty} F_j(x)\,p_j = F(x) \] With only two outcomes here, the sum has just two terms: \(F(x) = p\,F_1(x) + (1-p)\,F_0(x)\).
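
This mixture identity can be checked numerically with illustrative (made-up) parameters: sample \(R\) by composition, then compare its empirical CDF against \(p\,F_{beat}(x) + (1-p)\,F_{miss}(x)\) at a few points.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative parameters, not fitted to any real asset
p_beat, mu_beat, sd_beat = 0.675, 4.0, 6.0
mu_miss, sd_miss = -3.0, 7.0

# Composition sampling: draw the event I first, then R conditional on I
I = rng.random(n) < p_beat
R = np.where(I,
             rng.normal(mu_beat, sd_beat, n),
             rng.normal(mu_miss, sd_miss, n))

def norm_cdf(x, mu, sd):
    """Normal CDF via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

# Check F(x) = p * F_beat(x) + (1 - p) * F_miss(x)
for x in (-5.0, 0.0, 5.0):
    empirical = (R <= x).mean()
    mixture = (p_beat * norm_cdf(x, mu_beat, sd_beat)
               + (1 - p_beat) * norm_cdf(x, mu_miss, sd_miss))
    assert abs(empirical - mixture) < 0.01
```

The empirical and mixture CDFs agree to well within Monte Carlo error, which is the composition method working as advertised.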

Creating the Beat and Miss distributions and running the MC Simulation

Once the data is cleaned, it can be sorted according to whether the earnings estimate was beaten or missed. With that classification done, the reaction (price movement) of the asset is also known. A normal distribution is then created from the mean and standard deviation of each of the two classes. Knowing the number of beats and misses, the probabilities of the two events are also known. We can now run the Monte Carlo simulation portion as shown below.

n = 100000                                  # number of simulated earnings-day returns
U1 = np.random.rand(n)                      # uniform draws that decide the event I
r_beat = np.random.normal(loc=mu_beat, scale=std_beat, size=n)  # returns given a beat
r_not = np.random.normal(loc=mu_not, scale=std_not, size=n)     # returns given a miss
p = prob_beat                               # historical probability of a beat
R = (U1<p)*r_beat + (U1>=p)*r_not           # composition: keep the draw matching the event
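
As a quick sanity check on the composition draw above (using hypothetical values for `mu_beat`, `std_beat`, `mu_not`, `std_not`, and `prob_beat`, since the real ones come from the fitted earnings data), the sample mean of `R` should land near the probability-weighted mean \(p\,\mu_{beat} + (1-p)\,\mu_{miss}\) by the law of total expectation:

```python
import numpy as np

np.random.seed(0)  # reproducibility for this sketch

# Hypothetical fitted values standing in for the earnings_distribution() output
mu_beat, std_beat = 3.5, 6.0
mu_not, std_not = -4.0, 7.0
prob_beat = 0.675

n = 100000
U1 = np.random.rand(n)
r_beat = np.random.normal(loc=mu_beat, scale=std_beat, size=n)
r_not = np.random.normal(loc=mu_not, scale=std_not, size=n)
R = (U1 < prob_beat) * r_beat + (U1 >= prob_beat) * r_not

# Law of total expectation: E[R] = p*mu_beat + (1-p)*mu_not
expected = prob_beat * mu_beat + (1 - prob_beat) * mu_not
print(round(expected, 4))  # 1.0625
assert abs(R.mean() - expected) < 0.1
```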

Conclusion

The code above samples returns from both the beat and miss distributions. The overall return R contains n sample points, each drawn according to the asset's historical probability of beating or missing estimates. This is the method of composition Monte Carlo simulation applied to historical earnings. The output is the expected stock price reaction, in percent, on earnings day.

Below are a few histograms for JPM, C, and BAC (earnings out tomorrow, 1/12/24), with the expected return and standard deviation both in percent.

The table below shows the outcomes after earnings and whether our forecasts are relevant. The expected return is the forecasted mean from our composition Monte Carlo; the range low and high are the expected return plus/minus one standard deviation of the percent move in the stock. As all three banking stocks reported earnings before market open on 1/12/24, we compare the closing prices on 1/11/24 and 1/12/24. If the closing price on 1/12/24 falls within the low/high price range implied by our forecast, we deem it a success. All three assets show relevant forecasts in this particular example. However, analysis across many more stocks would be needed to show whether this method is truly relevant; still, this is a good start, especially for gauging whether certain call/put options are worth the premium tagged on come earnings time.
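
That success check is easy to reproduce with a short helper; the function name and the prices below are placeholders for illustration, not the actual 1/11/24 and 1/12/24 closes:

```python
def forecast_range_hit(close_before, close_after, exp_ret_pct, std_pct):
    """True if the realized close lands inside the one-sigma forecast band."""
    low = close_before * (1 + (exp_ret_pct - std_pct) / 100)
    high = close_before * (1 + (exp_ret_pct + std_pct) / 100)
    return low <= close_after <= high

# Placeholder example: prior close 100, forecast mean 1%, sigma 3%
# -> the band is [98.0, 104.0]
print(forecast_range_hit(100.0, 102.5, 1.0, 3.0))  # True
print(forecast_range_hit(100.0, 105.0, 1.0, 3.0))  # False
```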

Graphs of Results

All Code Associated

import numpy as np
import pandas as pd
import eod_ohlc_pull as eod_ohlc 
import eod_fundamental as eod_fundamental 
import pandas_market_calendars as mcal
from bokeh.plotting import figure, show, output_file, gridplot, save
from bokeh.layouts import column
from bokeh.models import Span, Label

def get_price(date):
    # Create a market calendar for NYSE
    nyse = mcal.get_calendar('NYSE')

    # Define your date
    date = pd.Timestamp(date)

    # Get the full trading days for the year
    trading_days = nyse.schedule(start_date=date - pd.DateOffset(years=1), end_date=date + pd.DateOffset(years=1))

    # Find the trading day before and after your specified date
    before = trading_days[trading_days.index < date].iloc[-1].name.date()
    after = trading_days[trading_days.index > date].iloc[0].name.date()

    return str(before), str(after)

def histogram(R,title):
    # Define mu and std
    mu = R.mean()  
    std = R.std()  
    
    # Create a histogram
    hist, edges = np.histogram(R, bins=100, density=True)

    # Create a Bokeh plot
    p = figure(title=title, tools='', background_fill_color="#fafafa", width=1200, height=600)  # title is pre-formatted by the caller
    p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="navy", line_color="white", alpha=0.5)
    p.xaxis.axis_label = "pct change %"
    p.yaxis.axis_label = "Frequency"
    
    # Add a line for mean
    mean_line = Span(location=mu, dimension='height', line_color='black', line_dash='dashed', line_width=3)
    p.add_layout(mean_line)

    # Add labels for mean and std
    mean_label = Label(x=mu, y=max(hist), x_offset=20, y_offset=-450, text='mu: {:.3f}'.format(mu), border_line_color="black",text_font_style='bold',background_fill_color='white')
    p.add_layout(mean_label)

    # Add a label for standard deviation
    std_label = Label(x=std, y=max(hist), x_offset=-85, y_offset=-470, text='std: {:.2f}'.format(std), text_font_style='bold', background_fill_color='white', border_line_color='black')
    p.add_layout(std_label)

    return p

def earnings_distribution(symbol):
    data = eod_fundamental.fundamental_data(symbol)
    e_dates = [i for i in data['Earnings']['History'].keys()] #dates available
    earnings = {'date':[], 'Estimate':[], 'Actual':[],'Before_After':[]}
    for i in e_dates:
        earnings['date'].append(data['Earnings']['History'][i]['reportDate'])
        earnings['Estimate'].append(data['Earnings']['History'][i]['epsEstimate'])
        earnings['Actual'].append(data['Earnings']['History'][i]['epsActual'])
        earnings['Before_After'].append(data['Earnings']['History'][i]['beforeAfterMarket'])

    df = pd.DataFrame.from_dict(earnings)
    df_ = df[df['Before_After'].notnull()]
    df_ = df_.reset_index(drop=True)
    df_ = df_[df_['Estimate'].notnull() & df_['Actual'].notnull()]
    df_ = df_.reset_index(drop=True)

    df_['pct_change'] = None
    df_['beat_or_not'] = 0
    for i in range(len(df_)):
        # use .loc assignment to avoid pandas chained-assignment warnings
        if df_['Actual'][i] > df_['Estimate'][i]:
            df_.loc[i, 'beat_or_not'] = 1
        if df_['Before_After'][i] == "AfterMarket":
            before, after = get_price(df_['date'][i])
            price = eod_ohlc.ohlc(before, after, symbol)
            df_.loc[i, 'pct_change'] = (price[0]['adjusted_close']/price[1]['adjusted_close'] - 1)*100
        if df_['Before_After'][i] == "BeforeMarket":
            before, after = get_price(df_['date'][i])
            price = eod_ohlc.ohlc(before, after, symbol)
            df_.loc[i, 'pct_change'] = (price[1]['adjusted_close']/price[-1]['adjusted_close'] - 1)*100

    beat = df_[df_['beat_or_not'] == 1]
    not_beat = df_[df_['beat_or_not'] == 0]
    mean_beat = beat['pct_change'].mean()
    std_beat = beat['pct_change'].std()
    mean_not = not_beat['pct_change'].mean()
    std_not = not_beat['pct_change'].std()
    prob_beat = len(beat)/len(df_)
    sample_points = len(df_)
    return mean_beat, std_beat, mean_not, std_not, prob_beat, sample_points

symbol = "BAC"
mu_beat, std_beat, mu_not, std_not, prob_beat, sample_points = earnings_distribution(symbol)
print(mu_beat,std_beat,mu_not,std_not, sample_points)

n = 100000
U1 = np.random.rand(n)
r_beat = np.random.normal(loc=mu_beat, scale=std_beat, size=n)
r_not = np.random.normal(loc=mu_not, scale=std_not, size=n)
p = prob_beat
R = (U1<p)*r_beat + (U1>=p)*r_not

#compare against normal historical returns
start = "2022-01-01"
end = "2024-01-01"
stock = eod_ohlc.ohlc(start, end, symbol)
stock = pd.DataFrame.from_dict(stock)
stock['date'] = pd.to_datetime(stock['date'])
# Sort the DataFrame by the date column in ascending order
stock.sort_values(by='date', ascending=True, inplace=True)
# Reset the index of the DataFrame
stock.reset_index(drop=True, inplace=True)
stock['pct_change'] = stock['adjusted_close'].pct_change() * 100  # The result is in percentage
stock['pct_change'] = stock['pct_change'].fillna(0)  # first row has no prior close
x_stock = np.random.normal(loc=stock['pct_change'].mean(), scale=stock['pct_change'].std(), size=n)

a = histogram(R,"Composition MC Earnings {}".format(symbol))
b = histogram(x_stock,"Historical Returns {}".format(symbol))
plot = column(a,b)

output_file("earnings_{}.html".format(symbol))
save(plot)
show(plot)