In CFRM501 we found that assets tend to have a near-zero average daily log-return. Looking at the returns, we also see they are not quite normal: compared to a standard Normal distribution, the tails are a bit fatter and the density drops off more quickly as you move away from the mean.
On one of our assignments we noted that a Laplace distribution describes the log-returns better than a normal distribution does. We used the method of moments to tailor a Laplace distribution that fit our assets' daily log-return data more accurately.
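As a minimal sketch of that fit (the synthetic returns below are a stand-in for real price data), a method-of-moments Laplace fit only needs the sample mean and variance, since for Laplace(mu, b) the variance is 2*b^2:

```python
import numpy as np
from scipy import stats

# Stand-in daily log-returns; in practice these come from historical prices.
rng = np.random.default_rng(0)
log_returns = rng.laplace(loc=0.0005, scale=0.01, size=1000)

# Method of moments: match the sample mean and variance.
# For Laplace(mu, b): mean = mu, variance = 2*b^2  =>  b = sqrt(var / 2).
mu_hat = log_returns.mean()
b_hat = np.sqrt(log_returns.var() / 2.0)

laplace_fit = stats.laplace(loc=mu_hat, scale=b_hat)
normal_fit = stats.norm(loc=log_returns.mean(), scale=log_returns.std())

# The Laplace density is taller at the mean and fatter in the tails
# than a Normal fitted to the same data.
print(laplace_fit.pdf(mu_hat) > normal_fit.pdf(mu_hat))
```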
We see that the red curve, the tailored Laplace distribution, fits the actual data more accurately than the normal distribution.
Having seen visually that a Laplace distribution is a better fit than a Normal for historical daily log-returns, the natural question is: when forecasting prices through a Monte Carlo simulation, does sampling from a Laplace distribution provide a more accurate forecast than sampling from a Normal distribution?
Below are 100,000 iterations over a 1-day forecast when sampling from a Normal distribution. The pdf being sampled from uses the mean and variance of the stock data for “DIS”.
Below is an example of the same number of iterations over the same 1-day forecast, sampling from a Laplace distribution. The pdf being sampled from is tailored to the stock “DIS”, with the mean, variance, and skewness solved for through the method of moments.
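The shape of those one-day forecasts can be sketched directly with NumPy's samplers. Note the actual script builds the density numerically (including skewness) and samples it through `scipy.stats.rv_discrete`; the closed-form samplers and moments below are placeholder simplifications, not the real “DIS” values:

```python
import numpy as np

rng = np.random.default_rng(1)
s0 = 100.0                 # placeholder initial price
mu, var = 0.0005, 0.0002   # placeholder daily log-return mean and variance
n = 100_000

# Normal: the scale parameter is the standard deviation.
normal_prices = s0 * np.exp(rng.normal(mu, np.sqrt(var), size=n))

# Laplace: the scale b solves var = 2*b^2 (method of moments).
laplace_prices = s0 * np.exp(rng.laplace(mu, np.sqrt(var / 2.0), size=n))

# Both price clouds centre near s0*exp(mu); the Laplace cloud has
# a sharper peak and fatter tails.
print(normal_prices.mean(), laplace_prices.mean())
```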
We see from the graphs above that over a short period of time the Monte Carlo price forecasts look similar to the pdf being sampled from. However, as we increase the number of forecast days, the simulated pricing distribution from the Laplace distribution tends to look more normal, which is expected from the Central Limit Theorem.
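That convergence toward normality can be checked numerically: summing more iid Laplace draws drives the excess kurtosis of the cumulative return toward the Normal's value of 0. A quick sketch with arbitrary parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
b = 0.01  # Laplace scale for one day's log-return

# Excess kurtosis of a single Laplace draw is 3; for a sum of t iid
# draws it falls like 3/t, approaching the Normal's value of 0.
for t in (1, 5, 21):
    sums = rng.laplace(0.0, b, size=(100_000, t)).sum(axis=1)
    print(t, round(stats.kurtosis(sums), 2))
```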
Below is the main code for running the Monte Carlo simulation. There are some dependent scripts working behind the scenes that organize data from an API source, plot using Bokeh, and calculate n-degree moments via the method of moments. The full source code can be found on my GitHub, but the main chunk of the Monte Carlo is shown below.
The code iterates through a list of tickers, in this case the S&P 500. For each ticker, historical pricing is used to form a probability density function (pdf), Laplace or Normal. SciPy is used to create a custom pdf that can be sampled randomly. Once the distribution is created for that ticker, 10,000 iterations are run per forecasted day, making up the simulated price distribution. Accuracy and percentage away from the actual price are then calculated and appended for that asset per day. Accuracy is defined as whether the actual price falls within 1 standard deviation of the simulated price distribution's mean. Percentage away compares how far the simulated price is from the actual price, where the simulated price is taken to be the average of the simulated price distribution. This process is repeated for the remaining tickers.
import histogram_df as histogram
import sample_parameters as sample
import datetime as dt
import numpy as np
import pandas as pd
import random
import histogram as hgram
import eod_ohlc_pull as eod
import pandas_market_calendars as mcal
import time
from bokeh.plotting import figure, show
from bokeh.layouts import row, column
import normal_distribution_log
from matplotlib import pyplot as plt
from scipy import stats
import random_var_method_moment as rv_mom
def data_pull(start, end, symbol):
    df = eod.ohlc(start, end, symbol)
    df = pd.DataFrame.from_dict(df)
    df['date'] = pd.to_datetime(df['date'])
    df['date'] = df['date'].dt.date
    df = df[['date', 'adjusted_close']]
    df = df.sort_values(by='date', ascending=True)
    df = df.reset_index(drop=True)
    return df
def mc_check(ticker, results, time_frame, time_frame_pct, forecast_amount):
    try:
        #Form dataframe of historical pricing data
        start = '2018-06-18'
        end = '2022-01-30'
        nyse = mcal.get_calendar('NYSE')
        early = nyse.schedule(start_date='2022-01-30', end_date='2022-11-30')
        #forecast date changes incrementally, adding 1 more day at a time
        forecast_days = forecast_amount
        df = data_pull(start, end, ticker)
        df['date'] = pd.to_datetime(df['date'])
        df['date'] = df['date'].dt.date
        df['log_returns'] = [np.nan]*len(df)
        #create log returns
        for i in range(1, len(df)):
            df.loc[i, 'log_returns'] = np.log(df['adjusted_close'][i]/df['adjusted_close'][i-1])
        mu = df['log_returns'].mean()
        sigma2 = df['log_returns'].var()
        variance = sigma2
        skew = df['log_returns'].skew()
        x_min = df['log_returns'].min()
        x_max = df['log_returns'].max()
        x, y = rv_mom.random_variable(x_min, x_max, mu, variance, skew)
        #x1 = x
        #y1 = y
        #scale x up so rv_discrete sees integer support; normalize y to sum to 1
        x = x*1000.00
        y = y/sum(y)
        custm = stats.rv_discrete(name='custm', values=(x, y))
        for i in range(1, forecast_days):
            t_days = i
            forecast_date = early['market_open'][i].date()
            forecast_date = str(forecast_date)
            s_0 = df['adjusted_close'][len(df)-1]
            simulation_prices = {'return_output_price': []}
            #inner Monte Carlo loop: 10,000 simulated paths per forecast horizon
            for j in range(int(1e4)):
                #generate t_days worth of random variables from the distribution
                rv_x = custm.rvs(size=t_days)
                rv_x = rv_x/1000.00
                sum_return = np.sum(rv_x)
                #exponentiate the summed log returns for the final price at end of t_days
                s_1 = s_0*np.exp(sum_return)
                #append the projected price
                simulation_prices['return_output_price'].append(s_1)
            simulation_prices = pd.DataFrame.from_dict(simulation_prices)
            #from the sample data find the mean, variance, and higher moments
            mu_hat = sample.sample_mean(simulation_prices)
            sigma2_hat = sample.sample_var(simulation_prices, mu_hat)
            sigma_hat = np.sqrt(sigma2_hat)
            skewness_hat = sample.sample_skewness(simulation_prices, mu_hat, sigma2_hat)
            kurtosis_hat = sample.sample_kurtosis(simulation_prices, mu_hat, sigma2_hat)
            #pull the actual price on the projected date
            price_final = data_pull(forecast_date, forecast_date, ticker)
            price_final = price_final['adjusted_close'][0]
            results['t_days'].append(t_days)
            results['ticker'].append(ticker)
            results['price_initial'].append(s_0)
            results['price_final'].append(price_final)
            results['Expected_return'].append(mu_hat)
            results['Variance'].append(sigma2_hat)
            results['std_dev'].append(sigma_hat)
            #accuracy: did the actual price fall within 1 standard deviation of the forecast?
            minus_sigma = mu_hat - sigma_hat
            plus_sigma = mu_hat + sigma_hat
            accuracy = 0
            if minus_sigma <= price_final <= plus_sigma:
                accuracy = 1
            time_frame[ticker].append(accuracy)
            #calculate pct away the forecasted mc price was from actual
            pct_away = (mu_hat - price_final)/price_final
            time_frame_pct[ticker].append(pct_away)
            #hgram.mc_histogram(ticker, simulation_prices, 100, mu_hat, sigma2_hat, x1, y1)
        return results, time_frame
    except Exception:
        return 0, 0
#read tickers from csv
tickers = pd.read_csv('sp500.csv')
tickers = tickers['Ticker'][:]
results = {'t_days':[], 'ticker':[], 'price_initial':[], 'price_final':[], 'Expected_return':[], 'Variance':[], 'std_dev':[]}
#declare forecast days
forecast_amount = 22
t_days = np.arange(1,forecast_amount,1)
#2 dictionaries for forecast timeframes
time_frame = {'t_days':t_days}
time_frame_pct = {'t_days':t_days}
#Go through tickers and add to time frame dictionaries
for i in tickers:
    time_frame.update({i: []})
    time_frame_pct.update({i: []})
    value1, value2 = mc_check(i, results, time_frame, time_frame_pct, forecast_amount)
    if value1 == 0:
        time_frame.pop(i, None)
        time_frame_pct.pop(i, None)
final_results = pd.DataFrame.from_dict(results)
time_frame = pd.DataFrame.from_dict(time_frame)
time_frame_pct = pd.DataFrame.from_dict(time_frame_pct)
time_frame = time_frame.set_index('t_days')
#fraction of tickers whose actual price fell within 1 sigma, per forecast day
#(count the ticker columns before adding the accuracy column to them)
n_tickers = len(time_frame.columns)
time_frame['1_sigma_accuracy'] = time_frame.sum(axis=1)/n_tickers
time_frame_pct = time_frame_pct.set_index('t_days')
time_frame_pct['portfolio_accuracy'] = time_frame_pct.mean(axis=1)
# time_frame.to_csv('Laplace_output_accuracy_22days_january.csv')
# time_frame_pct.to_csv('Laplace_output_percentage_away_22days_january.csv')
# final_results.to_csv('Laplace_output_forecast_22days_january.csv')
There were 4 tests run for both Normal and Laplace. Each test used 1.5 years' worth of historical data and forecast 21 days out. The time frames used were the months of January, March, June, and September 2022. I chose to focus on the percentage-away calculations rather than accuracy, because my defined accuracy depends on the standard deviation, which increases as the forecast horizon lengthens. The goalposts essentially widen as the forecast days increase, which does not make much sense as a definition of accuracy.
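The widening goalposts follow from the model itself: the log of the simulated price is a sum of t iid daily returns, so its standard deviation grows like sqrt(t). A quick numerical check with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.01  # one-day log-return standard deviation

# The std dev of the t-day cumulative log-return should scale as sigma*sqrt(t),
# so each ratio printed below is close to 1.0.
for t in (1, 4, 16):
    cum = rng.normal(0.0, sigma, size=(100_000, t)).sum(axis=1)
    print(t, round(cum.std() / (sigma * np.sqrt(t)), 2))
```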
Looking at the 4 tests run, there are 8 graphs output below. The first 4 are more general: each shows the average percentage away across all tickers per forecasted day. A noticeable trend is that no matter the distribution, as the forecast horizon increases, the percentage away from the actual price diverges from 0 (which would be spot on).
The last 4 graphs dive into how the Monte Carlo forecast did for specific sectors. Again, it is still the average percentage away of the forecasted price from the actual price, but now broken out by sector. From the results we see that Energy was an outlier no matter which trial was run, which makes sense: the energy sector strongly outperformed every other sector in 2022 and thus deviated from its historical behavior. Beyond that, it is easy to see that across all trials the Monte Carlo forecast described Utilities best of all the sectors.
Looking at the sectors across all 4 trials and 11 sectors, it seems the Normal distribution was more accurate in forecasting than the more tailored Laplace distribution: the Monte Carlo using the Normal distribution was closer to the real price in 26 of 44 sector results. Many of the sector results also look very similar, which can again be explained by the Central Limit Theorem: the more Laplace samples are summed, the more normal the pricing distribution becomes.
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import row, column, gridplot
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import dodge
import time
def graph_output(df_l, df_n, sp500, title):
    #collect each ticker's average percentage away, grouped by sector (Normal results)
    sector_dict_normal = {}
    for ticker in df_n.keys():
        if ticker not in ['t_days', 'portfolio_accuracy']:
            match = sp500.loc[sp500['Ticker'] == str(ticker)]
            match = match.reset_index(drop=True)
            sector = match['Sector'][0]
            if sector not in sector_dict_normal.keys():
                sector_dict_normal[sector] = []
            sector_dict_normal[sector].append(df_n[ticker].mean())
    #same grouping for the Laplace results
    sector_dict_laplace = {}
    for ticker in df_l.keys():
        if ticker not in ['t_days', 'portfolio_accuracy']:
            match = sp500.loc[sp500['Ticker'] == str(ticker)]
            match = match.reset_index(drop=True)
            sector = match['Sector'][0]
            if sector not in sector_dict_laplace.keys():
                sector_dict_laplace[sector] = []
            sector_dict_laplace[sector].append(df_l[ticker].mean())
    #drop sectors represented by fewer than 2 tickers
    for i in list(sector_dict_laplace.keys()):
        if len(sector_dict_laplace[i]) < 2:
            sector_dict_laplace.pop(i, None)
            sector_dict_normal.pop(i, None)
    sectors = list(sector_dict_laplace.keys())
    #average within each sector
    sector_results_normal = {}
    for i in sector_dict_normal.keys():
        sector_results_normal[i] = sum(sector_dict_normal[i])/len(sector_dict_normal[i])
    sector_results_laplace = {}
    for i in sector_dict_laplace.keys():
        sector_results_laplace[i] = sum(sector_dict_laplace[i])/len(sector_dict_laplace[i])
    #index by sector name so both value lists line up with the x-axis order
    normal_values = [sector_results_normal[i] for i in sectors]
    laplace_values = [sector_results_laplace[i] for i in sectors]
    data = {'sectors': sectors,
            'normal_distribution': normal_values,
            'laplace_distribution': laplace_values}
    source = ColumnDataSource(data=data)
    p = figure(x_range=sectors,
               y_range=(min(min(normal_values), min(laplace_values)) - 0.02,
                        max(max(normal_values), max(laplace_values)) + 0.02),
               title="Sector Analysis Laplace vs Normal Distribution 22 days '{}' ->".format(title),
               width=800, height=400)
    p.vbar(x=dodge('sectors', -0.25, range=p.x_range), top='normal_distribution', source=source,
           width=0.2, color="#35B778", legend_label="Normal Distribution")
    p.vbar(x=dodge('sectors', 0.0, range=p.x_range), top='laplace_distribution', source=source,
           width=0.2, color="#2171b5", legend_label="Laplace Distribution")
    p.x_range.range_padding = 0.1
    p.xgrid.grid_line_color = None
    p.xaxis.major_label_orientation = np.pi/6
    p.legend.location = "top_left"
    p.legend.orientation = "horizontal"
    p.add_tools(HoverTool(
        tooltips=[
            ('Normal', '@normal_distribution{%0.4f}'),
            ('Laplace', '@laplace_distribution{%0.4f}'),
        ],
        #formatter keys need the '@' prefix to match the field names above
        formatters={
            '@normal_distribution': 'printf',
            '@laplace_distribution': 'printf',
        },
        #display a tooltip whenever the cursor is vertically in line with a glyph
        mode='vline'
    ))
    ticker = "portfolio_accuracy"
    g = figure(title="Laplace vs Normal 22days '{}' ->".format(title),
               x_axis_label='time (trading days)', y_axis_label="pct away from actual price",
               width=800, height=400)
    g.line(df_l['t_days'], df_l[ticker], legend_label="Laplace", color="red")
    g.line(df_n['t_days'], df_n[ticker], legend_label="Normal", color="blue")
    return p, g
sp500 = pd.read_csv('sp500_w_sectors.csv')
laplace_csv = ['Laplace_output_percentage_away_22days_january.csv','Laplace_output_percentage_away_22days_6mo_before.csv','Laplace_output_percentage_away_22days_june.csv','Laplace_output_percentage_away_22days.csv']
normal_csv = ['normal_output_percentage_away_22days_january.csv','normal_output_percentage_away_22days_6mo_before.csv','normal_output_percentage_away_22days_june.csv','normal_output_percentage_away_22days.csv']
graphs = []
portfolio_graphs = []
months = ['January', 'March', 'June', 'September']
for i in range(len(laplace_csv)):
    df_l = pd.read_csv(laplace_csv[i])
    df_n = pd.read_csv(normal_csv[i])
    title = months[i]
    p, g = graph_output(df_l, df_n, sp500, title)
    graphs.append(p)
    portfolio_graphs.append(g)
grid = gridplot([[portfolio_graphs[0]],[portfolio_graphs[1]], [portfolio_graphs[2]], [portfolio_graphs[3]],
[graphs[0]], [graphs[1]],[graphs[2]], [graphs[3]],])
output_file("example.html")
show(grid)