In CFRM501 we found that assets tend to have a near-zero average daily log-return. Looking at the returns, we also see they are not quite normal: compared to a standard Normal distribution, the tails are a bit fatter and the density drops off more quickly as you move away from the mean.
On one of our assignments we noted that a Laplace distribution describes the log-returns better than a normal distribution does. We used the method of moments to tailor a Laplace distribution that fit our assets' daily log-return data more accurately.
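As a minimal sketch of that fit (the synthetic returns below are a stand-in for real price data), a method-of-moments Laplace fit only needs the sample mean and variance, since for Laplace(mu, b) the variance is 2*b^2:

```python
import numpy as np
from scipy import stats

# Stand-in daily log-returns; in practice these come from historical prices.
rng = np.random.default_rng(0)
log_returns = rng.laplace(loc=0.0005, scale=0.01, size=1000)

# Method of moments: match the sample mean and variance.
# For Laplace(mu, b): mean = mu, variance = 2*b^2  =>  b = sqrt(var / 2).
mu_hat = log_returns.mean()
b_hat = np.sqrt(log_returns.var() / 2.0)

laplace_fit = stats.laplace(loc=mu_hat, scale=b_hat)
normal_fit = stats.norm(loc=log_returns.mean(), scale=log_returns.std())

# The Laplace density is taller at the mean and fatter in the tails
# than a Normal fitted to the same data.
print(laplace_fit.pdf(mu_hat) > normal_fit.pdf(mu_hat))
```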
We see that the red curve, the tailored Laplace distribution, fits the actual data more accurately than the normal distribution.
Having seen visually that a Laplace distribution is a better fit than a Normal for historical daily log-returns, the natural question is: when forecasting prices through a Monte Carlo simulation, does sampling from a Laplace distribution provide a more accurate forecast than sampling from a Normal distribution?
Below are 100,000 iterations over a 1-day forecast when sampling from a Normal distribution. The pdf being sampled from uses the mean and variance of the stock data for “DIS”.
Below is an example of the same number of iterations over the same 1-day forecast, sampling from a Laplace distribution. The pdf being sampled from is tailored to the stock “DIS”, with the mean, variance, and skewness solved for through the method of moments.
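The shape of those one-day forecasts can be sketched directly with NumPy's samplers. Note the actual script builds the density numerically (including skewness) and samples it through `scipy.stats.rv_discrete`; the closed-form samplers and moments below are placeholder simplifications, not the real “DIS” values:

```python
import numpy as np

rng = np.random.default_rng(1)
s0 = 100.0                 # placeholder initial price
mu, var = 0.0005, 0.0002   # placeholder daily log-return mean and variance
n = 100_000

# Normal: the scale parameter is the standard deviation.
normal_prices = s0 * np.exp(rng.normal(mu, np.sqrt(var), size=n))

# Laplace: the scale b solves var = 2*b^2 (method of moments).
laplace_prices = s0 * np.exp(rng.laplace(mu, np.sqrt(var / 2.0), size=n))

# Both price clouds centre near s0*exp(mu); the Laplace cloud has
# a sharper peak and fatter tails.
print(normal_prices.mean(), laplace_prices.mean())
```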
We see from the graphs above that over a short period of time the Monte Carlo price forecasts look similar to the pdf being sampled from. However, as we increase the number of forecast days, the simulated pricing distribution from the Laplace distribution tends to look more normal, which is expected from the Central Limit Theorem.
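That convergence toward normality can be checked numerically: summing more iid Laplace draws drives the excess kurtosis of the cumulative return toward the Normal's value of 0. A quick sketch with arbitrary parameters:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
b = 0.01  # Laplace scale for one day's log-return

# Excess kurtosis of a single Laplace draw is 3; for a sum of t iid
# draws it falls like 3/t, approaching the Normal's value of 0.
for t in (1, 5, 21):
    sums = rng.laplace(0.0, b, size=(100_000, t)).sum(axis=1)
    print(t, round(stats.kurtosis(sums), 2))
```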
Below is the main code for running the Monte Carlo simulation. There are some dependent scripts working behind the scenes that organize data from an API source, plot using Bokeh, and calculate n-degree moments via the method of moments. The full source code can be found on my GitHub, but the main chunk of the Monte Carlo is shown below.
The code iterates through a list of tickers, in this case the S&P 500. For each ticker, historical pricing is used to form a probability density function (pdf), Laplace or Normal. SciPy is used to create a custom pdf that can be sampled randomly. Once the distribution is created for that ticker, 10,000 iterations are run per forecasted day, making up the simulated price distribution. Accuracy and percentage away from the actual price are then calculated and appended for that asset per day. Accuracy is defined as whether the actual price falls within 1 standard deviation of the simulated price distribution's mean. Percentage away compares how far the simulated price is from the actual price, where the simulated price is taken to be the average of the simulated price distribution. This process is repeated for the remaining tickers.
import histogram_df as histogram
import sample_parameters as sample
import datetime as dt
import numpy as np
import pandas as pd
import random
import histogram as hgram
import eod_ohlc_pull as eod
import pandas_market_calendars as mcal
import time
from bokeh.plotting import figure, show
from bokeh.layouts import row, column
import normal_distribution_log
from matplotlib import pyplot as plt
from scipy import stats
import random_var_method_moment as rv_mom
def data_pull(start, end, symbol):
    df = eod.ohlc(start, end, symbol)
    df = pd.DataFrame.from_dict(df)
    df['date'] = pd.to_datetime(df['date'])
    df['date'] = df['date'].dt.date
    df = df[['date', 'adjusted_close']]
    df = df.sort_values(by='date', ascending=True)
    df = df.reset_index(drop=True)
    return df
def mc_check(ticker, results, time_frame, time_frame_pct, forecast_amount):
    try:
        #Form dataframe of historical pricing data
        start = '2018-06-18'
        end = '2022-01-30'
        nyse = mcal.get_calendar('NYSE')
        early = nyse.schedule(start_date='2022-01-30', end_date='2022-11-30')
        #forecast date changes incrementally, adding 1 more day at a time
        forecast_days = forecast_amount
        df = data_pull(start, end, ticker)
        df['date'] = pd.to_datetime(df['date'])
        df['date'] = df['date'].dt.date
        df['log_returns'] = [np.nan]*len(df)
        #create log returns
        for i in range(1, len(df)):
            df.loc[i, 'log_returns'] = np.log(df['adjusted_close'][i]/df['adjusted_close'][i-1])
        mu = df['log_returns'].mean()
        sigma2 = df['log_returns'].var()
        variance = sigma2
        skew = df['log_returns'].skew()
        x_min = df['log_returns'].min()
        x_max = df['log_returns'].max()
        x, y = rv_mom.random_variable(x_min, x_max, mu, variance, skew)
        #x1 = x
        #y1 = y
        #scale x up so rv_discrete sees integer support; normalize y to sum to 1
        x = x*1000.00
        y = y/sum(y)
        custm = stats.rv_discrete(name='custm', values=(x, y))
        for i in range(1, forecast_days):
            t_days = i
            forecast_date = early['market_open'][i].date()
            forecast_date = str(forecast_date)
            s_0 = df['adjusted_close'][len(df)-1]
            simulation_prices = {'return_output_price': []}
            #inner Monte Carlo loop: 10,000 simulated paths per forecast horizon
            for j in range(int(1e4)):
                #generate t_days worth of random variables from the distribution
                rv_x = custm.rvs(size=t_days)
                rv_x = rv_x/1000.00
                sum_return = np.sum(rv_x)
                #exponentiate the summed log returns for the final price at end of t_days
                s_1 = s_0*np.exp(sum_return)
                #append the projected price
                simulation_prices['return_output_price'].append(s_1)
            simulation_prices = pd.DataFrame.from_dict(simulation_prices)
            #from the sample data find the mean, variance, and higher moments
            mu_hat = sample.sample_mean(simulation_prices)
            sigma2_hat = sample.sample_var(simulation_prices, mu_hat)
            sigma_hat = np.sqrt(sigma2_hat)
            skewness_hat = sample.sample_skewness(simulation_prices, mu_hat, sigma2_hat)
            kurtosis_hat = sample.sample_kurtosis(simulation_prices, mu_hat, sigma2_hat)
            #pull the actual price on the projected date
            price_final = data_pull(forecast_date, forecast_date, ticker)
            price_final = price_final['adjusted_close'][0]
            results['t_days'].append(t_days)
            results['ticker'].append(ticker)
            results['price_initial'].append(s_0)
            results['price_final'].append(price_final)
            results['Expected_return'].append(mu_hat)
            results['Variance'].append(sigma2_hat)
            results['std_dev'].append(sigma_hat)
            #accuracy: did the actual price fall within 1 standard deviation of the forecast?
            minus_sigma = mu_hat - sigma_hat
            plus_sigma = mu_hat + sigma_hat
            accuracy = 0
            if minus_sigma <= price_final <= plus_sigma:
                accuracy = 1
            time_frame[ticker].append(accuracy)
            #calculate pct away the forecasted mc price was from actual
            pct_away = (mu_hat - price_final)/price_final
            time_frame_pct[ticker].append(pct_away)
            #hgram.mc_histogram(ticker, simulation_prices, 100, mu_hat, sigma2_hat, x1, y1)
        return results, time_frame
    except Exception:
        return 0, 0
#read tickers from csv
tickers = pd.read_csv('sp500.csv')
tickers = tickers['Ticker'][:]
results = {'t_days':[], 'ticker':[], 'price_initial':[], 'price_final':[], 'Expected_return':[], 'Variance':[], 'std_dev':[]}
#declare forecast days
forecast_amount = 22
t_days = np.arange(1,forecast_amount,1)
#2 dictionaries for forecast timeframes
time_frame = {'t_days':t_days}
time_frame_pct = {'t_days':t_days}
#Go through tickers and add to time frame dictionaries
for i in tickers:
    time_frame.update({i: []})
    time_frame_pct.update({i: []})
    value1, value2 = mc_check(i, results, time_frame, time_frame_pct, forecast_amount)
    if value1 == 0:
        time_frame.pop(i, None)
        time_frame_pct.pop(i, None)
final_results = pd.DataFrame.from_dict(results)
time_frame = pd.DataFrame.from_dict(time_frame)
time_frame_pct = pd.DataFrame.from_dict(time_frame_pct)
time_frame = time_frame.set_index('t_days')
#fraction of tickers whose actual price fell within 1 sigma, per forecast day
#(count the ticker columns before adding the accuracy column to them)
n_tickers = len(time_frame.columns)
time_frame['1_sigma_accuracy'] = time_frame.sum(axis=1)/n_tickers
time_frame_pct = time_frame_pct.set_index('t_days')
time_frame_pct['portfolio_accuracy'] = time_frame_pct.mean(axis=1)
# time_frame.to_csv('Laplace_output_accuracy_22days_january.csv')
# time_frame_pct.to_csv('Laplace_output_percentage_away_22days_january.csv')
# final_results.to_csv('Laplace_output_forecast_22days_january.csv')
There were 4 tests run for both Normal and Laplace. Each test used 1.5 years' worth of historical data and forecast 21 days out. The time frames used were the months of January, March, June, and September 2022. I chose to focus on the percentage-away calculations rather than accuracy, because my defined accuracy depends on the standard deviation, which increases as the forecast horizon lengthens. The goalposts essentially widen as the forecast days increase, which does not make much sense as a definition of accuracy.
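The widening goalposts follow from the model itself: the log of the simulated price is a sum of t iid daily returns, so its standard deviation grows like sqrt(t). A quick numerical check with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 0.01  # one-day log-return standard deviation

# The std dev of the t-day cumulative log-return should scale as sigma*sqrt(t),
# so each ratio printed below is close to 1.0.
for t in (1, 4, 16):
    cum = rng.normal(0.0, sigma, size=(100_000, t)).sum(axis=1)
    print(t, round(cum.std() / (sigma * np.sqrt(t)), 2))
```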
Looking at the 4 tests run, there are 8 graphs output below. The first 4 are more general: each shows the average percentage away across all tickers per forecasted day. A noticeable trend is that no matter the distribution, as the forecast horizon increases, the percentage away from the actual price diverges from 0 (which would be spot on).
The last 4 graphs dive into how the Monte Carlo forecast did for specific sectors. Again, it is still the average percentage away of the forecasted price from the actual price, but now broken out by sector. From the results we see that Energy was an outlier no matter which trial was run, which makes sense: the energy sector strongly outperformed every other sector in 2022 and thus deviated from its historical behavior. Beyond that, it is easy to see that across all trials the Monte Carlo forecast described Utilities best of all the sectors.
Looking at the sectors across all 4 trials and 11 sectors, it seems the Normal distribution was more accurate in forecasting than the more tailored Laplace distribution: the Monte Carlo using the Normal distribution was closer to the real price in 26 of 44 sector results. Many of the sector results also look very similar, which can again be explained by the Central Limit Theorem: the more Laplace samples are summed, the more normal the pricing distribution becomes.
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.layouts import row, column, gridplot
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.transform import dodge
import time
def graph_output(df_l, df_n, sp500, title):
    #collect each ticker's average percentage away, grouped by sector (Normal results)
    sector_dict_normal = {}
    for ticker in df_n.keys():
        if ticker not in ['t_days', 'portfolio_accuracy']:
            match = sp500.loc[sp500['Ticker'] == str(ticker)]
            match = match.reset_index(drop=True)
            sector = match['Sector'][0]
            if sector not in sector_dict_normal.keys():
                sector_dict_normal[sector] = []
            sector_dict_normal[sector].append(df_n[ticker].mean())
    #same grouping for the Laplace results
    sector_dict_laplace = {}
    for ticker in df_l.keys():
        if ticker not in ['t_days', 'portfolio_accuracy']:
            match = sp500.loc[sp500['Ticker'] == str(ticker)]
            match = match.reset_index(drop=True)
            sector = match['Sector'][0]
            if sector not in sector_dict_laplace.keys():
                sector_dict_laplace[sector] = []
            sector_dict_laplace[sector].append(df_l[ticker].mean())
    #drop sectors represented by fewer than 2 tickers
    for i in list(sector_dict_laplace.keys()):
        if len(sector_dict_laplace[i]) < 2:
            sector_dict_laplace.pop(i, None)
            sector_dict_normal.pop(i, None)
    sectors = list(sector_dict_laplace.keys())
    #average within each sector
    sector_results_normal = {}
    for i in sector_dict_normal.keys():
        sector_results_normal[i] = sum(sector_dict_normal[i])/len(sector_dict_normal[i])
    sector_results_laplace = {}
    for i in sector_dict_laplace.keys():
        sector_results_laplace[i] = sum(sector_dict_laplace[i])/len(sector_dict_laplace[i])
    #index by sector name so both value lists line up with the x-axis order
    normal_values = [sector_results_normal[i] for i in sectors]
    laplace_values = [sector_results_laplace[i] for i in sectors]
    data = {'sectors': sectors,
            'normal_distribution': normal_values,
            'laplace_distribution': laplace_values}
    source = ColumnDataSource(data=data)
    p = figure(x_range=sectors,
               y_range=(min(min(normal_values), min(laplace_values)) - 0.02,
                        max(max(normal_values), max(laplace_values)) + 0.02),
               title="Sector Analysis Laplace vs Normal Distribution 22 days '{}' ->".format(title),
               width=800, height=400)
    p.vbar(x=dodge('sectors', -0.25, range=p.x_range), top='normal_distribution', source=source,
           width=0.2, color="#35B778", legend_label="Normal Distribution")
    p.vbar(x=dodge('sectors', 0.0, range=p.x_range), top='laplace_distribution', source=source,
           width=0.2, color="#2171b5", legend_label="Laplace Distribution")
    p.x_range.range_padding = 0.1
    p.xgrid.grid_line_color = None
    p.xaxis.major_label_orientation = np.pi/6
    p.legend.location = "top_left"
    p.legend.orientation = "horizontal"
    p.add_tools(HoverTool(
        tooltips=[
            ('Normal', '@normal_distribution{%0.4f}'),
            ('Laplace', '@laplace_distribution{%0.4f}'),
        ],
        #formatter keys need the '@' prefix to match the field names above
        formatters={
            '@normal_distribution': 'printf',
            '@laplace_distribution': 'printf',
        },
        #display a tooltip whenever the cursor is vertically in line with a glyph
        mode='vline'
    ))
    ticker = "portfolio_accuracy"
    g = figure(title="Laplace vs Normal 22days '{}' ->".format(title),
               x_axis_label='time (trading days)', y_axis_label="pct away from actual price",
               width=800, height=400)
    g.line(df_l['t_days'], df_l[ticker], legend_label="Laplace", color="red")
    g.line(df_n['t_days'], df_n[ticker], legend_label="Normal", color="blue")
    return p, g
sp500 = pd.read_csv('sp500_w_sectors.csv')
laplace_csv = ['Laplace_output_percentage_away_22days_january.csv','Laplace_output_percentage_away_22days_6mo_before.csv','Laplace_output_percentage_away_22days_june.csv','Laplace_output_percentage_away_22days.csv']
normal_csv = ['normal_output_percentage_away_22days_january.csv','normal_output_percentage_away_22days_6mo_before.csv','normal_output_percentage_away_22days_june.csv','normal_output_percentage_away_22days.csv']
graphs = []
portfolio_graphs = []
months = ['January', 'March', 'June', 'September']
for i in range(len(laplace_csv)):
    df_l = pd.read_csv(laplace_csv[i])
    df_n = pd.read_csv(normal_csv[i])
    title = months[i]
    p, g = graph_output(df_l, df_n, sp500, title)
    graphs.append(p)
    portfolio_graphs.append(g)
grid = gridplot([[portfolio_graphs[0]],[portfolio_graphs[1]], [portfolio_graphs[2]], [portfolio_graphs[3]],
[graphs[0]], [graphs[1]],[graphs[2]], [graphs[3]],])
output_file("example.html")
show(grid)