The purpose of this article is to compare how portfolios built with different asset-allocation procedures perform in a backtest over a given period. We analyze three procedures: Mean-Variance (MV), Mean-Absolute Deviation (MAD), and Mean-CVaR. Mean-CVaR generally came out on top in terms of overall returns, but at a cost: it also had the highest daily 5% CVaR. Portfolio drawdown gave mixed results, with no clear winner.
Portfolios were constructed with each method and backtested to measure overall return, portfolio drawdown, and daily 5% CVaR. Each asset-allocation procedure was tested over three separate years: 2018, 2019, and 2020. Within each year, a new portfolio was constructed on the first trading day of every month, giving 12 monthly signals per procedure from January to December. On each rebalancing day, the existing portfolio is liquidated and the new signals are bought, both at that day's adjusted closing price. The universe consists of 12 ETFs: ten sector funds, a total US stock market fund, and a bond fund:
| Ticker | Description |
|--------|-------------|
| XLF | Finance |
| XLB | Materials |
| XLI | Industrials |
| XLY | Consumer Discretionary |
| XLK | Technology |
| XLE | Energy |
| XLRE | Real Estate |
| XLV | Health Care |
| XLP | Consumer Staples |
| XLU | Utilities |
| ITOT | Total US Stock |
| SHV | Bonds |
For Mean-Variance, the portfolio is optimized with the objective below, which is the vector form of minimizing portfolio variance based on historical returns and a user-defined target return. Here \(\Sigma\) is the covariance matrix of historical returns and \(\vec{\mu}\) the vector of mean historical returns. In the code at the end of this article the target return is set to 0, and each weight is also capped at 25% (the same cap is used for MAD and CVaR).

Objective:
\[\min_{\vec{w}} \; \vec{w}^T \Sigma \vec{w}\]
Constraints:
\[\sum_{i=1}^n w_i = 1, \qquad w_i \geq 0, \qquad \vec{w}^T\vec{\mu} \geq \mu_{\text{target}}\]

Proof that the portfolio variance equals the vector form, where \(r = \sum_{i=1}^n w_i r_i\) is the portfolio return:
\[\sigma_p^2 = E[(r-\bar{r})^2]\]
\[= E\Big[\Big(\sum_{i=1}^n w_i r_i - \sum_{i=1}^n w_i \bar{r}_i\Big)^2\Big]\]
\[= E\Big[\Big(\sum_{i=1}^n w_i (r_i - \bar{r}_i)\Big)\Big(\sum_{j=1}^n w_j (r_j - \bar{r}_j)\Big)\Big]\]
\[= E\Big[\sum_{i,j=1}^n w_i w_j (r_i - \bar{r}_i)(r_j - \bar{r}_j)\Big]\]
\[= \sum_{i,j=1}^n w_i w_j \sigma_{ij} = \vec{w}^T \Sigma \vec{w}\]

Expansion example for two assets (\(\Sigma\) is the covariance matrix, with \(\sigma_{12} = \sigma_{21}\) and \(\sigma_{ii} = \sigma_i^2\)):
\[\sum_{i,j=1}^2 w_i w_j \sigma_{ij} = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \sigma_{12}\]
\[\vec{w}^T \Sigma \vec{w} = \begin{pmatrix} w_1 & w_2 \end{pmatrix}
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
= w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \sigma_{12}\]

This is essentially the Markowitz model: it uses historical returns and the covariances between assets to find the weights that minimize portfolio variance. It is the simplest of the three models.
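As a quick numerical check of the identity above, here is a minimal sketch that uses randomly generated returns (not the ETF data from this article). It confirms that \(\vec{w}^T\Sigma\vec{w}\), computed from the sample covariance matrix, matches the sample variance of the daily portfolio returns, and that the two-asset expansion agrees with the matrix form.

```python
import numpy as np

# Synthetic daily returns for 3 hypothetical assets over 250 days (rows = days, columns = assets).
rng = np.random.default_rng(0)
R = rng.normal(loc=0.0005, scale=0.01, size=(250, 3))
w = np.array([0.5, 0.3, 0.2])  # weights sum to 1

# Quadratic form w^T Sigma w with the sample covariance (np.cov divides by N-1 by default).
sigma = np.cov(R, rowvar=False)
quad_form = w @ sigma @ w

# Sample variance of the daily portfolio returns, also with the N-1 divisor.
direct = np.var(R @ w, ddof=1)
print(quad_form, direct)  # equal up to floating-point rounding

# Two-asset expansion: w1^2*s11 + w2^2*s22 + 2*w1*w2*s12 versus the matrix form.
w2 = np.array([0.6, 0.4])
s2 = np.cov(R[:, :2], rowvar=False)
expanded = w2[0] ** 2 * s2[0, 0] + w2[1] ** 2 * s2[1, 1] + 2 * w2[0] * w2[1] * s2[0, 1]
print(expanded, w2 @ s2 @ w2)
```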
MAD optimization minimizes the mean absolute deviation of the portfolio return from its expected return. Like variance, it is a symmetric measure: it penalizes deviations above and below the expected return equally, which matters when returns are skewed. Below is the objective function, where \(\vec{\mu}\) holds each asset's expected return (the mean of its historical daily returns), \(R^T\) is the matrix of historical daily returns with entry \(R_{it}\) the return of asset \(i\) on day \(t\), and \(n\) is the number of historical days: \[\min_{\vec{w}} \; \frac{1}{n}\sum_{t=1}^{n} \left| (\vec{w}^T R^T)_t - \vec{w}^T\vec{\mu} \right|, \qquad (\vec{w}^T R^T)_t = \sum_i w_i R_{it}\] Each day is a scenario. For each scenario we take the difference between that day's portfolio return and the portfolio's expected return, take its absolute value, and then average over all scenarios. The optimal weights are those that minimize this mean absolute deviation from the expected return.
Example for two assets over two days (\(R_{ij}\) is the return of asset \(i\) on day \(j\)):
\[\vec{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \vec{w}^T = \begin{pmatrix} w_1 & w_2 \end{pmatrix}, \qquad R^T = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}\]
The row vector of daily portfolio returns is
\[\vec{w}^T R^T = \begin{pmatrix} w_1R_{11}+w_2R_{21} & w_1R_{12}+w_2R_{22} \end{pmatrix},\]
and the expected portfolio return is \(\vec{w}^T\vec{\mu} = w_1\mu_1 + w_2\mu_2\). Subtracting the expected return from each day, taking absolute values, and averaging over the two days gives
\[\frac{1}{2}\Big[\,|w_1R_{11}+w_2R_{21}-w_1\mu_1-w_2\mu_2| + |w_1R_{12}+w_2R_{22}-w_1\mu_1-w_2\mu_2|\,\Big]\]
Each absolute value is one daily scenario: the difference between that day's portfolio return and the portfolio's expected return. The optimizer searches over the weights for the combination that minimizes this mean absolute deviation, i.e. the weights whose portfolio deviates least, in absolute terms, from its expected return.
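Plugging numbers into the two-asset, two-day example shows the arithmetic. The returns and weights below are hypothetical, chosen only so the result is easy to check by hand.

```python
import numpy as np

# Rows of R_T are assets and columns are days, matching R^T in the text
# (R_T[i, j] = return of asset i on day j). Values are made up for illustration.
R_T = np.array([[0.01, -0.02],
                [0.03,  0.01]])
w = np.array([0.6, 0.4])

mu = R_T.mean(axis=1)   # expected return of each asset: [-0.005, 0.02]
port = w @ R_T          # daily portfolio returns: [0.018, -0.008]
expected = w @ mu       # expected portfolio return: 0.005

# Mean absolute deviation of the portfolio return around its expected value.
mad = np.mean(np.abs(port - expected))

# The same quantity written out term by term, as in the expansion above.
(R11, R12), (R21, R22) = R_T
w1, w2 = w
mu1, mu2 = mu
expanded = 0.5 * (abs(w1 * R11 + w2 * R21 - w1 * mu1 - w2 * mu2)
                  + abs(w1 * R12 + w2 * R22 - w1 * mu1 - w2 * mu2))
print(mad, expanded)  # both approximately 0.013
```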
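Objective:
\[\min_{\vec{w},\,\gamma} \; \gamma + \frac{1}{n}\frac{1}{1-\alpha}\sum_{t=1}^{n}\big(f(\vec{w},\vec{R}_t)-\gamma\big)^+\]
\[Z = f(\vec{w},\vec{R}_t) - \gamma = \text{Loss function} - \text{VaR}\]
Here \(n\) is the number of historical days, \(\vec{R}_t\) holds the asset returns on day \(t\), \(f(\vec{w},\vec{R}_t) = -(\vec{w}^T R^T)_t\) is the portfolio loss on that day (the negative of the portfolio return), and \((Z)^+ = \max(Z, 0)\). To keep the problem linear, each \((Z)^+\) term is replaced by an auxiliary variable \(\text{Loss}_t\) subject to the constraints below; the weights also satisfy \(\sum_i w_i = 1\) and \(w_i \geq 0\).
Constraints:
\[\text{Loss}_t \geq -(\vec{w}^T R^T)_t - \gamma\]
\[\text{Loss}_t \geq 0\]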
The Mean-CVaR method finds the portfolio with the smallest expected shortfall, given historical data and a confidence level \(\alpha\). Here \(\gamma\) plays the role of the Value at Risk, and the weights are chosen from the available assets so that losses larger than \(\gamma\) are minimized. In the CVaR objective, only scenarios with positive excess losses count, because \((\cdot)^+\) sets the rest to zero. In other words, we look at the historical asset returns, keep only the days whose losses meet or exceed the Value at Risk, and find the weights that minimize CVaR. The factor \(\frac{1}{n-n\alpha} = \frac{1}{n(1-\alpha)}\) combines the number of historical days \(n\) with the tail probability: if \(\alpha = 0.95\), we are averaging over the worst 5% of losses, i.e. \(n(1-\alpha)\) days.
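The linear program above can be sketched with SciPy's `linprog` in place of CVXPY. This is a swapped-in solver, not the code used for this article's results: the function name `min_cvar_weights` and its `w_max` weight cap are hypothetical, the returns are synthetic, and the target-return constraint from the article's code is left out for brevity. When \(n(1-\alpha)\) is a whole number, the minimized objective should equal the average of the worst \(n(1-\alpha)\) daily losses of the optimal portfolio, which the usage lines compute for comparison.

```python
import numpy as np
from scipy.optimize import linprog


def min_cvar_weights(R, alpha=0.95, w_max=1.0):
    """Minimise historical CVaR via the LP above (long-only sketch).

    R: (n_days, n_assets) array of daily returns.
    Decision vector x = [w_1..w_m, gamma, Loss_1..Loss_n].
    """
    n_days, m = R.shape
    # Objective: gamma + 1/(n(1-alpha)) * sum(Loss_t); the weights carry no direct cost.
    c = np.concatenate([np.zeros(m), [1.0], np.full(n_days, 1.0 / (n_days * (1.0 - alpha)))])
    # Loss_t >= -R_t w - gamma  <=>  -R_t w - gamma - Loss_t <= 0
    A_ub = np.hstack([-R, -np.ones((n_days, 1)), -np.eye(n_days)])
    b_ub = np.zeros(n_days)
    # Fully invested: the weights sum to 1.
    A_eq = np.concatenate([np.ones(m), [0.0], np.zeros(n_days)]).reshape(1, -1)
    b_eq = [1.0]
    # 0 <= w_i <= w_max, gamma unbounded, Loss_t >= 0.
    bounds = [(0.0, w_max)] * m + [(None, None)] + [(0.0, None)] * n_days
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m], res.x[m], res.fun  # weights, gamma (VaR), minimised CVaR


rng = np.random.default_rng(1)
R = rng.normal(0.0005, 0.01, size=(200, 4))  # 200 synthetic days, 4 assets
w, var95, cvar95 = min_cvar_weights(R, alpha=0.95, w_max=0.5)
worst = np.sort(-R @ w)[-10:].mean()          # n(1 - alpha) = 10 worst daily losses
print(np.round(w, 4), var95, cvar95, worst)
```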
Example of Losses:
\[\vec{w}^T = \begin{pmatrix} w_1 & w_2 \end{pmatrix}\]
\[R^T = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix}\]
We look at two assets with two days of historical returns to keep things simple (\(R_{ij}\) is the return of asset \(i\) on day \(j\)). The Losses are defined as below, and we expand them:
\[\text{Loss} = -\vec{w}^T R^T - \gamma\]
\[-\vec{w}^T R^T - \gamma = \begin{pmatrix} -w_1R_{11}-w_2R_{21} & -w_1R_{12}-w_2R_{22} \end{pmatrix} - \gamma\]
\[= \begin{pmatrix} -w_1R_{11}-w_2R_{21}-\gamma & -w_1R_{12}-w_2R_{22}-\gamma \end{pmatrix}\]
The result is a 1x2 row vector with one entry per historical day. In the objective and in the constraint \(\text{Loss} \geq 0\), only positive values are counted. This makes sense, because we only care about losses that exceed our Value at Risk. The minus sign is just a convention that turns losses into positive numbers: an entry is positive only when that day's loss (the negative of the portfolio return) exceeds \(\gamma\), our VaR, so in practice these are the days with the most negative returns. These are the values we want to account for when assessing potential risk. Summing the qualifying excess losses and dividing by \(n(1-\alpha)\), the number of days in the tail, gives their average, essentially their expected value assuming each day is equally likely. Adding this onto \(\gamma\) gives the CVaR. The solver optimizes the weights and \(\gamma\) jointly until CVaR is minimized, and the weights at that minimum are considered optimal.
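To see why adding the average excess loss onto \(\gamma\) gives CVaR, the sketch below holds the weights fixed (hypothetical weights and synthetic returns; the article's solver also optimizes the weights) and minimizes the objective over \(\gamma\) alone. The objective is convex and piecewise linear in \(\gamma\), with kinks at the observed losses, so trying each observed loss as a candidate \(\gamma\) finds the minimum. The result matches the plain average of the worst 5% of losses, and the minimizing \(\gamma\) is a historical VaR.

```python
import numpy as np

rng = np.random.default_rng(2)
R = rng.normal(0.0005, 0.01, size=(100, 3))  # synthetic daily returns, rows = days
w = np.array([0.4, 0.35, 0.25])              # fixed, hypothetical weights
alpha = 0.95

losses = -R @ w                              # loss = negative portfolio return
n = len(losses)


def F(gamma):
    """gamma plus the summed excess losses over gamma, scaled by 1/(n(1-alpha))."""
    return gamma + np.maximum(losses - gamma, 0.0).sum() / (n * (1 - alpha))


# F is convex and piecewise linear with kinks at the observed losses,
# so its minimum is attained at one of them.
values = np.array([F(g) for g in losses])
cvar_ru = values.min()
gamma_star = losses[values.argmin()]         # minimiser of F: the historical VaR

# Direct definition: mean of the worst 5% of losses (5 of 100 days here).
k = int(round(n * (1 - alpha)))
cvar_tail = np.sort(losses)[-k:].mean()
print(gamma_star, cvar_ru, cvar_tail)
```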
The experiment consists of three scripts: signal generation, backtesting, and graphing.
The experiment tests how each asset-allocation procedure performs over the course of a year. Three different years were tested (2018, 2019, and 2020), and each year needs 12 monthly signals per asset-allocation procedure. In the signal generation script, each month's weights are estimated from a trailing year of daily returns, starting on the first trading day of the same month one year earlier. The script writes the signals to CSV files, which are then fed into the backtest. Weights are optimized with CVXPY, a Python library for solving convex optimization problems. Below is an example of the CVaR signals generated for 2018:
```
##         2018-01-02  2018-02-01  2018-03-01  ...  2018-10-01  2018-11-01  2018-12-03
## Ticker                                      ...
## XLF       0.090007    0.146930    0.116298  ...    0.000000    0.000000    0.000000
## XLB       0.000000    0.000000    0.000000  ...    0.000000    0.136735    0.125592
## XLI       0.000000    0.000000    0.000000  ...    0.042591    0.000000    0.000000
## XLY       0.000000    0.006235    0.117790  ...    0.051193    0.000000    0.000000
## XLK       0.000000    0.000000    0.000000  ...    0.000000    0.000000    0.000000
## XLE       0.026607    0.000000    0.000000  ...    0.088018    0.031377    0.025869
## XLRE      0.000000    0.007629    0.015912  ...    0.158186    0.161051    0.167260
## XLV       0.183865    0.089207    0.000000  ...    0.014311    0.000000    0.014223
## XLP       0.250000    0.250000    0.250000  ...    0.145700    0.170837    0.167055
## XLU       0.199521    0.250000    0.250000  ...    0.250000    0.250000    0.250000
## ITOT      0.000000    0.000000    0.000000  ...    0.000000    0.000000    0.000000
## SHV       0.250000    0.250000    0.250000  ...    0.250000    0.250000    0.250000
##
## [12 rows x 12 columns]
```
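The trailing monthly windows that feed the optimizers can be sketched with plain pandas. This is an illustration under stated assumptions: `month_start_days` and `lookback_windows` are hypothetical helpers, and `pd.bdate_range` stands in for the NYSE calendar from `pandas_market_calendars`, so exchange holidays such as January 1 are not removed.

```python
import pandas as pd


def month_start_days(trading_days):
    """First trading day of each calendar month present in trading_days."""
    trading_days = pd.DatetimeIndex(trading_days)
    s = pd.Series(trading_days, index=trading_days)
    firsts = s.groupby([trading_days.year, trading_days.month]).min()
    return pd.DatetimeIndex(firsts.to_numpy())


def lookback_windows(prev_year_days, this_year_days):
    """Pair each month's first trading day one year earlier with this year's rebalance day."""
    return list(zip(month_start_days(prev_year_days), month_start_days(this_year_days)))


# Business days only (an assumption: no holiday calendar applied).
days19 = pd.bdate_range("2019-01-01", "2019-12-31")
days20 = pd.bdate_range("2020-01-01", "2020-12-31")
windows = lookback_windows(days19, days20)
print(windows[:2])  # (start of trailing window, rebalance date) for January and February
```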
The backtest script takes the CSV of signals generated by the previous script and simulates it on a $100,000 portfolio. It tracks the number of shares held in each asset, the target weight of each asset, and the overall portfolio value at daily increments. At the end, these are written to CSV files that are fed into the graphing script. Rebalancing happens on the first trading day of each month, in line with the newly generated signals, and all transactions use that day's adjusted close. Trading costs are not accounted for. The output of the CVaR 2018 backtest is shown below:
```
##                 2018-01-02     2018-01-03  ...     2018-12-28     2018-12-31
## asset                                      ...
## XLF             359.005552     359.005552  ...       0.000000       0.000000
## XLB               0.000000       0.000000  ...     264.791407     264.791407
## XLI               0.000000       0.000000  ...       0.000000       0.000000
## XLY               0.000000       0.000000  ...       0.000000       0.000000
## XLK               0.000000       0.000000  ...       0.000000       0.000000
## XLE              46.388814      46.388814  ...      51.164724      51.164724
## XLRE              0.000000       0.000000  ...     607.863390     607.863390
## XLV             240.429691     240.429691  ...      17.176730      17.176730
## XLP             508.347059     508.347059  ...     356.118280     356.118280
## XLU             451.994001     451.994001  ...     546.748249     546.748249
## ITOT              0.000000       0.000000  ...       0.000000       0.000000
## SHV             243.924333     243.924333  ...     255.976008     255.976008
## Total_value  100000.000000  100098.527047  ...  100361.736956  100632.945827
##
## [13 rows x 251 columns]
```
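The liquidate-and-rebuy rule can be written compactly with NumPy. This is a simplified sketch, not the backtest script itself: `run_backtest` is a hypothetical function, the prices are toy numbers, fractional shares are allowed, weights are assumed to sum to 1, and there are no trading costs.

```python
import numpy as np


def run_backtest(prices, weights_by_day, initial_value=100_000.0):
    """Simulate a long-only portfolio rebalanced only on the days in weights_by_day.

    prices: (D, n) array of adjusted closes, rows = trading days.
    weights_by_day: {day_index: length-n weight vector}; must contain day 0.
    Returns (shares held each day, portfolio value each day).
    """
    D, n = prices.shape
    shares = np.zeros((D, n))
    values = np.zeros(D)
    held = np.zeros(n)
    for d in range(D):
        # Day 0 starts from cash; afterwards, value current holdings at today's close.
        value = initial_value if d == 0 else held @ prices[d]
        if d in weights_by_day:
            # Liquidate at today's close and buy the new weights at the same price.
            held = np.asarray(weights_by_day[d], dtype=float) * value / prices[d]
        shares[d] = held
        values[d] = held @ prices[d]
    return shares, values


prices = np.array([[10.0, 20.0],
                   [11.0, 20.0],
                   [12.0, 18.0],
                   [12.0, 19.0]])
shares, values = run_backtest(prices, {0: [0.5, 0.5], 2: [0.25, 0.75]})
print(values)  # [100000. 105000. 105000. 109375.]
```

Holding share counts fixed between rebalances means the actual weights drift with prices until the next first-of-month trade; only the target weights are reset on rebalancing days.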
For each of 2018, 2019, and 2020, graphs of portfolio net worth, portfolio drawdown, and daily portfolio 5% CVaR were created. Across the years, CVaR came out on top in terms of portfolio net worth. However, it also consistently had a larger daily 5% CVaR than Mean-Variance and MAD, which is consistent with a higher-risk, higher-reward profile.
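The graphing script gets its drawdown and CVaR panels from a `portfolio_stats` module that is not shown, so the functions below are hypothetical stand-ins rather than that module's code. They illustrate one common way to compute the two measures from a daily value series: drawdown relative to the running peak, and historical CVaR as the average loss over the worst 5% of daily returns.

```python
import numpy as np


def drawdown(values):
    """Fractional decline of portfolio value from its running peak (0 at new highs)."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)
    return values / running_peak - 1.0


def historical_cvar(returns, alpha=0.95):
    """Mean loss over the worst (1 - alpha) share of days; losses are negative returns.

    Uses ceil(len * (1 - alpha)) days (at least one); the small epsilon guards
    against floating-point error such as 20 * (1 - 0.95) = 1.0000000000000009.
    """
    losses = -np.asarray(returns, dtype=float)
    k = max(1, int(np.ceil(len(losses) * (1 - alpha) - 1e-9)))
    return np.sort(losses)[-k:].mean()


values = np.array([100.0, 110.0, 99.0, 120.0, 108.0])  # toy portfolio values
dd = drawdown(values)                                   # about [0, 0, -0.1, 0, -0.1]
rets = np.diff(values) / values[:-1]                    # daily returns
print(dd.min(), historical_cvar(rets, alpha=0.75))      # max drawdown, worst-25% mean loss
```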
Overall, the three asset-allocation procedures produced interesting results. Based on the backtests, CVaR seems the most desirable and MAD the least desirable. Note that the universe of assets used in this experiment was quite small, but the same methodology can be applied to a larger universe. Since these methods are essentially used to reduce risk, in the future I would like to test strategies with and without CVaR weight optimization, to see the trade-off between return and risk.
```python
# Script 1: signal generation.
# For each month of the target year, estimate MV, MAD and CVaR weights from the
# trailing year of daily returns and save them as CSV signal files.
import datetime as dt

import cvxpy as cp
import numpy as np
import pandas as pd
import pandas_market_calendars as mcal

import eod_ohlc_pull as eod  # the author's own price-download module (not shown)


def df_consolidate(start, end, tickers):
    """Daily percentage returns between start and end (rows = days, columns = tickers)."""
    data = {}
    for ticker in tickers:
        df = pd.DataFrame.from_dict(eod.ohlc(start, end, ticker))
        df['date'] = pd.to_datetime(df['date'])
        df = df.sort_values(by='date').reset_index(drop=True)
        if ticker == 'XLF':
            # Drop the 2020-02-17 row (Presidents' Day, a market holiday) so XLF
            # lines up with the other tickers.
            df = df.loc[df['date'] != dt.datetime(2020, 2, 17)].reset_index(drop=True)
        df = df[['adjusted_close']].copy()
        df['pct_change'] = df['adjusted_close'].pct_change()
        data[ticker] = df.dropna()['pct_change']
    return pd.DataFrame.from_dict(data)


def stats(df):
    """Mean return per asset, covariance matrix, and the (days x assets) return matrix R."""
    R = df.to_numpy()
    mu = df.mean().to_numpy()
    covariance = np.cov(R, rowvar=False)  # assets x assets sample covariance
    return mu, covariance, R


def min_variance(mu, covariance, R):
    """Mean-Variance: minimise portfolio variance w^T Sigma w."""
    w = cp.Variable(len(mu))
    objective = cp.Minimize(cp.quad_form(w, covariance))
    constraints = [
        cp.sum(w) == 1.0,  # fully invested
        w @ mu >= 0.0,     # target expected daily return
        w >= 0,            # long only
        w <= 0.25,         # at most 25% in any single asset
    ]
    cp.Problem(objective, constraints).solve()
    return w.value


def MAD(mu, covariance, R):
    """Mean-Absolute Deviation: minimise sum_t |w^T R_t - w^T mu| (covariance unused)."""
    w = cp.Variable(len(mu))
    objective = cp.Minimize(cp.sum(cp.abs(R @ w - w @ mu)))
    constraints = [cp.sum(w) == 1, w @ mu >= 0.0, w >= 0, w <= 0.25]
    cp.Problem(objective, constraints).solve()
    return w.value


def cVar(mu, covariance, R, alpha=0.95):
    """Mean-CVaR (LP form): minimise gamma + sum(Loss) / (n (1 - alpha))."""
    n_days = len(R)
    w = cp.Variable(len(mu))
    gamma = cp.Variable()       # Value at Risk
    Loss = cp.Variable(n_days)  # daily loss in excess of gamma
    objective = cp.Minimize(gamma + cp.sum(Loss) / (n_days * (1 - alpha)))
    constraints = [
        cp.sum(w) == 1,
        w @ mu >= 0.0,
        w >= 0,
        w <= 0.25,
        Loss >= -R @ w - gamma,  # loss = negative portfolio return
        Loss >= 0.0,
    ]
    cp.Problem(objective, constraints).solve()
    return w.value


def build_signals(T, tickers, optimizer):
    """One column of weights per rebalance date; T holds (window start, rebalance date) pairs."""
    signals = {'Ticker': list(tickers)}
    for start, end in T:
        mu, covariance, R = stats(df_consolidate(start, end, tickers))
        weights = optimizer(mu, covariance, R)
        signals[end] = [round(x, 6) for x in weights]
    return pd.DataFrame.from_dict(signals)


def signal_mean_variance(T, tickers):
    return build_signals(T, tickers, min_variance)


def signal_MAD(T, tickers):
    return build_signals(T, tickers, MAD)


def signal_CVaR(T, tickers):
    return build_signals(T, tickers, cVar)


def first_trading_days(start, end):
    """First NYSE trading day of each month between start and end, as datetime.date."""
    schedule = mcal.get_calendar('NYSE').schedule(start_date=start, end_date=end)
    firsts = schedule.groupby(schedule.index.month).first()
    return [ts.date() for ts in firsts['market_open']]


# Each 2020 rebalance date uses returns from the first trading day of the same
# month in 2019 up to that date (a trailing one-year window).
T = list(zip(first_trading_days('2019-01-01', '2019-12-31'),
             first_trading_days('2020-01-01', '2020-12-31')))

etf = pd.read_csv('stuff_application/tickers_etf.csv')
tickers = etf['ticker']

mv = signal_mean_variance(T, tickers)
mad = signal_MAD(T, tickers)
cvar = signal_CVaR(T, tickers)
mv.to_csv('signals_2020_MV.csv')
mad.to_csv('signals_2020_MAD.csv')
cvar.to_csv('signals_2020_CVaR.csv')
```
```python
# Script 2: backtest.
# Simulates a $100,000 portfolio rebalanced to the signal weights on the first
# trading day of each month. Lines marked ***** change between runs (year / method).
import datetime as dt

import pandas as pd
import pandas_market_calendars as mcal

import eod_ohlc_pull as eod  # the author's own price-download module (not shown)


def my_db_price(start, end, tickers):
    """Adjusted closes for every ticker, one row per SPY trading date."""
    db = {}
    df = pd.DataFrame.from_dict(eod.ohlc(start, end, 'SPY'))
    df['date'] = pd.to_datetime(df['date'])
    db['date'] = list(df.sort_values(by='date')['date'])
    for ticker in tickers:
        df = pd.DataFrame.from_dict(eod.ohlc(start, end, ticker))
        df['date'] = pd.to_datetime(df['date'])
        df = df.sort_values(by='date').reset_index(drop=True)
        if ticker == 'XLF':
            # Same 2020-02-17 (Presidents' Day) clean-up as in the signal script.
            df = df.loc[df['date'] != dt.datetime(2020, 2, 17)].reset_index(drop=True)
        db[ticker] = list(df['adjusted_close'])
    return pd.DataFrame.from_dict(db)


def price_on(t_day, ticker):
    """Adjusted close of `ticker` on `t_day` ('YYYY-MM-DD'), looked up in the global my_db."""
    return float(my_db.loc[my_db['date'] == pd.Timestamp(t_day), ticker].iloc[0])


def portfolio_calculations_rebalance(weights, tickers, P_assets, t_day, i):
    """Liquidate the current holdings at t_day's close and buy the new target weights."""
    if i == 0:
        initial_value = 1e5  # first trading day of the year: start from cash
    else:
        # Liquidation value of the previous day's share counts at today's close,
        # as described in the text (the last entry of each column is the total
        # value, so it is skipped).
        held = P_assets[list(P_assets.keys())[-1]][:-1]
        initial_value = sum(shares * price_on(t_day, ticker)
                            for shares, ticker in zip(held, tickers))
    P_assets[t_day] = []
    sum_val = 0.0
    for k, ticker in enumerate(tickers):
        price = price_on(t_day, ticker)
        asset_amount = float(weights[k]) * initial_value / price  # shares bought
        sum_val += price * asset_amount
        P_assets[t_day].append(asset_amount)
    P_assets[t_day].append(sum_val)  # total portfolio value
    return P_assets


def portfolio_calculations_daily(tickers, P_assets, t_day):
    """Carry the previous day's share counts forward and revalue them at t_day's close."""
    holdings = list(P_assets[list(P_assets.keys())[-1]])
    holdings[-1] = sum(holdings[k] * price_on(t_day, ticker)
                       for k, ticker in enumerate(tickers))
    P_assets[t_day] = holdings
    return P_assets


start = '2020-01-01'  # *****
end = '2020-12-31'  # *****
nyse = mcal.get_calendar('NYSE')
early = nyse.schedule(start_date=start, end_date=end)

signals = pd.read_csv('/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/2020/signals_2020_CVaR.csv')  # *****
signals = signals.drop('Unnamed: 0', axis=1)  # index column written by to_csv
tickers = list(signals['Ticker'])
my_db = my_db_price(start, end, tickers)

P_value_weights = {'asset': list(tickers)}  # target weights per day
P_assets = {'asset': list(tickers) + ['Total_value']}  # shares per asset plus total value, per day

for i in range(len(early)):  # iterate through the calendar year
    t_day = str(early['market_open'].iloc[i].date())
    if t_day in signals.columns:  # rebalancing day (first trading day of the month)
        weights = list(signals[t_day])
        P_value_weights[t_day] = weights
        P_assets = portfolio_calculations_rebalance(weights, tickers, P_assets, t_day, i)
        continue
    P_value_weights[t_day] = weights  # target weights carry over until the next rebalance
    P_assets = portfolio_calculations_daily(tickers, P_assets, t_day)

df = pd.DataFrame.from_dict(P_assets)
print(df)
df.to_csv('/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/2020/first_run_CVaR_2020.csv')  # *****
dw = pd.DataFrame.from_dict(P_value_weights)
dw.to_csv('/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/2020/first_run_CVaR_weights_2020.csv')  # *****
```
```python
# Script 3: graphing.
# Plots portfolio value for MV, MAD and CVaR, plus drawdown and CVaR panels.
import pandas as pd
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, output_file, save, show

import portfolio_stats as p_stats  # the author's own statistics module (not shown)


def load_backtest(path):
    """Backtest output: one row per asset (plus 'Total_value'), one column per date."""
    df = pd.read_csv(path).drop('Unnamed: 0', axis=1)
    return df.set_index('asset')


mv = load_backtest('/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/1_Year/2019/first_run_MV_2019.csv')
mad = load_backtest('/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/1_Year/2019/first_run_MAD_2019.csv')
cvar = load_backtest('/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/1_Year/2019/first_run_CVaR_2019.csv')

dates = pd.to_datetime(list(mv.columns))  # after set_index, every column is a date
data = {'dates': list(dates),
        'MV': list(mv.loc['Total_value']),
        'MAD': list(mad.loc['Total_value']),
        'CVaR': list(cvar.loc['Total_value'])}
source = ColumnDataSource(data=data)

p = figure(title="2019 Risk Measures Comparison", x_axis_type='datetime',
           x_axis_label='date', y_axis_label='Value', width=1000)  # *****
p.line(x='dates', y='MV', source=source, legend_label="MV", line_width=2, color='blue')
p.line(x='dates', y='MAD', source=source, legend_label="MAD", line_width=2, color='green')
p.line(x='dates', y='CVaR', source=source, legend_label="CVaR", line_width=2, color='red')
p.add_tools(HoverTool(
    tooltips=[
        ('MV', '$@MV{%0.2f}'),
        ('MAD', '$@MAD{%0.2f}'),
        ('CVaR', '$@CVaR{%0.2f}'),
    ],
    # printf formatters are keyed by the field name, including the '@'.
    formatters={'@MV': 'printf', '@MAD': 'printf', '@CVaR': 'printf'},
    # mode='vline',  # show a tooltip whenever the cursor is vertically in line with a glyph
))

files_weights = ['1_Year/2019/first_run_MV_weights_2019.csv', '1_Year/2019/first_run_MAD_weights_2019.csv', '1_Year/2019/first_run_CVaR_weights_2019.csv']
files_values = ['1_Year/2019/first_run_MV_2019.csv', '1_Year/2019/first_run_MAD_2019.csv', '1_Year/2019/first_run_CVaR_2019.csv']
r_cvar = p_stats.risk_stats(files_weights)
draw_down = p_stats.draw_down(files_values)

# output_file("/Users/roo_mac_mini/Library/CloudStorage/OneDrive-Personal/CFRM503/project_1/stuff_application/1_Year/2019/2019_1_year_history.html")
# save(p)
grid = gridplot([[p], [draw_down], [r_cvar]])
show(grid)
```