GitHub Repo of project: https://github.com/RooFernando/CFRM_523_project
Note: This final project builds on the original paper, which
was replicated.
Title: Constructing Cointegrated Cryptocurrency Portfolios for
Statistical Arbitrage
Authors: Tim Leung, Hung Nguyen
The original proposal objective was to identify prime stat-arb
trading hours for cryptocurrencies. This implicitly assumes that the
cointegration strategy is profitable; we were looking for the hours of
the day in which cointegrated relationships were most abundant. Once we
found the period with the most cointegrated relationships, the strategy
would be traded only within that window of time. The assets being tested
were originally BTC, ETH, LTC, BCH, and XRP. However, this changed to
BTC, ETH, LTC, and SOL due to data quality issues on the minute scale.
Just as in the original paper, spreads consisting of varying
combinations of the available assets were traded.
Through the process this objective changed in a few ways. Originally
we planned to split the project into 2 parts. Part 1 was to find the
two-hour window of time with the most cointegrated relationships between
assets. Part 2 was to backtest the cointegration strategy on the optimal
times found in Part 1.
As we worked through the project, this methodology proved flawed in a
few ways. The largest flaw was the assumption that the cointegration
trading strategy was profitable. Some spreads are profitable, but that
profit quickly vanishes once a reasonable commission of 0.1% per
transaction is added. We also found that splitting the project into 2
parts was redundant. Optimizing the strategy to maximize net profit
while adjusting the time of day to trade and the trading window (how long
to trade each day) essentially combines the two parts.
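To make the effect of the commission concrete, here is a minimal sketch. It assumes the 0.1% fee is charged on the notional of every leg at both entry and exit; the function name and the example numbers are hypothetical and are not results from the backtests.

```python
def net_profit(gross_profit, entry_notionals, exit_notionals, rate=0.001):
    """Gross profit minus commission charged on each leg's traded notional.

    Assumption: the fee applies to the absolute dollar value of every leg,
    once when the spread is entered and once when it is exited.
    """
    traded = sum(abs(n) for n in entry_notionals) + sum(abs(n) for n in exit_notionals)
    return gross_profit - rate * traded


# Hypothetical round trip: two legs of about $10,000 each and $30 gross profit.
example = net_profit(30.0, [10_000, -10_000], [10_015, -9_985])
print(example)
```

In this made-up case $40,000 of notional is traded in total, so the commission is about $40 and a $30 gross gain becomes a loss of about $10.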
So going forward, the strategy being tested is the cointegrated relationship between the assets BTC, ETH, LTC, and SOL. The original signal, the spread crossing the upper or lower bound of the cointegrated relationship, is supplemented by a signal from a machine learning forecast. Once a boundary is crossed, we only enter the trade if a separate ML forecast of the spread is also in the direction of mean reversion. The traditional cointegration strategy is then compared against the same strategy with the ML forecast signal added.
## ['BTC-USD', 'ETH-USD']
## ['BTC-USD', 'LTC-USD']
## ['BTC-USD', 'SOL-USD']
## ['ETH-USD', 'LTC-USD']
## ['ETH-USD', 'SOL-USD']
## ['LTC-USD', 'SOL-USD']
## ['BTC-USD', 'ETH-USD', 'LTC-USD']
## ['BTC-USD', 'ETH-USD', 'SOL-USD']
## ['BTC-USD', 'LTC-USD', 'SOL-USD']
## ['ETH-USD', 'LTC-USD', 'SOL-USD']
## ['BTC-USD', 'ETH-USD', 'LTC-USD', 'SOL-USD']
The above are all combinations of the assets being analyzed. There are a total of 11 spreads. We hope to find a profitable spread.
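The listing above can be reproduced with a short sketch like the following, which enumerates every combination of two or more assets. The function name `all_spreads` is a hypothetical helper, not necessarily what the project code uses.

```python
from itertools import combinations

ASSETS = ["BTC-USD", "ETH-USD", "LTC-USD", "SOL-USD"]


def all_spreads(assets):
    """Return every combination of 2 or more assets, smallest sizes first."""
    return [
        list(combo)
        for size in range(2, len(assets) + 1)
        for combo in combinations(assets, size)
    ]


spreads = all_spreads(ASSETS)
for spread in spreads:
    print(spread)
```

With four assets this gives 6 pairs, 4 triples, and 1 four-asset spread, for 11 in total.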
Cointegration:
Once a cointegrated relationship is identified, the buy and sell
signals depend on the spread crossing the established lower or upper
boundary. This boundary depends on sigma, which is one of the parameters
to optimize, just as in the original paper.
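As a rough illustration of how such boundaries could be computed, the sketch below places them a chosen number of standard deviations around the mean of the spread over the lookback window. Using the population standard deviation and the name `cointegration_bounds` are assumptions made for this example.

```python
from statistics import mean, pstdev


def cointegration_bounds(spread_history, sigma):
    """Return (lower, upper) bounds set `sigma` standard deviations from the mean.

    Assumption: the population standard deviation of the lookback spread
    values is used; the project may use a different estimator.
    """
    mu = mean(spread_history)
    sd = pstdev(spread_history)
    return mu - sigma * sd, mu + sigma * sd


# Toy spread history, for illustration only.
lower, upper = cointegration_bounds([1.0, 2.0, 3.0, 4.0, 5.0], sigma=2)
```

A larger sigma widens the band, so fewer crossings (and fewer trades) are expected.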
Machine Learning:
Spreads will be calculated from each asset's DNN price forecast and
used for sizing the trade. Example: if the spread crosses the lower
boundary (we want to buy the spread) and our spread forecast for t+60 is
in the direction of mean reversion, we buy 1 unit of the spread. The
strategy is still cointegration, but we use ML for confidence in mean
reversion.
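The entry rule described above can be sketched as follows. This is only an illustration under stated assumptions: "in the direction of mean reversion" is taken to mean that the forecast spread lies on the mean-reverting side of the current spread, and the function names are hypothetical.

```python
def forecast_spread(weights, forecast_prices):
    """Combine per-asset price forecasts into a spread forecast."""
    return sum(w * forecast_prices[ticker] for ticker, w in weights.items())


def entry_signal(spread_now, lower, upper, spread_forecast):
    """Return +1 to buy one unit of the spread, -1 to sell one unit, 0 to stay out.

    A boundary crossing alone is not enough: the forecast must also point
    back toward the mean (assumed interpretation of the ML confirmation).
    """
    if spread_now < lower and spread_forecast > spread_now:
        return 1
    if spread_now > upper and spread_forecast < spread_now:
        return -1
    return 0
```

For instance, `entry_signal(90, 95, 105, 93)` returns `1`, while `entry_signal(90, 95, 105, 88)` returns `0` because the forecast points further away from the mean.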
Constraints:
- Stop loss of 10%.
- Trailing stop loss of 5%.
- Close out trades at a later-defined time of day (only trade within the
  optimized window).
- The cointegration model is based on a history lookback window (the time
  needed to formulate the relationship).
- Buy and sell one unit of a defined spread.
- A trade is exited when:
  - it is stopped out, or
  - the spread reaches the other boundary. Example: buy 1 unit of the
    spread at the lower boundary and exit once the spread reaches the
    upper boundary.
Objective:
- Maximize Net Profit.
- The parameters history lookback, trading window, and sigma are all optimized with the objective of maximizing Net Profit.

Benchmark:
- The original cointegration strategy will be used as the benchmark.
- The cointegration strategy + ML is the contender.
## date ticker open high low close volume
## 0 2024-03-01 BTC-USD 61179.03 61240.13 61162.83 61240.13 67.244532
## 1 2024-03-01 LTC-USD 79.99 80.23 79.99 80.23 473.582634
## 2 2024-03-01 SOL-USD 125.74 126.14 125.68 126.08 1108.448844
## 3 2024-03-01 ETH-USD 3341.78 3344.53 3340.15 3343.95 164.428233
Indicator 1: Cointegration upper and lower thresholds
- Applying the two-step method per spread, we define an upper and a lower bound, each set sigma standard deviations away from the mean.

Indicator 2: ML 60-minute forward forecast
- Once the spread has crossed either the upper or lower boundary of the formulated cointegrated relationship, the ML model is triggered to forecast 60 minutes ahead.
- The model is only run for predictions when the cointegration boundaries are crossed.

These indicators are tested separately in the final results.
## ADF BTC : 0.0
## Phillips Perron BTC: 0.0
## KPSS BTC: 0.8014388295028906
## ADF ETH : 0.0
## Phillips Perron ETH: 0.0
## KPSS ETH: 0.7962983160499207
## ADF LTC : 9.216253748417339e-21
## Phillips Perron LTC: 0.0
## KPSS LTC: 0.07677680498885443
LinearRegression()
## coefficients: [ 1. -17.07767596 46.483788 ]
## intercept: 7623.439888223496
As in the paper replication, we test the once-differenced asset prices for stationarity. The ADF and PP null hypotheses of non-stationarity are rejected for all 3 assets above, and the KPSS null hypothesis of stationarity is not rejected for any of them (at the 5% level).
Our coefficients come out to be: [ 1.0, -17.07767596, 46.483788 ]
We see from the image above a cointegrated relationship between the assets BTC, ETH, and LTC over 12 hours.
Signal 1: Using Cointegration
Hypothetical Cointegrated Relationship from the calculations above:
\(S_t = 1\,BTC_t - 17\,ETH_t + 46\,LTC_t\)
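As a worked example of evaluating this relationship, the sketch below plugs in the closing prices from the 2024-03-01 sample rows shown earlier. It is purely illustrative: the weights are the rounded hypothetical ones, and the helper name `spread_value` is an assumption.

```python
# Rounded weights from the hypothetical relationship S_t = 1*BTC - 17*ETH + 46*LTC.
WEIGHTS = {"BTC-USD": 1.0, "ETH-USD": -17.0, "LTC-USD": 46.0}


def spread_value(prices, weights=WEIGHTS):
    """Value of one unit of the spread for a dict of ticker -> price."""
    return sum(w * prices[ticker] for ticker, w in weights.items())


# Closing prices taken from the sample data rows (2024-03-01).
closes = {"BTC-USD": 61240.13, "ETH-USD": 3343.95, "LTC-USD": 80.23}
s = spread_value(closes)
print(round(s, 2))
```

Here 61240.13 - 17 × 3343.95 + 46 × 80.23 comes to roughly 8083.56, which is the level that would be compared against the lower and upper boundaries.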
Signal 2: Using Machine Learning (Disclaimer: Signal 2 is only
triggered if Signal 1 is triggered)
We follow the same hypothetical cointegrated relationship from
above.
The rules of the strategy are similar to the original paper
replication.
We test the rules incrementally by running two analyses.
Parameters available for optimization:
Walk forward analysis was implemented to optimize the \(\sigma\) value. The objective function was
Total Net Profit. The purpose of applying walk forward analysis is to
minimize overfitting of the parameters being optimized.
Implementation: Rolling window
\(\sigma\) = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]
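A minimal sketch of a rolling walk forward loop over this grid is shown below. It assumes a 5-day training window followed by a 1-day test window that advances one day at a time; the exact window lengths, the function names, and the `backtest` callable (which should return Total Net Profit for a sigma over a set of days) are all hypothetical.

```python
SIGMA_GRID = [1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]


def walk_forward_splits(n_days, train_len=5, test_len=1):
    """Yield (train_days, test_days) index ranges for a rolling window."""
    start = 0
    while start + train_len + test_len <= n_days:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        yield train, test
        start += test_len  # roll the window forward by one test period


def pick_sigma(train_days, backtest, grid=SIGMA_GRID):
    """Choose the sigma with the highest net profit on the training days."""
    return max(grid, key=lambda sigma: backtest(sigma, train_days))


def walk_forward(n_days, backtest):
    """Return a list of (chosen_sigma, out_of_sample_profit) per split."""
    results = []
    for train, test in walk_forward_splits(n_days):
        best = pick_sigma(train, backtest)
        results.append((best, backtest(best, test)))
    return results
```

Each sigma is chosen only from data that precedes the period it is then tested on, which is what limits overfitting relative to optimizing on the whole month at once.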
There is opportunity for overfitting on the training set, especially with regard to the history lookback and trading window parameters. This is because the entirety of the March data was used when backtesting each spread once per parameter value. There is less opportunity for overfitting the sigma parameter, as it was obtained by a robust walk forward analysis.
ML models may be prone to overfitting as well. Though the training and validation data seem to be solid for the test set in March, performance is likely to vary on the full testing of April.
This process can be implemented on essentially any asset class. Pairs trading is quite common; most of the cohort's presentations were on pairs trading.
Adding sophistication:
Assets: BTC, ETH, LTC, SOL
History Lookback and Trading Window:
- History lookback is the amount of bars used to formulate our
cointegrated relationship.
- Trading Window is the window of time trading is permitted.
Disclaimer:
These two parameters were optimized using the training period of March
entirely. Optimizing Sigma uses the walk forward analysis.
We set the history lookback to 21 hours, as 3 spreads are profitable there and the non-profitable spreads appear to be at a local minimum at 21 hours.
Looking at the results, we see that a good trading window can be identified at 18 hours. Though this value doesn’t yield the highest Net Profit, it does show two spreads to be profitable. Most of the spreads are not profitable, but at a trading window of 18 hours the other spreads lose less money.
Walk Forward analysis was applied
Summary: The walk forward method was used to find optimal
parameters for sigma and reduce the likelihood of overfitting. A rolling
window of 5 days was moved through the entire month of March.
Example of the process of the Walk Forward method for spread
1:
Disclaimer: Some sigma values in the table above show a “Total Net Profit” of 0 on the testing period (meaning no trades). This is not the case in the training period, as backtests that resulted in 0 trades for a sigma value were discarded.
Through analysis we conclude that a sigma value of 3.5 is ideal. Looking at the full data, there isn’t much profitability to begin with. The most frequently occurring sigma values from the training subsets are 3.5 and 5. However, on closer examination, a sigma of 5 results in no trades, with a Total Net Profit of $0. So an optimal sigma value of 3.5 is chosen.
We use the optimized parameters obtained from the previous section.
Backtesting was done for both \(\sigma\) values of 3.5 and 5, with the
lookback window being 21 hours, and window of trading per day being 18
hours (from midnight to 6 pm UTC).
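A trading-window filter of this kind could look like the sketch below, which checks whether a bar's timestamp falls inside the 18-hour window starting at midnight UTC. The function name and its parameters are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def in_trading_window(ts, start_hour=0, window_hours=18):
    """True if `ts` falls within `window_hours` after `start_hour` (UTC).

    `ts` is assumed to be timezone-aware. The modulo keeps the check
    valid for windows that wrap past midnight.
    """
    hour_utc = ts.astimezone(timezone.utc).hour
    return (hour_utc - start_hour) % 24 < window_hours


# 17:59 UTC is inside the window; 18:00 UTC is the first bar outside it.
inside = in_trading_window(datetime(2024, 4, 1, 17, 59, tzinfo=timezone.utc))
outside = in_trading_window(datetime(2024, 4, 1, 18, 0, tzinfo=timezone.utc))
```

Entry signals would only be acted on when this check passes, and open trades would be closed out at the end of the window, in line with the constraints listed earlier.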
First we use cointegration with no ML to see how our strategy played out in the month of April.
Lookback Window: 21
Window of Trading: 18
Sigma value: 3.5
Lookback Window: 21
Window of Trading: 18
Sigma value: 5
We see that the results were almost all unprofitable for either case of
sigma (3.5, 5).
Drawdown does generally decrease for a sigma value of 5. This can be the
result of a decrease in trading as signals are less likely to be
triggered with such a high sigma value.
Lookback Window: 21
Window of Trading: 18
Sigma value: 3.5
Lookback Window: 21
Window of Trading: 18
Sigma value: 5
Unfortunately, nearly all of the spreads remain unprofitable, with only
the pair “LTC-USD, SOL-USD” profiting slightly above $0. However, 9 out
of 11 spreads improved in “Total Net Profit” and “Drawdown” with the
addition of the ML forecasting. The spreads which didn’t improve were
“ETH-USD, SOL-USD” and “ETH-USD, LTC-USD, SOL-USD”.
Again, nearly all of the spreads remain unprofitable as above. However,
10 out of 11 spreads improved in “Total Net Profit”, while 9 out of 11
improved in “Drawdown” with 1 spread remaining the same.
When optimizing the parameters in March, there were 3 spreads that were clearly profitable. Those 3 spreads were not profitable whatsoever in April.
In conclusion, we see that adding a machine learning signal does in fact help increase profitability (even though the spreads are not profitable). When conducting the original backtests, commissions were set to 0.1%; this value was chosen because several exchanges online charged this fee. However, towards the end of this project a classmate, Jiachen, explained to me that Robinhood does not charge commission for trading crypto. With that knowledge, some of these spreads would be profitable, as they were hovering just near profitability.