One of the major issues of quantitative investing is its heavy reliance on historical data. When a financial market experiences structural changes, historical data tend not to capture those changes in time. Markowitz portfolio optimization is a common framework for constructing investment strategies, and it usually assumes mean returns and covariances are estimated from historical data. In this project, we compare five different methods of estimating returns, while keeping historical data for covariances, to help the Markowitz portfolio make better predictions under different market circumstances. More specifically, we compare the performance of portfolios constructed with an exponentially weighted historical return estimate, a macroeconomic factor model estimate, two time-series estimates (ARIMA and GARCH), and an equal-weighted benchmark. We focus on the portfolios' returns and Sharpe ratios when comparing their performance.
Portfolio selection has been discussed widely across the industry for a long time, and the traditional Markowitz mean-variance optimization serves as a basic theory of portfolio selection. However, there are some controversial views of the model as well: \cite{michaud1989markowitz} points out that the accuracy of the model relies on predictions of the returns and variances of assets, while mean-variance optimization will "overweight those securities that have large estimated returns, negative correlations and small variances." This concern is natural, since the mean-variance optimization problem depends heavily on the estimates of return and variance, and estimation error can be amplified by the optimization framework. In this paper, we focus on exploring different methods of estimating returns and testing their performance in portfolio selection.
The paper is organized as follows: the second section introduces the data collection and cleaning process. The third section introduces the basic framework of mean-variance optimization we use and four methods of estimating returns: the exponentially weighted model, the factor model, ARIMA, and GARCH. The factor model part includes a subsection explaining our factor selection using PCA. In the fourth section we back-test the four models, along with the equal-weighted portfolio as a benchmark, over different time periods. In the last section, we point out some potential directions for future work.
In order to construct a multi-asset portfolio, we choose nine ETFs with relatively long histories and continuous data: Materials Select Sector SPDR Fund (NYSE:XLB), Energy Select Sector SPDR Fund (NYSE:XLE), Financial Select Sector SPDR Fund (NYSE:XLF), Industrial Select Sector SPDR Fund (NYSE:XLI), Technology Select Sector SPDR Fund (NYSE:XLK), Consumer Staples Select Sector SPDR Fund (NYSE:XLP), Utilities Select Sector SPDR Fund (NYSE:XLU), Health Care Select Sector SPDR Fund (NYSE:XLV), and Consumer Discretionary Select Sector SPDR Fund (NYSE:XLY). These nine ETFs cover a wide range of assets and are less sensitive to market turmoil than a single stock.
Besides the ETF return data, we also need data on key macroeconomic factors. Since most macroeconomic factors are updated monthly, we consistently use monthly returns in our study. We obtain monthly data from January 1999 to December 2018 from Yahoo Finance.
A total of 22 macroeconomic factors, such as the unemployment rate and the consumer confidence index, were chosen initially based on monthly reports. We then performed a Principal Component Analysis to reduce these factors to five principal components.
In this study, we follow the basic Markowitz mean-variance optimization framework. Within this framework, we can either minimize risk subject to a constraint on expected return, or maximize return subject to a constraint on risk. For this project, we use the return-maximization formulation, which is written as follows:
\begin{gather*}
\text{max } \mu_0x_0 + \mu^Tx \\
\text{s.t. } x^TVx \leq \sigma^2\\
x_0 + e^Tx + trans.cost = 1\\
x = \bar{x}+y\\
trans.cost \cdot \sum_{i=1}^n |y_i| = total.trans.cost
\end{gather*}
Here \(\bar{x}\) denotes the current holdings and \(y\) the trades executed at rebalancing.
Risk free rate \(\mu_0\)
: We use the 3-Month Treasury Bill rate as the indicator of the risk-free rate, with the necessary transformation to monthly frequency.
Equal-weighted benchmark portfolio
: The equal-weighted portfolio gives each asset in the pool the same weight and serves as a baseline for risk and return.
Risk constraint \(\sigma\)
: We set \(\sigma\) by multiplying the risk of the equal-weighted portfolio by an allowable risk multiplier.
Transaction cost
: Transaction costs in trading include many components. For ETFs, these typically include internal costs such as management fees, external costs generated during trading, and taxes.
In this study, we set the transaction cost rate to 0.005.
Initial Wealth
: In this study, initial wealth is set to 10000 for all models.
Restriction on the portfolio
\[ \lvert x_i - 0.05 \rvert \leq 0.10 \]
We can manually add constraints to the optimization problem. For example, if we do not want the portfolio to concentrate too heavily in any one ETF, we can add a condition in the CVX solver that each asset should compose between -5\% and 15\% of the portfolio, as above.
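As a rough illustration of this return-maximization problem (the study itself solves it with a CVX solver), the sketch below brute-forces the optimum over a weight grid for two risky assets plus cash, ignoring transaction costs. The function name, grid search, and example numbers are our own illustrative choices, not the paper's implementation.

```python
import itertools

def max_return_portfolio(mu0, mu, V, sigma2, step=0.01):
    """Brute-force sketch of: maximize mu0*x0 + mu'x
    subject to x'Vx <= sigma^2 and x0 + sum(x) = 1,
    for two risky assets with long-only weights (transaction costs ignored)."""
    best_ret, best_x = None, None
    grid = [i * step for i in range(int(round(1 / step)) + 1)]
    for x1, x2 in itertools.product(grid, grid):
        if x1 + x2 > 1:                      # remaining budget goes to cash x0
            continue
        x0 = 1 - x1 - x2
        risk = V[0][0]*x1*x1 + 2*V[0][1]*x1*x2 + V[1][1]*x2*x2
        if risk > sigma2:                    # risk constraint x'Vx <= sigma^2
            continue
        ret = mu0 * x0 + mu[0]*x1 + mu[1]*x2
        if best_ret is None or ret > best_ret:
            best_ret, best_x = ret, (x0, x1, x2)
    return best_ret, best_x
```

A convex solver handles the full nine-asset problem with transaction costs directly; the grid search only shows how the objective and constraints interact.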
In order to find a reliable estimate of $\mu_T$, the most straightforward method is a weighted average of historical returns:
\[\mu_T = \frac{1}{N} \left( (1-\gamma)r_{T-1} + (1-\gamma)^2 r_{T-2} + \dots + (1-\gamma)^N r_{T-N}\right)\]
where \(\gamma\) is the rate of decay.
This method approximates the return at time \(T\) using a weighted moving average of the previous \(N\) periods of historical returns, putting more emphasis on recent data than on older data. We can expect this method to give a reliable estimate because it only produces a one-step-ahead approximation of the return. Under the assumption that the market does not change dramatically in a very short period, this approximation performs well when the market is not experiencing large fluctuations and the frequency of the historical data is relatively high.
This method is our default mean-variance optimization model (we refer to it later as the Markowitz model).
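A minimal sketch of this estimator, with the same weighting scheme as the formula above (the function name and argument order are our own):

```python
def ewma_return(returns, gamma, N):
    """Exponentially weighted estimate of the next-period return:
    mu_T = (1/N) * sum_{k=1..N} (1-gamma)^k * r_{T-k},
    where `returns` is ordered oldest -> newest."""
    recent = list(returns[-N:])                # last N observations
    weighted = sum((1 - gamma) ** (k + 1) * r  # k=0 is the most recent period
                   for k, r in enumerate(reversed(recent)))
    return weighted / N
```

With `gamma = 0` this reduces to a plain moving average of the last N returns; larger `gamma` discounts older observations more heavily.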
The ARIMA(p,d,q) model predicts future values as linear combinations of previous values and previous error terms. Compared to the exponential moving average method, ARIMA chooses the appropriate number of lags to include, rather than simply averaging over the previous \(N\) weighted data points.
To manually select the parameters of an ARIMA model, we first need to determine whether our data set is stationary. A stationary time series has a constant mean, a constant variance, and an autocovariance that depends only on the lag. Thus, the first step is to transform the data set into a stationary time series. Then, we can apply an autoregressive moving-average (ARMA) model to choose the appropriate lags and coefficients to fit the stationary series. Compared to simply averaging over a time period, the ARIMA model captures the relationship between the value at time \(T\) and its previous values more precisely, and thus might have better predictive power than the weighted moving average method.
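To illustrate the one-step-ahead idea in its simplest special case, the sketch below fits an AR(1) model by ordinary least squares. The study fits full ARIMA(p,d,q) models with automatic lag selection; this pure-Python version, with our own function names, shows only the AR(1) fit-and-forecast step.

```python
def fit_ar1(series):
    """Fit r_t = c + phi * r_{t-1} by least squares on consecutive pairs."""
    x, y = series[:-1], series[1:]           # lagged and current values
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    return c, phi

def forecast_ar1(series, c, phi):
    """One-step-ahead forecast from the fitted AR(1)."""
    return c + phi * series[-1]
```

In practice a library such as statsmodels would also difference the series (the "I" in ARIMA) and select p and q by an information criterion.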
Although the ARIMA model can handle non-stationarity in the mean, it does not work well with non-stationary variance, which is very common in financial data. The GARCH model incorporates the variance as a changing variable in its predictions. This reflects the reality of ETF returns: during market turmoil the variance can be extremely high, while at other times it stays relatively stable. The GARCH(m,r) model can be written as:
\[ x_t = \sigma_t e_t \]
\[ \sigma_t^2 = \alpha_0 + \alpha_1 x_{t-1}^2 + \dots + \alpha_m x_{t-m}^2 + \beta_1\sigma_{t-1}^2 + \dots + \beta_r\sigma_{t-r}^2\]
In our study, we choose GARCH(1,1) as our prediction tool for returns.
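Given fitted parameters, the GARCH(1,1) variance forecast is just the recursion above iterated over the sample. The sketch below, with illustrative parameter values and our own function name, shows that recursion; in practice the \(\alpha\) and \(\beta\) coefficients are estimated by maximum likelihood (e.g. with the `arch` package).

```python
def garch11_next_variance(returns, alpha0, alpha1, beta1):
    """Iterate the GARCH(1,1) recursion
    sigma_t^2 = alpha0 + alpha1 * x_{t-1}^2 + beta1 * sigma_{t-1}^2
    over the sample and return the variance forecast for the next period.
    The recursion is seeded with the unconditional variance."""
    var = alpha0 / (1 - alpha1 - beta1)      # long-run (unconditional) variance
    for x in returns:
        var = alpha0 + alpha1 * x * x + beta1 * var
    return var
```

Because the latest squared return feeds directly into the next variance, a turbulent month immediately raises the forecast, which is the "aggressive" behavior discussed in the back-tests below.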
Since there are nine ETF assets in total, the number of factors we choose cannot exceed that number, to avoid over-fitting. In order to take all 22 factors into consideration, we perform Principal Component Analysis to reduce the dimension of the factor set.
We first calculate the monthly returns of all the factors. We choose five principal components; from principal component 1 to principal component 5, they explain 0.592, 0.235, 0.069, 0.059 and 0.021 of the total variance, respectively. Together, these five principal components explain 97.6\% of the total variance of the return data.
Since the factor loading matrix is not unique, a factor rotation can be performed so that each variable has a large loading on a single factor, as in common factor analysis. From the partial loading matrix after rotation below, the maximum absolute loading of each initial feature is used to determine which principal component that factor is classified under. The complete transformed loading matrix can be found in the appendix.
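The explained-variance ratios reported above are ratios of eigenvalues of the factor covariance matrix. As a toy two-dimensional sketch (the study applies PCA to all 22 factors; the function name is ours), the closed-form eigenvalues of a 2x2 covariance matrix give the ratios directly:

```python
import math

def pca2_explained(a, b, c):
    """Explained-variance ratios of the two principal components of the
    2x2 covariance matrix [[a, b], [b, c]], via closed-form eigenvalues."""
    mean = (a + c) / 2.0
    half = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam1, lam2 = mean + half, mean - half    # eigenvalues, largest first
    total = lam1 + lam2                      # equals the trace a + c
    return lam1 / total, lam2 / total
```

For the 22-factor case the same computation is done numerically (eigendecomposition of the 22x22 covariance matrix), keeping the components whose cumulative ratio reaches the desired threshold.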
In the factor model, there are several parameters to decide. We test rebalance start dates of 90, 100, and 120 periods after the first day of the data set, and we set the number of samples to 20 and 50 when computing return averages and covariance matrices. We choose a horizon of 1 (one month), since our data are at monthly frequency. Since there are 22 macroeconomic factors in total, we perform PCA to condense them into five principal component factors.
The basic equation for the factor model is:
\begin{gather*}
r_i = \alpha_i + \sum_{j=1}^{m}\beta_{ij}f_j+\epsilon_i\\
\mu = \alpha + BE(f)\\
V = BFB^T + \Delta
\end{gather*}
Since the macroeconomic factor returns after the PCA transformation, \(f\), and the actual returns, \(r\), are observable, we first need to calculate \(\alpha\), \(\beta\), and \(\epsilon\), and then calculate \(\Delta\), the mean, and the covariance correspondingly.
In this project, we capture the relationship between asset returns and macroeconomic factor returns using linear regression, storing the information in \(\alpha\), \(\beta\), and \(\epsilon\). Using the stored information, we can calculate the mean and variance. The calculation is done iteratively at each rebalance, and each time we move the windows of asset returns and macroeconomic factor returns forward by the frequency we specify (1 month).
The macroeconomic factor model is also compared to the original mean-variance optimization and the equal-weighted benchmark. All models use the same parameter settings, so that we can compare performance across models.
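The regression step can be sketched for a single factor (the study regresses each asset on five principal-component factors; the function names below are illustrative):

```python
def ols_factor(f, r):
    """Regress asset returns r on one factor f: r_t = alpha + beta * f_t + eps_t."""
    n = len(f)
    mf, mr = sum(f) / n, sum(r) / n
    beta = (sum((a - mf) * (b - mr) for a, b in zip(f, r))
            / sum((a - mf) ** 2 for a in f))
    alpha = mr - beta * mf
    resid = [b - alpha - beta * a for a, b in zip(f, r)]   # eps estimates
    return alpha, beta, resid

def factor_mu(alpha, beta, ef):
    """Factor-model return estimate mu = alpha + beta * E(f)."""
    return alpha + beta * ef
```

With multiple factors, \(\beta\) becomes the row of the loading matrix \(B\), and the residual variances form the diagonal matrix \(\Delta\) in \(V = BFB^T + \Delta\).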
We have built three mean-prediction models: ARIMA, GARCH, and the PCA macroeconomic factor model. We want to compare the performance of these three models and the two benchmark models described before (the Markowitz model and the equal-weighted benchmark). The performance of these models varies with the parameters (start time, sample time window, and number of rebalances).
The 2008 financial crisis is a special period that deserves attention. In figure 4.1.1 below, we choose January 2006 as the start date and look at the performance of the five models. There is a dramatic drop in the 32nd month from January 2006, which corresponds to the 2008 financial crisis. In general, the wealth of all models decreases. However, among these models, the decrease of the Markowitz model (the yellow line) is the smallest, followed by ARIMA, the factor model, and the benchmark. The worst performer during this period is the GARCH model. Moreover, the wealth gap between GARCH and the other models is very large. The main reason could be that the GARCH model is quite "aggressive," since the model itself adjusts its lag values regardless of changes in the number of samples (the time window). The output shows that the GARCH model performs poorly in volatile periods. However, after the crisis, the GARCH model recovers and outperforms the benchmark and the factor model around period 90.
In this time period, the Markowitz model performs consistently well, in 2008 and in other months. Its wealth stays high during the 2008 crisis, and it remains the best performer afterwards, followed by the ARIMA, GARCH, and benchmark models.
Regarding the factor model, although it performs better than the benchmark in 2008, it gradually becomes the worst among the five models. The reason might be that the factor model contains some leading macroeconomic factors that can forecast future downturns. It is a conservative method of predicting returns, and thus performs better during recessions but very poorly during expansions. The model is also too pessimistic to load up on high-risk, high-return assets.
In conclusion, the Markowitz model is the best most of the time, while the GARCH model performs worst during market turbulence. The GARCH model, however, adjusts quickly during economic expansions. The factor model performs well during the 2008 crisis, but it can have the worst performance of all the models under stable market conditions.
To an investor, choosing a different starting time to invest can result in quite different wealth, even when using the same model for prediction. In figure 4.2.1 the investment starts in 2009, while in figure 4.1.1 it starts in 2006, so the period in figure 4.1.1 includes the year 2008. The results from the two starting times are quite different.
If we start investing in 2009, the overall financial environment is warming up, which makes the GARCH model the best among all the models. The wealth gap between the GARCH model and the other models grows larger every consecutive year. The Markowitz and ARIMA models follow as second and third, while the worst model is still the PCA factor model.
As mentioned before, the GARCH model adjusts quickly to a good financial environment, so in the long run GARCH gives good predictions in a stable financial market. Also, because of its "aggressive" property, the return of this model is very attractive.
However, with this starting point, the Markowitz model has mediocre performance compared to the previous case. This can be explained by the fact that, without the wealth advantage accumulated during the 2008 crisis, Markowitz has no major advantage over the benchmark and ARIMA models.
Figure 4.3.2 below summarizes six different tests and their final wealth. From the table, we can see that the investment started in 2006 ends with the lowest wealth, followed by 2007, while the highest wealth comes from starting in 2009. This is because the 2008 recession reduces the investment wealth considerably, and later gains must first cover the loss.
We set different numbers of samples to calculate the mean and variance in this study. In the factor model, the number of samples affects the coefficients, constants, and residuals of the linear regression. In the Markowitz model, the number of samples affects the weighted-average estimates of the mean and variance.
According to figures 4.3.1 and 4.1.1, the ranking of the models by wealth over 2006 to 2008 is not affected by the difference in sample size. In other words, the Markowitz model is the best, followed by ARIMA, GARCH, the benchmark, and the PCA factor model. The only difference is a small difference in total wealth:
From figure 4.3.2, not only is the total final investment wealth with sample size 50 always smaller than with sample size 20, but so is the annualized rate of return. The main reason could be that a window of 50 months (more than four years) is too long for predicting future returns. A window of 20 months captures more recent market changes and information, and thus predicts future risky asset returns better than a window of 50.
To sum up, using different lengths of time window to calculate the mean and covariance does not affect the ranking of model performance under the three scenarios we test, but it does result in different return predictions and hence different final investment wealth.
Applying these five models to estimate returns, we find that no single model gives the best prediction across all time periods and parameter settings. If the market does not experience large turbulence, the GARCH model gives the best portfolio-selection performance. The Markowitz model performs well overall at monthly frequency, with better results during financial market turmoil. The PCA factor model, however, does not perform well in the long run. In general, PCA takes all factors into consideration instead of selecting the important factors related to asset returns. Although PCA performs well in explaining total variance and avoiding correlation, that does not make it a good factor-selection tool. The ARIMA model also fails to outperform the Markowitz model. This might be because the exponential averaging technique performs well in the one-step-ahead scenario; ARIMA is more conservative than GARCH, but has no substantially greater predictive power than the exponentially weighted estimate of returns.