Portfolio Diversification Analysis with PCA

In this article I will analyze my (long-only) stock portfolio using Principal Component Analysis (PCA) to see whether it is properly diversified. I will then use my findings to make an educated decision on how to improve the portfolio’s diversification, if required.

The portfolio consists of equal-weight positions in a number of ETFs, some of which hold thousands of stocks across multiple geographies. They are listed below:

  • iShares MSCI World Momentum Factor UCITS ETF – ISIN IE00BP3QZ825
  • iShares MSCI EM UCITS ETF (Acc) – ISIN IE00B4L5YC18
  • iShares MSCI EM Asia UCITS ETF (Acc) – ISIN IE00B5L8K969
  • SPDR S&P US Dividend Aristocrats UCITS ETF – ISIN IE00B6YX5D40
  • Vanguard FTSE All-World UCITS ETF (Acc) – ISIN IE00BK5BQT80

Although one might think that holding ETFs made up of thousands of stocks would give a portfolio proper diversification, this is not necessarily the case. I will use PCA to show that the above portfolio is not properly diversified and then use the results to decide how to improve it.

When talking about diversification, I mean the number of truly unrelated investments, or risk drivers, the portfolio has access to. Each risk factor must be uncorrelated with all of the other risk factors in the portfolio. The more uncorrelated bets I have in my portfolio, the higher its level of diversification.

By construction, principal components are uncorrelated (their eigenvectors are orthogonal to each other). I will compute the loadings of each principal component and look at how each asset loads on each component. Ideally, different assets would load heavily on different components. I also want to maximize the number of principal components required to explain roughly 75% of the variation in the portfolio.
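A minimal NumPy sketch of this idea, assuming the data is an array with one column per asset: standardizing the columns and taking the eigendecomposition of their covariance (that is, the correlation matrix) gives the loadings, and projecting the data onto them gives component scores that are mutually uncorrelated. The function name `pca_from_data` and the synthetic one-factor data are illustrative assumptions, not the code behind this analysis.

```python
import numpy as np

def pca_from_data(x):
    """PCA of standardized data via eigendecomposition of the correlation matrix.

    x: array of shape (n_observations, n_assets), one column per asset.
    Returns (explained_ratio, loadings, scores) sorted by descending variance.
    Loadings here are the eigenvector entries; some texts scale them by sqrt(eigenvalue).
    """
    x = np.asarray(x, dtype=float)
    # Mean-center and scale each column to unit variance
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Covariance of standardized data = correlation matrix of the original data
    corr = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs  # projection of each observation onto each component
    return eigvals / eigvals.sum(), eigvecs, scores

# Synthetic example: five assets driven mostly by one common factor
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
data = common + 0.3 * rng.normal(size=(500, 5))
ratio, loadings, scores = pca_from_data(data)
print(np.round(ratio, 3))  # the first component dominates
# Off-diagonal correlations between component scores are ~0
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```

With one strong common factor, the synthetic data behaves much like the portfolio discussed here: almost all of the variance lands on the first component.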

Finally, I will backtest the portfolio both before and after the changes suggested by the PCA. The backtest rebalances the assets monthly to equal weights over the test period and accounts for transaction costs (order_fees + 0.0025*dollar_amount + exchange_fees).
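The rebalancing and cost rule can be sketched as follows. This is a simplified, hypothetical model: `order_fee` and `exchange_fee` are placeholder flat amounts per trade (their actual values are not given here), the 0.0025 rate is applied to the absolute dollar amount of each trade, and costs are deducted before the remaining capital is split equally.

```python
import numpy as np

def backtest_equal_weight(monthly_returns, initial_capital=10_000.0,
                          order_fee=1.0, exchange_fee=0.5, pct_cost=0.0025):
    """Equal-weight portfolio rebalanced at the start of every month.

    monthly_returns: array (n_months, n_assets) of simple returns.
    Cost per trade = order_fee + pct_cost * |dollar amount| + exchange_fee.
    The default fee values are hypothetical placeholders.
    """
    monthly_returns = np.asarray(monthly_returns, dtype=float)
    n_months, n_assets = monthly_returns.shape
    target = np.full(n_assets, 1.0 / n_assets)  # equal weights
    holdings = np.zeros(n_assets)  # dollars held in each asset (start in cash)
    capital = float(initial_capital)  # portfolio value before rebalancing
    equity = []
    for t in range(n_months):
        # Dollar trades needed to get back to equal weights
        trades = target * capital - holdings
        traded = np.abs(trades[np.abs(trades) > 1e-9])  # ignore numerically zero trades
        costs = np.sum(order_fee + pct_cost * traded + exchange_fee)
        # Simplification: pay costs from capital, then split the remainder equally
        holdings = target * (capital - costs)
        # Apply this month's returns to each position
        holdings = holdings * (1.0 + monthly_returns[t])
        capital = float(holdings.sum())
        equity.append(capital)
    return np.array(equity)

# Usage with made-up returns for two assets over three months
returns = np.array([[0.02, -0.01], [0.01, 0.03], [-0.04, 0.00]])
print(backtest_equal_weight(returns))
```

Computing the costs from the pre-cost trade sizes is a small approximation; an exact treatment would solve for the trade sizes and costs jointly.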

For my analysis I use price data covering July 2019 to February 2021. This period was chosen because some of the ETFs do not date back further than July 2019. The pricing data can be obtained from msci.com and finance.yahoo.com.

The first thing I did was plot a correlation matrix of the monthly returns of all the assets in my portfolio, just to get a sense of how my investments behave and correlate with one another.
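One way to produce such a matrix with pandas, assuming daily prices in a DataFrame indexed by date with one column per ETF: keep the last price of each calendar month, convert to monthly returns and correlate. The column names `A`, `B` and `C` and the random prices below are made up for illustration.

```python
import numpy as np
import pandas as pd

def monthly_return_correlation(prices: pd.DataFrame) -> pd.DataFrame:
    """Correlation matrix of monthly simple returns computed from daily prices.

    prices: DataFrame indexed by date (DatetimeIndex), one column per asset.
    """
    # Keep the last available price of each calendar month
    monthly_prices = prices.groupby(prices.index.to_period("M")).last()
    # Month-over-month simple returns; the first month has no prior value
    monthly_returns = monthly_prices.pct_change().dropna()
    return monthly_returns.corr()

# Hypothetical daily prices over the same window as the article
dates = pd.bdate_range("2019-07-01", "2021-02-26")
rng = np.random.default_rng(42)
a = 100 * np.cumprod(1 + rng.normal(0, 0.01, len(dates)))
c = 50 * np.cumprod(1 + rng.normal(0, 0.01, len(dates)))
prices = pd.DataFrame({"A": a, "B": 2 * a, "C": c}, index=dates)  # B has the same returns as A
print(monthly_return_correlation(prices).round(2))
```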


From the figure above I can already see a high correlation between some of the holdings in my portfolio. FTSE seems to be less correlated with the rest, which is a good sign.

Next, I mean-centered and scaled the prices of each investment (feature) and ran a PCA on the data. Since PCA finds the feature subspace that maximizes the variance along its axes, it makes sense to standardize the data first; otherwise features with larger scales would dominate. It is worth mentioning that even though each feature has a variance of 1 after standardization, there will be directions (linear combinations of features) in which the variance is not 1, and the directions with the greatest variance are the ones PCA selects.
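A small NumPy check of the last point, using two hypothetical, strongly related series on very different scales: after standardization each column has variance 1, but the unit-length direction (1, 1)/√2 has variance 1 + ρ, where ρ is the correlation between the two columns. For two positively correlated variables this is exactly the direction PCA picks as the first component.

```python
import numpy as np

def standardize(x):
    # Mean-center each column and scale it to unit sample variance
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

rng = np.random.default_rng(1)
common = rng.normal(size=1000)
# Two hypothetical price-like series sharing a common driver, on very different scales
x = np.column_stack([50 + 10 * (common + 0.2 * rng.normal(size=1000)),
                     200 + 3 * (common + 0.2 * rng.normal(size=1000))])
z = standardize(x)
print(z.var(axis=0, ddof=1))       # each column has variance 1
w = np.array([1.0, 1.0]) / np.sqrt(2)  # a unit-length direction (linear combination)
print((z @ w).var(ddof=1))         # equals 1 + rho, well above 1 here
```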

Together, the first two principal components explain 97% of the variance: the first explains 93% and the second 4%. The third, fourth and fifth components account for the rest. Principal components are constructed so that the first component captures the largest possible share of the variance in the data set, the second the largest share of what remains, and so on.

Component      PC1      PC2      PC3      PC4      PC5
Variability    93.13%   4.06%    1.83%    0.89%    0.08%
Cumulative     93.13%   97.19%   99.03%   99.92%   100.00%

For my taste, I want a meaningful degree of diversification: the number of principal components required to explain roughly 75% of the variation in the portfolio should be maximized and be at least 3. Plotting the loadings, I can see that most of my variables load relatively heavily on the first component.
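Applying this criterion to the explained-variance figures in the table above can be sketched in a few lines: count how many components are needed before the cumulative share reaches the threshold. With 93.13% on the first component, a single component already passes 75%, far from the target of at least 3.

```python
def components_needed(explained_pct, threshold_pct=75.0):
    """Smallest number of leading components whose cumulative share reaches the threshold."""
    cumulative = 0.0
    for k, pct in enumerate(explained_pct, start=1):
        cumulative += pct
        if cumulative >= threshold_pct:
            return k
    return len(explained_pct)  # threshold not reached (e.g. due to rounding)

explained = [93.13, 4.06, 1.83, 0.89, 0.08]  # explained variance per component, from the table
print(components_needed(explained))  # → 1
```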


This might indicate that the first principal component is just a proxy for equity risk; in other words, all the investments expose me to more or less the same risk factor. That doesn’t surprise me, since the first principal component explains 93% of the variance in the data. I can also see that EM, EM Asia and FTSE load heavily on PC5 and PC3.
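To read off which component each asset is most exposed to once the common equity component is set aside, one option is to take, for each asset, the component with the largest absolute loading after skipping PC1. The helper `dominant_components` and its loading matrix are hypothetical and do not reproduce the values behind the plot.

```python
import numpy as np

def dominant_components(loadings, asset_names, skip_first=True):
    """Map each asset to the component (1-based, e.g. 'PC3') with its largest absolute loading.

    loadings: array (n_assets, n_components), one column per principal component.
    skip_first=True ignores PC1, the common component most assets load on.
    """
    loadings = np.asarray(loadings, dtype=float)
    start = 1 if skip_first else 0
    idx = np.argmax(np.abs(loadings[:, start:]), axis=1) + start  # 0-based column index
    return {name: f"PC{i + 1}" for name, i in zip(asset_names, idx)}

# Hypothetical loadings for three assets on three components (rows = assets)
example = np.array([[0.6, 0.1, 0.2],
                    [0.5, 0.7, 0.1],
                    [0.6, -0.1, -0.9]])
print(dominant_components(example, ["Asset A", "Asset B", "Asset C"]))
# → {'Asset A': 'PC3', 'Asset B': 'PC2', 'Asset C': 'PC3'}
```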

If I take two of those assets (ignoring EM for now, which leaves EM Asia and FTSE) and rebalance them monthly to equal weights, I get the following portfolio performance:


I can see that when the benchmark (SPY) is down, my portfolio goes down in a similar way.
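The visual impression can also be put into numbers. As a rough sketch (a simple arithmetic version; published downside-capture ratios are often computed from compounded returns instead), the hypothetical function below looks only at months in which the benchmark fell and reports how often the portfolio also fell and how much of the benchmark's average loss it captured. The return series in the usage line are invented.

```python
import numpy as np

def down_market_stats(portfolio_returns, benchmark_returns):
    """Behaviour of a portfolio in periods where the benchmark return is negative.

    Returns (hit_rate, capture):
      hit_rate = share of benchmark down periods in which the portfolio also fell
      capture  = mean portfolio return / mean benchmark return over those periods
                 (arithmetic downside capture; 1.0 means it fell as much as the benchmark)
    """
    p = np.asarray(portfolio_returns, dtype=float)
    b = np.asarray(benchmark_returns, dtype=float)
    down = b < 0
    if not down.any():
        return float("nan"), float("nan")
    hit_rate = float(np.mean(p[down] < 0))
    capture = float(p[down].mean() / b[down].mean())
    return hit_rate, capture

# Made-up monthly returns for a portfolio and its benchmark
print(down_market_stats([0.02, -0.04, 0.01, -0.02], [0.01, -0.05, 0.02, -0.03]))
```

A hit rate near 1 and a capture ratio near 1 would match the observation that the portfolio tends to fall whenever the benchmark does.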

Let's see what happens if I try to hedge my positions. For the hedge I will open a position in FLOW. Flow Traders is a leading global technology-enabled liquidity provider specializing in Exchange Traded Products (ETPs). Basically, FLOW makes money when volatility is high. Let's plot the asset correlation matrix again with FLOW included.


From the matrix above I can already see that FLOW doesn’t really correlate with my other investments; it even has a negative correlation with some of them. Let's do a PCA.
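Why a low or negative correlation helps can be seen from the variance of an equal-weight portfolio, w'Σw, where Σ_ij = ρ_ij σ_i σ_j. The numbers below are hypothetical, not estimates for these ETFs or for FLOW: adding a more volatile asset that is mildly negatively correlated with two highly correlated holdings still lowers overall portfolio volatility.

```python
import numpy as np

def equal_weight_volatility(vols, corr):
    """Volatility of an equal-weight portfolio given asset volatilities and a correlation matrix."""
    vols = np.asarray(vols, dtype=float)
    corr = np.asarray(corr, dtype=float)
    cov = np.outer(vols, vols) * corr  # Sigma_ij = rho_ij * sigma_i * sigma_j
    w = np.full(len(vols), 1.0 / len(vols))  # equal weights
    return float(np.sqrt(w @ cov @ w))

# Hypothetical: two equity ETFs with 20% volatility and 0.9 correlation
two = equal_weight_volatility([0.2, 0.2], [[1.0, 0.9], [0.9, 1.0]])
# Add a more volatile asset (30%) with -0.2 correlation to both
three = equal_weight_volatility([0.2, 0.2, 0.3],
                                [[1.0, 0.9, -0.2],
                                 [0.9, 1.0, -0.2],
                                 [-0.2, -0.2, 1.0]])
print(round(two, 4), round(three, 4))
```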


It is good to see that FLOW does not load on the same components as the rest of my investments. Let's simulate trading.


Just as I thought: when the market is down, FLOW actually pushes my portfolio up. Not bad for a hedge that also pays me a dividend while I hold it, and it looks like a good growth story as well. When markets panic and investors dump their positions, spreads widen sharply and companies like Flow Traders make a good profit.

Written on August 21, 2021