Exploring Risk Analytics using PCA with Python

Jul 3, 2020

Risk analytics has gained significant importance in recent times. Almost every financial daily and periodical on capital markets gives significant coverage to risk management. Over the years, the financial industry has seen innovative products introduced to the market. Market participants, investors and banks alike, have eagerly traded such novel products, earning handsome payoffs and on certain occasions getting their hands burnt. New financial products entail new types of risk. Over the last decade, especially after the 2008 meltdown, industry participants have been ramping up their risk infrastructure in order to manage risks better and avoid being caught unawares when market conditions change for the worse. Organizations have invested significantly in risk systems, reflecting management's concern for sound risk management.

Innovation has made significant strides in the field of risk analytics. Advanced mathematical and statistical concepts are being applied to develop sophisticated models. One such concept, borrowed from linear algebra, is Principal Component Analysis (hereinafter also referred to as ‘PCA’). PCA has found application in many areas of finance, including yield analysis and risk management.

One of my earlier posts introduced the concept of PCA. Here is the link to that post for reference: https://medium.com/@abhyankar.ameya/principal-component-analysis-for-finance-b18ce112d3ab

In this post we will discuss a practical implementation of PCA for risk analytics using Python.

Input data:

1. Historical time series of interest rate data. We have considered a hypothetical dataset of interest rates across tenors from 3 months up to 10 years.

2. We have assumed that we have a model for measuring the PVBP (price value of a basis point, also called pv01) of our hypothetical bond portfolio and that we have the tenor-wise pv01 data as of a particular date.

The above files have been saved on GitHub at: https://github.com/Ameya1983/TheAlchemist

The algorithm for measuring risk of the portfolio using the concept of PCA has been implemented in Python as follows:

a. Importing required libraries in Python:

In this step we import the libraries that will be required in our program. Below is the set of libraries we will use:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

b. Standardization:

We standardize the interest rate time series before passing it to PCA. PCA is sensitive to the scale of its inputs, so in the very first step of the algorithm each tenor's series is transformed to zero mean and unit variance, which puts all variables on the same scale. It has been implemented below:

data = pd.read_csv('F:\\PCA for Risk Analytics\\MarketData.csv')
# Drop the date column so that only the tenor-wise rates remain
df = data.drop(columns=['Date'])
X = df.values
# Standardize each tenor to zero mean and unit variance
X = scale(X)
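As a quick sanity check on what standardization does, here is a minimal sketch on synthetic data. The array `rates` and its values are hypothetical stand-ins for the rate history, and the manual formula is meant to mirror the default behaviour of `scale` (column-wise, population standard deviation).

```python
import numpy as np

# Synthetic stand-in for the rate history: 250 days x 3 tenors (hypothetical values)
rng = np.random.default_rng(0)
rates = rng.normal(loc=[2.0, 3.0, 4.0], scale=[0.1, 0.3, 0.5], size=(250, 3))

# Column-wise standardization: subtract the mean and divide by the
# population standard deviation (ddof=0) of each tenor
standardized = (rates - rates.mean(axis=0)) / rates.std(axis=0)

col_means = standardized.mean(axis=0)
col_stds = standardized.std(axis=0)
print(col_means.round(6), col_stds.round(6))
```

After this step every tenor has mean 0 and standard deviation 1, so no single tenor dominates the PCA simply because its rates are more volatile.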

c. Computing factor loadings and the variance explained by the principal components:

Factor loadings can be calculated as below:

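pca = PCA(n_components=9)
pca.fit(X)
# components_ has one row per PC; transpose so that each column holds one PC's loadings across tenors
factor_loading = pca.components_.T
df_factor_loading = pd.DataFrame(factor_loading)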

Factor loadings describe how strongly each factor (principal component) moves the interest rate at each tenor point.
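One detail worth checking is the orientation of the loadings array: scikit-learn's `components_` attribute stores one row per principal component and one column per input feature. The sketch below illustrates this on synthetic data; `X_demo`, `pca_demo` and `loadings_by_tenor` are illustrative names, not part of the original program.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 observations of 4 hypothetical tenors
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))

pca_demo = PCA(n_components=3).fit(X_demo)

# components_ has shape (n_components, n_features): one ROW per PC
print(pca_demo.components_.shape)

# Transposing gives one COLUMN per PC, so column 0 holds PC1's loading on each tenor
loadings_by_tenor = pca_demo.components_.T
pc1_loadings = loadings_by_tenor[:, 0]
print(pc1_loadings)
```

Keeping tenors on the rows makes it straightforward to line the loadings up against tenor-wise sensitivities later on.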

In PCA we also analyse how much of the dispersion each PC explains. The idea behind PCA is dimensionality reduction: we try to capture the essence of a dataset in a smaller number of variables. So, once the individual PCs have been generated, we keep only those that explain most of the variation. Machine learning libraries like scikit-learn support this computation.

Therefore, we will now see how much of the variance/dispersion each PC contributes.

# Variance explained by each PC (eigenvalues) and its share of the total variance in percent
variance_percent_df = pd.DataFrame(data=pca.explained_variance_)
variance_ratio_df = pd.DataFrame(data=pca.explained_variance_ratio_)
variance_ratio_df = variance_ratio_df * 100

The output is as below:

From the table alongside, we observe that PC1 explains almost 96% of the total variation and PC2 explains close to 1.95%. Therefore, rather than using all PCs in the subsequent calculation, we will use only PC1 and PC2, as these two components together explain close to 98% of the total variance.

1. PC1 corresponds roughly to a parallel shift in the yield curve.

2. PC2 corresponds roughly to a steepening of the yield curve.

This is in line with the theory of fixed income risk measurement, which states that the majority of the movement in the price of a bond is explained by a parallel shift in the yield curve and that the residual movement is explained by steepening and curvature of the interest rate curve.

We will drop the remaining PCs from further analysis. This reduces the amount of computation, since the subsequent steps work with only the two selected PCs rather than one factor per tenor.
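The selection of components by cumulative explained variance described above can be sketched as a small helper. The function name `n_components_for`, the threshold and the ratios below are all hypothetical and chosen only for illustration; they are not the figures from the dataset used in this post.

```python
import numpy as np

def n_components_for(explained_ratio, threshold):
    """Smallest number of leading PCs whose cumulative variance ratio reaches the threshold."""
    cumulative = np.cumsum(explained_ratio)
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against thresholds above the total explained ratio
    return min(k, len(cumulative))

# Hypothetical variance ratios, for illustration only
ratios = np.array([0.90, 0.06, 0.03, 0.01])
k = n_components_for(ratios, 0.95)
print(k)  # 2
```

In the program above, the same helper could be applied to `pca.explained_variance_ratio_`.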

d. Interest Rate sensitivity data:

The bond portfolio’s interest rate sensitivity is read into the program as below:

# Read portfolio specific data
portfolio_data = pd.read_csv('F:\\PCA for Risk Analytics\\PortfolioData.csv')

This dataframe will contain the portfolio specific numbers pertaining to interest rate sensitivity.

e. Risk measurement of the portfolio:

For the risk calculation, the factor loading at each tenor (from step ‘c’ above) is multiplied by the PV01 for that tenor (from step ‘d’ above), and these products are summed across tenors. The square of this sum is then multiplied by the variance of the corresponding PC (from step ‘c’ above). We do this for both PCs, namely PC1 and PC2.

Standard deviation of the portfolio is computed as below:

σ (portfolio) = √ (w1² * σ1² + w2² * σ2²)

where,

w1, w2 are the portfolio's exposures to PC1 and PC2: for each PC, the factor loading at each tenor is multiplied by the pv01 for that tenor and the products are summed across tenors. Squaring these sums gives w1² and w2².

σ1², σ2² are the variances of PC1 and PC2 respectively (pca.explained_variance_ from step ‘c’).

It should be noted that there is no correlation term in the formula for σ (portfolio). The reason is that the PCs are uncorrelated with each other by construction, so the correlation between them is 0.

Next, let’s assume we are calculating the 99% 1-day VaR of the portfolio. Assuming normally distributed factor moves, the portfolio risk is given by σ (portfolio) * 2.33, where 2.33 is the one-tailed 99% quantile of the standard normal distribution.
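The absence of a correlation term can be checked numerically. The sketch below is a minimal illustration on synthetic data, using a plain NumPy eigen-decomposition of the covariance matrix rather than scikit-learn; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated synthetic data: 500 observations of 3 hypothetical tenors
base = rng.normal(size=(500, 1))
X_demo = base + 0.3 * rng.normal(size=(500, 3))
X_demo = X_demo - X_demo.mean(axis=0)

# PCA via eigen-decomposition of the sample covariance matrix
cov = np.cov(X_demo, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Project the data onto the eigenvectors to obtain the PC scores
scores = X_demo @ eigenvectors
score_cov = np.cov(scores, rowvar=False)

# Off-diagonal covariances vanish; the diagonal holds the eigenvalues
off_diagonal = score_cov - np.diag(np.diag(score_cov))
print(np.abs(off_diagonal).max())
```

Because the covariance matrix of the PC scores is diagonal, the portfolio variance is simply the sum of the squared exposures times the PC variances.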
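The multiplier 2.33 is the 99th percentile of the standard normal distribution, rounded to two decimals. A quick check using only the Python standard library (this assumes normally distributed factor moves, which is a modelling assumption rather than a property of the data):

```python
from statistics import NormalDist

# One-tailed 99% quantile of the standard normal distribution
z_99 = NormalDist().inv_cdf(0.99)
print(round(z_99, 2))  # 2.33
```

A different confidence level only changes the argument passed to `inv_cdf`.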

# Calculation dataframe: pv01 per tenor alongside the PC1 and PC2 loadings
df_calculation = pd.DataFrame()
df_calculation['PortfolioData'] = portfolio_data.iloc[:, 1]
df_calculation['PC1'] = df_factor_loading.iloc[:, 0]
df_calculation['PC2'] = df_factor_loading.iloc[:, 1]
df_calculation['Result1'] = df_calculation['PC1'] * df_calculation['PortfolioData']
df_calculation['Result2'] = df_calculation['PC2'] * df_calculation['PortfolioData']
# Squared exposure to each PC times the variance of that PC (scalar values)
result1 = (df_calculation['Result1'].sum() ** 2) * variance_percent_df.iloc[0, 0]
result2 = (df_calculation['Result2'].sum() ** 2) * variance_percent_df.iloc[1, 0]
portfolio_risk = np.sqrt(result1 + result2) * 2.33  # 99th percentile multiplier
print("portfolio risk:", portfolio_risk)
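The same calculation can be written compactly with NumPy arrays. This is a sketch under stated assumptions, not the program from this post: the function name `pca_var`, the pv01 values, the loadings and the PC variances are all hypothetical, and the loadings are laid out with one row per tenor and one column per PC.

```python
import numpy as np

def pca_var(pv01, loadings, pc_variances, z=2.33):
    """VaR from PCA factors.

    pv01: (n_tenors,), loadings: (n_tenors, n_pcs), pc_variances: (n_pcs,).
    """
    # Exposure to each PC: sum over tenors of loading * pv01
    exposures = pv01 @ loadings
    # No cross terms, since the PCs are uncorrelated
    portfolio_variance = np.sum(exposures**2 * pc_variances)
    return z * np.sqrt(portfolio_variance)

# Hypothetical inputs for three tenors and two PCs
pv01 = np.array([100.0, 200.0, 300.0])
loadings = np.column_stack([
    np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0),   # level-like factor
    np.array([-1.0, 0.0, 1.0]) / np.sqrt(2.0),  # slope-like factor
])
pc_variances = np.array([4.0, 1.0])

var_99 = pca_var(pv01, loadings, pc_variances)
print(round(var_99, 2))
```

Passing the loadings as a matrix means more PCs can be included by adding columns, without changing the function.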

Running the above code snippets gives us an estimate of the portfolio risk measure. This post shows one way to implement a sophisticated concept like PCA and apply it to finance. By leveraging Python libraries and other features of the language, we can carry out tedious-looking linear algebra calculations with ease and speed. Python code is also readable, which makes it accessible to a wide audience.


Written by Ameya Abhyankar