# Tutorial: testing factor models with `mosaicperm`

¶

## 1. Review of mosaic factor tests¶

For observations \(t=1, \dots, T\), suppose we observe asset returns \(Y_t \in \R^p\) which we believe follow the factor model:

where:

\(X_t \in \R^k\) are unobserved factor variables.

\(L_t \in \R^{p \times k}\) are

*prespecified*factor exposures. For example, many financial firms such as MSCI and BlackRock routinely distribute exposure matrices for financial risk models.\(\epsilon_t \in \R^p\) are residuals which capture the variance in \(Y_t\) which cannot be explained by the factors.

The goal of a factor model is to explain common variation among the outcomes. Thus, we say that a factor model fits if the components of \(\epsilon_t\) are independent:

`mosaicperm`

contains methods to *test* this null hypothesis, i.e., to test if a factor model reliably explains the common correlations among outcomes. These mosaic permutation tests have a few desirable properties:

They allow the use of black-box machine learning/statistical algorithms to find violations of the null.

This property helps analysts quickly understand when (e.g.) a financial risk model is no longer reliable.

They exactly control the false positive rate, i.e., the probability of rejecting a well-fitting factor model.

This property helps prevent analysts from unnecessarily rebuilding accurate models, which can waste resources and increase risk.

**Note**: `mosaicperm`

is designed to test if \(\mcH_0\) holds for a “prespecified” choice of exposures \(L_t\), i.e., whether one’s own factor model actually fits the data. Other methods often test whether \(\mcH_0\) holds for some unknown choice of \(L_t \in \R^{p \times k}\). This latter problem has does not accomplish our goal: if one’s own risk model is highly misspecified, it is of little comfort to know that some unknown choice of factor loadings is well-specified.

## 2. Example application¶

Below, we illustrate the use of `mosaicperm`

on a toy factor risk model for US equity data from 2018 through 2022. This “toy model” is not a serious model, but unlike real models used in industry (e.g. from BlackRock/MSCI), using it allows us to share the data publicly.

There are three key inputs to the analysis: - The `outcomes`

: a `n_timepoints`

x `n_assets`

array of (log) returns for each asset, i.e., \(Y_t\) in the notation above. - The `exposures`

: `n_assets`

x `n_factors`

array of exposures for each asset. This can also be a 3D array of the form `n_timepoints`

x `n_assets`

x `n_factors`

if the exposures change with time, although in this application they do not. - A callable test statistic (`test_stat`

) which takes an array of
estimated residuals as an input and produces one or more test statistics measuring evidence against the null as an output. In this analysis, we use `mosaicperm.statistics.mean_maxcorr_stat`

, which returns the maximum estimated absolute correlation between asset \(j\) and all other assets, averaged over all assets \(j\).

### 2.1 Load sample dataset¶

```
[1]:
```

```
import sys
sys.path.insert(0, "../../")
import numpy as np
import pandas as pd
import mosaicperm as mp
from datetime import datetime
# Load sample dataset from mosaicperm
returns, exposures = mp.utilities.load_sample_dataset()
returns = returns.loc[returns.index >= datetime(year=2018, month=1, day=1)]
logreturns = np.log(returns)
```

The log returns are for 488 stocks in the S&P 500 from 2018 through 2022.

```
[2]:
```

```
print(logreturns[['A', 'AAL', 'AAP', 'OGN']].head().to_markdown())
```

```
| Date | A | AAL | AAP | OGN |
|:--------------------|------------:|-------------:|------------:|------:|
| 2018-01-02 00:00:00 | -0.00936318 | -0.0182828 | -0.0622224 | nan |
| 2018-01-03 00:00:00 | -0.0251255 | 0.0123423 | -0.00900829 | nan |
| 2018-01-04 00:00:00 | 0.00752967 | -0.0062851 | -0.0362342 | nan |
| 2018-01-05 00:00:00 | -0.0158619 | 0.000379731 | -0.0105745 | nan |
| 2018-01-08 00:00:00 | -0.00214365 | 0.00992565 | 0.00706718 | nan |
```

Note there is some missing data, e.g., “OGN” returns are not available until mid 2021. `mosaicperm`

will automatically handle this.

The exposures include indicator factors for various sectors of the US economy, as well as indicators for REITs, common stocks, and large/mid cap stocks, e.g.:

```
[3]:
```

```
print(exposures.iloc[:, 0:3].head().to_markdown())
```

```
| Symbol | communication services | consumer discretionary | consumer staples |
|:---------|-------------------------:|-------------------------:|-------------------:|
| A | 0 | 0 | 0 |
| AAL | 0 | 0 | 0 |
| AAP | 0 | 1 | 0 |
| AAPL | 0 | 0 | 0 |
| ABBV | 0 | 0 | 0 |
```

### 2.2 Test goodness of fit for the whole model¶

Below we show how to test goodness-of-fit for the whole model.

```
[4]:
```

```
import matplotlib.pyplot as plt
# Initialize model
mpt = mp.factor.MosaicFactorTest(
outcomes=logreturns.values,
exposures=exposures.values,
test_stat=mp.statistics.mean_maxcorr_stat,
)
# Fit and make summary plot
fig, ax = mpt.fit().summary_plot()
# Adjustments for the docs
fig.patch.set_facecolor("white")
plt.tight_layout()
plt.show()
```

The p-value is highly significant, suggesting (unsurprisingly) that this toy model does not fully explain correlations among asset returns. We can also repeat this analysis in a sliding window across time:

```
[5]:
```

```
mpt.fit_tseries(nrand=500, n_timepoints=10, window=200)
fig, axes = mpt.plot_tseries(
time_index=returns.index, show_plot=False,
)
fig.patch.set_facecolor("white")
for ax in axes:
ax.tick_params(labelrotation=45)
plt.tight_layout()
plt.show()
```

We find above that at all timepoints the model does not fit properly. Note that above:

The exact Z-statistic is the transformation \(\Phi^{-1}(1 - p_{\mathrm{val}})\), which is stochastically dominated by a \(N(0,1)\) under the null. Since the p-value is discrete, the exact Z-statistic is truncated at an upper limit based on the number of computed permutations, in this case \(\Phi^{-1}(500/501) \approx 2.87\).

The approximate Z-statistic is the normalized test statistic value \(Z_{\mathrm{approx}} = (S - E[S]) / \sqrt{\mathrm{Var}(S)}\), where \(S\) is the observed test statistic and the expectation and variance are taken under the null permutation distribution. This statistic provably has mean zero and unit variance under the null, but it may not be normally distributed. However, unlike the exact Z-statistic, it is not truncated at an upper limit.

### 2.3 Using a custom test statistic¶

`mosaicperm`

makes it easy to plug in a custom test statistic. For example, below we define a custom maximum correlation statistic based on a Ledoit-Wolf covariance estimate.

```
[6]:
```

```
from sklearn.covariance import LedoitWolf
def custom_test_stat(residuals):
"""
Custom test statistic using LedoitWolf estimation.
"""
# Fit LW estimator
lw = LedoitWolf().fit(residuals)
cov = lw.covariance_
# Construct correlation matrix
scale = np.sqrt(np.diag(cov))
scale[scale == 0] = 1
corr = cov / np.outer(scale, scale)
# Return maximum absolute correlation
return np.abs(corr - np.eye(corr.shape[0])).max()
```

```
[7]:
```

```
mpt_lw = mp.factor.MosaicFactorTest(
outcomes=logreturns.values,
exposures=exposures.values,
test_stat=custom_test_stat,
)
mpt_lw.fit(nrand=100).summary()
```

```
[7]:
```

```
statistic 0.744690
null_statistic_mean 0.267050
p_value 0.009901
dtype: float64
```

### 2.4 Test goodness of fit sector-by-sector¶

To understand the null violations, we can also apply the mosaic permutation test to various individual sectors within the economy, as is done below.

```
[8]:
```

```
np.random.seed(1234)
from tqdm.auto import tqdm
# Sectors
all_sectors = exposures.columns[0:11].tolist()
# Initialize output
observed_stats = []
null_stats = []
# Loop through and run sector-by-sector
for sector in tqdm(all_sectors):
# Only consider the assets in th esector
flags = exposures[sector] == 1
mpt_sector = mp.factor.MosaicFactorTest(
outcomes=logreturns.T[flags].T.values,
exposures=exposures.loc[flags].values,
test_stat=mp.statistics.mean_maxcorr_stat,
)
# Fit
mpt_sector.fit(nrand=1000, verbose=False)
pval = np.around(mpt_sector.pval, 3)
# Save core output
sector_name = sector.capitalize() + f"\n({int(flags.sum())} assets, pval={pval})"
observed_stats.append(
[mpt_sector.statistic, sector_name, 'Observed test statistic']
)
# Save null statistics
null_df = pd.DataFrame(mpt_sector.null_statistics, columns=['statistic'])
null_df['sector'] = sector_name
null_df['label'] = 'Null statistics'
null_stats.append(null_df)
# Concatenate dataframes
observed_stats = pd.DataFrame(
observed_stats, columns=['statistic', 'sector', 'label']
)
null_stats = pd.concat(null_stats)
```

The plot below finds strong evidence against the null in some sectors. In others, we are unable to reject the null in others. Of course, the latter results do not prove the null holds; they could also indicate a lack of statistical power.

```
[16]:
```

```
# Plot results with plotnine
from plotnine import *
g = (
ggplot(
null_stats,
aes(x='statistic', fill='label')
)
+ geom_histogram(bins=20, color='black')
+ geom_vline(
data=observed_stats,
mapping=aes(xintercept='statistic', fill='label'),
size=1, color='red',
)
+ facet_wrap("~sector", scales='free', ncol=3)
+ theme_bw()
+ theme(
figure_size=(9, 12.5),
axis_text_x=element_text(size=9, rotation=25),
legend_position='top'
)
+ labs(x='Mean Max. Corr. Statistic', fill='', y='Count')
+ scale_fill_manual(['cornflowerblue', 'red'])
)
print(g)
```

```
```

### 2.5 Using an \(R^2\) for out-of-sample prediction error as the test statistic¶

One practical way to test \(\mcH_0\) is to test whether several augmented model candidates improve model fit. This can be done by splitting the data, estimating new candidate exposures on the first fold of the data, and using the `MosaicBCV`

class to test for significance on the second fold of the data, as shown below.

```
[8]:
```

```
# estimate candidate exposures on the first half of the data via sparse PCA
n0 = int(0.5 * logreturns.shape[0])
resid0 = mp.factor.ols_residuals(logreturns.fillna(0).values[0:n0], exposures.values)
new_exposure_candidates = mp.statistics.approximate_sparse_pcas(
np.corrcoef(resid0.T), quantiles=np.array([0.1, 0.5, 0.9])
)
new_exposure_candidates.shape
```

```
[8]:
```

```
(3, 488)
```

Above, `new_exposure_candidates`

contains 3 potential additional exposure vectors which are \(10\%, 50\%\) and \(90\%\) sparse.

```
[9]:
```

```
# Check if any of the new exposures improve model fit
np.random.seed(123)
mpt_bcv = mp.factor.MosaicBCV(
outcomes=logreturns.iloc[n0:].values,
exposures=exposures.values,
new_exposures=new_exposure_candidates,
)
mpt_bcv.fit(nrand=500)
print(mpt_bcv.summary().to_markdown())
```

```
| Statistic type | statistic | null_statistic_mean | p_value |
|:----------------------------|------------:|----------------------:|-----------:|
| Adaptive z-stat | 20.3598 | 0.179544 | 0.00199601 |
| $R^2$ for candidate model 0 | 0.0142692 | -0.00261767 | 0.00199601 |
| $R^2$ for candidate model 1 | 0.0392196 | -0.0035327 | 0.00199601 |
| $R^2$ for candidate model 2 | 0.0480705 | -0.0042004 | 0.00199601 |
```

We see that the out-of-sample \(R^2\) for the third candidate model is the highest, and that all three new exposure vectors have positive \(R^2\), indicating some model improvement. The “adaptive z-stat” row aggregates evidence against the null across all three exposure vectors and finds statistically significant evidence against the null.

**Important warning**: In general, the mosaic p-value may be significant even if the \(R^2\) for each candidate exposure is negative. The reason for this is that sometimes, we can detect that the null fails to hold even without knowing how to improve the model—i.e., if the null \(\mcH_0\) really held, we would expect the \(R^2\)’s to be even *more* negative. As a result, a significant p-value is not definitive proof that one can improve the model, but only evidence that there are
unexplained correlations among the assets.

### 2.6 Aggregating results across multiple random seeds¶

Sometimes, the `mosaicperm`

results can depend on the randomly chosen `mp.tilings.Tiling`

used in the `MosaicFactorTest`

. To overcome this problem, one can combine results across different choices of `mp.tilings.Tiling`

, as shown below:

```
[26]:
```

```
from tqdm.auto import tqdm
# Fit a MosaicFactorTest, e.g., five times
mpts = []
for seed in tqdm(range(5)):
# Initialize model
mpt = mp.factor.MosaicFactorTest(
outcomes=logreturns.values,
exposures=exposures.values,
test_stat=mp.statistics.mean_maxcorr_stat,
seed=seed, # this seed determines the tiling
)
mpt.fit_tseries(nrand=99, verbose=False, n_timepoints=10, window=200)
mpt.fit(nrand=99, verbose=False)
mpts.append(mpt)
```

In this case, the results do not depend much on the random seed. Nonetheless, we show how to use the API to combine results:

```
[27]:
```

```
df = pd.DataFrame(
[[j, mpt.statistic, mpt.apprx_zstat, mpt.pval] for j, mpt in enumerate(mpts)],
columns=['Seed', 'statistic', 'apprx_zstat', 'p-value']
)
print("Initial results:")
print(df.to_markdown())
print("Aggregated results:")
print(mp.core.combine_mosaic_tests(mpts).to_markdown())
```

```
Initial results:
| | Seed | statistic | apprx_zstat | p-value |
|---:|-------:|------------:|--------------:|----------:|
| 0 | 0 | 0.368954 | 124.192 | 0.01 |
| 1 | 1 | 0.369826 | 122.001 | 0.01 |
| 2 | 2 | 0.367392 | 132.439 | 0.01 |
| 3 | 3 | 0.370253 | 119.883 | 0.01 |
| 4 | 4 | 0.368707 | 114.406 | 0.01 |
Aggregated results:
| | 0 |
|:--------------------|-----------:|
| statistic | 0.369026 |
| null_statistic_mean | 0.18989 |
| apprx_zstat | 122.584 |
| p-value | 0.02 |
```

Note that by default, when combining p-values, `mosaicperm`

takes twice the median p-value. There are other options available in the documentation for `combine_mosaic_tests`

, however.

We can also combine results while performing an analysis over time:

```
[28]:
```

```
df = mp.core.combine_mosaic_tests_tseries(mpts, plot=True)
print(df.to_markdown())
```

```
| | Window end | statistic | null_statistic_mean | apprx_zstat | p-value |
|---:|-------------:|------------:|----------------------:|--------------:|----------:|
| 0 | 200 | 0.361792 | 0.275737 | 39.7586 | 0.02 |
| 1 | 302 | 0.375259 | 0.286088 | 40.1583 | 0.02 |
| 2 | 404 | 0.379709 | 0.280928 | 43.3117 | 0.02 |
| 3 | 507 | 0.393766 | 0.288624 | 48.5696 | 0.02 |
| 4 | 609 | 0.494382 | 0.386796 | 44.1209 | 0.02 |
| 5 | 711 | 0.48499 | 0.358578 | 54.4051 | 0.02 |
| 6 | 813 | 0.459433 | 0.303904 | 68.2153 | 0.02 |
| 7 | 916 | 0.473822 | 0.304636 | 18.2595 | 0.02 |
| 8 | 1018 | 0.431956 | 0.268754 | 79.6857 | 0.02 |
| 9 | 1120 | 0.437755 | 0.280597 | 73.6856 | 0.02 |
```