
Factor Model Methodology

This page is for allocators, portfolio managers, risk analysts, and compliance teams evaluating SEC API’s factor model for portfolio construction, attribution, and risk management.

Overview

SEC API provides an allocator-grade factor research catalog of more than 230 tracked factor definitions across market, style, macro, sector, industry, country, and thematic categories. The launch API surface is narrower. U.S. market, style, sector, and industry factors target a 2010-01-01 history floor. Factors with shorter history, or in beta, blocked, or deferred status, are labeled as such in catalog and methodology metadata. Public claims of 2010+ history should be made only for factors whose row-level coverage and freshness evidence support them.

The model follows a hierarchical purification architecture consistent with institutional standards pioneered by MSCI Barra and adopted by Axioma (Qontigo), Bloomberg PORT, and Northfield.

Launch factor return surfaces are refreshed on a market-calendar-aware schedule. Intraday snapshot endpoints expose the in-session factor state when the underlying market-data plane is current. Inspect freshness metadata before treating a snapshot as live. For source posture, source-rights metadata, and response trust fields, see Factor Provenance. For market-calendar SLAs and freshness interpretation, see Factor Freshness.

Factor categories

Category | Count | Description
market | 1 | Broad US equity risk premium (MKT_US)
style | 30 | Cross-sectional return drivers (momentum, value, quality, size, low-volatility, etc.)
macro | 17 | Macroeconomic regime factors (rates, credit, commodities, currencies)
sector | 26 | GICS sector and sub-sector return attribution
industry | 31 | Industry-level return attribution
country | 45 | Single-country and regional equity factors
thematic | 81 | Named investment themes (nuclear energy, genomics, REITs, etc.)

Factor construction pipeline

Step 1: Universe construction

U.S. market, style, sector, and industry stock-basket factors are computed over a point-in-time U.S. equity universe refreshed monthly:
  • Minimum $100M market capitalization
  • Minimum $5M average daily dollar volume
  • Minimum $1.00 share price
  • Excludes warrants, units, rights, ETFs, ETNs, ADR certificates
  • Exchanges: NYSE, NASDAQ, AMEX, ARCA, BATS
Universe membership is determined as of each rebalance date using data available at that time — no look-ahead bias.
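The Step 1 screens can be expressed as a point-in-time filter. The sketch below is a minimal illustration and not SEC API's implementation. The `Security` record, its field names, and the security-type labels are hypothetical. The thresholds are treated as inclusive minimums, which is an assumption. The average-daily-dollar-volume lookback is left to whoever populates the record, because the methodology does not state it.

```python
from dataclasses import dataclass
from datetime import date

# Thresholds copied from the Step 1 criteria; treating them as inclusive is an assumption.
MIN_MARKET_CAP = 100_000_000        # $100M
MIN_AVG_DOLLAR_VOLUME = 5_000_000   # $5M average daily dollar volume
MIN_PRICE = 1.00                    # $1.00 share price
EXCLUDED_TYPES = {"warrant", "unit", "right", "etf", "etn", "adr_certificate"}  # hypothetical labels
ALLOWED_EXCHANGES = {"NYSE", "NASDAQ", "AMEX", "ARCA", "BATS"}


@dataclass(frozen=True)
class Security:
    """Hypothetical point-in-time snapshot of one listing."""
    ticker: str
    security_type: str      # e.g. "common", "etf", "warrant" (illustrative labels)
    exchange: str
    market_cap: float
    avg_dollar_volume: float
    price: float
    as_of: date             # date on which these inputs were observable


def eligible(sec: Security, rebalance_date: date) -> bool:
    """True if the security passes every Step 1 screen on rebalance_date."""
    if sec.as_of > rebalance_date:
        # Inputs observed after the rebalance date would introduce look-ahead bias.
        return False
    return (
        sec.market_cap >= MIN_MARKET_CAP
        and sec.avg_dollar_volume >= MIN_AVG_DOLLAR_VOLUME
        and sec.price >= MIN_PRICE
        and sec.security_type not in EXCLUDED_TYPES
        and sec.exchange in ALLOWED_EXCHANGES
    )


def build_universe(securities: list[Security], rebalance_date: date) -> list[str]:
    """Tickers eligible as of rebalance_date, sorted for stable output."""
    return sorted(s.ticker for s in securities if eligible(s, rebalance_date))
```

Because `as_of` is checked against the rebalance date, a record built from data published after that date is rejected rather than silently used.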

Step 2: Constituent sourcing

Each factor type uses a different data source for membership and characteristic computation:
Source | Factor types | Example
Point-in-time fundamentals (SEC XBRL) | Style factors | Book/Market ratio from latest 10-K filing
SIC classification | Sector and industry factors | SIC 7372 → Technology / Software
ETF, proxy, and curated basket metadata | Thematic, macro, and country factors | URA proxy spread → Nuclear Energy basket
Curated constituent lists | Thematic baskets without ETF proxy | Hotel REITs: APLE, HLT, MAR, HST…
Single-country ETF | Country factors | EWJ → Japan equity premium
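For the "latest 10-K" lookup to be point-in-time, a filing can only be used once it was public on the as-of date. The minimal sketch below assumes a hypothetical `Filing` record holding the filing date and reported book equity. It also assumes book/market is computed against market capitalization on the same as-of date. The methodology does not specify the market-cap timing.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Filing:
    """Hypothetical extract of one XBRL filing; field names are illustrative."""
    form: str            # e.g. "10-K" or "10-Q"
    filed: date          # date the filing became public
    book_equity: float   # stockholders' equity reported in the filing


def latest_10k(filings: list[Filing], as_of: date) -> Optional[Filing]:
    """Most recent 10-K already public on as_of (no look-ahead)."""
    known = [f for f in filings if f.form == "10-K" and f.filed <= as_of]
    return max(known, key=lambda f: f.filed, default=None)


def book_to_market(filings: list[Filing], market_cap: float, as_of: date) -> Optional[float]:
    """Book/market from the latest public 10-K, or None if none is available yet."""
    filing = latest_10k(filings, as_of)
    if filing is None or market_cap <= 0:
        return None
    return filing.book_equity / market_cap
```

A 10-K for fiscal 2023 that is filed in late February 2024 only affects rankings from its filing date onward, even though it describes an earlier period.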

Step 3: Return computation

  • Style factors: long-short decile portfolio sorts. Stocks are ranked by a characteristic (e.g., book-to-market for Value) and divided into decile portfolios. The factor return is the spread between the top decile (long) and the bottom decile (short). Portfolios are cap-weighted within each leg and rebalanced monthly.
  • Thematic baskets (ETF-backed, 66 factors): the factor return is the ETF return minus a relevant benchmark. The ETF provider (Global X, ARK, VanEck, iShares, etc.) handles constituent selection and rebalancing. Example: THEMATIC_NUCLEAR_ENERGY = R(URA) - R(SPY).
  • Thematic baskets (curated, 15 factors): the factor return is the equal-weight constituent basket return minus a relevant benchmark. Constituents are sourced from public factor composition data and reviewed quarterly. Example: THEMATIC_CASINO_LEISURE = (1/9) · Σ R(casino stocks) - R(SPY).
  • Macro and country factors: ETF proxy spread versus a reference instrument. Example: RATES = R(TLT) - R(SHV).
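The style-factor decile sort can be sketched with NumPy. This illustrative single-period version rests on two assumptions. First, decile breakpoints are taken over the whole eligible universe; the methodology does not say whether, for example, NYSE-only breakpoints are used. Second, the characteristic is signed so that the top decile is the long leg. For a Size factor that goes long small caps, you would pass a negated size measure. The function and argument names are hypothetical.

```python
import numpy as np


def decile_long_short(characteristic: np.ndarray,
                      market_cap: np.ndarray,
                      next_returns: np.ndarray) -> float:
    """One-period long-short return: cap-weighted top decile minus bottom decile.

    characteristic: sort variable per stock at the rebalance date (e.g. book-to-market).
    market_cap:     market capitalization at the rebalance date, used as leg weights.
    next_returns:   per-stock returns over the following holding period.
    """
    # Nine interior breakpoints split the cross-section into deciles 0..9.
    edges = np.quantile(characteristic, np.linspace(0.1, 0.9, 9))
    decile = np.searchsorted(edges, characteristic, side="right")

    def cap_weighted(mask: np.ndarray) -> float:
        if not mask.any():
            raise ValueError("empty decile; universe too small for a decile sort")
        w = market_cap[mask]
        return float(np.sum(w * next_returns[mask]) / np.sum(w))

    long_leg = cap_weighted(decile == 9)   # top decile
    short_leg = cap_weighted(decile == 0)  # bottom decile
    return long_leg - short_leg
```

Cap-weighting inside each leg keeps the legs investable and limits the influence of micro-caps. Passing `np.ones_like(market_cap)` instead gives the equal-weighted variant for comparison.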

Step 4: Hierarchical purification (orthogonalization)

Each factor declares 1–3 parent factors and is purified against them using rolling 156-trading-day OLS regression to extract the residual return not explained by parent factors.
Market (MKT_US)
├── Style factors (SIZE, VALUE, MOMENTUM, ...)
│   └── purified against: MKT_US
├── Sector factors (SECTOR_ENERGY, SECTOR_TECH, ...)
│   └── purified against: MKT_US
├── Thematic baskets
│   ├── Energy themes → purified against: MKT_US, SECTOR_ENERGY
│   ├── Tech themes → purified against: MKT_US, SECTOR_TECH
│   ├── REIT themes → purified against: MKT_US, THEMATIC_REITS
│   └── Country themes → purified against: MKT_US, regional parent
└── Country factors
    └── purified against: MKT_US, regional parent
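The rolling purification regression can be sketched as follows. It regresses a factor's daily returns on its declared parents over a trailing 156-trading-day window and removes the fitted parent exposure. Two choices here are assumptions, not documented behavior:

  • The window ends the day before the observation being purified, so the betas are estimated out of sample.
  • The regression intercept is estimated but left in the residual, so the factor's own average return is not stripped out.

The function name is hypothetical.

```python
import numpy as np

WINDOW = 156  # rolling window in trading days, per Step 4


def purify(child: np.ndarray, parents: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Residual returns of one factor after rolling OLS on its declared parents.

    child:   shape (T,), daily returns of the factor being purified.
    parents: shape (T, k) or (T,), daily returns of its 1-3 declared parents.
    Returns shape (T,); the first `window` entries are NaN (not enough history).
    """
    parents = np.asarray(parents, dtype=float).reshape(len(child), -1)
    T = len(child)
    out = np.full(T, np.nan)
    for t in range(window, T):
        # OLS with intercept on the trailing window that ends the day before t.
        X = np.column_stack([np.ones(window), parents[t - window:t]])
        y = child[t - window:t]
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Remove the fitted parent exposure; keep the intercept (assumption).
        out[t] = child[t] - parents[t] @ beta[1:]
    return out
```

For an energy theme, `parents` would hold the MKT_US and SECTOR_ENERGY return columns, matching the tree above. For a style factor, it would hold MKT_US alone.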
Why hierarchical purification? The choice to purify against a small number of theoretically motivated parent factors — rather than all factors simultaneously — is a deliberate methodological decision supported by decades of factor research:
  • Parsimony: Fama & French (2018, Journal of Financial Economics) argue that factor models should include only factors that earn their place. Redundant factors should be excluded, not controlled for via regression.
  • Overfitting resistance: Harvey, Liu & Zhu (2016, Review of Financial Studies) show that testing many candidate factors inflates false discoveries, and they argue for a higher significance hurdle (a t-statistic above 3.0) for new factors. Kozak, Nagel & Santosh (2020, Journal of Financial Economics) demonstrate that shrinkage estimators dominate unrestricted high-dimensional OLS.
  • Institutional alignment: MSCI Barra’s US Equity Model (USE4) uses hierarchical nesting: market → country → industry → style. Axioma and Bloomberg follow similar architectures. No major institutional risk model provider uses unrestricted regression against 100+ factors.
  • Signal preservation: Purifying a housing theme against market + real estate sector preserves the thematic signal. Over-purification against all known factors strips out the very characteristics that define the theme.

Step 5: Dynamic volatility targeting

After purification, each factor’s residual return series is dynamically leveraged to target 10% annualized volatility:
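  • Rolling realized volatility is computed over the same 156-trading-day window used for purification
  • Leverage = min(target_vol / realized_vol, 3.0). A factor with realized volatility below about 3.3% (10% / 3.0) hits the 3x cap and stays below the 10% target
  • Keeps any single high-volatility factor from dominating portfolio-level attribution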
  • Consistent with standard practice at AQR, MSCI, and major factor ETF providers
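The Step 5 scaling rule translates directly into code. The sketch below makes three assumptions:

  • Residual returns are daily.
  • Realized volatility is annualized with √252; the methodology does not state its annualization convention.
  • Leverage for each day uses only the trailing window ending the day before, so it is known in advance.

The function name is hypothetical.

```python
import numpy as np

TARGET_VOL = 0.10    # 10% annualized, per Step 5
MAX_LEVERAGE = 3.0   # leverage cap, per Step 5
WINDOW = 156         # trading days, same window as purification
TRADING_DAYS = 252   # annualization factor (assumption)


def vol_target(residual: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Scale residual returns toward TARGET_VOL using trailing realized volatility."""
    T = len(residual)
    scaled = np.full(T, np.nan)
    for t in range(window, T):
        trailing = residual[t - window:t]
        realized = np.std(trailing, ddof=1) * np.sqrt(TRADING_DAYS)
        # Windows containing NaN (e.g. purification warm-up) or zero volatility are skipped.
        if not np.isfinite(realized) or realized == 0:
            continue
        leverage = min(TARGET_VOL / realized, MAX_LEVERAGE)
        scaled[t] = leverage * residual[t]
    return scaled
```

Because NaN propagates through `np.std`, a purified series with a leading NaN warm-up yields scaled values only once a full window of valid residuals is available.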

Step 6: Z-score computation

Scaled returns are converted to rolling z-scores for cross-factor comparability. Z-scores are the primary output for dashboards, screening, and signal generation.
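A rolling z-score of the scaled returns can be sketched as below. The methodology does not state the z-score lookback, so the 156-day default is a placeholder assumption. Standardizing each day against a trailing window that excludes that day is also a design assumption. The function name is hypothetical.

```python
import numpy as np


def rolling_zscore(x: np.ndarray, window: int = 156) -> np.ndarray:
    """z_t = (x_t - mean(x[t-window:t])) / std(x[t-window:t]).

    The default window is an assumption; the lookback is not documented.
    """
    T = len(x)
    z = np.full(T, np.nan)
    for t in range(window, T):
        trailing = x[t - window:t]
        sd = np.std(trailing, ddof=1)
        # Skip windows with NaN inputs or no dispersion.
        if np.isfinite(sd) and sd > 0:
            z[t] = (x[t] - np.mean(trailing)) / sd
    return z
```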

Intraday factor snapshots

During US market hours (9:30-16:00 ET), SEC API exposes intraday factor snapshots when current proxy and constituent inputs are available. Each snapshot can include raw return, purified return, scaled return, z-score, and freshness metadata so consumers can distinguish current, stale, or degraded results.
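A consumer might check two things before using a snapshot: whether its timestamp falls inside the regular session, and whether it is recent enough to treat as current. The sketch below does both, but it is hypothetical:

  • The 300-second freshness threshold and the "closed"/"current"/"stale" labels are illustrative assumptions.
  • Exchange holidays and early closes are ignored.

The freshness metadata returned with each snapshot remains the authoritative signal.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")
SESSION_OPEN = time(9, 30)
SESSION_CLOSE = time(16, 0)


def in_regular_session(ts: datetime) -> bool:
    """True if an aware timestamp falls on a weekday in [9:30, 16:00) ET.

    Holidays and early closes are NOT handled; a real check needs the market calendar.
    """
    local = ts.astimezone(NEW_YORK)
    return local.weekday() < 5 and SESSION_OPEN <= local.time() < SESSION_CLOSE


def snapshot_state(as_of: datetime, now: datetime, max_age_seconds: float = 300.0) -> str:
    """Hypothetical client-side label; the threshold is an illustrative assumption."""
    if not in_regular_session(now):
        return "closed"
    age = (now - as_of).total_seconds()
    return "current" if age <= max_age_seconds else "stale"
```

Converting to `America/New_York` handles the EST/EDT switch, so callers can pass timestamps in any time zone.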

Data provenance and traceability

Factor return observations can carry the following provenance fields:
  • requestId and traceparent for end-to-end request tracing
  • modelName identifying the computation pipeline version
  • methodology.inputs listing the data sources consumed
  • sourceRights.notes documenting source-rights status when trust metadata is included
Compact responses may omit heavier trust metadata unless requested. Use response_mode=compact&include=trust on REST endpoints or include: ["trust"] in MCP tool calls when an agent needs freshness, materialization, methodology, source-rights, and degraded-state metadata alongside the answer.
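As an illustration, the request options above can be assembled on the client side. The base URL, the `factor` parameter, and any MCP argument names other than `include` are hypothetical placeholders. Only `response_mode=compact`, `include=trust`, and `include: ["trust"]` come from this page. The last helper reports which of the listed provenance fields are missing from a returned observation. It treats dotted names such as `methodology.inputs` as nested keys, which is an assumption about the response shape.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the documented REST path.
BASE_URL = "https://api.example.com/v1/factors/returns"

# Provenance fields listed above, as dotted paths into an observation.
TRUST_FIELDS = ["requestId", "traceparent", "modelName",
                "methodology.inputs", "sourceRights.notes"]


def trust_query_url(factor: str) -> str:
    """REST URL requesting compact output plus trust metadata."""
    params = {"factor": factor, "response_mode": "compact", "include": "trust"}
    return f"{BASE_URL}?{urlencode(params)}"


def trust_mcp_arguments(factor: str) -> dict:
    """MCP tool-call arguments; the `factor` key is an assumption."""
    return {"factor": factor, "include": ["trust"]}


def missing_trust_fields(observation: dict) -> list[str]:
    """Dotted provenance paths absent from one observation."""
    missing = []
    for path in TRUST_FIELDS:
        node = observation
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

An agent could re-request with `include=trust` whenever `missing_trust_fields` returns a non-empty list for a compact response.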

Methodology comparison

Feature | SEC API | MSCI Barra (USE4)
Purification | Hierarchical, 1–3 declared parents | Hierarchical nesting
Estimation window | Rolling 156-trading-day | Rolling 252-day with exponential decay
Factor hierarchy | Market → Sector → Theme | Market → Country → Industry → Style
Weighting | Cap-weight (style), ETF weights (ETF-backed thematic), equal-weight (curated thematic) | Cap-weight
Volatility targeting | 10% annual, dynamic leverage | Factor-specific
Recomputation | Market-calendar-aware daily refresh plus intraday snapshots when current inputs are available | Daily
Factor count | 230+ tracked definitions, narrower launch API surface | Varies by model

Thematic basket methodology

SEC API’s 81 thematic baskets cover named investment themes across energy, technology, healthcare, financials, real estate, consumer, transport, ESG, and regional equity. Two construction modes:
  1. ETF-backed (66 baskets): An institutional ETF provider handles constituent selection and rebalancing. SEC API computes the spread versus a relevant benchmark and applies hierarchical purification. Examples: genomics (ARKG vs XLV), nuclear energy (URA vs SPY), cybersecurity (CIBR vs XLK).
  2. Curated constituent (15 baskets): Equal-weight portfolio of named constituents versus a benchmark. Constituents reviewed quarterly. Examples: hotel REITs (18 stocks vs VNQ), trucking (16 stocks vs XLI), alternative asset managers (10 stocks vs XLF).
Both modes pass through the same hierarchical purification and volatility-targeting pipeline.
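The two spread constructions can be sketched with NumPy. For curated baskets, the sketch assumes daily rebalancing back to equal weights. The methodology states equal weighting but not whether weights drift between quarterly reviews. Function names are hypothetical.

```python
import numpy as np


def etf_backed_spread(etf_returns, benchmark_returns) -> np.ndarray:
    """ETF-backed theme: R(ETF) - R(benchmark), e.g. R(URA) - R(SPY)."""
    return np.asarray(etf_returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)


def curated_spread(constituent_returns: np.ndarray, benchmark_returns: np.ndarray) -> np.ndarray:
    """Curated theme: equal-weight basket return minus the benchmark, per day.

    constituent_returns: shape (T, N); benchmark_returns: shape (T,).
    Each day's basket return is (1/N) * sum of constituent returns, so N = 9
    reproduces the THEMATIC_CASINO_LEISURE example.
    """
    return constituent_returns.mean(axis=1) - benchmark_returns
```

Either output is a raw spread series that then enters the same purification and volatility-targeting steps.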

Academic references

  • Ang, A., & Kristensen, D. (2012). Testing conditional factor models. Journal of Financial Economics, 106(1), 132–156.
  • Asness, C. S., Moskowitz, T. J., & Pedersen, L. H. (2013). Value and momentum everywhere. Journal of Finance, 68(3), 929–985.
  • DeMiguel, V., Garlappi, L., & Uppal, R. (2009). Optimal versus naive diversification. Review of Financial Studies, 22(5), 1915–1953.
  • Fama, E. F., & French, K. R. (2018). Choosing factors. Journal of Financial Economics, 128(2), 234–252.
  • Harvey, C. R., Liu, Y., & Zhu, H. (2016). …and the cross-section of expected returns. Review of Financial Studies, 29(1), 5–68.
  • Kozak, S., Nagel, S., & Santosh, S. (2020). Shrinking the cross-section. Journal of Financial Economics, 135(2), 271–292.
  • Menchero, J., Orr, D. J., & Wang, J. (2011). The Barra US equity model (USE4). MSCI Barra Research.
  • Patton, A. J., & Verardo, M. (2012). Does beta move with news? Review of Financial Studies, 25(9), 2789–2839.