An analyst uses parking-lot satellite imagery to estimate a retailer's foot traffic ahead of earnings. This is best classified as alternative data generated by:

Sensors and the Internet of Things. Satellite imagery and geolocation data are sensor/IoT-sourced alternative data. Individual-sourced data come from social media and web searches; business-process data come from receipts and supply chains; financial filings are traditional, not alternative, data.

A clustering algorithm groups equities by return behavior without being given any predefined labels or target output. This is an example of:

Unsupervised learning. Clustering with no labeled target is unsupervised learning, where the model discovers structure on its own. Supervised learning requires labeled input-output pairs, such as known default outcomes used to train a classifier.

A machine-learning model produces excellent in-sample results but fails badly on new data. The most likely problem and its primary remedy are:

Overfitting, remedied by out-of-sample testing and reduced complexity. Strong in-sample but weak out-of-sample performance signals overfitting - the model learned noise rather than a durable pattern. The remedy is validation on held-out data, cross-validation, and constraining model complexity so the model generalizes.

Fintech, Data, and Portfolio Applications

Technology as an investment-process tool

Fintech is technology-enabled finance. The CFA curriculum groups its investment applications into a defined set: analysis of big data, artificial intelligence (AI) and machine learning (ML), robo-advisory services, risk analysis, algorithmic trading, and distributed ledger technology (DLT). The common theme is using systems and data to make the process faster, broader, or more consistent - without replacing judgment about objectives and suitability.

Big data and the four Vs

Big data is characterized by four Vs:

Dimension	Meaning	Example
Volume	Quantity of data	Terabytes of tick and transaction data
Velocity	Speed of arrival/processing	Real-time market feeds, streaming text
Variety	Structured vs unstructured forms	Prices vs news text, images, audio
Veracity	Reliability/credibility	Noise, gaps, and source-quality issues

Alternative data sits outside traditional financial statements and market feeds and comes from three sources: individuals (social media, web searches, product reviews), business processes (credit-card receipts, supply-chain records, point-of-sale data), and sensors/IoT (satellite imagery of parking lots, shipping, geolocation). It can sharpen research only when it is lawful, relevant, reliable, and properly governed - material nonpublic information and privacy rules can make a dataset unusable regardless of predictive power.

Processing this data requires real engineering. Data wrangling (cleaning, formatting, handling missing values, and aligning timestamps) typically consumes most of an analyst's effort, and unstructured text must be converted to numbers through natural language processing before a model can use it. The curriculum is explicit that veracity problems - duplicate records, survivorship bias, look-ahead bias, and stale data - can quietly corrupt an otherwise sophisticated model.
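The wrangling steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pipeline: the `raw` records and the `clean` helper are hypothetical, and forward-filling is only one of several possible ways to handle a missing value.

```python
from datetime import date

# Hypothetical raw daily observations: (date, value). None marks a missing value.
raw = [
    (date(2024, 1, 3), 101.0),
    (date(2024, 1, 2), 100.0),
    (date(2024, 1, 3), 101.0),   # exact duplicate record
    (date(2024, 1, 4), None),    # missing value
    (date(2024, 1, 5), 103.0),
]

def clean(records):
    """Drop exact duplicates, sort by date, and forward-fill missing values."""
    unique = sorted(set(records), key=lambda r: r[0])
    cleaned, last = [], None
    for day, value in unique:
        if value is None:
            value = last          # forward-fill: uses only earlier data
        if value is not None:     # skip leading gaps with nothing to fill from
            cleaned.append((day, value))
            last = value
    return cleaned

cleaned = clean(raw)
for day, value in cleaned:
    print(day.isoformat(), value)
```

Forward-filling is used here because it relies only on past observations; a fill that borrows from later dates would itself introduce look-ahead bias.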

A signal that worked in a backtest because future information leaked into the past will fail in live trading, so data lineage (knowing the source and every transformation) is a core control, not a nicety.
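A toy backtest makes the leak concrete. The sketch below assumes a made-up return series and a hypothetical `backtest` helper; the only point is that a signal built from the same day's return (lag 0) looks far better than one restricted to information available beforehand (lag 1).

```python
# Hypothetical daily returns, invented for illustration.
returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]

def backtest(returns, lag):
    """Go long or short on day t based on the sign of the return from day t - lag.

    lag=0 peeks at the very return being traded (look-ahead bias);
    lag=1 uses only the previous day's return.
    """
    pnl = 0.0
    for t in range(1, len(returns)):
        signal_source = returns[t - lag]
        position = 1 if signal_source > 0 else -1
        pnl += position * returns[t]
    return pnl

leaky = backtest(returns, lag=0)    # always "right" because it sees the future
honest = backtest(returns, lag=1)   # only uses data known before trading

print(f"leaky backtest P&L:  {leaky:+.3f}")
print(f"honest backtest P&L: {honest:+.3f}")
```

On this series the leaky version earns the sum of absolute returns, while the honest version loses money, which is the pattern a lineage check is meant to catch before capital is committed.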

Machine learning, trading, DLT, and governance

Machine learning

Machine learning (ML) finds patterns in large datasets. The curriculum distinguishes:

Supervised learning - the model learns from labeled inputs and outputs (e.g., regression, classification to predict default).
Unsupervised learning - no labels; the model finds structure itself (e.g., clustering similar securities, dimension reduction).
Deep learning / neural networks - layered networks for complex tasks like image and natural-language processing.
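The supervised/unsupervised distinction can be sketched on a single made-up feature. Everything below is an assumption for illustration: the leverage figures, the labels, and the choice of a nearest-centroid classifier and two-cluster k-means as stand-ins for the broader families.

```python
# Hypothetical feature: leverage ratio for six borrowers.
leverage = [0.2, 0.3, 0.25, 0.8, 0.9, 0.85]
defaulted = [0, 0, 0, 1, 1, 1]   # labels, used only in the supervised case

def mean(xs):
    return sum(xs) / len(xs)

def fit_supervised(x, y):
    """Nearest-centroid classifier: uses the labels to locate each class mean."""
    c0 = mean([xi for xi, yi in zip(x, y) if yi == 0])
    c1 = mean([xi for xi, yi in zip(x, y) if yi == 1])
    return lambda v: int(abs(v - c1) < abs(v - c0))

def cluster_unsupervised(x, iterations=10):
    """1-D k-means with k=2: no labels, groups are found from the data alone.

    Assumes x contains at least two distinct values.
    """
    lo, hi = min(x), max(x)   # initial centroids
    assign = []
    for _ in range(iterations):
        assign = [int(abs(v - hi) < abs(v - lo)) for v in x]
        lo = mean([v for v, a in zip(x, assign) if a == 0])
        hi = mean([v for v, a in zip(x, assign) if a == 1])
    return assign

predict = fit_supervised(leverage, defaulted)
clusters = cluster_unsupervised(leverage)

print("predicted default for leverage 0.7:", predict(0.7))
print("cluster assignments:", clusters)
```

The classifier needs `defaulted` to learn anything; the clustering routine never sees it and still separates the low-leverage and high-leverage groups.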

The headline risk is overfitting: a model memorizes noise in the training (in-sample) data and performs poorly out of sample. Controls include out-of-sample testing, cross-validation, comparison against a simple baseline, and limiting model complexity. Interpretability is often weak (a "black box"), making recommendations hard to explain to clients or committees; bias can enter through training data, labels, or design.
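A small experiment shows the in-sample versus out-of-sample gap. This is a sketch under assumptions, not a recommended modeling workflow: the data are simulated so that the feature carries no information, the "memorizer" is a one-nearest-neighbor lookup chosen because it is maximally flexible, and the baseline simply predicts the training mean.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Hypothetical data: the feature x carries no information about y."""
    return [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n)]

train, test = make_data(50), make_data(50)

def memorizer(x):
    """Overly flexible model: returns the y of the nearest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

train_mean = sum(y for _, y in train) / len(train)

def baseline(_x):
    """Simple baseline: always predict the training mean."""
    return train_mean

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

in_sample = mse(memorizer, train)
out_of_sample = mse(memorizer, test)
baseline_out = mse(baseline, test)

print(f"memorizer in-sample MSE:     {in_sample:.3f}")
print(f"memorizer out-of-sample MSE: {out_of_sample:.3f}")
print(f"baseline out-of-sample MSE:  {baseline_out:.3f}")
```

The memorizer reproduces every training point, so its in-sample error is zero, yet that says nothing about new data. Comparing its held-out error with the baseline's is the kind of check the controls above describe.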

Robo-advisers and algorithmic trading

Robo-advisers deliver scalable, low-cost model portfolios with automated onboarding, allocation, rebalancing, and tax-loss harvesting; they suit straightforward situations but can struggle with complex or unusual circumstances. The curriculum distinguishes fully automated robo-advisers, which require no human adviser, from adviser-assisted (hybrid) models that pair software with a human for more complex needs - the hybrid form is better suited to clients with concentrated stock, estate complications, or strong behavioral biases that a questionnaire cannot capture.
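Automated rebalancing is often rule-based. The sketch below assumes a simple tolerance-band rule; the function name `rebalance_trades`, the 5-percentage-point band, and the portfolio figures are hypothetical, and taxes, trading costs, and client-specific constraints are deliberately ignored.

```python
def rebalance_trades(holdings, targets, band=0.05):
    """Return currency trades that restore target weights.

    Returns an empty dict if every asset is within `band` of its target weight.
    Positive values are buys, negative values are sells.
    """
    total = sum(holdings.values())
    weights = {k: v / total for k, v in holdings.items()}
    if all(abs(weights.get(k, 0.0) - targets[k]) <= band for k in targets):
        return {}
    return {k: targets[k] * total - holdings.get(k, 0.0) for k in targets}

targets = {"equity": 0.60, "bonds": 0.40}

drifted = {"equity": 70_000.0, "bonds": 30_000.0}   # equity has drifted to 70%
in_band = {"equity": 62_000.0, "bonds": 38_000.0}   # within the band

trades = rebalance_trades(drifted, targets)
no_trades = rebalance_trades(in_band, targets)

print("trades for drifted portfolio:", trades)
print("trades for in-band portfolio:", no_trades)
```

A rule like this handles the straightforward case well; it has no way to know about concentrated stock or an upcoming cash need unless those facts are captured as inputs, which is where the hybrid model adds value.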

Algorithmic trading uses computers to execute on preset rules, often slicing a large parent order into child orders to minimize market impact; high-frequency trading (HFT) is a subset that operates on millisecond horizons and seeks to profit from fleeting price discrepancies. Both can improve execution and liquidity, but they raise concerns about systemic stability and flash crashes, which is why pre-trade risk limits and kill switches are standard controls.
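Order slicing and a kill switch can be sketched together. The equal-size split, the function names, and the limit values below are assumptions for illustration; real execution algorithms adapt to volume and price, and real risk checks cover far more than quantity.

```python
def slice_order(parent_qty, n_slices):
    """Split a parent order into near-equal child orders."""
    base, extra = divmod(parent_qty, n_slices)
    return [base + (1 if i < extra else 0) for i in range(n_slices)]

def execute(children, max_child_qty, max_total_qty):
    """Send child orders until a pre-trade limit would be breached, then halt."""
    filled = []
    for qty in children:
        if qty > max_child_qty or sum(filled) + qty > max_total_qty:
            break   # kill switch: stop sending further orders
        filled.append(qty)
    return filled

children = slice_order(10_000, 3)
filled = execute(children, max_child_qty=5_000, max_total_qty=7_000)

print("child orders:", children)
print("orders sent before halt:", filled)
```

The limits are checked before each child order is sent, so a breach stops the sequence rather than being discovered after the fact.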

Distributed ledger technology

Distributed ledger technology (DLT) maintains a shared, cryptographically secured record across a network of participants. Blockchain is the best-known form; applications include cryptoassets, tokenization of real assets, and post-trade settlement. Benefits include transparency and reduced reconciliation; drawbacks include energy use (notably on proof-of-work networks) and immature regulation.
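The "cryptographically secured record" idea can be sketched as a hash-linked list. This is a single-machine toy under stated assumptions: the function names and transaction strings are hypothetical, and it omits everything that makes a ledger distributed (the network, consensus, and digital signatures). It shows only why altering an old entry is detectable.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def append_block(chain, data):
    """Add a block whose hash commits to the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})

def is_valid(chain):
    """Check every link and every stored hash."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev"] != expected_prev:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
    return True

ledger = []
append_block(ledger, "A pays B 10")
append_block(ledger, "B pays C 4")

tampered = [dict(b) for b in ledger]
tampered[0]["data"] = "A pays B 1000"   # rewrite history without rehashing

print("original ledger valid:", is_valid(ledger))
print("tampered ledger valid:", is_valid(tampered))
```

Because each hash covers the previous block's hash, changing an early record invalidates it unless every later block is recomputed, which is what a network of independent copies is designed to prevent.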

Application	Portfolio use	Key control question
Big-data/alt-data analysis	Research signal generation	Is the data lawful, clean, predictive?
Robo-advice	Scalable model portfolios	Does the model fit the client's facts?
Machine learning	Forecasting and classification	Is the model overfit or opaque?
Algorithmic/HFT	Execution efficiency	Are risk and kill-switch controls in place?
DLT/blockchain	Settlement, tokenization	Are custody and regulation addressed?

Automation does not remove fiduciary and suitability duties. A model that is efficient for an average client may be wrong for one with concentrated stock, near-term cash needs, tax constraints, or ESG screens. When output conflicts with client facts, the professional investigates rather than treating the screen as authority. Portfolio optimization is especially fragile: its output is highly sensitive to the expected-return inputs, so a small change in an assumed return can swing recommended weights dramatically and push the portfolio toward extreme, poorly diversified positions.

Estimation error, not the algorithm, is usually the binding limit, which is why practitioners constrain weights, smooth inputs, and compare any model output against a simple baseline.
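The sensitivity of optimized weights to return inputs can be shown with two assets. The sketch assumes an unconstrained mean-variance setup in which weights are proportional to the inverse covariance matrix times expected excess returns and are then scaled to sum to 1; the volatilities, correlation, and return figures are invented for illustration.

```python
def optimal_weights(mu, vol, rho):
    """Two-asset unconstrained mean-variance weights, scaled to sum to 1.

    mu:  expected excess returns (asset 1, asset 2)
    vol: volatilities (asset 1, asset 2)
    rho: correlation between the two assets
    Shorting is allowed; assumes the raw weights do not sum to zero.
    """
    var1, var2 = vol[0] ** 2, vol[1] ** 2
    cov = rho * vol[0] * vol[1]
    # Inverse of the 2x2 covariance matrix up to a constant factor,
    # which cancels when the weights are rescaled.
    raw1 = var2 * mu[0] - cov * mu[1]
    raw2 = var1 * mu[1] - cov * mu[0]
    total = raw1 + raw2
    return raw1 / total, raw2 / total

vol, rho = (0.20, 0.20), 0.90   # two similar, highly correlated assets

base = optimal_weights((0.05, 0.05), vol, rho)
bumped = optimal_weights((0.06, 0.05), vol, rho)   # +1 percentage point on asset 1

print("weights with equal expected returns:", [round(w, 3) for w in base])
print("weights after a 1-point bump:       ", [round(w, 3) for w in bumped])
```

With identical inputs the split is even. Raising one expected return by a single percentage point pushes the first asset well above 100% of the portfolio and puts the second into a short position, which is the behavior that weight constraints and input smoothing are meant to tame.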

For Level I, the safest reasoning is balanced rather than utopian or dismissive: technology improves scale, speed, and breadth of analysis, but it must stay subordinate to clear objectives, clean and lawful data, validated models, and professional judgment. Expect questions that ask for the best description of a fintech use or its risk - the answer that names the genuine control (validation, data governance, suitability) usually beats the answer that promises certainty or that rejects technology outright.

CFA Level I Study Guide

CFA Level 1

11.5 Fintech, Data, and Portfolio Applications
