Generative AI in financial forecasting: promise vs. reality

James R. Fischer

— Managing Director, Financial Analytics

March 30, 2024

10 min read

AI, Forecasting, Finance, Technology, Analytics

Every major CFO we speak to in 2024 is being asked by their board: are we using AI in our forecasting? The honest answer, for most organizations, should be: not yet, and here is why that is probably the right call.

94% of CFOs surveyed report board pressure to adopt AI in the finance function (Gartner, 2024)

23% have deployed any form of AI in their core forecasting process

report positive ROI from AI-assisted forecasting after 12 months

Where LLMs Actually Outperform Classical Methods

The use cases where generative AI provides genuine, measurable lift in financial forecasting are narrower than the vendor pitch decks suggest — but they are real and they are significant for the organizations that find them.

Narrative interpretation — parsing earnings call transcripts, analyst reports, and news feeds to extract forward-looking signals that traditional time-series models miss

Scenario narration — translating quantitative scenario outputs into coherent, boardroom-ready prose that finance teams can actually use

Anomaly hypothesis generation — when a forecast diverges unexpectedly, LLMs can rapidly generate candidate explanations from external data sources

Cross-functional synthesis — aggregating inputs from disparate planning systems (HR, supply chain, sales) into a coherent narrative forecast

Where LLMs Fall Short — and Why It Matters

The performance gap between LLM-based forecasting and established statistical methods is most pronounced in structured, numerical, high-frequency time-series prediction. For monthly or quarterly financial forecasting, the bread and butter of FP&A, classical methods including ARIMA, gradient-boosted trees, and Bayesian structural models consistently outperform GPT-class models in both accuracy and interpretability.
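To make the comparison concrete, here is a minimal sketch of the kind of backtest any candidate model, LLM-based or otherwise, would need to beat. It is not the evaluation used in this analysis: it substitutes a seasonal-naive forecast (a simpler stand-in for the ARIMA-class methods named above), the error metric (MAPE) is an assumption, and the revenue figures are invented for illustration.

```python
from statistics import mean


def seasonal_naive(history, season_length, horizon):
    """Forecast each future period as the value from the same period one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100 * mean(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


# Illustrative quarterly revenue (made-up numbers, not client data).
revenue = [100, 120, 110, 150, 104, 126, 113, 158, 110, 130, 118, 165]

# Hold out the final four quarters and forecast them from the earlier ones.
train, test = revenue[:-4], revenue[-4:]
baseline = seasonal_naive(train, season_length=4, horizon=4)
baseline_mape = mape(test, baseline)

print(f"Seasonal-naive MAPE on holdout: {baseline_mape:.2f}%")
```

Scoring an LLM-generated forecast with the same `mape` function on the same holdout quarters gives a like-for-like comparison; a model that cannot beat a baseline this simple is unlikely to justify a place in the numerical core.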

The Hallucination Problem

In internal testing across six FP&A deployments, we observed LLM-generated forecasts that appeared statistically plausible but were based on fabricated historical analogues. Unlike a classical model, whose failure modes are transparent, an LLM can state its errors with full confidence, which makes them difficult to detect without domain expertise.

31% average accuracy improvement achieved by using LLMs for narrative inputs alongside classical numerical models; the hybrid approach outperforms either alone

A Practical Path for Finance Teams

The organizations that are extracting genuine value from AI in finance are not replacing their forecasting models. They are augmenting them — using LLMs at the data intake and output narration layers while preserving classical methods at the numerical prediction core. This hybrid architecture is less exciting as a board story, but it works.
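One way to picture this layering is the sketch below. It is an illustration under stated assumptions, not a description of any client system: `hybrid_forecast`, `ForecastReport`, and the three stand-in functions are hypothetical names, and the two LLM layers are passed in as plain callables so the example runs without any model or vendor API. The design choice it shows is that the language model never produces a number; it only supplies reviewable notes on the way in and prose on the way out.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ForecastReport:
    numbers: list[float]          # produced only by the classical model
    adjustment_notes: list[str]   # text signals surfaced for analyst review
    narrative: str                # prose summary of the numbers


def hybrid_forecast(
    history: Sequence[float],
    documents: Sequence[str],
    numeric_model: Callable[[Sequence[float]], list[float]],
    extract_signals: Callable[[str], list[str]],
    narrate: Callable[[list[float], list[str]], str],
) -> ForecastReport:
    # Intake layer: the language model only turns text into reviewable notes.
    notes = [signal for doc in documents for signal in extract_signals(doc)]
    # Numerical core: the classical model alone produces the figures.
    numbers = numeric_model(history)
    # Narration layer: the language model describes numbers it did not produce.
    narrative = narrate(numbers, notes)
    return ForecastReport(numbers, notes, narrative)


# Hypothetical stand-ins so the sketch runs end to end without an LLM.
def last_value_model(history):
    return [history[-1]] * 2


def keyword_signals(doc):
    return [doc] if "guidance" in doc.lower() else []


def template_narrative(numbers, notes):
    return f"Forecast: {numbers}. Signals considered: {len(notes)}."


report = hybrid_forecast(
    history=[100.0, 104.0, 110.0],
    documents=["Management raised guidance for Q3.", "Office relocation announced."],
    numeric_model=last_value_model,
    extract_signals=keyword_signals,
    narrate=template_narrative,
)
print(report.narrative)
```

In a real deployment the two stand-in text functions would be replaced by LLM calls and `last_value_model` by the team's existing forecasting model; because the numbers and the notes stay in separate fields, an analyst can audit each independently.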

The FJ AI Readiness Diagnostic

Before deploying any AI in your finance function, FischerJordan recommends a 30-day AI Readiness Diagnostic that audits data quality, model governance infrastructure, and organizational capability. Teams that skip this step spend significantly more on course correction than the diagnostic costs.

This analysis is based on FischerJordan's proprietary evaluation of six client FP&A AI deployments and a review of 23 published studies on LLM performance in time-series forecasting tasks.
