Large Language Models (LLMs) have shown remarkable capabilities in many NLP tasks. Even though LLMs are making inroads in the finance industry, research on them there is still limited. A survey paper published in February 2024 reviews LLMs in the finance industry and how they can be improved.
From General LLMs to Finance
LLMs started with the Transformer architecture, published in 2017, which led to two types of models: discriminative models like BERT for classifying text, and generative models like GPT, designed to produce fluent text.

The GPT series showed the power of scaling. GPT-3 introduced in-context learning, and ChatGPT combined GPT-3 with code models and reinforcement learning from human feedback (RLHF), bringing conversational AI to the mainstream. GPT-4 pushed things further with multimodal inputs such as text and images.
Open-source models such as BERT, BLOOM and LLaMA were also released, which allowed anyone to build their own models. This gave the finance domain a way to adapt them, in two families:
- FinPLMs (Financial Pretrained Language Models) -> FinBERT-19, FinBERT-20, FinBERT-21, FLANG
- FinLLMs (Financial LLMs) -> BloombergGPT, FinMA, InvestLM, FinGPT
The paper compares five main approaches:
- Continual Pretraining — Start with a general model such as BERT, then keep training on finance corpora
- Domain-specific Training from Scratch — Train entirely on finance text, which is costly
- Mixed-domain Pretraining — Use general and finance data together, as in FinBERT-21
- Mixed-domain LLM + Prompt Engineering — Keep the weights frozen and steer the model with prompts, as in BloombergGPT
- Instruction Fine-tuning — Convert financial tasks into an instruction format (for example question + answer) and fine-tune on it, as in FinMA, InvestLM and FinGPT
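To make the instruction format concrete, here is a minimal sketch of how one labeled sentiment example could be wrapped as an instruction record. The field names (`instruction`, `input`, `output`), the prompt wording and the example sentence are all assumptions for illustration; they are not taken from the datasets behind FinMA, InvestLM or FinGPT.

```python
import json

# Hypothetical prompt template; the wording is an assumption, not a real dataset's prompt.
INSTRUCTION = (
    "Classify the sentiment of the following financial sentence "
    "as positive, negative, or neutral."
)

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def to_instruction_record(text: str, label: str) -> dict:
    """Wrap one labeled example as an instruction-tuning record."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return {
        "instruction": INSTRUCTION,  # the task, stated in natural language
        "input": text.strip(),       # the financial text to analyze
        "output": label,             # the answer the model should learn to produce
    }


# Made-up sentence, for illustration only.
record = to_instruction_record(
    "  The company raised its full-year revenue guidance.  ", "positive"
)
print(json.dumps(record, indent=2))
```

A fine-tuning set built this way is just a list of such records, so the same labeled data used for a classifier can be reused for an instruction-tuned model.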
This evolution shows a trend in finance: moving from training on static data toward instruction-based, interactive models.
Did they perform well?
The survey compares FinLLMs and general LLMs on six benchmark tasks:
- Sentiment Analysis — Classifying financial news as positive/negative/neutral, where FLANG and GPT-4 did well
- Text Classification — Categorizing financial headlines or Fed statements, where FinMA, at 30B parameters, reaches an F1 of 98%
- Named Entity Recognition — Extracting company names, tickers and similar entities, where GPT-4 leads and the FinLLMs lag
- Question Answering — Ranging from simple financial Q&A to hybrid numerical reasoning, where GPT-4 is near human level and BloombergGPT lags
- Stock Movement Prediction — Predicting stock price direction from text and prices, where GPT-4 beat the FinLLMs but stayed below state-of-the-art specialized models
- Text Summarization — Condensing earnings call transcripts, where task-specific models beat LLMs, though GPT-4 was the best of the LLMs and ahead of the FinLLMs
So overall, FinPLMs are good at simple classification tasks, but for complex reasoning GPT-4 gave better results than the FinLLMs.
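Several of these comparisons are reported as F1 scores. As a reminder of what that number measures, the following sketch computes a macro-averaged F1 by hand on a made-up three-class sentiment example. Macro averaging is only one of several F1 variants, and choosing it here is an assumption; the survey's benchmarks may average differently.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for a single class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Made-up gold labels and predictions, for illustration only.
gold = ["pos", "neg", "neu", "pos", "neg", "neu"]
pred = ["pos", "neg", "pos", "pos", "neu", "neu"]

score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

Because macro F1 weights every class equally, a model that ignores a rare class (say, negative news) is penalized even if its overall accuracy looks high.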
The paper also notes that these benchmarks are basic and shallow, and it lists eight advanced tasks for future LLMs to handle:
- Relation Extraction — For example, Company A was acquired by Company B
- Event Detection — Tracking corporate events like mergers
- Causality Detection — Understanding cause and effect in financial text
- Numerical Reasoning — Math with textual numbers
- Structure Recognition — Understanding tables in reports
- Multimodal Finance — Combining text, audio from earnings calls, or even video
- Machine Translation — Finance-aware translation across languages
- Market Forecasting — Going beyond stock moves to predict volatility, risk and trends
These tasks are needed for real-world financial analysis and go well beyond sentiment classification.
Future finance LLMs will have to address the tasks above. Looking further ahead, the scope includes automation that streamlines report writing, compliance checks and financial analysis; making complex financial insights available to retail investors; giving traders, analysts and regulators powerful tools to support their decisions; and natural language interfaces for querying finance databases and dashboards.
Challenges that LLMs should overcome
- Hallucinations must be reduced, as they can be very costly in the finance industry
- Private data must be handled safely because it is sensitive
- As markets evolve, the model should be able to adapt to new trends and changes
- There is a risk of bias, with models amplifying sentiment or misinformation
- Training a FinLLM from scratch and running it can be expensive and needs a lot of resources
- Current evaluation metrics like F1 and accuracy don’t capture financial risk; domain-specific metrics like the Sharpe ratio, or expert reviews, are needed
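As an illustration of a finance-specific metric, here is a minimal sketch of the Sharpe ratio: mean excess return divided by the standard deviation of excess returns, scaled to a year. The function name, the per-period risk-free rate, the choice of 252 trading days for annualization and the return series are all assumptions for illustration, not something the survey prescribes.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is per period (same frequency as `returns`).
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.mean(excess)
    std_excess = statistics.stdev(excess)  # sample standard deviation
    if std_excess == 0:
        raise ValueError("returns have zero variance; ratio is undefined")
    return mean_excess / std_excess * math.sqrt(periods_per_year)


# Made-up daily returns from a hypothetical model-driven strategy.
daily_returns = [0.010, -0.005, 0.007, 0.002, -0.003]
print(f"annualized Sharpe ratio: {sharpe_ratio(daily_returns):.2f}")
```

Unlike F1 or accuracy, this rewards a model only when its predictions translate into returns that are high relative to their volatility, which is closer to what a trader or risk manager cares about.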
We need FinLLMs that not only talk about finance but also reliably assist in financial decision making.
