Based on these findings, AI's immediate future in finance appears to be one of collaboration rather than replacement. While these systems demonstrate impressive capabilities in summarizing information and handling routine analytical tasks, their error rates, particularly in complex, client-facing situations, indicate that human oversight remains essential in an industry where mistakes can have serious financial and legal consequences.
The researchers analyzed over 10,000 responses from four AI models (Bard, LLaMA, ChatGPT 3.5, and ChatGPT 4) to 1,083 financial licensing exam questions. Each question was tested across multiple models and configurations. The team evaluated two aspects of each response: whether the AI picked the correct answer, and how well it explained its reasoning compared with expert explanations. To score the second aspect, they used a natural language processing model (BERT) to measure how closely each AI explanation matched the expert-written one.
Additionally, they mapped the questions to 51 real-world finance job tasks using data from the U.S. Department of Labor's Occupational Information Network (O*NET) to understand practical applications. The study also explored different ways of using AI systems, including web interfaces, API access with various settings, and specially trained (fine-tuned) models.
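To make the two measures described above concrete, here is a minimal Python sketch, not the authors' code. It assumes each response is stored as a (task, chosen answer, correct answer) record, and it swaps in a simple word-count vector where the study used BERT, so the similarity numbers are only illustrative. The function names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def explanation_similarity(ai_text, expert_text):
    """Compare two explanations using word-count vectors.

    Assumption: this bag-of-words vector is a stand-in for the BERT
    representation used in the study, chosen so the sketch runs with
    the standard library alone.
    """
    ai_counts = Counter(ai_text.lower().split())
    expert_counts = Counter(expert_text.lower().split())
    vocabulary = sorted(set(ai_counts) | set(expert_counts))
    ai_vec = [ai_counts[word] for word in vocabulary]
    expert_vec = [expert_counts[word] for word in vocabulary]
    return cosine_similarity(ai_vec, expert_vec)


def accuracy_by_task(responses):
    """Share of correct answers per job task.

    `responses` is an iterable of (task, chosen, correct) tuples; the
    task labels here are hypothetical, not actual O*NET task names.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for task, chosen, correct in responses:
        totals[task] += 1
        if chosen == correct:
            hits[task] += 1
    return {task: hits[task] / totals[task] for task in totals}


# Hypothetical sample data, for illustration only.
sample_responses = [
    ("trading", "A", "A"),
    ("trading", "B", "B"),
    ("trading", "C", "D"),
    ("tax analysis", "A", "B"),
    ("tax analysis", "C", "C"),
]
task_accuracy = accuracy_by_task(sample_responses)
similarity = explanation_similarity(
    "bond prices fall when interest rates rise",
    "when interest rates rise bond prices fall",
)
```

Grouping accuracy by task is what lets a comparison such as "trading questions versus client-specific questions" fall out of the same response data.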
ChatGPT 4 emerged as the top performer, correctly answering 84.5% of questions, 18 to 28 percentage points better than the free models. When researchers fine-tuned ChatGPT 3.5 by training it on specific financial content, it nearly matched ChatGPT 4's accuracy and even surpassed it in explanation quality. The models performed best on questions about trading and market operations (73.4% accuracy) but struggled with client-specific tasks like financial planning and tax analysis (dropping to 56.6% accuracy). Interestingly, both AI and human test-takers tended to struggle with the same challenging questions, suggesting fundamental limitations in handling complex financial concepts.
The study primarily used entry-level licensing exam questions, which may not fully capture the complexity of real-world financial work. Some test questions were available online, potentially inflating AI performance by up to 13% for these questions. The research was conducted in late 2023 and early 2024, and given the rapid pace of AI development, results might change with newer versions. Additionally, exam questions don't test important aspects of finance jobs, such as writing, communication, and creative thinking skills.
The research suggests AI is currently better suited as an assistant than a replacement for financial professionals. While it shows promise in tasks like market monitoring and basic analysis, it remains less reliable for complex, client-specific work. The study reveals important tradeoffs between different AI models and implementation methods. Fine-tuning can significantly improve performance, but even the most advanced models still make errors that could be costly in real-world applications. The findings also suggest potential changes in entry-level finance jobs, particularly for junior analysts performing routine tasks.
The research was supported by data from Achievable and Knopman Marks, two financial exam preparation companies. Special acknowledgments were given to Justin Pincar at Achievable and Brian Marks at Knopman Marks. The study also benefited from input from seminar participants at Washington State University and Clemson University. The authors reported no conflicts of interest, and the study received peer review before publication in the Financial Analysts Journal.
This study was published in the Financial Analysts Journal on November 18, 2024. The article, titled "How Much Does ChatGPT Know about Finance?", can be accessed using the Digital Object Identifier (DOI) 10.1080/0015198X.2024.2411941. It was authored by Douglas (DJ) Fairhurst, an associate professor of finance at the Carson College of Business, Washington State University, and Daniel Greene, the Bill Short Associate Professor of Finance at the Wilbur O. and Ann Powers College of Business, Clemson University. The article carries 2.0 PL credits. Correspondence regarding the study can be directed to Douglas (DJ) Fairhurst.