LLM-Based Credit Scoring Using Transaction Data

Scoring Systems
OpenAI
Chatbots
Data Science

Intro

The South African financial market faces significant challenges due to limited client data. Building strong financial scoring models usually requires labelled data, which is hard to come by in South Africa: major credit providers, such as Capitec Bank, keep most customer data private, making it tough for newcomers to compete. But not for our client!


Mission Mobile (business.missionmobile.co.za) is an AI-driven fintech in South Africa focused on smartphone financing and credit underwriting. They asked us to explore whether Large Language Models (LLMs) could help create a modern credit scoring system without the need for labelled data. We delivered two modular workstreams: a new OCR-to-JSON bank statement extraction pipeline and a five-stage Financial Health Score (FHS) pipeline that combines traditional scoring with insights from LLMs.


Challenge

Mission Mobile began its journey with the Beam product. Instead of relying on a traditional credit scoring model, Beam parses three months of transaction data using classical OCR techniques and then scores applicants against a set of financial rules.

PostData assisted Beam with these financial rules, offering ideas on how to improve them for building the financial scorecards.

Although this rule-based logic was convincing at times, it struggled with variations in layout. A common failure point for the OCR system was recognising statements with borderless tables, especially when the PDFs had been altered. The credit scoring component, meanwhile, delivered superficial scores that did not explain the “why”: the logic counted transactions but failed to interpret behaviour.


Customer Requirements

The client requested an updated system architecture built on modern AI techniques: one capable of handling multiple bank formats and producing a nuanced, explainable FHS powered by LLM reasoning.



Summary

Mission Mobile, a South African fintech, partnered with PostData.ai to modernise credit scoring in a data-limited market. The solution combined a robust OCR-to-JSON pipeline with a five-stage Financial Health Score (FHS) system that balances deterministic math with LLM-powered enrichment and interpretation. This approach solved challenges with variable statement formats, enabled evidence-backed insights, and produced more nuanced, fair, and explainable credit assessments than traditional models. The modular design ensures reliability today and flexibility for future model upgrades.

Project overview

We delivered a two-part system: a bank statement extraction pipeline for reliable parsing of complex formats, and a five-stage FHS pipeline that turns transactions into structured features, enriched insights, and explainable credit indicators.

Workstream 1 — Bank Statement Extraction


The development of the extraction pipeline was an iterative process. Initial research into PDF-to-Markdown tools, such as Marker, showed promise for standard documents but consistently failed on the key challenge case: statements with borderless tables. After further testing, MistralOCR was identified as a viable API-based solution that could handle the format. During this phase, performance was optimised: initial tests processing a complete statement's text with a single API call were slow, but switching to an asynchronous, per-page approach with MistralOCR significantly reduced the processing time.
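As a minimal sketch of that per-page fan-out (the `ocr_page` coroutine below is a hypothetical stand-in for the provider's SDK call, not the actual MistralOCR interface):

```python
import asyncio

async def ocr_page(client, page_bytes: bytes) -> str:
    """Send one statement page to the OCR API and return its text."""
    response = await client.process(document=page_bytes)  # hypothetical SDK call
    return response.text

async def extract_statement(client, pages: list[bytes]) -> list[str]:
    # One request per page, issued concurrently: total latency is roughly
    # that of the slowest page rather than the sum of all pages.
    return await asyncio.gather(*(ocr_page(client, page) for page in pages))
```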


However, to achieve even lower error rates and gain maximum control over the pipeline, the final architecture was built around a DotsOCR model hosted on a cloud platform. This solution first runs a bank classifier to identify the origin of each statement, then routes it to a bank-specific formatter. Using BeautifulSoup for HTML parsing and Pandas for data manipulation, these formatters parse the layout-aware tables generated by DotsOCR, normalise data types, and map the result to the required downstream schema. The system supports both synchronous and asynchronous ingestion paths.
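In condensed form, the classifier-formatter routing looks roughly like the sketch below; the bank name, column layout, and cleaning rules are illustrative placeholders rather than the production registry:

```python
from io import StringIO
from typing import Callable

import pandas as pd
from bs4 import BeautifulSoup

# Maps a classifier label to that bank's formatter.
FORMATTERS: dict[str, Callable[[str], pd.DataFrame]] = {}

def register(bank: str):
    def wrap(fn: Callable[[str], pd.DataFrame]):
        FORMATTERS[bank] = fn
        return fn
    return wrap

@register("example_bank")
def format_example_bank(html: str) -> pd.DataFrame:
    # DotsOCR emits layout-aware HTML; lift the transaction table out of it.
    table = BeautifulSoup(html, "html.parser").find("table")
    df = pd.read_html(StringIO(str(table)))[0]
    df.columns = ["date", "description", "amount", "balance"]
    df["date"] = pd.to_datetime(df["date"], dayfirst=True)
    df["amount"] = pd.to_numeric(
        df["amount"].astype(str).str.replace(" ", "", regex=False), errors="coerce"
    )
    return df

def format_statement(bank_label: str, html: str) -> pd.DataFrame:
    # bank_label comes from the upstream bank classifier.
    return FORMATTERS[bank_label](html)
```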

Workstream 2 — FHS Pipeline (5 Stages)


  • Processing & Features: This initial stage applies regex-based categorisation and engineers a suite of transactional features, such as the day of the week, the ratio of transaction amount to account balance, and a flag identifying internal account transfers. It then computes monthly aggregates (see the first sketch after this list).

  • LLM Enrichment: Performance was a key focus. By refactoring the process to use batched, asynchronous calls to gpt-4.1-mini, the pipeline efficiently handles the high volume of API requests needed for transaction-level enrichment (see the second sketch below). This stage flags each transaction with behavioural markers and consolidates savings signals.

  • Insights Generation: Early experiments showed that gemini-2.5-pro could generate insights, but they were often unverifiable. To solve this, the team iterated through several prompt versions and tested various temperature settings. The final version uses "grounded prompting", which requires the model to cite specific transaction evidence and contextualise its findings against base-rate statistics, significantly reducing hallucination (see the third sketch below).

  • Indicator Calculation: This deterministic stage computes twelve core indicators: five Prosperity indicators that measure factors like spending at premium merchants and lifestyle spending frequency, and seven Stability indicators that assess financial resilience. The logic evolved from simple transaction counts to more nuanced formulas, such as combining spend-share and frequency for prosperity metrics (see the fourth sketch below).

  • Score Adjustment: In the final step, gpt-5-mini performs a holistic review of all transaction and bureau data, emitting adjusted indicators with calculated deltas and plain-language justifications for each change.
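To make the stages concrete, the four sketches below illustrate stages one to four in order (the fifth stage is a single holistic review call). First, regex-based categorisation and feature engineering; the patterns and column names are illustrative only:

```python
import re

import pandas as pd

# Illustrative rules only; the production categoriser has a far larger set.
CATEGORY_PATTERNS = {
    "salary": re.compile(r"\bsalary\b|\bwages\b", re.I),
    "transfer": re.compile(r"\btransfer\b|\btrf\b", re.I),
    "groceries": re.compile(r"checkers|shoprite|pick n pay", re.I),
}

def categorise(description: str) -> str:
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(description):
            return category
    return "other"

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    df["category"] = df["description"].map(categorise)
    df["day_of_week"] = df["date"].dt.day_name()
    # Size of each transaction relative to the balance at the time.
    df["amount_to_balance"] = df["amount"].abs() / df["balance"].abs()
    df["is_internal_transfer"] = df["category"].eq("transfer")
    return df
```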

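Second, the batched, asynchronous enrichment pattern. This sketch assumes the official openai Python package; the prompt, batch size, and response parsing are simplified placeholders:

```python
import asyncio
import json

from openai import AsyncOpenAI  # assumes the official openai package

client = AsyncOpenAI()
BATCH_SIZE = 20

async def enrich_batch(batch: list[dict]) -> list[dict]:
    # One request tags a whole batch of transactions, cutting round trips.
    response = await client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system",
             "content": "Tag each transaction with behavioural markers. "
                        "Reply with a JSON array, one object per transaction."},
            {"role": "user", "content": json.dumps(batch)},
        ],
    )
    # Assumes the model returns well-formed JSON; production code would validate.
    return json.loads(response.choices[0].message.content)

async def enrich_all(transactions: list[dict]) -> list[dict]:
    batches = [transactions[i:i + BATCH_SIZE]
               for i in range(0, len(transactions), BATCH_SIZE)]
    tagged = await asyncio.gather(*(enrich_batch(b) for b in batches))
    return [row for batch in tagged for row in batch]
```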

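Third, the shape of a grounded prompt; the wording is a reconstruction of the idea, not the production prompt:

```python
# A reconstruction of the grounded-prompting idea, not the production prompt.
GROUNDED_INSIGHT_PROMPT = """\
You are reviewing a client's bank transactions.
For every insight you produce you MUST:
1. Cite the IDs of the specific transactions that support it.
2. Compare the observed behaviour against the base-rate statistics below.
If no transactions support a claim, do not make the claim.

Base-rate statistics:
{base_rates}

Transactions:
{transactions}
"""
```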

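Fourth, a prosperity-style indicator blending spend-share with frequency. The equal weighting here is an assumption for illustration:

```python
def prosperity_indicator(premium_spend: float, total_spend: float,
                         premium_count: int, total_count: int,
                         weight: float = 0.5) -> float:
    """Blend how much goes to premium merchants with how often it happens,
    so a single large purchase cannot dominate the indicator.
    The 50/50 weighting is illustrative, not the production formula."""
    spend_share = premium_spend / total_spend if total_spend else 0.0
    frequency = premium_count / total_count if total_count else 0.0
    return weight * spend_share + (1.0 - weight) * frequency
```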
The entire pipeline is delivered as a Python package featuring typed configurations, data schemas for integrity, structured logging, and a robust suite of unit and integration tests.



Results

The extraction pipeline now reliably parses complex bank statements with DotsOCR and bank-specific formatters, hosted on a specialised provider for stable performance and cost control. The FHS pipeline produces quantitative indicators with concise, evidence-backed summaries that analysts can easily audit.

Unlike traditional models, the LLM-based system captures behavioural insights—such as post-salary spending patterns or savings discipline—that were previously inaccessible, making the solution more competitive in the market.


How It’s Used


Analysts start from structured transactions and indicators, then review the LLM’s justifications and persona-level narratives for deeper context. Clear data contracts ensure that models can be upgraded without disrupting the system.


Client Benefits


The approach improves fairness by using non-linear scoring to reduce outlier effects and enables population-specific tuning. It also strengthens detection of contractual savings by combining transaction data, LLM insights, and bureau signals, delivering a more precise financial picture.
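The case study does not name the exact transform, but a logistic squash is one common way to achieve this kind of non-linear, outlier-resistant scoring; the midpoint and steepness parameters below are illustrative and could be tuned per population:

```python
import math

def dampen(raw: float, midpoint: float, steepness: float = 1.0) -> float:
    """Map a raw indicator onto (0, 1) so extreme values saturate instead of
    dominating the score; midpoint and steepness can be tuned per population."""
    return 1.0 / (1.0 + math.exp(-steepness * (raw - midpoint)))
```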

Insights & Conclusions

A model regarded as state-of-the-art (SOTA) may perform well on general tasks yet fall short on your specific one. Always select models based on leaderboards relevant to your task, even if the best-performing model is not currently trending.

A specialised technology stack works best: use fast LLMs for structured enrichment and stronger reasoners for holistic review. Keep the mathematical calculations in code and reserve LLMs for explanation and interpretation. Grounding insights with specific evidence and base-rate statistics is critical to reducing hallucination and improving trust in the AI's output. Above all, designing for change is paramount—this package-based, modular setup allows the team to swap models and add new document types with minimal disruption.



Next steps

Operational maturity is the next focus. This includes introducing OpenRouter for LLM failover and attaching confidence intervals to the FHS indicators. The team will also continue benchmarking new model variants from Anthropic, Mistral, Gemini, and OpenAI for each pipeline stage to optimise for quality and cost. Finally, adding a RAG (Retrieval-Augmented Generation) component to inject real-time merchant metadata and local economic signals will further refine risk assessment.

Project duration

6 weeks

Team

3

2 Data Scientists, 1 Principal Data Scientist

Technologies

FastAPI, DotsOCR, LLMs, OpenAI

Tech challenge

  • Layout variability: Parsing inconsistent bank statements, especially borderless or altered PDFs.

  • Scalability & cost: Ensuring fast, affordable OCR/LLM processing with async, batched calls.

  • Explainability: Reducing LLM hallucinations with grounded prompting and evidence-cited insights.

  • Label scarcity: Building reliable indicators without labelled data, while keeping scoring fair and robust.

Solution

We delivered the solution as two distinct components. The bank statement extraction pipeline was deployed as a high-performance service on a cloud platform. In contrast, the FHS scoring pipeline was delivered as an installable Python package, complete with typed configurations, explicit schemas, and asynchronous processing. The core design principle was to keep deterministic math in code while leveraging LLMs for enrichment and interpretation. Clear data contracts between stages and a classifier-formatter registry make onboarding new statement types straightforward.
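As an illustration of such a data contract, the sketch below uses pydantic; the library choice and field set are assumptions, since the case study specifies only typed configurations and explicit schemas:

```python
from datetime import date

from pydantic import BaseModel  # pydantic is an assumption, not confirmed

class Transaction(BaseModel):
    """Contract for a single parsed statement row passed between stages."""
    txn_date: date
    description: str
    amount: float
    balance: float
    category: str | None = None

class EnrichmentConfig(BaseModel):
    """Typed configuration for the LLM enrichment stage."""
    model: str = "gpt-4.1-mini"
    batch_size: int = 20
    max_concurrency: int = 8
```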

Let's talk about your case

Email: andrii.rohovyi@postdata.ai
