For decades, credit bureaus have been the default. A lender pulls a bureau score, checks the number, and makes a call. The system works well enough when consumers have deep credit histories and stable borrowing patterns. But a bureau score is a rear-view mirror: it tells you what happened months ago, filtered through the reporting lag of creditors who may or may not have submitted accurate data. Bank statement scoring reads the windscreen. It tells you what is happening right now, in the consumer's actual bank account, with their actual money.
South African credit providers are adopting it alongside traditional bureau enquiries, and for short-term lending, the bank statement signal increasingly carries more weight than the bureau score alone.
A working definition
Bank statement scoring is the process of extracting, categorising, and scoring transaction data from a consumer's bank statements to produce a credit risk signal. It is not just OCR. It is not just data capture. It is behavioural analysis of cash flow: a structured read of how money moves through a person's account over time.
The input is a PDF bank statement (or a batch of them). The output is a structured decision pack: a behavioural credit score, an affordability calculation, a collection date forecast, and reason codes that explain the recommendation. Three months of statements become a detailed picture of the consumer's financial life: income patterns, spending behaviour, existing obligations, and the timing of it all.
Where a bureau score compresses years of credit history into a single number, bank statement scoring compresses recent transactional behaviour into a decision-ready signal. The two are complementary. But for lenders dealing with thin-file applicants, irregular income, or Regulation 23A compliance, the bank statement signal often carries more weight.
The six-stage pipeline
A bank statement PDF is not data. It is an image of data, formatted for human eyes, not machine consumption. Turning it into a reliable credit signal requires six distinct stages, each adding a layer of intelligence on top of the last.
1. Intake
The statement arrives as a PDF via API or web portal upload. AffyScore accepts up to 30 statements in a single batch, typically three months each for multiple applicants, or six months for a single applicant requiring deeper history. The system identifies the bank, statement period, and account holder from the document metadata and header content before any extraction begins.
2. Extract
Transaction data is pulled from the PDF using regex-first parsing tuned to each bank's specific statement format. FNB statements look nothing like Capitec statements. Standard Bank formats differ from Discovery. Each bank has its own date formats, column layouts, reference structures, and edge cases. Regex-first extraction is faster, cheaper, and more deterministic than pure AI parsing; when a statement is a scan or a photograph rather than a digital PDF, an AI vision fallback handles the degraded input.
The output of extraction is a structured list of transactions: date, description, amount, running balance, and transaction type. Every row is mapped; nothing is discarded.
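As a minimal sketch of what regex-first extraction can look like, the Python below parses one text line into the five fields listed above. The line layout, the pattern, and the names `LINE_RE`, `Transaction`, and `parse_line` are illustrative assumptions; they do not reflect any real bank's statement format or AffyScore's actual engine.

```python
import datetime as dt
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical line layout, NOT any real bank's format:
#   "14 Mar 2025  WOOLWORTHS SANDTON  -247.50  8,112.35"
LINE_RE = re.compile(
    r"^(?P<date>\d{2} [A-Za-z]{3} \d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})\s+"
    r"(?P<balance>-?[\d,]+\.\d{2})$"
)


@dataclass(frozen=True)
class Transaction:
    date: dt.date
    description: str
    amount: Decimal
    balance: Decimal
    kind: str  # "credit" or "debit", inferred from the sign of the amount


def _money(text: str) -> Decimal:
    """Convert '8,112.35' to Decimal('8112.35') without float rounding."""
    return Decimal(text.replace(",", ""))


def parse_line(line: str) -> Optional[Transaction]:
    """Return a Transaction, or None if the line does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    amount = _money(match["amount"])
    return Transaction(
        date=dt.datetime.strptime(match["date"], "%d %b %Y").date(),
        description=match["description"],
        amount=amount,
        balance=_money(match["balance"]),
        kind="credit" if amount >= 0 else "debit",
    )
```

In this sketch a line that does not match returns `None` rather than being silently dropped, so the caller can hand it to a fallback parser or flag it for review.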
3. Tamper check
This is where most manual processes fall short. A compliance officer reviewing a bank statement by eye can spot obvious edits (misaligned text, inconsistent fonts), but cannot check what they cannot see.
AffyScore runs four categories of tamper detection:
- Metadata analysis: PDF creation and modification dates, authoring software, embedded fonts. A genuine FNB statement is generated by a specific system. A statement created in Adobe Acrobat Pro two hours before submission raises a flag.
- Font and layout anomalies: Character spacing, font substitution, alignment inconsistencies. Even skilled edits leave traces in the rendering layer that the human eye misses but pattern matching catches.
- Mathematical verification: Every opening balance, transaction, and closing balance is recalculated. If the running totals do not reconcile, the statement has almost certainly been altered or is incomplete, regardless of how clean it looks visually.
- Sequence anomalies: Transaction dates must be sequential. Reference numbers are checked against known patterns for each bank's format. Gaps, unexpected formats, or duplicates in either signal that rows may have been added, removed, or reordered.
A tampered statement does not automatically confirm intent to defraud; the lender needs the signal regardless of intent. The tamper check surfaces anomalies with specific reason codes rather than a binary pass/fail.
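The mathematical and date-sequence checks described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AffyScore's implementation: the `Row` type, the function name, and the reason-code strings are all hypothetical.

```python
import datetime as dt
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Row:
    date: dt.date
    amount: Decimal   # signed: credits positive, debits negative
    balance: Decimal  # running balance printed on the statement


def tamper_reason_codes(opening: Decimal, rows: list[Row]) -> list[str]:
    """Return hypothetical reason codes for balance and date anomalies."""
    codes: list[str] = []
    expected = opening
    previous_date = None
    for index, row in enumerate(rows):
        # Recalculate the running balance and compare it to the printed one.
        expected += row.amount
        if row.balance != expected:
            codes.append(f"BALANCE_MISMATCH@{index}")
            # Resync so one altered row is not reported on every later row.
            expected = row.balance
        # Transaction dates must never move backwards.
        if previous_date is not None and row.date < previous_date:
            codes.append(f"DATE_OUT_OF_ORDER@{index}")
        previous_date = row.date
    return codes
```

Returning a list of codes rather than a boolean mirrors the point above: the lender sees which row failed which check, not just a pass or fail.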
4. Categorise
Raw transaction descriptions are cryptic. "CAPITEC CRD*PNP MENL 0814" is a card purchase at Pick n Pay Menlyn. "ABSA INTERAC DEBS" is a debit order from an unspecified creditor. Categorisation maps every transaction to a taxonomy: income (salary, grant, commission, rental), fixed obligations (bond, vehicle, insurance, loan repayments), variable spending (groceries, fuel, entertainment), and financial events (dishonoured debit orders, returned payments, inter-account transfers).
Accurate categorisation is what separates data capture from credit intelligence. A lender does not need to know the consumer spent R247.50 at Woolworths on 14 March. They need to know that grocery spending averages R4,200 per month and has been trending upward for three consecutive months. Categorisation turns a 90-day transaction list into an answer.
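To make the step from rows to trends concrete, here is a small Python sketch that maps descriptions to categories with keyword rules, totals each category per month, and tests for an upward trend. The keyword list, category names, and function names are assumptions chosen for illustration; a production taxonomy would be far larger and would not rely on substring matching alone.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical keyword rules, for illustration only.
RULES = [
    ("groceries", ("PNP", "WOOLWORTHS", "CHECKERS")),
    ("fuel", ("ENGEN", "SHELL")),
    ("salary", ("SALARY",)),
]


def categorise(description: str) -> str:
    """Return the first category whose keyword appears in the description."""
    upper = description.upper()
    for category, keywords in RULES:
        if any(keyword in upper for keyword in keywords):
            return category
    return "uncategorised"


def monthly_totals(transactions):
    """Sum absolute amounts per category and month.

    transactions: iterable of (iso_date 'YYYY-MM-DD', description, Decimal amount)
    """
    totals = defaultdict(lambda: defaultdict(Decimal))
    for iso_date, description, amount in transactions:
        month = iso_date[:7]  # 'YYYY-MM'
        totals[categorise(description)][month] += abs(amount)
    return {category: dict(months) for category, months in totals.items()}


def trending_up(by_month: dict) -> bool:
    """True if every month's total is higher than the month before it."""
    values = [by_month[month] for month in sorted(by_month)]
    return len(values) >= 2 and all(a < b for a, b in zip(values, values[1:]))
```

Anything the rules cannot place falls into `uncategorised` instead of being guessed, which keeps the downstream averages honest.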
5. Affordability
With income identified, obligations mapped, and spending categorised, the system calculates a Regulation 23A-compliant affordability assessment. This is the calculation the NCR requires: gross income, less statutory deductions (tax, UIF), less existing debt obligations, less necessary living expenses, equals disposable income available for new credit.
The difference between a bank-statement-derived affordability calculation and a declared-income calculation is evidence. When the consumer declares income of R28,000 but the bank statement shows net deposits of R19,400, the lender has an immediate discrepancy to resolve before the credit is granted, not after the first missed payment. When the consumer declares no existing loan obligations but the statement shows three active debit orders totalling R6,800 per month, the lender is looking at a completely different affordability position.
The affordability output includes gross income (validated), net income, total identified obligations, estimated necessary expenses, and a disposable income figure with a confidence rating based on data completeness.
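The arithmetic behind that disposable income figure is simple enough to sketch. The Python below follows the sequence described above (gross income, less statutory deductions, less debt obligations, less necessary expenses). The class and field names are hypothetical, and in the usage note the gross income, statutory deduction, and necessary expense figures are invented for illustration; only the R19,400 net income and R6,800 obligations echo the example above.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Affordability:
    gross_income: Decimal
    statutory_deductions: Decimal  # e.g. tax and UIF
    debt_obligations: Decimal      # existing monthly repayments
    necessary_expenses: Decimal    # estimated living expenses

    @property
    def net_income(self) -> Decimal:
        return self.gross_income - self.statutory_deductions

    @property
    def disposable_income(self) -> Decimal:
        return self.net_income - self.debt_obligations - self.necessary_expenses

    def can_service(self, instalment: Decimal) -> bool:
        """True if the proposed instalment fits within disposable income."""
        return instalment <= self.disposable_income
```

With a gross income of R24,000, statutory deductions of R4,600, obligations of R6,800, and necessary expenses of R7,500, net income is R19,400 and disposable income is R5,100. A R1,200 instalment fits; a R6,000 instalment does not.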
6. Score and recommendation
The final stage synthesises everything upstream into a behavioural score on a 300-850 scale, paired with a recommendation (Clear, Caution, or Review) and reason codes that explain the decision drivers.
The score is not a bureau score replacement. It is a different signal entirely, one built on cash flow behaviour rather than credit account history. A consumer with a thin bureau file but three months of stable salary deposits, no dishonoured debit orders, and a healthy balance trajectory will score well. A consumer with an 800 bureau score but a bank statement showing chronic overdraft reliance, gambling transactions reducing disposable income, and four reversed debit orders in the past quarter will score poorly. Both signals have value. Together, they catch what neither catches alone.
What the output looks like
AffyScore delivers a decision pack, not a raw data dump. The pack contains:
- Behavioural score (300-850): With reason codes explaining the primary score drivers: income stability, obligation coverage ratio, balance trajectory, dishonour frequency, spending volatility.
- Reg 23A affordability calculation: Structured income verification, obligation mapping, necessary expenses, and disposable income, formatted for the lender's record-keeping obligation under the NCA.
- Collection date forecast: The optimal debit order date based on salary timing, a collectability percentage (likelihood of funds being available on that date), and a predictability percentage (how consistent the pattern is across the statement period).
- Tamper assessment: Clean, flagged, or failed, with specific anomaly codes if applicable.
The decision pack is available in three formats: JSON for API integration into lending platforms, PDF for human review and compliance filing, and XLSX for analysts who want to work with the underlying data. All three contain the same information, structured for different consumers within the credit provider's organisation.
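For teams planning a JSON integration, it can help to picture the pack as a typed structure. The Python sketch below is an assumption about shape only: every field name is hypothetical and the real schema may differ. The allowed values it validates (the 300-850 range, the three recommendations, the three tamper outcomes) come from the descriptions above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CollectionForecast:
    debit_day: int             # day of month for the debit order
    collectability_pct: float  # likelihood funds are available that day
    predictability_pct: float  # consistency of the pattern across the period


@dataclass
class DecisionPack:
    score: int
    recommendation: str
    reason_codes: list[str]
    disposable_income: str     # decimal kept as text to avoid float rounding
    collection: CollectionForecast
    tamper_status: str
    tamper_codes: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 300 <= self.score <= 850:
            raise ValueError("score must be within 300-850")
        if self.recommendation not in {"Clear", "Caution", "Review"}:
            raise ValueError("unknown recommendation")
        if self.tamper_status not in {"clean", "flagged", "failed"}:
            raise ValueError("unknown tamper status")

    def to_json(self) -> str:
        """Serialise the pack, including the nested forecast, to JSON."""
        return json.dumps(asdict(self))
```

Validating on construction means a malformed pack fails loudly at the boundary instead of leaking into a lending decision.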
Why now
Lenders have been reading bank statements since before credit bureaus existed. What has changed is that three forces have converged to make automated, structured bank statement scoring a practical necessity.
Regulatory pressure
The NCR's proposed amendments to Regulation 23A point to increased scrutiny of affordability documentation. The draft shows which way enforcement is heading: more granular affordability evidence, shorter look-back windows for income verification, and explicit requirements to validate declared information against bank statement data. Credit providers who are still relying on payslips and self-declared expense lists are building on a foundation the regulator is actively eroding.
Bureau lag
Bureau scores reflect credit account data reported by creditors on monthly submission cycles, meaning a new account or delinquency may not appear until the next reporting cycle. Consider a consumer who lost their job last month, took on three new store accounts in the past six weeks, or started missing debit orders this week: none of that is guaranteed to appear in the bureau score today. The bank statement shows all of it. For short-term lending decisions (micro-loans, BNPL, device finance), the recency of the bank statement signal matters more than the depth of the bureau history.
Credit-invisible consumers
Millions of South Africans are economically active but credit-invisible: they have bank accounts but no formal credit history. They earn income, pay rent, buy groceries, and manage their money. They have not yet borrowed from a registered credit provider. A bureau score for these consumers is either non-existent or so thin it adds little value. A bank statement score gives them a signal from day one, based on the financial behaviour they are already demonstrating.
Who uses it
The immediate market is NCR-registered credit providers who are already required to perform affordability assessments, including roughly 4,500 micro-lenders. It also extends to approximately 2,200 debt counsellors (NCR Annual Report) who need to verify client financial positions during debt review applications.
Beyond the regulated core, bank statement scoring is being adopted by:
- Telcos: The handset finance component of a mobile phone contract is a credit agreement under the NCA, requiring an affordability assessment for the device value.
- Vehicle finance providers: Supplementing bureau data with cash flow evidence for applicants with thin or mixed credit histories.
- BNPL operators: Buy-now-pay-later products where deferred payment exceeds the NCA's incidental credit threshold fall under the Act and require an affordability assessment. Real-time checks at point of sale require speed that manual statement review cannot deliver.
- Insurance premium finance: Premium funders operating as registered credit providers under the NCA are required to perform affordability assessments. Bank statement analysis verifies that the monthly instalment is serviceable from actual cash flow.
- Debt collectors: Before initiating collection action, understanding the consumer's actual cash flow position (including the best day to attempt a debit order) directly improves first-attempt recovery rates.
Banks supported
AffyScore parses statements from all six major South African banks: FNB, Standard Bank, ABSA, Nedbank, Capitec, and Discovery Bank. Each bank's statement format is handled by a dedicated regex extraction engine tuned to that bank's specific layout, date formatting, and reference number conventions.
Digital PDF statements (downloaded from internet banking) produce the highest extraction accuracy and the most reliable tamper detection. Scanned or photographed statements are processed using an AI vision fallback; the extraction still works, but tamper detection is limited because the original PDF metadata is lost in the scanning process. Where possible, lenders should request digital originals rather than scans.
How it integrates
AffyScore offers two integration paths. The REST API accepts a PDF upload and returns the decision pack via an asynchronous webhook: the lender's system submits the request and receives a callback when processing completes. In testing, typical turnaround is under three seconds for a single statement and longer for batch submissions of up to 30 documents. The API is designed for integration into existing lending platforms, loan origination systems, and automated decisioning workflows.
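On the receiving side, a lender's platform needs to decide what to do when the callback arrives. The Python sketch below shows one way to route a webhook body. The payload keys, the routing labels, and the rule that anything other than a clean, Clear result goes to a human are all assumptions made for illustration; they are not taken from AffyScore's API documentation.

```python
import json

# Recommendation values as described in this article.
KNOWN_RECOMMENDATIONS = {"Clear", "Caution", "Review"}


def route_callback(raw_body: bytes) -> str:
    """Choose the next workflow step from a hypothetical webhook payload."""
    try:
        pack = json.loads(raw_body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return "reject_payload"
    if not isinstance(pack, dict):
        return "reject_payload"
    recommendation = pack.get("recommendation")
    if recommendation not in KNOWN_RECOMMENDATIONS:
        return "reject_payload"
    # Assumed policy: only a clean, Clear result continues automatically.
    if pack.get("tamper_status") != "clean" or recommendation != "Clear":
        return "manual_review"
    return "continue_automated"
```

Keeping the routing in a pure function like this makes it easy to unit-test independently of whichever web framework receives the callback.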
For credit providers who do not have a development team or who process lower volumes, the web portal provides the same functionality through a browser interface. Upload statements, receive decision packs, download reports. No integration required.
Both paths deliver the same output in JSON, PDF, and XLSX formats. The API adds webhook notifications and batch processing; the portal adds a visual dashboard for manual review.
What it costs
AffyScore pricing runs from R8 to R35 per extraction, depending on monthly volume. At the top tier, the cost of a bank statement extraction, affordability calculation, and behavioural score combined is comparable to a single bureau enquiry, except the output answers three questions instead of one.
There are no setup fees, no monthly minimums, and no long-term contracts. Credit providers pay per extraction and scale as their volume grows. The pricing model is designed to make bank statement scoring economically viable even for micro-lenders processing 50 applications per month.
For a detailed breakdown of volume tiers and per-unit pricing, see the pricing section on our homepage.