Technical Blog · Apr 28, 2026 · 9 min read

Fuzzy matching in accounts payable: finding the PO without an exact number

ininvoice: Fuzzy matching cross-checks the invoice, PO and delivery note when the data does not match exactly: a mistyped PO number, a variant supplier name, a different date or a slightly different amount. It combines classic algorithms such as Levenshtein distance and n-grams with weighted fields over supplier, date, amount and line description. If the confidence exceeds a threshold (typically 90%), it auto-confirms; between 70% and 90%, it asks a human; below 70%, it discards the match. That way, invoices that exact matching would reject become touchless.


Exact matching is easy to explain and just as easy to break. If the invoice carries a correctly written PO number, the supplier exactly as it appears in your master and the date you expected, everything matches. The moment one data point drifts, the system rejects the invoice and a human keys it in.

Fuzzy matching is what separates an AP that auto-captures 30% of invoices from one that gets to 80%. This article explains how it works, which classic algorithms sit behind it and where to put the threshold between auto-confirm and human review.

What fuzzy matching is

Fuzzy matching is the technique that decides whether two records refer to the same object when fields do not match character by character. It originated in librarianship and record linkage in the 1960s and is now standard in any system that cross-checks dirty data.

In accounts payable we apply it to three questions:

  • Does this invoice correspond to this PO even if the PO number is missing or wrong?
  • Does this delivery note correspond to this PO line even if the description varies?
  • Is this supplier from the email the same “García Distribution SL” in the master even if it comes as “DISTRIB GARCIA, S.L.”?

The system does not decide alone. It computes a similarity score between 0 and 1 (or 0% and 100%) and compares against thresholds. Above the high threshold, confirm. Below the low one, discard. In between, escalate to a human.

Why it matters in AP

Exact matching assumes your data is perfect. In reality you receive scanned PDF invoices, suppliers who write your name three different ways, sales reps who copy the PO number with one digit changed, and invoice dates that are not the real service date.

Surveys from Ardent Partners and others covering the mid-market report that more than 30% of invoices have some minor discrepancy with the PO or delivery note. If you reject them all, your team never leaves manual mode. If you pay blind, you leave the door open to errors and fraud. Fuzzy matching is the middle path.

Typical cases where fuzzy saves the day

The five scenarios that dominate real exceptions:

  • Invoice with no PO number. Supplier does not include it or the buyer never asked. The link has to be rebuilt via supplier + date + amount + line description.
  • PO number mistyped. “PO-2026-001847” on the invoice, “PO-2026-001874” on the PO. A transposition. Small edit distance, high similarity.
  • Variant supplier name. “García Distribution SL” vs “DISTRIB GARCIA SL” vs “García Distribuciones”. A human sees it’s the same in a second; exact matching does not.
  • Different date. PO issued on March 5, invoice dated March 28 corresponding to the service. Within a reasonable window, the match should hold.
  • Slightly different amount. PO EUR 1,200, invoice EUR 1,215. The difference is within tolerance and the combination supplier + date + description confirms the link.
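
The variant supplier name case above can be sketched in a few lines. This is a minimal illustration, not ininvoice's implementation: the suffix list, the function names and the choice of the Dice coefficient over character trigrams are all assumptions made for the example.

```python
import re
import unicodedata

# Hypothetical suffix list for illustration; extend it for your own supplier master.
LEGAL_SUFFIXES = {"SL", "SA", "SLU", "LTD", "INC", "GMBH"}


def normalise_name(name: str) -> str:
    """Uppercase, strip accents and punctuation, and drop legal suffixes."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^A-Z0-9 ]", "", no_accents.upper())
    words = [w for w in cleaned.split() if w not in LEGAL_SUFFIXES]
    return " ".join(words)


def trigrams(text: str) -> set:
    """Set of overlapping 3-character slices of the text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def name_similarity(a: str, b: str) -> float:
    """Dice coefficient over the character trigrams of the normalised names."""
    na, nb = normalise_name(a), normalise_name(b)
    ta, tb = trigrams(na), trigrams(nb)
    if not ta or not tb:
        # Names too short for trigrams: fall back to exact comparison.
        return 1.0 if na and na == nb else 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))
```

With this sketch, "García Distribution SL" and "DISTRIB GARCIA, S.L." normalise to "GARCIA DISTRIBUTION" and "DISTRIB GARCIA", which share 9 trigrams and score 18/29, about 0.62. Word order does not matter much because most trigrams sit inside a word.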

Algorithms behind fuzzy matching

There is no single algorithm. The usual practice is to combine several classics by field type:

  • Levenshtein distance. Counts the minimum inserts, deletes and substitutions of characters to turn one string into another. Good for PO numbers, article codes and short typos. A distance of 1 in a 12-character PO is very high similarity.
  • N-grams (bigrams, trigrams). Slices strings into 2- or 3-character sequences and computes the overlap. Very useful for supplier names with reordered or abbreviated words.
  • Jaro-Winkler. Variant of Jaro similarity aimed at short proper names: it boosts the score when the two strings share a common prefix. Good for legal names.
  • Cosine similarity on embeddings or TF-IDF. For line descriptions (“HP CF410A Black Toner” vs “HP 410A Toner”). Turns text into numeric vectors and measures the angle between them.
  • Fuzzy hashing. Generates hashes resistant to small changes, useful to detect near-identical documents in dedup.
  • Weighted fields. Not an algorithm on its own: each field contributes its partial similarity multiplied by a weight. Supplier weighs more than date, date more than an optional field.
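
As a sketch of the first item in the list, here is plain Levenshtein distance with a normalised similarity on top. Dividing by the longer string's length is one common convention and an assumption here, not something the article prescribes. Note that plain Levenshtein counts a swap of two adjacent characters as two edits; the Damerau-Levenshtein variant counts it as one.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, falling towards 0.0 as edits accumulate."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For the PO pair from the scenarios above, "PO-2026-001847" and "PO-2026-001874", the distance is 2 over 14 characters, so the similarity is 1 - 2/14, roughly 0.86.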

The thefuzz library (formerly fuzzywuzzy) in Python, or Elasticsearch fuzzy queries, are standard references for implementing these algorithms without reinventing them.
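
In production one of those libraries would do the work. Purely to show the arithmetic behind the cosine measure for line descriptions, here is a standard-library sketch over raw word counts. It is a simplification: real TF-IDF also down-weights words that are common across the corpus, and the tokeniser is an assumption.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list:
    """Lowercase alphanumeric tokens; a deliberately simple, assumed tokeniser."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    va, vb = Counter(tokenise(a)), Counter(tokenise(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For "HP CF410A Black Toner" against "HP 410A Toner" only "hp" and "toner" are shared, so the score is 2 / sqrt(4 × 3), about 0.58. The tokens "cf410a" and "410a" do not count as a match at word level, which is where character n-grams or embeddings help.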

Key fields on an invoice: supplier, date, amount, description

Field | Typical algorithm | Suggested weight
Supplier (legal name, tax ID / VAT) | N-grams + normalisation | 40%
Date (with window) | Distance in days ± tolerance | 15%
Total amount | Absolute difference ± tolerance | 20%
Line description | Cosine / n-grams | 25%

The supplier’s tax ID or VAT is gold: if it matches exactly, it removes doubt about the legal name. That is why many systems give max weight to the tax ID when present and fall back to fuzzy on the name only if missing.
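
The weighting and the tax ID shortcut can be sketched as follows. Only the four weights come from the table above. The `Document` class, the 60-day window, the 2% amount tolerance, the shape of each partial score and the rule that two different tax IDs score zero are assumptions for illustration. `difflib.SequenceMatcher` stands in for the string measures to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional

# Weights from the table above; they sum to 1.0.
WEIGHTS = {"supplier": 0.40, "date": 0.15, "amount": 0.20, "description": 0.25}


@dataclass
class Document:
    """Hypothetical minimal header shared by an invoice and a PO."""
    supplier_name: str
    tax_id: Optional[str]
    issued: date
    total: float
    description: str


def text_similarity(a: str, b: str) -> float:
    # Stand-in string measure from the standard library, not n-grams or cosine.
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def supplier_similarity(inv: Document, po: Document) -> float:
    # The tax ID decides when both sides carry one; the name is only a fallback.
    if inv.tax_id and po.tax_id:
        return 1.0 if inv.tax_id == po.tax_id else 0.0
    return text_similarity(inv.supplier_name, po.supplier_name)


def date_similarity(inv: Document, po: Document, window_days: int = 60) -> float:
    # Assumed shape: linear decay from 1.0 (same day) to 0.0 at the window edge.
    gap = abs((inv.issued - po.issued).days)
    return max(0.0, 1.0 - gap / window_days)


def amount_similarity(inv: Document, po: Document, tolerance: float = 0.02) -> float:
    # Assumed shape: full credit inside the relative tolerance, none outside.
    if po.total == 0:
        return 1.0 if inv.total == 0 else 0.0
    return 1.0 if abs(inv.total - po.total) / abs(po.total) <= tolerance else 0.0


def match_score(inv: Document, po: Document) -> float:
    """Weighted sum of the four partial similarities, between 0.0 and 1.0."""
    parts = {
        "supplier": supplier_similarity(inv, po),
        "date": date_similarity(inv, po),
        "amount": amount_similarity(inv, po),
        "description": text_similarity(inv.description, po.description),
    }
    return sum(WEIGHTS[field] * parts[field] for field in WEIGHTS)
```

Each partial score stays in the range 0 to 1, so the total can be compared directly against percentage thresholds.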

Confidence threshold: when to auto-confirm vs ask a human

  • Score > 90%. Auto-match. The invoice enters the touchless flow and goes to line-by-line three-way matching against that PO and delivery note.
  • Score 70-90%. Match with human review. The system proposes the most likely PO and a human confirms with one click. Usually 20-30% of invoices at companies with dirty data.
  • Score < 70%. No match. The invoice goes to the exceptions inbox for investigation: possibly no PO, miscoded or a new supplier.
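
The three bands translate into a small routing function. The function name and the returned labels are made up for the sketch; the default thresholds are the 90% and 70% from the list above and are meant to be calibrated.

```python
def route(score: float, auto: float = 0.90, review: float = 0.70) -> str:
    """Map a similarity score in [0, 1] to one of the three bands."""
    if not 0.0 <= review < auto <= 1.0:
        raise ValueError("expected 0 <= review < auto <= 1")
    if score > auto:
        return "auto_match"    # touchless flow, then three-way matching
    if score >= review:
        return "human_review"  # propose the best PO, a human confirms
    return "no_match"          # exceptions inbox
```

A score of exactly 0.90 lands in the review band here, matching the strict "> 90%" in the list.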

False positives and false negatives

  • False positive. The system declares a match when there is none. The invoice is associated with the wrong PO and, if it passes all later checks, an undue payment may be approved. High cost.
  • False negative. The system says “no match” when there is one. The invoice goes to an exception and a human reconciles it. Low cost.

In AP you always lean conservative: more false negatives, more human reviews, fewer erroneous payments. So auto-match thresholds are set high (90%+) and the doubt zone is managed by humans. Subsequent line-by-line three-way matching acts as a safety net.

How many of your invoices would go touchless with fuzzy matching today?

ininvoice cross-checks supplier, date, amount and description with calibratable thresholds and isolates the grey zone for human review. Book a spot and measure the lift on your real volume.

How ininvoice applies fuzzy matching

  • Automatic intake from Gmail or Outlook with no manual forwarding.
  • Structured reading of PDF, XML and FacturaE: prioritises signed data over pixel recognition.
  • Normalisation of supplier names (uppercase, legal suffixes, accents) before applying n-grams.
  • Invoice ↔ PO fuzzy match with weighted fields and configurable thresholds.
  • Line-by-line three-way matching once the link is confirmed, with a tolerance of 2% or EUR 1.50 in OR mode (a line passes if it is within either limit).
  • Exceptions inbox sorted by probability: humans resolve doubt cases from highest to lowest confidence.

For the full flow, see three-way matching, touchless accounts payable, and invoice and delivery note reconciliation.

Checklist to improve your matching rate

  1. Clean the supplier master. One legal name per tax ID, normalised legal suffixes, grouped aliases.
  2. Require tax ID/VAT on invoices. The tax ID kills half of the supplier fuzzy work.
  3. Ask suppliers for the PO number. Even with fuzzy, a well-typed PO is an exact match and saves human review.
  4. Calibrate thresholds with your history. Look at a month of invoices, label real match / no match and tune until the grey zone is minimal.
  5. Measure your auto-match rate. It is the true touchless KPI. Below 60% there is room for improvement.
  6. Review false positives every month. If they show up, raise thresholds or readjust weights.
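
Steps 4 and 6 amount to sweeping candidate thresholds over labelled history. The sketch below assumes the history is a list of (score, is_real_match) pairs; the function names and the zero-false-positive default are assumptions.

```python
from typing import Iterable, List, Optional, Tuple

History = List[Tuple[float, bool]]  # (fuzzy score, human-confirmed real match?)


def auto_match_stats(history: History, threshold: float) -> Tuple[float, int]:
    """Return (auto-match rate, false positives) for one auto-confirm threshold."""
    auto = [(score, real) for score, real in history if score > threshold]
    false_positives = sum(1 for _, real in auto if not real)
    rate = len(auto) / len(history) if history else 0.0
    return rate, false_positives


def lowest_safe_threshold(
    history: History,
    candidates: Iterable[float],
    max_false_positives: int = 0,
) -> Optional[float]:
    """Lowest candidate threshold whose false positives stay within the limit."""
    for threshold in sorted(candidates):
        _, false_positives = auto_match_stats(history, threshold)
        if false_positives <= max_false_positives:
            return threshold
    return None  # no candidate is safe enough: keep everything in review
```

A lower safe threshold means more touchless invoices. If the function returns a high value or None, the weights or the master data usually need attention before the threshold does.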

FAQ

Is fuzzy matching the same as AI?
No. Classic algorithms (Levenshtein, n-grams, Jaro-Winkler) have worked for decades without neural networks. Today they are complemented with semantic embeddings for line descriptions, but the core of fuzzy matching in AP is still string arithmetic.
What happens if the supplier changes their legal name?
The master should record the change with effect from a date. Both versions count as aliases of the same tax ID.
Does fuzzy work if I have no formal PO?
Without a PO there is nothing to cross-check for three-way matching. Fuzzy matching can still link invoices to budgets, contracts or the payment history for the same supplier, but the control is weaker.
What date tolerance is reasonable?
Depends on purchase type. Physical goods: 30-60 day window between PO and invoice. Recurring services: invoiced period plus a margin. Long professional services: up to 90 days or more.
And if two open POs from the same supplier compete for the same invoice?
The system escalates to a human even if the best candidate exceeds the threshold. Tie or near-tie = mandatory human review.
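
A minimal sketch of that tie rule, assuming candidates arrive as (PO id, score) pairs. The 0.05 margin that defines a "near-tie" is an assumed value for illustration and would need calibrating; the PO ids in the usage note are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_candidate(
    scored: List[Tuple[str, float]],
    auto: float = 0.90,
    review: float = 0.70,
    min_margin: float = 0.05,  # assumed definition of a "near-tie"
) -> Tuple[Optional[str], str]:
    """Choose the best PO and decide whether it can be auto-confirmed."""
    if not scored:
        return None, "no_match"
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score < review:
        return None, "no_match"
    if best_score > auto and best_score - runner_up >= min_margin:
        return best_id, "auto_match"
    # Either the score is in the grey zone or a second PO is too close.
    return best_id, "human_review"
```

With candidates scoring 0.95 and 0.93, the first one is proposed but sent to review; with 0.95 and 0.60 it is auto-confirmed.
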
Does fuzzy work on FacturaE or structured XML invoices?
Yes, and better: fields arrive already parsed. Normalisation costs less and thresholds can be stricter.
What is a decent auto-match rate for a mid-market company?
Below 50%, you have dirty data or miscalibrated thresholds. Between 60% and 75% is usual at the start. Above 80% is realistic after 2-3 months of calibration and master cleansing.

Connect your email and measure your real auto-match rate.

30 days of your own invoices, fuzzy matching applied, line-by-line reconciliation. Get started.

Three things to remember

  1. Fuzzy matching combines Levenshtein, n-grams and weighted fields on supplier, date, amount and description. Not magic: string arithmetic with calibrated thresholds.
  2. Three bands: >90% auto, 70-90% human, <70% discard. The grey zone has to be narrow.
  3. False positives cost a lot, false negatives cost little. Calibrate conservatively and let line-by-line three-way matching be the safety net.

If you want to see this on your own invoices, try ininvoice. You can also check the pricing and features.
