Convert a PDF Invoice to UBL: Can You Really?
What works and what doesn’t when converting PDF invoices to XML
- E-invoicing
- 30 Apr, 2026
- 7 min read
TL;DR: Converting a PDF invoice into a real UBL/Peppol XML is not a simple file conversion — it’s OCR plus data extraction, and the result is only as accurate as the OCR engine. For incoming invoices, you usually shouldn’t be converting at all: ask your supplier for a real UBL file. For your own outgoing invoices, send UBL straight from your accounting software instead of going via PDF.
You have a stack of PDF invoices you want to get into your system, or a supplier asks for UBL but you only have a PDF. The obvious question: can you just convert a PDF into UBL?
The short answer is yes, but not the way you’d expect. Here’s what actually happens — and when you should skip it.
Why PDF to UBL isn’t a real “conversion”
Converting Word to PDF keeps the content the same and only swaps the format. PDF to UBL is fundamentally different.
A PDF invoice is a visual document: pixels and text laid out for humans. A UBL invoice is a structured data file: every field (invoice number, supplier, VAT ID, line items, IBAN, totals) is labelled in XML so software can parse it automatically.
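To make “labelled in XML” concrete, here is a minimal Python sketch that reads a few fields out of a UBL invoice using only the standard library. The sample invoice is made up and heavily trimmed (a real UBL invoice carries many more mandatory elements), and the function name read_invoice_fields is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the UBL 2.x standard.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Made-up, heavily trimmed invoice for illustration only.
SAMPLE_UBL = """\
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
  <cbc:ID>INV-2026-001</cbc:ID>
  <cbc:IssueDate>2026-04-30</cbc:IssueDate>
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">121.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>
"""


def read_invoice_fields(xml_text: str) -> dict:
    """Pull a few labelled fields out of a UBL invoice."""
    root = ET.fromstring(xml_text)
    return {
        "number": root.findtext("cbc:ID", namespaces=NS),
        "issue_date": root.findtext("cbc:IssueDate", namespaces=NS),
        "currency": root.findtext("cbc:DocumentCurrencyCode", namespaces=NS),
        "payable": root.findtext(
            "cac:LegalMonetaryTotal/cbc:PayableAmount", namespaces=NS
        ),
    }


print(read_invoice_fields(SAMPLE_UBL))
```

There is no guessing about where the total sits on the page: the element names say what each value is.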
To go from PDF to UBL, software has to:
1. Extract the text from the PDF (text extraction or OCR for scanned PDFs)
2. Figure out which word is the supplier, which number is the VAT rate, which is an IBAN, and so on
3. Place those fragments into the correct UBL fields
Step 2 is the hard part. Every supplier designs their invoice differently. “Total” might be bottom-right, top, or middle. VAT can be per-line or aggregated. Field order varies. So this isn’t conversion — it’s data extraction with pattern recognition, i.e. OCR with invoice intelligence.
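A rough Python sketch of that field-recognition step, assuming the text has already been pulled out of the PDF. The label variants and sample texts are invented for illustration, and real tools use far more than a handful of regular expressions; the point is only to show why every new layout needs new patterns. The IBAN check uses the standard mod-97 rule.

```python
import re
from typing import Dict, Optional

# Hypothetical label variants; real invoices need many more per language and supplier.
PATTERNS = {
    "invoice_number": re.compile(
        r"(?:Invoice\s*(?:no\.?|number|#)|Factuurnummer)\s*:?\s*([A-Z0-9][A-Z0-9\-/]*)",
        re.I,
    ),
    "total": re.compile(
        r"\b(?:Total(?:\s+due)?|Amount\s+due|Totaal)\s*:?\s*(?:EUR|€)?\s*([0-9]+[.,][0-9]{2})",
        re.I,
    ),
    "iban": re.compile(
        r"\b([A-Z]{2}[0-9]{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?)\b"
    ),
}

# Two made-up layouts for the same kind of document.
SAMPLE_A = (
    "ACME BV\nInvoice number: INV-2026-001\nSubtotal: 100.00\n"
    "VAT 21%: 21.00\nTotal: EUR 121.00\nIBAN: NL91 ABNA 0417 1643 00"
)
SAMPLE_B = "Factuurnummer INV-77\nTotaal € 59,95"


def iban_is_valid(iban: str) -> bool:
    """Standard IBAN check: move the first 4 chars to the end, map letters to numbers, mod 97 == 1."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)  # A=10 ... Z=35
    return int(digits) % 97 == 1


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return the first match per field, or None when no pattern fits."""
    found: Dict[str, Optional[str]] = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1) if match else None
    if found["total"]:
        found["total"] = found["total"].replace(",", ".")  # decimal comma to point
    if found["iban"]:
        found["iban"] = found["iban"].replace(" ", "")
    return found


for sample in (SAMPLE_A, SAMPLE_B):
    print(extract_fields(sample))
```

The second sample has no IBAN at all, so that field comes back empty. That is the normal case in practice: some fields are simply missing or unreadable, and the software has to cope.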
What actually works: invoice OCR
Several categories of tools can read PDFs and return structured data.
Accounting software with scan-and-recognise
Packages like Xero, QuickBooks, Moneybird, Yuki, Exact Online, Sage and Twinfield offer scan-and-recognise for incoming invoices. You drop in a PDF, their OCR identifies the fields and proposes a booking. Most don’t explicitly export to UBL — but the data ends up structured inside your accounting system.
For: Businesses already using such a package.
Limitation: The structured data usually stays inside the system; you don’t get a UBL/XML file out.
Dedicated invoice-OCR services
Services like Klippa, Rossum, Hypatos, Veryfi and similar APIs specialise in invoice OCR and return structured JSON or XML. With a thin layer of mapping their output can be turned into UBL.
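What that “thin layer of mapping” might look like, as a Python sketch. The input dictionary is a hypothetical shape, not the actual response format of any of the services named above, and the output is a bare skeleton: it leaves out customer, tax and line-item elements, so it would not pass Peppol validation as-is.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the UBL 2.x standard.
INV = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"

ET.register_namespace("", INV)
ET.register_namespace("cbc", CBC)
ET.register_namespace("cac", CAC)

# Hypothetical OCR result; real services use their own field names.
SAMPLE_OCR = {
    "invoice_number": "INV-2026-001",
    "issue_date": "2026-04-30",
    "currency": "EUR",
    "supplier_name": "ACME BV",
    "total": "121.00",
}


def cbc(parent, tag, text, **attrs):
    """Add a basic (leaf) element carrying a value."""
    element = ET.SubElement(parent, f"{{{CBC}}}{tag}", attrs)
    element.text = text
    return element


def cac(parent, tag):
    """Add an aggregate (container) element."""
    return ET.SubElement(parent, f"{{{CAC}}}{tag}")


def ocr_json_to_ubl(doc: dict) -> str:
    """Map a flat OCR result onto a skeleton UBL invoice."""
    root = ET.Element(f"{{{INV}}}Invoice")
    cbc(root, "ID", doc["invoice_number"])
    cbc(root, "IssueDate", doc["issue_date"])
    cbc(root, "DocumentCurrencyCode", doc["currency"])
    party = cac(cac(root, "AccountingSupplierParty"), "Party")
    cbc(cac(party, "PartyName"), "Name", doc["supplier_name"])
    totals = cac(root, "LegalMonetaryTotal")
    cbc(totals, "PayableAmount", doc["total"], currencyID=doc["currency"])
    return ET.tostring(root, encoding="unicode")


print(ocr_json_to_ubl(SAMPLE_OCR))
```

The mapping itself is the easy part. Whether the values going in are correct is a separate question, and that depends entirely on the extraction step.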
For: Businesses processing high volumes of PDFs that need automation.
Limitation: You pay per document (typically €0.05–€0.30), and accuracy depends on the OCR engine.
E-invoicing platforms
Some Peppol access points and e-invoicing platforms include built-in OCR for suppliers who can’t yet produce UBL. PDF goes in, UBL comes out, and it’s sent over Peppol.
For: Businesses going fully electronic that haven’t got their suppliers there yet.
Limitation: Requires a platform subscription and setup.
When you shouldn’t bother
Two scenarios where PDF→UBL conversion is the wrong tool.
Scenario 1: You received a PDF and want to book it
Your supplier should be sending you a real UBL/Peppol invoice — not a PDF you have to convert. In countries where e-invoicing is becoming mandatory (see our country-by-country overview) this is a legal requirement.
Better: Ask your supplier to send a UBL file, or to deliver the invoice over Peppol. Most accounting tools can do either with one click.
Or: Import the PDF directly into your accounting software using scan-and-recognise. You don’t need to convert it to UBL first — the data lands in your purchase ledger straight away.
Scenario 2: You want to invoice your own customers in UBL
You’re trying to deliver your own outgoing invoices in UBL by first making a PDF in Word/Pages and then converting it. That’s backwards.
Better: Use accounting software that produces UBL/Peppol natively. Most modern accounting tools do — fill in the invoice once and the system generates both a PDF (for human reading) and a UBL (for automated processing).
When conversion is genuinely useful
A few legitimate use cases:
- Historical archives — thousands of old PDFs you want indexed and searchable. OCR + extraction to structured data is useful, even if the output isn’t strictly UBL.
- Suppliers that truly can’t do UBL — if you need Peppol compliance for B2G and a supplier won’t cooperate, an access point with OCR can bridge the gap.
- Migrating off a legacy system — your old system only exported PDFs and you’re now switching to e-invoicing.
In all three: OCR output is statistical, not exact. Expect to manually correct 5–15% of fields, especially on complex or multilingual invoices.
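One common way to keep that manual correction manageable is to review only the fields the OCR engine is unsure about, plus anything that fails a basic arithmetic check. A minimal Python sketch, assuming the OCR service reports a confidence score per field (the field shape, the scores and the 0.90 threshold are all made up for illustration):

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Tuple


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0; assumed to be reported by the OCR service


def split_for_review(
    fields: List[ExtractedField], threshold: float = 0.90
) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Separate fields that can be accepted from those a human should check."""
    accepted = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return accepted, needs_review


def totals_are_consistent(net: str, vat: str, gross: str) -> bool:
    """Cheap sanity check: net + VAT must equal the gross total exactly."""
    return Decimal(net) + Decimal(vat) == Decimal(gross)


# Made-up extraction result.
SAMPLE_FIELDS = [
    ExtractedField("invoice_number", "INV-2026-001", 0.99),
    ExtractedField("total", "121.00", 0.97),
    ExtractedField("iban", "NL91ABNA0417164300", 0.72),
]

accepted, needs_review = split_for_review(SAMPLE_FIELDS)
print("check by hand:", [f.name for f in needs_review])
print("totals add up:", totals_are_consistent("100.00", "21.00", "121.00"))
```

Decimal is used instead of float so that amounts like 0.10 + 0.20 compare exactly.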
The other direction: UBL/XML to PDF
Sometimes you want the reverse: turning a UBL invoice you received into a PDF for printing, archiving or sending to someone without a viewer.
That direction is a real conversion — all the data is there, you just need a template laid over the XML. Tools like UBL Buddy display a UBL invoice with proper formatting on Mac, iPhone or iPad. From there you can print to PDF using the standard macOS print dialog (File → Print → Save as PDF).
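To show why this direction is straightforward, here is a small Python sketch that lays a plain-text template over a UBL invoice. It is not how UBL Buddy or any other viewer works internally; the layout and sample invoice are invented, and producing the actual PDF is left to a print dialog or a PDF library.

```python
import xml.etree.ElementTree as ET
from string import Template

# Namespaces defined by the UBL 2.x standard.
NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Hypothetical plain-text layout; a real viewer would use a much richer template.
LAYOUT = Template(
    "INVOICE $number\n"
    "Date:     $date\n"
    "Supplier: $supplier\n"
    "Total:    $amount $currency\n"
)

# Made-up, heavily trimmed invoice for illustration only.
SAMPLE_UBL = """\
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
  <cbc:ID>INV-2026-001</cbc:ID>
  <cbc:IssueDate>2026-04-30</cbc:IssueDate>
  <cac:AccountingSupplierParty>
    <cac:Party>
      <cac:PartyName><cbc:Name>ACME BV</cbc:Name></cac:PartyName>
    </cac:Party>
  </cac:AccountingSupplierParty>
  <cac:LegalMonetaryTotal>
    <cbc:PayableAmount currencyID="EUR">121.00</cbc:PayableAmount>
  </cac:LegalMonetaryTotal>
</Invoice>
"""


def render_invoice(xml_text: str) -> str:
    """Fill the layout from the XML; every value is read, none is guessed."""
    root = ET.fromstring(xml_text)
    payable = root.find("cac:LegalMonetaryTotal/cbc:PayableAmount", NS)
    return LAYOUT.substitute(
        number=root.findtext("cbc:ID", default="?", namespaces=NS),
        date=root.findtext("cbc:IssueDate", default="?", namespaces=NS),
        supplier=root.findtext(
            "cac:AccountingSupplierParty/cac:Party/cac:PartyName/cbc:Name",
            default="?",
            namespaces=NS,
        ),
        amount=payable.text if payable is not None else "?",
        currency=payable.get("currencyID", "") if payable is not None else "",
    )


print(render_invoice(SAMPLE_UBL))
```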
Frequently asked questions
Is there a free PDF to UBL converter?
Not a reliable one. Free online converters exist but OCR quality is usually poor, and you’re uploading sensitive invoice data to unknown servers. Fine for one-off curiosity; not for production use.
Can ChatGPT or another AI tool convert a PDF to UBL?
An AI can read text out of a PDF and place it into a UBL template. In practice this works reasonably for simple invoices, but:
- Error rates are unpredictable — you only know if fields are correct after manual review
- VAT codes and currency codes must follow UBL exactly (e.g. S for standard rate, ISO currency codes)
- For compliance (Peppol validation, VIES checks) the XML is strict
- Sending sensitive data to an external AI is a GDPR concern
For one-off experimentation: maybe. For production: no.
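If you do experiment, at least check the codes before trusting the output. A minimal Python sketch: the VAT category set below is the list we understand EN 16931 invoices to use (check it against the current Peppol code list), and the currency set is a small illustrative subset of ISO 4217, not the full list.

```python
from typing import List

# VAT category codes as we understand EN 16931 uses them; verify against the current code list.
VAT_CATEGORY_CODES = {"S", "Z", "E", "AE", "K", "G", "O", "L", "M"}

# Illustrative subset of ISO 4217; a real check would load the full list.
CURRENCY_CODES = {"EUR", "USD", "GBP", "CHF", "SEK", "NOK", "DKK", "PLN"}


def check_codes(vat_category: str, currency: str) -> List[str]:
    """Return a list of problems; an empty list means both codes look fine."""
    problems = []
    if vat_category not in VAT_CATEGORY_CODES:
        problems.append(f"unknown VAT category code: {vat_category!r}")
    if currency not in CURRENCY_CODES:
        problems.append(f"unknown currency code: {currency!r}")
    return problems


print(check_codes("S", "EUR"))         # codes as UBL expects them
print(check_codes("standard", "€"))    # the kind of output a generic AI tool may produce
```

A check like this only catches malformed codes. It cannot tell you whether S was the right category for that invoice in the first place.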
What’s the difference between UBL and XML?
XML is the generic format (a way to store structured data). UBL (Universal Business Language) is a specific standard within XML for business documents like invoices. So a UBL file is always XML, but not all XML is UBL. Peppol uses UBL as its on-the-wire format.
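A quick way to tell the two apart in practice is to look at the root element. This Python sketch checks for the UBL invoice and credit-note namespaces; it only identifies the document type and says nothing about whether the file is valid.

```python
import xml.etree.ElementTree as ET

# Root elements of the two UBL document types used for invoicing.
UBL_ROOTS = {
    "{urn:oasis:names:specification:ubl:schema:xsd:Invoice-2}Invoice",
    "{urn:oasis:names:specification:ubl:schema:xsd:CreditNote-2}CreditNote",
}


def looks_like_ubl(xml_text: str) -> bool:
    """True if the text parses as XML and its root is a UBL invoice or credit note."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag in UBL_ROOTS


print(looks_like_ubl('<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"/>'))
print(looks_like_ubl("<Invoice/>"))  # XML, but not UBL: no UBL namespace
```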
Can UBL Buddy convert my PDFs to UBL?
No. UBL Buddy is a viewer for UBL/Peppol invoices you’ve received — not an OCR tool. For PDF extraction, use your accounting software or a dedicated invoice-OCR service.
What if my supplier insists on PDF?
B2B e-invoicing is becoming mandatory in a growing list of European countries, including Belgium (1 January 2026), Germany (receiving since January 2025, sending phased in from 2027) and France (phased in from September 2026). Once those obligations apply, a PDF-only invoice will no longer count as a valid B2B invoice in those markets. You can already point suppliers at the deadline and ask them to enrol in Peppol; most accounting tools support this with one click.
Further reading
- What is a Peppol invoice? — the standard explained
- How to open an XML invoice on Mac — for when you’ve just received a UBL file
- XML invoice received by email — what now? — first aid for unexpected XML attachments
- E-invoicing country overview — when does PDF become invalid in your market?