Who should read this DataQubi resource?

Finance, procurement, and data platform leaders evaluating practical implementation patterns for supplier data quality, spend visibility, and Microsoft Fabric-enabled operating models.

Can this approach be piloted without replacing ERP?

Yes. DataQubi patterns are designed to run as an overlay using source exports and governed Fabric layers without requiring ERP replacement.

How fast can teams see measurable outcomes?

Most teams can run a focused pilot in roughly 8 to 12 weeks and track measurable changes in duplicate reduction, classification coverage, and reporting confidence.

Proprietary Intelligence

Spend Classification and Supplier Normalization Framework (2026 Edition)

We create a single, trusted golden spend and supplier layer before anything touches reporting, AI models, or Copilot agents. This guide shows the exact implementation pattern for finance and operations teams.

UNSPSC Backbone Fabric Bronze-Silver-Gold LLM-First Classification Golden Supplier ID 2026 Edition

Purpose: Establish a governed, versioned supplier and spend foundation with full auditability, then expose only certified gold data to BI, planning, Copilot, and AI agents.

1. Purpose and Design Principles

Design principles:

Versioned, governed data (Fabric Lakehouse bronze -> silver -> gold).
UNSPSC as the primary taxonomy, with multi-taxonomy support where needed.
LLM-first classification and deduplication, with human QA and full audit.
Microsoft-native stack: Fabric, Power Query/Dataflows, Power BI, Azure OpenAI, and Power Apps.

Key KPIs within 90 days of go-live:

>=92%

Addressable spend classified at UNSPSC Commodity level

<3%

Duplicate supplier rate in active master

<5%

Monthly new spend lines left unclassified after AI + steward pass

2. End-to-End Flow

Flow: Extract -> Profile -> Normalize -> Dedupe -> Classify -> Govern.

2.1 Extract (Bronze)

Extract from AP, PO, contracts, and card feeds.
Supplier master fields: IDs, names, addresses, tax IDs, bank details, contacts.
Spend lines: invoice lines, PO lines, GRNs, card transactions (12-24 months).
Land raw files into Fabric Lakehouse bronze tables via Dataflows or Notebook.

2.2 Profile

Profile nulls, formats, distinct counts, and outliers.
Identify suppliers missing tax IDs, bank details, country, or address.
Prioritize top 500 suppliers by spend and invoice count for focused QA.

Output: Data quality scorecard and prioritized remediation list.

2.3 Normalize (Silver)

Normalize names, addresses, and identifiers.
Standardize country codes, address formats, tax ID formats, and payment terms.
Create VendorName_Normalized, VendorKey_Compare, and domain-based keys.

Output: Silver supplier and spend tables suitable for dedupe and classification.

2.4 Dedupe and Golden Supplier (Silver -> Gold)

Generate duplicate candidates using deterministic rules and fuzzy or semantic similarity.
Score candidates and decide auto-merge, manual review, or keep separate.
Create Golden Supplier ID and mapping table from original IDs.

Output: Gold supplier dimension and mapping with full audit trail.

2.5 Classify to UNSPSC (Gold)

Clean descriptions and apply rules plus AI to assign UNSPSC at Commodity level.
Maintain UNSPSC mapping plus multi-taxonomy extensions.

Output: Gold spend fact joined to Golden Supplier and UNSPSC.

2.6 Govern and Expose

Certify gold tables and semantic models in Fabric and Power BI.
Require BI, planning, Copilot prompts, and AI agents to use gold data only.
Monitor KPIs, drift, and exceptions with steward cockpit and approvals.

3. Vendor Name Normalization Rules

Objective: create a consistent comparison key and a business-friendly display name.

3.1 Transformation Order

Case and whitespace

Trim leading and trailing spaces.
Collapse multiple spaces to single spaces.
Uppercase for comparison, optional title case for display.

Legal suffix and noise stripping (comparison only)

Remove legal suffixes like INC, CORP, LLC, LTD, PLC, GMBH, BV, SAS, SA, PTY. Remove non-distinctive tokens like COMPANY, CO, THE, GROUP, HOLDINGS.

Punctuation normalization

Replace ampersand with AND.
Remove punctuation that is formatting-only.

Domain-specific abbreviation expansion

IT: TECH -> TECHNOLOGIES, SYS -> SYSTEMS, SFTWR -> SOFTWARE.
Logistics: INTL -> INTERNATIONAL, TRSPT -> TRANSPORT.

Country and branch qualifiers

Parse branch qualifier separately (example ACME LTD - FRANCE).
Retain parent group fields for global suppliers.

Domain-based key

Derive email or website domain and use as strong grouping hint.

Canonical display name via LLM assist (optional)

Batch prompt example: Normalize this supplier name to a standard legal entity display name, removing noise while preserving brand and intent.

Final comparison key

VendorKey_Compare = UPPER(RemovedLegalSuffixName) + "|" + CountryCode + "|" + TaxID where TaxID exists.

3.2 Risk and ESG Enrichment

Reserve supplier dimension columns for risk score, ESG rating, and sector flags to support downstream analytics and agents.

4. UNSPSC Mapping Template (Excel and Dataverse Ready)

4.1 Mapping table structure

Table: UNSPSC_Mapping

Columns:

Local_Category_ID, Local_Category_Name
UNSPSC_Segment (2-digit), UNSPSC_Family (4-digit), UNSPSC_Class (6-digit), UNSPSC_Commodity (8-digit)
UNSPSC_Description
Alt_Taxonomy_Type, Alt_Taxonomy_Code, Alt_Taxonomy_Description
Classification_Source (Manual, Rule, AI-Suggested, Inherited)
Confidence_Score (0-1), Last_Reviewed_By, Last_Reviewed_Date, Is_Active

4.2 Spend line structure

Table: Spend_Lines_Gold

Invoice_Line_ID, Golden_Supplier_ID, Vendor_ID_Original
Description_Raw, Description_Clean
GL_Account, Cost_Center or Department, PO_Category or MaterialGroup or ItemCode
Amount, Currency, Quantity, UnitOfMeasure
UNSPSC_Commodity, UNSPSC_Confidence_Score, UNSPSC_Classification_Source

4.3 Classification logic

Direct code join when ERP already has UNSPSC or mappable commodity code.
GL and org defaults with contradiction checks against descriptions.
Keyword and pattern rules using priority table.
LLM-first AI pass for unmapped or low-confidence lines with explanation.
Auto-accept above threshold (for example 0.9), route remainder to steward review.

Monthly QA and drift checks:

Percent spend at Commodity level by BU and region.
Top Miscellaneous or Other buckets by spend.
Categories with sudden shifts in spend mix or confidence.
Drift rule: if category confidence drops more than 5 percent, flag for review.

5. Duplicate Detection and Scoring

Entity resolution pattern: generate likely pairs, then score and decide.

5.1 Candidate generation

Same tax ID or VAT number.
Same bank account and country.
Same email or website domain.
Same VendorKey_Compare.
High name similarity in same country or region.

5.2 Scoring model (0-100)

Tax ID exact match: +40
Bank account exact match: +30
Normalized name similarity >= 0.95: +20
Normalized name similarity 0.90-0.95: +15
Same country: +5
Same city or postcode: +5
Same email domain: +10
Phone number match: +10

Force rule: if TaxID or BankAccount matches and name similarity >= 0.88, score floor is 80.

5.3 Decision thresholds

Score >= 80: auto-merge candidate (with audit; optional steward check for high-spend suppliers).
Score 70-79: steward review queue in Power Apps or Power BI cockpit.
Score < 70: keep separate.

5.4 Golden supplier and digital supplier twin

Table: Supplier_Golden with Golden_Supplier_ID, normalized and display names, TaxID, country, parent company, risk, ESG, performance, and source metadata.

Table: Supplier_ID_Map with Vendor_ID_Original, Golden_Supplier_ID, source system, and status.

All spend facts should join through Golden_Supplier_ID.

6. Microsoft Stack Implementation Pattern

6.1 Data platform

Fabric Lakehouse for bronze, silver, gold tables.
Dataflows Gen2 or Power Query for extraction and normalization.
Notebook or ML for fuzzy, semantic, and graph clustering.

6.2 AI and agents

Azure OpenAI and Fabric data agents for name canonicalization.
UNSPSC classification with confidence and explanation.
Entity resolution suggestions for borderline duplicates.
Copilot over certified semantic model for steward explain and ad-hoc QA.

6.3 Stewardship and governance

Power Apps plus Power BI cockpit for merge approvals and code overrides.
KPI and drift monitoring with exception workflows.
Purview and lineage for policy, lineage, and governance controls where available.

6.4 Real-time operational touchpoints

Event-driven classification and normalization for new suppliers and invoices.
Dynamics 365 Procurement and AP integration for onboarding duplicate checks, sanctions checks, risk checks, and real-time PO category assignment.

Get Framework + My Tailored Recommendation -> Browse More Resources

Tailored Recommendation

Get a focused next-step plan for your environment

Book a 30-minute session. We'll map this framework to your ERP setup, your deduplication exposure, and your current Fabric configuration, and give you a prioritized 90-day path before the call ends.

Book a 30-Minute Session

No credit card. No sales deck. Talk directly to the person who runs the engagement.