Spend Classification and Supplier Normalization Framework (2026 Edition)
We create a single, trusted golden spend and supplier layer before anything touches reporting, AI models, or Copilot agents. This guide shows the exact implementation pattern for finance and operations teams.
Purpose: Establish a governed, versioned supplier and spend foundation with full auditability, then expose only certified gold data to BI, planning, Copilot, and AI agents.
1. Purpose and Design Principles
Design principles:
- Versioned, governed data (Fabric Lakehouse bronze -> silver -> gold).
- UNSPSC as the primary taxonomy, with multi-taxonomy support where needed.
- LLM-first classification and deduplication, with human QA and full audit.
- Microsoft-native stack: Fabric, Power Query/Dataflows, Power BI, Azure OpenAI, and Power Apps.
Key KPIs within 90 days of go-live:
2. End-to-End Flow
Flow: Extract -> Profile -> Normalize -> Dedupe -> Classify -> Govern.
2.1 Extract (Bronze)
- Extract from AP, PO, contracts, and card feeds.
- Supplier master fields: IDs, names, addresses, tax IDs, bank details, contacts.
- Spend lines: invoice lines, PO lines, GRNs, card transactions (12-24 months).
- Land raw files into Fabric Lakehouse bronze tables via Dataflows or Notebook.
2.2 Profile
- Profile nulls, formats, distinct counts, and outliers.
- Identify suppliers missing tax IDs, bank details, country, or address.
- Prioritize top 500 suppliers by spend and invoice count for focused QA.
Output: Data quality scorecard and prioritized remediation list.
2.3 Normalize (Silver)
- Normalize names, addresses, and identifiers.
- Standardize country codes, address formats, tax ID formats, and payment terms.
- Create
VendorName_Normalized,VendorKey_Compare, and domain-based keys.
Output: Silver supplier and spend tables suitable for dedupe and classification.
2.4 Dedupe and Golden Supplier (Silver -> Gold)
- Generate duplicate candidates using deterministic rules and fuzzy or semantic similarity.
- Score candidates and decide auto-merge, manual review, or keep separate.
- Create Golden Supplier ID and mapping table from original IDs.
Output: Gold supplier dimension and mapping with full audit trail.
2.5 Classify to UNSPSC (Gold)
- Clean descriptions and apply rules plus AI to assign UNSPSC at Commodity level.
- Maintain UNSPSC mapping plus multi-taxonomy extensions.
Output: Gold spend fact joined to Golden Supplier and UNSPSC.
2.6 Govern and Expose
- Certify gold tables and semantic models in Fabric and Power BI.
- Require BI, planning, Copilot prompts, and AI agents to use gold data only.
- Monitor KPIs, drift, and exceptions with steward cockpit and approvals.
3. Vendor Name Normalization Rules
Objective: create a consistent comparison key and a business-friendly display name.
3.1 Transformation Order
Case and whitespace
- Trim leading and trailing spaces.
- Collapse multiple spaces to single spaces.
- Uppercase for comparison, optional title case for display.
Legal suffix and noise stripping (comparison only)
Remove legal suffixes like INC, CORP, LLC, LTD, PLC, GMBH, BV, SAS, SA, PTY. Remove non-distinctive tokens like COMPANY, CO, THE, GROUP, HOLDINGS.
Punctuation normalization
- Replace ampersand with AND.
- Remove punctuation that is formatting-only.
Domain-specific abbreviation expansion
- IT: TECH -> TECHNOLOGIES, SYS -> SYSTEMS, SFTWR -> SOFTWARE.
- Logistics: INTL -> INTERNATIONAL, TRSPT -> TRANSPORT.
Country and branch qualifiers
- Parse branch qualifier separately (example ACME LTD - FRANCE).
- Retain parent group fields for global suppliers.
Domain-based key
Derive email or website domain and use as strong grouping hint.
Canonical display name via LLM assist (optional)
Batch prompt example: Normalize this supplier name to a standard legal entity display name, removing noise while preserving brand and intent.
Final comparison key
VendorKey_Compare = UPPER(RemovedLegalSuffixName) + "|" + CountryCode + "|" + TaxID where TaxID exists.
3.2 Risk and ESG Enrichment
Reserve supplier dimension columns for risk score, ESG rating, and sector flags to support downstream analytics and agents.
4. UNSPSC Mapping Template (Excel and Dataverse Ready)
4.1 Mapping table structure
Table: UNSPSC_Mapping
Columns:
- Local_Category_ID, Local_Category_Name
- UNSPSC_Segment (2-digit), UNSPSC_Family (4-digit), UNSPSC_Class (6-digit), UNSPSC_Commodity (8-digit)
- UNSPSC_Description
- Alt_Taxonomy_Type, Alt_Taxonomy_Code, Alt_Taxonomy_Description
- Classification_Source (Manual, Rule, AI-Suggested, Inherited)
- Confidence_Score (0-1), Last_Reviewed_By, Last_Reviewed_Date, Is_Active
4.2 Spend line structure
Table: Spend_Lines_Gold
- Invoice_Line_ID, Golden_Supplier_ID, Vendor_ID_Original
- Description_Raw, Description_Clean
- GL_Account, Cost_Center or Department, PO_Category or MaterialGroup or ItemCode
- Amount, Currency, Quantity, UnitOfMeasure
- UNSPSC_Commodity, UNSPSC_Confidence_Score, UNSPSC_Classification_Source
4.3 Classification logic
- Direct code join when ERP already has UNSPSC or mappable commodity code.
- GL and org defaults with contradiction checks against descriptions.
- Keyword and pattern rules using priority table.
- LLM-first AI pass for unmapped or low-confidence lines with explanation.
- Auto-accept above threshold (for example 0.9), route remainder to steward review.
Monthly QA and drift checks:
- Percent spend at Commodity level by BU and region.
- Top Miscellaneous or Other buckets by spend.
- Categories with sudden shifts in spend mix or confidence.
- Drift rule: if category confidence drops more than 5 percent, flag for review.
5. Duplicate Detection and Scoring
Entity resolution pattern: generate likely pairs, then score and decide.
5.1 Candidate generation
- Same tax ID or VAT number.
- Same bank account and country.
- Same email or website domain.
- Same VendorKey_Compare.
- High name similarity in same country or region.
5.2 Scoring model (0-100)
- Tax ID exact match: +40
- Bank account exact match: +30
- Normalized name similarity >= 0.95: +20
- Normalized name similarity 0.90-0.95: +15
- Same country: +5
- Same city or postcode: +5
- Same email domain: +10
- Phone number match: +10
Force rule: if TaxID or BankAccount matches and name similarity >= 0.88, score floor is 80.
5.3 Decision thresholds
- Score >= 80: auto-merge candidate (with audit; optional steward check for high-spend suppliers).
- Score 70-79: steward review queue in Power Apps or Power BI cockpit.
- Score < 70: keep separate.
5.4 Golden supplier and digital supplier twin
Table: Supplier_Golden with Golden_Supplier_ID, normalized and display names, TaxID, country, parent company, risk, ESG, performance, and source metadata.
Table: Supplier_ID_Map with Vendor_ID_Original, Golden_Supplier_ID, source system, and status.
All spend facts should join through Golden_Supplier_ID.
6. Microsoft Stack Implementation Pattern
6.1 Data platform
- Fabric Lakehouse for bronze, silver, gold tables.
- Dataflows Gen2 or Power Query for extraction and normalization.
- Notebook or ML for fuzzy, semantic, and graph clustering.
6.2 AI and agents
- Azure OpenAI and Fabric data agents for name canonicalization.
- UNSPSC classification with confidence and explanation.
- Entity resolution suggestions for borderline duplicates.
- Copilot over certified semantic model for steward explain and ad-hoc QA.
6.3 Stewardship and governance
- Power Apps plus Power BI cockpit for merge approvals and code overrides.
- KPI and drift monitoring with exception workflows.
- Purview and lineage for policy, lineage, and governance controls where available.
6.4 Real-time operational touchpoints
- Event-driven classification and normalization for new suppliers and invoices.
- Dynamics 365 Procurement and AP integration for onboarding duplicate checks, sanctions checks, risk checks, and real-time PO category assignment.
Get a focused next-step plan for your environment
Book a 30-minute session. We'll map this framework to your ERP setup, your deduplication exposure, and your current Fabric configuration, and give you a prioritized 90-day path before the call ends.
Book a 30-Minute SessionNo credit card. No sales deck. Talk directly to the person who runs the engagement.