AI.DI by imkore  ·  Document Intelligence Platform  ·  Built ground-up 2024–2025
Document Intelligence.
Any document. Any industry.
At any scale.
AI.DI is not a better file cabinet. It is the first platform built from the ground up to make every enterprise document permanently intelligent, certified, and ready to transact — with six integrated engines that compound in value the more you use them. Box stores your documents. SharePoint organizes them. AI.DI understands them.
6 Integrated Engines
~10K Doc Fingerprints
ANY Doc Types
30 ML Engines
0 Legacy Code Lines
Any Industry · Any Scale
Sentry Document Assurance · Abstract.DI Extraction · AI.DI Document Warehouse · AI Orchestration & Agent Gateway · Document Gateway · Millennia FileStar · Continuous Transaction Readiness
Overview · Tab 01
The Document Intelligence Category Just Changed.
For thirty years, document management meant storage, organization, and search. AI.DI redefines the category entirely: Document Intelligence means every document is certified, extracted, scored, distributed, and queryable by any AI system — automatically, continuously, at scale. No incumbent has built this. We have.
Full Platform Architecture — Reproduced from AI.DI Statement of Outcomes
DOCUMENT SOURCES: Box / Egnyte · SharePoint · ERP / HRIS · SAP / Workday · Email / Outlook · DocuSign · Salesforce · Windows Share · Any REST API
ADVISORY SERVICE: imkore Blueprint (Document Intelligence Audit + Readiness Roadmap) · $50K to $150K · 60–90 days
TRUST & GOVERNANCE LAYER
TRUST ENGINE: Sentry Document Assurance. Fingerprint · Deduplicate · Certify · Search. Zero doc storage · Fingerprints flow to Warehouse
GOVERNANCE ENGINE: Millennia FileStar. Document Fabric · Workflow · Compliance. Governs docs · Syncs with Warehouse
OPERATIONAL ENGINES (RUN ON THE WAREHOUSE)
EXTRACTION ENGINE: Abstract.DI. Any doc · Any industry · 100K batch. Extracts intelligence → Stores in Warehouse
EXCHANGE ENGINE: Document Gateway. Ingest · Validate · Distribute · Track. Certifies docs · Routes to Warehouse
AI AGENT ENGINE: AI Orchestration. MCP · Agent Gateway · Q&A · RAG. Queries Warehouse · Returns certified answers
DATA & ANALYTICS: Snowflake · Databricks · Power BI / BigQuery
LAYER 1 (FOUNDATION INFRASTRUCTURE): AI.DI Document Warehouse. Documents · Extracted Data · Metadata · Fingerprints · Audit Trail · CTR Score. The hub all engines connect through · Queryable by any AI or analytics system
INTEGRATIONS: REST API · JDBC · MCP · SDK · Webhooks · Snowflake
AI & ANALYTICS CONSUMERS: Snowflake / BigQuery · Copilot / ChatGPT · Claude / Gemini · Power BI / Tableau · Custom AI Agents · MCP Clients / SDK
DEPLOYMENT: Azure Cloud · AWS · On Premise · Hybrid · Single Tenant · Multitenant
Any File Type · Any Industry · Any Org Size
Every engine has standalone value · Modular adoption · No rip and replace required
Why the Legacy Platforms Cannot Follow
The Architecture Problem
They Bolted AI Onto Storage. We Built Intelligence From the Start.

M-Files, Box, SharePoint Copilot, and Hyland all share the same architectural constraint: a file storage system built in a different era with AI features layered on top. Layered AI produces generic summaries from raw files. Native AI produces structured intelligence from every field of every document — something you can actually query, export, and act on.

You cannot retrofit the AI.DI Document Warehouse onto a file storage system. The warehouse is not a feature. It is the foundation. Building it requires starting over. None of them will.

The Data Problem
Documents and Data Have Always Lived in Separate Systems. We Ended That.

Every enterprise runs a document system and a data warehouse in parallel. The document system stores files. The data warehouse stores numbers. The information in the document — the obligation term, the coverage ratio, the commitment date, the counterparty agreement — lives in neither system. It is trapped in the PDF.

Abstract.DI ends this permanently. Every document becomes a structured database record the moment it enters the platform. The PDF is the backup. The warehouse row is the truth.

The Trust Problem
AI That Reasons From Uncertified Documents Produces Uncertified Answers.

Enterprise AI deployments fail for a predictable reason: the documents feeding the model are unverified, duplicated, and structurally inconsistent. Copilot hallucinates because SharePoint is untrustworthy. The model is not the problem. The data is.

Sentry certifies every document before it enters the AI pipeline. An AI agent using AI.DI as its knowledge base cannot be given a falsified document — the fingerprint will not match. Every answer is traceable to a specific certified version with a confidence score. That is a different category of AI deployment entirely.

The Intelligence Flywheel — Why It Compounds
Compounding Platform Value

Documents flow into Document Gateway. Abstract.DI extracts intelligence from every one. Sentry fingerprints and certifies them. The Document Warehouse stores all of it as structured, queryable data. The Warehouse improves Abstract.DI model accuracy. Better accuracy improves Sentry signals. Better signals make Document Gateway more valuable. More value drives more documents. After 18 months, switching costs are effectively permanent, and accuracy measurably exceeds any out-of-the-box alternative.

Platform Layer Stack — Complete Feature Inventory
DG
Document Gateway — Exchange & Distribution Engine
Check-In Studio · Distribution Studio · Transaction Rooms · ML Learning Studio · 200+ components · 29 edge functions
Core OS · Any industry
The central operating system for every document. Replaces Box, SharePoint, Egnyte as the primary system of record while connecting to any of them as migration sources. React/TypeScript/Vite + Supabase + Deno Edge Functions + Cloudflare R2 storage.
Check-In Studio
AI-powered document intake with AbstractIQ auto-classification, batch template mode, session history, rejection/resubmission pipeline, and external submitter portal.
Distribution Studio
Unified hub replacing 6 legacy distribution workflows. Standing distributions, serialized delivery, access tracking, client branding engine, full audit trail.
ML Learning Studio
30 self-improving AI engines across 6 capability tiers. Org-specific model weights. 4 map views. Continuous accuracy improvement.
AB
Abstract.DI — AI Extraction Engine
Any doc type · 94% day one confidence · 100K batch chunks · GPU OCR · Anomaly detection · Custom schema builder
AI native · Core moat
Reads every document and converts it into structured, queryable intelligence. Multi pass pipeline: OCR → classification → extraction → confidence scoring → anomaly detection → warehouse write. Different models optimized per document type.
Batch Engine
Process entire archives in 100K-document chunks. ZIP, Box, SharePoint, S3. Output to Excel, JSON, CSV, or Warehouse.
Any Document Type
Custom schema builder for proprietary types — hours not months. Prebuilt schemas for legal, financial, compliance, healthcare, HR, and government.
GPU OCR
DocTR engine. CPU and GPU — 10x to 50x speedup on GPU. Selective OCR for maximum cost efficiency.
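The multi-pass flow described for Abstract.DI (OCR, classification, extraction, confidence scoring, anomaly detection, warehouse write) can be pictured as an ordered chain of stages. The sketch below is only an illustration of that shape, not Abstract.DI's implementation: every function name, the record layout, the keyword classifier, the date regex, and the in-memory list standing in for the Warehouse are assumptions made for the example.

```python
import re

def ocr(rec):
    # Stand-in for OCR: the "scan" is assumed to already be UTF-8 bytes.
    return {**rec, "text": rec["raw"].decode("utf-8")}

def classify(rec):
    # Toy keyword classifier (hypothetical); a real system would use a model.
    doc_type = "lease" if "lease" in rec["text"].lower() else "unknown"
    return {**rec, "doc_type": doc_type}

def extract(rec):
    # Pull one illustrative field: an ISO expiry date following "expires".
    match = re.search(r"expires (\d{4}-\d{2}-\d{2})", rec["text"])
    return {**rec, "fields": {"expires_on": match.group(1) if match else None}}

def score(rec):
    # Confidence here is simply the share of fields that were found.
    found = [v for v in rec["fields"].values() if v is not None]
    return {**rec, "confidence": len(found) / len(rec["fields"])}

def detect_anomalies(rec):
    flags = ["unrecognized_type"] if rec["doc_type"] == "unknown" else []
    return {**rec, "anomalies": flags}

STAGES = [ocr, classify, extract, score, detect_anomalies]

def run_pipeline(raw: bytes, warehouse: list) -> dict:
    rec = {"raw": raw}
    for stage in STAGES:
        rec = stage(rec)
    # Final pass: write the structured row (not the raw bytes) to the store.
    row = {k: v for k, v in rec.items() if k not in ("raw", "text")}
    warehouse.append(row)
    return row

warehouse = []
row = run_pipeline(b"Lease agreement, expires 2026-01-31", warehouse)
print(row)
```

Keeping each pass as a separate function is what would let different models be swapped in per document type without touching the stages around them.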
SE
Sentry Document Assurance — Trust & Compliance Monitor
~10,000 fingerprints · Zero doc storage · Patent pending · GDPR/HIPAA/SEC by architecture · 10 to 100x faster search v2
Trust · Compliance
Deterministic mathematical fingerprinting. Zero document storage — only immutable fingerprints. Three types: Document Content, Document Data, Trusted Data Fingerprints (unique in market — fingerprint individual database rows, find every document referencing that entity).
Duplicate Elimination
40%+ industry average duplicate rate = 40% wasted AI spend. 30 to 50% LLM cost reduction immediately.
Cross System Search
Search SharePoint, OneDrive, Windows Share, email archives, ERP, FileStar simultaneously. No tags. No training.
PII Redaction
Auto-detects and redacts SSNs, financial IDs, tax IDs before fingerprint storage. GDPR data minimization by mathematics.
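As a rough illustration of deterministic fingerprinting with redaction before storage, the sketch below hashes redacted, whitespace-normalized text and groups documents that share a fingerprint. Sentry's actual algorithm is patent pending and is not described in this document; SHA-256, the US-format SSN pattern, the whitespace normalization, and all function names are assumptions chosen for the example.

```python
import hashlib
import re

# Hypothetical pattern for US-format SSNs (123-45-6789); real PII detection is broader.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_pii(text: str) -> str:
    """Replace SSN-like tokens before anything is hashed or stored."""
    return SSN_PATTERN.sub("[REDACTED-SSN]", text)

def fingerprint(text: str) -> str:
    """Deterministic fingerprint: same redacted, normalized text gives the same digest."""
    normalized = " ".join(redact_pii(text).split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def find_duplicates(docs: dict) -> dict:
    """Group document names by fingerprint and keep only groups with more than one member."""
    groups = {}
    for name, text in docs.items():
        groups.setdefault(fingerprint(text), []).append(name)
    return {fp: names for fp, names in groups.items() if len(names) > 1}

docs = {
    "a.txt": "Lease term: 5 years",
    "b.txt": "Lease  term: 5 years",   # same content, extra whitespace
    "c.txt": "Lease term: 6 years",    # one character differs
}
print(find_duplicates(docs))
```

Because the hash is deterministic, a single changed character yields a different fingerprint, while two copies of the same text always produce the same one. That is what allows duplicate grouping and tamper checks without retaining the document itself.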
DW
AI.DI Document Warehouse — Structured Intelligence Layer
PostgreSQL · SQL/GraphQL/REST · Snowflake · Databricks · MCP · Vector embeddings · 6 query views
Data moat · BI connectors
The biggest differentiator in the market — and it doesn't exist anywhere else. Every document Abstract.DI processes becomes structured rows in PostgreSQL. Every extracted field is queryable data. Every AI signal is persisted as a structured record.
6 Query Views
List · Library · Cube (pivot) · Time series · Schema · Scientist mode. Every dimension instantly explorable.
BI Connectors
Snowflake Data Share, Databricks, Tableau, Power BI, dbt, BigQuery, Python SDK. Zero ETL overhead.
Event Streaming
Webhook Manager fires events on every platform action. Real time pipeline triggers for any downstream system.
OA
AI Orchestration & Agent Gateway
LLM agnostic · MCP server · RAG foundation · OAuth2/OIDC · Zero hallucination
AI agents · RAG substrate
The AI layer that makes every enterprise LLM investment actually work. Not competing with LLMs — the prerequisite. Works with Copilot, GPT-4, Claude, Gemini, Llama, or any custom LLM. MCP server callable from Claude, Cursor, LangChain, AutoGen.
MCP Server
Available. Callable from any MCP compatible environment. No custom integration layer required.
LLM-Agnostic
Client chooses the AI model; AI.DI provides the trusted foundation. No vendor lock-in to any LLM.
RAG Foundation
Every chunk provenance tracked, every answer traceable to a specific certified document version. Zero hallucination.
FS
Millennia FileStar — Document Governance & Fabric
Founded 1996 · Trusted by enterprise clients · SSAE 18 certified · AI.DI integration pathway
Installed base · On ramp
The governance engine powering the AI.DI platform. FileStar governs document lifecycle and syncs all metadata to the AI.DI Warehouse — turning every FileStar deployment into a warm on-ramp to the full AI.DI platform.
Document Fabric
Structured lifecycle governance for any document type. Configurable routing, approval chains, escalation paths.
AI.DI Integration
FileStar governs; Warehouse stores; Sentry certifies; Abstract.DI extracts — automatically on every FileStar doc.
Installed Base
8–10 Phase 1 upgrade targets. 20+ year institutional trust that no competitor can replicate regardless of funding.
Platform Combinations — Entry to Full Suite
DG alone
Day one value. Replaces Box.
Document storage, hierarchy, roles, and lifecycle management for any organization. 30 day deployment. No implementation project.
"We're better than Box at organizing documents for your specific org structure. Same price. Setup in a week."
DG+Abstract.DI
The intelligence upgrade.
Every document uploaded turns into structured, searchable data automatically. No manual tagging. 94% accuracy on day one.
"Every document you upload now tells us what it says, who it's about, and when it expires — without you doing anything."
DG+Sentry
The compliance overlay.
Sentry wraps any existing repository — Box, SharePoint, Egnyte — and adds continuous compliance monitoring without replacing storage.
"Keep your existing storage. We add the trust layer that tells you what's missing, what's expired, and what's been altered."
All six engines
The platform. The moat.
All six engines create a compounding intelligence flywheel. The platform becomes permanently irreplaceable.
"Your documents are now a structured, AI certified, queryable intelligence asset. This is a different category."
"M-Files has been building their AI product since 2019. Box added AI features in 2022. SharePoint Copilot launched in 2023. None of them started from scratch. None of them can. We did. That is the only advantage that cannot be copied."
— Platform architecture principle
Overview · Tab 02
AI Intelligence — Built Ground-Up. Not Bolted On.
Every incumbent document platform bolted AI on top of a 20-year-old data model. AI.DI was architected in 2024 with AI as the primary actor, not the afterthought. 27 AI engines. 30 self-improving ML models. An MCP server connectable to any LLM. This is not a roadmap.
The Competitive Barrier — Stated Plainly

Competitors have slides about AI. AI.DI has 200+ React/TypeScript components, 29 live serverless edge functions, an ML Learning Studio with 30 self-improving engines, an MCP server, an AI Agent Gateway connecting to Claude/Copilot/GPT-4/Gemini, and a production AI.DI Studio running 27 active AI engines. The gap between what competitors promise and what we have already shipped is measured in years of engineering. This is the unfair advantage that cannot be purchased with a VC round.

The AI.DI Studio — 27 Active AI Engines
documentgateway.ai
AI.DI Studio — Real Time Intelligence Infrastructure
AI
AI Intelligence · 27 Active Engines · 5 Capability Domains
27 AI engines running simultaneously across 5 capability domains — purpose built for enterprise document intelligence. Each column is a domain: AI Core handles classification, extraction, confidence scoring, and the HITL Reduction meta-engine that makes the entire platform self improving. Intelligence manages deep document comprehension, cross document validation, expiry detection, and the document type registry that routes every document to the right schema. Process automates the operational pipeline — OCR, obligation extraction, approval routing, distribution rule execution, workflow management, and industry specific feature configuration. Trust & Security enforces immutability through blockchain anchoring, continuous tamper detection, and database level access control. Data/Integration maintains the warehouse sync, API gateway, registry, and storage optimization engines that keep the entire data pipeline running at scale. Every node shows live metrics: documents processed, classifications made, conflicts detected, connections active. This is the intelligence infrastructure. No competitor has built it. No competitor can.
HITL Reduction — The Self-Improving Loop
Human-In-The-Loop Reduction Architecture

The HITL Reduction AI engine monitors all other engines' human review rates and autonomously moves classifications to auto approve when confidence consistently exceeds configurable thresholds. Standard document types trend toward zero human intervention at 12 months. Novel or edge case documents always retain human oversight — the goal is the right humans reviewing the right exceptions, not zero humans.
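One way to picture the promotion rule described above is a rolling window of recent confidence scores per document type, with auto approval only when every score in a full window exceeds the threshold. This is a minimal sketch under stated assumptions: the class name, the window size, and the 0.85 default are hypothetical, and the platform's real criteria for "consistently exceeds" are not specified in this document.

```python
from collections import deque

class PromotionTracker:
    """Tracks recent confidences per document type (hypothetical helper)."""

    def __init__(self, threshold: float = 0.85, window: int = 5):
        self.threshold = threshold   # configurable confidence threshold
        self.window = window         # how many recent results must all pass
        self.history = {}            # doc_type -> deque of recent confidences

    def record(self, doc_type: str, confidence: float) -> None:
        scores = self.history.setdefault(doc_type, deque(maxlen=self.window))
        scores.append(confidence)

    def auto_approve(self, doc_type: str) -> bool:
        recent = self.history.get(doc_type, ())
        # Novel types (no full window yet) always stay with human reviewers.
        return len(recent) == self.window and min(recent) > self.threshold

tracker = PromotionTracker(threshold=0.85, window=3)
for score in (0.90, 0.95, 0.92):
    tracker.record("standard_contract", score)
print(tracker.auto_approve("standard_contract"))  # full window above threshold
print(tracker.auto_approve("novel_type"))         # no history, stays in review
```

A single low score drops the type back into review until the window refills, which matches the stated goal of keeping humans on the exceptions rather than removing them entirely.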

Legacy Platform HITL at 12 Months
Standard contracts: 65%
Insurance certificates: 55%
Financial statements: 70%
Fixed model weights. No production learning. Same cost and error rate at month 12 as month 1.
AI.DI HITL at 12 Months
Standard contracts: 8%
Insurance certificates: 5%
Financial statements: 12%
Continuous production learning. Every human correction retrains the model automatically. No ML engineers required.
Blockchain Engine — Immutable Audit Trail
AI.DI Studio — Blockchain Engine · On-Chain Document Integrity
AI · SENTRY
AI Intelligence · Trust Engine · Ethereum / Hedera / Polygon
2,814 documents have been anchored on chain through this engine — each one generating a Merkle tree hash committed to Ethereum, Hedera, or Polygon as an immutable proof of existence and content at a specific point in time. The HITL Reduction panel shows 100% automated — meaning the Blockchain Engine requires zero human review at this tenant after 12 months, because blockchain anchoring is deterministic: if the fingerprint matches, it anchors; no judgment required. The Pages Powered By This Engine panel shows exactly where this engine's certifications surface in the UI: Document Vault, Asset Vault, and Verification Portal. This is not a compliance checkbox — it is the infrastructure that allows a document to be presented to a regulator, counterparty, or auditor with cryptographic proof that its content has not changed since a specific timestamp. No document management platform on the market ships this out of the box.
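For readers unfamiliar with Merkle trees, here is a minimal sketch of how a batch of document fingerprints can be reduced to a single root hash suitable for anchoring. The document does not specify the tree construction used by the Blockchain Engine; SHA-256, pairwise concatenation, and duplicating the last node on odd-sized levels are common conventions assumed here, and `merkle_root` is a hypothetical name.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(fingerprints: list) -> bytes:
    """Reduce a list of byte-string fingerprints to one root hash."""
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    level = [_h(fp) for fp in fingerprints]      # leaf hashes
    while len(level) > 1:
        if len(level) % 2:                       # odd count: repeat the last node
            level.append(level[-1])
        # Hash each adjacent pair to build the next level up.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root_a = merkle_root([b"doc-1", b"doc-2", b"doc-3"])
root_b = merkle_root([b"doc-1", b"doc-2", b"doc-3 (edited)"])
print(root_a.hex())
print(root_a != root_b)  # any change to any leaf changes the root
```

Only the root needs to be committed on chain; any later change to any document in the batch produces a different root, which is why no human judgment is involved in the check.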
Integration Studio — Connect Any AI Agent
Integration Studio — Live AI Agent Gateway
AI · ORCHESTRATION
AI Orchestration · MCP Server + 3 Connected AI Systems
This screen represents the moment enterprise AI deployment becomes real. Three AI systems are connected: Claude.ai, ChatGPT/OpenAI, and FileStar. Each has read-only access to the entire certified document corpus, enforced by row-level security, through 6 production tools. The MCP Server URL is a published endpoint: connect it to Claude, Cursor, LangChain, or any MCP-compatible environment and the AI gains the ability to search certified documents, check compliance status, retrieve obligations, query the warehouse, navigate the hierarchy, and retrieve signed document URLs. Keys are tenant-scoped, instantly revocable, and enforce the same RLS policies as the UI. The AI Agent Gateway panel on the left shows Microsoft Copilot, Gemini/Google, and Grok/xAI as available-to-add connections, meaning your entire AI vendor portfolio can query the same trusted document foundation. This is the infrastructure that makes every LLM investment in your organization actually work.
Integration Ecosystem — 28 Connectors
Integration Studio — 28 Connectors Across Every Enterprise System
AI · ORCHESTRATION
AI Orchestration · Full Connector Ecosystem
The most common objection to any new platform is "we already use X." AI.DI answers it by connecting to every X simultaneously. On the AI agent side, Claude.ai and ChatGPT are fully integrated; Microsoft Copilot, Gemini, and Grok are ready to configure. Enterprise ERPs push operational data, financial reports, and contract records directly into the ingestion pipeline on a scheduled or event-driven basis, with no manual export or batch job required. Document management systems (SharePoint, Google Drive, Box, OneDrive, Dropbox) connect as source systems: AI.DI reads, certifies, and extracts from your existing storage without requiring you to move a single file. CRM platforms push agreements and correspondence as structured ingestion records. Data warehouse connectors deliver extracted intelligence outbound to Snowflake, Databricks, BigQuery, and Redshift on configurable schedules. Every connector is configured through a guided AI wizard, with no IT project, professional services, or custom code required. The platform fits into the enterprise as it currently exists, not as it would need to be rebuilt.
Document IQ — Conversational AI Over Your Certified Corpus
Document IQ — AI Powered Document Intelligence Assistant
AI · ORCHESTRATION
AI Orchestration · Conversational AI · Portfolio-Wide Access
Document IQ is what happens when you give an AI system access to a certified, structured document corpus instead of raw PDFs. With access to an enterprise corpus, it can answer questions that would take a human analyst days: "What's missing from the vault?" surfaces every gap across every asset simultaneously. "Show critical risk items" aggregates all violation flags and expiry warnings into a single prioritized view. "Expiring in the next 30 days" is a precise query against structured expiry dates in the Warehouse — not a keyword search, not an approximation. Upload any file and Document IQ cross references it against vault data in real time: upload a critical data extract and it identifies which tenants aren't in the vault, which leases are missing, which figures don't match the abstracted data. This is not a chatbot bolted onto a document management system — it is an AI with structured, trusted data access that no general-purpose LLM can replicate without the Warehouse underneath it.
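To make the difference between a structured query and a keyword search concrete, the sketch below runs an "expiring in the next 30 days" question against a table of extracted expiry dates. It uses an in-memory SQLite database as a stand-in for the PostgreSQL Warehouse; the table name, column names, and sample rows are all hypothetical and exist only for illustration.

```python
import sqlite3
from datetime import date, timedelta

def expiring_within(conn, today: date, days: int = 30):
    """Return (doc_name, expires_on) rows expiring between today and today + days."""
    cutoff = today + timedelta(days=days)
    cursor = conn.execute(
        "SELECT doc_name, expires_on FROM documents "
        "WHERE expires_on BETWEEN ? AND ? ORDER BY expires_on",
        (today.isoformat(), cutoff.isoformat()),  # ISO dates sort correctly as text
    )
    return cursor.fetchall()

# Hypothetical sample data standing in for extracted warehouse rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_name TEXT, expires_on TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("insurance_cert.pdf", "2025-07-10"),
        ("lease_a.pdf", "2025-09-01"),
        ("nda_b.pdf", "2025-06-20"),
    ],
)

result = expiring_within(conn, date(2025, 6, 15))
print(result)
```

The answer is exact because the expiry date was extracted into its own column ahead of time; nothing in the query depends on how the date happened to be worded in the source PDF.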
ML Learning Studio — 30 Engines, 6 Tiers
The Self-Improvement Architecture

Every legacy DMS has fixed classification models requiring expensive, time-consuming retraining. AI.DI's ML Learning Studio inverts this entirely — 30 engines improving continuously from production data, automatically, without engineering intervention. AI.DI gets cheaper and more accurate at scale. Every competitor's cost stays flat or increases.

Tier | Focus | Example Engines | HITL Trajectory
Tier 1: Foundation | Document type classification | Enterprise Type Classifier, PE Type Classifier, Legal Type Classifier | Near-zero for covered types
Tier 2: Entity | Named entity extraction | Party Extractor, Property Identifier, Fund/Entity Linker | 5–15% at 6 months
Tier 3: Date & Validity | Temporal signal extraction | Expiration Detector, Effective Date Parser, Renewal Classifier | Near-zero for standard formats
Tier 4: Financial | Financial data extraction | Loan Terms Extractor, Critical Data Extract Parser, Appraisal Value Extractor | 10–20% at 6 months
Tier 5: Compliance | Compliance validation | Coverage Gap Detector, Compliance Flag Engine, Signature Validator | 15–25% (domain expertise retained)
Tier 6: Cross-Document | Cross-document consistency | Portfolio Benchmark Engine, Anomaly Correlator, Reconciliation Engine | Complex analysis (strategic HITL)
"We didn't build a document platform and add AI. We built an AI platform that happens to manage documents. The difference is not semantic. It is architectural. And architecture determines destiny."
— AI.DI platform design principle
Overview · Tab 03
The Honest Battle Table — Where We Win and Why It's Structural
We do not win on every dimension today. What matters is the architecture. An incumbent can add a feature. No incumbent can add a clean data model, a zero-legacy stack, or an AI engine designed in from the first line of code.
Why Legacy Platforms Cannot Catch Up

Box cannot rebuild their data model for AI without breaking 150,000 customers. SharePoint's incentive is to preserve Teams and Office revenue, not cannibalize Copilot. M-Files is 2–3 years behind on the data model and the Warehouse layer. Egnyte wins on storage reliability but has no awareness of what documents contain. Every dollar these platforms invest in AI is constrained by the need to not break existing products. That constraint does not exist for AI.DI.

Full Capability Matrix
Capability | AI.DI Platform | Box | SharePoint | M-Files | Egnyte
Architecture & Philosophy
AI native architecture (built for AI, not adapted) | Win · 2024-2025. Zero compromise. AI is core, not a wrapper. | Bolt-on | Copilot wrapper | Aino (improving but bolted on) | Minimal
Zero legacy technical debt | Win · No codebase older than 18 months. | 2005 origin | 2001 origin | 2003 origin | 2009 origin
Edge compute architecture | Win · All compute at edge. Scale to zero or infinity. | None | Azure Functions (partial) | None | None
Modular adoption (standalone or full suite) | Win · Every engine has standalone value. | Partial | Module-based but complex | Partial | Partial
AI & Document Intelligence
Structured data extraction from documents | Win · Abstract.DI: any type, 94% day one, 100K batch. | None | Basic Copilot extraction | Aino (requires training) | None
Day one extraction accuracy (no training) | Win · 94%+ on prebuilt schemas. No training required. | N/A | N/A | Months of training | N/A
GPU accelerated OCR pipeline | Win · DocTR: 10-50x speedup on GPU. | None | Azure OCR (limited) | Basic OCR | Basic OCR
Batch processing (100K+ archives) | Win · 100K-chunk batch. ZIP, Box, SharePoint, S3. | None | None | Limited batch | None
30 self improving ML engines | Win · Continuous production learning. No ML engineers. | None | Generic Copilot | Limited self-learning | None
HITL Reduction AI (autonomous meta-engine) | Win · Autonomous promotion of high-confidence classifications. | None | None | None | None
Trust, Compliance & Security
Document fingerprinting (deterministic, patent pending) | Win · ~10,000 fingerprint catalog. Zero doc storage. | None | None | None | None
Zero document storage compliance model | Win · Only fingerprints stored. GDPR minimization by math. | Full storage | Full storage | Full storage | Full storage
PII auto detection and redaction pipeline | Win · Tokenization pipeline auto redacts at ingestion. | None | Purview (partial) | None | DLP (partial)
Fraud / document manipulation detection | Win · Deterministic: single character change detectable. | None | None | None | None
Blockchain audit trail | Win · On chain anchoring. 2,814+ documents on chain. | None | None | None | None
Data & AI Infrastructure
Structured document intelligence warehouse | Win · Every extracted field is a queryable row. Unique. | None | None | None | None
Snowflake Data Share (zero ETL) | Win · Zero-copy. Join doc intelligence with financial data. | None | None | None | None
MCP server for AI agents | Win · Production MCP. Claude, Cursor, LangChain, no wrapper. | None | None | None | None
Vector embeddings on certified chunks | Win · Tied to certified versions. pg_vector native. | None | Azure AI Search (partial) | None | None
CTR Score (Continuous Transaction Readiness) | Win · Live composite readiness score. Portfolio-wide. | None | None | None | None
27 active AI engines in production | Win · AI.DI Studio: live engine map with real time status. | None | None | None | None
Deployment & Integration
Unlimited hierarchy depth (any org structure) | Win · Enterprise → Group → Entity → Asset → Unit. Any depth. | Folders only | Sites/subsites | Metadata based | Folders/workspaces
30 day deployment (no implementation project) | Win · 30 days from contract to live. M-Files runs 3–6 months. | Weeks–months | Months–years | 3–6 months typical | Weeks–months
Installed base / existing trust relationships | Win · 45 FileStar enterprise clients. 20+ year relationships. Zero CAC. | Large (hard to access) | Large (bundled) | Existing clients | Existing clients
One-Line Positioning Per Competitor
vs. Box
"Box stores it. We understand it."
Never compete with Box on storage. Lead with the batch engine demo — upload a folder of contracts, produce a structured Excel workbook in 4 hours. Box produces a list of file names. Do not ask them to cancel Box. Ask what Box actually tells them about their documents.
vs. SharePoint
"Keep SharePoint. Add intelligence."
Never demand they cancel SharePoint. "SharePoint manages collaboration. AI.DI manages the intelligence layer — extraction, assurance, readiness scoring — running on top of whatever you already have. You do not have to change anything to start."
vs. M-Files
"Day one intelligence, not month-six."
M-Files Aino requires months of training. AbstractIQ delivers 90%+ confidence on day one for any document type — because schemas are prebuilt and AI classification runs from the first upload. "You bring the documents. We bring the intelligence."
vs. Egnyte
"You know where your files are. We know what they say."
Egnyte wins on hybrid storage reliability and IT governance. It has no awareness of what documents contain. AbstractIQ connects to an Egnyte repository as an intelligence overlay. "Add intelligence to Egnyte" — the displacement happens organically once the intelligence layer is live.
"We are not trying to be a better Box. We are trying to make Box irrelevant — the same way Salesforce made Act! irrelevant. Not by being louder. By being categorically different."
— imkore category strategy · 2025
Products · Tab 05
Document Gateway — The Operating System for Every Document
The central hub where documents live, roles are configured, compliance is monitored, and every other AI.DI engine plugs in. 200+ React/TypeScript components. 29 live serverless edge functions. Replaces Box, SharePoint, Egnyte as the primary system of record.
200+ React/TS Components
29 Live Edge Functions
30 days Avg. Deployment
5-tier Org Hierarchy
30 ML Engines
4 Role Types
Check-In Studio — AI Powered Document Ingestion
Check-In Studio — Intelligent Document Intake
GATEWAY · AI
Document Gateway · The AI Intake Engine
Every file dropped here enters a multi-stage AI pipeline that runs entirely without human instruction. AbstractIQ classifies the document by type, extracts key fields, scores confidence, checks for duplicates, detects anomalies, and routes to the correct steward queue — all before a human sees it. Required documents are surfaced as named cards organized by packet template, so a steward's view is not a list of files but a structured set of obligations: what's needed, what's fulfilled, what's outstanding, and what was rejected with AI identified reasons. The HITL Reduction AI continuously monitors which document types consistently reach auto certify confidence and promotes them to bypass human review entirely. As your document corpus grows, the percentage of documents requiring human attention trends toward zero for standard types. This is not document management — it is an autonomous compliance engine that happens to accept file uploads.
Check-In Engine — AI Thresholds & Real Time Performance
Check-In Engine Settings — Configurable AI Per Tenant
GATEWAY · AI
Document Gateway · Per-Tenant ML Configuration
The thresholds in this panel determine exactly where human judgment enters the pipeline — and where the platform operates without it. The auto certify threshold (85%) means the majority of standard documents never reach a steward queue: they arrive, get classified, get extracted, get certified, and land in the vault without human contact. Documents falling between 65% and 85% enter the steward review queue with uncertain fields flagged and source passages highlighted — a steward corrects one field, not the whole document. Below 65% triggers automatic rejection with a machine-generated explanation of which extraction criteria fell short and why. The OCR engine selector switches between Abstract.DI v3, v2, and open source options without any pipeline changes downstream. The AI Model Performance panel shows live metrics: 78% auto classify rate, 91% average confidence, 47 steward corrections in 30 days — each correction is permanently written as labeled training data against a 6,421-document corpus. No ML engineer is involved. The model improves from its own production use, continuously, without a retraining project.
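The three confidence bands described above reduce to a small routing function. The sketch below uses the 85% and 65% thresholds from the panel and the status names that appear among the platform's webhook events (certified, review_required, rejected); the function name and the handling of scores that land exactly on a boundary are assumptions.

```python
AUTO_CERTIFY_THRESHOLD = 0.85   # at or above: certified without human contact
REVIEW_FLOOR = 0.65             # below this: automatic rejection

def route(confidence: float) -> str:
    """Map a confidence score in [0, 1] to a routing decision."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= AUTO_CERTIFY_THRESHOLD:
        return "certified"
    if confidence >= REVIEW_FLOOR:
        return "review_required"   # steward queue, uncertain fields flagged
    return "rejected"

for score in (0.91, 0.70, 0.50):
    print(score, route(score))
```

Since the thresholds are configurable per tenant, they would more realistically be passed in as settings than hard-coded as module constants.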
Check-In API & Webhook Integration
Check-In API — Full Programmatic Document Ingest
GATEWAY · DEVELOPERS
Document Gateway · Developer Interface
The same AI pipeline that powers the visual Check-In Studio is fully accessible through a REST API — meaning any internal system, any existing workflow, any document management tool can push files directly into the AI.DI pipeline without a user interface. POST to /v1/checkin/ingest for a single file; POST to /v1/checkin/bulk for ZIP bundles of up to 10,000 documents in one call. The response returns a job ID for status polling — the full AbstractIQ pipeline runs asynchronously and fires webhook events at every stage: classified, extracted, named, certified, review_required, rejected. Each event carries the full payload — document type, confidence score, extracted fields, routing decision, DG name, anomaly flags. This means your existing systems get real time notification the moment a document reaches any status, enabling downstream automation without polling. The platform is not just a UI — it is a document intelligence API that happens to have an excellent UI.
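On the receiving side, a downstream system mostly needs to dispatch on the event name. The sketch below handles the six event names listed above; the JSON field names (`event`, `dg_name`, `document_type`, `confidence`) and the returned action strings are hypothetical, since the actual payload schema is not given in this document.

```python
import json

# Event names as listed in the Check-In API description.
KNOWN_EVENTS = {
    "classified", "extracted", "named",
    "certified", "review_required", "rejected",
}

def handle_webhook(body: str) -> str:
    """Decide what a downstream system should do with one webhook delivery."""
    payload = json.loads(body)
    event = payload.get("event")          # hypothetical field name
    if event not in KNOWN_EVENTS:
        return "ignored"
    if event == "certified":
        return f"archive {payload['dg_name']}"
    if event in ("review_required", "rejected"):
        doc_type = payload["document_type"]
        confidence = payload["confidence"]
        return f"notify steward: {doc_type} at {confidence:.0%}"
    return "noted"                        # intermediate pipeline stages

sample = json.dumps(
    {"event": "review_required", "document_type": "lease", "confidence": 0.72}
)
print(handle_webhook(sample))
```

Returning "ignored" for unrecognized events keeps the handler tolerant of event types added later, which matters when the sender can evolve independently of the receiver.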
Distribution Studio — Studio View
Distribution Studio — The Unified Distribution Hub
GATEWAY
Document Gateway · Transaction Rooms · Packages · Share Links
Distributed documents are the highest-risk surface in any organization: they leave your control the moment they are sent, and most platforms give you no visibility after that. Distribution Studio makes that surface observable, auditable, and permanently traceable. Every active Transaction Room shows CTR Score progress against the required document set, expiry countdown on time-sensitive items, counterparty engagement data by document, and phase completion in a single view. Document Packages show who received which version, when they opened it, and which sections they accessed. Standing Distributions show which recipients are on automatic schedules and what they last received. Share Links show whether the recipient clicked, when, and from which device. Every distribution event is immutable: timestamped, version-locked, recipient-specific, permanently logged, and fully auditable, with per-recipient download permissions. The engagement data flowing from these rooms tells you more about your counterparty's interest level than any conversation: which documents they spent the most time on, which sections they returned to repeatedly, and which they never opened.
Distribution Studio — Builder & Templates
documentgateway.ai
Distribution Builder — Three Wizard Modes
Document Gateway · Distribution Wizard
Configuring a document distribution incorrectly (wrong NDA gate, wrong counterparty visibility, wrong expiry date, wrong access scope) is a compliance event, not an inconvenience. Distribution Builder eliminates misconfiguration risk by making the setup structural rather than manual. Selecting Transaction Room launches a 7-step guided workflow that auto-configures deal type, counterparty hierarchy, phase-based document structure, section-level access matrix, NDA gate behavior, QA threading, and CTR Score gap alerts based on a single selection. Selecting Document Package configures bundling, per-recipient watermarking, and custom cover letter generation in 3 steps. Selecting Share Documents produces a tracked, expiring link in 2 steps. The platform applies the correct configuration for each distribution type: you choose the context, it builds the controls. The output is not just a sent document. It is a governed, auditable distribution event with full recipient behavioral tracking from the moment it opens. No configuration guesswork. No asking what settings to use. The platform knows.
documentgateway.ai
Distribution Analytics — Counterparty Intelligence
Document Gateway · Deal Analytics
Counterparty intent has always been invisible — you send documents and wait for a phone call. Distribution Analytics ends that. Room engagement shows exactly how long each recipient spent on each document, which sections they returned to, and which they skipped entirely. A counterparty who spends 47 minutes on the indemnification schedule and ignores the financial statements is communicating something specific before any conversation happens. Phase completion rates surface where transactions stall across all active rooms simultaneously — giving teams an objective signal on process friction that no CRM captures. The document access heatmap shows which document types generate the most engagement per deal type, informing which materials to lead with in future transactions. When interest is concentrated in a document you expected to be routine, you know before you get on the call. This is behavioral intelligence over your entire distribution history — continuously updated, never requiring manual compilation.
Process Library — Prebuilt Transaction Workflows
documentgateway.ai
Process Library — 11 Prebuilt Transaction Workflows
Document Gateway · Workflow Automation
Every complex document transaction follows a predictable phase structure — the documents required for a financing differ from those for an acquisition, a regulatory audit, or a counterparty onboarding, but each has a known sequence that most organizations rediscover from scratch every time. The Process Library ends that rediscovery cycle. A template defines the phases, the required documents per phase, the responsible roles, the estimated duration, and the distribution-ready package that assembles when all phases are complete. Launching a process creates a tracked instance with live phase completion, automatic CTR Score updates as each document is fulfilled, and stakeholder visibility throughout. Templates are versioned — when a better phase structure is identified, it becomes the new standard for all future instances immediately. A 7-phase template with 22 required documents and a 45-day estimated duration is not a checklist. It is institutional knowledge made repeatable, measurable, and improvable at organizational scale. Ad hoc document scramble becomes a managed, auditable workflow that gets faster every time the organization runs it.
Document Type Studio & Hierarchy Studio
documentgateway.ai
Document Type Studio — Complete Document Vocabulary
Document Gateway · Document Taxonomy
Document type is not metadata. It is the instruction set that governs everything else in the pipeline. The classification label on a document determines which extraction schema applies, which fields are required, which routing rules trigger, which compliance obligations are checked, and how the CTR Score is affected. The Document Type Studio manages this vocabulary for the entire organization. Essential types drive CTR Score calculations directly: a missing Essential document drops the score and surfaces the gap in the Command Center immediately. Elective types are extracted and tracked but do not penalize readiness scores. The AI Generate function analyzes your existing document corpus and suggests new types based on structural patterns it detects, including types your organization actually uses that are not in the default catalog, so the taxonomy grows with your organization without manual taxonomy work. Each type carries a predefined extraction schema: the exact fields Abstract.DI will look for, the confidence thresholds required per field, and the anomaly detection rules that flag outliers against your established corpus patterns. The Diligence library alone contains 62 document types across Essential and Elective categories, prebuilt from years of real-world document intelligence deployments. The platform ships prebuilt taxonomies for every major industry: configurable, extensible, and learnable from your own document patterns.
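The Essential versus Elective distinction can be pictured with a small sketch. The actual CTR Score formula is not given in this material, so the code assumes a deliberately simple stand-in: the percentage of required Essential types that are on file. The `DocType` class, the function names, and the sample type names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocType:
    name: str
    essential: bool  # Essential types count toward readiness; Elective types do not


def readiness_score(required: list[DocType], on_file: set[str]) -> float:
    """Percent of required Essential types present.

    Assumed formula for illustration only, not the real CTR Score.
    """
    essential = [t for t in required if t.essential]
    if not essential:
        return 100.0
    present = sum(1 for t in essential if t.name in on_file)
    return round(100.0 * present / len(essential), 1)


def gaps(required: list[DocType], on_file: set[str]) -> list[str]:
    """Names of Essential types that are still missing."""
    return [t.name for t in required if t.essential and t.name not in on_file]
```

Under this assumption a missing Elective document never changes the score, while each missing Essential document lowers it and appears in `gaps`.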
documentgateway.ai
Hierarchy Studio — Any Org Structure, Any Depth
Document Gateway · Organization Architecture
Every organization has a structure, and every node in that structure carries a different document obligation, a different set of authorized users, a different CTR Score calculation, and a different AI extraction schema. Hierarchy Studio maps that structure precisely, without code, without professional services, without architectural constraints. Each node type carries its own configuration: required document libraries, role assignments, process templates, extraction schemas, and compliance obligations all attach at the node level. A user provisioned at a division node cannot see assets outside their scope. This is enforced at the database layer, not the application layer, through PostgreSQL row-level security. Hierarchy nodes are first-class citizens in the Document Warehouse: every query, every CTR Score, every AI agent call resolves to the node hierarchy the authenticated user belongs to. Corporate entity trees, regulatory division structures, branch networks, fund hierarchies, agency organizations: any org structure configures without changing the platform architecture. Corporate legal departments, financial institutions, and government agencies all configure different hierarchies from the same studio. Every node created here is queryable, scoreable, and connectable to any AI agent via the MCP server.
Platform Configuration
documentgateway.ai
Platform Masters — Document Status Workflow Engine
Document Gateway · Status Configuration
Document statuses are not labels. They are workflow triggers. Each status in this table drives a specific system behavior: Submitted auto-routes to the review queue, Approved fires the Approval Engine, Expired triggers the Violation Engine, and Sentry Certified records an immutable fingerprint in vault_records. The drag-to-reorder interface sets the logical default sequence, but the real power is in the terminal and certified flags: terminal statuses cannot be manually overridden, and certified statuses can only be assigned by the Sentry fingerprinting pipeline, never by a user. The AI Suggestions panel on the right uses your industry and document patterns to propose status additions. "Add AI Flagged for low-confidence classifications" appears because the system detected classifications below the review threshold that currently fall through to Needs Revision without a distinct routing path. This is the platform configuring itself.
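The two flags described above can be sketched as a single transition check. This is a minimal illustration, not the platform's implementation: the `Status` class, the `actor` labels "user" and "pipeline", and the choice of which statuses carry which flag in the usage note are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    name: str
    terminal: bool = False   # once set, a user cannot manually override it
    certified: bool = False  # assignable only by the fingerprinting pipeline


def can_transition(current: Status, target: Status, actor: str) -> bool:
    """Return True if `actor` ("user" or "pipeline", hypothetical labels)
    may move a document from `current` to `target`."""
    if current.terminal and actor == "user":
        return False  # terminal statuses cannot be manually overridden
    if target.certified and actor != "pipeline":
        return False  # certified statuses are never assigned by a user
    return True
```

For example, with `Status("Sentry Certified", certified=True)` as the target, a call with `actor="user"` is refused while `actor="pipeline"` is allowed.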
documentgateway.ai
Roles & Permissions — 9 Roles × 138 Features
Document Gateway · Identity & Access
138 features. 9 roles. 4 tiers. This is enterprise access control with the granularity that regulated industries require. The Role Matrix shows exactly which features each role can access, filtered by Actions, Data, or Pages, with color-coded permission states (full access, limited, read-only, none). The 4-tier structure (System, Tenant, Hierarchy, Node) means a Steward at a specific hierarchy node can only see documents and actions relevant to their assigned assets, not the whole portfolio. Row-level security enforcement happens at the database layer via Supabase RLS, not at the application layer, which means even direct API access or MCP agent connections respect the same access boundaries. No orphaned permissions. No over-provisioned service accounts. Security is structural, not configured.
Storage Management — Intelligent Lifecycle Automation
documentgateway.ai
Storage Manager — Automated Document Lifecycle Policies
Document Gateway · Storage Intelligence
Documents cost money to store, process, and query — and most organizations keep everything in hot storage indefinitely because moving things manually never happens. Storage Manager automates the entire lifecycle through policy rules that run on configurable schedules. Auto-Warm After Inactivity moves documents not accessed in 30 days from Hot to Warm storage automatically. Archive Certified Docs moves Sentry certified documents to Archive on an hourly schedule — certified documents are immutable by definition, so hot storage is wasteful. Hot Retention for Active keeps any document accessed in the last 7 days in Hot tier regardless of other rules. Each tier has a defined retention schedule: Hot is indefinite for active docs, Warm moves certified docs to Archive after 180 days, Archive retains compliance documents for 7 years minimum. The platform manages storage cost at scale without operational overhead.
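The three named policies can be read as an ordered rule set. The sketch below is one possible reading, under stated assumptions: Hot Retention for Active is checked first (the text says it wins regardless of other rules), certified documents then go to Archive, and the 30-day inactivity rule applies last. The function name, tier strings, and this precedence order are illustrative, and the 180-day Warm-to-Archive schedule is not modeled.

```python
from datetime import datetime, timedelta


def target_tier(last_accessed: datetime, certified: bool, now: datetime) -> str:
    """Pick a storage tier for one document (assumed rule precedence)."""
    idle = now - last_accessed
    if idle <= timedelta(days=7):
        return "hot"      # Hot Retention for Active: overrides everything else
    if certified:
        return "archive"  # Archive Certified Docs
    if idle >= timedelta(days=30):
        return "warm"     # Auto-Warm After Inactivity
    return "hot"          # no rule matched: stay where active documents live
```

A scheduled job could call `target_tier` for each document and move only those whose current tier differs from the result.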
White Label Branding
documentgateway.ai
White Label Branding — Full Enterprise Identity Control
Document Gateway · Enterprise Branding
Every customer-facing surface of the platform (Document Banner, Login Page, Email Templates, Certificates of Authenticity, Shared Viewer links) carries your organization's identity, not imkore's. Upload Master Logos once (light mode and dark mode variants) and they propagate automatically to all surfaces. Each surface can also be individually overridden with a custom logo if different contexts require different branding. The Logo Across Surfaces panel on the right shows a real-time preview of exactly how your logo appears on each surface before you publish. For institutional clients sharing documents with investors, lenders, or regulatory bodies, the platform presents entirely as their own product. This is the infrastructure that allows any organization to present a Transaction Room to a counterparty with full institutional branding, with no "Powered by imkore" anywhere in the counterparty experience.
Core Engines and Studios
Engine 01
Check-In Studio — AI Document Intake

Every file enters a multi-stage AI pipeline before a human sees it. Abstract.DI classifies the document type, extracts key fields, scores confidence at the field level, checks for duplicates, detects anomalies, and routes to the correct steward queue automatically.

  • Drag and drop, bulk upload, email ingestion, and API submission
  • ZIP auto-extract with recursive file processing
  • Required document templates show what is needed, fulfilled, outstanding, and rejected
  • HITL Reduction AI promotes document types to bypass human review once confidence thresholds are consistently met
  • Batch Template Manager for bulk ingestion workflows across multiple use cases
  • Rapid Review Mode for high-volume steward queues
Engine 02
Distribution Studio V5 — Governed Document Exchange

Three distribution modes with full audit trails and recipient access controls. Every document leaves the platform certified and tracked.

  • Shared Documents: individual files distributed to named recipients with expiry, watermarking, and view-only enforcement
  • Document Packages: curated sets of related documents delivered as a governed bundle with version locking
  • Transaction Rooms: fully white-labeled deal rooms with custom branding, NDA gates, engagement analytics, and counterparty-facing views
  • Resend integration for transactional delivery receipts
  • Recipient access log with timestamps, IP, and engagement depth
Engine 03
Submitter Gateway — External Document Collection

A purpose-built external submission portal that presents to counterparties as your own branded platform. No account creation required for submitters.

  • Invitation-only access via tokenized secure links
  • Required document checklists with real-time status
  • Automatic routing into Check-In pipeline upon submission
  • Submission Packets define exactly what documents are expected per counterparty type
  • Notification system with automated reminders for outstanding items
Engine 04
Command Center — Portfolio Operations Dashboard

Real-time operational view across the entire document corpus: by entity, by division, by document type, or by compliance obligation.

  • CTR Score dashboard with per-entity readiness scores
  • Expiry tracking across all documents with escalation alerts
  • Outstanding obligation views by steward or entity owner
  • Anomaly feed showing AI-flagged discrepancies across the corpus
  • Executive reporting views with configurable KPIs
Engine 05
Document Vault — Governed Entity Repository

Five-tier organizational hierarchy providing structured, queryable document storage with role-enforced access at every level.

  • Configurable folder taxonomies per entity type and industry
  • Version control with full history on every document
  • Role-based access: Admin, Steward, Analyst, User, Viewer
  • Document Navigator for cross-entity search and bulk operations
  • Smart Folders with dynamic rule-based population
Engine 06
Approval Workflows — Governed Review Chains

Configurable multi-step approval chains for any document type or business process. Every workflow is auditable end to end.

  • Sequential and parallel approval routing with escalation paths
  • OnlyOffice JWT-enforced document review in-browser (DOCX, XLSX)
  • Annotation and comment threading per document version
  • Automated notifications at each workflow stage
  • Full audit trail on every approval, rejection, and comment action
29 Live Edge Functions — The Serverless Backbone
Every Document Gateway operation is powered by a dedicated Supabase Deno edge function — deployed independently, versioned separately, and executable on demand. Zero shared infrastructure between functions. Each function enforces its own auth, rate limits, and error handling.
Document Processing
Ingestion and Extraction Functions
  • ingest-document — upload validation, storage routing, and pipeline trigger
  • abstract-document — Abstract.DI extraction orchestrator for any document type
  • ai-classify — standalone classification endpoint for document type inference
  • checkin-pipeline — full OCR → classify → extract → score → route pipeline
  • process-upload-link — handles tokenized external upload URLs for Submitter Gateway
  • quick-verify — fast Sentry fingerprint verification for incoming documents
  • parse-credentials — secure credential extraction from document headers and metadata
Intelligence and Query
AI and Warehouse Functions
  • document-qa — natural language question-answering against any document or corpus
  • warehouse-query — SQL and natural language query execution against extraction fields
  • warehouse-connector — sync manager for Snowflake, Databricks, BigQuery, and webhook targets
  • agent-gateway — AI agent request router with tool dispatch and row-level security
  • mcp-server — dual-protocol MCP (Claude) and REST/OpenAPI (ChatGPT) gateway with 17 tools
  • smart-folders — rule engine that dynamically populates folder views from extraction data
Operations and Delivery
Workflow, Notification, and Integration Functions
  • send-notification — transactional email via Resend for workflow steps and alerts
  • send-submitter-invitation — tokenized invitation emails for Submitter Gateway counterparties
  • create-invite-user — new user provisioning with role assignment and welcome email
  • erp-webhook — inbound event handler for ERPs, CRM, and enterprise platforms
  • submit-anchor — Submission Packet anchoring and counterparty session management
  • schedule-jobs — cron-triggered orchestration for batch pipeline runs
  • run-scheduled-reports — automated report generation and distribution
  • filestar-proxy — FileStar API bridge for existing Millennia Group clients
  • oo-jwt — OnlyOffice JWT token generation for in-browser document editing
  • deployment-health — infrastructure health monitoring and status reporting
  • migrate-infra — schema migration runner for incremental database updates
  • seed-demo-data / seed-pe-samples — demo corpus seeding for private equity verticals
  • generate-demo-blueprint — AI-generated Blueprint diagnostic reports for prospective clients
  • update-whitepapers — automated whitepaper content refresh pipeline
Role Architecture and Access Control
Five-Tier Role System

Every user action in Document Gateway is governed by a five-role permission model enforced at both the application and database layers via Supabase row-level security policies.

  • Admin — full platform configuration, user management, integration setup, and AI engine settings. Advanced Check-In mode.
  • Steward — document review, certification, approval chain management, and AI override authority. Advanced Check-In mode.
  • Analyst — read access to all extraction data, Warehouse Studio, and reporting. Advanced Check-In mode.
  • User — document submission, basic search, and personal workflow tasks. Basic Check-In mode.
  • Viewer — read only access to shared documents and approved views. Basic Check-In mode.
Tech Stack and Infrastructure

Zero legacy code. Entirely 2024 to 2026 stack designed for sub-30-day enterprise deployment.

  • Frontend: React 18 / TypeScript / Vite — 200+ components, dark and light theme, DM Sans / DM Mono typography
  • Backend: Supabase PostgreSQL with PostgREST, 29 Deno edge functions, row-level security on every table
  • Storage: Cloudflare R2 for all document binary storage — zero egress fees at scale
  • Document Editing: OnlyOffice v7.5 via Railway container — JWT-enforced DOCX and XLSX in-browser rendering
  • Email: Resend for all transactional delivery with signed receipts
  • Deployment: Vercel auto-deploy — documentgateway.ai, trusteddocs.ai, imkore.ai
Deployment Model

Single-tenant, multitenant, Azure Cloud, AWS, on-premise, and hybrid deployments are all supported. Any file type. Any industry. Any org size. Average enterprise deployment: 30 days from contract to go-live. No professional services required for standard configurations.

Products · Tab 06
Abstract.DI — The Engine That Reads Your Documents
Abstract.DI does not summarize documents. It comprehends them — classifying every document type, extracting every meaningful field, scoring confidence at the field level, detecting anomalies against corpus patterns, and writing all of it as structured, queryable data into the Document Warehouse. Any document. Any industry. 94%+ confidence out of the box. No training required.
ANY
Document Type
94%+
Day One Confidence
100K
Batch Chunk Size
10 to 50x
GPU OCR Speedup
Day 1
Accuracy (not Month 6)
Abstract.DI in Action — AI Powered Document Comprehension
documentgateway.ai
Abstract.DI — Structured Field Extraction from Any Document
Abstract.DI · AI Extraction Engine · Any Document Type
What you are seeing here is a document — any document — being converted into structured, queryable intelligence in seconds. The left panel shows the original document exactly as it arrived. The right panel shows every field Abstract.DI extracted: parties, dates, financial terms, obligations, conditions, signatures, execution status — organized into typed field groups with individual confidence scores. Every highlighted passage in the document is a live link: click any extracted field and the document scrolls to the exact source text it was derived from. This is not an AI summary — it is a structured database record created from an unstructured document, with full provenance tracing from field value back to source text. The 94%+ confidence score is field level, not document level — you know exactly which fields the AI is certain about and which need review. All extracted data is immediately written to the Document Warehouse as queryable PostgreSQL rows, available to any BI tool, API consumer, or AI agent the moment extraction completes. This is the engine that turns a folder of PDFs into a structured database.
Multi-Pass Extraction Pipeline
Abstract.DI — Document to Intelligence Pipeline
Step 1
Document Ingestion
PDF, DOCX, XLSX, PPTX, MSG/EML, CSV, ZIP, JPEG/PNG/TIFF, DB records
Step 2
Selective OCR
DocTR engine · GPU 10 to 50x speedup · Multilingual · 8s timeout with fallback
Step 3
Classification
Any doc type · Claude Haiku inference · "AI.DI Named" badge · Confidence scoring
Step 4
Field Extraction
Type-specific schemas · Dates · Parties · Amounts · Obligations · Conditions
Step 5
Anomaly Detection
Cross-document consistency · Version comparison · Portfolio baseline deviations
AI.DI Document Warehouse — Structured PostgreSQL Rows
All extracted fields stored as structured, queryable data. Available to BI tools, APIs, and AI agents instantly.
"Box shows you a file. We show you what the file says. Run both side by side. The demo closes itself."
— Abstract.DI positioning principle
The Extraction Pipeline — How It Works
Step 01
Ingest
File received via upload, email, API, or Submitter Gateway. ZIP files auto-extracted recursively. File validated and stored.
Step 02
OCR
Selective OCR applied only when it improves text completeness. GPU-accelerated at 10x to 50x CPU speed. Duplicate copies skipped.
Step 03
Classify
Document type identified from 5,700+ taxonomy entries. 78% auto-classify rate on day one without custom training.
Step 04
Extract
All meaningful fields extracted with individual confidence scores. Parties, dates, financial terms, obligations, signatures, clauses.
Step 05
Score and Route
Confidence checked against thresholds. Auto-certify, steward review, or flag. Duplicate and anomaly detection run in parallel.
Step 06
Warehouse
All extracted fields written as structured rows to PostgreSQL. Instantly queryable by SQL, natural language, BI tools, and AI agents.
AI Confidence Architecture — Three Zones
Zone 01 — Auto Certify
Confidence at or above the auto-certify threshold
Document passes through the pipeline without steward involvement. Default threshold: 85%. Configurable per document type or tenant. As the ML feedback loop accumulates corrections, more document types graduate to this zone. The HITL Reduction AI tracks which types are consistently above threshold and promotes them automatically — the percentage of documents requiring human review trends toward zero for standard document types over time.
Zone 02 — Steward Review
Confidence between review and auto-certify thresholds
Document routed to steward queue for field-level review. Stewards see exactly which fields are uncertain, with the source text highlighted in the original document. A single correction — changing a wrong date, confirming a party name — is fed back into the ML model as a labeled training example. Default review band: 60% to 85%. Every correction makes the next extraction more accurate.
Zone 03 — Flagged
Confidence below the review threshold
Document flagged for full manual review and possible reingestion. Default flag threshold: below 60%. Typically scanned documents with poor image quality, unusual layouts, or document types not yet in the training corpus. All flag events are tracked and used to prioritize which document types need additional training data. Flag rate declines as corpus grows.
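The three zones reduce to two threshold comparisons. The sketch below uses the defaults stated above (85% auto-certify, 60% review) and exposes them as parameters, since the text says they are configurable per document type or tenant. How field-level scores are combined into the single number passed in is not specified here, so the caller is assumed to supply it. The function name and zone strings are illustrative.

```python
def route(confidence: float, auto_certify: float = 85.0, review: float = 60.0) -> str:
    """Map a confidence percentage (0 to 100) to one of the three zones."""
    if confidence >= auto_certify:
        return "auto_certify"    # Zone 01: no steward involvement
    if confidence >= review:
        return "steward_review"  # Zone 02: field-level review queue
    return "flagged"             # Zone 03: full manual review
```

Raising `auto_certify` for a sensitive document type, for example `route(90, auto_certify=95)`, sends more of that type to steward review without touching other types.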
OCR Engine Options
Primary Engine
Abstract.DI OCR v3 — Default
  • DocTR-based architecture with multilanguage support
  • GPU-accelerated processing — 10x to 50x faster than CPU-bound alternatives
  • Optimized for complex layouts: multi-column, rotated text, tables, and handwritten annotations
  • Selective execution — OCR applied only when it materially improves text completeness
  • Deduplication-aware: when one file in a duplicate group is processed, all copies are skipped
Legacy Engine
Abstract.DI OCR v2 — Stable Fallback
  • Proven accuracy on standard document formats
  • Available as fallback for tenants with specific compatibility requirements
  • CPU-bound processing — suitable for lower-volume deployments
  • Identical extraction pipeline output format — no downstream changes required when switching
Open Source Option
Tesseract — Client-Preferred Integration
  • Available when client contracts or compliance requirements specify open source OCR
  • Modular engine-agnostic architecture means any OCR engine can be substituted without pipeline changes
  • Tesseract output normalized to the same field extraction format as Abstract.DI native engines
What Gets Extracted — Field Schema by Category
Core Document Fields

Extracted from every document type regardless of content: node_path, hierarchy_path, doc_type, workflow_status, added_at, original_name, storage_path, period. These fields form the backbone of the Document Warehouse schema and enable cross-entity search across the entire corpus.

AI Extracted Fields

Present for documents where Abstract.DI has completed extraction: ai_fields (JSONB), extraction_confidence (numeric 0 to 100), entity_party (primary counterparty), primary_value (lead financial figure), start_date, end_date. These fields are present across a standard deployment corpus of thousands of documents.

Financial Fields

Extracted from financial statements, loan documents, and operating reports: coverage ratios, loan-to-value metrics, net operating income, revenue, net income, return metrics, and performance multiples. All numeric fields stored with full precision for direct BI tool consumption without transformation.

Operational Fields

Extracted from contracts, utilization reports, and entity records: utilization_rate, total_units, primary_counterparty, anomaly_flag (boolean — AI detected discrepancy). The anomaly flag is computed by comparing extracted values against corpus patterns. A coverage ratio of 0.4 in a corpus where the median is 1.8 raises the flag automatically.
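The exact rule behind the anomaly flag is not described here. One common way to compare a value against a corpus pattern is a median-based outlier test, sketched below as an assumption: a value is flagged when it lies more than `k` median absolute deviations from the corpus median. The function name, the default `k = 3.0`, and the sample corpus in the usage note are illustrative.

```python
from statistics import median


def anomaly_flag(value: float, corpus_values: list[float], k: float = 3.0) -> bool:
    """Flag `value` if it deviates from the corpus median by more than
    k median absolute deviations (an assumed rule, not the documented one)."""
    med = median(corpus_values)
    mad = median(abs(v - med) for v in corpus_values)
    if mad == 0:
        # Every corpus value equals the median: any different value stands out.
        return value != med
    return abs(value - med) > k * mad
```

With a corpus such as `[1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1]`, whose median is 1.8, a coverage ratio of 0.4 is flagged while 1.7 is not.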

AI Model Performance and Learning
Per-Tenant Model Performance

Abstract.DI maintains per-tenant model statistics that improve continuously as stewards interact with the platform:

  • Auto-classify rate: 78% of documents classified without steward input on a standard tenant
  • Average confidence: 91% across all extracted fields on certified documents
  • Training corpus: 6,421 steward-reviewed documents feeding the active learning loop
  • Steward corrections (30 days): 47 field-level corrections generating new labeled training examples
  • Model retraining: triggered automatically when correction volume exceeds threshold
ML Feedback Loop — How the Model Improves

Every steward action is a labeled training example. The model does not require separate annotation workflows or data science involvement.

  • Steward accepts a field → positive signal for that extraction pattern on that document type
  • Steward corrects a field → negative signal plus the corrected value as ground truth
  • Steward rejects a document → classification correction that updates the type inference model
  • ML Learning Dashboard shows training corpus growth, accuracy trends, and retraining schedule
  • Pipeline settings control: minimum documents before auto-certification, review window days, rapid review mode
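The mapping from steward actions to training signals listed above can be sketched as a small function. The record shape, the action names, and the function name are hypothetical. The sketch only shows that each action yields a labeled example without a separate annotation step.

```python
def to_training_signal(action: str, target: str, extracted, corrected=None) -> dict:
    """Turn one steward action into a labeled example (hypothetical record shape).

    action: "accept", "correct", or "reject_document"
    target: the field name, or "doc_type" for a document rejection
    """
    if action == "accept":
        # Accepted value becomes a positive example for this pattern.
        return {"target": target, "signal": "positive", "label": extracted}
    if action in ("correct", "reject_document"):
        if corrected is None:
            raise ValueError("a correction needs the corrected value")
        # Corrected value is ground truth; the original is kept as a negative.
        return {"target": target, "signal": "negative",
                "label": corrected, "rejected": extracted}
    raise ValueError(f"unknown steward action: {action}")
```

A retraining trigger could then simply count the negative records accumulated since the last run and compare that count with the configured threshold.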
Custom Schema Builder

Abstract.DI ships with prebuilt schemas for 5,700+ document types across industries. For document types outside the standard taxonomy, the Custom Schema Builder allows admins to define extraction targets — specify the fields you need, provide 3 to 5 example documents, and the model learns the pattern. No code. No data science team. New document type schemas are typically operational within one business day.

Products · Tab 07
Sentry Document Assurance — "Shazam for Documents"
Deterministic mathematical fingerprinting — patent pending. Zero document storage. Zero PII exposure. GDPR, HIPAA, SEC, and APA compliant by architecture, not configuration. Approximately 10,000 prebuilt fingerprints. Three unique fingerprint types including Trusted Data Fingerprints, which are unique in the market. Certify once. Comply everywhere.
~10K
Fingerprint Catalog
0
Documents Stored
10 to 100x
Search Speed v2
30 to 50%
LLM Cost Reduction
Patent
Pending Architecture
~13M
PubMed Records Indexed
Three Fingerprint Types
Type 01
Document Content Fingerprints
Full textual and structural content of any document. Two identical documents always produce identical fingerprints. Any change — a single word, date, number, or comma — produces a measurably different fingerprint. Deterministic mathematics. Zero false positives. Used for certification, version tracking, and fraud detection.
Type 02
Document Data Fingerprints
Structured data extracted from document fields and fingerprinted independently of full document content. Field-level matching finds every document containing the same obligation term, coverage limit, or financial figure, without requiring full-text identity. Supports cross-document data validation and portfolio reconciliation.
Type 03
Trusted Data Fingerprints
Unique in the market. No equivalent exists anywhere. Fingerprint individual rows from Excel files and database tables — a vendor record, counterparty entry, or entity row — then find every document in the enterprise that references that specific record. Given any entity: instantly surface every connected document across the entire corpus.
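Sentry's fingerprinting method is patent pending and is not described in this material, so the sketch below substitutes a standard SHA-256 hash purely to illustrate the determinism property: identical input always yields an identical fingerprint, and any change yields a different one. Unlike the fingerprints described above, a cryptographic hash only answers "same or different" and does not measure how different two inputs are. The function names and the canonical JSON encoding for rows are assumptions.

```python
import hashlib
import json
import unicodedata


def content_fingerprint(text: str) -> str:
    """Deterministic digest of document text (SHA-256 stand-in)."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def row_fingerprint(row: dict) -> str:
    """Deterministic digest of one spreadsheet or table row.

    Keys are sorted so column order does not change the result.
    """
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two copies of the same vendor row exported with columns in a different order produce the same `row_fingerprint`, which is the property a row-to-document lookup would rely on.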
30 to 50% AI Cost Reduction — Immediate

Industry average: 40% or more of files are duplicates. In a corpus with 40% duplicates, roughly 40% of every LLM bill pays to process content that has already been processed. Sentry identifies all duplicates, consolidates them to canonical records, preserves the metadata from every duplicate instance, and then suppresses the duplicates from AI queues. LLM compute costs drop 30 to 50% immediately, without changing a single prompt or model.
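The consolidation step and the resulting saving can be worked through in a few lines. The sketch assumes each document already carries a fingerprint and that LLM cost scales with the number of documents queued. The record shape (`fingerprint`, `path`, `also_seen_at`) and the function name are hypothetical.

```python
from collections import defaultdict


def consolidate(docs: list[dict]) -> tuple[list[dict], float]:
    """Group documents by fingerprint and keep one canonical record per group.

    Returns the canonical records and the share of documents suppressed
    from the AI queue.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["fingerprint"]].append(doc)

    canonical = []
    for fingerprint, members in groups.items():
        canonical.append({
            "fingerprint": fingerprint,
            "path": members[0]["path"],  # first-seen copy is canonical
            # Locations of the duplicates are preserved, not discarded.
            "also_seen_at": [m["path"] for m in members[1:]],
        })

    saved_share = 1 - len(canonical) / len(docs) if docs else 0.0
    return canonical, saved_share
```

For five files of which two are duplicates, three canonical records remain and the suppressed share is 0.4, which matches the 40% figure above under the proportional-cost assumption.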

2026 Dashboard and Analytics Enhancements
New in 2026
Operational Analytics and Decision Support

Advanced analytics on document distribution by source, format, status, and metadata profile. Enables evidence-based prioritization of remediation, archival, and modernization initiatives.

Provides measurable KPIs for digital transformation monitoring, establishes compliance posture, and ensures data quality maturity. Decision makers can now act on document intelligence rather than document volume.

New in 2026
Enterprise Duplicate Intelligence

Enterprise-wide visibility into unique and duplicate documents across all repositories. Quantifies storage, compliance, and operational risk exposure from redundant content.

Enables measurable cost reduction through defensible deduplication and lifecycle optimization — removing redundancy without sacrificing auditability or chain of custody.

New in 2026
Cross Repository Document Transparency

Unified discovery of unique documents across siloed systems and business applications. Identifies misplaced sensitive content and policy exceptions across environments.

Supports strategic document migration planning, normalization, and information governance initiatives across the full enterprise stack.

Next Generation OCR Engine
Sentry's OCR engine is fundamental to building digital document fingerprints, transforming visual content into structured, queryable text. Pretrained ML models outperform traditional rule based methods across accuracy, flexibility, and consistency on complex document structures.
New Engine — Active
Modular, Engine Agnostic Architecture
  • Supports multiple OCR technologies simultaneously within a single processing pipeline
  • DocTR prioritized for high accuracy recognition across complex layouts and multilingual content
  • GPU enabled processing delivers 10x to 50x performance acceleration
  • Selective OCR execution reduces unnecessary processing and improves throughput
  • Supports integration of any client preferred OCR engines where contractually required
Optimal Strategy — Active
Intelligent Selective OCR Execution
  • OCR applied only when it materially improves extracted text completeness
  • When one file in a duplicate group is processed, all remaining copies are skipped — saving approximately 30% of total computation
  • Selective OCR applied to embedded images when native text already exists in the document layer
  • Full document OCR triggered automatically when documents are scan dominant or lack usable text
  • Optimized balance between accuracy, performance, and computational cost at scale
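The selective execution rules above can be read as a small decision function. The sketch below is one possible reading, not the engine's actual logic; the thresholds (`min_chars_per_page`, `scan_ratio`) and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    native_text_chars: int  # characters available from the document's text layer
    embedded_images: int    # images on the page that may contain text


def plan_ocr(pages: list[PageInfo], duplicate_already_processed: bool,
             min_chars_per_page: int = 50, scan_ratio: float = 0.5) -> str:
    """Return 'skip', 'full', 'images_only', or 'none' for one document."""
    if duplicate_already_processed:
        return "skip"  # another copy in the duplicate group was already processed
    if not pages:
        return "none"
    scanned = sum(1 for p in pages if p.native_text_chars < min_chars_per_page)
    if scanned / len(pages) >= scan_ratio:
        return "full"  # scan dominant, or no usable text layer
    if any(p.embedded_images > 0 for p in pages):
        return "images_only"  # native text exists; OCR only the embedded images
    return "none"


scanned_doc = [PageInfo(0, 1), PageInfo(10, 0)]
born_digital_with_figure = [PageInfo(2000, 1), PageInfo(1800, 0)]
plain_text_doc = [PageInfo(2000, 0)]
```

Checking the duplicate flag first means a duplicate never reaches the more expensive per-page inspection.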
Legacy Engine — Superseded
Previous Single Engine Architecture
  • Single engine architecture with limited flexibility across document types
  • Higher infrastructure overhead from CPU bound processing model
  • Less optimized handling of complex layouts, rotated text blocks, and multilingual content
  • Required full document OCR more frequently, increasing processing cost and completion time
Intuitive Document Search
Search Capability
Unified Metadata and Content Search
  • Familiar search experience users already know how to use — no training required
  • Unified search across all connected systems and document repositories simultaneously
  • Instantly searches titles, metadata, tags, filenames, and document properties
  • Advanced database level queries deliver lightning fast results even at enterprise scale
  • Flexible filters and sorting allow rapid refinement across any returned metadata field
Search Capability
Statistical Similarity and Version Discovery
  • Documents grouped by statistical similarity score, not just keyword match
  • Identify matching versions, prior drafts, and comparable files in a single result set
  • All historical document versions always appear at the top of results as the most similar documents
  • Supports fraud detection by surfacing near duplicate documents that differ only in critical fields
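Similarity-ranked results of this kind can be approximated with Jaccard similarity over word shingles. The sketch below uses that well-known technique as a stand-in; the scoring method Sentry actually uses is not specified here, and the function names and sample texts are hypothetical.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Split text into overlapping k-word windows."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_similar(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Score every document against the query, most similar first."""
    q = shingles(query)
    scored = [(doc_id, jaccard(q, shingles(text))) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


corpus = {
    "lease_v1": "tenant shall pay monthly rent of 4000 dollars on the first day",
    "lease_v2": "tenant shall pay monthly rent of 4500 dollars on the first day",
    "nda": "recipient shall keep all confidential information strictly private",
}
results = rank_similar(
    "tenant shall pay monthly rent of 4500 dollars on the first day", corpus
)
```

Here the exact match ranks first, the prior version that differs only in the rent amount ranks second with a partial score, and the unrelated document scores zero. A near duplicate that differs only in a critical field is the pattern the fraud detection bullet describes.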
Vertical Expansion
Life Sciences — PubMed Integration

Approximately 13 million PubMed abstracts and associated data have been imported, fingerprinted, and organized by publication year. Ongoing PubMed updates can be processed as often as hourly.

PubMed contains more than 39 million biomedical citations maintained by the National Center for Biotechnology Information at the US National Library of Medicine.

Ready for first Sentry academic, bioscience, biotechnology, and pharmaceutical clients.

Sentry + Document Gateway — Continuous Document and Data Readiness

Sentry registers, processes, and fingerprints every document flowing through Document Gateway — in both directions. Documents distributed externally are certified before leaving. Documents received are verified on arrival. The entire document corpus becomes a trusted, queryable intelligence layer with measurable readiness scores updated in real time. This integration is the foundation of the Continuous Transaction Readiness score.

"The fingerprint never lies. The document might. Sentry tells you the difference."
— Sentry Document Assurance design principle
Products · Tab 08
AI.DI Document Warehouse — The Data Moat That Doesn't Exist Anywhere Else
Every document Abstract.DI processes becomes structured rows in PostgreSQL. Every extracted field is queryable data. Every AI signal is persisted as a structured record. No competitor has built this. No competitor can build it without starting over.
Document Warehouse — Live Query Interface
WAREHOUSE
AI.DI Document Warehouse — Your Entire Document Corpus as Structured Data
Warehouse · 9,857 Documents · Live Query Interface
What looks like a document list is actually a live query interface into a structured database. Every row here is not a file reference — it is a record in PostgreSQL with typed columns for every field Abstract.DI extracted: classification type, confidence score, execution status, expiry date, party names, financial terms, compliance flags, and dozens more depending on document type. The Abstract.DI query bar at the top accepts natural language — "show all agreements expiring this quarter where extraction confidence is above 90%" returns structured results because the underlying data is structured, not a keyword search across unstructured text. The seven view modes (List, Gallery, Library, Cube, Time Series, Schema, Scientist) let you slice the same corpus as a compliance officer, a data analyst, an AI engineer, or a CFO — each seeing exactly the view that matches their workflow. This is the first document management system where the documents are a side effect of the real product: a continuously enriched, AI maintained structured database of everything your organization has ever received, produced, or executed.
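To make the quoted natural language query concrete, the sketch below shows the kind of structured query it could translate into. SQLite stands in for PostgreSQL so the example runs anywhere, the `documents` table and its columns are hypothetical, and the quarter is fixed to Q1 2025 for reproducibility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        classification TEXT,
        expiry_date TEXT,      -- ISO 8601 date string
        confidence REAL        -- extraction confidence, 0.0 to 1.0
    )
""")
conn.executemany(
    "INSERT INTO documents (classification, expiry_date, confidence) VALUES (?, ?, ?)",
    [
        ("agreement", "2025-02-15", 0.97),
        ("agreement", "2025-03-30", 0.85),  # confidence too low
        ("agreement", "2025-07-01", 0.99),  # expires outside the quarter
        ("invoice", "2025-02-01", 0.95),    # not an agreement
    ],
)

# "show all agreements expiring this quarter where extraction confidence is above 90%"
rows = conn.execute(
    """
    SELECT id, expiry_date, confidence
    FROM documents
    WHERE classification = 'agreement'
      AND expiry_date BETWEEN ? AND ?
      AND confidence > 0.90
    ORDER BY expiry_date
    """,
    ("2025-01-01", "2025-03-31"),
).fetchall()
```

Because classification, expiry date, and confidence are typed columns, the filter is an ordinary WHERE clause and needs no text search.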
Warehouse Studio — BI Connectors
WAREHOUSE · DEVELOPERS
Warehouse Studio — Zero ETL Connections to Every Data Stack
Warehouse · Snowflake · Databricks · 9 Connectors
The Document Warehouse is not a destination — it is a source of truth that feeds every analytics system your organization already uses. Snowflake receives 891 rows on a 15-minute incremental sync via Data Share — zero copy, zero ETL, no pipeline to build or maintain. Databricks connects via Delta Lake for full and incremental refresh, enabling document intelligence to join with financial models, risk systems, and ML pipelines in the same compute environment. Webhooks fire on every document event — ingest, certify, extract, expire — enabling real time triggers to any downstream HTTP endpoint. BigQuery, Redshift, Tableau, Power BI, dbt Cloud, and a Python SDK are available with one-click configuration. The data model is fully documented — every extracted field, every confidence score, every audit event — so data engineers can join document intelligence with any other enterprise dataset without discovering the schema by trial and error. The Warehouse is not just queryable. It is the most current, most complete, most structured view of your document estate that has ever existed.
Data Intelligence — Full Data Lineage Observability
WAREHOUSE · DEVELOPERS
Data Lineage Map — Complete Provenance from Source to Consumer
Warehouse · Data Intelligence · 15 Nodes · 17 Connections
Every piece of intelligence in the platform has a traceable origin. The Data Lineage Map visualizes the complete path a document takes from source system through ingestion, processing, warehouse storage, and consumer delivery — with live status on every node. PDF Documents (693 ingested) flow through the Ingest Pipeline (1,000 processed), Deduplication (exact + fuzzy matching), and Fingerprinting (3 fingerprint types) before Abstract.DI extracts 13 abstraction fields. Those fields become Extraction Fields (queryable warehouse layer), Document Metadata (cross asset search index), and Query Engine (PostgreSQL + custom SQL). Consumers — MCP Server, Snowflake, Webhooks — pull from the warehouse in real time. Stale nodes are visually flagged (Bridge Sync shows 0 fields synced, triggering immediate attention). This is not just observability — it is the audit trail that answers "where did this AI answer come from?" for every query, every extraction, and every alert the platform produces.
"The Finance director ran one query. Years of contract values from thousands of documents into Excel in 30 seconds. He looked up and said: 'We've been paying people to do this manually for twenty years.' That was the moment the platform sale closed itself."
— Warehouse Scientist Mode proof moment
Warehouse Studio — 8 Workspaces
Warehouse Studio is an IDE-like environment for document intelligence. A left dock switches between eight purpose-built workspaces — each designed for a different user persona and workflow. One platform serves the compliance officer, the SQL analyst, the data scientist, the AI engineer, and the executive simultaneously.
Workspace 01
Overview — Portfolio Intelligence Dashboard
Real time metrics across the document corpus: total documents, extraction coverage, anomaly count, compliance status by entity, and ingestion trends. Time period selector (7d, 30d, quarterly, annual, all-time). Coverage heatmap shows document type completeness across the hierarchy — instantly shows which entities are transaction ready and which have gaps.
Workspace 02
Query Studio — SQL and Natural Language
Full SQL editor with schema autocomplete against all extraction tables. Natural language mode converts plain English questions to SQL automatically. Saved queries, query history, and result export to CSV or JSON. PostgREST endpoint and custom SQL both available. Context panel shows live schema with field types and occurrence counts.
Workspace 03
Data Explorer — Structured Field Browser
Browse the extracted field dataset as a structured table with filtering, sorting, and column selection. Field profiler shows null percentage, distinct value count, AI confidence distribution, and mini-histogram for every field. The mv_document_universe materialized view provides a single queryable source across all document types and extraction fields simultaneously.
Workspace 04
AbstractIQ Lab — Model Configuration and Testing
Interactive extraction testing environment. Drop any document, run it through the Abstract.DI pipeline, and inspect every extracted field with its confidence score and source text reference. Adjust confidence thresholds, toggle OCR engines, and compare extraction results across model versions. Schema builder for custom document type definitions.
Workspace 05
Notebooks — Persistent Analysis and Reporting
Jupyter-style analysis notebooks with SQL and Python cells. Save analysis work as persistent notebooks shared across the organization. Scheduled execution for recurring reports. Pre-built notebook templates for common analytics: contract data analysis, financial ratio monitoring, obligation expiry ladders, and coverage gap reports.
Workspace 06
Pipelines — Sync and Automation Management
Visual pipeline builder for data sync workflows. Define extraction-to-warehouse sync schedules, connector push cadences, and conditional routing rules. Pipeline status dashboard shows last run time, row counts, error rates, and next scheduled execution. Connects the schedule-jobs and run-scheduled-reports edge functions to a visual management interface.
Workspace 07
Connectors — BI and Data Stack Integration
Manage all outbound data connections from a single workspace. Configure, test, and monitor connections to Snowflake, Databricks, BigQuery, Redshift, Tableau, Power BI, dbt Cloud, webhook endpoints, and the Python SDK. Test connection modal with live credential validation. Sync log shows row counts, timestamps, and error details per connector.
Workspace 08
AbstractIQ Chat — Conversational Document Intelligence
Natural language chat interface that queries the entire document corpus. Ask any question about your documents and receive a structured answer with source citations. Powered by the MCP server layer with full row-level security enforcement. Every answer includes the document, field, and confidence score it was derived from.
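As an illustration of the field profiler mentioned under the Data Explorer workspace, the sketch below computes a null percentage, a distinct value count, and the most common value for one extracted field. The function name and output layout are assumptions, not the product's API.

```python
from collections import Counter


def profile_field(values: list) -> dict:
    """Summarize one extracted field across all documents."""
    total = len(values)
    present = [v for v in values if v is not None]
    return {
        # Share of documents where the field was not extracted.
        "null_pct": 100.0 * (total - len(present)) / total if total else 0.0,
        # Number of different non-null values.
        "distinct": len(set(present)),
        # Most frequent value and its count, or None if the field is always null.
        "top": Counter(present).most_common(1)[0] if present else None,
    }


profile = profile_field(["USD", "USD", None, "EUR"])
```

A high null percentage on a field that should always be present is a quick signal of an extraction gap.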
9 Outbound Connectors — Zero-ETL Data Stack Integration
Connected
Snowflake
Push extracted fields to any Snowflake schema. Upsert and append modes. 15-minute incremental sync cadence. Supports partitioned tables and schema-on-write patterns. Zero copy via Data Share — no pipeline to build or maintain.
Connected
Databricks
Delta Lake via Unity Catalog. Incremental and full refresh modes. Structured streaming support for real time pipeline integration. Used by data engineering teams embedding document intelligence into existing lakehouse workflows.
Connected
Webhooks
Push to any HTTPS endpoint on extraction events. Configurable event filters: document.ingested, document.extracted, anomaly.detected, compliance.updated, sync.completed. Retry logic with exponential backoff. Used for ERP integration and downstream automation.
Available
BigQuery
Google BigQuery streaming or batch load. Supports partitioned tables. Service account authentication. For organizations running GCP-native analytics stacks.
Available
Redshift
Amazon Redshift direct connector. COPY and INSERT modes. IAM-based authentication. Designed for AWS-native data warehouse environments.
Available
Tableau and Power BI
Tableau: live query or extract data source, connect via Tableau Server or Cloud. Power BI: DirectQuery or import dataset. Both provide immediate visualization access to the full extraction field schema without ETL.
Available
dbt Cloud
Source node for dbt models. Automatically generates YAML schema files for all extraction tables. Enables data engineering teams to build transformation models directly on top of AI-extracted document intelligence.
Available
Python SDK
pip install document-gateway. Pandas-native output — client.query("SELECT * FROM documents") returns a DataFrame directly. Async support. Used in notebooks, data science workflows, and custom analysis scripts.
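The Webhooks connector above mentions retry logic with exponential backoff. A minimal sketch of that pattern follows, assuming a doubling delay with a cap. The base delay, cap, attempt count, and function names are all hypothetical, and `send` is a placeholder for whatever performs the HTTPS POST.

```python
import random


def backoff_delays(max_attempts: int = 5, base: float = 1.0, cap: float = 60.0,
                   jitter: bool = False) -> list[float]:
    """Delay in seconds before each retry: base * 2**attempt, capped at `cap`."""
    delays = []
    for attempt in range(max_attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Randomize within [0, delay] so many clients do not retry in lockstep.
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays


def deliver(send, event: dict, delays: list[float], sleep=lambda seconds: None) -> bool:
    """Call send(event) until it returns True or the retries are exhausted.

    `sleep` defaults to a no-op so the sketch runs instantly; pass time.sleep
    to actually wait between attempts.
    """
    if send(event):
        return True
    for delay in delays:
        sleep(delay)
        if send(event):
            return True
    return False


# Simulated endpoint that fails twice and then accepts the event.
attempts = []


def flaky_send(event: dict) -> bool:
    attempts.append(event["type"])
    return len(attempts) >= 3


delivered = deliver(flaky_send, {"type": "document.extracted"}, backoff_delays())
```

Capping the delay keeps a long outage from pushing retries out indefinitely, and jitter is a common addition when many events fail at once.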
Data Lineage — 5-Tier Pipeline Architecture
Every document that enters the system has a complete, traceable lineage from source through every processing stage to every downstream consumer. The interactive Data Lineage Map shows this as a graph — click any node to see its connections, status, row count, and detail. Color coding by tier. Bezier connections highlight the path from any selected node.
Tier 01
Sources
PDF, Word, Excel, Image, and other document types. Counted by MIME type from the upload table. Multiple source nodes shown dynamically based on actual corpus composition.
Tier 02
Ingestion
Ingest Pipeline (validation and routing), Deduplication (SHA-256 exact plus MinHash fuzzy), Fingerprinting (Simhash, pHash, and MinHash — 3 fingerprint types). All counts wired to live data.
Tier 03
AI Processing
Abstract.DI extraction orchestrator, Bridge Sync (JSONB to typed rows), Schema Engine (AI-learned plus user-defined field type inference). Processing count reflects completed abstractions.
Tier 04
Warehouse
Extraction Fields table (typed, indexed, queryable rows), Document Metadata materialized view (mv_document_universe), Query Engine (PostgREST plus custom SQL).
Tier 05
Consumers
MCP Server (17 tools for AI agents), Snowflake, Databricks, and Webhook connectors. Status indicators show active versus stale connections.
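The tiers above form a directed graph, and tracing lineage amounts to walking that graph backwards from a consumer. The sketch below is illustrative: node names follow the tiers, but the exact edges, the function names, and the "zero rows means stale" rule are assumptions.

```python
# Edges point downstream, from a node to the nodes it feeds.
EDGES = {
    "PDF Documents": ["Ingest Pipeline"],
    "Ingest Pipeline": ["Deduplication"],
    "Deduplication": ["Fingerprinting"],
    "Fingerprinting": ["Abstract.DI"],
    "Abstract.DI": ["Bridge Sync"],
    "Bridge Sync": ["Extraction Fields"],
    "Extraction Fields": ["Query Engine"],
    "Query Engine": ["MCP Server", "Snowflake", "Webhooks"],
}


def upstream(node: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every node that feeds `node`, nearest first (breadth-first)."""
    parents: dict[str, list[str]] = {}
    for src, targets in edges.items():
        for dst in targets:
            parents.setdefault(dst, []).append(src)

    seen, order, frontier = set(), [], [node]
    while frontier:
        next_frontier = []
        for current in frontier:
            for parent in parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    order.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return order


def stale_nodes(row_counts: dict[str, int]) -> list[str]:
    """Flag nodes that report zero rows synced."""
    return [name for name, count in row_counts.items() if count == 0]


path = upstream("MCP Server", EDGES)
flagged = stale_nodes({"Ingest Pipeline": 1000, "Bridge Sync": 0})
```

Walking upstream from a consumer answers "where did this value come from", and the stale check shows whether any stage on that path has stopped producing rows.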
The Moat Is the Data Structure — Not the UI

Every competitor in the document management space stores files. AI.DI stores intelligence. The gap between those two statements is the entire moat. A corpus of 10,000 documents with 18 months of extraction history, anomaly signals, steward corrections, and financial field time-series data cannot be migrated to a competitor in any meaningful timeframe. The data structure is the lock-in — not the contract.

Products · Tab 09
Millennia FileStar — The Document Warehouse of Record
FileStar is a unified document warehouse of record that captures, verifies, and secures every critical document across all of your systems. Founded in 1996, imkore Millennia has delivered tailored document management solutions for complex enterprises across financial services, healthcare, pension administration, commercial real estate, government, and more — for nearly three decades. FileStar is the governance engine behind the AI.DI platform, providing the trusted document foundation that every other engine builds on.
The Command Center for All Your Essential Documents
A document warehouse is to documents what a data warehouse is to data — a governed, centralized environment where every critical document from across the enterprise is organized using a consistent structure, reliably searchable, and always ready for operational and analytical use. FileStar applies a deep, industry informed document taxonomy spanning more than 5,700 unique document types — covering HR, finance, legal, operations, compliance, and the full enterprise document lifecycle.
Component 01
FileStar ePort — Intelligent Document Capture

Effortlessly capture and ensure accurate classification from any application. FileStar centralizes all document types — paper or electronic — into a unified system with required fields and built in approval workflows that guarantee consistent, accurate archiving.

  • Scan and add documents using your own devices or multifunction peripherals
  • Upload single or multiple files in virtually any format with drag and drop simplicity
  • Email documents directly into the system for seamless capture from any source
  • Every PDF automatically converted to searchable format via OCR on upload
  • Required fields enforce completeness — nothing moves forward without the right metadata
Component 02
FileStar Workflow — Governed Approval Processes

FileStar enforces stringent controls and compliance with precision and accountability at every step. Complex workflows can be modeled to exact requirements — sequential and parallel routing, escalation paths, and automated notifications.

  • Seamless DocuSign integration with digital signatures built directly into workflows
  • Customizable automation rules streamline approvals across any business process
  • Mobile friendly access — approvals and routing from any device, anywhere
  • Comprehensive logging and reporting with full audit trails for compliance
  • Handles: Contract Administration, Wire Transfers, Accounts Payable, Journal Entries, Vendor Contracts, Benefit Requests, Budget Approvals, and more
Component 03
FileStar Archive — Secure Centralized Repository

Protect your critical documents in a centralized repository with security and compliance built in from the foundation. The Archive is the governed system of record — every version, every action, every access event logged and preserved.

  • Powerful flexible search — locate documents by type, asset, process, date, or any metadata field
  • Version control with full document history ensures accuracy across all revisions
  • Secure external sharing via trackable links with customizable expiration dates
  • Two factor authentication and Single Sign On for enhanced access control
  • Detailed system logs tracking all document access and actions for complete auditability
Open API Framework — Integration-Ready by Design
FileStar is built to live inside your existing technology stack, not beside it. The open API framework integrates directly with leading enterprise platforms so your documents flow automatically with the transactions and workflows that create them — captured, structured, and linked to source data in real time.
Integration
Enterprise ERP and Property Management
FileStar APIs connect directly with major ERP, accounting, and property management platforms. Documents are captured, structured, and linked to source data in real time — creating a continuously accurate, compliant, and retrievable document of record synchronized with the systems that generate the underlying transactions.
Integration
DocuSign — Digital Signature Workflows
Native DocuSign integration embeds digital signature workflows directly into FileStar processes. Executed agreements arrive pre-classified, pre-validated, and automatically archived into the correct location — no manual routing, no version confusion, no broken audit trail between signature and storage.
Integration
SharePoint and Enterprise Content Platforms
The SharePoint plugin allows workflows to be initiated and files to be added directly from SharePoint, making FileStar the governance and intelligence layer over existing content stores without requiring a migration. Documents in SharePoint become governed assets without leaving their existing location.
Document Warehouse — Key Concepts
Document Schema — The DNA of the Warehouse

A document schema is the structured framework that defines how documents are identified, categorized, and related to one another. Just as a data warehouse relies on a data schema to bring order to large volumes of information, a document warehouse uses a document schema to create clarity, consistency, and predictable organization across all documents.

FileStar's schema spans more than 5,700 unique document types — giving it deep understanding of documents that support acquisitions, operations, financings, compliance, and every stage of the enterprise lifecycle. The schema automatically knows what a document is, how it should be classified, where it belongs, and what a complete document chain should look like. Documents are no longer scattered or mislabeled — they are organized consistently across systems and ready for audit, operations, and enterprise-wide decision making.

Metadata Extraction — Documents Become Structured Intelligence

Metadata is to documents what structured fields are to data. FileStar identifies and extracts key attributes — document type, parties, dates, asset identifiers, and relationships — transforming unstructured files into structured intelligence. Without metadata, documents behave like raw data without schema. With metadata, they become organized, trustworthy knowledge assets that support search, governance, compliance, and AI.

Every document in FileStar is a governed asset aligned with a consistent taxonomy and storage structure — searchable through clear logical pathways by type, entity, process, source system, date, or business function. Dynamic views and dashboards give teams visibility into entire document collections, not just isolated files.

Auditability — Complete Chain of Custody

Auditability is a defining characteristic of a document warehouse. FileStar records every interaction with every document and preserves the full lineage of a record — from its originating system to every update or review. Auditors can see exactly where a document came from, how it has been handled, and whether it remains complete and accurate.

FileStar also captures the source system, timestamps, authorship, and movement of each document — creating a verified chain of custody. This transparency builds trust across the organization and satisfies regulatory requirements without additional documentation work.

Security and Compliance — Built In, Not Bolted On

FileStar operates within an SSAE 18 certified hosting facility with annual SOC 2 audits. Role-based access controls ensure only authorized users and groups can access specific documents. All protocols comply with HIPAA and SOX guidelines for PII and PHI.

Compliance becomes easier when documents follow a consistent structure and lifecycle. FileStar enforces rules for document retention, validation, storage, and access — providing real time visibility into document completeness, timeliness, and accuracy. This makes it simpler to prove adherence to regulations and internal policies, and reduces the risk associated with missing or misplaced documents.

Services — Document Optimization and System Transformation
Service
Document Optimization
Alignment and optimization of document workflows to ensure seamless integration, robust control, and improved productivity throughout the organization. We transform fragmented document ecosystems into a unified, cohesive framework — merging critical document silos into one streamlined system that aligns with your business objectives.
Service
Document Conversion — I:S3 Smart Scanning
From contracts to full-size drawings, whether 10,000 pages or 10 million — imkore Millennia is the trusted source for seamless document conversion. The I:S3 Smart Document Scanning Service captures the contents of boxes or file cabinets and helps organizations decide what to Shred, Store, or Scan — onsite or at the secure Chicago service bureau.
Service
System Transformation
Tailored strategies and expert guidance to unify disconnected systems into a streamlined, cohesive framework. FileStar is woven into existing ecosystems, enhancing efficiency, control, and integration across all workflows without disrupting current operations. From comprehensive assessments to implementing structured solutions — including data migrations, cleanup, and normalization.
The AI.DI Integration Pathway
FileStar as the AI.DI Governance Engine

FileStar governs documents. AI.DI makes them intelligent. FileStar managed documents automatically flow through Sentry certification and Abstract.DI extraction without any workflow change for existing users. All FileStar metadata syncs to the AI.DI Warehouse continuously.

Every FileStar client is one conversation away from the full AI.DI platform. No rip and replace. No migration project. No change management crisis. The upgrade path is a configuration change — the governance infrastructure is already in place.

Why imkore Millennia

imkore Millennia was founded in 1996 with a focus on tailored document solutions for complex requirements that standard document management software cannot easily meet. The combination of SaaS flexibility with customizable framework design means FileStar can be configured for specific industries, regulatory environments, and workflow structures without professional services for standard deployments.

  • SSAE 18 certified hosting facility with annual SOC 2 audit
  • HIPAA and SOX compliant protocols for PII and PHI handling
  • Pre-employment screening for all employees handling sensitive documents
  • Nearly three decades of enterprise document management expertise
  • Serving financial services, healthcare, pension administration, real estate, government, and more
"Our document processes were fragmented across multiple systems, making accessing information a constant challenge. With their unified framework, we now have one central platform — information is organized, accessible, and secure. Compliance has become much easier to manage, with everything traceable and stored in one place. imkore Millennia didn't just implement a solution — they transformed the way we work with our documents across the entire organization."
— Enterprise FileStar Client
Products · Tab 10
AI Orchestration & Agent Gateway — The Infrastructure That Makes LLMs Actually Work
AI.DI is not a competitor to LLMs. It is their prerequisite. Every enterprise deploying Copilot, GPT-4, Claude, or Gemini faces the same problem: the AI is only as good as the documents it reasons from. If documents are uncertified and unstructured — your AI hallucinates. AI.DI is the trusted document foundation that makes any LLM enterprise grade.
MCP Server — AI Agent Gateway
ORCHESTRATION · AI
Integration Studio — Live AI Agent Gateway
AI Orchestration · MCP Server + Connected AI Systems
This is the screen that enterprise AI teams have been waiting for. An MCP server exposes certified tools to any MCP compatible AI system — Claude, Cursor, LangChain, AutoGen, or any agent framework. The moment Claude.ai connects to this URL, it can search your certified document corpus, check compliance status on any asset, retrieve all obligations from any document set, run structured queries against the full Warehouse, navigate your org hierarchy, and retrieve signed access URLs for specific document versions. Every query enforces row level security at the database layer — the AI agent cannot access documents the connected user is not authorized to see. Keys are revocable instantly. Usage is logged. This is not middleware or a wrapper — it is a purpose built enterprise document intelligence API that treats your LLM as a trusted, auditable consumer of certified data rather than a summarizer of raw PDFs.
28 Connectors — Every System You Already Use
ORCHESTRATION · AI
Integration Studio — 28 Enterprise Connectors
AI Orchestration · Full Connector Ecosystem
The platform connects to every system an enterprise already runs — which means "we already use X" has no purchase as an objection. Enterprise ERPs push operational documents, financial reports, and contract records directly into the AI.DI ingestion pipeline on a configured schedule or in response to events — contracts, invoices, compliance filings, and amendments arrive as first-class pipeline records rather than email attachments or manual uploads. Document management connectors (SharePoint, Google Drive, Box, OneDrive, Dropbox) make AI.DI additive: it reads, certifies, and extracts from existing storage without requiring a file migration. CRM platforms deliver agreements and correspondence as structured ingestion records. Observability and monitoring tools push operational documents as they are generated. Data warehouse connectors deliver extracted intelligence outbound to Snowflake, Databricks, BigQuery, and Redshift on configurable schedules. Every connection is configured through a guided wizard — no custom code, no IT project, no services engagement required.
Document IQ — AI Powered Portfolio Intelligence
ORCHESTRATION · AI
Document IQ — Conversational AI Over a Certified Corpus
AI Orchestration · Portfolio-Wide AI Query Interface
What changes when AI reasons from a certified, structured corpus instead of raw files is not incremental — it is categorical. "What is missing from the vault?" is not a keyword search across folder names. It is a completeness calculation running against required document schemas for every entity in scope simultaneously, returning a ranked gap list with entity, document type, and days since last receipt. "Show critical risk items" is not a tag filter — it is an aggregation of violation flags, expiry warnings, anomaly detections, and compliance alerts across the entire corpus, sorted by risk magnitude and consolidated into a single prioritized view of the portfolio that would take a compliance analyst days to compile manually. Upload any document — a counterparty critical data extract, a vendor certificate, a financial statement — and Document IQ cross-references it against vault records in real time: matching party names, flagging version discrepancies, identifying superseded agreements, surfacing every related obligation that touches the same entities, and exposing missing correlations and data conflicts without a human pulling comparison reports. The difference between this and asking a general-purpose AI to analyze your documents is the difference between querying a continuously maintained structured database and asking someone who once read that database to remember what was in it. Trusted structure underneath the model is what makes the difference. This is the AI experience that becomes the reason nobody opens a legacy platform again.
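The completeness calculation described above can be sketched as a set difference between required and received document types per entity. Everything in the sketch is an assumption for illustration: the required schema, the vault layout, the function name, and the reading of "days since last receipt" as days since the entity last delivered any document.

```python
from datetime import date
from typing import Optional

REQUIRED = {"lease", "insurance_certificate", "tax_filing"}  # hypothetical schema


def gap_list(vault: dict[str, dict[str, date]],
             today: date) -> list[tuple[str, str, Optional[int]]]:
    """vault maps entity -> {document_type: date last received}.

    Returns (entity, missing_type, days_since_last_receipt), ranked so that
    entities with no receipts come first, then the longest-silent entities.
    """
    gaps = []
    for entity, received in vault.items():
        last = max(received.values(), default=None)
        days = (today - last).days if last else None
        for doc_type in sorted(REQUIRED - set(received)):
            gaps.append((entity, doc_type, days))
    return sorted(gaps, key=lambda g: (g[2] is not None, -(g[2] or 0), g[0], g[1]))


vault = {
    "Entity A": {
        "lease": date(2025, 1, 10),
        "insurance_certificate": date(2025, 3, 1),
        "tax_filing": date(2025, 2, 1),
    },
    "Entity B": {"lease": date(2024, 11, 1)},
    "Entity C": {},
}
gaps = gap_list(vault, today=date(2025, 4, 1))
```

Entity A is complete and does not appear, Entity C has received nothing and ranks first, and Entity B follows with its two missing document types.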
Data Lineage — Full Provenance for Every AI Answer
Data Intelligence — Data Lineage Map
ORCHESTRATION · WAREHOUSE
AI Orchestration · End to End Data Provenance
When an AI agent answers a question using AI.DI data, every element of that answer has a traceable origin. The Data Lineage Map shows the complete pipeline from source document to consumer — enabling any data engineer, compliance officer, or auditor to trace exactly how a specific piece of intelligence was produced, what transformations it passed through, and which source document it ultimately came from. This is the infrastructure that eliminates LLM hallucination risk: every answer the AI returns is backed by a certified document, a specific extraction, a confidence score, and a provenance chain. The stale node indicators (Bridge Sync showing 0 fields synced) surface data freshness issues proactively — you know before an AI answer is delivered whether the underlying data is current. Provenance is not an afterthought in AI.DI. It is the foundation.
MCP Server — 17 Certified AI Agent Tools
The AI.DI MCP Server serves dual protocols simultaneously: the Model Context Protocol for Claude and Cursor, and REST/OpenAPI for ChatGPT GPT Actions and any HTTP-capable agent framework. A single endpoint. Two protocol surfaces. All 17 tools available on both. Row-level security enforced at the PostgreSQL layer — the AI agent cannot access documents the authenticated user is not authorized to see.
Tool Category 01
Document Search and Retrieval
  • search_documents — full-text and metadata search across the certified corpus with relevance scoring
  • get_document — retrieve a specific document record with all extraction fields and a signed storage URL
  • list_documents_by_type — return all documents of a given classification type, filtered by entity or date range
  • get_document_versions — retrieve complete version history for any document including diff metadata
  • find_similar_documents — Sentry similarity search returning documents ranked by fingerprint distance
Tool Category 02
Extraction and Compliance Queries
  • get_extracted_fields — return all AI-extracted fields for a document with confidence scores and source references
  • query_extraction_fields — structured query against the extraction fields table with filters on any typed column
  • get_compliance_status — return the compliance posture for any entity: required documents, present, missing, expired
  • get_anomaly_flags — list all AI-detected anomalies across a specified entity or corpus scope
  • check_document_expiry — return documents expiring within a specified time window across any scope
Tool Category 03
Portfolio and Warehouse Intelligence
  • get_asset_hierarchy — navigate the org hierarchy from enterprise to entity level
  • warehouse_query — execute arbitrary SQL against the document warehouse with result pagination
  • get_ctr_score — retrieve the Continuous Transaction Readiness score for any entity or portfolio
  • get_portfolio_summary — aggregate document intelligence across a division or portfolio scope
  • list_extraction_schema — return the full field schema with types and occurrence counts for any document type
  • get_data_lineage — return the processing history of any document from ingest through warehouse
  • get_warehouse_metrics — return ingestion counts, extraction rates, and anomaly statistics for any time period
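For agents that speak MCP directly, a tool invocation is a JSON-RPC 2.0 request using the protocol's `tools/call` method. The sketch below builds such a request for `search_documents`. The tool name comes from the list above; the argument names `query` and `limit` are assumptions for illustration, since the tool's input schema is not given here.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# `query` and `limit` are hypothetical argument names, not a documented schema.
request = build_tool_call(
    1, "search_documents", {"query": "insurance certificate", "limit": 5}
)
print(json.dumps(request, indent=2))
```

An MCP client library would normally construct this envelope for you; the point is only that each of the 17 tools is addressed by name with a JSON arguments object.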
Edge Functions Powering the Orchestration Layer
agent-gateway — Intelligent Request Router

The agent-gateway edge function receives all AI agent requests and dispatches them to the appropriate tool handlers. It enforces authentication, validates the requesting agent's access scope, applies row-level security policies, and logs every tool invocation for the audit trail.

Supports Bearer token authentication for API clients and session-based auth for browser-connected agents. Rate limiting per API key. Tool-level permission grants: a key can be scoped to read-only document retrieval without access to warehouse queries or compliance data.
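Tool-level scoping of this kind can be pictured as a simple allow-list check performed before a request is dispatched. The sketch below is an assumption about the shape of such a check, not the gateway's actual code: the `ApiKey` class and `authorize` function are hypothetical, and only the tool names are taken from the tool list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiKey:
    key_id: str
    allowed_tools: frozenset  # tool names this key may invoke

def authorize(key: ApiKey, tool_name: str) -> bool:
    """Allow the call only if the tool is in the key's granted scope."""
    return tool_name in key.allowed_tools

# A key scoped to read-only document retrieval.
retrieval_key = ApiKey("key-1", frozenset({"search_documents", "get_document"}))

print(authorize(retrieval_key, "get_document"))     # retrieval tool is granted
print(authorize(retrieval_key, "warehouse_query"))  # warehouse access is not
```

Keeping the scope on the key itself means a denied tool call can be rejected at the router, before any tool handler or database query runs.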

mcp-server — Dual Protocol Gateway

A single Supabase Deno edge function serving both the Model Context Protocol (SSE transport for Claude and Cursor) and a REST/OpenAPI interface (for ChatGPT GPT Actions, LangChain, AutoGen, and any HTTP agent).

The same tool definitions, the same security model, the same data — two protocol surfaces from one deployment. ChatGPT integration operational. Deployed with --no-verify-jwt to support custom Bearer token auth independent of Supabase session auth.

erp-webhook — Inbound ERP Event Handler

Receives inbound webhook events from enterprise ERPs, CRM platforms, and any connected system. Validates payload signatures, routes events to the appropriate pipeline stage, and triggers document processing or metadata updates without human involvement.

When a contract is executed in an ERP, the erp-webhook fires the checkin-pipeline automatically — the document enters the AI extraction queue without anyone touching Document Gateway directly.

schedule-jobs + run-scheduled-reports — Autonomous Operations

Cron-triggered orchestration functions that run batch operations on a configurable schedule. Batch pipeline runs process large document queues during off-peak hours. Scheduled reports generate and distribute compliance summaries, expiry alerts, and portfolio intelligence reports automatically.

No human trigger required for ongoing operations. The platform monitors itself, processes new documents, updates CTR scores, and delivers reports on schedule — continuously.

Webhook Event Architecture
Event Type
document.ingested
Fires when any document completes the ingest pipeline — validation passed, stored, and queued for extraction. Payload includes document ID, file type, entity node, and submitter identity. Triggers downstream ERP updates or data warehouse prestaging.
Event Type
document.extracted
Fires when Abstract.DI completes field extraction on a document. Payload includes all extracted fields, confidence scores, and the document's workflow routing decision. Primary trigger for downstream analytics pipelines.
Event Type
anomaly.detected
Fires when Abstract.DI flags an extracted value as anomalous relative to corpus patterns. Payload includes the anomaly type, affected fields, expected range, and actual value. Used for real-time alerting to portfolio managers or risk systems.
Event Type
compliance.updated
Fires when a document's compliance status changes — new document received completing a required set, document expiry approaching threshold, or outstanding obligation resolved. Triggers CTR score recalculation and stakeholder notifications.
Event Type
sync.completed
Fires when a connector sync run completes — Snowflake push, Databricks batch, or webhook batch delivery. Payload includes row counts, error counts, and sync duration. Used to confirm data freshness in downstream BI tools.
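On the receiving side, the five event types above lend themselves to a small dispatcher keyed on event type. The sketch below is illustrative only: the envelope keys (`type`, `data`) and the payload field names (`field`, `actual`, `expected_range`, `row_count`, `error_count`) are assumptions modeled on the payload descriptions, not a documented schema.

```python
handlers = {}

def on(event_type):
    """Register a handler function for one webhook event type."""
    def register(func):
        handlers.setdefault(event_type, []).append(func)
        return func
    return register

def dispatch(event):
    """Call every handler registered for the event's type; return their results."""
    return [handler(event) for handler in handlers.get(event["type"], [])]

@on("anomaly.detected")
def alert_risk_team(event):
    data = event["data"]
    return f"ALERT {data['field']}: {data['actual']} outside {data['expected_range']}"

@on("sync.completed")
def confirm_freshness(event):
    data = event["data"]
    return f"sync ok: {data['row_count']} rows, {data['error_count']} errors"

results = dispatch({
    "type": "anomaly.detected",
    "data": {"field": "monthly_rent", "actual": 98000, "expected_range": [4000, 9000]},
})
print(results)
```

Event types with no registered handler are simply ignored, which pairs naturally with configurable per-event filtering on the sending side.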
Config
Webhook Security Model
All outbound webhooks signed with HMAC-SHA256. Receiving endpoint validates signature before processing. Configurable per-event filtering. Retry logic with exponential backoff on 4xx and 5xx responses. Full delivery log available in the Connectors workspace.
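A receiving endpoint can validate an HMAC-SHA256 signature with Python's standard library. This is a minimal sketch under assumptions: the shared secret, the hex encoding of the signature, and the backoff parameters (1 second base, 60 second cap) are illustrative choices, since the exact header format and retry schedule are not specified here.

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature of a raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    return hmac.compare_digest(sign(secret, body), received_signature)

def backoff_delays(attempts: int, base: float = 1.0, cap: float = 60.0):
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]

secret = b"shared-webhook-secret"  # hypothetical secret for the example
body = b'{"type": "document.ingested", "data": {"document_id": "d1"}}'
signature = sign(secret, body)

print(verify(secret, body, signature))          # untampered body verifies
print(verify(secret, body + b" ", signature))   # any change to the body fails
print(backoff_delays(5))
```

Verification must run against the raw request bytes, before any JSON parsing or re-serialization, because even a whitespace change alters the signature.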
Security, Audit, and Compliance Architecture
Row-Level Security — Database Enforced

Access control is not application-layer middleware. Every Supabase table has PostgreSQL row-level security policies that enforce which rows a given user can read, write, or delete — based on their role, their organization, and their specific entity permissions.

An AI agent authenticating with an API key receives exactly the same data access as the human user who created that key — not more, not less. Even if the agent constructs a warehouse query attempting to access data outside its scope, PostgreSQL silently returns only authorized rows. The restriction is invisible to the caller and unbypassable by any query construction.

API Keys, Audit Log, and Revocation

Every API key is scoped to a specific user, organization, and permission set at creation time. Keys can be restricted to specific tools, specific entities, or read-only operations.

  • Instant revocation — key disabled at the database layer, all in-flight requests rejected immediately
  • Full audit log on every tool invocation: timestamp, user, tool name, parameters, result row count, latency
  • Usage analytics per key: call volume, top tools, error rates, and data volume
  • Key expiry with configurable TTL for time-limited integrations or contractor access
  • GDPR, HIPAA, SEC, and APA compliance maintained through architecture — no configuration required
AI.DI Is Not a Competitor to LLMs — It Is Their Prerequisite

Every enterprise deploying Copilot, GPT-4, Claude, or Gemini on their documents faces the same problem: the AI is only as good as the data it reasons from. Uncertified documents produce hallucinated answers. Unstructured files produce generic summaries. AI.DI is the certified, structured document foundation that transforms any LLM from a document summarizer into a reliable enterprise intelligence system.

Value & Strategy · Tab 11
Continuous Transaction Readiness™ — The Score That Disrupts a Category
CTR is not a feature. It is a category-defining concept that legacy document management platforms are structurally incapable of delivering. It means your organization is always prepared to respond to a capital call, close an acquisition, satisfy a regulator, onboard a counterparty, or distribute to a stakeholder — because AI.DI monitors, scores, routes, and maintains your entire document estate continuously, automatically, and in real time.
The Primary Value Statement

AI.DI gives your organization Continuous Transaction Readiness — the state where every document across every system is accessible, authentic, current, and actionable at all times. Organizations that achieve this state lower their cost of capital, reduce audit risk, accelerate transactions, deploy AI with confidence, and eliminate the document scramble that precedes every critical business event.

Why CTR Is Disruptive — What the Legacy Platforms Cannot Do
The Legacy Problem
Document Management Platforms Are Reactive. CTR Is Proactive.

Every document management platform ever built — M-Files, Hyland, Box, SharePoint, Laserfiche, OpenText — operates on the same model: a human asks a question, the system returns a file. The documents do not know they are incomplete. The system does not know a transaction is approaching. No one is told what is missing until the moment it matters.

CTR inverts this model. The platform continuously monitors the entire document estate against a dynamic requirement model, scores readiness in real time, and surfaces gaps before they become crises. The difference between reactive retrieval and proactive readiness is the difference between document management and document intelligence.

The Structural Barrier
CTR Requires Intelligence That File Storage Systems Cannot Generate.

To calculate a CTR score, you need to know: which documents are required, which are present, which are valid, which are current, which have changed, and which are expired. A file storage system knows none of this. It knows filenames and folder paths.

AI.DI knows all of this because Abstract.DI has read every document, Sentry has fingerprinted and certified every document, and the Warehouse stores every extracted field — including expiry dates, version identifiers, compliance flags, and obligation terms — as queryable structured data. CTR is computed from that data continuously. No competitor has that data. None can build it without starting over.

The Market Opportunity
Every Organization Has a Transaction in Its Future. None of Them Are Ready.

Every organization faces recurrent high-stakes document events: regulatory audits, financing processes, M&A due diligence, partner onboarding, contract renewals, compliance filings, board reviews. In every case, the weeks before the event are consumed by document scramble — finding files, verifying versions, hunting for missing certificates, correcting outdated records.

CTR eliminates that scramble permanently. The organization is ready before the event is announced. That is not an incremental improvement. It is a fundamentally different value proposition — one that no existing platform can match because none of them understand what their documents say.

How CTR Is Calculated — Five Weighted Dimensions
Sample Entity CTR Score
84/100
Near Ready — Minor Gaps
23/26 Docs Present
2 Expiring Soon
1 Violation
4.2d Avg Response
Five Weighted Dimensions
Document Completeness: 88/100
23 of 26 required document types present and valid across this entity
Document Validity & Freshness: 76/100
2 regulatory certification documents expire within 45 days; alerts dispatched
Compliance & Regulatory Status: 71/100
1 active violation: Compliance Certificate version mismatch detected by Sentry fingerprint comparison
Distribution Readiness: 92/100
Complete document package deliverable to any counterparty within 2 hours from current state
Access & Permissioning Health: 97/100
All role assignments current. No orphaned access detected. Every stakeholder sees exactly what they should.
Score Interpretation
Score | Status | Typical Situation | Time to Transact
90–100 | Transaction Ready | All documents present, current, and certified. No violations. Counterparty package deployable in hours. | 48 hours
75–89 | Near Ready | 1–3 documents missing or expiring. No active violations. Gaps identified and assigned. | 1–5 business days
55–74 | Attention Required | Multiple gaps or 1–2 violations. Transaction possible but counterparty will surface issues. | 2–4 weeks
35–54 | Not Ready | Significant document gaps. Will not survive regulatory or counterparty diligence in current state. | 30–60 days
0–34 | Critical | Severely incomplete or noncompliant. Immediate remediation required across multiple dimensions. | 90+ days
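A weighted roll-up of the five dimensions can be sketched as below. The dimension scores and the status bands are taken from the sample entity and the interpretation table. The weights are hypothetical: the actual weighting is not stated here, and the values below were chosen only so that the sample dimensions reproduce the sample score of 84.

```python
# Dimension scores from the sample entity.
DIMENSIONS = {
    "completeness": 88,
    "validity_freshness": 76,
    "compliance": 71,
    "distribution_readiness": 92,
    "access_health": 97,
}

# Hypothetical weights in percent; the real weighting is not published here.
WEIGHTS = {
    "completeness": 30,
    "validity_freshness": 20,
    "compliance": 20,
    "distribution_readiness": 15,
    "access_health": 15,
}

# Lower bound of each status band, highest first (from the interpretation table).
BANDS = [
    (90, "Transaction Ready"),
    (75, "Near Ready"),
    (55, "Attention Required"),
    (35, "Not Ready"),
    (0, "Critical"),
]

def ctr_score(scores, weights):
    """Weighted average of the dimension scores, rounded to a whole number."""
    total_weight = sum(weights.values())
    weighted = sum(scores[name] * weights[name] for name in weights)
    return round(weighted / total_weight)

def status(score):
    """Map a 0-100 score to its status band."""
    for floor, label in BANDS:
        if score >= floor:
            return label

score = ctr_score(DIMENSIONS, WEIGHTS)
print(score, status(score))
```

With these assumed weights the weighted sum is 84.15, which rounds to 84 and falls in the Near Ready band. An equal-weight average of the same dimensions would give 84.8, so the published sample implies that the dimensions are not weighted equally.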
What CTR Delivers — Tangible Organizational Outcomes
Outcome 01
Transactions Close Faster
The typical document scramble before a financing, acquisition, or regulatory filing takes 3 to 6 weeks. Teams chase files across drives, email chains, and vendor portals. Half the documents retrieved are wrong versions. AI.DI eliminates this entirely. A CTR score of 90+ means the counterparty package is ready before the counterparty asks.
Outcome 02
Audit Risk Drops to Near Zero
Regulators and auditors request specific documents with specific version requirements. AI.DI maintains a continuous, certified audit trail on every document — version history, access log, fingerprint certification, and extraction record. When the auditor requests a document from 18 months ago, the platform produces it in seconds, certified, with full provenance chain.
Outcome 03
AI Deployments Actually Work
Every major enterprise AI deployment is failing for the same reason: the documents feeding the model are unverified, duplicated, and structurally inconsistent. AI.DI solves this permanently. When your LLM reasons from AI.DI-certified documents, every answer is backed by a fingerprint-verified, extraction-validated, version-controlled source.
Outcome 04
Compliance Is Continuous, Not Cyclical
Most organizations achieve compliance for a moment — the audit, the filing deadline, the renewal date — then drift back into gaps. AI.DI makes compliance a continuous state, not a periodic sprint. Expiry alerts fire 90, 60, and 30 days before a document lapses. The CTR score reflects the current compliance posture at all times.
Outcome 05
Cost of Capital Improves
Lenders, investors, and ratings agencies price risk based in part on how prepared an organization is to respond to information requests. Organizations that can deliver complete, certified, structured document packages in hours demonstrate operational maturity that translates directly into better terms. The CTR score is a quantified, auditable measure of that maturity.
Outcome 06
The Document Scramble Is Eliminated Permanently
Every organization knows the document scramble: the all-hands search that precedes every critical business event. It is expensive, error-prone, and entirely avoidable. AI.DI eliminates it by maintaining Continuous Transaction Readiness as a permanent operational state. The organization does not prepare for the transaction. The organization is always prepared.
The CTR Competitive Displacement Framework
Every legacy document management platform can be evaluated against a single question: can it tell you, right now, whether you are ready to transact? The answer is universally no — because transaction readiness requires knowing what your documents say, not just where they are stored.
Capability | M-Files / Hyland / OpenText | Box / SharePoint | AI.DI
Real time readiness score | None | None | CTR Score (continuous)
Automatic gap detection | Manual checklist | None | Continuous AI monitoring
Document content intelligence | Metadata tags only | None | Full field extraction
Expiry and validity tracking | Manual with reminders | None | Automated from extracted dates
Counterparty package readiness | Manual assembly | Manual assembly | Pre-assembled, certified
Compliance posture visibility | Periodic reports | None | Continuous, real time
AI-ready data foundation | Raw files only | Raw files only | Certified structured data
Version certification | Version numbers only | Version numbers only | Sentry fingerprint certified
"The question every organization needs to answer — and currently cannot — is: are we ready? AI.DI is the first platform that answers that question continuously, automatically, and with mathematical precision. CTR is not a score. It is proof that document intelligence has replaced document management."
— AI.DI platform design principle
Value & Strategy · Tab 11
For Data Scientists — The Document Intelligence Stack You've Been Waiting For
You've been asked to build AI on enterprise documents. You know what that means: unstructured PDFs, no provenance, wrong versions, 40% duplicates, PII everywhere, no reliable way to trace an LLM answer to a specific document. AI.DI is the infrastructure layer that solves every one of those problems — through every interface you already use.
What You're Actually Getting

AI.DI is not a document management UI with an API bolted on. It is a document intelligence data platform: a PostgreSQL warehouse of structured document intelligence, an MCP server, a webhook event stream, a REST/GraphQL API, Snowflake Data Share, JDBC/ODBC direct access, vector embeddings on certified document chunks, and a 30-engine ML pipeline that improves continuously. Every document becomes structured, provenance-tracked, certified data, available to any model, pipeline, or analytics tool you're running.

The Data Model — What You're Querying
Table | Contents | Key Fields | Primary Use
document_records | Every document processed | id, original_name, document_type, workflow_status, asset_id, classification_confidence, storage_path | Document inventory, classification analysis
extracted_fields | Structured extraction from Abstract.DI | document_id, field_name, field_value, confidence_score, extraction_model, extraction_timestamp | Contract analytics, financial extraction
sentry_fingerprints | Cryptographic fingerprint records | document_id, fingerprint_hash, fingerprint_type, certified_at, version_chain, similarity_scores | Certification, duplicate detection, fraud monitoring
hierarchy_nodes | Full org hierarchy | id, parent_id, node_type, node_name, industry, ctr_score, completeness_pct | Portfolio analytics, CTR aggregation
document_activity_log | Every action on every document | document_id, event_type, actor_id, actor_role, timestamp, metadata | Audit trail, access pattern analysis
vector_embeddings | Embeddings on certified chunks | document_id, chunk_id, certified_version_hash, embedding_vector, model_version | Semantic search, RAG retrieval, clustering
ctr_score_history | CTR Score time series | node_id, score, dimension_scores, calculated_at, delta_from_prior | Readiness trending, portfolio benchmarking
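To give a feel for querying this model over direct SQL access, the sketch below joins `document_records` to `extracted_fields` using an in-memory SQLite database as a stand-in for the PostgreSQL warehouse. The table and column names come from the data model above (only a subset of columns is created). The sample rows, the ISO date strings in `field_value`, and the 0.9 confidence threshold are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; only a subset of columns is modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document_records (
    id TEXT PRIMARY KEY, original_name TEXT, document_type TEXT
);
CREATE TABLE extracted_fields (
    document_id TEXT, field_name TEXT, field_value TEXT, confidence_score REAL
);
""")
conn.executemany("INSERT INTO document_records VALUES (?, ?, ?)", [
    ("d1", "lease_tower_a.pdf", "commercial_lease"),
    ("d2", "lease_plaza_b.pdf", "commercial_lease"),
    ("d3", "coi_tower_a.pdf", "insurance_certificate"),
])
conn.executemany("INSERT INTO extracted_fields VALUES (?, ?, ?, ?)", [
    ("d1", "expiration_date", "2027-02-28", 0.97),
    ("d2", "expiration_date", "2027-09-30", 0.97),
    ("d3", "expiration_date", "2027-01-15", 0.62),
])

# Leases expiring in Q1 2027, keeping only high-confidence extractions.
rows = conn.execute("""
    SELECT d.original_name, f.field_value, f.confidence_score
    FROM extracted_fields AS f
    JOIN document_records AS d ON d.id = f.document_id
    WHERE d.document_type = 'commercial_lease'
      AND f.field_name = 'expiration_date'
      AND f.field_value BETWEEN '2027-01-01' AND '2027-03-31'
      AND f.confidence_score >= 0.9
    ORDER BY f.field_value
""").fetchall()
print(rows)
```

Because every extracted value carries its own `confidence_score`, the threshold can be applied in the query rather than in downstream code.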
Python SDK — Example Patterns
from aidi import DocumentWarehouse

client = DocumentWarehouse(api_key="YOUR_KEY", tenant_id="YOUR_TENANT")

# Query all Q1 2027 lease expirations across a portfolio (certified docs only)
expirations = client.extractions.query(
    document_type="commercial_lease",
    field="expiration_date",
    date_range=("2027-01-01", "2027-03-31"),
    certified_only=True,
)

# Get version-locked embeddings for a RAG pipeline
embeddings = client.vectors.get_certified_chunks(
    document_ids=expirations.document_ids(),
    version_locked=True,
)

# Subscribe to certification events for real-time model retraining.
# `my_model` is a placeholder for your own model object; it is not part of the SDK.
@client.events.on("document.certified", document_type="financial_statement")
async def on_new_financial_statement(event):
    extracted = await client.abstractions.get_fields(event.document_id)
    await my_model.retrain_incremental(extracted.to_feature_vector())
Value & Strategy · Tab 12
Any Industry. Any Complexity. Built for Scale.
AI.DI was not built for one vertical and adapted for others. The same engine that certifies a Blackstone real estate portfolio is equally compelling for a PE firm's data room, a hospital's compliance records, or a law firm's contract vault. The document problem is universal. So is the solution.
Unlimited Org Depth
Enterprise → Group → Entity → Asset → Unit → Counterparty. Any depth, any width, any industry. A 500-asset institutional fund, a 15-portfolio-company PE firm, a 200-branch bank — all map to the same hierarchy model with zero configuration overhead.
Edge Compute at Scale
Deno edge functions scale to zero when idle and to any volume on demand — same code handles 10 documents and 10 million. No ops team. No provisioning. No performance cliffs at scale.
Any File Type. Zero Exceptions.
PDF, DOCX, XLSX, PPTX, MSG/EML, CSV, ZIP, JPEG/PNG/TIFF scans, database records. No conversion required. No preprocessing. Whether a scanned fax or a native Word contract — AI.DI ingests, classifies, and extracts from all of it.
any industry
Asset managers, sponsors, operators across multifamily, office, industrial, retail, and mixed-use
Primary Market
Document Types
  • Title policies & ALTA surveys
  • Lease abstracts & leases
  • Insurance certificates
  • Environmental studies (Phase I/II)
  • Appraisals & BPOs
  • Loan documents & notes
  • Certificates of occupancy
  • Property management agreements
Key Use Cases
  • Acquisition due diligence
  • Loan closing packages
  • Lender covenant compliance
  • Insurance renewal management
  • Portfolio disposition readiness
  • LP reporting distributions
CTR Impact
  • Diligence prep: 6 weeks → 48 hours
  • Eliminate insurance gap incidents
  • Pre-qualify assets 12 months early
  • LP reports in 1 click, not 2 weeks
  • Close refinancings in half the time
Private Equity
GPs, fund managers, and portfolio operations teams managing company-level and fund-level documentation
Primary Market
Document Types
  • Fund formation documents
  • LP subscription agreements
  • Cap tables & equity agreements
  • Material contracts
  • Audited financials
  • Board minutes & resolutions
  • Exit transaction documents
Key Use Cases
  • Portfolio company exit readiness
  • LP capital call packages
  • Annual audit preparation
  • Co-investor reporting
  • Secondary transfer docs
CTR Impact
  • Exit prep starts 18 months early
  • LP Q&A response under 24 hours
  • Audit cycle cut 70%
  • Deal team on diligence, not hunting docs
Department-Level Entry Points
Department | Acute Pain | AI.DI Entry Product | Expansion Path
Legal / GC | Contract version disputes, discovery liability, GDPR compliance | Sentry certification + Document Gateway distribution | Full Document Warehouse for corporate legal corpus
Finance / Accounting | Audit prep fire drills, financial document reconciliation | Abstract.DI batch (financial extraction) + Blueprint audit | Sentry certification + Warehouse integration to ERP
Compliance / Risk | Regulatory filing tracking, compliance gaps, audit exposure | Sentry + Warehouse (compliance corpus) + CTR Score | Full platform across regulated document types
Transactions / Deal Team | Due diligence prep time, data room chaos | Document Gateway + Distribution Studio + Transaction Rooms | Abstract.DI batch for portfolio wide extraction
IT / Data Engineering | Unstructured data not in Snowflake; LLM hallucinations | Document Warehouse + Snowflake + MCP Server | Full platform as enterprise document intelligence backbone
Operations / HR | Employee records, policy tracking, onboarding compliance | FileStar lifecycle governance + Abstract.DI HR extraction | Sentry certification + Document Gateway policy distribution
Get Started · Tab 13
Start with One Department. Get the Whole Platform.
AI.DI is not a pilot program with limited features. From your very first document, you have access to the complete platform — every engine, every view, every integration. We believe in earning your full commitment by delivering full capability from day one. Start small if you want. The platform is built for all — enormous portfolios and single-department deployments run on the exact same infrastructure.
The imkore Philosophy — Do Some or Do It All

The world's largest institutional real estate portfolios run on the same platform as a 12-asset regional operator starting their first compliance program. A single compliance officer in one department gets the same AI intelligence, the same CTR Score, the same Warehouse, the same MCP server as a 500-person investment management firm running 20 funds. We built for scale from day one — which means the smallest client gets the most powerful platform available at any price point. No feature tiers. No locked capabilities. No "upgrade to get the real thing."

Three Ways to Start — All Paths Lead to the Same Platform
Entry Path 01
Start with One Document Type
Pick your most painful document type — insurance certificates, leases, vendor contracts, financial statements. Run Abstract.DI on everything you have. Get a CTR Score on that category in 72 hours. See exactly what's missing, expiring, or wrong. The rest of the platform is right there when you're ready.
"We started with just our COIs. In three days we knew which assets were exposed. We hadn't done that audit in two years." — Property Operations Director
Entry Path 02
Start with One Department
Give legal, compliance, finance, or your deal team a standalone deployment. They get the full platform — just scoped to their hierarchy node and document types. No IT project. No enterprise rollout required. One steward, one asset group, full capability. When they prove ROI, the next department asks to join.
"Legal started it. Then finance wanted in. Then the deal team. We never ran a rollout — it spread itself." — Chief Operating Officer, PE Firm
Entry Path 03
Start with One Asset or Fund
Run a complete AI.DI deployment on a single asset or fund as a proof of concept with real production data. CTR Score goes live in 72 hours. Abstract.DI processes your existing archive in the first week. Distribution Studio sends your first LP package before the end of month one.
"We ran one asset. The CTR Score told us things we didn't know. That asset closed three months faster. Then we did the whole portfolio." — Managing Director, Enterprise Client
imkore Blueprint — The Highest-Confidence Entry Point
Advisory Service · $50K to $150K · 60–90 Days
imkore Blueprint — Document Intelligence Audit & Readiness Roadmap

Blueprint evaluates your entire document ecosystem — every repository, every system, every process — and delivers a scored readiness assessment and a prioritized AI.DI product roadmap. Blueprint invariably reveals exactly which products the client needs and why. The roadmap we deliver IS the AI.DI implementation plan for your organization.

01
Discover
Map all document repositories across all systems
02
Assess
Evaluate governance, structure, and integrity
03-04
Classify + Validate
Standardize taxonomies, confirm authenticity, remove duplicates
05-06
Structure + Certify
Apply metadata conventions, establish certified records
07-08
Enable + Optimize
Prepare for AI and automation, maintain Continuous Transaction Readiness
Plans
Tier 01
Foundation
For teams managing one department, fund, or asset group getting organized and transaction ready for the first time.
Custom
Based on asset count and user seats.

  • Up to 25 assets, full platform
  • Check-In Studio with Abstract.DI
  • CTR Score Dashboard
  • Distribution Studio
  • Up to 10 user seats
Contact Sales
Tier 03
Strategic Partner
For institutional platforms, banks, and technology integrators embedding AI.DI into their own products.
API-First
Full API access, white label options, revenue sharing available.

  • Full Abstract.DI API access
  • Sentry fingerprinting API
  • CTR Score API
  • White label Transaction Rooms
  • MCP server + Revenue sharing
Talk to Partnerships
Frequently Asked Questions

You get the full platform from the moment you deploy — every engine, every view, every integration. There are no feature gates, no capability tiers, and no "enterprise unlock" for core functionality. Your first document gets the same AI pipeline as document number one million. We believe you should see the full value immediately, not earn access to it through a ramp-up process.

No. AI.DI layers over your existing infrastructure. Start with your highest-priority asset group or begin fresh with new documents. There is no requirement to migrate your entire historical archive before going live. The batch engine can process any legacy archive on its own timeline — you decide when and what to bring in.

Sentry generates a mathematical fingerprint — a unique hash derived from document content. Two identical documents always produce identical fingerprints. Any change produces a different fingerprint. The original document is never stored by Sentry. GDPR data minimization is achieved structurally — your documents never leave your control.
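The content-hash property described here can be demonstrated with a standard cryptographic hash. The sketch below uses SHA-256 as an assumption; Sentry's actual fingerprinting algorithm is not specified in this document, and a plain cryptographic hash alone would not support similarity ranking by fingerprint distance.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Return a hex SHA-256 digest of the document bytes (illustrative only)."""
    return hashlib.sha256(content).hexdigest()

original = b"Lease agreement v1: rent 5,000 per month"
copy = bytes(original)  # byte-for-byte duplicate
amended = b"Lease agreement v1: rent 5,500 per month"

print(fingerprint(original) == fingerprint(copy))     # identical content matches
print(fingerprint(original) == fingerprint(amended))  # any change breaks the match
```

Only the digest needs to be retained to detect duplicates or later changes, which is why a fingerprint can certify a document without the document itself being stored.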

The MCP server exposes 17 tools across three categories: document search and retrieval, extraction and compliance queries, and portfolio and warehouse intelligence. Examples include search_documents, get_document, get_compliance_status, warehouse_query, and get_asset_hierarchy. Add AI.DI to Claude, Cursor, LangChain, AutoGen, or any MCP-compatible environment and your agents immediately have certified document search and structured extraction queries. Authentication via OAuth2: agents only access what the connecting user is authorized to see. Keys are revocable instantly.

Yes. Full platform via Docker containers — no Kubernetes required. Azure Cloud, AWS, fully on premise, and hybrid (metadata in cloud, documents on-prem) are all supported. Air-gapped environments with no internet connectivity are also supported. Contact the enterprise team for deployment architecture details.

Snowflake Data Share (zero copy, no ETL), Databricks connector (Delta Lake, streaming), Tableau and Power BI native connectors, dbt compatibility, BigQuery export, direct JDBC/ODBC access, REST API with OpenAPI 3.0 spec, Python SDK, and webhook event streaming to any HTTP endpoint. SSO via SAML 2.0 and OAuth 2.0.

Request Access
Your next deal closes faster
when your documents are always ready.
Request a Simulator Key and explore a fully populated AI.DI demo environment with real document intelligence running on sample portfolio data — no implementation required.
Request Simulator Key  ·  Log In to Document Gateway
documentgateway.ai  ·  imkore.ai