M-Files, Box, SharePoint Copilot, and Hyland all share the same architectural constraint: a file storage system built in a different era with AI features layered on top. Layered AI produces generic summaries from raw files. Native AI produces structured intelligence from every field of every document — something you can actually query, export, and act on.
You cannot retrofit the AI.DI Document Warehouse onto a file storage system. The warehouse is not a feature. It is the foundation. Building it requires starting over. None of them will.
Every enterprise runs a document system and a data warehouse in parallel. The document system stores files. The data warehouse stores numbers. The information in the document — the obligation term, the coverage ratio, the commitment date, the counterparty agreement — lives in neither system. It is trapped in the PDF.
Abstract.DI ends this permanently. Every document becomes a structured database record the moment it enters the platform. The PDF is the backup. The warehouse row is the truth.
Enterprise AI deployments fail for a predictable reason: the documents feeding the model are unverified, duplicated, and structurally inconsistent. Copilot hallucinates because SharePoint is untrustworthy. The model is not the problem. The data is.
Sentry certifies every document before it enters the AI pipeline. An AI agent using AI.DI as its knowledge base cannot be given a falsified document — the fingerprint will not match. Every answer is traceable to a specific certified version with a confidence score. That is a different category of AI deployment entirely.
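The certification check can be sketched as a deterministic content-hash comparison. This is illustrative only: SHA-256 stands in for Sentry's actual fingerprinting scheme (patent pending and not public), and all function and variable names here are hypothetical.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Deterministic content fingerprint (illustrative: SHA-256 stands in
    for Sentry's proprietary scheme)."""
    return hashlib.sha256(content).hexdigest()

def verify(content: bytes, certified_fingerprint: str) -> bool:
    """A document enters the AI pipeline only if its fingerprint
    matches the certified record."""
    return fingerprint(content) == certified_fingerprint

original = b"Coverage ratio: 1.85 as of 2024-12-31"
cert = fingerprint(original)

assert verify(original, cert)                                      # untouched document passes
assert not verify(b"Coverage ratio: 1.86 as of 2024-12-31", cert)  # single-character change fails
```

Because the hash is computed over content, even a one-character edit produces a completely different fingerprint, which is what makes a falsified document detectable regardless of how plausible it looks.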
Documents flow into Document Gateway. Abstract.DI extracts intelligence from every one. Sentry fingerprints and certifies them. The Document Warehouse stores all of it as structured, queryable data. The Warehouse improves Abstract.DI model accuracy. Better accuracy improves Sentry signals. Better signals make Document Gateway more valuable. More value drives more documents. After 18 months, switching costs are effectively permanent — and accuracy measurably exceeds any out-of-the-box alternative.
Competitors have slides about AI. AI.DI has 200+ React/TypeScript components, 29 live serverless edge functions, an ML Learning Studio with 30 self-improving engines, an MCP server, an AI Agent Gateway connecting to Claude/Copilot/GPT-4/Gemini, and a production AI.DI Studio running 27 active AI engines. The gap between what competitors promise and what we have already shipped is measured in years of engineering. This is the unfair advantage that cannot be purchased with a VC round.
The HITL Reduction AI engine monitors all other engines' human review rates and autonomously moves classifications to auto-approve when confidence consistently exceeds configurable thresholds. Standard document types trend toward zero human intervention at 12 months. Novel or edge-case documents always retain human oversight — the goal is the right humans reviewing the right exceptions, not zero humans.
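The promotion rule can be sketched as a rolling-confidence check per document type. The threshold, window size, and class names below are assumptions chosen for illustration, not product defaults.

```python
from collections import deque

class HitlPromoter:
    """Illustrative sketch of the HITL Reduction meta-engine's promotion rule:
    a document type is auto-approved only after a full window of
    consistently high-confidence classifications."""

    def __init__(self, threshold: float = 0.97, window: int = 200):
        self.threshold = threshold
        self.window = window
        self.scores: dict[str, deque] = {}  # doc_type -> recent confidence scores

    def record(self, doc_type: str, confidence: float) -> None:
        self.scores.setdefault(doc_type, deque(maxlen=self.window)).append(confidence)

    def route(self, doc_type: str) -> str:
        recent = self.scores.get(doc_type, deque())
        if len(recent) == self.window and min(recent) >= self.threshold:
            return "auto-approve"   # consistently high confidence over the full window
        return "human-review"       # novel or low-confidence types stay with humans

p = HitlPromoter(threshold=0.97, window=5)
for _ in range(5):
    p.record("invoice", 0.99)
p.record("novel-filing", 0.80)

assert p.route("invoice") == "auto-approve"
assert p.route("novel-filing") == "human-review"
```

Using `min(recent)` rather than a mean makes the rule conservative: a single low-confidence classification inside the window keeps the type in human review.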
Every legacy DMS has fixed classification models requiring expensive, time-consuming retraining. AI.DI's ML Learning Studio inverts this entirely — 30 engines improving continuously from production data, automatically, without engineering intervention. AI.DI gets cheaper and more accurate at scale. Every competitor's cost stays flat or increases.
| Tier | Focus | Example Engines | HITL Trajectory |
|---|---|---|---|
| Tier 1 — Foundation | Document type classification | Enterprise Type Classifier, PE Type Classifier, Legal Type Classifier | Near-zero for covered types |
| Tier 2 — Entity | Named entity extraction | Party Extractor, Property Identifier, Fund/Entity Linker | 5–15% at 6 months |
| Tier 3 — Date & Validity | Temporal signal extraction | Expiration Detector, Effective Date Parser, Renewal Classifier | Near-zero for standard formats |
| Tier 4 — Financial | Financial data extraction | Loan Terms Extractor, Critical Data Extract Parser, Appraisal Value Extractor | 10–20% at 6 months |
| Tier 5 — Compliance | Compliance validation | Coverage Gap Detector, Compliance Flag Engine, Signature Validator | 15–25% — domain expertise retained |
| Tier 6 — Cross-Document | Cross-document consistency | Portfolio Benchmark Engine, Anomaly Correlator, Reconciliation Engine | Complex analysis — strategic HITL |
Box cannot rebuild their data model for AI without breaking 150,000 customers. SharePoint's incentive is to protect Teams and Office revenue; Copilot exists to reinforce that bundle, not replace it. M-Files is 2–3 years behind on the data model and the Warehouse layer. Egnyte wins on storage reliability but has no awareness of what documents contain. Every dollar these platforms invest in AI is constrained by the need to not break existing products. That constraint does not exist for AI.DI.
| Capability | AI.DI Platform | Box | SharePoint | M-Files | Egnyte |
|---|---|---|---|---|---|
| Architecture & Philosophy | |||||
| AI native architecture (built for AI, not adapted) | Win 2024-2025. Zero compromise. AI is core, not a wrapper. | Bolt-on | Copilot wrapper | Aino — improving but bolted on | Minimal |
| Zero legacy technical debt | Win No codebase older than 18 months. | 2005 origin | 2001 origin | 2003 origin | 2009 origin |
| Edge compute architecture | Win All compute at edge. Scale to zero or infinity. | None | Azure Functions (partial) | None | None |
| Modular adoption (standalone or full suite) | Win Every engine has standalone value. | Partial | Module-based but complex | Partial | Partial |
| AI & Document Intelligence | |||||
| Structured data extraction from documents | Win Abstract.DI — any type, 94% day one, 100K batch. | None | Basic Copilot extraction | Aino — requires training | None |
| Day one extraction accuracy (no training) | Win 94%+ on prebuilt schemas. No training required. | N/A | N/A | Months of training | N/A |
| GPU accelerated OCR pipeline | Win DocTR — 10-50x speedup on GPU. | None | Azure OCR (limited) | Basic OCR | Basic OCR |
| Batch processing (100K+ archives) | Win 100K-chunk batch. ZIP, Box, SharePoint, S3. | None | None | Limited batch | None |
| 30 self-improving ML engines | Win Continuous production learning. No ML engineers. | None | Generic Copilot | Limited self-learning | None |
| HITL Reduction AI (autonomous meta-engine) | Win Autonomous promotion of high-confidence classifications. | None | None | None | None |
| Trust, Compliance & Security | |||||
| Document fingerprinting (deterministic, patent pending) | Win ~10,000 fingerprint catalog. Zero doc storage. | None | None | None | None |
| Zero document storage compliance model | Win Only fingerprints stored. GDPR minimization by math. | Full storage | Full storage | Full storage | Full storage |
| PII auto detection and redaction pipeline | Win Tokenization pipeline auto redacts at ingestion. | None | Purview (partial) | None | DLP (partial) |
| Fraud / document manipulation detection | Win Deterministic — single character change detectable. | None | None | None | None |
| Blockchain audit trail | Win On chain anchoring. 2,814+ documents on chain. | None | None | None | None |
| Data & AI Infrastructure | |||||
| Structured document intelligence warehouse | Win Every extracted field is a queryable row. Unique. | None | None | None | None |
| Snowflake Data Share (zero ETL) | Win Zero-copy. Join doc intelligence with financial data. | None | None | None | None |
| MCP server for AI agents | Win Production MCP. Claude, Cursor, LangChain — no wrapper. | None | None | None | None |
| Vector embeddings on certified chunks | Win Tied to certified versions. pg_vector native. | None | Azure AI Search (partial) | None | None |
| CTR Score (Continuous Transaction Readiness) | Win Live composite readiness score. Portfolio-wide. | None | None | None | None |
| 27 active AI engines in production | Win AI.DI Studio — live engine map with real time status. | None | None | None | None |
| Deployment & Integration | |||||
| Unlimited hierarchy depth (any org structure) | Win Enterprise → Group → Entity → Asset → Unit. Any depth. | Folders only | Sites/subsites | Metadata based | Folders/workspaces |
| 30 day deployment (no implementation project) | Win 30 days from contract to live. M-Files runs 3–6 months. | Weeks–months | Months–years | 3–6 months typical | Weeks–months |
| Installed base / existing trust relationships | Win 45 FileStar enterprise clients. 20+ year relationships. Zero CAC. | Large (hard to access) | Large (bundled) | Existing clients | Existing clients |
Every file enters a multi-stage AI pipeline before a human sees it. Abstract.DI classifies the document type, extracts key fields, scores confidence at the field level, checks for duplicates, detects anomalies, and routes to the correct steward queue automatically.
Three distribution modes with full audit trails and recipient access controls. Every document leaves the platform certified and tracked.
A purpose-built external submission portal that presents to counterparties as your own branded platform. No account creation required for submitters.
Real time operational view across the entire document corpus — by entity, by division, by document type, or by compliance obligation.
Five-tier organizational hierarchy providing structured, queryable document storage with role-enforced access at every level.
Configurable multi-step approval chains for any document type or business process. Every workflow is auditable end to end.
Every user action in Document Gateway is governed by a five-role permission model enforced at both the application and database layers via Supabase row-level security policies.
Zero legacy code. Entirely 2024 to 2026 stack designed for sub-30-day enterprise deployment.
Single-tenant, multitenant, Azure Cloud, AWS, on-premise, and hybrid deployments are all supported. Any file type. Any industry. Any org size. Average enterprise deployment: 30 days from contract to go-live. No professional services required for standard configurations.
Extracted from every document type regardless of content: node_path, hierarchy_path, doc_type, workflow_status, added_at, original_name, storage_path, period. These fields form the backbone of the Document Warehouse schema and enable cross-entity search across the entire corpus.
Present for documents where Abstract.DI has completed extraction: ai_fields (JSONB), extraction_confidence (numeric 0 to 100), entity_party (primary counterparty), primary_value (lead financial figure), start_date, end_date. These fields are present across a standard deployment corpus of thousands of documents.
Extracted from financial statements, loan documents, and operating reports: coverage ratios, loan-to-value metrics, net operating income, revenue, net income, return metrics, and performance multiples. All numeric fields stored with full precision for direct BI tool consumption without transformation.
Extracted from contracts, utilization reports, and entity records: utilization_rate, total_units, primary_counterparty, anomaly_flag (boolean — AI detected discrepancy). The anomaly flag is computed by comparing extracted values against corpus patterns. A coverage ratio of 0.4 in a corpus where the median is 1.8 raises the flag automatically.
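The median-comparison rule behind the anomaly flag can be sketched in a few lines. The 3x deviation cutoff and the function name are illustrative assumptions; the production engine presumably uses richer corpus statistics than a single median.

```python
from statistics import median

def anomaly_flag(value: float, corpus_values: list[float], max_ratio: float = 3.0) -> bool:
    """Flag a value that deviates from the corpus median by more than
    max_ratio in either direction (the 3.0 cutoff is an illustrative choice)."""
    m = median(corpus_values)
    if m <= 0 or value <= 0:
        return True  # non-positive ratios are anomalous on their face
    ratio = max(value / m, m / value)
    return ratio > max_ratio

coverage_ratios = [1.6, 1.7, 1.8, 1.9, 2.0]   # corpus median = 1.8

assert anomaly_flag(0.4, coverage_ratios)      # 1.8 / 0.4 = 4.5x below median -> flagged
assert not anomaly_flag(1.75, coverage_ratios) # within normal range -> no flag
```

This reproduces the example from the text: a coverage ratio of 0.4 against a 1.8 corpus median raises the flag automatically.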
Abstract.DI maintains per-tenant model statistics that improve continuously as stewards interact with the platform:
Every steward action is a labeled training example. The model does not require separate annotation workflows or data science involvement.
Abstract.DI ships with prebuilt schemas for 5,700+ document types across industries. For document types outside the standard taxonomy, the Custom Schema Builder allows admins to define extraction targets — specify the fields you need, provide 3 to 5 example documents, and the model learns the pattern. No code. No data science team. New document type schemas are typically operational within one business day.
Industry average: 40% or more of files are duplicates. In a corpus that is 40% duplicates, up to 40% of every LLM bill is spent recomputing content the pipeline has already processed. Sentry identifies all duplicates, consolidates to canonical records, preserves all metadata from every duplicate instance, then suppresses duplicates from AI queues. LLM compute costs drop 30 to 50% immediately, without changing a single prompt or model.
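The mechanics can be sketched with exact-hash deduplication, a deliberate simplification of Sentry's fingerprint matching; the file names and toy corpus below are illustrative.

```python
import hashlib

def dedup_savings(files: dict[str, bytes]) -> tuple[int, float]:
    """Group files by content hash. Every file beyond the first in a group
    is a duplicate whose LLM processing cost can be avoided entirely."""
    seen: dict[str, str] = {}   # content hash -> canonical file name
    duplicates = 0
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in seen:
            duplicates += 1     # suppress from AI queues; metadata is still kept
        else:
            seen[digest] = name
    return duplicates, duplicates / len(files)

corpus = {
    "q3_report.pdf":       b"Q3 operating report",
    "q3_report_copy.pdf":  b"Q3 operating report",   # byte-identical duplicate
    "q3_report_final.pdf": b"Q3 operating report",   # byte-identical duplicate
    "loan_agreement.pdf":  b"Loan agreement",
    "appraisal.pdf":       b"Appraisal",
}
dups, fraction = dedup_savings(corpus)
assert dups == 2 and fraction == 0.4   # 40% of the corpus suppressed from AI queues
```

Suppressing those two files means their content is embedded and summarized once rather than three times, which is where the compute savings come from.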
Advanced analytics on document distribution by source, format, status, and metadata profile. Enables evidence based prioritization of remediation, archival, and modernization initiatives.
Provides measurable KPIs for digital transformation monitoring, establishes compliance posture, and ensures data quality maturity. Decision makers can now act on document intelligence rather than document volume.
Enterprise wide visibility into unique and duplicate documents across all repositories. Quantifies storage, compliance, and operational risk exposure from redundant content.
Enables measurable cost reduction through defensible deduplication and lifecycle optimization — removing redundancy without sacrificing auditability or chain of custody.
Unified discovery of unique documents across siloed systems and business applications. Identifies misplaced sensitive content and policy exceptions across environments.
Supports strategic document migration planning, normalization, and information governance initiatives across the full enterprise stack.
Approximately 13 million PubMed abstracts and associated data have been imported, fingerprinted, and organized by publication year. Daily PubMed update files can be ingested and processed on an hourly schedule.
PubMed contains more than 39 million biomedical citations maintained by the National Center for Biotechnology Information at the US National Library of Medicine.
Ready for first Sentry academic, bioscience, biotechnology, and pharmaceutical clients.
Sentry registers, processes, and fingerprints every document flowing through Document Gateway — in both directions. Documents distributed externally are certified before leaving. Documents received are verified on arrival. The entire document corpus becomes a trusted, queryable intelligence layer with measurable readiness scores updated in real time. This integration is the foundation of the Continuous Transaction Readiness score.
The mv_document_universe materialized view provides a single queryable source across all document types and extraction fields simultaneously.

Webhook event stream: document.ingested, document.extracted, anomaly.detected, compliance.updated, sync.completed. Retry logic with exponential backoff. Used for ERP integration and downstream automation.

Python SDK: pip install document-gateway. Pandas-native output — client.query("SELECT * FROM documents") returns a DataFrame directly. Async support. Used in notebooks, data science workflows, and custom analysis scripts.

Query access: materialized views (mv_document_universe) and the Query Engine (PostgREST plus custom SQL).

Every competitor in the document management space stores files. AI.DI stores intelligence. The gap between those two statements is the entire moat. A corpus of 10,000 documents with 18 months of extraction history, anomaly signals, steward corrections, and financial field time-series data cannot be migrated to a competitor in any meaningful timeframe. The data structure is the lock-in — not the contract.
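The delivery semantics of the webhook stream (retry with exponential backoff) can be sketched from the consumer-facing contract. The base delay, growth factor, and retry count below are assumptions, not documented platform defaults, and `deliver` stands in for any HTTP POST to a subscriber endpoint.

```python
def backoff_schedule(base: float = 1.0, factor: float = 2.0, retries: int = 5) -> list[float]:
    """Exponential backoff delays for webhook redelivery
    (base delay, growth factor, and retry count are illustrative)."""
    return [base * factor ** attempt for attempt in range(retries)]

def dispatch(event: dict, deliver) -> bool:
    """Attempt delivery immediately, then retry on the backoff schedule.
    `deliver` is any callable returning True on a 2xx response."""
    for delay in [0.0] + backoff_schedule():
        # a real consumer would time.sleep(delay) here; omitted for the sketch
        if deliver(event):
            return True
    return False   # exhausted retries; event goes to a dead-letter path

attempts = []
def flaky_endpoint(event):
    attempts.append(event["type"])
    return len(attempts) >= 3   # subscriber recovers on the third attempt

assert dispatch({"type": "document.extracted"}, flaky_endpoint)
assert len(attempts) == 3
assert backoff_schedule() == [1.0, 2.0, 4.0, 8.0, 16.0]
```

The event type names (`document.extracted`, etc.) come from the stream description above; everything else in the sketch is an assumed consumer-side implementation.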
Effortlessly capture and ensure accurate classification from any application. FileStar centralizes all document types — paper or electronic — into a unified system with required fields and built-in approval workflows that guarantee consistent, accurate archiving.
FileStar enforces stringent controls and compliance with precision and accountability at every step. Complex workflows can be modeled to exact requirements — sequential and parallel routing, escalation paths, and automated notifications.
Protect your critical documents in a centralized repository with security and compliance built in from the foundation. The Archive is the governed system of record — every version, every action, every access event logged and preserved.
A document schema is the structured framework that defines how documents are identified, categorized, and related to one another. Just as a data warehouse relies on a data schema to bring order to large volumes of information, a document warehouse uses a document schema to create clarity, consistency, and predictable organization across all documents.
FileStar's schema spans more than 5,700 unique document types — giving it deep understanding of documents that support acquisitions, operations, financings, compliance, and every stage of the enterprise lifecycle. The schema automatically knows what a document is, how it should be classified, where it belongs, and what a complete document chain should look like. Documents are no longer scattered or mislabeled — they are organized consistently across systems and ready for audit, operations, and enterprise-wide decision making.
Metadata is to documents what structured fields are to data. FileStar identifies and extracts key attributes — document type, parties, dates, asset identifiers, and relationships — transforming unstructured files into structured intelligence. Without metadata, documents behave like raw data without schema. With metadata, they become organized, trustworthy knowledge assets that support search, governance, compliance, and AI.
Every document in FileStar is a governed asset aligned with a consistent taxonomy and storage structure — searchable through clear logical pathways by type, entity, process, source system, date, or business function. Dynamic views and dashboards give teams visibility into entire document collections, not just isolated files.
Auditability is a defining characteristic of a document warehouse. FileStar records every interaction with every document and preserves the full lineage of a record — from its originating system to every update or review. Auditors can see exactly where a document came from, how it has been handled, and whether it remains complete and accurate.
FileStar also captures the source system, timestamps, authorship, and movement of each document — creating a verified chain of custody. This transparency builds trust across the organization and satisfies regulatory requirements without additional documentation work.
FileStar operates within an SSAE 18 certified hosting facility with annual SOC II audits. Role-based access controls ensure only authorized users and groups can access specific documents. All protocols comply with HIPAA and SOX guidelines for PII and PHI.
Compliance becomes easier when documents follow a consistent structure and lifecycle. FileStar enforces rules for document retention, validation, storage, and access — providing real time visibility into document completeness, timeliness, and accuracy. This makes it simpler to prove adherence to regulations and internal policies, and reduces the risk associated with missing or misplaced documents.
FileStar governs documents. AI.DI makes them intelligent. FileStar managed documents automatically flow through Sentry certification and Abstract.DI extraction without any workflow change for existing users. All FileStar metadata syncs to the AI.DI Warehouse continuously.
Every FileStar client is one conversation away from the full AI.DI platform. No rip and replace. No migration project. No change management crisis. The upgrade path is a configuration change — the governance infrastructure is already in place.
imkore Millennia was founded in 1996 with a focus on tailored document solutions for complex requirements that standard document management software cannot easily meet. The combination of SaaS flexibility with customizable framework design means FileStar can be configured for specific industries, regulatory environments, and workflow structures without professional services for standard deployments.
The agent-gateway edge function receives all AI agent requests and dispatches them to the appropriate tool handlers. It enforces authentication, validates the requesting agent's access scope, applies row-level security policies, and logs every tool invocation for the audit trail.
Supports Bearer token authentication for API clients and session-based auth for browser-connected agents. Rate limiting per API key. Tool-level permission grants — a key can be scoped to read-only document retrieval without access to warehouse queries or compliance data.
A single Supabase Deno edge function serving both the Model Context Protocol (SSE transport for Claude and Cursor) and a REST/OpenAPI interface (for ChatGPT GPT Actions, LangChain, AutoGen, and any HTTP agent).
The same tool definitions, the same security model, the same data — two protocol surfaces from one deployment. ChatGPT integration operational. Deployed with --no-verify-jwt to support custom Bearer token auth independent of Supabase session auth.
Receives inbound webhook events from enterprise ERPs, CRM platforms, and any connected system. Validates payload signatures, routes events to the appropriate pipeline stage, and triggers document processing or metadata updates without human involvement.
When a contract is executed in an ERP, the erp-webhook fires the checkin-pipeline automatically — the document enters the AI extraction queue without anyone touching Document Gateway directly.
Cron-triggered orchestration functions that run batch operations on a configurable schedule. Batch pipeline runs process large document queues during off-peak hours. Scheduled reports generate and distribute compliance summaries, expiry alerts, and portfolio intelligence reports automatically.
No human trigger required for ongoing operations. The platform monitors itself, processes new documents, updates CTR scores, and delivers reports on schedule — continuously.
Access control is not application-layer middleware. Every Supabase table has PostgreSQL row-level security policies that enforce which rows a given user can read, write, or delete — based on their role, their organization, and their specific entity permissions.
An AI agent authenticating with an API key receives exactly the same data access as the human user who created that key — not more, not less. Even if the agent constructs a warehouse query attempting to access data outside its scope, PostgreSQL silently returns only authorized rows. The restriction is invisible to the caller and unbypassable by any query construction.
Every API key is scoped to a specific user, organization, and permission set at creation time. Keys can be restricted to specific tools, specific entities, or read-only operations.
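The key-scoping rule can be sketched as a simple grant check. The field names and scope model below are assumptions for illustration; in the real platform the final enforcement happens in PostgreSQL row-level security, as described above, with this check as the application-facing layer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApiKey:
    """Illustrative key scope model (field names are assumptions,
    not the platform's actual schema)."""
    user_id: str
    org_id: str
    tools: frozenset
    read_only: bool = True

def authorize(key: ApiKey, tool: str, write: bool = False) -> bool:
    """A tool call is allowed only if the tool is in the key's grant
    and the key permits the requested access mode."""
    if tool not in key.tools:
        return False
    if write and key.read_only:
        return False
    return True

key = ApiKey("u_42", "org_7", frozenset({"search_documents", "get_document_url"}))

assert authorize(key, "search_documents")                   # granted tool, read access
assert not authorize(key, "query_warehouse")                # tool not in the key's grant
assert not authorize(key, "search_documents", write=True)   # read-only key refuses writes
```

The point of the two-layer design is that even if this application check were bypassed, the database's row-level security would still return only the rows the key's owning user is authorized to see.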
Every enterprise deploying Copilot, GPT-4, Claude, or Gemini on their documents faces the same problem: the AI is only as good as the data it reasons from. Uncertified documents produce hallucinated answers. Unstructured files produce generic summaries. AI.DI is the certified, structured document foundation that transforms any LLM from a document summarizer into a reliable enterprise intelligence system.
AI.DI gives your organization Continuous Transaction Readiness — the state where every document across every system is accessible, authentic, current, and actionable at all times. Organizations that achieve this state lower their cost of capital, reduce audit risk, accelerate transactions, deploy AI with confidence, and eliminate the document scramble that precedes every critical business event.
Every document management platform ever built — M-Files, Hyland, Box, SharePoint, Laserfiche, OpenText — operates on the same model: a human asks a question, the system returns a file. The documents do not know they are incomplete. The system does not know a transaction is approaching. No one is told what is missing until the moment it matters.
CTR inverts this model. The platform continuously monitors the entire document estate against a dynamic requirement model, scores readiness in real time, and surfaces gaps before they become crises. The difference between reactive retrieval and proactive readiness is the difference between document management and document intelligence.
To calculate a CTR score, you need to know: which documents are required, which are present, which are valid, which are current, which have changed, and which are expired. A file storage system knows none of this. It knows filenames and folder paths.
AI.DI knows all of this because Abstract.DI has read every document, Sentry has fingerprinted and certified every document, and the Warehouse stores every extracted field — including expiry dates, version identifiers, compliance flags, and obligation terms — as queryable structured data. CTR is computed from that data continuously. No competitor has that data. None can build it without starting over.
Every organization faces recurrent high-stakes document events: regulatory audits, financing processes, M&A due diligence, partner onboarding, contract renewals, compliance filings, board reviews. In every case, the weeks before the event are consumed by document scramble — finding files, verifying versions, hunting for missing certificates, correcting outdated records.
CTR eliminates that scramble permanently. The organization is ready before the event is announced. That is not an incremental improvement. It is a fundamentally different value proposition — one that no existing platform can match because none of them understand what their documents say.
| Score | Status | Typical Situation | Time to Transact |
|---|---|---|---|
| 90–100 | Transaction Ready | All documents present, current, and certified. No violations. Counterparty package deployable in hours. | 48 hours |
| 75–89 | Near Ready | 1–3 documents missing or expiring. No active violations. Gaps identified and assigned. | 1–5 business days |
| 55–74 | Attention Required | Multiple gaps or 1–2 violations. Transaction possible but counterparty will surface issues. | 2–4 weeks |
| 35–54 | Not Ready | Significant document gaps. Will not survive regulatory or counterparty diligence in current state. | 30–60 days |
| 0–34 | Critical | Severely incomplete or noncompliant. Immediate remediation required across multiple dimensions. | 90+ days |
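The score bands above map directly to a lookup, sketched here for readers wiring CTR scores into downstream dashboards or alerts. The band names and cutoffs come straight from the table; the function itself is illustrative, not a platform API.

```python
def ctr_band(score: int) -> str:
    """Map a CTR score (0-100) to its readiness band, per the published table."""
    if score >= 90:
        return "Transaction Ready"
    if score >= 75:
        return "Near Ready"
    if score >= 55:
        return "Attention Required"
    if score >= 35:
        return "Not Ready"
    return "Critical"

assert ctr_band(94) == "Transaction Ready"
assert ctr_band(75) == "Near Ready"       # band boundaries are inclusive at the bottom
assert ctr_band(60) == "Attention Required"
assert ctr_band(40) == "Not Ready"
assert ctr_band(12) == "Critical"
```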
| Capability | M-Files / Hyland / OpenText | Box / SharePoint | AI.DI |
|---|---|---|---|
| Real time readiness score | None | None | CTR Score — continuous |
| Automatic gap detection | Manual checklist | None | Continuous AI monitoring |
| Document content intelligence | Metadata tags only | None | Full field extraction |
| Expiry and validity tracking | Manual with reminders | None | Automated from extracted dates |
| Counterparty package readiness | Manual assembly | Manual assembly | Pre-assembled, certified |
| Compliance posture visibility | Periodic reports | None | Continuous, real time |
| AI-ready data foundation | Raw files only | Raw files only | Certified structured data |
| Version certification | Version numbers only | Version numbers only | Sentry fingerprint certified |
AI.DI is not a document management UI with an API bolted on. It is a document intelligence data platform: a PostgreSQL warehouse of structured document intelligence, an MCP server, a webhook event stream, a REST/GraphQL API, Snowflake Data Share, JDBC/ODBC direct access, vector embeddings on certified document chunks, and a 30-engine ML pipeline that improves continuously. Every document becomes structured, provenance tracked, certified data — available to any model, pipeline, or analytics tool you're running.
| Table | Contents | Key Fields | Primary Use |
|---|---|---|---|
| document_records | Every document processed | id, original_name, document_type, workflow_status, asset_id, classification_confidence, storage_path | Document inventory, classification analysis |
| extracted_fields | Structured extraction from Abstract.DI | document_id, field_name, field_value, confidence_score, extraction_model, extraction_timestamp | Contract analytics, financial extraction |
| sentry_fingerprints | Cryptographic fingerprint records | document_id, fingerprint_hash, fingerprint_type, certified_at, version_chain, similarity_scores | Certification, duplicate detection, fraud monitoring |
| hierarchy_nodes | Full org hierarchy | id, parent_id, node_type, node_name, industry, ctr_score, completeness_pct | Portfolio analytics, CTR aggregation |
| document_activity_log | Every action on every document | document_id, event_type, actor_id, actor_role, timestamp, metadata | Audit trail, access pattern analysis |
| vector_embeddings | Embeddings on certified chunks | document_id, chunk_id, certified_version_hash, embedding_vector, model_version | Semantic search, RAG retrieval, clustering |
| ctr_score_history | CTR Score time series | node_id, score, dimension_scores, calculated_at, delta_from_prior | Readiness trending, portfolio benchmarking |
| Department | Acute Pain | AI.DI Entry Product | Expansion Path |
|---|---|---|---|
| Legal / GC | Contract version disputes, discovery liability, GDPR compliance | Sentry certification + Document Gateway distribution | Full Document Warehouse for corporate legal corpus |
| Finance / Accounting | Audit prep fire drills, financial document reconciliation | Abstract.DI batch (financial extraction) + Blueprint audit | Sentry certification + Warehouse integration to ERP |
| Compliance / Risk | Regulatory filing tracking, compliance gaps, audit exposure | Sentry + Warehouse (compliance corpus) + CTR Score | Full platform across regulated document types |
| Transactions / Deal Team | Due diligence prep time, data room chaos | Document Gateway + Distribution Studio + Transaction Rooms | Abstract.DI batch for portfolio wide extraction |
| IT / Data Engineering | Unstructured data not in Snowflake; LLM hallucinations | Document Warehouse + Snowflake + MCP Server | Full platform as enterprise document intelligence backbone |
| Operations / HR | Employee records, policy tracking, onboarding compliance | FileStar lifecycle governance + Abstract.DI HR extraction | Sentry certification + Document Gateway policy distribution |
The world's largest institutional real estate portfolios run on the same platform as a 12-asset regional operator starting their first compliance program. A single compliance officer in one department gets the same AI intelligence, the same CTR Score, the same Warehouse, the same MCP server as a 500-person investment management firm running 20 funds. We built for scale from day one — which means the smallest client gets the most powerful platform available at any price point. No feature tiers. No locked capabilities. No "upgrade to get the real thing."
Blueprint evaluates your entire document ecosystem — every repository, every system, every process — and delivers a scored readiness assessment and a prioritized AI.DI product roadmap. Blueprint invariably reveals exactly which products the client needs and why. The roadmap we deliver IS the AI.DI implementation plan for your organization.
You get the full platform from the moment you deploy — every engine, every view, every integration. There are no feature gates, no capability tiers, and no "enterprise unlock" for core functionality. Your first document gets the same AI pipeline as document number one million. We believe you should see the full value immediately, not earn access to it through a ramp-up process.
No. AI.DI layers over your existing infrastructure. Start with your highest-priority asset group or begin fresh with new documents. There is no requirement to migrate your entire historical archive before going live. The batch engine can process any legacy archive on its own timeline — you decide when and what to bring in.
Sentry generates a mathematical fingerprint — a unique hash derived from document content. Two identical documents always produce identical fingerprints. Any change produces a different fingerprint. The original document is never stored by Sentry. GDPR data minimization is achieved structurally — your documents never leave your control.
The MCP server exposes 6 tools: search_documents, get_compliance_status, get_obligations, query_warehouse, get_hierarchy, get_document_url. Add AI.DI to Claude, Cursor, LangChain, AutoGen, or any MCP-compatible environment and your agents immediately have certified document search and structured extraction queries. Authentication via OAuth2 — agents only access what the connecting user is authorized to see. Keys are revocable instantly.
Yes. Full platform via Docker containers — no Kubernetes required. Azure Cloud, AWS, fully on premise, and hybrid (metadata in cloud, documents on-prem) are all supported. Air-gapped environments with no internet connectivity are also supported. Contact the enterprise team for deployment architecture details.
Snowflake Data Share (zero copy, no ETL), Databricks connector (Delta Lake, streaming), Tableau and Power BI native connectors, dbt compatibility, BigQuery export, direct JDBC/ODBC access, REST API with OpenAPI 3.0 spec, Python SDK, and webhook event streaming to any HTTP endpoint. SSO via SAML 2.0 and OAuth 2.0.