Our secret sauce
Document Ingestion and RAG Chunking

PrimeCut is POMA AI's patent-protected document ingestion & RAG chunking core — structure-aware across text, tables, scanned PDFs (OCR), and 50+ filetypes.

The structural awareness matters most on technical documents — research papers, financial filings, engineering specs — where naive chunkers split a single argument across three chunks and break retrieval. Drop it into your existing pipeline; retrieval, embeddings, and vector store stay where they are.

How Hierarchical Ingestion and Chunking Works: Structure to RAG-Ready Chunks
How POMA PrimeCut Sees Your Document Hierarchy

Every document carries an internal logic: a hierarchy of headings, sub-sections, tables, lists, and supporting elements that define what content belongs together and why. That structure is not decoration — it is the semantic map of the document.

Standard ingestion pipelines discard this map. They extract raw text and hand it to a chunker that has no knowledge of where one idea ends and another begins.
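
To make that failure mode concrete, here is a minimal sketch of a fixed-width splitter. It is not PrimeCut code and not any particular library's chunker; the `naive_chunks` helper and the sample text are hypothetical and exist only for illustration.

```python
def naive_chunks(text: str, size: int) -> list[str]:
    """Split text into fixed-width character windows, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical sample: two short sections, each a heading plus one sentence.
sample = (
    "B. Designing for Security\n"
    "FDA intends to assess device cybersecurity.\n"
    "C. Transparency\n"
    "Users need access to cybersecurity information.\n"
)

chunks = naive_chunks(sample, size=90)

# Locate the chunk holding the second heading and the chunk holding its body.
heading_chunk = next(i for i, c in enumerate(chunks) if "C. Transparency" in c)
body_chunk = next(i for i, c in enumerate(chunks) if "cybersecurity information" in c)

# The heading and the sentence it introduces land in different chunks.
print(heading_chunk, body_chunk)
```

With this window size the heading "C. Transparency" closes the first chunk and its sentence opens the second, so neither chunk carries the complete idea.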

PrimeCut understands your document’s content hierarchy before chunking — preserving structural relationships, eliminating context poisoning, and producing semantically coherent chunksets that make every downstream RAG component more accurate by default.

Text, Chart & Table — One Document, Fully Resolved

[Interactive example: MSCI World Index (USD) Factsheet, Sep 2025, chunksets 0–5 of 43. Legend: shared root, shared hierarchy, leaf (unique to one chunkset). Content types: text, image description, table data.]

What you get back
Every upload returns a POMA archive.

Send us a document — any supported filetype — and you get back a POMA archive: a zip containing the structured SDK output your pipeline can read directly. No glue code, no per-file branching.

The archive bundles the chunks your retrieval layer consumes alongside the intermediate artifacts that produced them, so you can debug, re-process, or surface source content without re-running the pipeline.

Inside a POMA archive

Core SDK files
chunks.json
Extracted chunks with hierarchy and page references.
chunksets.json
Grouped chunk collections for retrieval.
image_sources.json
Image references and metadata (when applicable).
assets/
Supporting files referenced by chunks (when applicable).
Extended processing artifacts

Every archive also carries the intermediate artifacts that produced those chunks: input as markdown and HTML, structurally-indented plain text, AI/OCR image descriptions, extracted tables, pre-processed source files, and archive- and content-level metadata.
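
As a rough sketch of what reading an archive could look like with only the Python standard library: the file names come from the list above, but everything else is an assumption, including that the files sit at the root of the zip and that `assets/` is a top-level folder. The `load_poma_archive` helper is hypothetical and is not part of the POMA SDK; the JSON contents are loaded as-is without assuming a schema.

```python
import json
import zipfile

# File names as listed above; image_sources.json is only present when applicable.
CORE_FILES = ("chunks.json", "chunksets.json")
OPTIONAL_FILES = ("image_sources.json",)


def load_poma_archive(path):
    """Read the core JSON files from a POMA archive into a dict keyed by file name."""
    loaded = {}
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        for name in CORE_FILES + OPTIONAL_FILES:
            if name in names:
                loaded[name] = json.loads(archive.read(name))
            elif name in CORE_FILES:
                raise FileNotFoundError(f"{name} missing from archive")
        # Assumption: supporting files live under a top-level "assets/" prefix.
        loaded["assets"] = sorted(
            n for n in names if n.startswith("assets/") and not n.endswith("/")
        )
    return loaded
```

Calling `load_poma_archive("report.zip")` on a downloaded archive (the file name is a placeholder) would then give you the parsed chunks and chunksets plus a list of asset paths to resolve on demand.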

What it does
From document to embedding-ready chunkset, one call.

PrimeCut treats each document as a structure, not just as bytes. It detects the hierarchy of your document (headings, subsections, and clauses), which lets it preserve clauses with their definitions, tables with their captions, and graph content within its section. It then emits chunksets that your embedding model can use directly.

  • Structure-aware parsing

    Headings, tables, lists, and captions retain their hierarchical relationships through chunking. No flattening to character runs.

  • Fifty-plus filetypes

    PDF, DOCX, PPTX, XLSX, HTML — same engine for all of them. Images, charts, and tabulated data are handled inline as searchable content.

  • Hierarchical chunksets

    Output is structured JSON: chunks with full ancestor metadata and ready-to-embed traversal paths. Drop into any embedding model or vector DB.
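
One way to picture "ancestor metadata plus a traversal path" is as an ordered list of chunks from root to leaf, each with a depth. The sketch below turns such a list into tab-indented text for an embedding model. The `depth` and `text` field names and the `render_chunkset` helper are hypothetical; the actual output schema may look different.

```python
def render_chunkset(chunks: list[dict]) -> str:
    """Render root-to-leaf chunks as tab-indented text, one line per chunk."""
    return "\n".join("\t" * c["depth"] + c["text"] for c in chunks)


# Hypothetical records; the real chunks.json field names may differ.
chunkset = [
    {"depth": 0, "text": "Cybersecurity Guidance for Medical Devices"},
    {"depth": 1, "text": "B. Designing for Security"},
    {"depth": 2, "text": "The risk of patient harm due to vulnerability exploitation."},
]

# A single string that carries the leaf together with its ancestors.
embedding_input = render_chunkset(chunkset)
print(embedding_input)
```

Because the leaf travels with its ancestors, the embedded text still says which document and section the sentence belongs to.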

What POMA PrimeCut Does Differently
POMA PrimeCut vs Unstructured.io vs Conventional Chunking:
Hierarchical Chunking Compared

Conventional Chunk

an SPDF is one approach to help ensure that the QS regulation is met. Because of its benefits in helping comply with the QS regulation and
cybersecurity, FDA encourages manufacturers to use an SPDF, but other approaches might also satisfy the QS regulation.
### B. Designing for Security
When reviewing premarket submissions, FDA intends to assess device cybersecurity based on a number of factors, including, but not limited to, the
device's ability to provide and implement the security objectives below throughout the device architecture. The security objectives below generally
may apply broadly to devices within the scope of this guidance, including, but not limited to, devices containing artificial intelligence (AI) and
cloud-based services.
Security Objectives:
• Authenticity, which includes integrity;
• Authorization:
• Availability:
• Confidentiality; and
• Secure and timely updatability and patchability.
Premarket submissions should include information that describes how the above security objectives are addressed by and integrated into the device
design. The extent to which security requirements, architecture, supply chain, and implementation are needed to meet these objectives will depend on
but may not be limited to:
- The device’s intended use, indications for use, and reasonably foreseeable misuse;
- The presence and functionality of its electronic data interfaces;
• Its intended and actual environment of use:
- The risks presented by cybersecurity vulnerabilities;
- The exploitability of the vulnerabilities; and
- The risk of patient harm due to vulnerability exploitation.
SPDF processes aim to reduce the number and severity of vulnerabilities and thereby reduce the exploitability of a medical device system and the
associated risk of patient harm. Because exploitation of known vulnerabilities or weak cybersecurity controls should be considered reasonably
foreseeable failure modes for medical device systems, these factors should be addressed in the device design. $ ^{19} $ One of the key benefits of
using an SPDF is that a medical device system is more likely to be secure by design, such that the device is designed from the outset to be secure
within its system and/or network of use throughout the device lifecycle.
### C. Transparency
A lack of cybersecurity information, such as information necessary to integrate the device into the use environment, as well as information needed
by users to maintain the medical device system’s cybersecurity over the device lifecycle, has the potential to affect the safety and effectiveness
of a device. In order to address these concerns, it is important for device users to
## Contains Nonbinding Recommendations
have access to information pertaining to the device’s cybersecurity controls, potential risks to the medical device system, and other relevant
  • very long
  • spanning multiple sections
  • isolating heading from subsequent content

Unstructured.io Chunk

• The device’s intended use, indications for use, and reasonably foreseeable misuse;
• The presence and functionality of its electronic data interfaces;
• Its intended and actual environment of use; 18
• The risks presented by cybersecurity vulnerabilities;
• The exploitability of the vulnerabilities; and
• The risk of patient harm due to vulnerability exploitation.
  • No section indication
  • Random artifacts as part of apparent main text (“18”)
  • No context/positioning within the document

POMA Chunk(Set): Full Context Path

Cybersecurity Guidance for Medical Devices: Quality Systems and Premarket Submission
Requirements
	[…]
	Guidance for Industry and Food and Drug Administration Staff
		[…]
		Contains Nonbinding Recommendations outline
			[…]
			B. Designing for Security
			When reviewing premarket submissions, FDA intends to assess device cybersecurity
			based on a number of factors, including, but not limited to, the device's ability
			to provide and implement the security objectives below throughout the device
			architecture.
				[…]
				The extent to which security requirements, architecture, supply chain, and
				implementation are needed to meet these objectives will depend on but may not
				be limited to:
					[…]
					• Its intended and actual environment of use:
						[…]
						• The risk of patient harm due to vulnerability exploitation.
It just works

The benchmark
23% of the tokens, 100% recall.

Standard chunking ignores how your documents are structured. So a query like 'How high was the interest rate last year?' casts a wide net and retrieves chunks where most of the content has nothing to do with the question, and you still pay for every token returned. PrimeCut chunks along the document's structure: queries return only the relevant content, with no loss of recall.

[Chart: tokens needed for 100% recall. POMA chunking: 340K. Conventional chunking: 1.5M.]
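
The headline figure follows from the two values in the chart. A quick check, assuming the chart labels (340K and 1.5M) are rounded token counts:

```python
# Token counts as read from the chart labels (rounded values).
poma_tokens = 340_000
conventional_tokens = 1_500_000

# Share of the conventional token budget that POMA chunking needs.
share = poma_tokens / conventional_tokens
print(f"{share:.1%}")  # 22.7%, which the headline rounds to 23%
```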

Two Ways to Use POMA PrimeCut Depending on Your Budget
PrimeCut Eco and PrimeCut Pro

PrimeCut ships in two tiers. Both preserve document hierarchy. Both eliminate context poisoning. The difference is in how they handle visual content and compute — matched to the complexity of your documents and your budget.

PrimeCut Eco

Simple hierarchical chunking for well-structured documents.

0.003 € / page
Features
  • Rapid document hierarchy detection
  • Semantically bounded chunks with ancestor context inheritance
  • Ready-to-embed chunksets
  • Images and visual elements extracted and placeholdered
  • Optimized for low cost
  • Simple title generation
Best for
  • Large knowledge bases with limited budget
  • Simple and well-structured content

PrimeCut Pro

Full structural and visual intelligence for complex, mixed-content documents.

0.03 € / page
Features
  • Full document hierarchy parsing
  • Semantically bounded and neighbour-aware chunks with ancestor context inheritance
  • Context-aware ready-to-embed chunksets
  • Full AI processing — figures, tables, and images parsed as semantic content
  • Visual elements both extracted and converted to retrievable, context-aware textual chunks
  • Optimized for accurate, hierarchical textual representation of complex multimodal content
Best for
  • High-stakes domains with complex documents (legal & regulatory, financial & insurance, medical, engineering)
  • High need for search accuracy
  • Multimodal retrieval based on comparable semantic and hierarchical representations

Structured files (xml, cir, json, yaml, toml, ini, env, csv, tsv, xls, xlsx, xlsb) are always chunked in Pro mode and billed at the Eco rate.
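
For budgeting, the per-page rates above translate into a simple estimate. This is a sketch only: the `estimate_cost` helper is hypothetical, it does not model the free tier, volume pricing, or taxes, and it is not an official pricing calculator.

```python
from decimal import Decimal

# Per-page rates quoted above, in euros.
RATES = {"eco": Decimal("0.003"), "pro": Decimal("0.03")}


def estimate_cost(pages: int, tier: str, structured: bool = False) -> Decimal:
    """Estimate the processing cost in euros for a batch of pages.

    Structured files are billed at the Eco rate even though they are
    chunked in Pro mode. Free-tier pages are not modeled here.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    rate = RATES["eco"] if structured else RATES[tier]
    return rate * pages
```

For 10,000 pages this gives 30 € on Eco and 300 € on Pro; 10,000 pages of structured files come to 30 € regardless of the tier requested.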

Integration into Your RAG Pipeline
LangChain Document Chunking and RAG Pipeline Integration — No Architectural Overhaul

PrimeCut sits at the ingestion layer of your RAG pipeline — upstream of your vector database, your embedding model, and your retrieval logic. It receives documents. It returns structured, hierarchically-bounded chunksets.

Compatible with:
LLMs
OpenAI
Anthropic
Other leading LLMs
Vector Databases
Pinecone
Weaviate
Other vector databases
Frameworks
LangChain
LlamaIndex
Custom RAG implementations

The SDK is lightweight. The API is flexible. And PrimeCut's output schema is consistent across both configurations.

Ready to get started?
Try it on your own pipeline.

Free tier covers 1,000 pages — drop the SDK in, point it at a document, see what comes back. No retrieval refactor, no vector DB swap, no architectural overhaul.

Processing at scale? Let's talk

1,000 free pages. No credit card required.