02/07/2026

A Cataloging System That Configures Itself to Your Archive

Every archive works differently: different fields, a different vocabulary, a different language, and a recording style that took years to settle and reflects what this particular collection documents. That is exactly why generic cataloging software always feels foreign. It forces the archive to bend to its template, and the customization it offers is slow and expensive, sometimes costing whole consulting days per field.

We approach it from the opposite direction. Instead of a fixed system the archive adapts to, we built an engine that adapts itself to the archive — learning the domain it covers, the record structure it wants, the language and the level of detail — and producing a cataloging tool tailored precisely to its needs.

What the system does with a single item

For each photograph, document or film, the engine produces a full catalog record — not keyword tagging, but a genuine record: title, dating, places, persons and historical context. Three features set it apart from off-the-shelf output:

  • Built-in bilingualism. Every field in two languages, in parallel — not after-the-fact machine translation, but professional phrasing in each.
  • Explicit confidence markers. Every identification carries a rating — ✓ confirmed, ~ probable, ? tentative. The record does not sound certain where it isn't. This is the epistemic discipline that separates a defensible record from a paragraph that merely reads well.
  • Verso and handwriting. Historical photographs carry information on their backs: an archive number, a date, a handwritten dedication. The system reads that too, in each of the languages it handles.

This capability is strongest precisely in material other tools struggle with: Hebrew and Yiddish, handwriting, visual content, and multilingual documents from periods when borders, regimes and languages shifted within the same collection.
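
To make the shape of such a record concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the engine's actual data model: the class names (Confidence, Claim, CatalogRecord), the field names and the sample values are all hypothetical. Only the three markers (✓, ~, ?) and the list of record parts come from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Confidence(Enum):
    """The three markers described in the text."""
    CONFIRMED = "✓"
    PROBABLE = "~"
    TENTATIVE = "?"


@dataclass
class Claim:
    """One statement in two languages, plus a confidence rating (hypothetical shape)."""
    primary: str      # phrasing in the archive's first language
    secondary: str    # parallel phrasing in the second language
    confidence: Confidence

    def render(self) -> str:
        return f"{self.confidence.value} {self.primary} / {self.secondary}"


@dataclass
class CatalogRecord:
    """Title, dating, places, persons, context and verso notes for one item."""
    title: Claim
    dating: Claim
    places: list[Claim] = field(default_factory=list)
    persons: list[Claim] = field(default_factory=list)
    context: Optional[Claim] = None
    verso: list[Claim] = field(default_factory=list)

    def claims(self) -> list[Claim]:
        """Every claim in the record, in a fixed order."""
        single = [self.title, self.dating]
        if self.context is not None:
            single.append(self.context)
        return single + self.places + self.persons + self.verso

    def unconfirmed(self) -> list[Claim]:
        """Claims that are only probable or tentative."""
        return [c for c in self.claims() if c.confidence is not Confidence.CONFIRMED]


# Invented sample values, for illustration only.
record = CatalogRecord(
    title=Claim("דיוקן משפחתי", "Family portrait", Confidence.CONFIRMED),
    dating=Claim("שנות ה-30", "1930s", Confidence.PROBABLE),
    verso=[Claim("הקדשה בכתב יד", "Handwritten dedication", Confidence.TENTATIVE)],
)

for claim in record.unconfirmed():
    print(claim.render())
```

Keeping the rating on each claim, rather than once per record, is what lets a confirmed title sit next to a tentative reading of the verso without either one borrowing the other's certainty.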

The record lands in the system you already have

A cataloging system is only worth something if its output enters the place where the archive actually works. So the engine does not try to replace your system of record; it feeds it. Records are exported in accepted archival standards (EAD, Dublin Core) and can be imported into platforms such as ArchivesSpace and AtoM, among others. You keep working in your own environment; the system simply fills it faster, and at higher quality.
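
As a rough illustration of what a standards-based export can look like, the sketch below serializes a bilingual record to Dublin Core elements with Python's standard library. The Dublin Core namespace and the xml:lang attribute are standard. The plain record wrapper element, the to_dublin_core function, the input dictionary shape and the sample values are assumptions made for this example, and the exact file layout a given platform expects on import may differ.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"                  # Dublin Core element set
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"     # serialized as xml:lang
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record: dict) -> str:
    """Serialize {element: {language: text}} into Dublin Core XML.

    The <record> wrapper is a placeholder chosen for this sketch; an empty
    language key means the value carries no language tag.
    """
    root = ET.Element("record")
    for element, values in record.items():
        for lang, text in values.items():
            node = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            if lang:
                node.set(XML_LANG, lang)
            node.text = text
    return ET.tostring(root, encoding="unicode")


# Invented sample values, for illustration only.
sample = {
    "title": {"he": "דיוקן משפחתי", "en": "Family portrait"},
    "date": {"": "1930s"},
}

xml_text = to_dublin_core(sample)
print(xml_text)
```

Each language gets its own repeated element with its own language tag, so the parallel phrasing survives the export instead of being flattened into one field.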

Privacy — not a footnote

Archives hold sensitive personal information, and sometimes material that must not leave the building. Our model is built around that from the ground up:

  • A dedicated server per archive — full isolation, no shared infrastructure.
  • Your data is never used to train models. We work exclusively through paid API tiers, on which providers do not train on customer data, and never through free tiers, on which they do.
  • Keys under the archive's control, full encryption, and built-in access logging.

For the most sensitive material, stricter options exist — processing in a European data region, or a local model that never reaches the cloud at all.

What it is not

This is not a tool that replaces the archivist. The delicate identifications — who the person is, which camp, which event — remain under human judgment, and the system is built to keep a human in the loop at all times: it proposes, rates its own confidence, and leaves the decision to whoever is responsible for the collection. Its value is not that it "knows everything," but that it does the broad work with an archivist's discipline, leaving to the human what genuinely requires judgment.
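
The propose, rate, decide loop can be sketched as a small review queue. This is one possible reading of that workflow, not a description of the product's internals: the Proposal and Status names, the rule that every proposal stays pending until a named archivist decides, and the ordering that surfaces the least certain items first are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass
class Proposal:
    """One identification proposed by the engine (hypothetical shape)."""
    field_name: str
    value: str
    marker: str                          # "✓", "~" or "?" as rated by the engine
    status: Status = Status.PENDING      # nothing is accepted automatically
    decided_by: Optional[str] = None

    def decide(self, archivist: str, accept: bool) -> None:
        """Record a human decision; an anonymous decision is refused."""
        if not archivist:
            raise ValueError("a decision needs a named archivist")
        self.status = Status.ACCEPTED if accept else Status.REJECTED
        self.decided_by = archivist


def review_order(proposals: list) -> list:
    """Sort proposals so the least certain ones reach the archivist first."""
    rank = {"?": 0, "~": 1, "✓": 2}
    return sorted(proposals, key=lambda p: rank[p.marker])


# Invented sample values, for illustration only.
proposals = [
    Proposal("title", "Family portrait", "✓"),
    Proposal("person", "Person A (placeholder)", "?"),
    Proposal("dating", "1930s", "~"),
]

queue = review_order(proposals)
queue[0].decide(archivist="archivist-on-duty", accept=False)

for p in queue:
    print(p.marker, p.field_name, p.status.value, p.decided_by)
```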


For archives, museums and preservation institutions: if you have a collection waiting to be cataloged — especially Hebrew, Yiddish, or multilingual material that generic tools struggle with — we'd be glad to talk about a short pilot on a sample from your own collection. Get in touch.