12/05/2026

AI Cataloging for Holocaust Archives

A photograph from the early 1940s arrives in a cataloging queue. An elderly woman in a heavy coat, a yellow star visible on her chest. A cobblestone street, a smudged shop sign in a Latin script. The temptation is to identify everything at once — where this was taken, who she is, when it happened. The discipline is to identify almost nothing on the first pass, and to do the rest in layers.

This post describes a four-layer cataloging framework we use in archival-grade visual analysis of Holocaust-era materials. It is built on top of conventional AI vision tools, but its core is not technical; it is epistemic. The question it answers is: given what the photograph shows, what am I actually entitled to claim?

Why four layers, and not one

Most catalog entries — and most off-the-shelf vision pipelines — collapse the visual, the contextual, and the personal into one descriptive paragraph. The result is a paragraph that sounds confident on all fronts and is, in practice, indefensible on most of them.

Each of these layers requires different evidence and a different threshold for certainty. A trained vision model can recognize a 1940-pattern Wehrmacht overcoat reliably. It cannot tell you whose face appears in the photograph, and pretending otherwise pollutes the archive. Both kinds of claim belong in a catalog entry, but at very different confidence levels — and the structure has to make that visible.

Layer 1 — Visual description

What the camera captured. Composition. Framing. Photographic style, if identifiable: formal portrait, candid street scene, official propaganda image, family snapshot, surveillance frame.

This layer is rarely controversial. It is what is literally in the frame.

The discipline is to stay close to the surface. "Two figures in winter coats walking on a cobblestone street toward a building with a barred ground-floor window" — not "Two refugees fleeing the ghetto." The second sentence assumes a destination, a status, and a context that the photograph does not, by itself, prove. Layer 1 leaves that work to the layers below.

Layer 2 — Objects and details

The period-specific layer. Uniforms. Vehicles. Weapons. Signage. Yellow stars and other identifying patches. Documents. Shop fronts. Architecture. Anything in the image that carries a date stamp, a regional signature, or a regulatory trace.

This is where domain expertise begins to matter. A 1940-pattern Wehrmacht overcoat is a different signal from a 1944-pattern one. An armband bearing a six-pointed star reads differently from a sewn chest patch — and the difference indexes specific occupation regulations from specific years.

Off-the-shelf vision models will often guess at the unit, the regiment, the exact campaign. A trained eye says: "Wehrmacht greatcoat, 1939–42 pattern." Broader category, narrower error margin. The second kind of statement is what an archive can build on.

Layer 3 — Persons

The most sensitive layer, and the one where AI assistance is most dangerous.

In Holocaust-era photographs, faces are usually too small, too obscured, or too photographically degraded for honest visual identification — even when a viewer's brain insists on supplying a name. The default in this layer is descriptive, not nominal:

  • "Central figure, woman in her mid-50s, wearing civilian dress, head covered."
  • "Two children, approximately 8 and 10 years old, sharing a single coat."
  • "Uniformed officer, rank insignia visible on collar, face turned partially away from camera."

A name appears in Layer 3 only when there is strong, verifiable evidence: a visible nameplate, an archival caption from a trusted source, a face match against documented photographs by a researcher who knows the subject. Without that, the name does not belong in the entry, no matter how plausible a guess might feel.
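The naming rule above can be enforced in the data model itself. What follows is a minimal sketch, not a description of any existing tool: the names PersonEntry and NameEvidence are hypothetical, and the three evidence types simply mirror the ones listed in the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NameEvidence(Enum):
    # Hypothetical labels mirroring the evidence types named in the text.
    NAMEPLATE = "visible nameplate"
    TRUSTED_CAPTION = "archival caption from a trusted source"
    RESEARCHER_MATCH = "face match by a researcher who knows the subject"


@dataclass
class PersonEntry:
    description: str                        # always present: the descriptive default
    name: Optional[str] = None              # nominal identification, off by default
    evidence: Optional[NameEvidence] = None

    def __post_init__(self) -> None:
        # A name without documented evidence is rejected outright,
        # so a plausible guess cannot slip into the record.
        if self.name is not None and self.evidence is None:
            raise ValueError("a name requires documented evidence")


# The descriptive default needs no evidence at all.
figure = PersonEntry("Central figure, woman in her mid-50s, wearing civilian dress, head covered.")
```

The design choice is that the description is mandatory and the name is optional, which is the reverse of how many catalog schemas treat persons.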

This is the point at which most AI-assisted catalogs fail. They produce names, and the names enter the archive, and they cannot be retracted.

Layer 4 — Historical context

Once Layers 1–3 are documented, the historical context can be layered in. What is known to have happened at the kind of place and time the image shows.

This layer is always conditional. "If the location is consistent with the source's claim of Łódź, the date range (1941–1944) would coincide with the documented ghetto period." Not "This shows the Łódź ghetto." The conditional framing is not weakness — it is what allows another researcher to evaluate the chain of inference and either extend it or break it.
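One way to keep the conditional framing from being flattened later is to store the premise and the inference as separate fields. This is an illustrative sketch only; ContextClaim and its field names are assumptions made for the example, not part of any standard schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextClaim:
    premise: str    # what must hold for the inference to apply
    inference: str  # what would follow if the premise holds

    def render(self) -> str:
        # The premise is always printed, so the claim can never be
        # shown as an unconditional statement.
        return f"If {self.premise}, {self.inference}."


claim = ContextClaim(
    premise="the location is consistent with the source's claim of Łódź",
    inference="the date range (1941–1944) would coincide with the documented ghetto period",
)
print(claim.render())
```

Because the premise is its own field, a later researcher who disproves it can find every context statement that depended on it.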

Confidence markers as practitioner discipline

Every identification in the catalog gets one of three markers:

  • ✓ Confirmed — clearly visible inscription, well-documented landmark, archival caption from a trusted source that supports the claim.
  • ~ Probable — consistent with multiple visual cues, period-correct, but no single decisive piece of evidence.
  • ? Tentative — visual resemblance only, or genre/period inference; needs further work to firm up.

The marker is not a hedge. It is a structured commitment to the level of evidence behind the claim. A ~ in an institutional catalog is more valuable than a quiet ✓ that assumes too much. The ~ and the ? are both usable and both auditable, and both leave the next researcher with the information they need to build further work on top.

The same logic applies in reverse: if every claim in an entry is marked ✓, an experienced reader knows to suspect the cataloger. Real archival work produces a mix.
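The three markers and the reverse check can be sketched in a few lines. The function name looks_overconfident and the min_claims threshold are assumptions made for illustration; the text does not specify how many uniformly confirmed claims should raise suspicion.

```python
from enum import Enum
from typing import Iterable


class Confidence(Enum):
    CONFIRMED = "✓"
    PROBABLE = "~"
    TENTATIVE = "?"


def looks_overconfident(markers: Iterable[Confidence], min_claims: int = 3) -> bool:
    """Flag an entry for review when every marker is CONFIRMED.

    min_claims is an assumed threshold: very short entries are not
    flagged, since one or two confirmed claims are unremarkable.
    """
    markers = list(markers)
    if len(markers) < min_claims:
        return False
    return all(m is Confidence.CONFIRMED for m in markers)


mixed = [Confidence.CONFIRMED, Confidence.PROBABLE, Confidence.TENTATIVE]
uniform = [Confidence.CONFIRMED] * 4
print(looks_overconfident(mixed), looks_overconfident(uniform))
```

A flag here is a prompt for human review, not a verdict: some entries really are fully confirmed.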

A worked example

A photograph from the early 1940s, Eastern Europe. Unsorted, no caption.

  • Visual: An adult and a child, both in winter coats, standing in front of a wooden door. The door has a Hebrew inscription painted across the upper frame, partly visible.
  • Objects: The adult's coat shows what appears to be a yellow Star of David patch sewn at the left chest: ✓ consistent with German occupation regulations, which imposed such badges from late 1939 in parts of occupied Poland and from 1941 in the territories occupied further east. The architecture (wooden door with a vertical groove cut into the right doorpost) is ~ consistent with rural Eastern European Jewish housing of the period.
  • Persons: Two figures, unidentified. The adult is approximately 35–45 years old; the child is approximately 6–8.
  • Historical context: If the location is in occupied Eastern Europe (consistent with visual cues above), the period 1941–1943 saw mass deportations of rural Jewish communities throughout the region. No specific event is identifiable from this image alone.
  • Notes: No inscriptions on the photograph itself; reverse not seen. Identification would benefit from comparison with regional family archives and donor records, if available.

The entry is publishable as-is. It does not over-claim. It does not under-claim. It is auditable. And critically, it leaves room for a researcher with additional context to extend it without having to first dismantle a confident guess.
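For pipelines that store entries as structured records rather than prose, the worked example could be represented roughly as follows. This is a sketch under stated assumptions: Claim, CatalogEntry, and the layer labels are hypothetical names chosen for the example, and the claim texts are abbreviated from the entry above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Confidence(Enum):
    CONFIRMED = "✓"
    PROBABLE = "~"
    TENTATIVE = "?"


@dataclass
class Claim:
    layer: str                            # e.g. "Visual", "Objects", "Persons"
    text: str
    marker: Optional[Confidence] = None   # None for plain description


@dataclass
class CatalogEntry:
    claims: List[Claim] = field(default_factory=list)

    def by_layer(self, layer: str) -> List[Claim]:
        return [c for c in self.claims if c.layer == layer]

    def render(self) -> str:
        lines = []
        for c in self.claims:
            prefix = f"{c.marker.value} " if c.marker else ""
            lines.append(f"{c.layer}: {prefix}{c.text}")
        return "\n".join(lines)


entry = CatalogEntry([
    Claim("Visual", "An adult and a child, both in winter coats, in front of a wooden door."),
    Claim("Objects", "Yellow Star of David patch sewn at the left chest.", Confidence.CONFIRMED),
    Claim("Objects", "Doorpost groove consistent with rural Jewish housing of the period.", Confidence.PROBABLE),
    Claim("Persons", "Two figures, unidentified."),
])
print(entry.render())
```

Keeping each claim as its own record, with its own marker, is what lets a later researcher revise one inference without rewriting the whole entry.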

Why this matters now

Holocaust-era visual archives are being digitized faster than they are being cataloged. Most digitization projects produce high-resolution scans but minimal structured metadata. AI tools are being applied to fill the gap — and many of them produce catalog entries that hallucinate names onto unidentified faces, locations onto generic streetscapes, dates onto undated images.

An archive polluted with hallucinated identifications is harder to use than an archive with no identifications at all. The four-layer framework, with explicit confidence markers, lets AI help with the layers where it can genuinely help (visual, objects, sometimes context) and defer where it must (persons, specific events). What comes out is a catalog entry that future historians can build on rather than fight against.

The past is not made more accessible by being made more confident. It is made more accessible by being made more honest.