MothRag
Alpha · Paper upcoming

Answers to the questions one search can't reach.

MothRag is built for the hard questions — the ones whose answer is spread across many documents, where ordinary AI search misses it. It connects the evidence, shows the reasoning behind every answer, and runs entirely on the LLM APIs you already pay for: no GPUs, no model hosting, no lock-in.

Research-grade accuracy · No GPUs, no model hosting · Runs on any LLM API · Shows its work

Why it matters

Production AI search stops at one lookup. Real questions don't.

Most AI search does a single lookup and stops. That breaks the moment a question spans multiple documents, chains entities, or compares facts across sources — exactly the questions that matter most in real knowledge work. MothRag is built for that case, and ships as a Python package you point at any LLM API you already use.

01

Built for the hard questions.

It connects facts across many sources to answer the multi-hop questions a single search can't.

02

No infrastructure to run.

No GPUs to rent, no models to host, no special infrastructure — it runs on the standard LLM APIs you already use.

03

You can see why it answered.

Every answer comes with the evidence and the reasoning trail behind it, so you can check it, not just trust it. That is the difference between a demo and something you can put in front of customers.

The edge

Frontier quality, in a form you can actually deploy.

Accuracy

Frontier accuracy, deployable.

On the standard multi-hop benchmarks, MothRag matches the best systems published by research labs. The difference: those systems need datacenter GPUs and non-commercial models, while MothRag reaches the same level on commodity APIs alone.

Infrastructure

Runs on commodity APIs.

No GPU fleet, no hosted models, no special infrastructure — MothRag runs entirely on the standard LLM APIs you already pay for.

Portability

No vendor lock-in.

Works with any model, today's or tomorrow's. Swap the engine underneath without retraining anything.

Transparency

Answers that show their work.

Every answer is structured as an inspectable reasoning trail over the evidence it used, with a built-in agreement signal across its internal reasoning paths — so you can gauge confidence at a glance.
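
How the agreement signal is computed is not specified here. As a purely illustrative sketch, and not MothRag's implementation, one simple way to score agreement is the share of reasoning paths that land on the same final answer. The names `normalize` and `agreement_signal` are hypothetical, and so is the assumption that each path ends in a short answer string.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Lowercase and collapse whitespace so trivially different strings compare equal.
    return " ".join(answer.lower().split())


def agreement_signal(path_answers: list[str]) -> tuple[str, float]:
    """Return the majority answer and the share of paths that produced it.

    Hypothetical helper: assumes one final answer string per reasoning path.
    """
    if not path_answers:
        raise ValueError("need at least one reasoning path")
    counts = Counter(normalize(a) for a in path_answers)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / len(path_answers)


# Four paths, three of which agree after normalization.
answer, agreement = agreement_signal(["Paris", "paris ", "Lyon", "Paris"])
print(answer, agreement)  # paris 0.75
```

A score near 1.0 means the paths converge on one answer; a low score flags an answer worth checking against its evidence.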

Validation

Measured at research scale — and honest about it.

Validated on three standard multi-hop benchmarks at the same scale researchers use (1000 evaluations each, with Llama-3.3-70B as the reader), MothRag is on par with the published research frontier while running entirely on the standard LLM APIs you already use. Full numbers and methodology are in the upcoming paper.

Quality: On par with research SOTA
Benchmarks: 3 standard, multi-hop
Scale: 1000 evals each
Reader: Llama-3.3-70B
Hardware: Any LLM API, no GPU

Status

In active development. Paper ready to publish.

MothRag is being built in the open. The paper documenting the method and results is finished and will be published shortly, with a public release to follow.

Stage: In development
Install: Coming soon
Paper: Ready to publish
Contact: Julian Geymonat