Built a new integer codec (Lotus) that beats LEB128/Elias codes on many ranges – looking for feedback on gaps/prior art before arXiv submission

I’ve been sitting on this for a while. I tried to post it to Reddit months ago, but it got removed for who-knows-why. I love Digg, always have, so I’d love nothing more than for the first place it sees the light of day to be D-I-double-G-dot-com.

I designed and implemented an integer compression codec called Lotus that reclaims the “wasted” representational space in standard binary encoding by treating each distinct bitstring (including leading zeros) as a unique value.

Core idea: Instead of treating `1`, `01`, `001` as the same number, Lotus maps each bitstring to its own distinct integer, with the strings of a given length L filling one contiguous integer range. It then uses a small tiered header (anchored by a fixed-width “jumpstarter”) to make the code self-delimiting.
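To make the mapping concrete, here is a minimal sketch of one way to number bitstrings so that leading zeros count. It assumes strings are ordered by length and then by value, with the empty string at index 0, so the length-L strings occupy the range [2^L - 1, 2^(L+1) - 2]. The function names are hypothetical and this is not lifted from the Lotus repo; the actual library may order or offset things differently.

```rust
/// Map a bitstring to a unique index.
/// `len` is the string length in bits; `bits` holds the string in its low `len` bits.
/// Assumption (illustrative only): shorter strings come first, so the 2^len
/// strings of length `len` start at offset 2^len - 1.
fn bitstring_to_index(len: u32, bits: u64) -> u64 {
    assert!(len < 64, "this sketch only handles lengths below 64");
    assert!(bits < (1u64 << len), "payload does not fit in `len` bits");
    ((1u64 << len) - 1) + bits
}

/// Inverse mapping: recover (length, payload) from an index.
fn index_to_bitstring(index: u64) -> (u32, u64) {
    assert!(index < u64::MAX, "index + 1 must not overflow");
    let n = index + 1; // n >= 1, so floor(log2(n)) is well defined
    let len = 63 - n.leading_zeros(); // floor(log2(n)) = string length
    let bits = n - (1u64 << len); // position within the length-`len` range
    (len, bits)
}
```

Under this ordering, `1`, `01`, and `001` land on indices 2, 4, and 8 instead of all collapsing to the value 1, and `index_to_bitstring(8)` gives back `(3, 1)`, i.e. the 3-bit string `001`.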

Why it matters: On uniform 32-bit and 64-bit integer distributions, Lotus consistently beats:

• LEB128 (the protobuf varint) by ~2–5 bits/value

• Elias Delta/Omega by ~3–4 bits/value

• All classic universal codes across broad ranges
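For anyone who wants to sanity-check the baselines, here is a rough sketch of how the code lengths of unsigned LEB128, Elias gamma, and Elias delta can be computed for a single value. The helper names are mine for illustration and are not the benchmark functions in the repo. It assumes the standard definitions: LEB128 spends one byte per 7 payload bits, and the Elias codes are defined for n >= 1.

```rust
/// Number of significant bits in `n`, with 0 counted as needing one bit.
fn bit_len(n: u64) -> u32 {
    (64 - n.leading_zeros()).max(1)
}

/// Size in bits of unsigned LEB128: ceil(bit_len / 7) bytes of 8 bits each.
fn leb128_bits(n: u64) -> u32 {
    8 * ((bit_len(n) + 6) / 7)
}

/// Size in bits of Elias gamma (n >= 1):
/// floor(log2 n) zero bits followed by the binary form of n.
fn elias_gamma_bits(n: u64) -> u32 {
    assert!(n >= 1, "Elias gamma is undefined for 0");
    2 * (bit_len(n) - 1) + 1
}

/// Size in bits of Elias delta (n >= 1):
/// gamma code of bit_len(n), then the low bit_len(n) - 1 bits of n.
fn elias_delta_bits(n: u64) -> u32 {
    assert!(n >= 1, "Elias delta is undefined for 0");
    let b = bit_len(n);
    elias_gamma_bits(b as u64) + (b - 1)
}
```

With these, a value at the top of the 32-bit range such as `u32::MAX as u64` costs 40 bits in LEB128 and 42 bits in Elias delta, and `u64::MAX` costs 80 bits in LEB128. Averaging these lengths over a sampled distribution gives the per-value baseline numbers to compare against.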

The codec is parametric (you tune J = jumpstarter width, d = tier depth) so you can optimize for your distribution.

Implementation: Full Rust library with streaming BitReader/BitWriter, benchmarks against LEB128/Elias, and a formal whitepaper with proofs.

GitHub: https://github.com/coldshalamov/lotus

Whitepaper: https://docs.google.com/document/d/1CuUPJ3iI87irfNXLlMjxgF1Lr14COlsrLUQz4SXQ9Qw/edit?usp=drivesdk

What I’m looking for:

• What prior art am I missing? (I cite the Elias codes and LEB128, but there’s probably more)

• Does this map cleanly to existing work in information theory or is the “density reclaiming” framing actually novel?

• Any obvious bugs in my benchmark methodology or claims?

• If this seems solid, any suggestions on cleaning it up for an arXiv submission (cs.IT or cs.DS)?

I’m an independent dev with no academic affiliation. I’ve had a hell of a time even getting endorsed to submit to arXiv, but I’m working on it, so any pointers on improving rigor or finding relevant related work would be hugely appreciated.