Built a new integer codec (Lotus) that beats LEB128/Elias codes on many ranges – looking for feedback on gaps/prior art before arXiv submission
I’ve been sitting on this a while, I’d tried to post it to Reddit months ago but it got removed for who-knows-why, I love digg, always have, so I’d love nothing more if the first place it saw the light of day was D-I-double-G-dot-com.
I designed and implemented an integer compression codec called Lotus that reclaims the “wasted” representational space in standard binary encoding by treating each distinct bitstring (including leading zeros) as a unique value.
Core idea: Instead of treating \`1\`, \`01\`, \`001\` as the same number, Lotus maps every bitstring of length L to a contiguous integer range, then uses a small tiered header (anchored by a fixed-width “jumpstarter”) to make it self-delimiting.
Why it matters: On uniform 32-bit and 64-bit integer distributions, Lotus consistently beats:
• LEB128 (the protobuf varint) by \~2–5 bits/value
• Elias Delta/Omega by \~3–4 bits/value
• All classic universal codes across broad ranges
The codec is parametric (you tune J = jumpstarter width, d = tier depth) so you can optimize for your distribution.
Implementation: Full Rust library with streaming BitReader/BitWriter, benchmarks against LEB128/Elias, and a formal whitepaper with proofs.
GitHub: https://github.com/coldshalamov/lotus
Whitepaper: https://docs.google.com/document/d/1CuUPJ3iI87irfNXLlMjxgF1Lr14COlsrLUQz4SXQ9Qw/edit?usp=drivesdk
What I’m looking for:
• What prior art am I missing? (I cite Elias codes, LEB128, but there’s probably more)
• Does this map cleanly to existing work in information theory or is the “density reclaiming” framing actually novel?
• Any obvious bugs in my benchmark methodology or claims?
• If this seems solid, any suggestions on cleaning it up for an arXiv submission (cs.IT or cs.DS)?
I’m an independent dev with no academic affiliation, I’ve had a hell of a time even getting endorsed to publish in arXive but I’m working on it, so any pointers on improving rigor or finding relevant related work would be hugely appreciated.
0 Comments