Zixi "Oz" Li (OzTianlu) · PRO
28 followers · 30 following
GitHub: https://github.com/lizixi-0x2F
AI & ML interests
My research focuses on deep reasoning with small language models, Transformer architecture innovation, and knowledge distillation for efficient alignment and transfer.
Recent Activity
Liked a model (4 days ago): google/gemma-4-26B-A4B-it
Reacted to their post with 🤗 (10 days ago):
https://github.com/lizixi-0x2F/March

I just released March, an open-source, high-performance KV cache sharing library for LLM inference that uses Trie-based prefix deduplication.

When you run LLM services, you often see thousands of requests sharing the same system prompt and conversation history. But traditional KV cache systems store each sequence separately, duplicating the exact same data over and over again. Pure waste.

March uses a Trie structure to automatically detect and reuse identical token prefixes. Instead of storing [system_prompt + history] 1000 times, it's stored once. Everyone shares it.

- 80-97% memory reduction in prefix-heavy workloads (tested on SmolLM2-135M with 500 multi-turn conversations)
- Zero-copy queries: returns direct pointers into the memory pool, with no expensive memcpy on the hot path
- Predictable memory usage: fixed-size page pool with O(L) complexity
- Trade-off: slightly slower than an O(1) dict lookup, but the memory savings are worth it in production
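The post describes the idea but not March's API. As a rough illustration only, here is a minimal Python sketch of trie-based prefix sharing over a fixed-size slot pool. The class `PrefixKVCache`, its methods, and the placeholder strings standing in for real K/V tensors are all hypothetical and are not taken from March.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class _Node:
    # Children keyed by the next token ID. A node's position in the trie
    # identifies one exact token prefix.
    children: Dict[int, "_Node"] = field(default_factory=dict)
    # Index of the pool slot holding this token's K/V entry (None for the root).
    slot: Optional[int] = None


class PrefixKVCache:
    """Hypothetical trie-based KV cache backed by a fixed-size slot pool."""

    def __init__(self, num_slots: int):
        # Fixed-size pool: memory is bounded up front. Each slot stands in
        # for one token's K/V tensors.
        self.pool: List[Optional[object]] = [None] * num_slots
        # Stack of free slot indices; pop() hands out 0, 1, 2, ... in order.
        self.free: List[int] = list(range(num_slots - 1, -1, -1))
        self.root = _Node()

    def insert(self, tokens: Sequence[int], kv_entries: Sequence[object]) -> None:
        """Store a sequence; only tokens past the longest shared prefix use new slots."""
        if len(tokens) != len(kv_entries):
            raise ValueError("tokens and kv_entries must have the same length")
        node = self.root
        for tok, kv in zip(tokens, kv_entries):
            child = node.children.get(tok)
            if child is None:
                # New prefix: a token's K/V depends on every token before it,
                # so it can only be shared when the whole prefix matches.
                if not self.free:
                    raise MemoryError("slot pool exhausted")
                slot = self.free.pop()
                self.pool[slot] = kv
                child = _Node(slot=slot)
                node.children[tok] = child
            node = child

    def match_prefix(self, tokens: Sequence[int]) -> List[int]:
        """Return pool slot indices for the longest cached prefix (no K/V copies)."""
        node, slots = self.root, []
        for tok in tokens:  # one trie step per token: O(L) for L tokens
            node = node.children.get(tok)
            if node is None:
                break
            slots.append(node.slot)
        return slots

    def used_slots(self) -> int:
        return len(self.pool) - len(self.free)


if __name__ == "__main__":
    cache = PrefixKVCache(num_slots=64)
    system = [1, 2, 3, 4]  # hypothetical token IDs for a shared system prompt
    for user_turn in ([10, 11], [20, 21], [30, 31]):
        seq = system + user_turn
        cache.insert(seq, [f"kv{t}" for t in seq])  # placeholders, not real tensors

    print(cache.used_slots())  # → 10 (4 shared + 3 * 2 unique, vs 18 unshared)
    print(cache.match_prefix(system + [20, 99]))  # → [0, 1, 2, 3, 6]
```

In this sketch, `match_prefix` hands back slot indices into the pool rather than copies of the entries, a simple stand-in for the direct pointers the post mentions, and each lookup walks one trie node per token, so it costs O(L) for a prompt of L tokens.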
Posted an update (10 days ago); the text is identical to the post above.
OzTianlu's activity
Authored a paper (3 months ago)
Reasoning: From Reflection to Solution
Paper 2511.11712 · Published Nov 12, 2025