Research
KV Cache Compression
- Institution: The Hong Kong Polytechnic University
- Category: Machine Learning
- Highlights: Compression
- Term: Ongoing
Description:
Integrating a new lossy compressor, written in CUDA, into the vLLM library to compress the cached key-value (KV) data held in GPU memory and reduce its memory footprint.
Link to the vLLM library: vLLM - LLM inference & serving
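To make the idea concrete, here is a minimal, hypothetical sketch of one common lossy KV-cache compression approach: per-chunk INT8 quantization of FP16 key/value data in a CUDA kernel. This is not the project's actual compressor and does not use vLLM's internal APIs; the kernel name, chunk size, and host driver are illustrative assumptions only.

```cuda
// Hypothetical sketch: per-chunk INT8 quantization of FP16 KV data (lossy,
// ~16 bits -> ~8 bits per element plus one float scale per chunk).
// Names (quantize_kv_block, BLOCK_SIZE) are illustrative, not vLLM APIs.
#include <cuda_fp16.h>
#include <cstdio>
#include <vector>

constexpr int BLOCK_SIZE = 128;  // elements that share one quantization scale

__global__ void quantize_kv_block(const __half* __restrict__ kv_in,
                                  int8_t* __restrict__ kv_out,
                                  float* __restrict__ scales,
                                  int n_elems) {
    __shared__ float absmax_shared;
    const int chunk = blockIdx.x;
    const int base  = chunk * BLOCK_SIZE;
    const int tid   = threadIdx.x;

    // 1) Find the max absolute value in this chunk.
    float local_max = 0.0f;
    for (int i = tid; i < BLOCK_SIZE && base + i < n_elems; i += blockDim.x)
        local_max = fmaxf(local_max, fabsf(__half2float(kv_in[base + i])));

    if (tid == 0) absmax_shared = 0.0f;
    __syncthreads();
    // Bit-reinterpreted atomicMax is valid here because all values are >= 0.
    atomicMax(reinterpret_cast<int*>(&absmax_shared), __float_as_int(local_max));
    __syncthreads();

    float scale = absmax_shared / 127.0f;
    if (scale == 0.0f) scale = 1.0f;          // all-zero chunk: avoid div by zero
    if (tid == 0) scales[chunk] = scale;

    // 2) Quantize each element to int8 using the shared per-chunk scale.
    for (int i = tid; i < BLOCK_SIZE && base + i < n_elems; i += blockDim.x) {
        float v = __half2float(kv_in[base + i]) / scale;
        kv_out[base + i] = static_cast<int8_t>(lrintf(fminf(fmaxf(v, -127.0f), 127.0f)));
    }
}

int main() {
    const int n = 4 * BLOCK_SIZE;
    std::vector<__half> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = __float2half(0.01f * (i % 200) - 1.0f);

    __half* d_in; int8_t* d_out; float* d_scales;
    cudaMalloc(&d_in, n * sizeof(__half));
    cudaMalloc(&d_out, n * sizeof(int8_t));
    cudaMalloc(&d_scales, (n / BLOCK_SIZE) * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(__half), cudaMemcpyHostToDevice);

    quantize_kv_block<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out, d_scales, n);
    cudaDeviceSynchronize();

    std::vector<int8_t> h_out(n);
    std::vector<float> h_scales(n / BLOCK_SIZE);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int8_t), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_scales.data(), d_scales, h_scales.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Show the lossy round trip for one element: value is approximately recovered.
    printf("original %.4f  reconstructed %.4f\n",
           __half2float(h_in[5]), h_out[5] * h_scales[0]);

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_scales);
    return 0;
}
```

The per-chunk scale amortizes metadata cost while bounding quantization error to each chunk's dynamic range; the actual project may use a different codec, block layout, or error model.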