Research
KV Cache Compression
- Institution: The Hong Kong Polytechnic University
- Category: Machine Learning
- Highlights: Compression
- Term: Ongoing
Description:
Integrating a new lossy compressor, written in CUDA, into the vLLM library to compress the cached key-value (KV) data held in GPU memory and reduce its memory footprint.
Link to the vLLM library: vLLM - LLM inference & serving
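To make the idea concrete, here is a minimal, hypothetical sketch of one common lossy KV-cache compression approach: per-chunk INT8 quantization of FP16 key/value data in a CUDA kernel. This is not the project's actual compressor and does not use vLLM's internal APIs; the kernel name, chunk size, and host driver are illustrative assumptions only.

```cuda
// Hypothetical sketch: per-chunk INT8 quantization of FP16 KV data (lossy,
// ~16 bits -> ~8 bits per element plus one float scale per chunk).
// Names (quantize_kv_block, BLOCK_SIZE) are illustrative, not vLLM APIs.
#include <cuda_fp16.h>
#include <cstdio>
#include <vector>

constexpr int BLOCK_SIZE = 128;  // elements that share one quantization scale

__global__ void quantize_kv_block(const __half* __restrict__ kv_in,
                                  int8_t* __restrict__ kv_out,
                                  float* __restrict__ scales,
                                  int n_elems) {
    __shared__ float absmax_shared;
    const int chunk = blockIdx.x;
    const int base  = chunk * BLOCK_SIZE;
    const int tid   = threadIdx.x;

    // 1) Find the max absolute value in this chunk.
    float local_max = 0.0f;
    for (int i = tid; i < BLOCK_SIZE && base + i < n_elems; i += blockDim.x)
        local_max = fmaxf(local_max, fabsf(__half2float(kv_in[base + i])));

    if (tid == 0) absmax_shared = 0.0f;
    __syncthreads();
    // Bit-reinterpreted atomicMax is valid here because all values are >= 0.
    atomicMax(reinterpret_cast<int*>(&absmax_shared), __float_as_int(local_max));
    __syncthreads();

    float scale = absmax_shared / 127.0f;
    if (scale == 0.0f) scale = 1.0f;          // all-zero chunk: avoid div by zero
    if (tid == 0) scales[chunk] = scale;

    // 2) Quantize each element to int8 using the shared per-chunk scale.
    for (int i = tid; i < BLOCK_SIZE && base + i < n_elems; i += blockDim.x) {
        float v = __half2float(kv_in[base + i]) / scale;
        kv_out[base + i] = static_cast<int8_t>(lrintf(fminf(fmaxf(v, -127.0f), 127.0f)));
    }
}

int main() {
    const int n = 4 * BLOCK_SIZE;
    std::vector<__half> h_in(n);
    for (int i = 0; i < n; ++i) h_in[i] = __float2half(0.01f * (i % 200) - 1.0f);

    __half* d_in; int8_t* d_out; float* d_scales;
    cudaMalloc(&d_in, n * sizeof(__half));
    cudaMalloc(&d_out, n * sizeof(int8_t));
    cudaMalloc(&d_scales, (n / BLOCK_SIZE) * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(__half), cudaMemcpyHostToDevice);

    quantize_kv_block<<<n / BLOCK_SIZE, BLOCK_SIZE>>>(d_in, d_out, d_scales, n);
    cudaDeviceSynchronize();

    std::vector<int8_t> h_out(n);
    std::vector<float> h_scales(n / BLOCK_SIZE);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int8_t), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_scales.data(), d_scales, h_scales.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // Show the lossy round trip for one element: value is approximately recovered.
    printf("original %.4f  reconstructed %.4f\n",
           __half2float(h_in[5]), h_out[5] * h_scales[0]);

    cudaFree(d_in); cudaFree(d_out); cudaFree(d_scales);
    return 0;
}
```

The per-chunk scale amortizes metadata cost while bounding quantization error to each chunk's dynamic range; the actual project may use a different codec, block layout, or error model.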