Autoregressive Model Parallel Inference Efficiency
Autoregressive models are hard to run efficiently at inference time because of their inherently sequential structure: each output depends on the outputs generated before it. Learn about a new temporal fusion framework that enables parallel inference of multiple requests in autoregressive models, achieving significant speed-ups over state-of-the-art solutions.