Deploying GPT-J and T5 with NVIDIA Triton Inference Server
This post is a guide to optimized inference of large transformer models such as EleutherAI’s GPT-J 6B and Google’s T5-3B. Both of these models demonstrate good results in many downstream tasks and are among the most available to researchers and data scientists.