text-generation-inference documentation
External Resources
Getting started
Text Generation InferenceQuick TourSupported ModelsUsing TGI with Nvidia GPUsUsing TGI with AMD GPUsUsing TGI with Intel GaudiUsing TGI with AWS Trainium and InferentiaUsing TGI with Google TPUsUsing TGI with Intel GPUsInstallation from sourceMulti-backend supportInternal ArchitectureUsage Statistics
Tutorials
Consuming TGIPreparing Model for ServingServing Private & Gated ModelsUsing TGI CLIDeploying on AWS (EC2 and SageMaker)Non-core Model ServingSafetyUsing Guidance, JSON, toolsVisual Language ModelsMonitoring TGI with Prometheus and GrafanaTrain Medusa
Backends
Reference
Conceptual Guides
External Resources
- Adyen wrote a detailed article about the interplay between TGI’s main components: router and server. LLM inference at scale with TGI (Martin Iglesias Goyanes - Adyen, 2024)