Course Outline

Tencent Hunyuan Production Fundamentals

  • Overview of Tencent Hunyuan model serving scenarios
  • Production characteristics of large and MoE models
  • Common latency, throughput, and cost bottlenecks
  • Defining service-level objectives for inference workloads

Deployment Architecture and Serving Flow

  • Core components of a production inference stack
  • Choosing between containerized, on-premise, and cloud deployment models
  • Model loading, request routing, and GPU allocation basics
  • Designing for reliability and operational simplicity

Latency Optimization in Practice

  • Using optimized inference engines such as TensorRT where applicable
  • KV-cache concepts and practical cache tuning
  • Reducing startup, warmup, and response overhead
  • Measuring time to first token and token generation speed

Throughput, Batching, and GPU Efficiency

  • Continuous batching and request batching strategies
  • Managing concurrency and queue behavior
  • Improving GPU utilization without harming user experience
  • Handling long-context and mixed-workload requests

Quantization and Cost Control

  • Why quantization matters for production serving
  • Practical trade-offs of FP16, INT8, and other common precision options
  • Balancing model quality, latency, and infrastructure cost
  • Building a simple cost optimization checklist

Operations, Monitoring, and Readiness Review

  • Autoscaling triggers for inference services
  • Monitoring latency, throughput, cache usage, and GPU health
  • Logging, alerting, and incident response basics
  • Reviewing a reference deployment and creating an improvement plan

Requirements

  • Basic understanding of large language model deployment and inference workflows
  • Experience with containers, cloud or on-premise infrastructure, and API-based services
  • Working knowledge of Python or systems engineering

Audience

  • ML engineers deploying LLMs into production
  • Platform engineers responsible for GPU-based inference services
  • Solution architects designing scalable AI serving platforms

Duration: 14 hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customised Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times arranged around your team's availability.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group (live online training): from £3,200 + VAT*

Contact us for an exact quote and to hear about our latest promotions.
