Course Outline
Foundations of Agentic Systems in Production
- Agentic architectures: loops, tools, memory, and orchestration layers
- Lifecycle of agents: development, deployment, and continuous operation
- Challenges of production-scale agent management
Infrastructure and Deployment Models
- Deploying agents in containerized and cloud environments
- Scaling patterns: horizontal vs vertical scaling, concurrency, and throttling
- Multi-agent orchestration and workload balancing
Monitoring and Observability
- Key metrics: latency, success rate, memory usage, and agent call depth
- Tracing agent activity and call graphs
- Instrumenting observability using Prometheus, OpenTelemetry, and Grafana
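As an illustration of the key metrics listed above, the following is a minimal standard-library sketch, not course material. The `AgentCall` record, its field names, and the nearest-rank percentile method are assumptions made for the example. In practice these values would usually be exported through a metrics library rather than computed by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class AgentCall:
    """Hypothetical record of one agent invocation."""
    latency_s: float  # wall-clock latency in seconds
    ok: bool          # whether the call succeeded
    depth: int        # nesting depth of agent-to-agent calls


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(calls):
    """Aggregate success rate, p95 latency, and maximum call depth."""
    return {
        "success_rate": sum(c.ok for c in calls) / len(calls),
        "p95_latency_s": percentile([c.latency_s for c in calls], 95),
        "max_call_depth": max(c.depth for c in calls),
    }
```

Calling `summarize` on a batch of recorded calls yields one small dictionary per reporting window, which could then be pushed to whatever dashboarding tool is in use.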
Logging, Auditing, and Compliance
- Centralized logging and structured event collection
- Compliance and auditability in agentic workflows
- Designing audit trails and replay mechanisms for debugging
Performance Tuning and Resource Optimization
- Reducing inference overhead and optimizing agent orchestration cycles
- Model caching and lightweight embeddings for faster retrieval
- Load testing and stress scenarios for AI pipelines
Cost Control and Governance
- Understanding agent cost drivers: API calls, memory, compute, and external integrations
- Tracking agent-level costs and implementing chargeback models
- Automation policies to prevent agent sprawl and idle resource consumption
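To make the idea of agent-level cost tracking and chargeback concrete, here is a minimal sketch. The `UsageEvent` fields, the team names, and the per-token prices are all hypothetical placeholders. Real prices depend on the model provider, and a real ledger would also account for compute and external integrations.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens (assumed for illustration only).
PRICE_PER_1K_INPUT = 0.5
PRICE_PER_1K_OUTPUT = 1.5


@dataclass(frozen=True)
class UsageEvent:
    """One billable model call made by an agent on behalf of a team."""
    agent: str
    team: str
    input_tokens: int
    output_tokens: int


def event_cost(event):
    """Cost of a single call under the assumed per-token prices."""
    return (
        event.input_tokens * PRICE_PER_1K_INPUT
        + event.output_tokens * PRICE_PER_1K_OUTPUT
    ) / 1000


def chargeback(events):
    """Sum costs per team so each team can be billed for its agents."""
    totals = defaultdict(float)
    for event in events:
        totals[event.team] += event_cost(event)
    return dict(totals)
```

Grouping by `team` rather than by `agent` is a design choice for the chargeback case. Grouping by `agent` instead gives the per-agent view that helps spot idle or runaway agents.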
CI/CD and Rollout Strategies for Agents
- Integrating agent pipelines into CI/CD systems
- Testing, versioning, and rollback strategies for iterative agent updates
- Progressive rollouts and safe deployment mechanisms
Failure Recovery and Reliability Engineering
- Designing for fault tolerance and graceful degradation
- Retry, timeout, and circuit breaker patterns for agent reliability
- Incident response and post-mortem frameworks for AI operations
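The retry and circuit breaker patterns named above can be sketched as follows. This is a simplified, single-threaded illustration under stated assumptions: the class and function names, the thresholds, and the injectable `clock` and `sleep` parameters are all hypothetical, and production code would normally add jitter, per-call timeouts, and thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit
        return result


def retry(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fn, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not hammer a dependency the breaker has isolated
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The two pieces compose: wrapping a tool call as `retry(lambda: breaker.call(tool))` retries transient failures but stops immediately once the breaker opens.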
Capstone Project
- Build and deploy an agentic AI system with full monitoring and cost tracking
- Simulate load, measure performance, and optimize resource usage
- Present final architecture and monitoring dashboard to peers
Summary and Next Steps
Requirements
- Strong understanding of MLOps and production machine learning systems
- Experience with containerized deployments (Docker/Kubernetes)
- Familiarity with cloud cost optimization and observability tools
Audience
- MLOps engineers
- Site Reliability Engineers (SREs)
- Engineering managers overseeing AI infrastructure
Delivery Options
Private Group Training
Our identity is rooted in delivering exactly what our clients need.
- Pre-course call with your trainer
- Customisation of the learning experience to achieve your goals
- Bespoke outlines
- Practical hands-on exercises containing data / scenarios recognisable to the learners
- Training scheduled on a date of your choice
- Delivered online, onsite/classroom or hybrid by experts sharing real world experience
Private group prices: RRP from £5700 for online delivery, based on a group of 2 delegates, plus £1800 per additional delegate (excludes any certification / exam costs). We recommend a maximum group size of 12 for most learning events.
Contact us for an exact quote and to hear our latest promotions
Public Training
Please see our public courses
Testimonials (3)
Good mix of knowledge and practice
Ion Mironescu - Facultatea S.A.I.A.P.M.
Course - Agentic AI for Enterprise Applications
The mix of theory and practice and of high level and low level perspectives
Ion Mironescu - Facultatea S.A.I.A.P.M.
Course - Autonomous Decision-Making with Agentic AI
practical exercises