Course Outline
Introduction to Vision-Language Models
- Overview of VLMs and their role in multimodal AI
- Popular architectures: CLIP, Flamingo, BLIP, etc.
- Use cases: search, captioning, autonomous systems, content analysis
Preparing the Fine-Tuning Environment
- Setting up OpenCLIP and other VLM libraries
- Dataset formats for image-text pairs
- Preprocessing pipelines for vision and language inputs
Fine-Tuning CLIP and Similar Models
- Contrastive loss and joint embedding spaces
- Hands-on: fine-tuning CLIP on custom datasets
- Handling domain-specific and multilingual data
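As an illustration of the contrastive loss topic above, here is a minimal pure-Python sketch of a CLIP-style symmetric contrastive loss over a small batch of image-text pairs. The function names, the toy embeddings, and the fixed temperature of 0.07 are assumptions made for this example; in practice the loss would normally be computed on tensors within a training framework such as PyTorch, and the temperature is often a learned parameter.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.

    Pair i (image i, text i) is treated as the only positive for row i
    and column i; every other pairing in the batch acts as a negative.
    The temperature value is an illustrative assumption.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should peak on the diagonal
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: each column should peak on the diagonal
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return (loss_i2t + loss_t2i) / 2


# Toy 2-D embeddings: matched pairs point in the same direction
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = clip_contrastive_loss(images, aligned_texts)
swapped_loss = clip_contrastive_loss(images, swapped_texts)
```

With these toy inputs, the correctly paired batch yields a much lower loss than the swapped batch, which is the behaviour fine-tuning tries to encourage in the joint embedding space.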
Advanced Fine-Tuning Techniques
- Using LoRA and adapter-based methods for efficiency
- Prompt tuning and visual prompt injection
- Zero-shot vs. fine-tuned evaluation trade-offs
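To make the efficiency argument behind LoRA concrete, the following sketch shows the low-rank update applied to a single linear layer and compares trainable parameter counts. The helper names, the 768-dimensional layer, and the rank of 8 are hypothetical values chosen for illustration; they are not taken from any specific model or library.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W is the frozen d_out x d_in weight, A is r x d_in and B is d_out x r.
    Only A and B would be trained; W stays fixed.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_in, d_out, r):
    """Number of trainable parameters added by one rank-r LoRA pair."""
    return r * (d_in + d_out)


# Tiny example: a 2 x 3 frozen weight with a rank-1 adapter
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]          # r x d_in
B_zero = [[0.0], [0.0]]        # d_out x r, zero-initialised so the layer starts unchanged
x = [1.0, 2.0, 3.0]

y_start = lora_forward(W, A, B_zero, x, alpha=1.0)

# Parameter comparison for a hypothetical 768 x 768 projection at rank 8
full_params = 768 * 768
lora_params = lora_trainable_params(768, 768, 8)
```

Initialising B to zero means the adapted layer reproduces the frozen layer's output at the start of training, and the parameter comparison shows why adapter-based methods are far cheaper to train than full fine-tuning of the same layer.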
Evaluation and Benchmarking
- Metrics for VLMs: retrieval accuracy, BLEU, CIDEr, recall
- Visual-text alignment diagnostics
- Visualizing embedding spaces and misclassifications
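As a small illustration of the retrieval metrics listed above, the sketch below computes Recall@K from an image-to-text similarity matrix. It assumes that "recall" refers to Recall@K, as is common in retrieval evaluation, and that the correct text for query i sits at index i; the function name and the toy scores are hypothetical.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose correct match appears in the top-k results.

    similarity[i][j] is the score between query i and candidate j.
    The ground-truth match for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix for three image queries against three captions
similarity = [
    [0.9, 0.1, 0.0],  # query 0: correct caption ranked first
    [0.2, 0.3, 0.8],  # query 1: correct caption ranked second
    [0.1, 0.7, 0.6],  # query 2: correct caption ranked second
]

r_at_1 = recall_at_k(similarity, 1)
r_at_2 = recall_at_k(similarity, 2)
```

Comparing the metric at several values of K, before and after fine-tuning, is one simple way to quantify the zero-shot versus fine-tuned trade-off on a held-out retrieval set.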
Deployment and Use in Real Applications
- Exporting models for inference (TorchScript, ONNX)
- Integrating VLMs into pipelines or APIs
- Resource considerations and model scaling
Case Studies and Applied Scenarios
- Media analysis and content moderation
- Search and retrieval in e-commerce and digital libraries
- Multimodal interaction in robotics and autonomous systems
Summary and Next Steps
Requirements
- An understanding of deep learning for vision and NLP
- Experience with PyTorch and transformer-based models
- Familiarity with multimodal model architectures
Audience
- Computer vision engineers
- AI developers
14 Hours
Delivery Options
Private Group Training
Our identity is rooted in delivering exactly what our clients need.
- Pre-course call with your trainer
- Customisation of the learning experience to achieve your goals
- Bespoke outlines
- Practical hands-on exercises containing data / scenarios recognisable to the learners
- Training scheduled on a date of your choice
- Delivered online, onsite/classroom or hybrid by experts sharing real-world experience
Private group prices: RRP from £3,800 for online delivery, based on a group of 2 delegates, plus £1,200 per additional delegate (excludes any certification / exam costs). We recommend a maximum group size of 12 for most learning events.
Contact us for an exact quote and to hear about our latest promotions.
Public Training
Please see our public courses