Lecture 5

Presenter

Name: Shafik Quoraishee
Topic: Foundations & Architectures of Multi-Modal Machine Learning
Description: In this lecture, Shafik provides a deep technical dive into multi-modal machine learning, focusing on the models and architectures behind modern AI capabilities such as image understanding, video generation, and cross-modal reasoning. The session explores how modalities such as text, images, and video are processed and combined within real-world systems, offering practical insight into the technologies that power today’s AI applications. If time permits, the discussion will also cover emerging approaches to video understanding models and their role in next-generation AI systems.

Understanding the core architectures that enable multi-modal AI systems.
How image, text, and video modalities are combined in modern machine learning models.
Practical foundations for reasoning about real-world applications of multi-modal AI.