Tutorial Speakers

  • Prof. Vittorio Ferrari
    University of Edinburgh
    Google Research
    Knowledge transfer and human-machine collaboration for training visual models
    Vittorio Ferrari is a Professor at the School of Informatics of the University of Edinburgh and a Research Scientist at Google, leading a research group on visual learning in each institution. He received his PhD from ETH Zurich in 2004 and was a post-doctoral researcher at INRIA Grenoble in 2006-2007 and at the University of Oxford in 2007-2008. Between 2008 and 2012 he was an Assistant Professor at ETH Zurich, funded by a Swiss National Science Foundation Professorship grant. He received the prestigious ERC Starting Grant and the best paper award from the European Conference on Computer Vision, both in 2012. He is the author of over 90 technical publications. He regularly serves as an Area Chair for the major computer vision conferences, and he will be a Program Chair for ECCV 2018 and a General Chair for ECCV 2020. He is an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence. His current research interests are in learning visual models with minimal human supervision, object detection, and semantic segmentation.
    Object class detection and segmentation are challenging tasks that typically require tedious and time-consuming manual annotation for training. In this talk I will present three techniques we recently developed for reducing this effort. In the first part I will explore a knowledge transfer scenario: training object detectors for target classes with only image-level labels, helped by a set of source classes with bounding-box annotations. In the second and third parts I will consider human-machine collaboration scenarios (for annotating bounding-boxes of one object class, and for annotating the class label and approximate segmentation of every object and background region in an image).

  • Ivan Laptev
    Research Director, INRIA Paris
    Towards action understanding with less supervision
    Ivan Laptev is a senior researcher at INRIA Paris, France. He received a PhD degree in Computer Science from the Royal Institute of Technology in 2004 and a Habilitation degree from École Normale Supérieure in 2013. Ivan's main research interests include visual recognition of human actions, objects and interactions. He has published over 60 papers at international conferences and journals of computer vision and machine learning. He serves as an associate editor of the IJCV and TPAMI journals and will serve as a program chair for CVPR’18. He was an area chair for CVPR’10, ’13, ’15 and ’16, ICCV’11, ECCV’12 and ’14, and ACCV’14 and ’16, and has co-organized several tutorials, workshops and challenges at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013). He received an ERC Starting Grant in 2012 and was awarded a Helmholtz prize in 2017.
    Despite the impressive progress in static image recognition, action understanding remains a puzzle. The lack of large annotated datasets, the compositional nature of activities and the ambiguities of manual supervision are likely obstacles to a breakthrough. To address these issues, this talk will present alternatives to the fully-supervised approach to action recognition. First, I will discuss methods that can efficiently deal with annotation noise. In particular, I will talk about learning from incomplete and noisy YouTube tags, weakly-supervised action classification from textual descriptions, and weakly-supervised action localization using sparse manual annotation. The second half of the talk will discuss the problem of automatically defining appropriate human actions and will draw relations to robotics.

  • Abhinav Gupta
    Associate Professor, Carnegie Mellon University
    Supersizing and Empowering Visual Learning
    My research focuses on developing representation and reasoning approaches for a deeper understanding of the scene. I am interested in formulating the scene understanding problem in terms of the underlying 3D scene and in developing reasoning approaches based on physical, functional and causal relationships between the different elements in the scene. The key idea is to have a qualitative representation that still has a meaningful grounding in the physical scene. I have been focusing on studying how humans interact with their environment and how their perception of the visual world depends on these interactions and their abilities. Building upon Gibson’s idea of affordances, we have recently proposed the concept of human-centric scene understanding.

  • Zeynep Akata
    Assistant Professor, University of Amsterdam
    Max Planck Institute
    Explaining and Representing Novel Concepts With Minimal Supervision
    Dr. Zeynep Akata is an Assistant Professor at the University of Amsterdam in the Netherlands, Scientific Manager of the Delta Lab and a Senior Researcher at the Max Planck Institute for Informatics in Germany. She holds a BSc degree from Trakya University (2008), an MSc degree from RWTH Aachen (2010) and a PhD degree from the University of Grenoble (2014). After completing her PhD at INRIA Rhône-Alpes with Prof. Dr. Cordelia Schmid, she worked as a post-doctoral researcher at the Max Planck Institute for Informatics with Prof. Dr. Bernt Schiele and as a visiting researcher with Prof. Trevor Darrell at UC Berkeley. She is the recipient of the Lise Meitner Award for Excellent Women in Computer Science in 2014. Her research interests include machine learning combined with vision and language for the task of explainable artificial intelligence (XAI).
    Clearly explaining a rationale for a classification decision to an end-user can be as important as the decision itself. Existing approaches for deep visual recognition are generally opaque and do not output any justification text; contemporary vision-language models can describe image content but fail to take into account class-discriminative image aspects which justify visual predictions. In this talk, I will present my past and current work on Zero-Shot Learning, Vision and Language for Generative Modeling and Explainable Artificial Intelligence, covering (1) how we can generalize image classification models to cases when no visual training data is available, (2) how to generate images and image features using detailed visual descriptions, and (3) how our models focus on discriminating properties of the visible object, jointly predict a class label, and explain why the predicted label is appropriate for the image whereas another label is not.
Copyright ©2017-2018. All Rights Reserved. Contents provided by BMVC2018