Keynote Speakers

Keynote and Plenary Schedule

Sunday, December 22nd

Prof. Siddhartha Chaudhuri
Adobe and IIT Bombay

Keynote Timing: 4:30 - 5:30 PM
Title: Learning to Generate 3D Structures


Virtual 3D objects are central to a range of applications of visual computing, including industrial CAD, artificial intelligence, and entertainment. Since manually sculpting 3D shapes on a computer is difficult and time-consuming, a variety of techniques have been proposed to aid the process by automatically sampling detailed 3D shapes from a distribution over a shape space, optionally conditioned on high-level user input. Such generative distributions can be hard to specify by hand: instead, the models are learned by algorithms that analyze large repositories of existing shapes.

For the synthesized shapes to be widely useful and manipulable, they need to encode not just the low-level monolithic geometry of the object, but also its high-level *structure*, typically represented as a (hyper-)graph whose vertices are meaningful components of the object and edges are relationships between them. Further, the synthesis itself can benefit from a structural understanding of shapes. For instance, knowing that chairs are typically tightly-constrained arrangements of seats, backs and legs enables learning low-dimensional factorized models that respect this structure.

In this talk, I will discuss recent work in learning generative models over collections of 3D structures. Machine learning for vision and graphics is traditionally applied to fixed-dimensional domains, commonly grids of pixels or voxels. However, these methods do not easily extend to domains where each element is a graph with arbitrary, irregular topology. Several approaches have been devised to allow cutting-edge learning methods, such as deep neural networks, to operate on these domains. These methods draw inspiration from diverse areas such as language, biology and perception, and in turn may aid computational analysis in a variety of allied fields where spaces of graphs are studied. I will conclude with a discussion of existing challenges and opportunities for further work.
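To make the "learning on graphs" idea concrete, here is a minimal, hypothetical sketch (not from the talk) of one round of neighbourhood aggregation on a toy chair part graph, using the seat/back/legs example above. The part list, weight matrix, and update rule are all illustrative stand-ins, not any specific published model:

```python
import numpy as np

# Toy part graph of a chair: vertices are meaningful components,
# edges are attachment relationships between them.
parts = ["seat", "back", "leg1", "leg2", "leg3", "leg4"]
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]  # back and legs attach to the seat

# Random initial per-part feature vectors (stand-ins for learned embeddings).
rng = np.random.default_rng(0)
H = rng.standard_normal((len(parts), 8))

# Build the symmetric adjacency matrix of the part graph.
A = np.zeros((len(parts), len(parts)))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1, keepdims=True)

# One round of message passing: each part averages its neighbours'
# features and mixes them with its own through an (illustrative)
# weight matrix W, followed by a nonlinearity.
W = rng.standard_normal((8, 8)) * 0.1
H_new = np.tanh(H + (A @ H) / deg @ W)
print(H_new.shape)
```

The same update applies regardless of how many parts the object has, which is exactly what makes graph-based methods suitable for the irregular topologies the abstract mentions.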


Siddhartha Chaudhuri is a Senior Research Scientist in the Creative Intelligence Lab at Adobe Research, and Assistant Professor (on leave) of Computer Science and Engineering at IIT Bombay. He obtained his Ph.D. from Stanford University, and his undergraduate degree from IIT Kanpur. He subsequently did postdoctoral research at Stanford and Princeton, and taught for a year at Cornell. Siddhartha's work combines geometric analysis, machine learning, and UI innovation to make sophisticated 3D geometric modeling accessible even to non-expert users. He also studies foundational problems in geometry processing (e.g. retrieval, segmentation, correspondences) that arise from this pursuit. His research themes include probabilistic assembly-based modeling, semantic attributes for design, generative neural networks for 3D structures, and other applications of deep learning to 3D geometry processing. He has published several papers on these topics, and has taught tutorials on data-driven 3D design (SIGGRAPH Asia 2014), shape "semantics" (ICVGIP 2016), and structure-aware 3D generation (Eurographics 2019). He is also the original author of the commercial 3D modeling tool Adobe Fuse. More information is available at his website.

Prof. Ardhendu Behera
Edge Hill University

Keynote Timing: 5:30 - 6:30 PM
Title: Computer Vision and Deep Learning – A Marriage of Neuroscience and Machine Learning


For almost a century, human vision researchers have been studying how the human visual system has evolved. While computer vision is a much younger discipline, it has achieved impressive results in many detection and classification tasks (e.g. object recognition, scene classification, face recognition) within a short span of time. Computer vision is one of the fastest growing fields, partly because the amount of video/image data from urban environments is growing exponentially (e.g. 24/7 cameras, social media sources, smart cities). The scale and diversity of these videos/images make it very difficult to extract reliable information in a timely, automated manner. Recently, Deep Convolutional Neural Networks (DCNNs) have shown impressive performance on visual recognition tasks when trained on large-scale datasets. However, such progress faces challenges when rolled into automation and production. These include obtaining enough data of good quality, managing executives' expectations about model performance, responsibility and trustworthiness in decision making, data ingest, storage, security and overall infrastructure, as well as understanding how machine learning differs from software engineering.

In this talk, I will focus on recent progress in advancing human action/activity and behaviour recognition from images/videos, addressing the research challenges of relational learning, deep learning, human pose, human-object interactions and transfer learning. I will then briefly describe some of our recent efforts to address these challenges in automation and robotics, in particular human-robot social interaction, in-vehicle activity monitoring and smart factories.


Ardhendu Behera is a Senior Lecturer (Associate Professor) in the Department of Computer Science at Edge Hill University (EHU). Prior to this, he held post-doc positions at the universities of Fribourg (2006-07) and Leeds (2007-14). He holds a PhD from the University of Fribourg, an MEng from the Indian Institute of Science, Bangalore and a BEng from NIT Allahabad. He leads the visualisation theme of the Data and Complex Systems Research Centre at EHU and is a member of the Visual Computing Lab. His main research interests are computer vision, deep learning, pattern recognition, robotics and artificial intelligence. He applies these interests to interdisciplinary research areas such as monitoring and recognising suspicious behaviour, human-robot social interactions, autonomous vehicles, monitoring driving behaviour, healthcare and patient monitoring, and smart environments. Dr Behera has been involved in various outreach activities, and some of his research has been covered by the media, press, newspapers and television.

Monday, December 23rd

Plenary Speaker: Prof. Andrew Zisserman
University of Oxford

Keynote Timing: 9:30 - 10:30 AM
Title: Recognizing human actions in videos


One of the long-term aims of computer vision is understanding human actions and activities from video. Other tasks in computer vision, such as human face recognition and pose prediction, have seen considerable improvement due to deep learning, and action recognition is now catching up. However, networks for action recognition are very data-hungry to train, and the question is: how can we obtain suitable training data and networks for action recognition?

This talk will review a number of approaches to answering this question, including the classic one: strong supervision using a large-scale action classification dataset (Kinetics); more recent self-supervised approaches using prediction from within the video stream or cross-modal prediction from audio; and finally weak supervision from narrated videos. We will also describe applications of the trained networks to action classification, retrieval and temporal segmentation.
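As a rough illustration of the cross-modal idea (a hypothetical sketch, not the talk's actual method), the pretext task can be posed as matching each video clip to its own audio track, so that no action labels are needed. The embeddings and loss below are illustrative stand-ins for network outputs and an InfoNCE-style contrastive objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Paired video and audio clip embeddings (random stand-ins for the
# outputs of two encoder networks). Each audio embedding is made
# correlated with its own video clip.
n, d = 4, 16
V = rng.standard_normal((n, d))
Aud = V + 0.1 * rng.standard_normal((n, d))

# Cosine-similarity score matrix: entry (i, j) scores video i against
# audio j. Matching pairs lie on the diagonal.
V_n = V / np.linalg.norm(V, axis=1, keepdims=True)
A_n = Aud / np.linalg.norm(Aud, axis=1, keepdims=True)
S = V_n @ A_n.T

# Softmax contrastive loss: each video should assign high probability
# to its own audio track among all candidates in the batch.
P = np.exp(S) / np.exp(S).sum(axis=1, keepdims=True)
loss = -np.log(np.diag(P)).mean()
print(round(loss, 3))
```

Because the supervisory signal comes from the correspondence between the two streams, the loss can be driven down on unlabeled video, which is the appeal of the self-supervised approaches mentioned above.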


Andrew Zisserman is a Royal Society Research Professor at the Department of Engineering Science, University of Oxford, where he heads the Visual Geometry Group (VGG). His research has investigated and made contributions to computer vision, including: multiple view geometry, visual recognition, and large scale retrieval in images and video. He has authored over 400 peer reviewed papers in computer vision, and co-edited and written several books in this area. His recent research focuses on audio and visual recognition. His papers have won best paper awards at international conferences, and several 'test of time' awards.

Prof. Namrata Vaswani
Iowa State University

Keynote Timing: 2:00 - 3:00 PM
Title: Subspace Learning from Bad Data: Phaseless PCA and dynamic Robust PCA


Principal Components Analysis (PCA), a.k.a. subspace learning, is one of the most widely used noise removal and dimension reduction techniques. PCA assumes that the true data ("signal") sequence lies close to a low-dimensional subspace of the ambient space. Said another way, it assumes that the true data sequence forms a low-rank matrix. This talk will discuss our work on two problems that involve PCA and subspace learning from "bad" data. Here the term "bad" can mean one of many things: the observed data is a nonlinear function of the true data, or it has missing entries, and/or it is corrupted by outliers. Applications in video analytics and Fourier ptychography will be shown.
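The low-rank assumption behind PCA can be illustrated with a small, hypothetical sketch (not from the talk): data whose columns lie near an r-dimensional subspace is well approximated by truncating its SVD, which is what PCA computes. All dimensions and noise levels below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a data matrix whose columns lie near a rank-r subspace:
# X = U B + noise, with U an n x r orthonormal basis.
n, q, r = 50, 200, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # true subspace basis
B = rng.standard_normal((r, q))                    # low-dimensional coefficients
X = U @ B + 0.01 * rng.standard_normal((n, q))     # noisy observations

# PCA / subspace learning: the top-r left singular vectors of X
# estimate the subspace, and truncating the SVD at rank r gives the
# best rank-r approximation (Eckart-Young theorem).
Uh, s, Vt = np.linalg.svd(X, full_matrices=False)
X_r = Uh[:, :r] * s[:r] @ Vt[:r]

# The rank-r approximation captures nearly all of the signal energy.
rel_err = np.linalg.norm(X - X_r) / np.linalg.norm(X)
print(round(rel_err, 3))
```

The "bad data" settings in the talk break exactly this clean picture: the entries of X may be observed only through magnitudes, with gaps, or with gross outliers.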

Phaseless PCA is a term we use for the following low-rank phase retrieval (PR) problem: recover a low-rank matrix from magnitude-only (phaseless) linear projections of each of its columns. The name is by analogy with Robust PCA, which refers to the problem of recovering a low-rank matrix from one that is corrupted by additive sparse outliers. Phaseless PCA finds important applications in dynamic phaseless imaging problems, for example Fourier ptychographic imaging of live biological specimens. In recent work (ICML 2019), we introduced the first provably correct solution, called AltMinLowRaP, for this problem. Our guarantee shows that AltMinLowRaP can recover the low-rank matrix using far fewer measurements/samples than standard PR methods need. Moreover, its sample complexity is only a little (r^3 times) worse than the order-optimal one for low-rank recovery, where r is the rank of the unknown low-rank matrix. Extensive experiments on both simulated and real dynamic Fourier ptychographic data demonstrate the practical power of our method. In the second part of this talk, we will briefly describe our older body of work on provably correct, fast, and useful solutions to robust subspace tracking ("dynamic" robust PCA) and its special case, subspace tracking with missing data. Applications in video analytics will be shown.
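The measurement model behind this problem is easy to simulate. The sketch below (hypothetical; all names and dimensions are illustrative, and it sets up the problem rather than implementing AltMinLowRaP) generates magnitude-only projections of each column of a low-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Low-rank matrix X = U B: q columns, each in an r-dimensional subspace.
n, q, r, m = 40, 30, 2, 120        # m phaseless measurements per column
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
B = rng.standard_normal((r, q))
X = U @ B

# Phaseless (magnitude-only) linear projections of each column:
# Y[i, k] = |a_i^T x_k|, with Gaussian measurement vectors a_i as rows of A.
A = rng.standard_normal((m, n))
Y = np.abs(A @ X)                  # the sign/phase of A @ X is lost

print(Y.shape)
```

Note that flipping the sign of any column of X leaves Y unchanged, so recovery is only possible up to a per-column phase; the shared r-dimensional subspace is what lets the low-rank problem get away with fewer measurements than running standard PR independently on each column.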


Namrata Vaswani is the Anderlik Professor of Electrical and Computer Engineering at Iowa State University. She received a Ph.D. in 2004 from the University of Maryland, College Park and a B.Tech. from the Indian Institute of Technology (IIT-Delhi) in India in 1999. Her research interests lie in data science, with a particular focus on statistical machine learning, statistical signal processing, and computer vision. She has served two terms as an Associate Editor for the IEEE Transactions on Signal Processing; as a lead guest-editor for a Proceedings of the IEEE Special Issue (on Rethinking PCA for modern datasets); and is currently serving as an Area Editor for the IEEE Signal Processing Magazine. Vaswani is a recipient of the Iowa State University Mid-Career Achievement in Research Award (2019); the University of Maryland's ECE Distinguished Alumni Award (2019); the Iowa State Early Career Engineering Faculty Research Award (2014); and the 2014 IEEE Signal Processing Society Best Paper Award. Vaswani is a Fellow of the IEEE (class of 2019) for contributions to dynamic structured high-dimensional data recovery.

Tuesday, December 24th

Prof. Amit Roy Chowdhury
University of California, Riverside

Keynote Timing: 9:30 - 10:30 AM
Title: Scalable Computer Vision


There has been tremendous progress in computer vision over the last decade, and research ideas are making their way into various products. However, major challenges remain if computer vision is to make a meaningful impact on a variety of societal needs, e.g., autonomous driving, environmental monitoring, medical applications, etc. These challenges relate to the scalability of existing algorithms to the needs of the applications. In this talk, we will discuss three such challenges: (i) the lack of training data, (ii) the ability to handle large multi-sensor multi-modal data volumes, and (iii) implementation in resource-constrained environments. We will briefly review some of the ideas that have been proposed in this regard, discuss their strengths and limitations, and suggest directions that computer vision scientists should focus on in the future.


Amit Roy-Chowdhury received his PhD from the University of Maryland, College Park (UMCP) in Electrical and Computer Engineering in 2002 and joined the University of California, Riverside (UCR) in 2003, where he is a Professor and Bourns Family Faculty Fellow of Electrical and Computer Engineering, Director of the Center for Research in Intelligent Systems, and Cooperating Faculty in the Department of Computer Science and Engineering. He leads the Video Computing Group at UCR, working on foundational principles of computer vision, image processing, and vision-based statistical learning, with applications in cyber-physical, autonomous and intelligent systems. He has published about 200 papers in peer-reviewed journals and conferences. Prof. Roy-Chowdhury's research has been supported by various US Federal and State agencies and private industries, including the NSF, DoD, Google, and CISCO, receiving well over USD 10M in research funding. He is the first author of the book Camera Networks: The Acquisition and Analysis of Videos Over Wide Areas. He is a Senior Associate Editor of the IEEE Trans. on Image Processing, an Associate Editor of the IEEE Trans. on Pattern Analysis and Machine Intelligence, and on program committees of the main conferences in his area. His students have been first authors on multiple papers that received Best Paper Awards at major international conferences, including ICASSP and ICMR. He is a Fellow of the IEEE and IAPR, and received the Doctoral Dissertation Advising/Mentoring Award 2019 from UCR.

Prof. Dima Damen
University of Bristol

Keynote Timing: 2:00 - 3:00 PM
Title: A fine-grained perspective onto object interactions


This talk argues for a fine-grained perspective on human-object interactions in video sequences. I will present approaches for determining skill or expertise from video sequences [CVPR 2019], assessing action 'completion' – i.e. when an interaction is attempted but not completed [BMVC 2018], dual-domain and dual-time learning [CVPR 2019, ICCVW 2019], as well as multi-modal approaches using vision, audio and language [ICCV 2019, BMVC 2019]. I will also introduce EPIC-KITCHENS [ECCV 2018], the recently released largest dataset of object interactions in people's homes, recorded using wearable cameras. The dataset includes 11.5M frames fully annotated with objects and actions, based on unique annotations from the participants narrating their own videos, thus reflecting true intention. Three open challenges are now available on object detection, action recognition and action anticipation.


Dima Damen is a Reader (Associate Professor) in Computer Vision at the University of Bristol, United Kingdom. She received her PhD from the University of Leeds (2009). Dima's research interests are in the automatic understanding of object interactions, actions and activities using static and wearable visual (and depth) sensors. Dima co-chaired BMVC 2013, has been an area chair for BMVC (2014-2019), and is an associate editor of Pattern Recognition (2017-) and IET Computer Vision (2013-). She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer at ICCV17, CVPR13 and CVPR12. She currently supervises 8 PhD students and 3 postdoctoral researchers. More details are available at her website.