Difference-of-Convex Convolutional Neural Networks (DC-CNN)

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

The goal of computer vision is to give machines the ability to see, that is, to understand an image as a human would. This consists of identifying which object categories are present in an image and where (car, person, trees), their actions (running, driving, sitting), and their relative locations (person inside the car, car above the road). The challenge of computer vision lies in obtaining a powerful discriminative representation of an image that allows us to infer the scene encoded within it. Consider, for instance, the representation of an image as captured by a camera, which consists of the color values of each of its pixels. Two images that differ greatly in their color values may still depict the same scene (for example, images of the same location at day and night). At the same time, small changes in the color values may result in a completely different scene where objects have moved considerably. This makes raw color values unsuitable for understanding the scene depicted in the image.

Traditional computer vision approaches have relied on hand-designed representations of an image that are more amenable to scene interpretation. Given such a representation, there exist several principled formulations for learning to interpret scenes, which take advantage of the powerful mathematical programming framework of convex optimization. Convex optimization offers many computational advantages: it scales elegantly with the size of the problem, it provides global optimality, it offers convergence guarantees and it can be parallelised over multiple machines without affecting the accuracy.

Recent years have seen the rise of deep learning, and specifically convolutional neural networks (CNN), which aim to automatically obtain the representation of visual data from a large training set. While an automated approach is highly desirable due to its scalability, it comes with the challenge of solving highly complex non-convex mathematical programs. The aim of our research is to overcome these challenges by finding connections between convex optimization and deep learning. The key observation is that the non-convex programs encountered in deep learning for computer vision have a special structure that is closely related to convexity. Specifically, while the mathematical programs are not convex, they are of a difference-of-convex (DC) form. A DC program can be optimized efficiently by an iterative algorithm, which, at each iteration, solves a convex optimization problem.
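To make the iterative scheme concrete, here is a minimal sketch on a toy scalar problem. It uses the concave-convex procedure (CCCP), one standard algorithm of this kind; the abstract does not say which algorithm the project uses, so this choice is an assumption. The objective f(x) = x^4 - 2x^2 and all function names are hypothetical and chosen only for illustration. The objective is written as g(x) - h(x) with g(x) = x^4 and h(x) = 2x^2 both convex. At each iteration h is replaced by its tangent at the current point, which leaves a convex subproblem that here has a closed-form solution.

```python
import math


def g(x):
    """Convex part of the toy objective."""
    return x ** 4


def h(x):
    """Convex function that is subtracted, giving f = g - h."""
    return 2 * x ** 2


def h_grad(x):
    """Derivative of h, used to linearize h at the current iterate."""
    return 4 * x


def objective(x):
    """Non-convex difference-of-convex objective f(x) = x^4 - 2x^2."""
    return g(x) - h(x)


def solve_convex_subproblem(slope):
    """Minimize the convex surrogate x^4 - slope * x.

    Setting the derivative to zero gives 4x^3 = slope, so the
    minimizer is the real cube root of slope / 4.
    """
    return math.copysign(abs(slope / 4) ** (1 / 3), slope)


def cccp(x0, iters=50, tol=1e-12):
    """Run CCCP from x0 and return the final iterate and objective history."""
    x = x0
    history = [objective(x)]
    for _ in range(iters):
        # Linearize h at x, then solve the resulting convex problem exactly.
        x_new = solve_convex_subproblem(h_grad(x))
        history.append(objective(x_new))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


if __name__ == "__main__":
    x_star, hist = cccp(0.5)
    print(x_star, hist[0], hist[-1])
```

Starting from x = 0.5 the iterates approach the local minimizer x = 1, and the objective never increases from one iteration to the next, which is the convergence property that motivates the DC view. In a network the convex subproblem would have no closed form and would need its own solver.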

Our aim is to exploit the structure of DC-CNNs to design the next generation of algorithms for computer vision. Specifically, we will build customized algorithms that will scale up the dimensionality of the CNN by orders of magnitude while keeping the computational cost low. Our algorithms will retain many of the highly desirable benefits of convex programming (convergence, quality guarantees, elegant scaling, distributed computing) while still allowing the automatic estimation of image representations.

The impact of such principled and efficient algorithms is potentially huge. The new CNN architectures that they enable will allow researchers to address significantly more complex visual tasks. Examples include a generative network that can provide a set of diverse future frames for a given video sequence, and an intelligent agent that can crawl the web for images and videos and complete their captions in order to bridge the gap between visual data and searchable content. Our research results will be made publicly available via open source software. The project is also likely to have a large academic impact, consolidating the leadership of the UK in machine learning and computer vision.

Planned Impact

Recent years have witnessed the deployment of computer vision in many real-world applications. Examples include autonomous navigation from companies such as MobilEye, where it is important to understand the scene captured by the sensors placed in a car, and the Kinect sensor for Microsoft XBox, where the human pose has to be estimated automatically from depth images. Established tech-based companies such as Google, Facebook and Microsoft are releasing new software on an almost daily basis, including image search (a standard feature available on almost all search engines these days), camera stabilization (e.g. Motion Stills), and 3D reconstruction (e.g. Seene). The UK, and indeed most countries in the world, are witnessing an unprecedented increase in the number of computer vision based start-ups. However, this is just the start, as computer vision now tries to answer significantly more challenging problems, such as automatically inferring all the high-level components of a scene depicted in a visual sample (images uploaded on Flickr, or videos uploaded on YouTube) in order to bridge the large gap between the amount of visual information and the amount of searchable content on the Internet. As we make more progress in computer vision and related areas of artificial intelligence, the opportunities for new tech-based businesses will grow manifold.

In this context, the proposed research will play a key role in the deployment of the next generation of computer vision solutions. One of the limiting factors of the current technology is that the optimization algorithms used in conjunction with the ubiquitous deep learning framework have several practical and theoretical drawbacks: (i) they are slow, often taking several days to train even on state-of-the-art hardware; (ii) they offer no guarantees of convergence; and (iii) they cannot be easily distributed across multiple computational cores. Our research is geared towards providing a new principled optimization approach that can speed up the training of a deep learning framework significantly, provide convergence guarantees, and naturally lend itself to parallel processing. As a result, researchers and practitioners will be able to cut down the development time for new applications substantially and even address more challenging tasks such as predicting several diverse sets of future frames for a given video, or generating high-resolution and realistic visual samples from textual descriptions.

The potential impact of the project is exemplified by the PI's collaboration with Microsoft Research on two projects that rely heavily on deep learning. The first is aimed at learning a deep discriminative or generative model for visual data, which can be applied to infer human poses from depth images in highly cluttered environments, or to enable gesture-based human-computer interaction by inferring the joint locations of a hand. The second is the optimization of code, which would enable programmers to focus completely on the correctness of their code and leave the optimization for speed to a neural network that is capable of adapting the code to a given data set of samples. The PI is also pursuing research on automatically estimating network architectures with the help of a Google-funded PhD student, which would enable even faster deployment of the core machine learning technology to novel applications. The PI will continue to pursue these collaborations to create practical applications of the methodologies developed through the proposal.

Successful completion of this research will have a high international impact. As mentioned above, it will be of interest both to the established tech-based companies and to the emerging start-ups in the UK and worldwide. It will also be of great interest to the large academic community focused on deep learning.

Publications

 
Description The grant focuses on novel optimization algorithms for deep neural networks, which are widely used to address several machine learning applications. There are three main methodologies that we have developed so far. (1) A smoothed training objective for minimizing the commonly used top-k error criterion for classification (published in ICLR 2018). (2) Efficient optimization of loss functions used for learning to rank a given set of samples in the order of their relevance to a query (published in CVPR 2018). (3) A novel algorithm for deep neural network optimization based on proximal minimization (published in ICLR 2019).
Exploitation Route The code for the three aforementioned methodologies is publicly available. The new training objective can be used to improve the results of other machine learning applications (for example, multilabel classification).
Sectors Digital/Communication/Information Technologies (including Software)

 
Description IIIT 
Organisation International Institute of Information Technology, Hyderabad
Country India 
Sector Academic/University 
PI Contribution The collaboration focuses on learning to round the solutions of a convex relaxation for discrete optimization. Our group was responsible for the problem formulation, as well as the design of the optimization algorithm.
Collaborator Contribution The partner university also contributed to the design of the algorithm, its implementation, and its testing on standard benchmark data sets.
Impact The collaboration has resulted in the following publication: P. Mohapatra, C.V. Jawahar and M. Pawan Kumar. Learning to Round for Discrete Labeling Problems. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.
Start Year 2017
 
Description IST 
Organisation Institute of Science and Technology Austria
Country Austria 
Sector Academic/University 
PI Contribution The collaboration focuses on efficient optimization for ranking based loss functions. Our team identified several special cases of loss functions that are useful in practice, and designed an initial algorithm for their optimization. We also implemented this algorithm, tested it on several standard benchmark data sets, and publicly released the code.
Collaborator Contribution The partner university improved the applicability of the algorithm, improved its runtime, and established theoretical guarantees on the best possible runtime. The partner university also contributed to the code.
Impact The collaboration has resulted in the following publication: Efficient Optimization for Rank-based Loss Functions. P. Mohapatra, M. Rolinek, C. V. Jawahar, V. Kolmogorov, and M. Pawan Kumar. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2018.
Start Year 2017
 
Description IST 
Organisation International Institute of Information Technology, Hyderabad
Country India 
Sector Academic/University 
PI Contribution The collaboration focuses on efficient optimization for ranking based loss functions. Our team identified several special cases of loss functions that are useful in practice, and designed an initial algorithm for their optimization. We also implemented this algorithm, tested it on several standard benchmark data sets, and publicly released the code.
Collaborator Contribution The partner university improved the applicability of the algorithm, improved its runtime, and established theoretical guarantees on the best possible runtime. The partner university also contributed to the code.
Impact The collaboration has resulted in the following publication: Efficient Optimization for Rank-based Loss Functions. P. Mohapatra, M. Rolinek, C. V. Jawahar, V. Kolmogorov, and M. Pawan Kumar. In Proceedings of Computer Vision and Pattern Recognition (CVPR), 2018.
Start Year 2017
 
Description Surrey 
Organisation University of Surrey
Country United Kingdom 
Sector Academic/University 
PI Contribution The partnership focuses on the use of submodular functions to perform marginal estimation in graphical models. Our team was responsible for the analysis of existing and new submodular functions for marginal computation, implementation, and testing on standard benchmark data sets.
Collaborator Contribution The partner university was also involved in the design and analysis of new and existing submodular functions for marginal computation.
Impact The collaboration resulted in the following publication: P. Pansari, C. Russell and M. Pawan Kumar. Worst-case Optimal Submodular Extensions for Marginal Estimation. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2018.
Start Year 2017
 
Title Dense CRF Inference 
Description The software package contains the implementation of all the inference algorithms for dense conditional random fields developed within our team in the past two and a half years. The resulting work has been published at ECCV 2016, CVPR 2017 and AISTATS 2018. 
Type Of Technology Software 
Year Produced 2017 
Impact The software is currently being used within our group to develop further algorithms for dense CRFs, and also within another group at Oxford University with the focus on deep neural networks for semantic segmentation. 
URL https://github.com/oval-group/DenseCRF
 
Title PLCNN 
Description Implementation of a layerwise optimization for deep neural networks based on an accurate and efficient conditional gradient algorithm. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The software is being used within our group to further develop the algorithm for simultaneous optimization of all layers. 
URL https://github.com/oval-group/pl-cnn
 
Description CVPR AC Meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The area chair meeting for CVPR (the premier annual conference in computer vision).
Year(s) Of Engagement Activity 2018
 
Description CVPR AC Meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The area chair meeting for CVPR (the premier annual conference in computer vision).
Year(s) Of Engagement Activity 2017
 
Description ETHZ Summer School 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented our work on "Efficient Frank-Wolfe for Dense CRFs and Piecewise Linear CNNs" at the ETH Zurich summer school for pre-doctoral candidates.
Year(s) Of Engagement Activity 2017