

Unsupervised Representation Learning


Maneesh Kumar Singh

 

Date : 28/11/2018

 

Abstract:

Supervised machine learning using deep neural networks has shown tremendous success on a variety of tasks in machine learning. However, supervised learning on each individual task is neither scalable nor the only way to create world models of visual (and other) phenomena and to do inference on them. Recent research thrusts aim at mitigating and surmounting such problems. The learning of static world models is also called representation learning. Unsupervised representation learning seeks to create structured latent representations to avoid the onerous need to generate supervisory labels and to enable learning of task-independent (universal) representations. In this talk, I will provide an overview of recent efforts in my group at Verisk, carried out in collaboration with various academic partners, on these topics. Most of this work is available in recent publications and accompanying code online.

Bio:

Dr. Singh is Head, Verisk | AI and the Director of Human and Computation Intelligence Lab at Verisk. He leads the R&D efforts for the development of AI and machine learning technologies in a variety of areas including computer vision, natural language processing and speech understanding. Verisk Analytics builds tools for risk assessment, risk forecasting and decision analytics in a variety of sectors including insurance, financial services, energy, government and human resources.
From 2013-2015, Dr. Singh was a Technology Leader in the Center for Vision Technologies at SRI International, Princeton, NJ. At SRI, he was the technical lead for the DARPA Visual Media Reasoning (VMR) project for Automatic Performance Characterization (APC) and led the development and implementation of efficient Pareto-optimal performance curves and a multithreaded APC system for benchmarking more than 40 CV and ML algorithms. Dr. Singh was the Algorithms Lead for the DARPA CwC CHAPLIN project for designing a human-computer collaboration (HCC) system to enable composition of visual narratives (cartoon strips, movies) with effective collaboration between a human actor and the computer. He was also a key performer on the DARPA DTM (Deep Temporal Models) seedling project for designing deep learning algorithms on video data. Previously, Dr. Singh was a Staff Scientist at Siemens Corporate Technology, Princeton, NJ, until 2013. At Siemens, he led and contributed to a large number of projects for the successful development and deployment of computer vision and machine learning technologies in multi-camera security and surveillance, aerial surveillance, advanced driver assistance and intelligent traffic control; industrial inspection; and medical image processing and patient diagnostics. Dr. Singh received his Ph.D. in Electrical and Computer Engineering from the University of Illinois in 2003. He has authored over 35 publications and 15 U.S. and international patents.


Visual recognition of human communications


Dr Joon Son Chung

 

Date : 05/09/2018

 

Abstract:

The objective of this work is visual recognition of human communications. Solving this problem opens up a host of applications, such as transcribing archival silent films or resolving multi-talker simultaneous speech, but most importantly it helps to advance the state of the art in speech recognition by enabling machines to take advantage of the multi-modal nature of human communications.
Training a deep learning algorithm requires a lot of training data. We propose a method to automatically collect, process and generate a large-scale audio-visual corpus from television videos, temporally aligned with the transcript. To build such a data-set, it is essential to know 'who' is speaking 'when'. We develop a ConvNet model that learns a joint embedding of the sound and the mouth images from unlabeled data, and apply this network to the tasks of audio-to-video synchronization and active speaker detection. We also show that the methods developed here can be extended to the problems of generating talking faces from audio and still images, and re-dubbing videos with audio samples from different speakers.
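The joint embedding described here pairs an audio encoder with a mouth-image encoder and trains them so that genuinely synchronized audio/video pairs lie close in a shared space while shifted pairs are pushed apart. The following PyTorch sketch illustrates that idea only; the layer sizes, input shapes and contrastive margin are assumptions, not details of the speaker's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AudioEncoder(nn.Module):
        """Maps a short MFCC clip (B, 1, n_mfcc, T) to a unit-norm embedding."""
        def __init__(self, dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(128, dim)

        def forward(self, x):
            return F.normalize(self.fc(self.net(x).flatten(1)), dim=1)

    class MouthEncoder(nn.Module):
        """Maps a stack of grayscale mouth crops (B, frames, H, W) to a unit-norm embedding."""
        def __init__(self, frames=5, dim=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(frames, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(128, dim)

        def forward(self, x):
            return F.normalize(self.fc(self.net(x).flatten(1)), dim=1)

    def contrastive_loss(a_emb, v_emb, same, margin=0.7):
        """same = 1 for synchronized audio/video pairs, 0 for deliberately shifted ones."""
        d = F.pairwise_distance(a_emb, v_emb)
        return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

    # Toy usage: 8 clips of 5 mouth frames (112x112) paired with 13x20 MFCC windows.
    audio = torch.randn(8, 1, 13, 20)
    video = torch.randn(8, 5, 112, 112)
    same = torch.randint(0, 2, (8,)).float()
    loss = contrastive_loss(AudioEncoder()(audio), MouthEncoder()(video), same)
    loss.backward()

At test time, sweeping the same distance over temporal offsets yields an audio-to-video alignment score, which is how an embedding of this kind can support synchronization and active speaker detection.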
We then propose a number of deep learning models that are able to recognize visual speech at the sentence level. The lip reading performance beats a professional lip reader on videos from BBC television. We demonstrate that if audio is available, then visual information helps to improve speech recognition performance. We also propose methods to enhance noisy audio and to resolve multi-talker simultaneous speech using visual cues.
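Sentence-level visual speech recognition is commonly set up as sequence prediction over characters. The sketch below shows one generic formulation with a small 3D-convolutional front-end, a recurrent layer and a CTC loss; it is an assumed illustration rather than the models proposed in this work, and the vocabulary size, layer sizes and clip dimensions are placeholders.

    import torch
    import torch.nn as nn

    class LipReadingCTC(nn.Module):
        """Maps a mouth-crop video (B, 1, T, H, W) to per-frame character logits for CTC."""
        def __init__(self, vocab=30, hidden=256):
            super().__init__()
            self.frontend = nn.Sequential(
                nn.Conv3d(1, 32, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)), nn.ReLU(),
                nn.Conv3d(32, 64, (3, 3, 3), stride=(1, 2, 2), padding=(1, 1, 1)), nn.ReLU(),
                nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool away space
            )
            self.rnn = nn.GRU(64, hidden, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, vocab + 1)  # +1 for the CTC blank symbol

        def forward(self, video):
            feats = self.frontend(video).squeeze(-1).squeeze(-1)  # (B, 64, T)
            h, _ = self.rnn(feats.transpose(1, 2))                # (B, T, 2*hidden)
            return self.out(h).log_softmax(-1)                    # (B, T, vocab+1)

    # Toy usage: two 40-frame clips of 64x64 mouth crops with dummy 12-character transcripts.
    model = LipReadingCTC()
    log_probs = model(torch.randn(2, 1, 40, 64, 64)).transpose(0, 1)  # CTC wants (T, B, C)
    targets = torch.randint(1, 30, (2, 12))
    input_lengths = torch.full((2,), 40, dtype=torch.long)
    target_lengths = torch.full((2,), 12, dtype=torch.long)
    loss = nn.CTCLoss(blank=30)(log_probs, targets, input_lengths, target_lengths)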
Finally, we explore the problem of speaker recognition. Whereas previous works for speaker identification have been limited to constrained conditions, here we build a new large-scale speaker recognition data-set collected from 'in the wild' videos using an automated pipeline. We propose a number of ConvNet architectures that outperform traditional baselines on this data-set.
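For the speaker recognition part, one generic recipe is to train a ConvNet on spectrograms as a closed-set speaker classifier and reuse its penultimate layer as a speaker embedding for verification. The sketch below is an assumed example of that recipe; the speaker count, feature dimensions and layer sizes are placeholders and not the architectures proposed in the talk.

    import torch
    import torch.nn as nn

    class SpeakerCNN(nn.Module):
        """Classifies a log-mel spectrogram (B, 1, n_mels, T) into one of n_speakers."""
        def __init__(self, n_speakers=1000, emb_dim=512):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 64, 5, stride=2, padding=2), nn.BatchNorm2d(64), nn.ReLU(),
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
                nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # average over time so clip length may vary
            )
            self.embed = nn.Linear(256, emb_dim)        # utterance-level speaker embedding
            self.classify = nn.Linear(emb_dim, n_speakers)

        def forward(self, spec):
            emb = self.embed(self.conv(spec).flatten(1))
            return self.classify(torch.relu(emb)), emb  # logits for training, embedding for verification

    # Toy usage: 4 utterances, 40 mel bins, 300 frames.
    model = SpeakerCNN()
    logits, emb = model(torch.randn(4, 1, 40, 300))
    loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 1000, (4,)))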

Bio:

Joon Son is a recent graduate from the Visual Geometry Group at the University of Oxford, and a research scientist at Naver Corp. His research interests are in computer vision and machine learning.


Visual recognition of human communications


Triantafyllos Afouras

 

Date : 05/09/2018

 

Abstract:

The objective of this work is visual recognition of human communications. Solving this problem opens up a host of applications, such as transcribing archival silent films or resolving multi-talker simultaneous speech, but most importantly it helps to advance the state of the art in speech recognition by enabling machines to take advantage of the multi-modal nature of human communications.
Training a deep learning algorithm requires a lot of training data. We propose a method to automatically collect, process and generate a large-scale audio-visual corpus from television videos, temporally aligned with the transcript. To build such a data-set, it is essential to know 'who' is speaking 'when'. We develop a ConvNet model that learns a joint embedding of the sound and the mouth images from unlabeled data, and apply this network to the tasks of audio-to-video synchronization and active speaker detection. We also show that the methods developed here can be extended to the problems of generating talking faces from audio and still images, and re-dubbing videos with audio samples from different speakers.
We then propose a number of deep learning models that are able to recognize visual speech at the sentence level. The lip reading performance beats a professional lip reader on videos from BBC television. We demonstrate that if audio is available, then visual information helps to improve speech recognition performance. We also propose methods to enhance noisy audio and to resolve multi-talker simultaneous speech using visual cues.
Finally, we explore the problem of speaker recognition. Whereas previous works for speaker identification have been limited to constrained conditions, here we build a new large-scale speaker recognition data-set collected from 'in the wild' videos using an automated pipeline. We propose a number of ConvNet architectures that outperform traditional baselines on this data-set.

Bio:

Triantafyllos is a 2nd-year PhD student in the Visual Geometry Group at the University of Oxford, under the supervision of Prof. Andrew Zisserman. He currently researches computer vision for understanding human communication, which includes lip reading, audio-visual speech recognition and enhancement, and body language modeling.


Making sense of Urban Data: Experiences from interdisciplinary research studies


Dr Manik Gupta

 

Date : 03/08/2018

 

Abstract:

In this talk, I present an overview of the different research projects that I have worked on during my research career, beginning with my PhD on air pollution monitoring, through my first postdoc on wheelchair accessibility and my second postdoc on building energy management data, to my current research grants on ear protection in noisy environments.
Finally, I summarize my learnings and future directions, pointing towards the wealth of opportunities to make a difference to millions of lives in this very exciting new era of data and more data.

Bio:

Dr Manik Gupta is a lecturer (assistant professor) in the Computer Science and Informatics division at London South Bank University (LSBU), London, UK. She received her PhD from Queen Mary University of London (QMUL) in Wireless Sensor Networks (WSN)/Internet of Things (IoT). Her research interests include data science for IoT, with a focus on real-world data collection and data quality issues, knowledge discovery from large sensor datasets, IoT data processing using time-series data mining and distributed analytics, and applications in environmental monitoring and well-being.
She has extensive experience in IoT sensor deployments and real-world data collection studies for EPSRC funded research projects while at QMUL and University College London (UCL) on urban air pollution monitoring and wheelchair accessibility. Her most recent engagement as a postdoc at LSBU has been on an Innovate UK research project dealing with the application of topological data analysis and machine learning techniques to building energy management data. She is currently the co-investigator on an Innovate UK funded research grant and a knowledge transfer partnership on Intelligent ear protection to address occupational hearing loss for use in heavy industry.


Semantic Representation and Analysis of E-commerce Orders


Dr. Arijit Biswas

University of Maryland

Date : 29/01/2018

 

Abstract:

E-commerce websites such as Amazon, Alibaba, and Walmart typically process billions of orders every year. Semantic representation and understanding of these orders is extremely critical for an eCommerce company. Each order can be represented as a tuple of <customer, product, price, date>. In this talk, I will describe two of our recent works: (i) product embedding using MRNet-Product2Vec and (ii) generating fake orders using eCommerceGAN.

MRNet-Product2Vec [ECML-PKDD 2017]: In this work, we propose an approach called MRNet-Product2Vec for creating generic embeddings of products within an e-commerce ecosystem. We learn a dense and low-dimensional embedding where a diverse set of signals related to a product are explicitly injected into its representation. We train a Discriminative Multi-task Bidirectional Recurrent Neural Network (RNN), where the input is a product title fed through a Bidirectional RNN and at the output, product labels corresponding to fifteen different tasks are predicted.
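Concretely, the description above amounts to a shared recurrent encoder over the title with one small classifier per task. The PyTorch sketch below is an assumed illustration of that multi-task setup; the GRU cell, vocabulary size, embedding sizes and binary task heads are placeholders rather than the published configuration.

    import torch
    import torch.nn as nn

    class MRNetStyleProduct2Vec(nn.Module):
        """Bidirectional RNN over a tokenized product title with one head per auxiliary task.
        The pooled hidden state serves as the reusable product embedding."""
        def __init__(self, vocab=50000, word_dim=128, hidden=128, num_tasks=15, classes_per_task=2):
            super().__init__()
            self.embed = nn.Embedding(vocab, word_dim, padding_idx=0)
            self.rnn = nn.GRU(word_dim, hidden, batch_first=True, bidirectional=True)
            self.heads = nn.ModuleList(
                [nn.Linear(2 * hidden, classes_per_task) for _ in range(num_tasks)]
            )

        def forward(self, title_ids):
            h, _ = self.rnn(self.embed(title_ids))  # (B, T, 2*hidden)
            product_vec = h.mean(dim=1)             # low-dimensional product representation
            return product_vec, [head(product_vec) for head in self.heads]

    # Toy usage: sum the per-task losses so every signal shapes the shared embedding.
    model = MRNetStyleProduct2Vec()
    titles = torch.randint(1, 50000, (8, 20))                    # 8 titles, 20 tokens each
    product_vec, task_logits = model(titles)
    labels = [torch.randint(0, 2, (8,)) for _ in task_logits]    # dummy per-task labels
    loss = sum(nn.CrossEntropyLoss()(lg, lb) for lg, lb in zip(task_logits, labels))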

eCommerceGAN: Exploring the space of all plausible orders could help us better understand the relationships between the various entities in an e-commerce ecosystem, namely the customers and the products they purchase. In this paper, we propose a Generative Adversarial Network (GAN) for orders made on e-commerce websites. Once trained, the generator in the GAN can generate any number of plausible orders. Our contributions include: (a) creating a dense and low-dimensional representation of e-commerce orders, (b) training an ecommerceGAN (ecGAN) with real orders to show the feasibility of the proposed paradigm, and (c) training an ecommerce-conditional-GAN (ec^2GAN) to generate plausible orders involving a particular product. We propose several qualitative methods to evaluate ecGAN and demonstrate its effectiveness.
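The generative side can be pictured as an ordinary GAN over dense order vectors. The sketch below is a generic generator/discriminator pair given purely as an assumed illustration of the paradigm, not the ecGAN or ec^2GAN architectures; dimensions and hyperparameters are placeholders.

    import torch
    import torch.nn as nn

    ORDER_DIM, NOISE_DIM = 64, 32   # assumed size of the dense order representation and noise

    generator = nn.Sequential(       # noise -> plausible dense order vector
        nn.Linear(NOISE_DIM, 128), nn.ReLU(),
        nn.Linear(128, ORDER_DIM), nn.Tanh(),
    )
    discriminator = nn.Sequential(   # order vector -> probability that it is a real order
        nn.Linear(ORDER_DIM, 128), nn.LeakyReLU(0.2),
        nn.Linear(128, 1), nn.Sigmoid(),
    )

    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    # Stand-in batch; real order embeddings would be scaled to a comparable range.
    real_orders = torch.randn(16, ORDER_DIM)

    # Discriminator step: real orders labelled 1, generated orders labelled 0.
    fake_orders = generator(torch.randn(16, NOISE_DIM)).detach()
    d_loss = (bce(discriminator(real_orders), torch.ones(16, 1))
              + bce(discriminator(fake_orders), torch.zeros(16, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator accept generated orders as real.
    g_loss = bce(discriminator(generator(torch.randn(16, NOISE_DIM))), torch.ones(16, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

A conditional variant in the spirit of ec^2GAN would additionally concatenate a product embedding to both the generator input and the discriminator input, so that sampled orders involve a chosen product.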

 

Bio:

Arijit Biswas is currently a machine learning scientist on the India machine learning team at Amazon, Bangalore. His research interests are mainly in deep learning, machine learning and computer vision. Earlier, he was a research scientist at Xerox Research Centre India (XRCI) from June 2014 to July 2016. He received his PhD in Computer Science from the University of Maryland, College Park in April 2014. His PhD thesis was on Semi-supervised and Active Learning Methods for Image Clustering. His thesis advisor was David Jacobs, and he closely collaborated with Devi Parikh and Peter Belhumeur during his stay at UMD. While doing his PhD, Arijit also did internships at Xerox PARC and the Toyota Technological Institute at Chicago (TTIC). He has published papers in CVPR, ECCV, ACM-MM, BMVC, IJCV and CVIU. Arijit has a Bachelor's degree in Electronics and Telecommunication Engineering from Jadavpur University, Kolkata. He is also a recipient of the MIT Technology Review Innovators Under 35 award from India in 2016.