Google Research Blog
The latest news from Research at Google
CVPR 2016 & Research at Google
Tuesday, June 28, 2016
Posted by Rahul Sukthankar, Research Scientist
This week, Las Vegas hosts the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. As a leader in computer vision research, Google has a strong presence at CVPR 2016, with many Googlers presenting papers and invited talks at the conference, tutorials and workshops.
We congratulate Google Research Scientist Ce Liu and Google Faculty Advisor Abhinav Gupta, who were selected as this year’s recipients of the PAMI Young Researcher Award for outstanding research contributions within computer vision. We also congratulate Googler Henrik Stewenius for receiving the Longuet-Higgins Prize, a retrospective award that recognizes up to two CVPR papers from ten years ago that have made a significant impact on computer vision research, for his 2006 CVPR paper “Scalable Recognition with a Vocabulary Tree”, co-authored with David Nister during their time at the University of Kentucky.
If you are attending CVPR this year, please stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. The Google booth will also showcase several recent efforts, including the technology behind Motion Stills, a live demo of neural network-based image compression and TensorFlow-Slim, the lightweight library for defining, training and evaluating models in TensorFlow. Learn more about our research being presented at CVPR 2016 in the list below (Googlers highlighted in blue).
Oral Presentations
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy
Detecting Events and Key Actors in Multi-Person Videos
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei
Spotlight Session: 3D Reconstruction
DeepStereo: Learning to Predict New Views From the World’s Imagery
John Flynn, Ivan Neulander, James Philbin, Noah Snavely
Posters
Discovering the Physical Parts of an Articulated Object Class From Multiple Videos
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna
Improving the Robustness of Deep Neural Networks via Stability Training
Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow
Semantic Image Segmentation With Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, Alan L. Yuille
Tutorial
Optimization Algorithms for Subset Selection and Summarization in Large Data Sets
Ehsan Elhamifar, Jeff Bilmes, Alex Kulesza, Michael Gygli
Workshops
Perceptual Organization in Computer Vision: The Role of Feedback in Recognition and Reorganization
Organizers: Katerina Fragkiadaki, Phillip Isola, Joao Carreira
Invited talks: Viren Jain, Jitendra Malik
VQA Challenge Workshop
Invited talks: Jitendra Malik, Kevin Murphy
Women in Computer Vision
Invited talk: Caroline Pantofaru
Computational Models for Learning Systems and Educational Assessment
Invited talk: Jonathan Huang
Large-Scale Scene Understanding (LSUN) Challenge
Invited talk: Jitendra Malik
Large Scale Visual Recognition and Retrieval: BigVision 2016
General Chairs: Jason Corso, Fei-Fei Li, Samy Bengio
ChaLearn Looking at People
Invited talk: Florian Schroff
Medical Computer Vision
Invited talk: Ramin Zabih
Google Computer Vision research at CVPR 2015
Sunday, June 07, 2015
Posted by Vincent Vanhoucke, Google Research Scientist
Much of the world's data is in the form of visual media. In order to utilize meaningful information from multimedia and deliver innovative products, such as Google Photos, Google builds machine-learning systems that are designed to enable computer perception of visual input, in addition to pursuing image and video analysis techniques focused on image/scene reconstruction and understanding.
This week, Boston hosts the 2015 Conference on Computer Vision and Pattern Recognition (CVPR 2015), the premier annual computer vision event comprising the main CVPR conference and several co-located workshops and short courses. As a leader in computer vision research, Google will have a strong presence at CVPR 2015, with many Googlers presenting publications in addition to hosting workshops and tutorials on topics covering image/video annotation and enhancement, 3D analysis and processing, development of semantic similarity measures for visual objects, synthesis of meaningful composites for visualization/browsing of large image/video collections and more.
Learn more about some of our research in the list below (Googlers highlighted in blue). If you are attending CVPR this year, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. Members of the Jump team will also have a prototype of the camera on display and will be showing videos produced using the Jump system on Google Cardboard.
Tutorials:
Applied Deep Learning for Computer Vision with Torch
Koray Kavukcuoglu, Ronan Collobert, Soumith Chintala
DIY Deep Learning: a Hands-On Tutorial with Caffe
Evan Shelhamer, Jeff Donahue, Yangqing Jia, Jonathan Long, Ross Girshick
ImageNet Large Scale Visual Recognition Challenge Tutorial
Olga Russakovsky, Jonathan Krause, Karen Simonyan, Yangqing Jia, Jia Deng, Alex Berg, Fei-Fei Li
Fast Image Processing With Halide
Jonathan Ragan-Kelley, Andrew Adams, Fredo Durand
Open Source Structure-from-Motion
Matt Leotta, Sameer Agarwal, Frank Dellaert, Pierre Moulon, Vincent Rabaud
Oral Sessions:
Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
George Papandreou, Iasonas Kokkinos, Pierre-André Savalle
Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
Richard A. Newcombe, Dieter Fox, Steven M. Seitz
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell
Visual Vibrometry: Estimating Material Properties from Small Motion in Video
Abe Davis, Katherine L. Bouman, Justin G. Chen, Michael Rubinstein, Frédo Durand, William T. Freeman
Fast Bilateral-Space Stereo for Synthetic Defocus
Jonathan T. Barron, Andrew Adams, YiChang Shih, Carlos Hernández
Poster Sessions:
Learning Semantic Relationships for Better Action Retrieval in Images
Vignesh Ramanathan, Congcong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Charles Rosenberg, Li Fei-Fei
FaceNet: A Unified Embedding for Face Recognition and Clustering
Florian Schroff, Dmitry Kalenichenko, James Philbin
A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions
Kuan-Chuan Peng, Tsuhan Chen, Amir Sadovnik, Andrew C. Gallagher
Best-Buddies Similarity for Robust Template Matching
Tali Dekel, Shaul Oron, Michael Rubinstein, Shai Avidan, William T. Freeman
Articulated Motion Discovery Using Pairs of Trajectories
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari
Reflection Removal Using Ghosting Cues
YiChang Shih, Dilip Krishnan, Frédo Durand, William T. Freeman
P3.5P: Pose Estimation with Unknown Focal Length
Changchang Wu
MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
Xufeng Han, Thomas Leung, Yangqing Jia, Rahul Sukthankar, Alexander C. Berg
Inferring 3D Layout of Building Facades from a Single Image
Jiyan Pan, Martial Hebert, Takeo Kanade
The Aperture Problem for Refractive Motion
Tianfan Xue, Hossein Mobahi, Frédo Durand, William T. Freeman
Video Magnification in Presence of Large Motions
Mohamed Elgharib, Mohamed Hefeeda, Frédo Durand, William T. Freeman
Robust Video Segment Proposals with Painless Occlusion Handling
Zhengyang Wu, Fuxin Li, Rahul Sukthankar, James M. Rehg
Ontological Supervision for Fine Grained Classification of Street View Storefronts
Yair Movshovitz-Attias, Qian Yu, Martin C. Stumpe, Vinay Shet, Sacha Arnoud, Liron Yatziv
VIP: Finding Important People in Images
Clint Solomon Mathialagan, Andrew C. Gallagher, Dhruv Batra
Fusing Subcategory Probabilities for Texture Classification
Yang Song, Weidong Cai, Qing Li, Fan Zhang
Beyond Short Snippets: Deep Networks for Video Classification
Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici
Workshops:
THUMOS Challenge 2015
Program organizers include: Alexander Gorban, Rahul Sukthankar
DeepVision: Deep Learning in Computer Vision 2015
Invited Speaker: Rahul Sukthankar
Large Scale Visual Commerce (LSVisCom)
Panelist: Luc Vincent
Large-Scale Video Search and Mining (LSVSM)
Invited Speaker and Panelist: Rahul Sukthankar
Program Committee includes: Apostol Natsev
Vision meets Cognition: Functionality, Physics, Intentionality and Causality
Program Organizers include: Peter Battaglia
Big Data Meets Computer Vision: 3rd International Workshop on Large Scale Visual Recognition and Retrieval (BigVision 2015)
Program Organizers include: Samy Bengio
Includes speaker Christian Szegedy - “Scalable approaches for large scale vision”
Observing and Understanding Hands in Action (Hands 2015)
Program Committee includes: Murphy Stein
Fine-Grained Visual Categorization (FGVC3)
Program Organizers include: Anelia Angelova
Large-scale Scene Understanding Challenge (LSUN)
Winners of the Scene Classification Challenge: Julian Ibarz, Christian Szegedy and Vincent Vanhoucke
Winners of the Caption Generation Challenge: Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan
Looking from above: when Earth observation meets vision (EARTHVISION)
Technical Committee includes: Andreas Wendel
Computer Vision in Vehicle Technology: Assisted Driving, Exploration Rovers, Aerial and Underwater Vehicles
Invited Speaker: Andreas Wendel
Program Committee includes: Andreas Wendel
Women in Computer Vision (WiCV)
Invited Speaker: Mei Han
ChaLearn Looking at People (sponsor)
Fine-Grained Visual Categorization (FGVC3) (sponsor)
Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
Monday, June 20, 2011
Posted by Matthias Grundmann, Vivek Kwatra, and Irfan Essa, Research Team
Earlier this year, we announced the launch of new features on the YouTube Video Editor, including stabilization for shaky videos, with the ability to preview them in real-time. The core technology behind this feature is detailed in this paper, which will be presented at the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2011).
Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. Professionally shot videos, on the other hand, usually rely on carefully designed camera configurations, use specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our goal was to devise a completely automatic method for converting casual shaky footage into more pleasant, professional-looking videos.
Our technique mimics the cinematographic principles outlined above by automatically determining the best camera path using a robust optimization technique. The original, shaky camera path is divided into a set of segments, each approximated by either a constant, linear or parabolic motion. Our optimization finds the best of all possible partitions using a computationally efficient and stable algorithm.
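To make the idea of constant, linear and parabolic segments concrete, here is a minimal sketch for a one-dimensional camera path. It is not the optimizer from the paper: it only evaluates an L1-style cost (the sum of absolute first, second and third differences) and labels a segment by the lowest-order difference that vanishes. The function names, the weights and the sample paths are illustrative assumptions, not values from this post.

```python
from typing import List, Sequence


def differences(values: Sequence[float]) -> List[float]:
    """Forward finite differences: values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]


def l1_path_cost(path: Sequence[float],
                 w1: float = 10.0, w2: float = 1.0, w3: float = 100.0) -> float:
    """Weighted L1 norm of the first, second and third differences of a path.

    The weights are hypothetical placeholders chosen for illustration only.
    """
    d1 = differences(path)
    d2 = differences(d1)
    d3 = differences(d2)
    return (w1 * sum(abs(v) for v in d1)
            + w2 * sum(abs(v) for v in d2)
            + w3 * sum(abs(v) for v in d3))


def motion_type(segment: Sequence[float], tol: float = 1e-9) -> str:
    """Label a segment by the lowest-order difference that is zero everywhere.

    Zero first difference  -> constant (static camera)
    Zero second difference -> linear (constant velocity)
    Zero third difference  -> parabolic (constant acceleration)
    """
    d = list(segment)
    for label in ("constant", "linear", "parabolic"):
        d = differences(d)
        if all(abs(v) <= tol for v in d):
            return label
    return "other"


if __name__ == "__main__":
    shaky = [0, 2, 1, 3, 2, 5]    # jittery 1-D camera positions (made up)
    smooth = [0, 1, 2, 3, 4, 5]   # constant-velocity pan over the same range
    print(l1_path_cost(shaky) > l1_path_cost(smooth))
    print(motion_type([2, 2, 2, 2]), motion_type(smooth), motion_type([0, 1, 4, 9]))
```

An absolute-value (L1) penalty tends to drive many of these differences to exactly zero rather than merely making them small, which is one way to see why the optimized path breaks into segments of the three motion types described above.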
To achieve real-time performance on the web, we distribute the computation across multiple machines in the cloud. This enables us to provide users with a real-time preview and interactive control of the stabilized result. Above we provide a video demonstration of how to use this feature on the YouTube Editor. We will also demo this live at Google’s exhibition booth at CVPR 2011.
For further details, please read our paper.
Google at CVPR 2011
Thursday, June 16, 2011
Posted by Mei Han and Sergey Ioffe, Research Team
The computer vision community will get together in Colorado Springs the week of June 20th for the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2011). This year will see a record number of people attending the conference and 27 co-located workshops and tutorials. Registration closed at 1,500 attendees even before the conference started.
Computer Vision is at the core of many Google products, such as Image Search, YouTube, Street View, Picasa, and Goggles, and as always, Google is involved in several ways with CVPR. Andrew Senior is serving as an area chair of CVPR 2011 and many Googlers are reviewers. Googlers also co-authored these papers:
Where's Waldo: Matching People in Images of Crowds
by Rahul Garg, Deva Ramanan, Steve Seitz, Noah Snavely
Visual and Semantic Similarity in ImageNet
by Thomas Deselaers, Vittorio Ferrari
Multicore Bundle Adjustment
by Changchang Wu, Sameer Agarwal, Brian Curless, Steve Seitz
A Hierarchical Conditional Random Field Model for Labeling and Segmenting Images of Street Scenes
by Qixing Huang, Mei Han, Bo Wu, Sergey Ioffe
Kernelized Structural SVM Learning for Supervised Object Segmentation
by Luca Bertelli, Tianli Yu, Diem Vu, Salih Gokturk
Discriminative Tag Learning on YouTube Videos with Latent Sub-tags
by Weilong Yang, George Toderici
Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths
by Matthias Grundmann, Vivek Kwatra, Irfan Essa
Image Saliency: From Local to Global Context
by Meng Wang, Janusz Konrad, Prakash Ishwar, Yushi Jing, Henry Rowley
If you are attending the conference, stop by Google’s exhibition booth. In addition to talking with Google researchers, you will get to see examples of exciting computer vision research that has made it into Google products including, among others, the following:
Google Earth Facade Shadow Removal
by Mei Han, Vivek Kwatra, and Shengyang Dai
We will demonstrate our technique for removing shadows and other lighting/texture artifacts from building facades in Google Earth. We obtain cleaner, clearer, and more uniform textures which provide users with an improved visual experience.
Video Stabilization on YouTube Editor
by Matthias Grundmann, Vivek Kwatra, and Irfan Essa
Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. In contrast, professionally shot video usually employs stabilization equipment such as tripods or camera dollies, and uses ease-in and ease-out for transitions. Our technique mimics these cinematographic principles by optimally dividing the original, shaky camera path into a set of segments and approximating each with either constant, linear or parabolic motion using a computationally efficient and stable algorithm. We will showcase a live version of our algorithm, featuring real-time performance and interactive control, which is publicly available at youtube.com/editor.
Tag Suggest for YouTube
by George Toderici and Mehmet Emre Sargin
YouTube offers millions of users the opportunity to upload videos and share them with their friends. Many users would love to have their videos discoverable but don't annotate them properly. One new feature on YouTube that seeks to address this problem is tag prediction, which suggests tags based on video content and, independently, on text metadata.
6/17/2011 UPDATE: "Posted by" was changed to include Sergey Ioffe.