Google Research Blog
The latest news from Research at Google
The CS Capacity Program - New Tools and SIGCSE 2017
Thursday, February 16, 2017
Posted by Chris Stephenson, Head of Computer Science Education Strategy
CS Capacity program
was launched in March of 2015 to help address a
dramatic increase in undergraduate computer science enrollments
that is creating serious resource and pedagogical challenges for many colleges and universities. Over the last two years, a diverse group of universities have been working to develop successful strategies that support the expansion of high-quality CS programs at the undergraduate level. Their work focuses on innovations in teaching and technologies that support scaling while ensuring the engagement of women and underrepresented students. These innovations could provide assistance to many other institutions that are challenged to provide a high-quality educational experience to an increasing number of introductory-level students.
The cohort of CS Capacity institutions includes George Mason University, Mount Holyoke College, Rutgers University, and the University of California, Berkeley, which are working individually, and Duke University, North Carolina State University, the University of Florida, and the University of North Carolina, which are working together. Each of these institutions brings a unique approach to addressing CS capacity challenges. Two years into the program, we're sharing an update on some of the great projects and ideas to emerge so far.
At George Mason, for example, computer science professor Jeff Offutt and his team have developed an online system to provide self-paced learning for CS1 and CS2 classes that allows learners to move through the learning materials more quickly or slowly depending on their needs. The system, called SPARC
, includes course content, practice and assessment exercises (including automated testing), mini-lectures, and daily inspirations. This team has also launched a program to recruit and train undergraduate tutorial assistants to increase learning support. For more information on SPARC, contact Jeff Offutt at firstname.lastname@example.org.
MaGE Peer Mentor program
at Mount Holyoke College is addressing its increasing CS student enrollment by preparing undergraduate peer mentors to provide effective feedback on coding assignments and contribute to an inclusive learning environment. One of the major elements of this program is an online course that helps to recruit and train students to be undergraduate peer mentors. Mount Holyoke has made its
entire online course curriculum
for the peer mentor program available so that other institutions can incorporate all or part of it to assist with preparing their own student tutors. For more information on the MaGE curriculum, contact Heather Pon-Barry at email@example.com.
MaGE Program Students and Faculty from Mount Holyoke College
At University of California, Berkeley, the CS Capacity team is focused on providing access to increased and better tutoring. They’ve instituted
a small-group tutoring program
that includes weekend mastery learning sessions, increased office hours support, designated discussion sections, project checkpoint deadlines, exam/homework/lab/discussion walkthrough videos, and a new office hours app that tracks student satisfaction with office hours. For more information on Berkeley’s interventions, contact Josh Hug at firstname.lastname@example.org.
The CS Capacity team at Rutgers has been exploring the gender gap at multiple levels using a longitudinal study across four required CS classes (paper to be published in the proceedings of the
SIGCSE 2017 Technical Symposium
). They’re investigating several factors that may impact the retention of women and underrepresented student populations, including intention to major in CS, grades, and prior experience. They’ve also been defining an additional feature set to improve their use of
(a course management system with automated grading). This work includes building a hint system to provide more information for students who are struggling with a concept or assignment, crowd-sourcing grading, and studying how students think about CS content and the kinds of errors they are making. The Rutgers team will be publishing their study results in the proceedings of the SIGCSE 2017 Technical Symposium. For more information on these tools, contact Andrew Tjang at email@example.com.
The team consisting of Duke, NCSU, UNC, and UF has produced and plans to share tools to improve the student learning experience.
My Digital Hand
(MDH) is a free online tool for managing and tracking one-to-one peer teaching sessions (for example, helping to keep track of how many hours peer mentors are spending with mentees). MDH supports best practice in peer teaching and mitigates some of the observed challenges in taking peer teaching to scale. The team has also been working on ASCEND (Adaptive Student Computing Environment with Natural Language Dialogue), an Eclipse plug-in designed to facilitate remote synchronous peer teaching sessions. Students can share their projects with a peer teaching fellow (PTF) and chat as the PTF leads the student through a session. ASCEND helps instructors better understand current practice by logging all programming actions and textual chats in real time to a database. For more information on these tools, contact Jeff Forbes at firstname.lastname@example.org.
Several of the CS Capacity principal investigators will be presenting papers on these new interventions and tools at the
SIGCSE conference in March
. Faculty from the CS Capacity program will also be presenting a panel and roundtable discussion session called “New Tools and Solutions to Address the CS Capacity Crunch.” If you’re attending SIGCSE this year, we hope you’ll join us on Thursday, March 9, from 3:45-5:00 pm.
Given the likelihood that CS undergraduate enrollments will continue to climb, it is critical that the CS education community continue to find, test, and share solutions and tools that enable institutions to effectively teach more students while maintaining the quality of the education experience for students. Faculty from the CS Capacity program will continue to share their solutions and results with the community via CS education conferences and publications.
An updated YouTube-8M, a video understanding challenge, and a CVPR workshop. Oh my!
Wednesday, February 15, 2017
Posted by Paul Natsev, Software Engineer
Last September, we released the
, which spans
millions of videos labeled with thousands of classes
, in order to spur innovation and advancement in large-scale video understanding. More recently, other teams at Google have released datasets such as
that, along with YouTube-8M, can be used to accelerate image and video understanding. To further these goals, today we are releasing an update to the
, and in collaboration with
Google Cloud Machine Learning
, we are also organizing a
video understanding competition
and an affiliated
An Updated YouTube-8M
The new and improved YouTube-8M includes cleaner and more verbose labels (twice as many labels per video, on average), a cleaned-up set of videos, and for the first time, the dataset includes pre-computed audio features, based on a state-of-the-art
audio modeling architecture
, in addition to the previously released visual features. The audio and visual features are synchronized in time, at 1-second temporal granularity, which makes YouTube-8M a large-scale multi-modal dataset, and opens up opportunities for exciting new research on joint audio-visual (temporal) modeling. Key statistics on the new version are illustrated below (more details
A tree-map visualization of the updated YouTube-8M dataset, organized into 24 high-level verticals, including the top-200 most frequent entities, plus the top-5 entities for each vertical.
Sample videos from the top-18 high-level verticals in the YouTube-8M dataset.
The Google Cloud & YouTube-8M Video Understanding Challenge
We are also excited to announce the
Google Cloud & YouTube-8M Video Understanding Challenge
, in partnership with
. The challenge invites participants to build audio-visual content classification models using YouTube-8M as training data, and to then label ~700K unseen test videos. It will be hosted as a
, sponsored by Google Cloud, and will feature a $100,000 prize pool for the top performers (details
). In order to enable wider participation in the competition, Google Cloud is also offering credits so participants can optionally do model training and exploration using
Google Cloud Machine Learning
. Open-source TensorFlow code, implementing a few baseline classification models for YouTube-8M, along with training and evaluation scripts, is available at
. For details on getting started with local or cloud-based training, please see our
getting started guide on Kaggle
The CVPR 2017 Workshop on YouTube-8M Large-Scale Video Understanding
We will announce the results of the challenge and host invited talks by distinguished researchers at the
1st YouTube-8M Workshop
, to be held July 26, 2017, at the 30th IEEE Conference on Computer Vision and Pattern Recognition (
) in Honolulu, Hawaii. The workshop will also feature presentations by top-performing challenge participants and a selected set of paper submissions. We invite researchers to submit papers describing novel research, experiments, or applications based on the YouTube-8M dataset, including papers summarizing their participation in the above challenge.
We designed this dataset with scale and diversity in mind, and hope lessons learned here will generalize to many video domains (YouTube-8M captures over 20 diverse video domains). We believe the challenge can also accelerate research by enabling researchers without access to big data or compute clusters to explore and innovate at an unprecedented scale. Please join us in advancing video understanding!
This post reflects the work of many others within Machine Perception at Google Research, including Sami Abu-El-Haija, Anja Hauth, Nisarg Kothari, Joonseok Lee, Hanhan Li, Sobhan Naderi Parizi, Rahul Sukthankar, George Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan, Jiang Wang, as well as Philippe Poutonnet and Mike Styer from Google Cloud, and our partners at Kaggle. We are grateful for the support and advice from many others at Google Research, Google Cloud, and YouTube, and especially thank Aren Jansen, Jort Gemmeke, Dan Ellis, and the Google Research Sound Understanding team for providing the audio features in the updated dataset.
Announcing TensorFlow 1.0
Wednesday, February 15, 2017
Posted by Amy McDonald Sandjideh, Technical Program Manager, TensorFlow
In just its
, TensorFlow has helped researchers, engineers, artists, students, and many others make progress with everything from
early detection of skin cancer
preventing blindness in diabetics
. We’re excited to see people using TensorFlow in over
6000 open-source repositories online
Today, as part of the first annual
TensorFlow Developer Summit
, hosted in Mountain View and
livestreamed around the world
, we’re announcing
TensorFlow 1.0 is incredibly fast!
lays the groundwork for even more performance improvements in the future, and
tips & tricks
for tuning your models to achieve maximum speed. We’ll soon publish updated implementations of several popular models to show how to take full advantage of TensorFlow 1.0 - including a 7.3x speedup on 8 GPUs for Inception v3 and 58x speedup for distributed Inception v3 training on 64 GPUs!
It’s more flexible:
TensorFlow 1.0 introduces a high-level API for TensorFlow, with tf.layers, tf.metrics, and tf.losses modules. We’ve also announced the inclusion of a new tf.keras module that provides full compatibility with
, another popular high-level neural networks library.
It’s more production-ready than ever:
TensorFlow 1.0 promises Python API stability (details
), making it easier to pick up new features without worrying about breaking your existing code.
Other highlights from
Python APIs have been changed to resemble NumPy more closely. For this and other backwards-incompatible changes made to support API stability going forward, please use our handy
Experimental APIs for
Higher-level API modules tf.layers, tf.metrics, and tf.losses - brought over from
Experimental release of
, a domain-specific compiler for TensorFlow graphs, that targets CPUs and GPUs. XLA is rapidly evolving - expect to see more progress in upcoming releases.
Introduction of the TensorFlow Debugger (
), a command-line interface and API for debugging live TensorFlow programs.
for object detection and localization, and camera-based image stylization.
improvements: Python 3 docker images have been added, and TensorFlow’s pip packages are now PyPI compliant. This means TensorFlow can now be installed with a simple invocation of
pip install tensorflow
We’re thrilled to see the pace of development in the TensorFlow community around the world. To hear more about TensorFlow 1.0 and how it’s being used, you can watch the
TensorFlow Developer Summit talks on YouTube
, covering recent updates from higher-level APIs to TensorFlow on mobile to our new
compiler, as well as the exciting ways that TensorFlow is being used:
for a link to the livestream and video playlist (individual talks will be posted online later in the day).
The TensorFlow ecosystem continues to grow with new techniques like
for dynamic batching and tools like the
along with updates to our existing tools like
. We’re incredibly grateful to the community of contributors, educators, and researchers who have made advances in deep learning available to everyone. We look forward to working with you on forums like
group and at future events.
On-Device Machine Intelligence
Thursday, February 09, 2017
Posted by Sujith Ravi, Staff Research Scientist, Google Research
To build the cutting-edge technologies that enable
, we often apply combinations of machine learning technologies such as
deep neural networks
graph-based machine learning
. However, the machine learning systems that power most of these applications run in the cloud and are computationally intensive and have significant memory requirements. What if you want machine intelligence to run on your personal phone or smartwatch, or on
devices, regardless of whether they are connected to the cloud?
Yesterday, we announced the launch of
Android Wear 2.0
, along with brand new wearable devices, that will run Google's first entirely “on-device” ML technology for powering smart messaging. This on-device ML system, developed by the Expander research team, enables technologies like
to be used for any application,
including third-party messaging apps
, without ever having to connect with the cloud…so now you can respond to incoming chat messages directly from your watch, with a tap.
The research behind this began last year while our team was developing the machine learning systems that enable conversational understanding capability in
. The Android Wear team reached out to us and was interested to know whether it would be possible to deploy this Smart Reply technology directly onto a smart device. Because of the limited computing power and memory on smart devices, we quickly realized that it was not possible to do so. Our product manager, Patrick McGregor, realized that this presented a unique challenge and an opportunity for the Expander team to return to the drawing board to design a completely new, lightweight, machine learning architecture — not only to enable Smart Reply on Android Wear, but also to power a wealth of other on-device mobile applications. Together with Tom Rudick, Nathan Beach, and other colleagues from the Android Wear team, we set out to build the new system.
Learning with Projections
A simple strategy to build lightweight conversational models might be to create a small dictionary of common rules (input → reply mappings) on the device and use a naive look-up strategy at inference time. This can work for simple prediction tasks involving a small set of classes using a handful of features (such as binary
from text, e.g. “
I love this movie
” conveys a positive sentiment whereas the sentence “
The acting was horrible
” is negative). But, it does not scale to complex natural language tasks involving rich vocabularies and the wide language variability observed in chat messages. On the other hand, machine learning models like
recurrent neural network
s (such as
s), in conjunction with
, have proven to be extremely powerful tools for complex sequence learning in natural language understanding tasks, including Smart Reply. However, compressing such rich models to fit in device memory and produce robust predictions at low computation cost (rapidly on-demand) is extremely challenging. Early experiments with restricting the model to predict only a small handful of replies or using other techniques like
did not produce useful results.
Instead, we built a different solution for the on-device ML system. We first use a fast, efficient mechanism to group similar incoming messages and project them to similar (“nearby”) bit vector representations. While there are several ways to perform this projection step, such as using
, we employ a modified version of
locality sensitive hashing
(LSH) to reduce dimension from millions of unique words to a short, fixed-length sequence of bits. This allows us to compute a projection for an incoming message very fast, on-the-fly, with a small memory footprint on the device since we do not need to store the incoming messages, word embeddings, or even the full model used for training.
Similar messages are grouped together and projected to nearby vectors. For example, the messages "
hey, how's it going?
" and "
How's it going buddy?
" share similar content and might be projected to the same vector 11100011. Another related message “
Howdy, everything going well
?” is mapped to a nearby vector 11100110 that differs only in 2 bits.
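The projection step described above can be sketched with a plain sign-random-projection form of LSH. This is only an illustration, not the modified LSH used by the Expander team: the function names, the 8-bit width, and the use of an MD5 hash to derive each pseudo-random +1/-1 weight are all assumptions made for the example. The point it shows is that a bit vector can be computed on the fly from a message without storing a vocabulary or word embeddings.

```python
import hashlib
import re

NUM_BITS = 8  # hypothetical width; the post does not state the real one


def _tokens(message):
    """Lowercase the message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def _token_weight(token, bit):
    """Deterministic pseudo-random +1/-1 weight for a (token, bit) pair.

    Hashing replaces a stored random matrix, so no vocabulary or
    embedding table has to be kept in memory.
    """
    digest = hashlib.md5(f"{bit}:{token}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


def project(message, num_bits=NUM_BITS):
    """Project a message to a fixed-length bit string (sign random projection)."""
    tokens = _tokens(message)
    bits = []
    for bit in range(num_bits):
        total = sum(_token_weight(token, bit) for token in tokens)
        bits.append("1" if total >= 0 else "0")
    return "".join(bits)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))
```

With this kind of projection, messages that share most of their words tend to agree on most bits, so `hamming(project(a), project(b))` can serve as a cheap similarity signal. The actual bit patterns depend on the hash and will not match the vectors shown in the figure.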
Next, our system takes the incoming message along with its projections and jointly trains a “message projection model” that learns to predict likely replies using our
framework. The graph learning framework enables training a robust model by combining semantic relationships from multiple sources — message/reply interactions, word/phrase similarity, semantic cluster information — learning useful projection operations that can be mapped to good reply predictions.
(Top) Messages along with
, if available, are used in a machine learning framework to jointly learn a “message projection model”. (Bottom) The message projection model learns to associate replies with the projections of the corresponding incoming messages. For example, the model projects two different messages “
Howdy, everything going well?
” and “
How’s it going buddy?
” (bottom center) to nearby bit vectors and learns to map these to relevant replies (bottom right).
It’s worth noting that while the message projection model can be trained using complex machine learning architectures and the power of the cloud, as described above, the model itself resides and performs inference completely on device. Apps running on the device can pass a user’s incoming messages and receive reply predictions from the on-device model without data leaving the device. The model can also be adapted to cater to the user’s writing style and individual preferences to provide a personalized experience.
The model applies the learned projections to an incoming message (or sequence of messages) and suggests relevant and diverse replies. Inference is performed on the device, allowing the model to adapt to user data and personal writing styles.
To get the on-device system to work out of the box, we had to make a few additional improvements, such as speeding up computations on device and generating rich, diverse replies from the model. We will have a forthcoming scientific publication that describes the on-device machine learning work in more detail.
Converse from Your Wrist
When we embarked on our journey to build this technology from scratch, we weren’t sure if the predictions would be useful or of sufficient quality. We’re quite surprised and excited about how well it works even on Android wearable devices with very limited computation and memory resources. We look forward to continuing to improve the models to provide users with more delightful conversational experiences, and we will be leveraging this on-device ML platform to enable completely new applications in the months to come.
You can now use this feature to respond to your messages directly from your Google watches or any watch that runs Android Wear 2.0. It is already enabled on Google Hangouts, Google Messenger, and many third-party messaging apps. We also provide an
for developers of third-party Wear apps.
On behalf of the Google Expander team, I would also like to thank the following people who helped make this technology a success: Andrei Broder, Andrew Tomkins, David Singleton, Mirko Ranieri, Robin Dua and Yicheng Fan.
Announcing TensorFlow Fold: Deep Learning With Dynamic Computation Graphs
Tuesday, February 07, 2017
Posted by Moshe Looks, Marcello Herreshoff and DeLesley Hutchins, Software Engineers
In much of machine learning, data used for training and inference undergoes a preprocessing step, where multiple inputs (such as images) are scaled to the same dimensions and stacked into batches. This lets high-performance deep learning libraries like
run the same
across all the inputs in the batch in parallel. Batching exploits the
capabilities of modern GPUs and multi-core CPUs to speed up execution. However, there are many problem domains where the size and structure of the input data varies, such as
in natural language understanding,
abstract syntax trees
in source code,
for web pages and more. In these cases, the different inputs have different computation graphs that don't naturally batch together, resulting in poor processor, memory, and cache utilization.
Today we are releasing
to address these challenges. TensorFlow Fold makes it easy to implement deep-learning models that operate over data of varying size and structure. Furthermore, TensorFlow Fold brings the benefits of batching to such models, resulting in a speedup of more than 10x on CPU, and more than 100x on GPU, over alternative implementations. This is made possible by
, introduced in our paper
Deep Learning with Dynamic Computation Graphs
This animation shows a
recursive neural network
run with dynamic batching. Operations with the same color are batched together, which lets TensorFlow run them faster. The Embed operation converts
words to vector representations
. The fully connected (FC) operation combines word vectors to form vector representations of phrases. The output of the network is a vector representation of an entire sentence. Although only a single parse tree of a sentence is shown, the same network can run, and batch together operations, over multiple parse trees of arbitrary shapes and sizes.
The TensorFlow Fold library will initially build a separate computation graph from each input.
Because the individual inputs may have different sizes and structures, the computation graphs may as well. Dynamic batching then automatically combines these graphs to take advantage of opportunities for batching, both within and across inputs, and inserts additional instructions to move data between the batched operations (see
for technical details).
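The idea of batching "both within and across inputs" can be illustrated with a small scheduling sketch. It does not use the TensorFlow Fold API. It assumes binary parse trees written as nested tuples of words, and it groups nodes by depth and operation type, so that every group could run as a single batched operation. The names `batch_schedule` and `_collect` are hypothetical.

```python
from collections import defaultdict


def _collect(tree, groups):
    """Record every node of `tree` under (depth, op) and return the tree's depth."""
    if isinstance(tree, str):
        # Leaf: a word that needs an Embed operation.
        groups[(0, "embed")].append(tree)
        return 0
    left, right = tree
    # Internal node: a fully connected (FC) op that combines two child vectors.
    depth = 1 + max(_collect(left, groups), _collect(right, groups))
    groups[(depth, "fc")].append(tree)
    return depth


def batch_schedule(trees):
    """Return (depth, op, batch_size) steps that cover all nodes of all trees.

    Nodes at the same depth with the same op have all their inputs ready at
    the same time, so they can be executed together as one batch.
    """
    groups = defaultdict(list)
    for tree in trees:
        _collect(tree, groups)
    return [(depth, op, len(groups[(depth, op)])) for depth, op in sorted(groups)]


# Three parse trees of different shapes and sizes.
example_trees = [
    (("the", "cat"), "sat"),
    ("dogs", ("bark", "loudly")),
    ("hi", "there"),
]
example_schedule = batch_schedule(example_trees)
```

For the three example trees, the 13 individual operations collapse into three batched steps: one Embed batch over all eight words, then two FC batches. A real implementation would also need to move intermediate results between batches, which this sketch leaves out.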
To learn more, head over to our
. We hope that TensorFlow Fold will be useful for researchers and practitioners implementing neural networks with dynamic computation graphs in TensorFlow.
This work was done under the supervision of Peter Norvig.
Advancing Research on Video Understanding with the YouTube-BoundingBoxes Dataset
Monday, February 06, 2017
Posted by Esteban Real, Vincent Vanhoucke, Jonathon Shlens, Google Brain team and
Stefano Mazzocchi, Google Research
One of the most challenging research areas in machine learning today is enabling computers to understand what a scene is about. For example, while humans know that a ball that disappears behind a wall only to reappear a moment later is very likely the same object, this is not at all obvious to an algorithm. Understanding this requires not only a global picture of what objects are contained in each frame of a video, but also where those objects are located within the frame and their locations over time
. Just last year we published
, a dataset consisting of automatically labelled YouTube videos. And while this helps further progress in the field, it is only one piece to the puzzle.
Today, in order to facilitate progress in video understanding research, we are introducing
, a dataset consisting of 5 million bounding boxes spanning 23 object categories, densely labeling segments from 210,000 YouTube videos. To date, this is the largest manually annotated video dataset containing bounding boxes, which track objects in temporally contiguous frames. The dataset is designed to be large enough to train large-scale models, and be representative of videos captured in natural settings. Importantly, the human-labelled annotations contain objects as they appear in the real world with partial occlusions, motion blur and natural lighting.
Summary of dataset statistics.
Relative number of detections in existing image (red) and video (blue) data sets. The YouTube BoundingBoxes dataset (YT-BB) is at the bottom.
The three columns are counts for: classification annotations, bounding boxes, and unique videos with bounding boxes. Full details on the dataset can be found in the
A key feature of this dataset is that bounding box annotations are provided for entire video segments. These bounding box annotations may be used to train models that explicitly leverage this temporal information to
over time. In a video, individual annotated objects might become entirely occluded and later return in subsequent frames. These annotations of individual objects are sometimes not recognizable from individual frames, but can be understood and recognized in the context of the video if the objects are localized and tracked accurately.
Three video segments, sampled at 1 frame per second. The final frame of each example shows how it is visually challenging to recognize the bounded object, due to blur or occlusion (train example, blue arrow). However, temporally-related frames, where the object has been more clearly identified, can allow object classes to be inferred. Note how only visible parts are included in the box: the orange arrow in the bear example (middle row) points to the hidden head. The dog example illustrates tight bounding boxes that track the tail (orange arrows) and foot (blue arrows). The airplane example illustrates how partial objects are annotated (first frame) and tracked across changes in perspective, occlusions and camera cuts.
We hope that this dataset might ultimately aid the computer vision and machine learning community and lead to new methods for analyzing and understanding real world vision problems. You can learn more about the dataset in this
This work was greatly helped along by Xin Pan, Thomas Silva, Mir Shabber Ali Khan, Ashwin Kakarla and many others, as well as support and advice from Manfred Georg, Sami Abu-El-Haija, Susanna Ricco and George Toderici.
Using Machine Learning to predict parking difficulty
Friday, February 03, 2017
Posted by James Cook, Yechen Li, Software Engineers and Ravi Kumar, Research Scientist
When Solomon said there was a time and a place for everything he had not encountered the problem of parking his automobile.
, Broadcast Journalist
Much of driving is spent either
stuck in traffic
or looking for parking
. With products like
, it is our long-standing goal to help people navigate the roads easily and efficiently. But until now, there wasn’t a tool to address the all-too-common parking woes.
Last week, we
launched a new feature for Google Maps for Android
across 25 US cities that offers predictions about parking difficulty close to your destination so you can plan accordingly. Providing this feature required addressing some significant challenges:
Parking availability is highly variable, based on factors like the time, day of week, weather, special events, holidays, and so on. Compounding the problem, there is almost no real time information about free parking spots.
Even in areas with internet-connected parking meters providing information on availability, this data doesn’t account for those who park illegally, park with a permit, or depart early from still-paid meters.
Roads form a mostly-planar graph, but parking structures may be more complex, with traffic flows across many levels, possibly with different layouts.
Both the supply and the demand for parking are in constant flux, so even the best system is at risk of being outdated as soon as it’s built.
To face these challenges, we used a unique combination of crowdsourcing and machine learning (ML) to build a system that can provide you with parking difficulty information for your destination, and even help you decide what mode of travel to take — in a pre-launch experiment, we saw a significant increase in clicks on the transit travel mode button, indicating that users with additional knowledge of parking difficulty were more likely to consider public transit rather than driving.
Three technical pieces were required to build the algorithms behind the parking difficulty feature: good ground truth data from crowdsourcing, an appropriate ML model and a robust set of features to train the model on.
Ground Truth Data
Gathering high-quality ground truth data is often a key challenge in building any ML solution. We began by asking individuals at a diverse set of locations and times if they found the parking difficult. But we learned that answers to subjective questions like this produce inconsistent results - for a given location and time, one person may answer that it was “
” to find parking while another found it “
” Switching to objective questions like “
How long did it take to find parking?
” led to an increase in answer confidence, enabling us to crowdsource a high-quality set of ground truth data with over 100K responses.
With this data available, we began to determine features we could train a model on. Fortunately, we were able to turn to the
wisdom of the crowd
, and utilize anonymous aggregated information from users who opt to share their location data, which already is a vital source of information for estimates of
popular times and visit durations
We quickly discovered that even with this data, some unique challenges remain. For example, our system shouldn’t be fooled into thinking parking is plentiful if someone is parking in a gated or private lot. Users arriving by taxi might look like a sign of abundant parking at the front door, and similarly, public-transit users might seem to park at bus stops. These false positives, and many others, all have the potential to mislead an ML system.
So we needed more robust aggregate features. Perhaps not surprisingly, the inspiration for one of these features came from our own backyard in downtown Mountain View. If Google navigation observes many users circling downtown Mountain View during lunchtime along trajectories like this one, it strongly suggests that parking might be difficult:
Our team thought about how to recognize this “fingerprint” of difficult parking as a feature to train on. In this case, we aggregate the difference between when a user should have arrived at a destination if they simply drove to the front door, versus when they actually arrived, taking into account circling, parking, and walking. If many users show a large gap between these two times, we expect this to be a useful signal that parking is difficult.
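As a rough sketch of how such a "fingerprint" feature could be computed: the post says the gap between expected and actual arrival times is aggregated across users, but it does not say how. The median, the `min_trips` cutoff, and all names below are assumptions made for illustration.

```python
from statistics import median


def arrival_gap_seconds(expected_arrival, actual_arrival):
    """Extra time (in seconds) a trip took beyond the drive-to-the-door estimate.

    Early arrivals are clamped to zero so they do not cancel out long searches.
    """
    return max(0.0, actual_arrival - expected_arrival)


def fingerprint_feature(trips, min_trips=3):
    """Aggregate arrival gaps for one destination and time slot.

    `trips` is a list of (expected_arrival, actual_arrival) pairs in seconds.
    Returns the median gap, or None when there are too few trips to say
    anything (the threshold is a hypothetical choice).
    """
    gaps = [arrival_gap_seconds(expected, actual) for expected, actual in trips]
    if len(gaps) < min_trips:
        return None
    return median(gaps)
```

A large value for a given destination and hour would then be one input to the model, alongside the other features, rather than a verdict on its own.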
From there, we continued to develop more features that took into account, for any particular destination, dispersion of parking locations, time-of-day and date dependence of parking (e.g. what if users park close to a destination in the early morning, but further away at busier hours?), historical parking data and more. In the end, we decided on roughly twenty different features along these lines for our model. Then it was time to tune the model performance.
Model Selection & Training
We decided to use a standard logistic regression
ML model for this feature, for a few different reasons. First, the behavior of logistic regression is well understood, and it tends to be resilient to noise in the training data; this is a useful property when the data comes from crowdsourcing a complicated response variable like difficulty of parking. Second, it’s natural to interpret the output of these models as the probability that parking will be difficult, which we can then map into descriptive terms like “
” or “
.” Third, it’s easy to understand the influence of each specific feature, which makes it easier to verify that the model is behaving reasonably. For example, when we started the training process, many of us thought that the “fingerprint” feature described above would be the “silver bullet” that would crack the problem for us. We were surprised to note that this wasn’t the case at all — in fact, it was features based on the dispersion of parking locations that turned out to be one of the most powerful predictors of parking difficulty.
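The three properties listed above can be seen in a few lines of code. This is a minimal sketch of logistic regression scoring, not the production model: the feature names, weights, probability thresholds, and the labels "easy", "moderate", and "difficult" are all placeholders invented for the example.

```python
import math


def difficulty_probability(features, weights, bias):
    """Logistic regression: probability that parking is difficult."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def feature_contributions(features, weights):
    """Per-feature contribution to the log-odds, largest magnitude first.

    Each term is just weight * value, which makes it easy to check which
    features are driving a prediction.
    """
    terms = {name: weights[name] * value for name, value in features.items()}
    return sorted(terms.items(), key=lambda item: abs(item[1]), reverse=True)


def describe(probability):
    """Map a probability to a descriptive term (hypothetical cutoffs and labels)."""
    if probability >= 0.66:
        return "difficult"
    if probability >= 0.33:
        return "moderate"
    return "easy"


# Hypothetical weights and feature values for one destination and time slot.
example_weights = {"parking_dispersion": 1.5, "arrival_gap": 0.4}
example_features = {"parking_dispersion": 2.0, "arrival_gap": 1.0}
example_probability = difficulty_probability(example_features, example_weights, bias=-2.0)
```

Because the model is linear in the log-odds, `feature_contributions` shows directly how much each feature pushes a prediction up or down, which is the kind of inspection the post describes when comparing the "fingerprint" feature with the dispersion-based ones.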
With our model in hand, we were able to generate an estimate for difficulty of parking at any place and time. The figure below gives a few examples of the output of our system, which is then used to provide parking difficulty estimates for a given destination. Parking on Monday mornings, for instance, is difficult throughout the city, especially in the busiest financial and retail areas. On Saturday night, things are busy again, but now predominantly in the areas with restaurants and attractions.
Output of our parking difficulty model in the Financial District and Union Square areas of San Francisco. Red denotes a higher confidence that parking is difficult.
a typical Monday at ~8am (left) and ~9pm (right).
the same times but on a typical Saturday.
We’re excited about the opportunities to continue to improve the model quality based on user feedback. If we are able to better understand parking difficulty, we will be able to develop new and smarter forms of parking assistance — we’re very excited about future applications of ML to help make transportation more enjoyable!