Nathan.ai newsletter issue #20: May-July 2017
Reporting from 10th May 2017 through 19th July 2017
Hi there! I’m Nathan Benaich — welcome to issue #20 of my AI newsletter. Here, I’ll synthesise a narrative analysing and linking important news, data, research and startup activity from the AI world. Grab your hot beverage of choice ☕ and enjoy the read! A few quick points before we start:
1. A huge thanks to our 13 speakers, 260 attendees, Google Cloud, Cooley LLP and the RAAIS team led by my sister, Joyce, for making the 3rd annual Research and Applied AI Summit a success! Watch the talks on our YouTube channel and check out the official photos of the day. Join our Facebook group to keep up with the RAAIS community and request an invite to the 2018 edition on June 29th!
2. Give me a shout if you’re translating research into a commercial domain, raising capital for your company (I’m investing!) or need exposure to build your team.
3. My article pick: Vertical AI startups, solving industry-specific problems by combining AI and subject matter expertise.
Referred by a friend? Sign up here. Help spread the word by giving it a tweet :)
🆕 Technology news, trends and opinions
🚗 Department of Driverless Cars
a. Large incumbents
New York state joins California, Michigan and Nevada in allowing the testing of AVs on public roads. The Governor announced the testing program and the first approval of Audi’s Level 3 system in May, with public demos following in June.
While the Tesla share price continues to surge, its Autopilot 2 software is under increasing criticism for still not reaching feature parity with Autopilot 1. Speculation abounds as to why. The system is blamed for a recent accident in Minnesota, where a driver’s Tesla accelerated and left the roadway after he activated Autopilot. In the meantime, the company lost Chris Lattner, head of Autopilot and author of Apple’s Swift programming language, six months after he joined because of disagreements with Musk. Tesla then hired OpenAI research scientist Andrej Karpathy as Director of AI and Autopilot Vision, reporting directly to Musk. Good move for the company, but that’s now another major loss for OpenAI (Goodfellow left for Google Brain), which could morph into a poachable talent farm for Musk et al. Musk also gave an entertaining TED interview back in April here, in which he commits Tesla (with a straight face) to autonomously driving across America by the end of the year.
General Motors-owned Cadillac has released a highway autopilot system it calls Super Cruise, which combines GPS, 160k miles’ worth of maps, a front-facing camera and radar.
Audi unveiled the world’s first Level 3 vehicle, the 2018 A8, which features an ‘AI Traffic Jam Pilot’. Powered by NVIDIA, this system uses a laser scanner combined with cameras to navigate autonomously at speeds up to 60 km/h. Consumer Reports believe that this marketing is “disappointing at best and irresponsible at worst”.
Baidu is still playing catch up in self-driving by adding 50 partners to its open source Project Apollo. The company also signed an agreement with TomTom to collaborate on maps, as well as with Continental and Bosch.
Volvo is committed both to releasing its first AV car by 2020 and to achieving zero fatalities or serious injuries from all its cars by the same time. However, early testing of the system in Australia revealed that its “Large Animal Detection” system, otherwise capable of identifying and avoiding deer, elk and caribou, did not generalise to kangaroos! Hence why training datasets for AVs must cover environments as varied as those the vehicles are meant to drive in.
On the topic of generalisation, Mapillary and HERE form a partnership to leverage the former’s fast-growing, crowdsourced street-level image dataset and its ability to automatically extract information from images with the latter’s Map Creator community in order to scale the generation of map data.
Apple confirms that it’s working on autonomous systems, having received approval to test in California. Not much news on what products will come of this. Typical.
b. Startups
The plot continues to thicken ahead of the October court date between Uber and Waymo. The man at the center of the trial, Anthony Levandowski of Uber/Otto, has since been fired by the company for allegedly refusing to cooperate with its legal team. Filings show that Uber’s Travis Kalanick knew about the Google file downloads prior to acquiring Levandowski’s company, Otto, though Uber still denies wrongdoing. In a May hearing, the Judge told Waymo’s lawyers: “you have one of the strongest records I’ve seen for a long time of anybody doing something that bad”. The Judge referred the case to federal prosecutors, meaning that Levandowski could be under criminal investigation.
Meanwhile, Waymo signed agreements with both Avis and Lyft to collaborate on self-driving cars. Unlike Uber, Lyft has no plans to develop its own AV technology, even though it raised another $600m in April. General Motors, which is a major Lyft investor, owns Cruise Automation, which has begun testing GM vehicles on open roads in California. Waymo also hired Tesla’s former director of hardware engineering. Interestingly, Morgan Stanley theorise that Waymo could be worth $70bn if it were spun off to compete against Tesla.
Two infographics to help you track all the AV companies: 263 self-driving technology startups and map of said companies in the Bay Area (37 of which are licensed for public testing in California). While there’s tons of activity in this space, there’s arguably a lack of scientific method in defining operating and data standards, task benchmarks, testing and evaluation environments.
BBC have a fun documentary film on ML technologies applied to self-driving, in which they follow local hero Stan Boland of FiveAI, Sebastian Thrun, Yann LeCun and others!
Voyage, a new autonomous taxi startup run by ex-Udacity self-driving engineers, published a piece on LIDAR’s history and use for mapping, navigation and object tracking, as well as a complete look at the compute and core self-driving systems.
Comma.ai publish their “road to self driving victory”. In short, get the data, own the network, win self driving cars. Bold and simples, I like it :-)
The UK will play host to an ambitious real-world test for AVs driving from London to Oxford, led by the DRIVEN consortium that includes Oxbotica (vs the Streetwise consortium led by FiveAI).
Electric public bus manufacturer Proterra is working on AV technology with the University of Nevada, Reno. Deployment of AV fleets in defined AV-only areas makes the most sense to me!
💪 The big boys
NYT ran a Saturday Profile on Geoff Hinton, an acclaimed scientist who needs little introduction to this readership. It runs through his career path, a brief stint as a carpenter when he was fed up with academia, his joining UToronto as a Professor of CS without ever having taken a CS course, and his more recent focus on applying AI to healthcare. Legend.
Microsoft is serious about its “AI comeback” according to recent reports. The company has set up a DeepMind-style dedicated AI research institute called Microsoft Research AI. In an exclusive BBC video here, Eric Horvitz (MD of Microsoft Research) states that Microsoft is “betting the company on AI. We think this is the future”. Harry Shum, EVP of AI and Research at Microsoft, also stated that he’s in the market for another SwiftKey-style acquisition after the company’s recent investment in Element.AI ($102m Series A!) and acquisition of Maluuba. The latter, of course, contributed to convincing Yoshua Bengio to join the company as strategic advisor. In case you’re keeping track, the AI and Research groups at Microsoft are organised around four areas: “products, early-stage products, really early-stage products, and research.”
Google held their I/O event in May. Main highlights include the launch of the v2 TPU, which, unlike v1, can both train and run ML models (vs. just run them). Each chip delivers up to 180 teraflops of processing power, compared to NVIDIA’s upcoming Volta-based Tesla V100 chip, which offers 120 teraflops. To put this into perspective, the latter chip is already 6-12x faster than its Pascal-based Tesla P100 predecessor. Google also pushed Smart Reply (I’m loving it!) to all Gmail users, announced work in computational photography to remove unwanted artefacts (as part of Google Lens), and launched the Google Assistant for iPhone. Google also launched the People + AI Research initiative to study design principles for how people interact with AI systems.
Now two years old, Amazon’s Echo is but one of a growing suite of AI-enabled edge devices focused on winning your trust (and wallet) in the home. The company will soon release the Echo v2, which, like the Echo Dot, is focused on voice interaction, while the Echo Look and Echo Show position the camera front and center. This means that human-centered video understanding is a must! Although developers have written over 15k skills for Alexa, research from Goldman Sachs shows that over 65% of these skills are focused on fact retrieval or games/trivia/novelty. Monetisation, therefore, is likely to come from ‘premium skills’.
Apple unveiled their HomePod, which focuses on the music lover as a means of bringing Siri into the home. The company also announced Core ML, a new mobile-focused ML API for developers that powers the recognition and tracking of faces, landmarks, text, barcodes and objects.
DeepMind have retired AlphaGo from competitive gameplay and promised a final research paper detailing the extensive set of improvements made to the system since the original paper. The company also released the first Independent Review of their DeepMind Health initiative here. The most important question it raised concerned the use of patient data supplied by the Royal Free Hospital in connection with the Streams project: the UK’s data privacy watchdog came down on the company, finding that DeepMind did not have a legal basis for this use. Separately, the company established DeepMind Alberta in Canada by hiring longtime DeepMind advisor Rich Sutton (he joined back in 2010), Michael Bowling (of poker-playing DeepStack fame), Patrick Pilarski and Adam White.
Is China outpacing the US in the race to winning AI? NYT argue that China has already spent billions on R&D and is readying a new multi-billion dollar initiative to fund moonshots, startups (more on this at the end!) and research programs aimed at solidifying the country’s leadership in AI. China is not shying away from buying and subsidising foreign talent to win. Meanwhile, US lawmakers are considering legislation to limit Chinese investment into US technology companies that could, from a national security perspective, hand the advantage to China.
NAVER Corporation, Korea’s leading internet company that operates the popular NAVER search portal and the mobile messenger LINE, purchased the 80-researcher-strong Xerox Research Centre Europe. The center has expertise in AI and computer vision, amongst other fields.
🍪 Hardware
Graphcore co-founder and CTO, Simon Knowles, gave a terrific talk at RAAIS 2017 where he peeled back more exclusive details on their processor for machine intelligence, the IPU. I caught up with Simon afterwards to discuss their IPU design principles.
Apple are reportedly working on a new processor likely devoted to on-device AI workloads, dubbed internally as the “Apple Neural Engine”. So exciting to see innovation in semiconductors (more here and here!).
But of course, these companies aren’t alone.
Intel’s CEO was on a panel describing his moves to push the company towards higher-performance CPUs, GPUs and proprietary memory devices, as well as launching these for servers and cloud data centers. Investor sentiment wasn’t terribly positive.
Bosch announced a $1.1bn semiconductor facility in Dresden, Germany focused on self-driving cars, smart homes and smart city infrastructure. This is the largest single investment made in the company’s history. Quick reminder that NVIDIA and Bosch agreed to produce the first “AI car computer” using the DRIVE PX platform back in March.
NVIDIA launched their own cloud computing service to offer GPU-accelerated training, along with their HGX-1 hyperscale GPU accelerator.
Softbank entered into a definitive agreement to acquire robotics pioneer Boston Dynamics from Alphabet. This marks a clear step in Softbank’s vision to catalyse smart robotics, empowered by advanced perception, control and mobility.
🏥 Healthcare
Cardiogram publish research on using deep learning to detect atrial fibrillation (AFib, an irregular/fast heart rate). Here, they asked 200 users of their Apple Watch-based heart rate monitor to also wear the AliveCor mobile ECG device. Using 6,338 mobile ECG readings labelled with a positive or negative atrial fibrillation event, Cardiogram claim they could classify atrial fibrillation from Apple Watch heart rate traces with 98.04% sensitivity (true positive rate) and 90.2% specificity (true negative rate).
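As a refresher on those two metrics, here’s a tiny sketch. The counts below are hypothetical, chosen only to roughly reproduce the reported figures:

```python
# Sensitivity and specificity from a binary confusion matrix of
# AFib predictions vs. ECG labels (counts are illustrative only).
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = true positive rate; specificity = true negative rate."""
    sensitivity = tp / (tp + fn)  # fraction of AFib cases correctly flagged
    specificity = tn / (tn + fp)  # fraction of non-AFib readings correctly cleared
    return sensitivity, specificity

sens, spec = sensitivity_specificity(tp=980, fn=20, tn=902, fp=98)
print(f"sensitivity={sens:.2%}, specificity={spec:.2%}")  # 98.00%, 90.20%
```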
French startup Cardiologs received FDA clearance for their deep learning-based software for screening AFib and other arrhythmias from a digital ECG obtained by any compatible cardiac monitoring device. Trained with over 500k recordings, the system has 97% sensitivity.
Two long review studies explore the use of deep learning in biology and medicine: one focused on opportunities and obstacles, and the other on imaging applications.
Why should drug discovery be open sourced? DeepChem.io lets researchers pool a wealth of knowledge to fight two key problems in drug development: high cost and a slow development process.
🆕 New initiatives and frontiers for AI
Automated machine learning
Given a task that a machine learning system can solve, what is the optimal model architecture to use? A typical 10-layer neural network can have ~10^10 candidate architectures. By formulating the generation and validation of neural network architectures as a reinforcement learning problem, Google have shown that their AutoML system can generate, test and optimise network designs automatically. This system, originating from Google Brain research, was announced at Google I/O and will presumably make its way to Google Cloud. Given that it uses 800 GPUs churning full time for weeks to produce results, it’ll otherwise be out of reach for most of us in the real world!
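To make the formulation concrete, here’s a toy sketch of the idea. It is entirely illustrative: the reward function below is a stand-in for actually training and validating a child network, which is where those 800 GPUs go.

```python
# Toy RL-based architecture search in the spirit of AutoML: a controller
# samples discrete architecture choices, a reward scores each sample, and
# REINFORCE nudges the controller towards better-scoring choices.
import numpy as np

rng = np.random.default_rng(0)
choices = {"layers": [2, 4, 8], "width": [32, 64, 128], "activation": [0, 1]}
logits = {k: np.zeros(len(v)) for k, v in choices.items()}  # controller params

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def toy_reward(arch):
    # Stand-in for "train the child network, return validation accuracy".
    return 0.5 + 0.1 * (arch["layers"] == 4) + 0.05 * (arch["width"] == 64)

baseline = 0.0
for step in range(200):
    arch, idx = {}, {}
    for k, v in choices.items():
        p = softmax(logits[k])
        i = rng.choice(len(v), p=p)
        arch[k], idx[k] = v[i], i
    r = toy_reward(arch)
    baseline = 0.9 * baseline + 0.1 * r  # moving-average reward baseline
    for k in choices:  # REINFORCE: raise log-prob of the sampled choice
        grad = -softmax(logits[k])
        grad[idx[k]] += 1.0
        logits[k] += 0.1 * (r - baseline) * grad

print({k: choices[k][int(np.argmax(logits[k]))] for k in choices})
```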
Airbnb published a post about how they’re using automated machine learning (AML) tools for benchmarking model performance, detecting data leakage (where unexpected additional information makes its way into training data, allowing unrealistically good performance), and exploratory data analysis. AML allows for faster data exploration, as well as improved model accuracy through model tuning and better diagnostics.
Explainability
Science run an article exploring different approaches to model explainability, including Local Interpretable Model-Agnostic Explanations (LIME). The impetus for this effort is not only to build trust with users, but also Europe’s GDPR, a regulation that mandates that companies deploying algorithms for automated decision making must be able to explain their logic.
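For the hands-on reader, here’s a hedged sketch of LIME in practice using the open source lime package (pip install lime scikit-learn). The dataset and model are arbitrary stand-ins:

```python
# Fit an opaque model, then ask LIME to explain one of its predictions
# by fitting an interpretable linear model around that data point.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

iris = load_iris()
model = RandomForestClassifier(n_estimators=100).fit(iris.data, iris.target)

explainer = LimeTabularExplainer(
    iris.data,
    feature_names=iris.feature_names,
    class_names=iris.target_names,
)
# LIME perturbs the instance, queries the model on the perturbations,
# and weights features by their influence on this single prediction.
explanation = explainer.explain_instance(
    iris.data[0], model.predict_proba, num_features=2
)
print(explanation.as_list())  # (feature, weight) pairs for this prediction
```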
Despite the current deep learning summer, there are of course problems for which other ML approaches work best. This post explores when not to use DL, particularly when interpreting/communicating model parameters, features and causal mechanics is important for the user in question.
On the same question of “where next with deep learning”, the author of Keras suggests that we should move beyond learning input-to-output mapping functions and focus on reasoning and abstraction by deeply understanding inputs in a human-relatable way. He points to the use of video games as a substrate for this, in which an ML model could be defined as a “learnable program”. A paper I feature in the Research section below delves into this subject too.
Video understanding
As mentioned in a prior newsletter, video is a powerful data source for effectively learning common sense about the real world. To this end, two groups have published large, labeled video datasets. DeepMind released Kinetics, which focuses on high-level human activities taken from YouTube videos. TwentyBN published the Something-Something (object interactions) and Jester (hand gestures) datasets, which represent the primitive actions that humans make in the real world from which we can learn visual common sense. Watch TwentyBN’s Chief Scientist, Roland Memisevic, present this brilliant work at RAAIS 2017 here.
🔬 Research
Here’s a selection of impactful work that caught my eye:
One Model to Learn Them All, Google Brain, Google Research and University of Toronto. Solving speech recognition, transcription, translation and image recognition with deep learning networks is well proven today. However, each one of these problems requires its own bespoke-trained network, perhaps with specific architectural modifications to reach state-of-the-art. In the pursuit of a network that solves multiple tasks, this domain-specific approach is obviously not optimal. To this end, the authors present the MultiModel architecture - a single, unified deep learning model that can simultaneously be trained to solve object recognition in images, translation tasks, image captioning, speech recognition and English language parsing. Awesome. Here, input data from these different modalities are converted into a joint, unified representation space by sub-networks (termed modality nets). The MultiModel itself includes depthwise-separable convolutions (originally for image processing), an attention mechanism, and sparsely-gated mixture-of-experts layers (originally for language problems). While the network is not state-of-the-art at present, the authors achieve impressive results with an elegant and flexible model architecture!
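As an aside on why depthwise-separable convolutions feature here: they factor a standard convolution into a per-channel spatial filter plus a 1x1 channel mixer, cutting parameters dramatically. A quick back-of-the-envelope in Python:

```python
# Parameter counts for a standard vs. depthwise-separable 3x3 convolution
# mapping 256 input channels to 256 output channels (biases ignored).
k, c_in, c_out = 3, 256, 256
standard = k * k * c_in * c_out           # one dense 3x3 convolution
separable = k * k * c_in + c_in * c_out   # depthwise 3x3 + pointwise 1x1
print(standard, separable, round(standard / separable, 1))
# 589824 67840 8.7  -> roughly a 9x parameter reduction at this size
```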
Deep Neural Networks Do Not Recognize Negative Images, University of Washington. As powerful function approximators, deep neural networks achieve remarkable performance on pattern recognition tasks. Computer vision is a great example: a network can learn a complex function to map input pixels to output classifications of interest. However, does this network actually learn the semantics of the objects in the image, or does it instead simply memorise the statistical distribution of inputs? Humans of course master the former, as evidenced by our ability to express an overall sense of objects and recognise them regardless of altered color, orientation, brightness and scale. To test this idea, the authors take original images from the MNIST and German Traffic Sign Recognition Benchmark datasets and generate negative images, which are simply original images with reversed brightness (light areas become dark and vice versa). While a DNN trained on these original datasets achieves state-of-the-art classification accuracy (95%+) on the test set, the same network is incapable of recapitulating this performance on negative images (4.39%-6.68%). As such, the inability to recognise transformed inputs reveals further limitations of deep learning’s capacity to semantically generalise. Bad news.
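Reproducing the perturbation itself is trivial, which is what makes the result sting. A sketch, assuming uint8 images, with the data loader and trained model omitted:

```python
# Generating a "negative image" test set is a one-liner: reverse the
# brightness of every pixel and re-evaluate the trained classifier.
import numpy as np

def to_negative(images: np.ndarray) -> np.ndarray:
    """Invert brightness for uint8 images with values in [0, 255]."""
    return 255 - images

# e.g. with MNIST test images loaded as uint8 arrays (loader omitted):
# x_test_neg = to_negative(x_test)
# model.evaluate(x_test_neg, y_test)  # accuracy collapses, per the paper
```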
Emergence of Locomotion Behaviours in Rich Environments, DeepMind. Reinforcement learning agents require a well-defined reward function (e.g. game score) by which to assess the effectiveness of novel behaviours (policies) that they discover during training. However, it’s often unclear what the right reward function should be from either an objective standpoint (solve a problem in the best way, e.g. optimise energy utilisation) or a human standpoint (solve a problem the way a human would, e.g. self-driving). Here, the authors posit that given a simple reward function, an RL agent will learn rich and robust behaviours if placed in a training environment that is equally rich and diverse. Indeed, by presenting agents (Quadruped, Planar Walker and Humanoid) tasked with moving forward with a diversity of obstacle-course challenges, they show that the agents learn solutions that are robust across settings. Check out the videos here! Importantly, the authors present a robust policy gradient algorithm, suitable for high-dimensional continuous control problems, that can be scaled to much larger domains using distributed computation.
Distral: Robust Multitask Reinforcement Learning, DeepMind. This paper addresses the problem that using deep neural networks to scale reinforcement learning algorithms in complex and rich environments requires lots of data, while struggling to learn multiple tasks concurrently or in sequence. Here, the authors present an approach for the joint training of multiple tasks, referred to as Distral (Distill and transfer learning). In this framework, knowledge gained by an RL agent in one task is “distilled” into a policy that captures common behaviour shared across the other tasks the agent must solve simultaneously or in sequence. This shared policy acts as a constraint when workers are trained to solve their particular task. When testing in the first-person 3D maze environment DeepMind Lab, the authors find that Distral algorithms learn faster, achieve higher final performance, and are more stable and robust to hyperparameter settings than multitask A3C baselines. This is exciting work towards generalising the learning ability of RL agents in multitask settings.
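To make the coupling concrete, here’s a minimal sketch of the regularisation idea for categorical policies. This is my own simplified reading, not the paper’s code: each task policy is pulled towards the shared distilled policy via a KL term, with an entropy bonus to keep exploration alive; alpha and beta are illustrative hyperparameters.

```python
# Distral-style coupling term: penalise a task policy pi_i for straying
# from the shared distilled policy pi_0, while rewarding entropy.
import numpy as np

def kl(p, q, eps=1e-8):
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def distral_penalty(task_policy, shared_policy, alpha=0.5, beta=5.0):
    # alpha scales the pull towards the shared policy; 1/beta scales an
    # entropy bonus (coefficients simplified relative to the paper).
    entropy = -float(np.sum(task_policy * np.log(task_policy + 1e-8)))
    return alpha * kl(task_policy, shared_policy) - (1.0 / beta) * entropy

pi_task = np.array([0.7, 0.2, 0.1])    # hypothetical task action distribution
pi_shared = np.array([0.4, 0.4, 0.2])  # distilled cross-task policy
print(distral_penalty(pi_task, pi_shared))
```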
Quick mentions of other great work:
SEP-Nets: Small and Effective Pattern Networks, Snap Research and The University of Iowa. This paper proposes a method for compressing large convolutional neural networks so they can run on mobile devices. A 5.2MB SEP-Net achieves 65.8% top-1 accuracy vs. 60.4% from a 4.8MB SqueezeNet (by DeepScale.ai) and 63.7% by a 5.2MB MobileNet (by Google). MobileNet and SqueezeNet are the state-of-the-art here. Interestingly, a much smaller 1.3MB SEP-Net still achieves 63.5% top-1 accuracy on ImageNet.
CAN: Creative Adversarial Networks Generating “Art” by Learning About Styles and Deviating from Style Norms, Rutgers University, Facebook AI Research, College of Charleston. The authors build on generative adversarial networks to make them able to generate creative art by maximising deviation from established styles while minimising deviation from art distribution (i.e. make art that is novel, but not too novel). This is accomplished by having the agent encode and continually update its memory of the art it has been exposed to, and use this memory while generating new art.
Deep Reinforcement Learning from Human Preferences, DeepMind and OpenAI. Most RL systems are designed to learn complex behaviours in an environment by trial-and-error of their own accord. Here, the authors introduce human supervision into the loop by asking non-experts whether policies an agent develops in Atari games without access to the reward function are good or bad. In this way, humans can spot and correct undesired behaviours as the agent explores different policies. Blog post here.
Attention Is All You Need, Google Brain, Google Research and University of Toronto. The most competitive neural translation models today employ an encoder-decoder structure based on recurrent or convolutional neural networks. These networks also use attention mechanisms to focus the network on specific input characters or words when computing output translations. Rather paradoxically, this paper shows that an encoder-decoder architecture can be used with attention mechanisms alone; that is, no CNN, RNN or LSTM. They show this architecture is more parallelizable and requires less time to train (days).
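For the curious, the building block at the heart of this architecture, scaled dot-product attention, fits in a few lines of numpy. This is a sketch of the published equation, not the authors’ code; multi-head attention runs several of these in parallel over learned linear projections.

```python
# Scaled dot-product attention: queries attend over keys, and the
# softmax-weighted values are returned.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```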
Privacy-Preserving Generative Deep Neural Networks Support Clinical Data Sharing, University of Pennsylvania. The motivation for this work is to remove technical barriers that hamper data sharing in the clinic. The authors use GANs applied to the Systolic Blood Pressure Trial data analysis challenge to generate realistic simulated participant blood pressure trajectories. The discriminator of the GAN is the only component that has access to the real, private, data and is thus trained under differential privacy. They show the model learns realistic distributions of the data and can classify participants in the dataset.
📑 Resources
Watch RAAIS 2017 talks from the founders and engineering leaders of Graphcore, Mapillary, Google Research Europe, TwentyBN, Orbital Insight, Blue Yonder and more.
Need more sources to keep track of progress in AI? Check out a) the Electronic Frontier Foundation’s “AI Progress Measurement”, a project that collects problems and metrics/datasets from the AI research literature, and tracks progress on them, and b) RealAI.org’s curation of research papers along specific themes and institutions.
Venture firm a16z publish a helpful wiki on AI, which includes explanations, demos and benchmarks.
Facebook AI Research released ParlAI (“par-lay”), an open source Python-based framework for training and testing dialogue models across multiple tasks at once. Five categories of tasks are available: Q&A, sentence completion, goal-oriented dialog, chit-chat dialog, and visual dialog.
OpenAI open sourced Baselines, a set of implementations of DeepMind’s Deep Q-Learning algorithms to make it easier for the research community to replicate, refine and identify new ideas. More will be added soon!
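For a flavour of the API, training a DQN agent on CartPole looked roughly like this at release (based on the project’s examples at the time; names and signatures may have changed since):

```python
# Train a small DQN on CartPole with OpenAI Baselines (API as of mid-2017).
import gym
from baselines import deepq

env = gym.make("CartPole-v0")
model = deepq.models.mlp([64])  # a small fully-connected Q-network
act = deepq.learn(
    env,
    q_func=model,
    lr=1e-3,
    max_timesteps=100000,
    buffer_size=50000,          # experience replay buffer size
    exploration_fraction=0.1,   # anneal epsilon over 10% of training
    exploration_final_eps=0.02,
)
act.save("cartpole_model.pkl")  # persist the trained policy
```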
Fashion company Stitch Fix publish an awesome interactive website exploring how they use data science across warehouse assignment, recommender systems, matchmaking, logistics, demand modelling, inventory management and more.
Exploring LSTMs - keen to understand the mathematics and code behind these popular recurrent models? Check this out.
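If you’d rather see the mechanics than the maths, here’s a minimal single-timestep LSTM cell in numpy. It’s a sketch for intuition, with all four gates stacked into one weight matrix; real implementations differ in layout and initialisation.

```python
# One LSTM timestep: gates decide what to forget, write and expose.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """W: (4h, d + h), b: (4h,); gates stacked as [input, forget, output, cell]."""
    z = W @ np.concatenate([x, h_prev]) + b
    h = h_prev.shape[0]
    i, f, o = sigmoid(z[:h]), sigmoid(z[h:2*h]), sigmoid(z[2*h:3*h])
    g = np.tanh(z[3*h:])          # candidate cell update
    c = f * c_prev + i * g        # forget old memory, write new memory
    return o * np.tanh(c), c      # hidden state exposes gated memory

d, hdim = 8, 16
rng = np.random.default_rng(0)
W, b = rng.normal(scale=0.1, size=(4*hdim, d+hdim)), np.zeros(4*hdim)
h, c = lstm_step(rng.normal(size=d), np.zeros(hdim), np.zeros(hdim), W, b)
print(h.shape, c.shape)  # (16,) (16,)
```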
Ian Goodfellow of Google Brain delivers a talk on Generative Adversarial Networks at NVIDIA GTC 2017. Watch the video here.
NLP News, curated by Sebastian Ruder - a focused newsletter, check it out.
Building a GPU deep learning box, all by yourself.
AI will add $15.7 trillion to the global economy, says research by PwC.
💰 Venture capital financings and exits
182 deals (66% US and 15% EU) totalling $1.44bn (43% US and 5% EU).
China is going big! SenseTime Co, a company selling facial recognition systems to Chinese law enforcement agencies, raised a $410m Series B at a pre-money valuation of more than $1.5bn! This follows a $10m Series A raised in 2014.
Darktrace, the leading enterprise cybersecurity company, raised $75m led by Insight Venture Partners at an $825m valuation.
Diffblue, an Oxford University spinout working on machine learning tools for software development, raised a $22m Series A led by Goldman Sachs Principal Strategic Investments and Oxford Sciences Innovation.
Drive.ai, a self-driving taxi service company, raised a $50m Series B led by NEA.
Google launched Gradient Ventures to fund AI-driven companies (and keep their top AI researchers and engineers from leaving to startup land).
Toyota launched Toyota AI Ventures, a $100m fund, with the #1 priority being “access to scarce talent and disruptive technologies”.
24 acquisitions, including:
MindMeld, a provider of machine learning-based conversational software, was acquired by Cisco for $125m in cash and equity grants. According to Cisco, the acquisition “will power new conversational interfaces for Cisco's collaboration products, revolutionizing how users will interact with our technology, increasing ease of use, and enabling new cognitive capabilities.” Founded in 2011, MindMeld raised $15.4 million from IDG Ventures, GV, Greylock Partners, Bessemer Venture Partners and Intel Capital, among others.
Lattice Data, a startup commercialising DeepDive, the open source system for extracting value from dark data, was acquired by Apple for a reported $200m. Lattice Data would create structured data (SQL tables) from unstructured information (text documents) and integrate such data with an existing structured database. It’s therefore able to extract sophisticated relationships between entities and make inferences about facts involving those entities. The company raised $20m in venture capital from GV, Madrona (backers of Turi, which Apple also bought for $200m last year) and In-Q-Tel. Lattice Data was founded by the co-creator of Hadoop and a Stanford MacArthur Genius Award winner. The company employed two dozen staff.
Niland, a Paris-based startup using ML to optimise music search and recommendation, was acquired by Spotify for an undisclosed (assumed very small) sum. The team will relocate to New York to work on personalisation features.
-
Congratulations on reaching the end of issue #20! Here’s some comic relief :)
Anything else catch your eye? Comments, questions? Just hit reply!