Your guide to AI in April 2018, by nathan.ai
Reporting from 19th March through 1st May 2018
Hello from London ☀️! I’m Nathan Benaich. Here’s the April Edition of my monthly guide to AI. I’ll synthesise a narrative analysing and linking important news, data, research and startup activity from the AI world. Grab your beverage of choice ☕ and enjoy the read!
Do hit reply if you’re up for a brainstorming session on use cases, new research or ways to future proof your SaaS or enterprise product by implementing ML where it makes sense.
If you'd rather not receive this newsletter and were too afraid to offend me, GDPR is the excuse you've been looking for! 😆 Scroll to the bottom to unsubscribe.
Referred by a friend? Sign up here. Spread the word by giving it a tweet :)
💥 4th Research and Applied AI Summit
T-4 weeks until our flagship annual event! RAAIS is a community for entrepreneurs and researchers who accelerate the science and applications of AI technology. We've convened an exciting lineup that has built/scaled products used by billions (Street View), founded and sold tech companies (Lattice Data), published influential research (DeepMind, Brain, UToronto, Princeton), and developed industry leading ML tooling (Keras, TensorFlow).
🆕 Technology news, trends and opinions
🌍 AI and the nation state
🇫🇷 France:
The French government announced a bullish plan to offer €1.5bn of public support for AI by 2022. The goal is to stem its brain drain and catch up to the US and China. It builds on President Macron’s vision to make France a “startup nation” and is underpinned by Cédric Villani’s special report, For a Meaningful Artificial Intelligence. Villani supports opening up government data collectives, new ethical frameworks, increasing public-private partnerships, increasing academic salaries, and recruiting 400 AI experts to France over the next two years. Read more here (FYI, the report is 154 pages long).
DeepMind has opened an office in Paris (announcement) to be led by Rémi Munos. Google has also launched a new AI research team at its Paris office. The company is also sponsoring academic positions at Polytechnique, projects at INRIA, as well as training programs for PhDs and postdocs.
🇬🇧 The UK:
The House of Lords Select Committee on AI released their own report, AI in the UK: ready, willing and able? (FYI, the report is 183 pages long). The government pledges to invest £300m more in AI research (on top of £300m pledged to technology last year). This includes funding for training 8,000 educators, 1,000 AI PhDs by 2025 and a Turing Fellowship programme to attract talent to the country. This still feels insufficient. The country needs to capitalise on the opportunity in AI especially as it prepares to leave the EU.
The UK, along with 24 EU member states, is also a signatory to a high-level cooperation agreement on AI. The how, who and when aren’t yet fleshed out.
Of course, the big critique of these European AI plans is that they must address the pay scale delta between engineers and researchers in Europe vs. the US, both in industry and academia.
🇺🇸 The US:
The Trump administration has ditched the AI plan Obama published on his way out of office. It has no intention of devising its own plan. Instead, it believes there’s no need for an AI moonshot and that minimizing government intervention is the right approach. False. At the very least, the US government should be massively supporting research and commercial development of AI, given its mighty talent pool and leading technology companies. Yes, it already offers significant grants to universities. However, the talent market is global and while the US is winning today, it must compete like any other nation to sustain its advantage. It must also forward plan for the inevitable changes to the nature of work, offer regulatory guidance (autonomous cars are one example), and educate its citizens about what the future holds.
🇨🇳 China:
As we know, the country continues to blaze its AI path forward to world domination by 2030. The government is fast tracking Baidu’s self-driving technology through the Xiongan New Area, a new ‘smart city’. The company’s Apollo program has been signed to power the new city’s vehicle infrastructure, including passenger vehicle fleets, street cleaners and public buses.
Alibaba is powering ML-based services on the public transport system (speech and face recognition for customers as they purchase tickets). The 5th largest cloud computing provider in the world, Alibaba forked out $2.6bn on R&D last year and will triple this budget with a three-year, $15bn commitment to emerging technology research through its DAMO academy.
🔮 Where AI is heading next
On the current “AI revolution”: In a lovely piece, Prof. Michael Jordan of Berkeley explores many of the central tenets driving the excitement around AI today. He makes the case for a new engineering discipline, defines the differences between human-imitative AI (i.e. general intelligence, to some degree), intelligent infrastructure (connected fabric of computation, data and physical entities) and intelligence augmentation (computation and data used to create services to improve human performance at tasks). This excerpt summarises his points beautifully:
“The current focus on doing AI research via the gathering of data, the deployment of “deep learning” infrastructure, and the demonstration of systems that mimic certain narrowly-defined human skills — with little in the way of emerging explanatory principles — tends to deflect attention from major open problems in classical AI. These problems include the need to bring meaning and reasoning into systems that perform natural language processing, the need to infer and represent causality, the need to develop computationally-tractable representations of uncertainty and the need to develop systems that formulate and pursue long-term goals. These are classical goals in human-imitative AI, but in the current hubbub over the “AI revolution,” it is easy to forget that they are not yet solved.”
On paradigm shifts in technology and bridges in between: With hindsight, the evolution of technology is marked by paradigm shifts. The PC, the feature phone, the smartphone, the Web, etc. However, these demarcations are tricky to discern while we’re in between cycles. These cycles can also be macro and micro. That is to say, there are several cycles within the smartphone era itself. When looking at the sensor suite available to engineers of autonomous vehicles, it's clear that LiDAR, radar, and vision are the best sensing modalities we have. However, it’s less clear when and if the puck will slide in a new direction that more faithfully approximates human perception. In this context, Ben Evans asks whether LiDAR is the best we’ve got or whether it’s actually a ‘bridge’ to a more powerful alternative.
On bias and fairness for automated decision systems: This question remains top of mind for businesses and governments who seek to leverage AI. There are two main sources of bias: a) statistical bias, i.e. the training datasets do not represent the statistical properties of the real world, and b) imprints of bias, i.e. the training datasets encapsulate the bias of their creators. Present solutions to these problems focus on removing specific labels that unfairly identify groups susceptible to bias (e.g. last name, ethnicity). However, this is often insufficient because other features may enable specific groups to be pulled out of the data (e.g. postcodes). We need new methods to systematically identify bias in automated decision-making systems. DeepMind have set up a research group on the topic.
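To make the proxy problem concrete, here’s a minimal, entirely synthetic NumPy sketch (feature names hypothetical): even with the protected attribute dropped from the training data, a correlated feature like a postcode lets a model recover it almost perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)             # protected attribute, dropped before training
postcode = group ^ (rng.random(n) < 0.1)  # proxy feature: agrees with group ~90% of the time

# The model never sees `group`, yet a single proxy feature reconstructs it:
print(f"proxy matches protected attribute {(postcode == group).mean():.0%} of the time")
```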
On reproducibility of AI: We’ve covered this topic in prior editions of the newsletter, but there’s still no widely adopted answer. Of course, this is true not only in AI, but in most quantitative disciplines that deal with sequential optimisation of methods and digital systems. Keeping track of one’s thought process during development, the experimental dependencies, setup, conditions and results at all times is tricky without purpose-built software in place. Indeed, as this piece points out, “there’s no equivalent to source control or even agreed best-practices about how to archive a training process so that it can be successfully re-run in the future.” Solutions do exist internally within large technology companies, and some startups, including Stockholm-based Peltarion, are working on providing tools for everyone else.
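For a flavour of what archiving a training process could look like, here’s a minimal sketch of snapshotting the code version, environment, config and results of each run; the field names are illustrative rather than drawn from any particular tool.

```python
import json, platform, subprocess, time

def snapshot_run(config: dict, metrics: dict, path: str) -> None:
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "git_commit": subprocess.run(                       # ties results to exact source code
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip(),
        "python": platform.python_version(),                # environment details matter for re-runs
        "config": config,                                   # hyperparameters, data paths, seeds
        "metrics": metrics,                                 # results for later comparison
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")                  # append-only experiment log

snapshot_run({"lr": 3e-4, "seed": 42}, {"val_acc": 0.93}, "experiments.jsonl")
```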
On data monopolies and the future of the Internet: Many have opined on blockchain-based approaches to liberating personal data from the clutches of monopolistic incumbents. At a high level, machine learning truly shines when data is highly granular, task-specific and centralised by the owner/operator of the model. Blockchain-based networks, on the other hand, shine when there’s token-based incentivised collaboration between highly distributed actors, none of whom have data advantages over the other. Is there an overlapping ground where these two poles meet? This piece by Fred Ehrsam (with contributions from OpenMined and Numerai) is a fascinating window into how decentralized machine learning marketplaces can dismantle the data monopolies of the current tech giants. It involves the training of metamodels using secure computation methods. Such a metamodel would request decentralised, staked data and model contributions from a community that is repaid as a function of the marginal prediction improvements they deliver against objective evaluation metrics.
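To make the incentive mechanism concrete, here’s a toy sketch that assumes contributors are paid from a reward pool in proportion to the marginal metric improvement each staked contribution delivers; real proposals layer secure computation and slashing of stakes on top.

```python
def payouts(baseline: float, scores_after_each: list, reward_pool: float) -> list:
    """Split a reward pool by each contribution's marginal improvement."""
    rewards, prev = [], baseline
    total_gain = max(scores_after_each[-1] - baseline, 1e-9)
    for score in scores_after_each:
        marginal = max(score - prev, 0.0)        # no reward for harmful contributions
        rewards.append(reward_pool * marginal / total_gain)
        prev = score
    return rewards

# Three contributors move the evaluation metric from 0.70 to 0.82:
print(payouts(0.70, [0.75, 0.80, 0.82], reward_pool=100.0))  # ≈ [41.7, 41.7, 16.7]
```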
On robotic automation: We hear lots about how robots are proliferating all around us. A Swedish economist, however, disagrees on the impact they have on the economy. In the US, labor productivity in manufacturing has not risen in accordance with the increase in robot shipments. On a related note, the lighthouse solution to technological unemployment, namely universal basic income, experienced a setback in Finland, where its public trial was cut short.
🚗 Department of Driverless Cars
An autonomous Volvo XC90 operated by Uber in Arizona struck and tragically killed a pedestrian who was visible to the car’s front-facing camera as she crossed outside of a dedicated zone at night. At the inception of their Arizona program, Uber had two people per vehicle: one in the driver seat who would intervene in case the car misbehaved and another in the passenger seat to oversee the perception systems of the vehicle. Prior to the accident, Uber had reduced their teams to one employee in the driver seat only. Waymo, on the other hand, still use two operators per vehicle. Post-crash analysis showed that neither the autonomous system nor the driver hit the brakes before striking the pedestrian, suggesting that neither had detected her (more analysis here). As a result, other companies, including NVIDIA and Toyota, have suspended their public testing programs in the US. Uber has also let go of all four co-founders of Otto.
In fact, Mobileye CTO Amnon Shashua wrote a post on the Intel blog to show how their ADAS technology, run on the police video footage, could detect the pedestrian crossing the road with one second to spare. He calls for substantive conversations about safety for autonomous vehicles.
In the US, there are 770 accidents every 1 billion miles of driving, according to NVIDIA. A fleet of 20 test cars can only cover 1 million miles a year. For this reason, amongst several others, businesses developing AV systems are investing in simulation environments. At their annual GTC in the US, NVIDIA presented their photorealistic simulation environment with the self-driving perception and planning stack running autonomously (see video).
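A quick back-of-envelope using those figures shows why real-world miles alone can’t validate safety:

```python
# How long would a 20-car fleet need to observe even 100 accidents
# at the background rate of 770 accidents per billion miles?
accidents_per_mile = 770 / 1e9
fleet_miles_per_year = 1_000_000          # 20 test cars, per the NVIDIA figures
years = 100 / (accidents_per_mile * fleet_miles_per_year)
print(f"{years:.0f} years")               # ~130 years for a modest sample
```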
At the NYC auto show, Waymo and Jaguar Land Rover inked a deal where the automaker will supply up to 20,000 of its new electric vehicles to Waymo for conversion into autonomous vehicles. This deal could be worth up to £1.3bn and follows similar tie-ups such as Lyft-Ford, Lyft-Magna, and Uber-Volvo. Waymo are also close to signing a partnership with Honda to create a new autonomous vehicle from scratch.
Tesla has lost their Autopilot chief and suffered its own fatal accident with a Model X that had Autopilot engaged while driving on Highway 101. The driver’s hands were not detected on the steering wheel 6 seconds before the crash. Along with the Uber accident, many are calling for the implementation of greater regulation for public testing. Indeed, I think there should be an independent body charged with assessing the performance and safety of the perception, control, and planning systems of self-driving car operators. This piece explores scenarios for how the industry will play out.
Voyage have released their open autonomous safety manual (a-la-API documentation) covering scenario testing, functional safety, autonomy assessment and a testing toolkit. Cool initiative to drive protocol consensus in this space.
Apple filed for a patent at the nexus of virtual reality and autonomous vehicles. They lay out potential configurations of vehicles where VR is the primary mode of entertainment and means through which to perceive the outside world. You could basically be hang gliding or surfing down the 101 instead of falling asleep in traffic :-)
BMW have finally launched a purpose built center for AV R&D outside Munich! Just in time to try and compete on talent with others who have already set up shop, including Lyft ;-)
Alibaba is running public road tests with its self-driving development cars and is hunting for 50 people for its team (of unknown current size). No blessing from the government yet (like its rival Baidu).
💪 The giants
There’s been a reorganisation at the helm for Google AI. John Giannandrea (JG), SVP Engineering who previously oversaw both search and AI product development, left the company and joined Apple as their Head of AI. Jeff Dean, who has played a pivotal role at Google from the early days and is widely considered to be a legend, is now in charge of the company’s AI work. This is clearly a sign of how serious the company is about disseminating AI throughout its products. Indeed, their 2017 annual letter to shareholders lists many real use cases where ML powers Google products and features.
Microsoft underwent its own reorganisation to focus on AI. The company is now split into two divisions: Experiences + Devices, and Cloud + AI platforms. More details here.
Google also came under scrutiny from its own AI developers following word that the company would engage with the US defense department. With Project Maven, the idea is to use Google’s cloud computer vision capabilities “to increase the ability of weapon systems to detect objects”. While the Google Cloud organisation seemed to be in favour of the contract, AI engineers have been signing a petition demanding the company halt the project, threatening to resign otherwise. Good response.
Scarily, a similar situation has played out in Korea. In February, a leading defense business (Hanwha Systems) and academic institution (KAIST) launched a joint research center at KAIST to co-develop AI applied to military weapons. It was said to include AI-based missile systems, unmanned submarines and armed quadcopters.
Following on this theme of ethics, automation and privacy, Bloomberg ran a piece on Palantir and their practices of data aggregation in the enterprise, as well as on the battlefield.
On a more bullish note, Jeff Bezos wrote to his shareholders to say that “tens of thousands of customers are also using a broad range of Amazon Web Services machine learning services, with active users increasing more than 250 percent in the last year, spurred by the broad adoption of Amazon SageMaker.” Impressive launch! Alexa too is getting an upgrade to her brain, including a newfound memory unit!
🍪 Hardware
Goldman Sachs published a report where they sized the market opportunity for hardware in the age of AI. The 30,000 ft view suggests that the overall AI hardware TAM has potential to grow from $12bn in 2017 to $35bn/$100bn+ by 2020/2025. This is driven by a) the need for significantly more compute and memory for training networks with enormous numbers of parameters, b) AI-related workloads accounting for a greater share of all datacenter workloads and c) the higher bill of materials for the hardware itself.
NVIDIA announced their DGX-2 box, which claims 10x the compute of the DGX-1 because it includes 16 GPUs (2x more than DGX-1), 32GB (and faster) memory per GPU, and a faster GPU interconnect. They’ve created a network fabric with 5x the bandwidth of top PCIe switches on the market to connect all 16 GPUs together vs. a point-to-point connection. What’s more, it now takes 18 minutes to train AlexNet on a DGX-2 vs. 6 days on two GTX 580 chips (state of the art in 2012). A ~500x speedup in 5 years!
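A quick sanity check on that headline figure:

```python
# 6 days on two GTX 580s vs. 18 minutes on a DGX-2
minutes_2012 = 6 * 24 * 60                     # 8,640 minutes
minutes_2018 = 18
print(f"{minutes_2012 / minutes_2018:.0f}x")   # 480x, i.e. roughly the quoted 500x
```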
Tesla have shipped their Autopilot 2.5 hardware (tear down here), which includes a secondary GPU.
Facebook has set up a semiconductor group to develop its own system-on-chip/ASIC, firmware and driver development team, most likely due to the cost and performance advantages derived from creating a custom chip fit for Facebook’s workloads.
Microsoft have said they’re entering the AI hardware space, although details are scarce. This is concomitant with a push for Cortana not to be left totally by the wayside.
Google AI (as it’s now called) published benchmarks on the computational time and cost required to train networks on ImageNet using different frameworks, architectures and hardware. They show that training an ImageNet classifier to 93% top-5 accuracy using Google Cloud TPUs and AmoebaNet (an architecture learned via evolutionary search) takes less than 7.5 hours and costs just shy of $50.
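The quoted cost reconstructs neatly if you assume the roughly $6.50 per TPU-hour on-demand price Google listed at the time (my assumption; the post quotes only the total):

```python
tpu_price_per_hour = 6.50      # assumed early-2018 Cloud TPU on-demand pricing
hours_to_93_top5 = 7.5
print(f"${tpu_price_per_hour * hours_to_93_top5:.2f}")   # $48.75, "just shy of $50"
```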
Spotify is working on an in-car device for streaming their content that will include a slew of voice commands. Interestingly, Uber also seems to be making a push towards voice by setting up a conversational AI group.
Robots have been shown to automatically plan and control the assembly of an IKEA chair! However, the robot still needs to be told the step-by-step instructions from those pesky plans.
We’re also set to see more innovation in the materials science industry, where it’s been recently shown that machine learning predictions can help discover novel materials.
🏥 Healthcare
Babylon, the UK-based mobile telemedicine startup, will deploy its services via WeChat thanks to a deal signed with Tencent. One billion users will be able to message medical symptoms to Babylon’s app and receive healthcare advice in return. This deal follows a similar arrangement with the Saudi Arabian ministry of health.
The FDA issued an authorisation for IDx-DR, a deep learning based system for diagnosing eye disease in diabetic adults. The screening involves standard retinal imaging, takes less than a minute, and can be performed without a clinician's interpretation of the images or results. Interestingly, approval required a clinical trial that compared the system’s screening recommendations against retinal images from over 800 diabetic patients at ten different primary care sites. The study “indicated that IDx-DR correctly identified the presence of more than mild diabetic retinopathy 87.4% of the time and correctly identified patients with less than mild diabetic retinopathy 89.5% of the time.”
📑 Careers 📑
TwentyBN, the video understanding company, are hiring for engineering, research and product roles in Berlin and Toronto. Reach out to Ingo Bax.
Onfido, the identity verification engine backed by Salesforce and Microsoft, are hiring for engineering and research roles in London and Lisbon. Reach out to ZeShaan Shamsi.
Optimal, the AI-controlled indoor farming systems company, are hiring senior software engineers, ML research engineers and model-based RL engineers. Reach out to Dave Hunter.
PolyAI, are hiring full stack and front end software engineers in London. Reach out to Nikola Mrkšíc.
PROWLER.io, the decision company, are hiring a data science lead, data engineers and ML engineers in Cambridge. Reach out to Juha Seppaä.
Jukedeck, the developers of musical AI, are hiring operations and business development leads as well as research scientists in London. Reach out to Patrick Stobbs.
🔬 Research
Here’s a selection of impactful work that caught my eye:
Deep gradient compression: Reducing the communication bandwidth for distributed training, Stanford, Tsinghua, NVIDIA and Google Brain. We’re living in an era of massive compute where large-scale distributed training improves the productivity of training deeper and larger models. The costly aspect of this paradigm is the need to pass gradient updates from synchronous stochastic gradient descent through a distributed training network. The network bandwidth therefore becomes a significant bottleneck for scaling up distributed training. This bandwidth problem gets even worse when distributed training is performed on mobile devices, such as in federated learning (a proposed solution to preserve user privacy). In this work presented at ICLR 2018, Song Han and colleagues propose a method (“Deep Gradient Compression”) to reduce the communication bandwidth by sending only the important gradients (sparse update) first and letting sub-threshold gradients accumulate locally before they are sent. They mitigate the staleness issue of having certain updates wait before being integrated by using momentum factor masking and warm-up training. They show that DGC yields similar learning curves to non-compressed training on image classification, language modelling and speech recognition tasks. Interestingly, DGC achieves 597x and 277x compression of gradient size for AlexNet and ResNet-50 respectively, with no concomitant loss in Top-1 or Top-5 accuracy. This paper provides fascinating insights into how the network may no longer be a bottleneck for distributed training. Thanks to Dan @ Point72 for shooting this over to me!
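For intuition, here’s a minimal NumPy sketch of the core sparsification idea, assuming a simple top-k threshold and omitting the paper’s momentum correction, masking and warm-up:

```python
import numpy as np

class SparseGradSender:
    def __init__(self, dim: int, sparsity: float = 0.999):
        self.residual = np.zeros(dim)     # sub-threshold gradients accumulate locally
        self.sparsity = sparsity

    def compress(self, grad: np.ndarray):
        acc = self.residual + grad
        k = max(1, int(len(acc) * (1 - self.sparsity)))   # e.g. send only the top 0.1%
        threshold = np.partition(np.abs(acc), -k)[-k]
        mask = np.abs(acc) >= threshold
        self.residual = np.where(mask, 0.0, acc)          # keep the rest for a later step
        idx = np.nonzero(mask)[0]
        return idx, acc[idx]                              # sparse update to transmit

sender = SparseGradSender(dim=1_000_000)
idx, vals = sender.compress(np.random.default_rng(0).normal(size=1_000_000))
print(len(idx))   # ~1,000 values sent instead of 1,000,000
```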
Can agents learn inside of their own dreams?, Google Brain and NNAISENSE. If you think about the way you quickly navigate the world and act within it, you’ll be unsurprised to learn that your mind builds mental models of that world based on the limited information you have about it. This abstract spatial and temporal representation of the world helps you act within it and deal with the continual deluge of new information your brain must process. In this paper, Ha and Schmidhuber build such world models for RL agents in the context of video games. They first train a large neural network to learn a model of the agent’s world in an unsupervised manner, and then train a smaller controller model to learn to perform a task using this world model. Interestingly, the agent has three components: vision (a VAE that encodes a high-dimensional visual observation into a low-dimensional latent vector), memory (a probabilistic RNN that predicts future states based on history) and a controller (which selects actions based on the vision and memory outputs). By training the agent through the lens of its world model, they show that it can learn a highly compact policy to perform its task. This means that one could reduce the resource requirements for training agents in otherwise computationally intensive game engines.
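Here’s a schematic of how the three components compose; the module internals below are placeholders rather than the authors’ code, with dimensions following the paper’s CarRacing setup:

```python
import numpy as np

class Vision:                 # VAE encoder: frame -> small latent vector z
    def encode(self, frame: np.ndarray) -> np.ndarray:
        return np.zeros(32)   # placeholder 32-dim latent

class Memory:                 # RNN: predicts future latent states from history
    def __init__(self):
        self.h = np.zeros(256)                                       # recurrent hidden state
    def step(self, z: np.ndarray, action: np.ndarray):
        self.h = np.tanh(0.9 * self.h + z.mean() + action.mean())    # placeholder update
        return self.h

class Controller:             # tiny linear policy acting on [z, h] only
    W = np.zeros((3, 32 + 256))
    def act(self, z: np.ndarray, h: np.ndarray) -> np.ndarray:
        return self.W @ np.concatenate([z, h])

vision, memory, controller = Vision(), Memory(), Controller()
z = vision.encode(np.zeros((64, 64, 3)))   # observe
action = controller.act(z, memory.h)       # act using the compact world model
memory.step(z, action)                     # update the model of the world
```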
Learning to navigate in cities without a map, DeepMind. Raia Hadsell (who spoke at RAAIS 2017) and her collaborators used Street View imagery to build an environment in which they trained RL agents to navigate using pixel-based visual cues without prior knowledge of a map. There are three modules: 1) a CNN to process the visual scene, 2) a locale-specific RNN that memorises the city in its internal states (locale-specific features and topology) and 3) a second RNN that produces a navigation policy over the agent's actions. The action space is composed of 5 discrete actions (forward, left/right fast or slow). The goal vector encodes the agent's proximity to landmarks (lat/long). They start by tasking the agent to reach a goal 500m away and progressively move up to 5km across London in a courier problem. After the agent achieves a goal, it's given another one, until the episode ends at 1,000 steps. What I like here is that they not only show that such a method is tractable on Street View data (pretty neat!) but also that the trained models transfer quite well to new cities, requiring only the locale-specific RNN component to be replaced. If you’d like to dive further into navigation methods, here’s a paper that compares how different flying animals and insects manage to navigate with short-range visual guidance.
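As a rough schematic (again placeholders, not DeepMind’s code), the modularity looks like this, with only the locale-specific piece swapped out when moving to a new city:

```python
import numpy as np

ACTIONS = ["forward", "left_slow", "left_fast", "right_slow", "right_fast"]

def cnn_encode(image):                        # module 1: process the visual scene
    return np.zeros(64)                       # placeholder features

def locale_rnn(feats, goal_vec, h):           # module 2: city-specific memory
    return np.tanh(0.9 * h + feats.mean() + goal_vec.mean())

def policy_rnn(locale_out, h):                # module 3: navigation policy
    h = np.tanh(0.9 * h + locale_out.mean())
    return h[:len(ACTIONS)], h                # placeholder logits over the 5 actions

image, goal = np.zeros((84, 84, 3)), np.array([0.3, 0.7])   # goal: proximity to landmarks
h_locale, h_policy = np.zeros(64), np.zeros(64)
h_locale = locale_rnn(cnn_encode(image), goal, h_locale)
logits, h_policy = policy_rnn(h_locale, h_policy)
print(ACTIONS[int(np.argmax(logits))])
# Transfer to a new city: retrain only locale_rnn; keep cnn_encode and policy_rnn.
```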
Other highlights:
An analysis of neural language modelling at multiple scales, Salesforce Research. Simplicity is still competitive! Standard LSTM/QRNN models can achieve state-of-the-art results on character (PTB, enwik8) and word level (WikiText-103) language modeling datasets in 12-48 hours w/ a single GPU, according to author Stephen Merity.
You can now paint by writing textual instructions: Image generation from scene graphs.
Meshed up: Learnt error correction in 3D reconstructions. The team at Oxford Robotics Institute use a CNN to predict residual error in the depth map from a low-quality reconstruction with respect to a high-quality reconstruction. At run time, they subtract the predicted error from the depth map to obtain a better one, which can then be used in a new, better reconstruction.
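In sketch form, with a stub standing in for the trained CNN, the run-time correction is a single subtraction:

```python
import numpy as np

def predict_residual(depth: np.ndarray) -> np.ndarray:
    return np.zeros_like(depth)            # placeholder for the trained error-prediction CNN

noisy_depth = np.random.default_rng(0).random((480, 640), dtype=np.float32)  # low-quality depth map
better_depth = noisy_depth - predict_residual(noisy_depth)                   # corrected depth map
```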
Scaling throughput processors for machine intelligence, a recent talk by Simon Knowles of Graphcore. The group also published work on revisiting small batch sizes for training neural networks, which delivers more stable and reliable training. Note that the industry has moved to large batches mostly because that’s the best fit for the GPU substrates we’re training networks on.
Prefrontal cortex as a meta-reinforcement learning system. This paper provides evidence that dopamine actually trains the prefrontal cortex to operate as its own free-standing reward-based learning system.
Uber AI labs present a new method for meta-learning termed differentiable plasticity. It seeks to solve the issue that fully-trained networks don’t learn as they process new data because their network parameters remain fixed. Instead, this paper introduces a modified neural network activation function whereby the input weights have both a fixed component and a plastic component. The latter is automatically updated as a function of ongoing inputs and outputs.
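The update rule is simple enough to sketch in a few lines of NumPy, following the paper’s formulation; this toy version leaves out the gradient-based training of the fixed weights and plasticity coefficients:

```python
import numpy as np

class PlasticLayer:
    def __init__(self, n_in: int, n_out: int, eta: float = 0.1):
        rng = np.random.default_rng(0)
        self.w = rng.normal(0, 0.1, (n_out, n_in))      # fixed component (learned by SGD)
        self.alpha = rng.normal(0, 0.1, (n_out, n_in))  # plasticity coefficients (learned by SGD)
        self.hebb = np.zeros((n_out, n_in))             # plastic trace, updated at run time
        self.eta = eta

    def forward(self, x: np.ndarray) -> np.ndarray:
        y = np.tanh((self.w + self.alpha * self.hebb) @ x)
        # Hebbian update: connections between co-active units strengthen, so the
        # network keeps adapting to new inputs even after training is complete.
        self.hebb = (1 - self.eta) * self.hebb + self.eta * np.outer(y, x)
        return y

layer = PlasticLayer(8, 4)
rng = np.random.default_rng(1)
for _ in range(3):
    out = layer.forward(rng.random(8))   # the effective weights change with each input
```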
Ever found it difficult to understand a speaker on a video conference when there’s more than one of you in a room? Google published an approach that focuses the audio channel on the voice of a speaker of choice by integrating representations of the video and the sound. It’s called Looking to listen: Audio-video speech separation. This could be a great add-on for Zoom + Owl teleconferencing systems :-)
📑 Resources
Here’s a nice walkthrough of DeepMind’s AlphaGo system, focusing on the theory and practice behind Monte Carlo Tree Search.
NVIDIA open sourced NVVL: a library that provides GPU accelerated video decoding for DL training. Save 40X on your storage space and bandwidth, reduce CPU load by 2X when training on video datasets. Great for GPU dense systems like DGX-2.
Ian Goodfellow shared comments on how to review papers offering new methods that are meant to generically make GANs train more reliably or produce better samples. He points to consistency in using the right benchmarks, not cherry-picking the best-looking image samples, and reporting where the hyperparameters came from.
Tensorflow.js: A WebGL accelerated, browser based JS library for training and deploying ML models.
OpenAI launched a transfer learning contest and dataset that measures the ability of RL algorithms to generalise to new experiences.
Slidedeck from Imperial: An introduction to deep reinforcement learning and here’s another line-by-line breakdown of how to train a pong RL agent in PyTorch.
Here’s a cool post about computer vision techniques applied to medical imaging.
Some video demonstrations to demystify neural networks.
There’s little work in the open about what it takes to reproduce open source research. This piece describes an 8 month project to reproduce Deep reinforcement learning from human preferences. Another piece provides a step-by-step reproduction of Uber’s Deep Neuroevolution paper, in this case with less compute than the original team had access to.
💰 Venture capital financings and exits
220 deals (64% US, 15% EU, 13% Asia) totalling $3B (55% US, 7% EU, 35% Asia).
Benevolent.ai, the London-based technology/life sciences drug discovery company, raised a round of $115M, which values the company at $2.1bn. The company has initiated >20 drug programmes to date based on its internal knowledge base and causal representation of drugs, molecular pathways and disease symptoms. It is also exploring other applications of its technology in advanced materials, agriculture and energy storage.
SenseTime, the Chinese face recognition company that is also publishing computer science textbooks for schools (!), raised a $600M Series C led by Alibaba. The four-year-old company is now valued at $4.5B and has 314 employees.
SiFive, the open-source chip platform from the makers of the RISC-V architecture, raised a $50M round as an alternative to ARM’s chip platform.
Mapillary, the world’s largest collaborative platform for street level imagery, raised a $15M Series B to scale up its computer vision-powered software products for automotive and its global community of contributors.
6 River Systems, the Boston-based makers of warehouse robotics, raised a $25M Series B from Menlo Ventures. They’ve deployed 600 robots across 30 sites with a SaaS business model or 1-2 year rental contracts. The team is ex-Kiva Systems (acq. Amazon).
15 acquisitions, including:
Kensho, the financial analytics software designed to augment and power decision making throughout the global financial system, was acquired by S&P Global for $550M. The company raised $105M over three rounds. Unfortunately, it looks like the Series B post-money was $595M, suggesting that the outcome was rather tempered.