🏡Your guide to AI: June 2020
Welcome back to regular readers and hello to everyone who joined since last month! Enclosed you'll find Your Guide to AI: June 2020. I'll cover key developments in AI tech, geopolitics, health/bio, startups, research and blogs.
Startups: If you’d like to chat about your startup or a project you’re working on, just hit reply!
Help spread the word: If you enjoy this issue, I’d appreciate you hitting forward to a couple of friends 👍
🆕 Technology news, trends and opinions
🏥 Healthcare and life science
Genomic medicine, which uses genetic information about a patient's health condition to inform diagnosis and treatment, is an area that’s ripe for AI applications. As a primer on this topic, a report from the PHG Foundation in Cambridge is worth a read. It evaluates why AI is relevant to genomic medicine, what applications are out there in the wild today, and policy considerations for further adoption. Part of the reason we don’t see more AI in genomic medicine is limited access to quality datasets (an issue that Pearse Keane discussed at our RAAIS conference last week), imbalanced datasets that result in algorithmic bias, and immature technology infrastructure to facilitate the adoption of predictive systems in healthcare.
One solution to these problems is federated learning (FL), whereby an ML model is sent to train where a dataset lives (vs. the other way around in traditional ML). Many FL initiatives (whether academic, open source or in startups) focus on healthcare use cases. In this perspective paper, the authors present an “overview of current and next-generation methods for federated, secure and privacy-preserving artificial intelligence with a focus on medical imaging applications, alongside potential attack vectors and future prospects in medical imaging and beyond.” The driving motivation is that the bountiful opportunities for ML in healthcare are rate-limited by the difficulty of safely accessing this data without breaching the subject’s privacy.
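To make the FL idea concrete, here is a minimal sketch, assuming a toy linear model and three hypothetical sites holding synthetic data. It uses federated averaging, one common FL algorithm; the perspective paper covers a much broader set of methods, and every name below is made up for illustration. The point to notice is that only model weights travel between the sites and the coordinator, never the raw records.

```python
import random


def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent on y = w*x + b using one site's private data only."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(global_weights, sites, rounds=50):
    """Send the model to each site, then average the returned weights."""
    for _ in range(rounds):
        # Each site trains locally; only (weights, sample count) comes back.
        updates = [(local_update(global_weights, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        # Weight each site's contribution by how many samples it holds.
        global_weights = tuple(
            sum(site_weights[i] * n for site_weights, n in updates) / total
            for i in range(2)
        )
    return global_weights


def make_site(n, seed):
    """Hypothetical site: synthetic, noiseless points from y = 3x + 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3.0 * x + 1.0) for x in xs]


sites = [make_site(20, 0), make_site(50, 1), make_site(30, 2)]
w, b = federated_average((0.0, 0.0), sites)
print(f"learned w={w:.3f}, b={b:.3f}")
```

Real deployments add secure aggregation, differential privacy and defences against the attack vectors the paper discusses; none of that is modelled here.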
Another area of therapeutics that’s heating up is protein engineering. LabGenius (an Air Street Capital portfolio company) shared their 8-step approach to AI-first protein engineering applied to therapeutic discovery and development. This is a great primer if you’re interested in the space.
🌎 The (geo)politics of AI
As a field that’s strongly rooted in academia, the AI industry needs a steady flow of post-graduate talent. A study of 175 randomly selected NeurIPS 2019 papers found that of 128 authors who held undergraduate degrees from China (30% of the sample set), over half of them went on to earn graduate degrees in the US and currently work in the US. Although this trend has been quite apparent on the ground for a while, it is the US’ ability to stay as a talent magnet that is being questioned now more than ever. Cast against the government’s trade sanctions, company-non-grata lists, H1B visa freezes and overall policy hostility towards China, things aren’t looking good. For the post-graduation work share balance to really shift from the US to elsewhere (e.g. Europe), however, contending nations need to embrace immigration, open talent visas, have companies and universities pay competitive salaries, grow government expenditure on R&D and the like. Perhaps fortunately for the US, this wish list doesn’t appear to be materialising. I also suspect that US companies will be more willing (given COVID work from home) to establish foreign subsidiaries to keep hold of talent that is forced to leave the US.
On the topic of talent and training, Eric Schmidt’s foundation Schmidt Futures confirmed a donation to the University of Cambridge to support the Accelerate Program for Scientific Discovery. The initiative provides machine learning training to PhD students in the sciences.
🚗 Autonomous everything
Self-driving car companies are hitting the streets again after their COVID-induced hiatus that started in mid-March. Minivans from Waymo and cars from Lyft are out testing in California. On the other hand, Aurora confirmed that it has been working on using its self-driving technology to power trucks. Co-founder Sterling Anderson affirmed that the safest application of AVs is indeed trucking.
The US’s National Highway Traffic Safety Administration unveiled AV TEST, a new initiative to provide “an online, public-facing platform for sharing automated driving system on-road testing activities”. The initiative spans eight US states and nine self-driving companies including Waymo, Uber, Cruise and Nuro.
Meanwhile, large investments continue to accrue to the large players. Argo closed a $2.6B deal with Volkswagen, which sees VW’s own Munich-based Autonomous Intelligent Driving unit folded into Argo. Ford and VW will now share the costs of developing Argo.
Amazon signed an agreement to acquire Zoox for a reported $1-1.2B, which is just a pinch more than the amount of venture capital the company had raised. Of note, Amazon will let Zoox run as a standalone company on a mission to reinvent the autonomous car. It is expected that Zoox vehicles could plug into Amazon’s logistics network to further compress delivery times and let the company own more of its value chain.
From ground AVs to the air, Airbus demonstrated the world’s first fully automatic vision-based autonomous taxi, takeoff and landing of an A350 aircraft.
💪 The giants
Following the release of their 175 billion parameter GPT-3 language model, OpenAI announced they have developed an API for third party developers to access new models developed by the company. The (private beta) API runs models with weights from the GPT-3 family and offers a general purpose text-in, text-out interface. You can program the API by giving it a few examples of the task you’d like it to solve. OpenAI gives examples of chat, semantic search, table completion in Excel, translation, and text generation. The company received mixed reviews on this release. Some saw it as an unwelcome commercial move given OpenAI’s original non-profit mission and their strong reticence to release model weights when their original paper was published for fear of misuse. Others, however, welcomed the release. I actually think it’s a positive development because having a production software service outcome at the end of a multi-year R&D project ensures that the work is actually useful in the real world. The API provides useful abstractions to widen the relevance of GPT R&D and their limited beta lets the company audit use cases ahead of approval.
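The "program it with a few examples" idea is easiest to see as plain text. Here is a minimal sketch, assuming a simple label-per-line layout that I made up for illustration; it is not OpenAI's documented prompt format, and no API call is made. The function only assembles the string that a text-in, text-out model would be asked to continue.

```python
def build_few_shot_prompt(examples, query, input_label="English", output_label="French"):
    """Turn (input, output) example pairs plus a new input into one text prompt."""
    lines = []
    for source, target in examples:
        lines.append(f"{input_label}: {source}")
        lines.append(f"{output_label}: {target}")
        lines.append("")  # blank line separates worked examples
    # End with the new input and an empty output slot for the model to complete.
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


# Hypothetical translation task, one of the use cases mentioned above.
examples = [("cheese", "fromage"), ("good morning", "bonjour")]
prompt = build_few_shot_prompt(examples, "thank you")
print(prompt)
```

Swapping the labels and example pairs is all it takes to point the same interface at a different task, which is what makes the abstraction general purpose.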
Snap announced their Lens Studio 3.0, which introduces SnapML. This service now allows third party developers to add their own ML models into Lenses they create and publish to consumers on the Snapchat application. Aside from Niantic’s mobile games, Snap’s Lenses are one of the few extremely popular AR features running on mobile devices. The company also released PlantSnap and Dog Scanner, which as the names suggest enable users to point their Snapchat camera at plants and dogs to recognise species and breeds, respectively. It’s hard to say if this is a gimmick or whether user data suggests that consumers scan dogs and plants a lot. I’d be willing to bet that the former is true :-)
Facebook shared the results of their open Deepfake detection challenge that drew over 2,000 participants and more than 35,000 model submissions on a dataset of videos produced by 3,500 paid actors. These real videos were altered using a variety of deepfake generation models and refinement techniques. The task for participants was to classify real vs. fake on a public dataset as well as a black box dataset that they had not previously seen. Although the best model submission achieved 82.56% average precision on the public dataset, the same model dropped to 65.18% average precision against the black box dataset. Such an outcome highlights the challenge for models to generalise to hitherto unseen samples - a task that is of critical importance for real-world, robust ML systems. What’s also interesting is that all the winning submissions used pretrained EfficientNet networks with fine-tuning on the deepfake dataset. EfficientNet is a computer vision model that was generated using neural architecture search. It is both smaller (by parameter number) and faster (by inference speed) than the best existing convolutional neural network at the time (in 2019).
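For readers less familiar with the metric quoted above, here is a minimal sketch of average precision under one common definition: rank the videos by predicted "fake" score and average the precision at each rank where a true fake appears. The labels and scores are made up, ties are not handled, and the challenge's exact evaluation protocol may differ from this.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive (label 1) occurs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    if positives == 0:
        raise ValueError("average precision is undefined without positive labels")
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` items
    return precision_sum / positives


# Hypothetical detector output: 1 = fake, 0 = real.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
ap = average_precision(labels, scores)
print(f"average precision: {ap:.4f}")
```

A detector that ranks every fake above every real video scores 1.0, so a drop from 82.56% to 65.18% means many more real videos are being ranked above fakes on the unseen data.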
🍪 Hardware
News emerged that Graphcore (an Air Street Capital portfolio company) has shipped tens of thousands of its IPU processors to some 100 customers around the world.
Cerebras announced that it is building a supercomputer with the Pittsburgh Supercomputing Center thanks to a $5M grant from the NSF. The supercomputer uses two of Cerebras’ CS-1 machines, which Andy Hock described at RAAIS last week. The company is expected to share details about its second-generation system at the upcoming Hot Chips conference.
NVIDIA demonstrated a 20x performance speedup for ETL operations on the TPCx-BB benchmark challenge. Their approach compared a system using sixteen DGX A100 systems (128 GPUs) to a CPU system.
Apple announced that it is moving away from Intel-based processors to its own ARM-based designs for its future products, giving it even deeper full-stack control over its hardware. From a product perspective, Apple Silicon takes aim at a large number of features that would otherwise not be included in a third party CPU. These include high-efficiency audio processing, low-power video playback, power management, secure enclaves, neural engine, cryptography acceleration, and more. This means that for the first time, all Apple hardware products will be able to run similar software, which potentially allows the millions of iOS apps to run on Mac (which currently has some 28,000 apps in the store).
Cambricon, a privately-held Chinese unicorn that designs AI-focused processors, is reportedly seeking an IPO on China’s new Nasdaq-like exchange called Star Market. Of note, the company sold technology to Huawei for inclusion in their first AI-chip (Kirin) powered smartphones. However, Huawei’s AI semiconductor division called HiSilicon has since doubled down on its own development, leaving a dent in Cambricon’s revenue.
Intel announced its first AI-optimised FPGA that now incorporates a block that is tuned for tensor matrix multiplications. They claim that this device is up to 2.3x faster than NVIDIA’s V100 GPUs for BERT batch processing.
🔬Research and Development🔬
Here’s a selection of impactful work that caught my eye, grouped in categories:
NLP
Recent Advances in Google Translate, Google AI. This post shows how the company’s popular translation system has averaged +5 BLEU score improvement over all 100+ languages in the last 12 months. The performance increase is thanks to improvements to “model architecture and training, improved treatment of noise in datasets, increased multilingual transfer learning through M4 modeling, and use of monolingual data.”
Imitation Attacks and Defenses for Black-box Machine Translation Systems, UC Berkeley. It is known that hosted prediction APIs are vulnerable to adversarial attacks. This paper shows that one can train a model to imitate a black-box machine translation (MT) system by inputting monolingual phrases into the MT system and learning to imitate the outputs. The imitation model can be used to deliver adversarial attacks on the MT system such that it outputs semantically-incorrect translations, dropped content, and vulgar model outputs. The authors also demonstrate how to defend from such an attack.
Unsupervised Translation of Programming Languages, Facebook AI Research. A transcompiler is used to convert one high level programming language into another. However, such a tool tends to be rules-based and prone to errors. While ML-based translation systems would be a good fit for this problem, there is a lack of parallel data in this domain that can be used for training. In this paper, the authors use unsupervised machine translation to train a Transformer-based model on source code from open source GitHub projects. They show that this model can translate functions between C++, Java, and Python with high accuracy. The method relies “exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages.” Of note, the model understands the “syntax specific to each language, learns data structures and their methods, and correctly aligns libraries across programming languages.”
Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access, Amazon Alexa AI. This paper evaluates the issue that today’s conversational agents are typically restricted to providing information that is immediately available from the APIs or databases they are directly connected to. The authors propose to expand coverage of task-oriented dialogue systems by incorporating external unstructured knowledge sources so that unexpected conversations can be gracefully handled.
How big should my language model be? Hugging Face. This paper studies the relationship between model size and training speed to predict the optimal model size and its performance given a certain GPU wall time or $ training budget. They find that big models are surprisingly efficient to train and that training until convergence isn’t an efficient path.
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?, Iandola, Shaw and Berkeley. This paper is written by the authors of SqueezeNet, an approach to compressing the size of a computer vision model by an order of magnitude. They evaluate whether methods that worked well in vision, namely grouped convolutions, can be applied to BERT-based NLP models. The motivation is that BERT-base takes 1.7 seconds to classify a text snippet on a Pixel 3 smartphone, which is noticeably slow to the end-user. The paper presents SqueezeBERT, which runs 4.3x faster than BERT-base on Pixel 3 while achieving competitive accuracy on the GLUE test set.
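To see why grouped convolutions help in the SqueezeBERT item above, here is a back-of-the-envelope sketch. It assumes a 768-channel layer (BERT-base's hidden size), a kernel size of 1 and four groups; the group count is chosen for illustration, not taken from the paper's layer-by-layer design. It also works out the latency implied by the two figures quoted above, assuming the same workload.

```python
def conv1d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Weights in a 1D convolution (bias ignored).

    With groups > 1, each output channel only connects to
    in_channels / groups input channels.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (in_channels // groups) * out_channels * kernel_size


hidden = 768  # BERT-base hidden size
dense = conv1d_weight_count(hidden, hidden, kernel_size=1)
grouped = conv1d_weight_count(hidden, hidden, kernel_size=1, groups=4)
print(dense, grouped, dense // grouped)

# Implied on-device latency from the figures quoted above: 1.7 s at a 4.3x speedup.
implied_latency_s = 1.7 / 4.3
print(f"roughly {implied_latency_s:.2f} s per snippet")
```

Splitting channels into g groups cuts the weights and multiply-adds of that layer by a factor of g, which is the lever the paper borrows from efficient vision networks.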
Computer vision
Learning with Verification: Improving Object Recognition with the Community’s Input, Mapillary. This paper demonstrates how to use human verification of machine-labeled street view images to significantly improve object detection accuracy. The approach is particularly important for datasets that are rapidly growing and changing, e.g. those of the real world!
Learning feature matching with graph neural networks, ETHZ and Magic Leap. This paper addresses the problem of local image feature matching for computer vision-based navigation or scene reconstruction systems (e.g. SfM or SLAM). The system uses a graph neural network to simultaneously perform context aggregation, matching, and filtering of local features. The result is high fidelity matching (hence the name SuperGlue) that can run in real-time on a GPU.
PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization, Facebook, USC. This paper demonstrates impressive results in image-based 3D human shape estimation. Current approaches either generate a coarse shape that is subsequently refined using explicit geometric representations or use high-fidelity models of humans. In contrast, PIFuHD offers an end-to-end multi-level framework to infer the 3D geometry of humans that retains the details in the original inputs without any post-processing. The method also handles uncertainty in regions that are not observed (e.g. the back) so that reconstructions are of high detail. This is useful work for synthetic media and avatar building. Related to this work is a paper from Oxford and Apple called Equivariant Neural Rendering that proposes a framework for learning neural scene representations from single images without 3D supervision.
Generative Pretraining from Pixels, OpenAI. Recent advances in NLP such as GPT-2 and other transformer-based architectures have shown the power of unsupervised pretraining in sequence-based tasks in language. This paper asks whether the same GPT-2 architecture can be used for unsupervised pretraining on images if they are treated as sequences of pixels (termed image GPT-2). Indeed, they show that iGPT-2 can be trained to auto-regressively predict pixels without incorporating knowledge of the 2D input structure. The features they learn are competitive with those from unsupervised convolutional networks, but they require significantly more compute to get there. Even so, this is a cool result!
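The "images as sequences of pixels" framing in the OpenAI item above can be sketched in a few lines. This is a toy illustration, assuming raster (row-by-row) ordering and a made-up 2x3 greyscale image; it shows only the data preparation and says nothing about the paper's actual preprocessing or model.

```python
def image_to_sequence(image):
    """Flatten a 2D grid of pixel values into a 1D raster-order sequence."""
    return [pixel for row in image for pixel in row]


def next_pixel_examples(sequence):
    """Pair each non-empty prefix with the pixel that follows it.

    These (context, target) pairs are what an autoregressive model is
    trained on: predict the next pixel given all previous ones.
    """
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# Hypothetical 2x3 image; the values stand in for pixel intensities.
image = [
    [0, 1, 2],
    [3, 4, 5],
]
sequence = image_to_sequence(image)
examples = next_pixel_examples(sequence)
for context, target in examples:
    print(context, "->", target)
```

Once flattened, the model has no built-in notion that pixel 3 sits directly below pixel 0, which is what "without incorporating knowledge of the 2D input structure" means in practice.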
Reinforcement learning
Learning to Play No-Press Diplomacy with Best Response Policy Iteration, DeepMind. Diplomacy is a strategy game where players control armies that aim to gain control of provinces. While it has simple rules, the game has high emergent complexity, which creates a difficult environment for RL algorithms. In this work, the authors propose new RL methods that can handle large combinatorial action spaces and simultaneous moves. The resulting RL agents outperform the previous state-of-the-art.
Neuroevolution of Self-Interpretable Agents, Google Brain. In this paper, the authors introduce self-attention (popularised by the Transformer model) to train RL agents so they focus their attention capacity on a small number of relevant visual inputs. This approach encourages an agent to attend to only a small fraction of its visual input, which makes the agent easier to interpret in pixel space.
Systems and methods
Once-for-all: Train one network and specialize it for efficient deployment, MIT and MIT-IBM Watson AI Lab. This paper addresses the problem of developing optimised neural network architectures for a specific hardware substrate. Existing methods use either engineer-based design or neural architecture search, both of which need to repeat the network design process and retrain the designed network from scratch for each use case. As such, the process of developing for an increasingly large array of devices becomes expensive in time and compute. This paper introduces a solution that decouples model training from neural architecture search. First, they train a once-for-all network to have a large number of sub-networks (more than 10^9) that have different depth, width, kernel size, and resolution configurations. Second, they sample a subset of sub-networks to train accuracy and latency predictors that guide neural architecture search given the hardware and latency constraints in question. The approach results in networks that have the same or greater accuracy than MobileNetV3 and EfficientNet but require orders of magnitude fewer GPU hours to train.
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Google. This paper considers the issues with massively scaling Transformer models to orders of magnitude larger sizes than the capacity limit of a single accelerator GPU or TPU memory. These include support for efficient model parallelism algorithms, the super-linear relationship between the computation cost and the model size, and model partitioning strategies. The authors build a 600 billion parameter sequence-to-sequence Transformer model and train it on 2048 TPU v3 devices for 4 days to solve multilingual machine translation tasks. To do so, they introduce a GShard module that offers a set of lightweight annotation APIs and an extension to the XLA compiler.
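Going back to the once-for-all paper in this section, its second stage (using cheap predictors to pick a sub-network under a latency budget) can be sketched as follows. Everything here is an assumption for illustration: the option values, the two hand-written predictor formulas (the paper trains its predictors on sampled sub-networks), the single global setting per dimension, and the exhaustive search, which only works because this toy space has 108 configurations rather than more than 10^9.

```python
from itertools import product

# Toy search space: one global choice per dimension (illustrative values).
DEPTHS = (2, 3, 4)
WIDTHS = (3, 4, 6)
KERNELS = (3, 5, 7)
RESOLUTIONS = (128, 160, 192, 224)


def predicted_latency_ms(config):
    """Stand-in latency predictor: cost grows with every dimension."""
    depth, width, kernel, resolution = config
    return 0.5 * depth * width * (kernel / 3) * (resolution / 128) ** 2


def predicted_accuracy(config):
    """Stand-in accuracy predictor: bigger sub-networks score higher."""
    depth, width, kernel, resolution = config
    return 60.0 + 2.0 * depth + 1.0 * width + 0.5 * kernel + resolution / 32


def search(latency_budget_ms):
    """Return the most accurate config whose predicted latency fits the budget."""
    candidates = product(DEPTHS, WIDTHS, KERNELS, RESOLUTIONS)
    feasible = [c for c in candidates if predicted_latency_ms(c) <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=predicted_accuracy)


best = search(latency_budget_ms=10.0)
print(best, predicted_latency_ms(best), predicted_accuracy(best))
```

Because no sub-network is trained or benchmarked during the search, targeting a new device only means swapping the latency predictor and rerunning it, which is where the GPU-hour savings come from.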
Science (bio, health, etc.)
Human postprandial responses to food and potential for precision nutrition, ZOE (an Air Street Capital portfolio company), King’s College London, University of Nottingham, Mass General Hospital, Harvard, Lund University, University of Trento. We know that the metabolic response to food influences the risk of diseases such as cardiovascular disease. In this work, the authors describe the results from a nutritional study of 1,002 twins and unrelated healthy adults in the UK that evaluated how blood glucose, fat, and insulin changed after eating specific meals. The study found large inter-individual variability even amongst identical twins for identical meals. Of particular interest is the finding that the gut microbiome had a greater influence on these response profiles than did the meal macronutrients. Surprisingly, genetic variants had only a modest impact on explaining meal responses. The authors then built an ML model to predict both fat and glucose responses to food intake.
AMPL: A Data-Driven Modeling Pipeline for Drug Discovery, Lawrence Livermore, GSK. This paper describes the open-source ATOM Modeling PipeLine (AMPL), which integrates modern ML methods with best practices for chemical activity and property prediction. The Python project addresses dataset characterization, hyperparameter optimization, model fitting, model validation, predictions, and uncertainty quantification in an automated fashion for the benefit of the drug discovery community. This is quite exciting because it is a sign of a maturing ML tooling ecosystem in the life sciences.
Determinants of Base Editing Outcomes from Target Library Analysis and Machine Learning, Harvard and MIT. The targeted editing of nucleotides in genomic DNA (e.g. CRISPR-Cas9) is a mainstay for genetic engineering. It allows researchers to make changes to DNA that can introduce or correct genetic mutations to model and treat disease. However, the biological rules that govern base editing outcome precision and efficiency are not known. This leads to mixed results depending on what edit we’re looking to make. To address this issue, the authors design an experiment to characterize the result of specific base editors with genetic targets in mammalian cells. Using this data, they train a machine learning model (gradient-boosted regression trees) to predict base editing genomic outcomes and efficiency. They also show how this model can be used to engineer more potent base editors. Taken together, this work shows how machine learning can help decode the obscure rules of biology.
📑 Resources
Blogs and reports
Zero-Shot Learning in Modern NLP: Using state-of-the-art NLP models for sequence classification without large annotated training sets.
Opportunities for ML in effectively managing cloud infrastructure. It all starts with logging and treating data as a first-class citizen in the infrastructure world.
Exploration Strategies in Deep Reinforcement Learning. This blog post by OpenAI discusses a range of exploration strategies, including classical exploration, intrinsic rewards, memory-based, Q-value, and variational options.
Does Tweeting Improve Citations? Yes.
Chip Huyen hit us with another awesome blog post, What I learned from looking at 200 machine learning tools. The post looks at different ML projects and companies, splits them by focus along the ML value chain, and discusses problems in MLOps.
5 design principles for antifragile predictive systems. In another insightful blog post from the team at Tinyclues, the authors describe real-world problems that you face with machine learning systems and how to design reliable systems. A teaser: “The real-world problem that you are solving cannot be modeled by a technical loss function.”
Videos/lectures
Workshop on scalability in autonomous driving, a keynote video presentation from Andrej Karpathy at Tesla.
Datasets and benchmarks
Lyft Level 5 released an exciting new dataset called the Prediction Dataset, which is their largest release yet. This dataset is interesting because it focuses on predicting the motion of traffic agents such as cars, cyclists, and pedestrians. The release includes a paper and SDK, followed by an upcoming Kaggle competition. Sacha Arnoud, Senior Director of Engineering at Lyft Level 5, introduced the dataset at our RAAIS 2020 conference just last week.
AI.Reverie released the largest open-source, very high-resolution synthetic dataset of satellite images for the purposes of assessing the value add for synthetic data in ML model training. They show that object detection and instance segmentation models trained with synthetic data and fine-tuned with 10% of the real observed data perform just as well as models trained with 100% real-world data.
Open Graph Benchmark: Datasets for machine learning on graphs. This paper presents a set of benchmark datasets for different types of graph ML tasks and domains, including biological networks, molecular graphs, source code and knowledge graphs. The authors provide a unified evaluation protocol with application-specific data splits and evaluation metrics.
Recursion Pharmaceuticals open-sourced their COVID-19 screening dataset called RxRx19. The dataset consists of 305,520 fluorescent microscopy images and their deep learning embeddings that represent the treatment of two cell types with 1,672 small molecules at 6 different concentrations with three viral conditions. A win for open science and public health!
Open source tools
MLflow, the open-source end-to-end ML platform launched by Databricks, is becoming a Linux Foundation project. MLflow has more than 2.5 million monthly downloads, 200 contributors from 100 companies, and 4x year-on-year growth.
DeepMind released Acme, a framework for distributed reinforcement learning. The goal is to allow users to provide simple descriptions of RL agents that can be run at various scales of execution, from small environments to distributed agents, and remove surprises when moving from paper to code.
Uber released Neuropod, an open-source deep learning inference engine. This project provides an abstraction layer on top of existing deep learning frameworks (and future ones to come) to run DL models through a uniform interface. It allows users to experiment with models built in different frameworks and run them in a uniform way.
Facebook AI released PyTorch3D, a highly modular and optimized library for making 3D deep learning easier to work on with PyTorch.
DeepMind released dm_control, a collection of Python libraries and task suites for physics-based RL agents in an articulated-body simulation.
Google released a set of TensorFlow resources focused on Responsible AI, which includes fairness indicators in the model analysis tool and a what-if scenario tester.
💰Venture capital financings
Here’s a highlight of the most intriguing financing rounds:
Locus Robotics, which makes autonomous mobile robots for warehouse automation, raised a $40M Series D led by Zebra Ventures. Customers include Ceva, DHL, Boots UK, Geodis, Port Logistics Group, Verst Logistics, and Radial. They cite a “doubling or tripling of fulfillment productivity with near-100 percent accuracy while saving 30 percent or more in operating expenses.”
Hyperscience, an enterprise process automation company, raised a $60M Series C led by Bessemer. The business was an early mover in document digitisation in 2014. They cite dozens of customers spanning government, financial institutions (e.g. TD Ameritrade back office), and insurance.
AImotive, the self-driving technology company based in Hungary, raised a $20M round led by Lead Ventures, which brings its total funding to $75M. The company sells its technology to car companies for use cases that include highway driving, emergency situations to avoid accidents, and automated valet parking. AImotive also offers development and validation tools (including an ISO-certified simulator) and hardware acceleration.
Elementary Robotics, the maker of an industrial part inspection robot and computer vision solution, raised a $12.7M Series A led by Threshold Ventures.
Hunters.ai, an autonomous cyber threat detection company, raised a $15M Series A led by M12 and US Venture Partners.
Cape Privacy, the developers of a data science platform for privately sharing encrypted data, came out of stealth with $5M in funding led by boldstart and Version One.
7bridges, which offers an AI-powered logistics optimization product, raised a $3.4M Seed round led by Crane and LocalGlobe.
Owkin, a French federated learning startup that’s focused on medical research, raised $18M from Mubadala and Bpifrance as an extension to their Series A. This brings its total funding to $70M.
M&A and IPOs in June 2020:
Mapillary, the leading crowdsourced street view imaging company, was acquired by Facebook. This was one of my first investments as a venture capitalist in 2014 and I couldn’t be happier for the incredible team at Mapillary on their achievements. Since its launch in 2013, the Mapillary service has grown to host over 1.2 billion images from around the world. Using these images, Mapillary developed state-of-the-art computer vision pipelines to generate map data at scale. This included the Vistas dataset, a popular dataset for training instance-specific object recognition models for self-driving. Mapillary was also an early mover in the distributed-first working culture, which allowed them to recruit the best and the brightest no matter where they wanted to work. This resulted in a small team of 60 individuals across Europe and the US. For an opinion on why Facebook completed this acquisition, check out this blog post and the official post from Jan Erik, CEO.
Lemonade, the data-driven contents and personal liability insurance company, filed to go public in late June and its share price jumped 2x on the first day of trading, raising $320M. The business is close to 5 years old, which bucks the recent trend of private VC-backed companies going public after some 10 years of operations.
---
Signing off,
Nathan Benaich, 5 July 2020
Air Street Capital | Twitter | LinkedIn | RAAIS | London.AI
Air Street Capital is a venture capital firm that invests in AI-first technology and life science companies. We’re a team of experienced investors, engineering leaders, entrepreneurs and AI researchers from the world’s most innovative technology companies and research institutions.