☃️Your Guide to AI: May-Nov '19, Part 2/2!
Hello from New York 🇺🇸! I’m Nathan Benaich from Air Street Capital, a venture firm investing in AI-first technology and life science companies. For our new readers, welcome to my AI newsletter!
Welcome to part 2/2 (part 1 here), covering the period of May through November 2019. In this edition, I'll cover research, helpful resources, and startup activity. Grab your beverage of choice ☕ and enjoy the read!
📣Hit reply if you’d like to chat about building AI-first products, new research papers, interesting market opportunities or if you’re considering a career move in the startup world.
🆕 New post from me: AI-first biology. I explain why the AI moment for biology is here and how AI-first biology startups can capture value.
🆕 Portfolio post from Vtrus (autonomous data capture and analysis for construction progress): Online learning of autonomous vehicle dynamics using model-based deep learning.
Referred by a friend? Sign up here. Help share by giving it a tweet :)
🔬 Research
Here’s a selection of impactful work that caught my eye, grouped in categories:
📝 Natural language processing
Language models as knowledge bases? Facebook and UCL. This paper investigates whether pre-trained language models build up their own relational knowledge bases that can serve as question/answer systems. They find that “(i) without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge, (ii) BERT also does remarkably well on open-domain question answering against a supervised baseline, and (iii) certain types of factual knowledge are learned much more readily than others by standard language model pre-training approaches.”
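To make the probing setup concrete, here is a minimal sketch of the cloze-style querying idea using the Hugging Face transformers fill-mask pipeline rather than the authors' LAMA code; the model name and example facts are purely illustrative.

```python
# Minimal sketch: query a masked language model as if it were a knowledge base.
# Assumes the Hugging Face transformers library; model and facts are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Cloze-style queries: the pre-trained model fills in the masked fact.
for query in [
    "The capital of France is [MASK].",
    "Dante was born in [MASK].",
]:
    print(query)
    for p in fill_mask(query, top_k=3):
        print(f"  {p['token_str']}: {p['score']:.3f}")
```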
Unsupervised word embeddings capture latent knowledge from materials science literature, UC Berkeley. This paper is pretty neat. It shows how published literature can be efficiently encoded as information-dense word embeddings without human labeling or supervision. The resulting models recover the underlying structure of the periodic table and structure-property relationships in materials. These embeddings can also be used to recommend materials for functional applications several years before their discovery. Taken together, this work shows how NLP on large corpora can learn knowledge that is useful for inference about a complex domain.
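As a rough sketch of the recipe (skip-gram embeddings trained on abstracts, then nearest-neighbour queries), assuming gensim ≥ 4 and a hypothetical load_tokenized_abstracts() helper rather than the authors' released code:

```python
# Sketch: learn word embeddings from (hypothetical) tokenized materials-science
# abstracts and query them for related concepts. Assumes gensim >= 4.
from gensim.models import Word2Vec

# abstracts: list of token lists, e.g. [["LiFePO4", "is", "a", "cathode", ...], ...]
abstracts = load_tokenized_abstracts()  # hypothetical helper

model = Word2Vec(
    sentences=abstracts,
    vector_size=200,   # embedding dimensionality
    window=8,          # context window within each abstract
    min_count=5,       # drop rare tokens
    sg=1,              # skip-gram objective
    workers=4,
)

# Materials close to "thermoelectric" in embedding space become candidate
# thermoelectrics -- including some only confirmed in later publications.
print(model.wv.most_similar("thermoelectric", topn=10))
```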
Few-shot adversarial learning of realistic neural talking head models, Samsung AI Center and Skolkovo Institute. In this paper, the authors develop personalized talking heads that are learned from a single source image. Quite surprisingly, their system can generate frame-by-frame talking heads of people it has never seen before. This is achieved through extensive pre-training on lots of talking head videos with corresponding facial landmark traces (“meta-learning”). The system is formulated as a generative adversarial network with high-capacity generator and discriminator pre-trained via meta-learning. Check out videos here.
Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges, Google. In this paper, the authors “push the limits of research on multilingual NMT by training a single NMT model on 25+ billion sentence pairs, from 100+ languages to and from English, with 50+ billion parameters. The result is an approach for massively multilingual, massive neural machine translation (M4) that demonstrates large quality improvements on both low- and high-resource languages and can be easily adapted to individual domains/languages, while showing great efficacy on cross-lingual downstream transfer tasks.”
📷 Computer vision
Moving camera, moving people: A deep learning approach to depth prediction, Google. Learning depth from camera and video footage is helpful for augmented reality use cases and autonomous systems. However, obtaining paired image frames and high-quality depth maps to train deep learning-based depth prediction models is tricky. In this work, the authors make use of an existing source of data for supervision: YouTube videos in which people imitate mannequins by freezing in a wide variety of natural poses, while a hand-held camera tours the scene. Depth maps can be generated using triangulation-based methods because the entire scene is stationary (only the camera is moving).
Learning individual styles of conversational gesture, UC Berkeley, MIT, Zebra Medical. Human speech is not only about the words that are said, but also about the hand and arm gestures that accompany them. In this paper, the authors develop a model that learns to generate plausible gestures to accompany input speech audio and synthesizes the resulting video of a person talking.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, Google Research. This work uses neural architecture search to develop a family of more efficient and accurate image classification models. They show that carefully balancing network depth, width, and resolution can lead to better performance. In particular, their EfficientNet-B7 achieves state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. What’s more, the EfficientNets transfer well onto new image recognition datasets and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters. Blog post on their work here.
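The compound scaling rule itself fits in a few lines. Below is an illustrative sketch using the constants reported in the paper (α=1.2, β=1.1, γ=1.15); the released B0–B7 models round and hand-tune these values, so treat this as the idea rather than a reimplementation.

```python
# Sketch of EfficientNet's compound scaling rule: scale depth, width and input
# resolution together with a single coefficient phi, using the paper's
# grid-searched constants (alpha * beta^2 * gamma^2 ~= 2).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int, base_resolution: int = 224):
    depth_mult = ALPHA ** phi                          # more layers
    width_mult = BETA ** phi                           # more channels per layer
    resolution = int(base_resolution * GAMMA ** phi)   # larger input images
    return depth_mult, width_mult, resolution

for phi in range(8):  # roughly spans the B0...B7 family (released models differ slightly)
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}px")
```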
Stand-alone self-attention in vision models, Google Brain. This paper shows that convolutions in computer vision models can be replaced with self-attention modules that outperform the convolutional baseline for both image classification and object detection while being parameter and compute efficient.
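A loose sketch of the substitution, using PyTorch's built-in MultiheadAttention over flattened spatial positions rather than the paper's local-window attention with relative position embeddings:

```python
# Loose sketch: replace a spatial convolution with self-attention over pixel
# positions. The paper uses local-window attention with relative position
# embeddings; this global-attention version only illustrates the substitution.
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels, num_heads=num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # (B, C, H, W) -> (H*W, B, C): one token per spatial position
        tokens = x.flatten(2).permute(2, 0, 1)
        out, _ = self.attn(tokens, tokens, tokens)
        return out.permute(1, 2, 0).reshape(b, c, h, w)

# Drop-in for a conv block, with shapes preserved:
layer = SelfAttention2d(channels=64)
print(layer(torch.randn(2, 64, 16, 16)).shape)  # torch.Size([2, 64, 16, 16])
```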
Liquid Warping GAN: A Unified Framework for Human Motion Imitation, Appearance Transfer and Novel View Synthesis, Tencent AI Lab and ShanghaiTech University. This paper tackles human motion imitation, appearance transfer, and novel view synthesis within a unified framework, meaning that the model, once trained, can handle all three tasks. Quite impressive results!
Large scale adversarial representation learning, DeepMind. This paper shows how GANs can be used for unsupervised representation learning. These generation-based models achieve state-of-the-art representation learning results on ImageNet. The reconstructions they learn also tend to emphasize high-level semantics over pixel-level details.
Adversarial video generation on complex datasets, DeepMind. This paper shows how to use large GANs trained on videos of human actions to synthesize new examples.
🤖 Reinforcement learning
Go-Explore: a new approach for hard-exploration problems, Uber AI Labs. This work further develops the idea that curiosity-driven exploration is an important trait that RL agents must have to quickly learn how to problem-solve. Typically, approaches with this trait are either smart or scalable, but not both at the same time. Jeff Clune’s group introduces a new algorithm, Go-Explore, that beats human performance on Pitfall and Montezuma’s Revenge. It also outperforms RL agents that are trained by imitating human gameplay. More work is required to generalize this algorithm to a larger class of problems that require better-learned representations.
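For intuition, here is a toy sketch of Go-Explore's first phase: keep an archive of distinct "cells" (coarsened states), return to a promising cell by restoring the simulator state, then explore from it. The env.get_state/set_state calls, the downscale function, and the uniform cell selection are all simplifications; the real method weights promising cells and adds a later imitation-learning "robustification" phase.

```python
# Toy sketch of Go-Explore's "explore until solved" phase. The simulator
# state get/set API and the downscale() coarsening function are hypothetical.
import random

def go_explore(env, downscale, n_iterations=10_000):
    obs = env.reset()
    archive = {downscale(obs): {"state": env.get_state(), "score": 0.0}}

    for _ in range(n_iterations):
        cell = random.choice(list(archive.values()))  # pick a cell to return to
        env.set_state(cell["state"])                  # "go": restore the simulator
        score = cell["score"]
        for _ in range(100):                          # "explore" from that cell
            obs, reward, done, _ = env.step(env.action_space.sample())
            score += reward
            key = downscale(obs)
            best = archive.get(key)
            if best is None or score > best["score"]: # keep the best way to reach a cell
                archive[key] = {"state": env.get_state(), "score": score}
            if done:
                break
    return archive
```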
Human-level performance in 3D multiplayer games with population-based reinforcement learning, DeepMind. In this work, the authors evaluate RL on a 3D first-person, multi-agent video game scenario, Quake III Arena. The challenges involve multiple agents learning and acting independently to cooperate and compete using only pixels and game points scored as input. Moreover, the agents must acquire policies that are robust to the variability of maps, number of players, and choice of teammates and opponents. The authors’ solution is to train a population of thousands of agents in parallel that play with each other in randomly generated environments. Each agent learns its own internal reward signal and a rich representation of the world. The authors also delve into analyzing the learned agent behaviors, some of which are similar to human moves while others are unique.
The challenges of real-world reinforcement learning, Google Brain and DeepMind. The authors identify, describe, and provide evaluation metrics for 9 key challenges that lie between today’s RL and systems that can be made production-ready in the real world. Overall, a system of interest rarely has a good simulator, is often stochastic and non-stationary, has strong safety constraints, and is often expensive or slow to run in the real world. These features contrast sharply with simulation environments, which have proven to be a fertile experimental testbed for RL agents because data is unlimited, poor decisions carry no serious risks, and the system dynamics are clean and often deterministic.
One-shot imitation from video with recurrent comparator networks, ElementAI. This paper introduces a learning system that enables a 3D simulated robot to learn to reproduce a behavior it sees in a single video. Importantly, the system works in 3D and does not need a specialized model to explicitly extract parameters such as joint positions from the video. Instead, the system uses the sequential structure of motion to learn two distance functions between the agent and the observed behavior: one in time and another in space.
Grandmaster level in StarCraft II using multi-agent reinforcement learning, DeepMind. Fresh off their AlphaGo/Zero breakthroughs and applications in protein folding, DeepMind published their latest Nature paper on StarCraft II, a complex, multiplayer strategy game. The authors describe a system that uses neural networks, self-play via reinforcement learning, multi-agent learning, and imitation learning to build a system that achieves the Grandmaster level. The system was unleashed on the internet against players all over the world too.
🧪 Science, bio and more
Accelerating MRI reconstruction via active acquisition, Facebook and NYU. In this work, the authors “reduce reconstruction error and uncertainty by dynamically selecting which measurements are best to observe. The goal of this data-driven active acquisition approach is to simultaneously minimize image reconstruction errors and acquisition time.”
Deep learning enables rapid identification of potent DDR1 kinase inhibitors, InSilico Medicine. There has been a lot of interest in using generative models to produce new chemical matter in a directed fashion. However, few papers then empirically validate their results by synthesizing and assaying said compounds in vitro and in vivo. That’s what this paper did and it made quite a press splash. The authors develop a system (GENTRL) that prioritizes the “synthetic feasibility of a compound, its effectiveness against a given biological target, and how distinct it is from other molecules in the literature and patent space.” They show that GENTRL can rapidly design novel compounds that are active against DDR1 kinase and prove this in vitro. However, the authors say that “despite reasonable microsomal stability and pharmacokinetic properties, the compounds that have been identified here may require further optimization in terms of selectivity, specificity, and other medicinal chemistry properties.” This suggests that GENTRL is not really fully automated/end-to-end because its outputs still require significant optimization. For more critique, check out this blog post.
Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals, UC San Diego. This paper applies graph neural networks to represent chemical molecules and materials. We discussed why such representations are of interest in an earlier edition of this newsletter.
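For intuition, here is a toy message-passing step on a molecular graph (atoms as nodes, bonds as edges), written in plain PyTorch rather than the authors' implementation; the feature sizes and update function are arbitrary.

```python
# Toy message-passing layer for a molecular graph: each atom updates its
# feature vector from the sum of its bonded neighbours' features.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.update = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_atoms, dim); adj: (num_atoms, num_atoms) 0/1 bond matrix
        messages = adj @ node_feats  # sum features over bonded neighbours
        return self.update(torch.cat([node_feats, messages], dim=-1))

# Tiny example: a 3-atom chain (e.g. O-C-O) with random initial atom features.
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
atoms = torch.randn(3, 16)
layer = MessagePassingLayer(dim=16)
molecule_embedding = layer(atoms, adj).mean(dim=0)  # pool atoms -> molecule vector
print(molecule_embedding.shape)  # torch.Size([16])
```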
Machine-learning-guided directed evolution for protein engineering, Caltech. This review paper dives into the why and how of using ML to develop and improve the structure and performance of proteins. This is a particularly interesting space because proteins are potent therapeutic molecules that can be used in the clinic. Companies like LabGenius are building businesses using these principles. For more on ML and proteins, have a read of this article on using deep learning to predict protein structure from sequence.
Clinical-grade computational pathology using weakly supervised deep learning on whole slide images, Sloan Kettering Cancer Center, Cornell. This paper shows that computer vision models can be trained to accurately classify cancer on histology slides without needing pixel-level labeling of the training set. They use the slide-level diagnosis as a weak supervision signal for multiple instance learning and show as a result that they can make use of much larger training sets.
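A stripped-down sketch of the weak-supervision idea: score tiles with a CNN, pool to a slide-level prediction, and backpropagate from only the slide-level label. The paper's actual pipeline ranks tiles and adds an RNN-based aggregation step, so this shows just the core multiple-instance trick.

```python
# Sketch of multiple-instance learning for whole-slide images: score each tile
# with a CNN, max-pool to a slide-level logit, and train with only the
# slide-level diagnosis as supervision. Simplified from the paper's pipeline.
import torch
import torch.nn as nn
from torchvision import models

tile_encoder = models.resnet18()                               # per-tile CNN
tile_encoder.fc = nn.Linear(tile_encoder.fc.in_features, 1)    # tumour logit per tile

def slide_logit(tiles: torch.Tensor) -> torch.Tensor:
    # tiles: (num_tiles, 3, 224, 224) cropped from one slide
    tile_logits = tile_encoder(tiles).squeeze(-1)  # (num_tiles,)
    return tile_logits.max()                       # slide is positive if any tile looks malignant

criterion = nn.BCEWithLogitsLoss()
tiles = torch.randn(8, 3, 224, 224)                # dummy tiles from one slide
label = torch.tensor(1.0)                          # slide-level diagnosis only
loss = criterion(slide_logit(tiles), label)
loss.backward()
```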
A clinically applicable approach to continuous prediction of future acute kidney injury, DeepMind. This paper shows that ML can be used to predict 90% of all acute kidney injuries in the clinic that required subsequent administration of dialysis, with a lead time of up to 2 days.
Ab-Initio Solution of the Many-Electron Schrödinger Equation with Deep Neural Networks, Imperial and DeepMind. Thread from David, lead author, here. They developed a new neural network architecture that can represent wavefunctions for systems of fermions - the kind of particles that make up most matter - and show that it is much more accurate than conventional approximate wavefunctions.
Mechanisms of systems memory consolidation during sleep, University of Tübingen. While not about machine learning, this review article explores how long-term memory forms during sleep. This process happens by cycles of “repeated neuronal replay of representations originating from the hippocampus during slow-wave sleep, [which] leads to a gradual transformation and integration of representations in neocortical networks.” Understanding how this process occurs is important for neuroscience-inspired AI efforts, hence its mention here :-)
Deep learning-based classification of mesothelioma improves prediction of patient outcome, Owkin. This paper uses convnets operating on histology slides of mesothelioma patients to predict overall survival more accurately than using current pathology practices. The model outlines the areas on the pathology slide that contribute to patient outcome prediction.
Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules, Google, ASU, Toronto. This paper demonstrates that graph neural networks, a deep learning architecture adapted to representing information from graphical structures such as chemical molecules, can be trained to predict odor descriptors for individual molecules. This means that the network can process a new molecular graph and output a corresponding odor palette (e.g. tropical, dry, juicy).
🤓 Misc et al.
Hearing your touch: A new acoustic side-channel on smartphones, Cambridge. This paper shows how the sound waves created on your phone touch screen while tapping keys can be captured by the microphone and used by a model to infer which keys you’re actually typing. They manage to recover 61% of the 200 4-digit PIN-codes within 20 attempts, even if the model is not trained with the victim’s data.
Learnability can be undecidable, University of Waterloo, Princeton University, Tel Aviv University, Technion-IIT. Mathematical proofs help in the understanding of scientific concepts such as learning. It has already been argued that not everything is mathematically provable. In this work, the authors show that machine learning shares this fate: whether certain problems are learnable is independent of the standard axioms of set theory.
Meta-learning surrogate models for sequential decision making, DeepMind. This paper introduces a probabilistic, model-based framework that explains observed data while capturing predictive uncertainty during the decision-making process. They show how it can be applied to robotic manipulation tasks and RL.
Exascale deep learning for scientific inverse problems, Oak Ridge National Laboratory. In quite a feat, this paper trains a neural network with 220M weights on 27,600 GPUs using half a petabyte of data!
📑 Resources
Blog posts
Ravelin’s CTO shares a template for running efficient machine learning infrastructure and teams. It’s an excellent read.
A set of illustrated cheat sheets for machine learning concepts.
A “latest in NLP” reading list, updated to May 2019.
Blog post explaining how the Transformer and derived NLP models work.
How to make encrypted deep learning fast for complex models working on medical image datasets.
Evolving the Transformer with neural architecture search.
An AI reading list for newcomers.
Automating software development with deep learning. Blog post and talk.
Training costs for state of the art language models.
3 pitfalls to avoid in ML: splitting data inappropriately, hidden variables you didn’t expect to matter, and mistaking the objective to optimize.
Facebook’s progress on NLP and NLU tasks using Transformers.
NVIDIA shows they are able to train an 8.3B parameter Transformer model for NLP tasks. They show that training can scale linearly with GPU resources and that the models with more parameters perform better on NLP tasks than those with fewer parameters.
On the other end of the model size spectrum, Hugging Face demonstrates a significantly distilled version of BERT that achieves 97% of the regular model’s performance.
NeurIPS paper acceptances are out, and the data shows that Google/DeepMind is again in pole position with almost 2x more accepted papers than #2, Stanford. However, two of the three authors with the most accepted papers are Berkeley professors (Sergey Levine and Pieter Abbeel). Google/Brain also tops the leaderboard at ICML.
A 100-slide deck about 5G went viral in China and was translated by Jeff Ding and collaborators at ChinAI.
A two-part (here and here) post on the evolution of intelligence in robots.
Alex LeBrun, now CEO of Nabla, shared the 2008 pitch deck for his earlier bot-building startup, VirtuOz, which raised $12M.
Kaggle released its 2019 State of ML and Data Science survey that gleans statistics from 19,717 Kaggle members. Their answers covered demographics, education, employment, and technology usage.
Videos/lectures
Andrew Zisserman of DeepMind on self-supervised learning.
Molecular Transformer for Chemical Reaction Prediction and Uncertainty Estimation.
An intro to OpenMined on Udacity.
Datasets
The Replica Dataset: A digital replica of indoor spaces. 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale.
Bayesian DL benchmarks, a framework to bridge the design of deep probabilistic ML models and their application to real-world problems.
Mapillary Traffic Sign Dataset, the world’s most diverse dataset of traffic sign annotations.
The KnowRef Coreference Corpus: a resource for training and evaluating common sense in AI using natural language.
Waymo released an Open Dataset, which comprises Lidar and camera data.
Predicting molecular properties - a Kaggle competition to measure the magnetic interactions between a pair of atoms.
Open source tools
QuantumBlack presents a protocol of checks for risks at each step of the ML model creation lifecycle. The protocol manages risks associated with fairness, explainability, and model performance while models are in development. It includes a library with over 100 risks embedded within a model development protocol, which organizes the process of building machine learning models into high level ‘activities’ and more detailed ‘tasks’.
Deep TabNine is an autocomplete service for code editors. It is trained on around 2 million files from GitHub, with the objective of predicting each token given the tokens that came before. The model is based on GPT-2, which is itself based on the Transformer. The service supports multiple languages but is computationally expensive, so there are latency issues at the moment.
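To illustrate the underlying mechanic, next-token prediction with a Transformer language model, here is a minimal sketch using off-the-shelf GPT-2 via a recent version of transformers; this is obviously not TabNine's model or training data.

```python
# Sketch of the next-token-prediction mechanic behind such autocomplete tools,
# using off-the-shelf GPT-2 from transformers (not TabNine's model).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "def fibonacci(n):\n    if n < 2:\n        return"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedily extend the prompt token by token -- the same loop, trained on code
# instead of web text, is what drives code-completion models.
output = model.generate(**inputs, max_new_tokens=12, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0]))
```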
Causal Bayesian networks - a visual framework to formalize, measure, and deal with different unfairness scenarios underlying a dataset.
PyTorch is growing in popularity and is on track to overtake TensorFlow in research papers.
Netflix open sourced Polynote, a notebook designed to be used with Scala first.
💰 Venture capital financings and exits
Here’s a highlight of the most intriguing financing rounds:
Babylon Health, a London-based provider of on-demand primary healthcare consultations, raised a $550M Series C at a valuation of over $2B. The company claims that its service covers 4.3 million patients worldwide. In the UK, consumers can switch their NHS primary care physician from a brick-and-mortar practice over to Babylon. The company also offers a private health solution in partnership with Bupa. The business employs over 1,500 staff with a heavy investment in AI, particularly NLP and causal reasoning.
DataRobot, a Boston-based enterprise software company focused on end-to-end ML, raised a $200M Series E led by Sapphire Ventures. This values the business above $1B.
Scale, the SF-based provider of data labeling services for users training ML models, raised a $100M Series C at a valuation over $1B, led by Founders Fund. While the majority of their revenue-generating business allegedly comes from annotating data for self-driving companies, the business mentioned growing into the AI-first infrastructure market. This is interesting because tooling choices are difficult today, i.e. one can choose open-source, startup vendors, or large cloud vendors. More on the challenges of 3 distinct business personas in ML here.
Anduril, the SF-based AI-first military defense contractor, raised a Series B round valuing the company over $1B. Existing investor, General Catalyst, explains their motives in a great blog post.
LabGenius, the London-based AI-first therapeutic protein development company, raised a $10M Series A led by Lux Capital and Obvious Ventures, with participation from Air Street Capital (my firm). Nan at Obvious articulates why and how the business fits into an overall storyline for the new biology.
Databricks, the data analytics company originally founded to commercialize Apache Spark, raised a $400M Series F led by a16z’s late-stage fund. The company claims $200M in ARR, having grown 2.5x year over year from a $0 base 4 years ago.
Starship, the London-based ground delivery robot company, raised a $40M round as it crossed 100k completed deliveries on campuses in the US.
Element AI, the Canadian AI startup, raised a $150M Series B led by the Quebec pension plan, McKinsey and the government of Quebec. An interesting evolution for the business, which in a way signals that governments are leaning in to support local AI software players.
dotData, a no-code platform that leverages machine learning to clean, normalize, aggregate, and combine data sets while performing feature engineering, raised a $23M Series A led by JAFCO and Goldman Sachs. The business competes with the likes of H2O.ai and DataRobot.
Voyage, a self-driving car operator focused on private communities, raised a $31M Series B led by Franklin Templeton, the publicly traded investment manager.
Einride, a Swedish full-stack autonomous electric trucking company, raised a $25M round led by EQT Ventures. The company develops its own “pods” that are sold to logistics companies and retailers.
Citrine, an AI-first materials discovery company, raised a $20M Series B led by Prelude Ventures and Innovation Endeavors. The article cites that Citrine helps chemical companies “hit overall R&D milestones in 50-70% of the time originally forecast and has enabled totally new materials product lines.”
Several M&A deals and an IPO, including:
CrowdStrike, a VC-backed AI-first cybersecurity company focused on endpoint threat detection and response, went public to raise $610M on the Nasdaq. The company went public on $250M ARR, 140% net dollar retention and 108% annual revenue growth (more metrics), which contributed to driving its share price strongly upwards in the first six months of trading. The stock has since taken a hit as analysts report its valuation as being too rich.
Fabula AI, a London-based team of researchers using graph deep learning to detect network manipulation, was acquired by Twitter for an undisclosed price. Twitter’s interest rests in the use of ML on complex datasets describing relations and interactions between entities (or pieces of content). The company’s CEO, Michael Bronstein, is currently the Chair in Machine Learning and Pattern Recognition at Imperial College London and will remain in that position while leading graph deep learning research at Twitter.
Thoughtonomy, a UK-based developer of the “Virtual Workforce” enterprise SaaS automation platform, was acquired by Blue Prism (LSE: PRSM) for up to £100M, based on milestones. The company’s platform is said to “address activities that require understanding or interpretation, and so it expands the use case for RPA beyond structured processes”.
Mighty AI, a labeling startup focused on computer vision, was sold to Uber. Its 40 employees will join Uber in Seattle and the product is being shut down for its customers.
Looker Data Sciences, a popular data analytics company, was acquired for $2.6B by Google. The deal is now under review by the Justice Department to determine if the tie-up harms competition. An anecdote: Transferwise has 1,600 employees and 71% are monthly active users of Looker. Impressive penetration.
Scotty, one of several remote teleoperated vehicle companies, was acquired by DoorDash for an undisclosed price. Scotty had recently raised $6M from Gradient. DoorDash plans to use the technology as a fallback for its robotic delivery service.
6 River Systems, a robotics for logistics company founded by veterans of Kiva (acq. Amazon), was acquired by Shopify for $450M after the latter announced its intent to develop a fulfillment network to allow its customers to compete with Amazon. This transaction is interesting because it shows the full-stack ambitions of Shopify and the need for robotics to provide operational scale.
Apprente, an early-stage NLP/dialogue startup in SF, was acquired by McDonald’s in the latest edition of “incumbent acquires AI startup to set up applied AI lab (based in the Valley)”. The price was undisclosed. This follows McDonald’s >$300M acquisition of Dynamic Yield earlier this year and falls under their agenda of creating an improved drive-thru experience. Apprente’s CEO, Itamar Arel, is now VP of the McD Technology Labs. He was previously CEO of Osaro, a robotic pick-and-place startup, before leaving to start Apprente.
DeepScale, an AI startup founded by the original author of SqueezeNet, was acquired by Tesla for an undisclosed sum to join the AutoPilot team. I bet the team is likely to work on embedding small, efficient neural networks onto Tesla’s new hardware stack.
---
Signing off,
Nathan Benaich, 24 November 2019
Air Street Capital | Twitter | LinkedIn | RAAIS | London.AI
Air Street Capital is a venture capital firm that invests in AI-first technology and life science companies. We’re a team of experienced investors, engineering leaders, entrepreneurs and AI researchers from the world’s most innovative technology companies and research institutions.