👋 Your guide to AI: June 2022 (we're back!)
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups. The last issue was back in October 2021 when we launched the State of AI Report. So in this one we’ll do our best to compress several months’ worth of content into a palatable embedding for you to decode over a coffee. Before we kick off, a couple of news items from us :-)
Spinout.fyi: we’ve crowdsourced the largest global dataset of university spinout deal data to bring transparency to the opaque practice of company creation from academic research. We’ll be publishing this dataset and an accompanying analysis openly very shortly. You can register to receive the data by email here or follow us on Twitter here.
Air Street Capital: after announcing our latest Fund 1 investments including Adept (AGI for your computer workflows), Modern Intelligence (one AI for defense), Gandeeva (AI-first cryoEM) and Athenian (data-enabled engineering), we’re now live making investments from Fund 2.
State of AI Report: with the summer around the corner, we’re about to kick off work on the 2022 edition of our annual report. If you’ve got a killer AI use case or paper coming up soon, do drop us a line to learn about our slide submission process. Register to receive the report by email here or follow us on Twitter here.
RAAIS: We’re also back in-person with our non-profit RAAIS one-day summit on Friday 23rd June 2023 in London. As usual, we’ll be hosting a top-tier group of large companies, startups, researchers and students all working on various aspects of AI. You can register your interest here.
As usual, we love hearing what you’re up to and what’s on your mind, just hit reply or forward to your friends :-)
🆕 Technology news, trends and opinions
🏥 Life (and) science
The public biotech industry, alongside its technology peers, has suffered a sizable erosion of enterprise value in the post-covid stock market meltdown of 2022. Some 128 biotechs trade at valuations below their cash on hand, suggesting in part that investors have little confidence in the likelihood of their (pre)clinical assets succeeding. This might in fact be true. Consider that most public biotech companies of yesterday are single-asset companies predicated on years of academic science, meaning that they are generally created to exploit one drug (family) against a set of disease indications. Note too that the vast majority of academic science is irreproducible. Thus, single-asset biotechs are vulnerable herbivores searching for safety in a wide-open savannah of hungry, risk-off lions.
By contrast, a new generation of AI-first biotechnology companies, led by Exscientia and Recursion in public markets, is flipping this single-asset biotech model on its head. These companies systematically apply software engineering, automation, and machine learning to problems along the drug discovery and development value chain. This approach means more reliable science and, crucially, a “platform” for discovery that churns out a multitude of diverse drug assets that can be pointed at a range of disease indications. Because a biotech’s value rests on the likelihood of success of its drug programs, AI-first biotechnology companies should be both more resilient and more valuable in the long term thanks to their many shots on goal. Both Recursion and Exscientia published their Q1 2022 financial results, which include the dosing of multiple Phase 2 clinical trials and a huge partnership with Roche/Genentech for Recursion, and a large partnership with Sanofi for Exscientia based on technology it acquired from Allcyte. With more research groups and companies making use of emerging tools such as AlphaFold (some 400k researchers have accessed the EMBL AlphaFold database, and around 100 papers mentioning AlphaFold are now published per month), we can expect even more shots on goal per biotech company.
Over in clinical medicine, the first autonomous AI-first chest X-ray software was granted CE Class IIb certification in Europe. Developed by Lithuanian startup Oxipit, the product processes chest X-rays, flags normal scans and sends abnormal ones for human review. The result should be a significant reduction of the primary care screening burden and, next up, a “spell check” for human-reviewed cases.
In cardiology, the Mayo Clinic demonstrated how an AI system running on an Apple Watch ECG could detect a weak heart pump (left ventricular dysfunction). It’s impressive that a wrist-worn consumer device could be used instead of today’s echocardiogram, CT scan or MRI, which are all expensive and heavy-duty. What’s more, Johns Hopkins developed a computer vision system that predicts cardiac arrests up to 10 years ahead of time using the distribution of scar tissue in contrast-enhanced cardiac images.
🌎 The (geo)politics of AI
Clearview AI came under fire in Italy, the UK, and the US in the last two months because it had scraped billions of public photos without the consent, or knowledge, of the individuals involved. Italy fined the company €20M for breaches of EU law, including GDPR. The UK followed with a fine of over £7.5M for similar violations. In the US, Clearview was forced to settle a 2020 lawsuit accusing the company of violating BIPA, an Illinois privacy law that had previously bitten Meta. As a result of this settlement, Clearview is required to stop selling its database to most US companies. A few days ago, the company announced that it was expanding the sale of its facial recognition software beyond police to US companies, through a new “consent-based” product.
It’s not clear yet whether Russia or Ukraine have used AI-enabled weapons in Ukraine. But it’s interesting to note that AI-enabled tools peripheral to the military have proved useful to Ukraine in wartime: US-based startup Primer helped Ukrainian forces process unencrypted Russian radio transmissions, and Clearview AI allowed them to identify Russian soldiers using facial recognition. During large-scale war efforts, factories – and indeed the entire industrial sector – used to be repurposed to serve the military. Compared with the large capital expenditures that required, repurposing software today is almost frictionless. Seemingly inoffensive software could prove decisive in future wars. We hope we’ll never know for sure.
As the world woke up to the importance of the semiconductor industry, TSMC’s home country, Taiwan, made economic espionage a crime punishable by up to 12 years of prison. A very interesting thread here on the timeline leading to this decision.
From the insightful ChinAI newsletter translations: Microsoft Research Asia has stopped recruiting interns from the Seven Sons of National Defense, a group of Chinese universities that work closely with the Chinese military.
Spinning out of a university is a favored route for AI founders in the UK: spinouts represent 4.3% of all AI companies, while only 0.03% of all UK companies are spinouts. A UK government report on AI commercialisation sheds light on the importance of improving the country’s spinout policy in order to fully realize its universities’ potential. We published a thread with highlights of this report, and another one on recently published Beauhurst data on UK spinouts.
Faced with increasing competition from specialized AI chip manufacturers, incumbents Intel and NVIDIA are quickly stepping up their AI game. Intel announced a new AI chip, Habana Gaudi 2, built by Habana Labs, a company it acquired in 2019. Intel claims it delivers twice the performance of NVIDIA’s ubiquitous A100. In the meantime, NVIDIA began selling the H100 “Hopper” processor, an AI chip that will compete with Google’s TPU v4 on training large-scale, memory-hungry machine learning models. Meanwhile, word on the street is that Google’s TPU effort is essentially disbanded and run by a skeleton team…
🏭 Big tech
In the past few months, big tech companies have shown willingness to invest financially and scientifically in the fight against climate change. Meta and Alphabet have joined a $1B fund launched by Stripe called Frontier. Meta also said it was using an AI-designed concrete in its data centers that emits 40% less carbon than other concrete mixtures. Some have suggested that the baseline they compared against is weak, but this is another useful illustration of the use of AI in materials discovery, a flourishing application field that we’re excited about. Meta has been very active in this space, for example through the Open Catalyst Project, which aims to discover new catalysts to help large scale production of green hydrogen.
But what about the climate impact of the models which are trained for these discoveries? Google identified 4 practices to reduce the carbon footprint of ML models: using efficient model architectures such as sparse models, using hardware which is optimized for ML, using cloud computing rather than on-premises, and training models in “green” datacenters. Google says these practices allow them to reduce energy consumption by 100x and emissions by 1,000x.
Meta AI became the first large AI lab to (almost) fully open-source a large language model (LLM), OPT-175B. Both the pretrained model (all 175B parameters) and the source code used for training will be made available upon request. Notably, the model was trained on publicly available datasets. A similar initiative is currently underway as part of the BigScience project, which aims to train an LLM and make it available to the ML community. The project is led by Hugging Face among others, but is organized as an open workshop. It’s so open you can even follow the training logs of the model here. The next step: multimodal models. As a step towards this, LAION, a non-profit organization advocating for open research in AI, published LAION-5B, the largest publicly available image-text dataset.
The battle for the best image generation model is raging. OpenAI released DALL-E 2, the second version of its previously successful GPT-3-based DALL-E. But DALL-E is no longer the only game in town: Google announced Imagen, its own model for generating photorealistic and artistic images. Beyond their differences, what is interesting – and new – about these models is that they are both based on diffusion models, a new(ish) class of models that has defined the state of the art in generative modeling for the past two years or so. Check out a few samples from both models, and more, here. Another notable text-to-image generation model is Tsinghua’s CogView2, which supports both English and Chinese.
All but one author of Google’s celebrated Attention is All You Need paper – which introduced the Transformer architecture – have left the company. The latest to leave, research scientists Ashish Vaswani and Niki Parmar, did so to launch Adept alongside OpenAI’s David Luan. The company promises to automate the way individuals and businesses interact with computers.
Google was criticized again for the firing of a researcher, The New York Times reported. The article implied that the researcher was fired because he had criticized the work of other Google and Stanford researchers. While this might at first be reminiscent of the firings of Timnit Gebru and Margaret Mitchell, this time it was different: several academics concurred that the researcher “had waged a years-long campaign to harass & undermine [the] work” of the first authors of the paper in question. Dr. Gebru herself expressed her discontent over the parallel drawn between her and the fired researcher in the NYT article.
Meta and DeepMind made two interesting steps towards making AI algorithms more general.
A Generalist Agent, DeepMind. Large Language Models have shown an impressive ability to transfer across tasks – without the need for specific training on new tasks. Gato is an attempt to bring this capability to multimodal models. The authors trained a model on a large range of tasks, from language modeling to image captioning, to simulated control tasks and robotics. “The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.” The current model was intentionally kept relatively small by today’s standards (1B parameters) to enable it to run in real time in robotics tasks.
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language, Meta AI, SambaNova. This article introduces a framework that uses the same learning method for computer vision, speech and NLP. “Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input.”
Multimodal learning is still all the rage in AI research:
Flamingo: visual language model that establishes new SOTA in few-shot learning in multimodal tasks, DeepMind. DeepMind had arguably been late to the LLM and Visual Language Model (VLM) party, but it has been gaining ground lately. First with LLMs Gopher and Chinchilla (which we’ll talk about shortly), and now with Flamingo, a VLM. Compared to previous models Flamingo can notably process interleaved visual and textual information. On many tasks, “Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data.”
CoCa: Contrastive Captioners are Image-Text Foundation Models, Google. Time for an ImageNet SOTA check, and the winner is… CoCa, a multimodal model that reaches 91% top-1 accuracy on ImageNet. CoCa combines contrastive learning with a classical encoder-decoder transformer, with a twist: “CoCa omits cross-attention in the first half of decoder layers to encode unimodal text representations, and cascades the remaining decoder layers which cross-attend to the image encoder for multimodal image-text representations.”
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances, Google Robotics. LLMs + robots. Thanks to their wide range of capabilities, LLMs could in principle enable robots to perform any task by spelling out its steps in natural language. But LLMs have little knowledge of the robot’s environment and capabilities, so the steps an LLM proposes are often infeasible for the robot. This paper, and its online demos, show how to fix this.
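The fix can be sketched as combining two scores per candidate skill: the LLM’s estimate of how useful a step is for the instruction, and the robot’s learned estimate of whether it can execute that step in the current state. All numbers and skill names below are made up for illustration; they are not outputs of the actual models.

```python
# SayCan-style skill selection sketch: multiply a language-usefulness
# score by an affordance (feasibility) score and pick the best skill.
# All scores here are illustrative placeholders, not model outputs.
def choose_skill(llm_scores, affordance_scores):
    """Both dicts map skill name -> probability-like score in [0, 1]."""
    combined = {s: llm_scores[s] * affordance_scores[s] for s in llm_scores}
    return max(combined, key=combined.get)

# Instruction: "bring me a soda". The LLM favors "grab soda", but the
# robot is far from the fridge, so "go to fridge" is what is feasible.
llm = {"grab soda": 0.6, "go to fridge": 0.3, "wipe table": 0.1}
afford = {"grab soda": 0.05, "go to fridge": 0.9, "wipe table": 0.5}
print(choose_skill(llm, afford))  # -> "go to fridge"
```

Grounding the language score with the affordance score is what keeps the LLM from proposing steps the robot cannot currently execute.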
We saw earlier that diffusion models were the basis of recent advances in image generation from text. Text-to-video, which is orders of magnitude harder, could also benefit from improvements in diffusion models. Two recent papers attempt diffusion model-based text-to-video: Video Diffusion Models, from Google, generates up to 64 new frames (samples here). Flexible Diffusion Modeling of Long Videos, from the University of British Columbia, aims for longer videos of up to 25 minutes (samples here).
Training Compute-Optimal Large Language Models, DeepMind. The authors studied the scaling laws of LLMs. They find that current LLMs are significantly undertrained: researchers have focused on increasing the number of parameters while keeping the amount of training data constant. Instead, they show that for every doubling of model size, the number of training tokens should also be doubled. Following their own advice, they trained Chinchilla, a 70B-parameter model that outperforms its much larger 280B sibling Gopher. Chinchilla was trained on about 4 times more data, but with the same compute budget as Gopher.
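As a back-of-the-envelope sketch of this rule, one can use the common approximations that training compute is C ≈ 6·N·D FLOPs for N parameters and D tokens, and that compute-optimal training uses roughly 20 tokens per parameter (the paper fits more precise constants; these round numbers are an assumption for illustration):

```python
# Back-of-the-envelope Chinchilla-style scaling: C ~ 6*N*D FLOPs,
# with compute-optimal data scaling linearly as D ~ 20*N tokens.
import math

TOKENS_PER_PARAM = 20  # rough rule-of-thumb ratio, an assumption here

def compute_optimal(flops_budget):
    """Given a training compute budget (FLOPs), return (params, tokens)."""
    # C = 6 * N * D and D = 20 * N  =>  N = sqrt(C / 120)
    n_params = math.sqrt(flops_budget / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

# A Gopher-scale budget: 280B parameters trained on 300B tokens.
budget = train_flops(280e9, 300e9)  # ~5e23 FLOPs
n, d = compute_optimal(budget)
print(f"optimal: {n/1e9:.0f}B params on {d/1e12:.2f}T tokens")
# A ~70B model on ~1.4T tokens lands close to this budget, which is
# roughly the Chinchilla configuration.
```

The takeaway: for Gopher’s compute budget, the optimal model is several times smaller and trained on several times more tokens.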
On the same subject, Google’s PaLM model is an additional illustration that scaling can lead to breakthrough performance: cases where a model that was failing at a certain task suddenly succeeds once its scale becomes large enough. The authors say that PaLM’s performance on several tasks hadn’t reached a plateau, suggesting further scaling can yield additional benefits. Perhaps the most surprising use of PaLM was chain-of-thought prompting, where Google researchers showed that the performance of LLMs can be vastly improved simply by prompting them to spell out intermediate reasoning steps. For example, adding “Let’s think step by step” to a prompt makes the model’s output more structured and more likely to be correct.
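A minimal sketch of the trick (the `build_prompt` helper is hypothetical; in practice the resulting string would be sent to an LLM API, which is omitted here):

```python
# Zero-shot chain-of-thought prompting: append a reasoning trigger to
# the prompt so the model emits intermediate steps before its answer.
def build_prompt(question, chain_of_thought=True):
    prompt = f"Q: {question}\nA:"
    if chain_of_thought:
        prompt += " Let's think step by step."
    return prompt

question = ("A juggler has 16 balls. Half are golf balls, and half of "
            "the golf balls are blue. How many blue golf balls are there?")
plain = build_prompt(question, chain_of_thought=False)
cot = build_prompt(question)
# With the trigger appended, large models tend to produce intermediate
# reasoning before the final answer, improving accuracy on multi-step
# arithmetic and logic tasks.
```

The only difference between the two prompts is the trailing trigger phrase; no finetuning or extra examples are required.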
Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning, Google. In transfer learning, two common approaches are 1) freezing the whole network and retraining only its last layer, and 2) finetuning the whole network. Head2Toe instead selects a subset of features drawn from all intermediate layers and trains only a linear head on top of them. This reduces training and storage costs a hundredfold, and the model even outperforms full finetuning on out-of-distribution transfer tasks.
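A toy sketch of the idea, with a random frozen MLP standing in for a pretrained backbone and simple correlation-based scoring standing in for the paper’s group-lasso-based feature selection (everything here is illustrative, not the paper’s actual procedure):

```python
# Head2Toe-style transfer sketch: pool features from ALL intermediate
# layers of a frozen backbone, keep the most task-relevant ones, and
# train only a cheap linear head on top.
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(x, weights):
    """Return the activations of every layer of a frozen MLP."""
    acts, h = [], x
    for w in weights:
        h = np.tanh(h @ w)
        acts.append(h)
    return acts

weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 16)),
           rng.normal(size=(16, 4))]
x = rng.normal(size=(200, 8))
y = (x[:, 0] + x[:, 1] > 0).astype(float)  # toy downstream task

# 1) Concatenate intermediate representations from all layers.
feats = np.concatenate(frozen_backbone(x, weights), axis=1)  # (200, 36)

# 2) Score each feature by |correlation with the target| and keep the
#    top k (a stand-in for the paper's relevance scoring).
scores = np.abs(np.corrcoef(feats.T, y)[-1, :-1])
keep = np.argsort(scores)[-10:]

# 3) Train only a linear head on the selected features (least squares).
A = np.hstack([feats[:, keep], np.ones((len(x), 1))])
head, *_ = np.linalg.lstsq(A, y, rcond=None)
acc = np.mean((A @ head > 0.5) == y)
print(f"selected {len(keep)} of {feats.shape[1]} features, acc={acc:.2f}")
```

The backbone is never updated: only the feature-selection step and the linear head touch the downstream data, which is where the cost savings come from.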
Online Decision Transformer, FAIR, Berkeley, UCLA. Decision Transformers reformulate offline Reinforcement Learning (RL) as a sequence modeling task. This article extends Decision Transformers to online RL.
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL (UDRL), ARAYA, Imperial College London, IDSIA, USI, SUPSI, KAUST, NNAISENSE. Here it’s the other way around: instead of taking actions as inputs and predicting returns, UDRL takes desired returns as inputs and predicts actions. This was shown to work well in offline RL; this article extends UDRL to online RL.
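The core data transformation behind UDRL can be sketched in a few lines: each trajectory is relabeled so that the achieved return-to-go becomes part of the input and the action taken becomes the supervised target (the trajectory below is synthetic, for illustration only):

```python
# Upside-down RL sketch: turn trajectories into a supervised dataset
# mapping (state, desired return-to-go) -> action.
def udrl_dataset(trajectories):
    """trajectories: list of [(state, action, reward), ...] lists."""
    dataset = []
    for traj in trajectories:
        rtg = 0.0       # return-to-go, accumulated from the end
        labelled = []
        for state, action, reward in reversed(traj):
            rtg += reward
            labelled.append(((state, rtg), action))
        dataset.extend(reversed(labelled))
    return dataset

# A toy 3-step trajectory with a single terminal reward.
traj = [(0, "right", 0.0), (1, "right", 0.0), (2, "right", 1.0)]
data = udrl_dataset([traj])
# Each example asks: "in this state, wanting this much return, what
# action was taken?" A classifier trained on these pairs can then be
# conditioned on a high desired return at test time.
for (state, want), action in data:
    print(state, want, action)
```

Because every example is just an input-output pair, ordinary supervised learning machinery applies unchanged, which is the paper’s titular point.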
It’s Raw! Audio Generation with State-Space Models, Stanford (Christopher Ré’s group). The authors introduce SaShiMi, an architecture for waveform modeling built around their own S4 model for long-sequence modeling. On unconditional speech generation, the model achieves roughly double WaveNet’s mean opinion score; on a music generation task, it outperforms WaveNet on density estimation and on training and inference speed while using 3 times fewer parameters.
Making federated learning faster and more scalable: A new asynchronous method, Meta. Federated learning can train large models on data sitting on millions of devices without centralizing the data (or the per-device models) in any one location. This has important privacy advantages, but also a significant drawback: in synchronous training, each round can only proceed as fast as the slowest device. To remedy this, Meta proposed an asynchronous approach to federated learning, resulting in a system five times faster than its synchronous counterpart.
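A toy sketch of the aggregation idea (not Meta’s actual system, whose buffered aggregation is more involved): the server applies each client update as it arrives, down-weighting updates that were computed against an older version of the server model instead of waiting for the slowest device:

```python
# Toy asynchronous federated aggregation with staleness weighting.
# The "model" is a single scalar weight, purely for illustration.
def staleness_weight(staleness, a=0.5):
    """Down-weight updates computed against an old server model."""
    return 1.0 / (1.0 + staleness) ** a

def async_server(updates, lr=0.5):
    """updates: list of (client_delta, server_version_seen) in arrival order."""
    w, version = 0.0, 0
    for delta, seen in updates:
        s = version - seen                       # staleness of this update
        w += lr * staleness_weight(s) * delta    # apply immediately
        version += 1                             # server model moves on
    return w

# Three clients each push a delta of 1.0; the slow third client read
# version 0 but arrives at version 2, so its update counts for less.
final = async_server([(1.0, 0), (1.0, 1), (1.0, 0)])
print(final)
```

The server never blocks on stragglers: fresh updates get full weight, stale ones are attenuated rather than discarded, which is what recovers the speedup over lockstep synchronous rounds.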
Block-NeRF: Scalable Large Scene Neural View Synthesis, Berkeley, Waymo, Google. NeRF variants are the state-of-the-art models for novel view synthesis. This work brings their capabilities to large-scale environments: the authors developed a new variant called Block-NeRF built to render city-scale scenes. One important challenge they overcame was rendering views based on data captured under different environmental conditions. They were able to render an entire neighborhood of San Francisco, and the results are indeed extremely impressive.
Learning inverse folding from millions of predicted structures, Berkeley, FAIR. Starting from a sequence, AlphaFold2 predicts how a protein will fold. Inverse folding does the inverse: given a protein structure, the task is to determine a sequence that will fold into it. A major limitation here is the scarcity of available training data. To remedy this, the authors ran AlphaFold2 on 12M protein sequences to augment their dataset. Using this additional data and a sequence-to-sequence transformer, they outperformed previous approaches by almost 10 percentage points. A nice explanatory thread can be found here.
Equivariant Diffusion for Molecule Generation in 3D, University of Amsterdam, EPFL. Another generative modeling task, another application of diffusion models. The authors introduce an “E(3) equivariant diffusion model which learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types).” Compared to other methods, their model produces better samples, is more efficient to train and scales better to molecules with a large number of atoms.
Funding highlight reel
These past few months have seen a massive wave of funding into giant AI model startups, largely focused on natural language processing.
Anthropic, the OpenAI spinout pursuing safe general intelligence, raised a $580M Series B led by Sam Bankman-Fried, CEO and founder of the crypto exchange FTX, alongside others from the Effective Altruism community.
Inflection AI, the AI startup building conversational agents for human-computer interaction founded by DeepMind co-founder Mustafa Suleyman and Karén Simonyan, raised a $225M maiden round led by Greylock.
Hugging Face, the community-first company building open and collaborative AI, raised a $100M Series C at $2B valuation led by Lux Capital. The platform now hosts over 100k pre-trained models and 10k datasets across NLP (where the company started), computer vision, reinforcement learning, science and more. The company says that over 10k businesses use HF today.
Cohere, the Canadian startup building developer APIs for LLMs, raised a $125M Series B led by Tiger Global. Shortly after, the company announced the launch of its London office with the hiring of both Phil Blunsom (DeepMind) and Ed Grefenstette (Meta AI), two well-known NLP researchers.
Modern Intelligence, makers of a platform independent AI for government and defense, raised a $5M Seed led by Bedrock, alongside Air Street Capital and Vine Ventures. The company’s first product, Cutlass, is a maritime surveillance system that enables one operator to see and interact with the entire battlefield.
Gandeeva Therapeutics, the AI-first cryogenic electron microscopy company focused on drug discovery, raised a $40M maiden round led by Lux Capital, with participation from Air Street Capital, Obvious Ventures, Amgen Ventures, and Amplitude. It was Gandeeva that rapidly generated the 3D structure of the Omicron covid variant.
Wayve, pioneers of end-to-end deep learning for autonomous driving, raised a $200M Series B led by Eclipse. The company is launching their product as a delivery service with Asda and Ocado in the UK.
Thought Machine, the cloud-native core banking technology company, raised a $160M Series D led by Temasek at a $2.7B valuation.
InstaDeep, the enterprise decision AI company, raised a $100M Series B led by Alpha Intelligence Capital. The company recently formed a collaboration with BioNTech to apply AI to the development of mRNA vaccines and an early-warning system to flag future variants of concern. Note we predicted that AI-first mRNA vaccines would be fertile ground in the Feb 2021 issue of this newsletter :-)
Valence Discovery, the AI-first drug design company focused on small molecules, raised an $8.5M Seed co-led by Air Street Capital and Fifty Years.
Agility Robotics, makers of a humanoid walking warehouse robot, raised a $150M Series B led by DCVC and Playground Global.
Viz.ai, the AI-first medical imaging company focused on stroke detection in the emergency room, raised a $100M Series D led by Tiger Global and Insight Partners at a $1.2B valuation. This is quite a landmark growth round for the field that follows the company’s approval for reimbursement by Medicare and Medicaid.
RelationalAI, a startup that wants to change the way data-driven applications are built by combining a database system with a knowledge graph, raised a $75M Series B led by Tiger Global.
Adept, an SF-based AGI company built by the inventors of the Transformer, the former VP Engineering of OpenAI and a roster of scaling, program synthesis and large language model experts from Google Research and DeepMind, raised a $65M maiden financing round led by Greylock and Addition, alongside Air Street Capital and angels from Tesla, Airtable, Uber and more. The company is building giant models to complete computer-based workflow tasks for knowledge workers. You can think of it as an AI-first approach to UiPath.
Built Robotics, the autonomous building robotics company, raised another $64M led by Tiger Global. The company provides aftermarket solutions to bring autonomy to construction site mobile robots.
Athenian, the data-enabled engineering company, raised a $6M Seed led by Point Nine alongside Air Street Capital.
Predibase, a high-performance low-code AI software built on the Ludwig and Horovod projects from Uber AI Labs, raised a $16M Series A led by Greylock.
Mashgin, a touchless checkout system powered by computer vision, raised a $62.5M Series B led by NEA at a $1.5B valuation.
Lilt, the machine translation company raised a $55M Series C led by Four Rivers.
Hour One, a synthetic video startup, raised a $20M Series A led by Insight Partners.
Kinetix, a no-code AI-first software for creating 3D animations to use in the metaverse, raised an $11M round.
There was a recent flurry of IPOs for Chinese AI startups on local exchanges. This included:
Deep Glint, a facial recognition company serving government and public security, went public on the Shanghai STAR Market at a $1B valuation.
Cloud Walk, an enterprise AI solution for the Chinese market, went public on the Shanghai STAR Market at a $2.4B valuation.
SenseTime, the facial recognition and computer vision company, completed an IPO on the Hong Kong exchange raising $740M just before the New Year.
Over in the US and Europe, there were a number of smaller exits and one SPAC:
SoundHound, makers of voice enterprise AI services, went public via a SPAC at a $2.1B valuation.
Trifacta, an enterprise data engineering platform, was acquired by Alteryx for $400M. The combination will aim to accelerate the digital data transformation of large enterprises.
Testim, an Israeli software test automation solution, was acquired by Tricentis for $150M.
Kryon Systems, an Israeli RPA software company, was acquired by Nintex for $100M.
Scortex, a French visual part inspection company for manufacturing, was acquired by TRIGO Group.
Nathan Benaich and Othmane Sebbouh, 5 June 2022
Air Street Capital is a venture capital firm investing in AI-first technology and life science companies. We’re an experienced team of investors and founders based in Europe and the US with a shared passion for working with entrepreneurs from the very beginning of their company-building journey.