🏄 Your guide to AI: July 2022
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups. Before we kick off, a couple of news items from us :-)
Spinout.fyi: we’ve crowdsourced the largest global dataset of university spinout deal data to bring transparency to the opaque practice of company creation from academic research. You can now browse the deal terms of 143 spinouts from >70 universities around the world here. Please consider submitting your own data and sharing the database around!
Air Street Capital: after announcing our latest Fund 1 investments including Adept (AGI for your computer workflows), Modern Intelligence (one AI for defense), Gandeeva (AI-first cryoEM) and Athenian (data-enabled engineering), we’re now live and making investments from Fund 2.
State of AI Report: with the summer around the corner, we’re about to kick off work on the 2022 edition of our annual report. If you’ve got a killer AI use case or paper coming up soon, do drop us a line to learn about our slide submission process. Register to receive the report by email here or follow us on Twitter here.
RAAIS: We’re back in-person with our non-profit RAAIS one-day summit on Friday 23rd June 2023 in London. As usual, we’ll be hosting a top-tier group of large companies, startups, researchers and students all working on various aspects of AI. You can register your interest here.
As usual, we love hearing what you’re up to and what’s on your mind, just hit reply or forward to your friends :-)
🆕 Technology news, trends and opinions
🌎 The (geo)politics of AI
In January 2021, the US Congress passed the CHIPS for America Act, which provides significant financial support for companies willing to build semiconductor fabs in the US. But Congress has still not appropriated the budget to enact the law. As a consequence, Intel, TSMC and GlobalWafers have all warned that they would scale back their investments unless the financing is made available. This jeopardizes the construction of a $20B Intel plant in Ohio, a $12B TSMC plant in Phoenix, and a $5B GlobalWafers plant in Texas.
A study from Stanford’s Center on China’s Economy and Institutions showed that Chinese facial recognition AI companies with access to data-rich public security contracts generated not only more government software products but also more commercial ones than companies without such access. Because having large, high-quality datasets is such a differentiator in building AI products, government contracting can give rise to global AI leaders down the road. This is true for Chinese facial recognition AI companies – many of which are computer vision leaders – but could also be the case for companies in other heavily regulated sectors, like the military or healthcare, which build expertise that is transferable to everyday AI products.
The UK government published a consultation outcome on AI and intellectual property. One notable decision was that no change would be made to patent law for AI inventions: “AI is not yet advanced enough to invent without human intervention”. The government made another, more consequential decision on text and data mining (TDM). Because it is virtually impossible for TDM software to examine each data owner’s copyright agreement, UK law (like EU and US law) had already added an exception in 2014 allowing TDM users to ignore copyright, as long as the data is used for non-commercial research purposes (and the user has lawful access to it). Now the UK government is going further: anyone who has lawful access to the data can use TDM for any purpose. Two interesting snippets from this. First, the example used in the press release is a not-so-subtle reference to the GitHub Copilot license compliance controversy: “Among other uses, data mining can be used when training AI systems. For example, machine-learning software which has been trained on large repositories of computer code is able to intelligently suggest new code to programmers”. Second, the language of the press release suggests that we shouldn’t expect a UK AI regulation act any time soon: “This data mining provision will take advantage of the UK’s ability to set its own copyright laws now that we have left the EU and will put the UK’s copyright framework among the most AI and research friendly in the world. It will help make the UK a location of choice for data mining and AI development, supporting the Government’s ambition for the UK to be a global leader in AI innovation and research.”
The MLPerf 2.0 results are out, and Dylan Patel’s SemiAnalysis dissected the performance of different AI chips. Intel’s new Habana Gaudi 2 was the fastest on the two models (ResNet-50 and BERT) for which results were submitted, and it was largely superior to other chips when using out-of-the-box software – as opposed to software optimized for MLPerf benchmarks. However, NVIDIA’s two-year-old A100 remained the most flexible, coming out on top or near the top on every task of the benchmark. Notably, most of these chips have gone through only one hardware iteration in 3.5 years, yet on some models they are up to 8 times faster than they were, confirming that in the short term, software is the biggest performance driver.
Two new records were shattered in the hardware realm. One in supercomputing: Oak Ridge National Laboratory’s Frontier system, powered by AMD, became the world’s first supercomputer to break the exaflop-per-second barrier. Another in AI computing: Cerebras set the record for the largest AI model (20B parameters) ever trained on a single device, promising to relieve researchers and companies from parallelization pains.
Amazon announced its fully autonomous warehouse robots. Don’t expect an impressive Boston Dynamics-style demo video: these robots only lift and transport large, heavy carts. But given the scale at which Amazon operates and the hiring trouble the company expects, this is a big deal. Notably, these robots don’t need to be physically separated from warehouse workers, as previous iterations were due to safety concerns.
🏭 Big tech
Meta announced it will completely restructure its AI teams. In an effort to better integrate research advances into Meta products, most of the company's AI research will be handed back to the product teams. Meta’s head of AI, Jerome Pesenti, is leaving and will be replaced by Joelle Pineau. As part of its “Metaverse” strategy, FAIR, the company’s core AI research lab, will now be part of Reality Labs Research. As an example of their applied research, Meta AI published impressive work on AI-driven acoustic synthesis for AR and VR, which aims to match the sound of virtual speakers to their virtual surroundings. The restructuring is certainly a sign that times are hard and that the company needs to further leverage years of AI research leadership. Everyone will undoubtedly keep an eye on the resulting inflow and outflow of talent.
Meta also released OPT, the largest family of open-source language models (LMs) to date. All the models up to 66B parameters are available to download from their GitHub repo; the largest, at 175B, is available upon request. For LLMs, we now have: closed-source internal models; closed-source models with inference access via API (OpenAI’s GPT, and all the LLM-as-a-service startups); freely downloadable models (Eleuther, BLOOM soon); and now a hybrid – open source and freely downloadable up to a critical size, with the largest model gated behind a request form.
It has been a busy month for AI-for-code companies. GitHub announced that its Copilot AI coding assistant is now generally available to all developers for $10/month or $100/year. Amazon launched its own coding assistant, CodeWhisperer, which is now in preview. One of their competitors, Tabnine, raised a $15.5M funding round, while Mintlify, which automatically generates documentation from code, raised a $2.8M seed round.
Last month, we covered the raging race for text-to-image models between OpenAI’s DALL·E 2 and Google’s Imagen, but we certainly didn’t expect Google to release another – at least as impressive – model, called Parti. The main fundamental difference is that Parti uses an autoregressive model rather than a diffusion model to transform word embeddings into images. Qualitatively, the most striking difference is that as Parti’s parameter count and training data are scaled up, the model becomes able to spell words in the images it generates, a feat that none of the other models had achieved.
In the latest framework-related news: for those who want to play around with deep learning models on their laptops, Apple has made available an open-source PyTorch implementation of the Transformer architecture optimized for the Apple Neural Engine. Meanwhile, Google is increasingly replacing TensorFlow with JAX internally, a trend we've been predicting in our State of AI Reports. 🤓
🔬 Research
Here’s a selection of impactful work that caught our eye.
Learning to Play Minecraft with Video PreTraining (VPT), OpenAI. OpenAI trained a model to play Minecraft from video frames using only a small amount of labeled data. They gathered 2,000 hours of video labeled with mouse and keyboard actions and trained an inverse dynamics model (IDM) to predict actions given past and future frames – this is the PreTraining part. They then used the IDM to label 70,000 hours of unlabeled video, on which they trained a causal model to predict actions given only past video frames. They show that the model can be fine-tuned with imitation learning and reinforcement learning (RL) to reach performance that is impossible to achieve with RL from scratch.
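The VPT recipe above can be sketched end to end in a toy form: a non-causal IDM (which sees past and future frames) pseudo-labels unlabeled video, and a causal policy (which sees only the past) is then trained by behavioral cloning on those labels. All names and the "environment" here are ours, purely for illustration – frames are integers and the true action is simply the difference between consecutive frames.

```python
from collections import Counter, defaultdict

def inverse_dynamics_model(past_frame, future_frame):
    # Stand-in for the trained IDM: it may peek at the FUTURE frame,
    # which makes inferring the action much easier than acting causally.
    return future_frame - past_frame

def pseudo_label(frames):
    # Use the IDM to turn an unlabeled video into (observation, action) pairs.
    return [(frames[t], inverse_dynamics_model(frames[t], frames[t + 1]))
            for t in range(len(frames) - 1)]

def train_behavioral_clone(dataset):
    # The causal policy only sees the current observation; here we "train"
    # it by memorising the most common pseudo-labeled action per observation.
    counts = defaultdict(Counter)
    for obs, action in dataset:
        counts[obs][action] += 1
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}

unlabeled_video = [0, 1, 2, 3, 4, 5]      # frames from "internet video"
dataset = pseudo_label(unlabeled_video)    # the IDM supplies the missing actions
policy = train_behavioral_clone(dataset)
print(policy[3])                           # → 1 (the "move forward" action)
```

The real system of course replaces the lookup table with a large neural network and the integer frames with pixels, but the data flow – labeled data → IDM → pseudo-labels → causal policy – is the same.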
No Language Left Behind [NLLB]: Scaling Human-Centered Machine Translation, Meta AI, UC Berkeley, Johns Hopkins University. The NLLB research project aims to create a universal translation model. Researchers open-sourced models that translate directly – not via a third language – between 200 languages. The project is interesting not only because it advances the state of the art in multilingual translation, but also because it makes available large datasets for low-resource languages.
DayDreamer: World Models for Physical Robot Learning, UC Berkeley. One approach to reinforcement learning which has been successful in video games is planning within a learned world model, in which the model learns and acts in a compressed representation of its environment. Dreamer is an algorithm to train models using this approach. Researchers from UC Berkeley apply Dreamer to a quadruped robot without any prior simulations and show that it can learn to walk from scratch without resets in only 1 hour. They also show that it can adapt to humans pushing it within 10 minutes. See for yourself here.
Evolution through Large Models, OpenAI. Researchers show that large language models (LLMs) trained on code can act as an invisible hand guiding the evolution of an AI agent. LLMs for code generation are trained on data that contains incremental, sequential changes made to improve programs. Building on this, the high-level idea is not only to train an agent to perform a given task, but also to train an LLM to suggest code changes that improve the AI agent’s performance.
Modeling long sequences is difficult and computationally expensive for language models, but research on this problem is advancing fast. In General-purpose, long-context autoregressive modeling with Perceiver AR, researchers from Google and DeepMind extend the Perceiver architecture – which maps inputs to a latent space of lower dimension in sequence length – to autoregressive generation, and show that it retains strong performance on tasks with inputs of more than 100,000 tokens. A similar feat was achieved by FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness from Stanford and University at Buffalo, where the authors speed up transformer training by “[reducing] the number of memory reads/writes between GPU high bandwidth memory (HBM) and GPU on-chip SRAM”. Finally, GoodBye WaveNet – A Language Model for Raw Audio with Context of 1/2 Million Samples, a working paper from Stanford, tackles the long-range dependency problem of modeling audio signals.
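The core mathematical trick behind FlashAttention-style kernels – computing exact softmax attention one key/value block at a time with a running max and normalizer, so the full n×n score matrix is never materialized – can be sketched in a few lines of NumPy. This is only the math, not the actual CUDA kernel (which additionally orchestrates the HBM/SRAM traffic the paper describes):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Standard attention: materializes the full (n, n) score matrix.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    return (P / P.sum(axis=-1, keepdims=True)) @ V

def blockwise_attention(Q, K, V, block=16):
    # Exact attention computed one key/value block at a time, keeping a
    # running max (m), softmax normalizer (l) and output accumulator (acc).
    n, d = Q.shape
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    acc = np.zeros((n, V.shape[-1]))
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T / np.sqrt(d)   # scores for this block only
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)               # rescale earlier partial sums
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=-1)
        acc = acc * scale[:, None] + P @ V[j:j + block]
        m = m_new
    return acc / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(64, 32)) for _ in range(3))
assert np.allclose(naive_attention(Q, K, V), blockwise_attention(Q, K, V))
```

The point is that the result is bit-for-bit the same softmax attention (up to floating point), not an approximation – the savings come purely from never writing the quadratic score matrix to slow memory.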
Emergent Abilities of Large Language Models, from DeepMind, examines an interesting phenomenon that has been noticed in training LLMs: models that were completely unable to perform a certain task suddenly achieve very high performance once their size and/or training FLOPs reach a certain level. The concept of emergence is based on Jacob Steinhardt’s blog post, itself inspired by (Physics Nobel Prize winner) Philip Anderson’s essay “More is different”: “Emergence is when quantitative changes in a system result in qualitative changes in behavior.”
Reading our previous paragraphs on LLMs, the 444 authors (from 132 institutions) of Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models would all nod skeptically. Rather than qualitative judgements, they propose the aptly named BIG-bench, a benchmark of 204 tasks designed to be challenging for current language models.
Minerva: Solving Quantitative Reasoning Problems with Language Models, Google. Google further trained its pre-trained LLM PaLM on an additional 118GB dataset of scientific papers from arXiv and web pages using LaTeX and MathJax. Using chain-of-thought prompting (prompting the model with worked examples that spell out intermediate reasoning steps) and simple techniques like the majority voting introduced in another Google paper, Minerva improves the state of the art on most datasets by at least double-digit percentage points. For example, on the MATH dataset it achieved 50.3% accuracy, compared to 6.9% for the previous best model. This level of performance was apparently unexpected, as forecasters had predicted on average that the best machine-learning model would reach 12.7%.
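The majority-voting idea is simple enough to sketch: sample several chains of thought from the model at non-zero temperature, extract each one's final answer, and return the most common answer. The sketch below stubs out the model with a fixed list of sampled answers (hypothetical values, purely illustrative):

```python
from collections import Counter

def sample_answers(prompt, n):
    # Stand-in for n stochastic samples from an LLM; in practice each sample
    # is a full chain of thought whose final answer is extracted.
    return ["50", "50", "48", "50", "52"][:n]

def majority_vote(prompt, n=5):
    # Return the most common final answer across the n sampled reasoning paths.
    answers = sample_answers(prompt, n)
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("What is 14 + 36?"))  # prints 50
```

The intuition is that many different reasoning paths converge on the correct answer, while wrong paths tend to scatter across different wrong answers, so voting filters them out.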
In A Path Towards Autonomous Machine Intelligence, Yann LeCun outlines his vision for the future of AI. Give it a read if you fancy position papers. He posted it on OpenReview, so every Reviewer 2 in the world will be able to publicly post their unfair comments.
ProGen2: Exploring the Boundaries of Protein Language Models, from Salesforce. LMs’ favorable scaling properties extend to protein sequences: as expected, scaling LMs allows them to better capture the training distribution of protein sequences (as measured by perplexity). Using the largest (6B-parameter) version of ProGen2, the researchers were able to generate proteins with folds similar to natural proteins but with substantially different sequence identity. However, although they see some instances of emergent abilities for protein LLMs, the authors caution that for some tasks, smaller models may perform better than their larger counterparts. They stress that “a bigger and growing emphasis needs to be placed on the data distribution provided to the protein sequence model”.
Genome-wide mapping of somatic mutation rates uncovers drivers of cancer, MIT and Harvard. The search for cancer-driving mutations has tended to focus on areas of the genome that are known to code for proteins, in part because of the complexity of modeling the highly variable mutation rates observed across genomes. To remedy this, the authors used deep learning to estimate cancer-causing mutation rates across the genome for 37 cancer types at high resolution. They then used this system to identify mutations in previously underexplored genomic contexts.
A hot summer of AlphaFold is upon us as the number of papers making use of or citing the protein structure prediction system shoots up. We’ve also seen the successful training of OpenFold, the first open-source PyTorch reproduction of AlphaFold 2 (AF2). The authors found that OpenFold is actually more memory-efficient and faster at inference on smaller proteins than AF2.
Funding highlight reel
Ultima Genomics, which promises $100 genome sequencing, came out of stealth with a $600M raise.
Metropolis, which uses computer vision for an automated “drive in drive out” payment system, raised $167M in Series B co-led by 3L Capital and Assembly Ventures.
Shield AI, a defense company making software and hardware for drones, raised $165M at a $2.3B valuation: $90M in Series E led by Snowpoint Ventures and $75M in debt. TechCrunch also reported that Anduril, the largest AI for defense company, could be raising up to $1.2B at a $7B pre-money valuation. Anduril recently published Rebooting The Arsenal of Democracy, a visually impressive case for investing in modern defense technology.
Gloat, an AI-powered job match-maker, raised a $90M Series D led by Generation Investment Management.
Invoca, which analyzes conversations for marketing and sales teams, raised a $83M Series F led by Silver Lake Waterman, valuing the company at $1.1B.
Speechmatics, a B2B startup which translates speech to text in 34 languages, raised a $62M Series B led by Susquehanna Growth Equity. The company spun out of the University of Cambridge back in 2006 and was founded by Tony Robinson, who did his PhD on recurrent neural networks back in the late 1980s.
Sanas, which develops voice AI models that change accents, raised a $32M Series A led by Insight Partners.
Insilico, an AI drug discovery company, raised a $60M Series D from multiple investors. This follows a $255M Series C last year.
Charm Therapeutics, which uses deep learning to understand protein ligand co-folding, raised a $50M Series A led by F-Prime Capital and OrbiMed, fueled by the latest advances in machine learning for protein modeling.
Anagenex, an AI-powered small molecule design company, raised a $30M Series A led by Catalio, with participation from Air Street Capital, Lux Capital, Obvious, Menlo and Khosla.
Lightning AI, previously Grid.ai, raised a $40M Series B led by Coatue. Lightning AI provides MLOps software to integrate the whole AI model building pipeline and make it cloud-infrastructure agnostic. The company originated from the open source PyTorch Lightning package created by Lightning AI CEO William Falcon.
Modular, which raised a $30M seed round led by GV, takes the bet that AI systems must avoid being monolithic. The company’s CEO Chris Lattner, who held senior positions at Apple, Tesla, Google, and SiFive, wants to provide ML developers with composable components to build their ML pipelines.
Arkestro, which raised a $26M Series A led by NEA, Construct, and other investors, wants to fix companies’ supply chain troubles by using machine learning and game theory to simulate procurement processes.
Celus raised a $25.6M Series A led by Earlybird Venture Capital. The company uses AI to automate circuit board design.
Papercup, which does AI-powered video localization, raised a $20M Series A led by Octopus Ventures.
Continual, which allows data teams to integrate AI models into companies’ existing data stacks, raised a $14.5M Series A led by Innovation Endeavors. Their platform, for example, allows developers to define AI models declaratively – as one would with SQL.
As we mentioned in the Big Tech section, Tabnine, which does AI-assisted code generation, raised a $15.5M Series B led by Qualcomm Ventures, OurCrowd and Samsung NEXT Ventures. Mintlify raised a $2.8M Seed round led by Bain Capital Ventures to automate code documentation generation with AI.
Owkin, the federated learning company for medical data, raised $80M (part equity, part upfront payments) as it gears up to enhance drug trials with the pharmaceutical giant Bristol Myers Squibb.
Exits
Reddit acquired two AI companies last month: Spell, an MLOps platform founded by a former senior exec at FAIR, and MeaningCloud, a multilingual natural language processing company.
Sonantic, a London-based startup which makes AI-generated voices, was acquired by Spotify. The company had some recent publicity because it generated Val Kilmer’s voice in the Top Gun: Maverick movie.
Reality AI, makers of embedded AI systems, was acquired by Renesas Electronics Corporation, which supplies semiconductor solutions.
Finn AI, a Canadian conversational AI company, was acquired by Glia, a unicorn omnichannel customer service and success platform.
Peltarion, a Swedish enterprise AI platform company, was acquired by mobile game developer King.
THE YES, an AI-powered shopping platform for fashion, was acquired by Pinterest.
VocaliD, a voice synthesis company, was acquired by Veritone, an enterprise AI vendor with a focus on audio and video.
Memomi, a virtual try-on user experience company, was acquired by Walmart. Memomi had been used across more than 2,800 Walmart Vision Centers and 550 Sam’s Clubs for customers buying new glasses.
Othmane Sebbouh and Nathan Benaich, 10 July 2022
Air Street Capital | Twitter | LinkedIn | State of AI Report | RAAIS | London.AI
Air Street Capital is a venture capital firm investing in AI-first technology and life science companies. We invest as early as possible and enjoy iterating through product, market and technology strategy from day 1. Our common goal is to create enduring companies that make a lasting impact on their markets.