🎄Your guide to AI: December 2022
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups during November 2022. Before we kick off, a couple of news items from us :-)
I joined Daniel Bashir on The Gradient Podcast to discuss AI investing, life science, spinouts, State of AI, geopolitics and compute.
Register for next year’s RAAIS, a full-day event in London that explores research frontiers and real-world applications of AI-first technology at the world’s best companies.
Fortune ran a profile on State of AI Report with a focus on centralisation of AI compute.
My First Million podcast ran a surprise feature of State of AI Report…“dude this is a sick presentation, this is awesome”. They also discuss Adept.ai (an Air Street portfolio company), which they found on “slide 57 of a random slide deck about the State of AI…” :-)
As usual, we love hearing what you’re up to and what’s on your mind, just hit reply or forward to your friends :-)
🆕 AI in Industry
🏥 Life (and) science
Another hot month for AI-guided protein design. Almost simultaneously, Generate Biomedicines (a Boston-based Flagship Pioneering company focused on generative biology) and the Baker Lab at the University of Washington released systems to design novel protein sequences with desired functions. In the era of text-to-image/video models, you can (coarsely) think of these systems as explicitly instructing a model to generate a functional protein from a list of desired structural or functional properties, which include functional structural motifs, symmetry constraints, binding patterns and more. The Baker Lab has created and lab-tested 100s of these artificial proteins, including binders to medically-relevant proteins such as insulin. You can read about Generate’s work here.
Over in drug discovery, Insilico Medicine signed a huge deal with Sanofi to the tune of $1.2B in potential payouts for 6 target campaigns. Another Sanofi partner, Exscientia, announced a clinical trial approval for IGNITE-AI, a Phase 1/2 trial of EXS-21546 in patients with advanced solid tumors. According to the company, this molecule was “identified in 9 months after testing 163 compounds and is one of the first AI designed drugs in the industry to enter the clinic.”
🌎 The (geo)politics of AI
With modern machine learning models being computational behemoths, it’s no surprise that we’re hearing about new supercomputers seemingly every week. To make sense of it all, we are launching the State of AI Report Compute Index, where we will be tracking and ranking all public and private AI supercomputers. We’ll also be adding new live data features to this Index in the coming weeks. Ask of you: If you have ideas, questions you’d like answered, or tips on new data, just hit reply!
FTX founder SBF’s demise won’t be without consequences for AI Safety research and companies. According to Bloomberg, a leaked balance sheet showed that SBF and colleagues had invested more than $530M into 70 AI-related companies, labs, individuals, etc., including $500M into Anthropic. FTX’s collapse means many of these organizations fear being associated with fraud, and some are considering not using the invested money. Given the sizable amounts that FTX and SBF were investing into AI safety, it’s unclear how the field will evolve over a longer time horizon, and whether effective altruists will still have the same reach in the AI community.
In last month’s newsletter, we covered the German government’s backing of the sale of an Elmos wafer fab to Silex, a 100%-owned subsidiary of Sai Microelectronics, a Chinese company. A few days later, the German Federal Cabinet ultimately prohibited the sale. In similar fashion, the UK government ordered the Chinese-owned firm Nexperia to sell the Newport Wafer Fab, a UK semiconductor plant it had taken over. European tech sovereignty hangs in the balance here, and the continent must do more to spur its own version of American Dynamism. European Dynamism, anyone?
Cerebras built a supercomputer equipped with 13.5M AI cores that shows near perfect linear scaling for LLMs. That’s more cores than 1,653 NVIDIA A100 GPUs and 1.6 times as many cores as the largest supercomputer in the world. But beware, NVIDIA isn’t going anywhere. Cerebras also announced a compute partnership with Jasper, the marketing copy creation software that has traditionally relied upon exclusive licenses to the long-form GPT-3. We shared a hot take that this partnership might suggest Jasper is exploring building its own models on Cerebras’ supercomputer. This hints at a potential playbook that we’ve seen in other software domains: use a third party API to find product-market-fit, use it to accelerate market dominance, and then go down the stack to vertically integrate away from third party APIs. Jasper’s CEO replied to suggest this wasn’t the case…sort of. Let’s see 😉
NVIDIA and Microsoft seem to be strengthening their ties in AI applications. They announced a collaboration to build a massive AI cloud computer that will reportedly have tens of thousands of Hopper H100s, NVIDIA’s most powerful GPU to date, and A100s. This system will likely be significantly larger than Cerebras’ supercomputer. Microsoft’s part of the deal is providing the Azure cloud infrastructure and the DeepSpeed deep learning software. As an aside, a second NVIDIA-Microsoft deal is a software-software one: Nuance, a medical AI company (which was acquired by Microsoft for $19B in 2021), will use its Nuance Precision Imaging Network together with NVIDIA’s medical imaging framework MONAI in order to own a larger part of the automated medical workflow: from image diagnosis to deriving insights from administrative workflows.
On the autonomous vehicles front: Waymo can now charge for driverless services in San Francisco, Baidu announced it will build the world's largest autonomous ride-hailing service area in 2023, and Tesla opened its Full Self-Driving system to anyone in North America who requests it (the jury is still out on its performance vs. Autopilot).
🏭 Big tech
OpenAI released ChatGPT, a conversational model trained using the same techniques as InstructGPT, including reinforcement learning from human feedback. ChatGPT goes well beyond chit-chatting and can actually help with tasks such as writing code, conducting mathematical reasoning, retrieving scientific information, etc. That said, it can output smart-sounding answers that are factually wrong. Remember that both Google and Meta had already published “research versions” of their chatbots (LaMDA and BlenderBot 3 respectively). From anecdotal evidence, ChatGPT seems exceptionally good, and has received more appreciation from the AI community than existing chatbots on the market. But it is only now being tested, and the next few days/weeks will undoubtedly reveal some of its flaws, such as users bypassing its “evil” filter by asking it to output violent or gory content as poetry or code. The best way to judge is to try it yourself on https://chat.openai.com/.
A few weeks earlier, Meta released, then withdrew, the demo of a tool called Galactica. Galactica is able to answer scientific questions in natural language, referencing scientific works. It faced quite a bit of criticism. Opponents of releasing Galactica claimed that it spread misinformation in a perverse way: like ChatGPT, it writes its output in a very articulate, scientific style while occasionally spitting out nonsense. Proponents claimed that misinformation was easy to forge anyway, and that the communication channel and its reach are what really matter. The opponents seem to have won, as Galactica is now offline. And Yann LeCun isn’t happy about it. In any case, Galactica’s scientific paper, Galactica: A Large Language Model for Science, is actually quite impressive. The model outperforms many LLMs on scientific benchmarks like MATH, PubMedQA, and MedMCQA. This is a second retraction for Meta after BlenderBot a few months ago.
Stability AI, OpenAI’s challenger in text-to-image models, and possibly more in the future, announced that it was strengthening its partnership with AWS to use both Amazon’s GPUs and its SageMaker ML platform. OpenAI itself relies on heavy investment from Microsoft and use of Microsoft Azure servers. As a reminder, see our State of AI Report slide on compute partnerships below.
Matthew Butterick, a programmer and lawyer, is suing Microsoft/OpenAI/GitHub over GitHub Copilot. Butterick argues that training on code written by other programmers is equivalent to piracy. According to the New York Times, this is the first legal attack on AI training. GitHub’s former CEO, Nat Friedman, has argued that using open source code for training is “fair use”. But this argument hasn’t been tested in court yet. As modern AI systems are trained on large, mostly non-curated open datasets, this suit will certainly loom large over the future of AI model development.
Human-level play in the game of Diplomacy by combining language models with strategic reasoning, Meta. Meta built a new agent, CICERO, that is capable of playing an online version of the popular strategy game Diplomacy. CICERO achieved more than double the average score of human players, and ranked in the top 10% of participants who played more than one game. To win at Diplomacy, players need to collaborate in natural language, understand possible bluffs from adversaries, and think strategically over several rounds. Diplomacy is thus much more open-ended than Chess and Go, for example, because the information that models need to ingest isn’t merely the positions of pieces, but rather interactions with humans. Technically, CICERO is a 2.7B-parameter model pre-trained on general text from the internet and fine-tuned on 40,000 human games on webDiplomacy.net. At any given turn, CICERO uses an iterative planning algorithm where it predicts every player’s policy for the current turn based on shared dialogues, then improves the predictions by choosing policies that have higher expected value given other players’ predicted policies. You can find an impressive demo here. A remaining issue in CICERO is that it can generate inconsistent dialogue. Another problem that some researchers alluded to is the safety of an agent like CICERO. Indeed, to ultimately win the game, any player has to engage in deceptive behavior, where they forge an alliance during most of the game, before eventually turning their back on their allies to win the game.
But worry not, in Fine-tuning language models to find agreement among humans with diverse preferences, DeepMind fine-tunes LMs with the sole goal of building consensus among a group of people with politically diverse opinions. The model (the 70B-parameter Chinchilla) generates statements which are preferred over the best human-generated opinions more than 65% of the time. With AI-based recommender systems often blamed for exacerbating political extremism, the development of models like these is a welcome step.
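To make the CICERO planning loop described above more concrete, here is a toy sketch of the general idea: start from predicted policies, then repeatedly shift each player towards actions with higher expected value against the other player's current predicted policy. This is not the algorithm from the paper. The two-action game, the payoff numbers, the `PRIOR` values (standing in for dialogue-conditioned predictions) and the function names are all assumptions made up for illustration.

```python
import math


def expected_values(payoff, other_policy):
    """Expected value of each of our actions under the other player's policy."""
    return [sum(p * v for p, v in zip(other_policy, row)) for row in payoff]


def softmax(values, temperature):
    """Turn action values into a policy; a lower temperature is greedier."""
    top = max(values)
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def plan(payoff_a, payoff_b, prior_a, prior_b, steps=50, temperature=0.25, mix=0.5):
    """Iteratively nudge both predicted policies towards higher-value actions."""
    policy_a, policy_b = list(prior_a), list(prior_b)
    for _ in range(steps):
        # Score each action against the other player's current predicted policy.
        better_a = softmax(expected_values(payoff_a, policy_b), temperature)
        better_b = softmax(expected_values(payoff_b, policy_a), temperature)
        # Move part of the way towards the improved policy (damped update).
        policy_a = [(1 - mix) * old + mix * new for old, new in zip(policy_a, better_a)]
        policy_b = [(1 - mix) * old + mix * new for old, new in zip(policy_b, better_b)]
    return policy_a, policy_b


# Hypothetical two-action game: index 0 = "support ally", index 1 = "hold".
# PAYOFF[mine][theirs] is the value to the acting player (made-up numbers).
PAYOFF = [[3.0, 0.0], [2.0, 1.0]]
# Hypothetical predictions inferred from dialogue: both lean towards support.
PRIOR = [0.7, 0.3]

policy_a, policy_b = plan(PAYOFF, PAYOFF, PRIOR, PRIOR)
print([round(p, 3) for p in policy_a])
```

In this toy game, mutual support pays best, so both predicted policies drift further towards "support" as the iterations proceed. The real system works over Diplomacy's far larger action space and conditions on actual dialogue.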
Efficient Scaling of Transformer Inference, Google. Generative modeling is all the rage. Many of its applications have low-latency constraints, which pose a significant challenge in the current parameter scaling paradigm, especially when the inputs to the models are 1000+ tokens. This is due to high memory requirements and the relative difficulty of parallelization during inference compared to training. To mitigate these issues, Google researchers devised a framework to find the best partitioning strategy given the model size. They also make memory optimizations to push the batch size up to the highest possible limit and enable high throughput inference. As a result of these optimizations, they are able to run the 540B-parameter PaLM model with a 2048-token context length.
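For a rough feel of why memory becomes the bottleneck, the back-of-the-envelope sketch below estimates two of the main consumers during inference: the weights themselves, and the cache of attention keys and values kept for every token in the batch. Only the 540B parameter count and the 2048-token context come from the text above. The layer count, head count, head dimension and batch size are hypothetical placeholders, not PaLM's actual configuration, and the formula ignores activations and other overheads.

```python
GIB = 2 ** 30


def weight_bytes(n_params, bytes_per_value=2):
    """Memory for the model weights (2 bytes per value assumes 16-bit floats)."""
    return n_params * bytes_per_value


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_value=2):
    """Memory for cached attention keys and values.

    The leading factor 2 counts one key vector and one value vector
    per token, per layer, per key/value head.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_value


weights = weight_bytes(540_000_000_000)

# Hypothetical model shape and batch size, chosen only for illustration.
cache = kv_cache_bytes(n_layers=96, n_kv_heads=96, head_dim=128, seq_len=2048, batch_size=64)

# Same shape, but with a single key/value head shared by all query heads.
shared = kv_cache_bytes(n_layers=96, n_kv_heads=1, head_dim=128, seq_len=2048, batch_size=64)

print(f"weights:          {weights / GIB:.0f} GiB")
print(f"KV cache:         {cache / GIB:.0f} GiB")
print(f"KV cache, shared: {shared / GIB:.0f} GiB")
```

Because the cache grows linearly with both batch size and context length, anything that shrinks the per-token footprint translates directly into larger feasible batches, which is where high-throughput inference comes from.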
Holistic Evaluation of Language Models, Stanford University. The (infamous?) Center for Research on Foundation Models (CRFM) released a benchmark that aims to evaluate language models on the broadest possible scenarios and use-cases. The article’s starting point is that most language models are evaluated on arbitrary tasks and with arbitrary metrics that are often not shared across models. This offers only a partial view of their performance. For example, “models on average were evaluated on just 17.9% of the core HELM scenarios”. This number is now 96% after CRFM researchers evaluated SOTA language models (GPT-3, GPT-J, GLM, OPT, BLOOM, Anthropic-LM, etc.) on HELM. But where did an academic lab find the resources to evaluate so many models on so many tasks? The answer is the Together Research Computer. Together Research has contributed compute from 5 institutions: Stanford, ETH Zürich, Open Science Grid, U Wisconsin, and Crusoe Cloud. But the goal isn’t merely to use donated compute. It is to take advantage only of the idle GPUs that might be available at those institutions, and to create clever tools that are capable of handling heterogeneous hardware and parallelizing compute despite unpredictable availability of GPUs.
AlphaFold predictions: great hypotheses but no match for experiment, Los Alamos National Laboratory, Berkeley, Cambridge, Oxford, Duke. AlphaFold 2 came with the promise of accurate predictions of protein folding. So much so that many scientific articles use AlphaFold’s predictions as a ground truth in their numerical experiments. Researchers had already warned about potential compounding errors, and this article examines AlphaFold’s predicted structures against experimentally-determined ones and shows that “most differ on a global scale through distortion and domain orientation and on a local scale in backbone and side-chain conformation”, even when AlphaFold is confident in its predictions. AlphaFold, and other open source follow-ups like OpenFold, whose preprint has just been published, should be examined further to understand in which specific cases they fail. This is another reason why it is critical to combine computational with empirical experiments when addressing problems in biology. For example, Gandeeva makes use of both cryogenic electron microscopy and AI to precision-engineer drugs.
VeLO: Training Versatile Learned Optimizers by Scaling Up, Google. This work considers the task of learning optimizer models. That is: instead of using hand-designed optimizers (e.g. SGD, Adam), use a black-box neural network that takes gradients as input and outputs a parameter update. As done in previous works, they use a hierarchical hypernetwork, but introduce several tweaks that make it faster without a hit to performance. Learning an optimizer network requires sampling from model families, training datasets, loss functions, and architectural hyperparameters (in other words, only Google can do it…). Once an optimizer is learned, it still needs to be evaluated, especially on tasks which it hasn’t been trained on (new models, datasets, etc.). The researchers introduce the VeLOdrome benchmark, a set of 83 tasks on which learned optimizers can be benchmarked efficiently. They show that their learned optimizer, VeLO, is “more than 4 times faster than learning rate-tuned Adam on 50% of the tasks”.
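To illustrate the interface of a learned optimizer (gradients in, parameter update out), here is a deliberately tiny sketch. It is not VeLO: there is no hypernetwork and no meta-training. The class name, the one-unit network, the momentum feature and the hand-set weights are all assumptions, with the weights standing in for values that would normally be learned across many tasks.

```python
import math


class TinyLearnedOptimizer:
    """A one-hidden-layer network mapping gradient features to an update.

    The weights are hand-set placeholders; in a real learned optimizer
    they would be meta-trained on many optimization problems.
    """

    def __init__(self, hidden_weights, output_weights, step_scale=0.1, decay=0.9):
        self.hidden_weights = hidden_weights  # one (w_grad, w_momentum) pair per unit
        self.output_weights = output_weights  # one weight per hidden unit
        self.step_scale = step_scale
        self.decay = decay
        self.momentum = 0.0  # running average of past gradients (single scalar parameter)

    def update(self, gradient):
        # Features fed to the network: current gradient and its running average.
        self.momentum = self.decay * self.momentum + (1 - self.decay) * gradient
        hidden = [
            math.tanh(w_grad * gradient + w_mom * self.momentum)
            for w_grad, w_mom in self.hidden_weights
        ]
        # The network output is the (negative) step applied to the parameter.
        return -self.step_scale * sum(w * h for w, h in zip(self.output_weights, hidden))


def minimize(optimizer, start, grad_fn, steps):
    """Apply the optimizer's updates to a single scalar parameter."""
    x = start
    for _ in range(steps):
        x += optimizer.update(grad_fn(x))
    return x


# Toy task: minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
optimizer = TinyLearnedOptimizer(hidden_weights=[(1.0, 0.5)], output_weights=[1.0])
x_final = minimize(optimizer, start=0.0, grad_fn=lambda x: 2.0 * (x - 3.0), steps=300)
print(round(x_final, 4))
```

The point of the sketch is the shape of the problem: the update rule is itself a parameterised function, so "training the optimizer" means searching over its weights to do well across many tasks like the toy one here.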
Anduril, the world leader in military AI software and hardware, raised a $1.48B Series E from Valor Equity Partners at an $8.48B valuation.
Descript, a video and audio editing company, raised a $50M Series C led by the OpenAI Startup Fund.
Fathom, which works on automated medical coding, raised a $46M Series B led by Alkeon Capital and Lightspeed Venture Partners.
London-based The Applied AI Company, which built software to automate human processes within the insurance, pharmaceutical, and healthcare sectors, raised a $42M seed round from multiple investors.
Spot AI, which built a vision interface to analyze companies’ CCTV footage, raised a $40M round led by Scale Venture Partners.
V7 Labs, the data engine for AI with a focus on computer vision, raised a $33M Series A led by Radical Ventures and Temasek, with participation from Air Street Capital and others.
ZOE, the personalized nutrition company helping consumers understand what food is good for their body, raised a $30M round led by Accomplice.
Speak, an AI-powered English learning platform, raised a $27M Series B round led by the OpenAI Startup Fund.
Soft Robotics, a company building (food) gripping robots, raised a $26M Series C led by Tyson Ventures.
Taktile, which builds a no-code interface for non-technical bank employees to evaluate their decision-making, raised a $20M Series A led by Index Ventures and Tiger Global.
French startup PhotoRoom raised a $19M Series A led by Balderton Capital. The company offers tools for photo editing, and most famously background removal and editing.
Terzo, a company building AI-powered contract processing software, raised a $16M Series A led by Align Ventures.
Obrizum, a company that automates corporate training, raised an $11.5M Series A led by Guinness Ventures.
French startup Deepomatic, which develops computer vision tools for photos taken by field workers, raised a $10.5M Series B led by EnBW Ventures and Orbia Ventures.
Harvey, which is building a “copilot for lawyers” and was founded by a litigator and a DeepMind researcher, raised a $5M seed round from the OpenAI Startup Fund.
Mem, an automated note-taking app, raised a $23.5M round led by the OpenAI Startup Fund.
Orum, which automates the tedious parts of sales prospecting using AI, raised a $22M Series B led by Tribe Capital.
Deepgram, the API for extracting conversational intelligence from audio, raised a $47M second tranche of its Series B for a total of $72M. This round was led by Alkeon.
Mayo Clinic spinout Anumana, which was formed in partnership with nference to bring AI techniques into electrocardiograms, acquired NeuTrace, which built AI software for cardiac electrophysiology.
Maxar, a NYSE-listed provider of space technology and geospatial software solutions, acquired Wovenware to beef up its AI solutions.
Accenture acquired ALBERT, a Japanese data science company.
Nathan Benaich, Othmane Sebbouh, 4 December 2022
Air Street Capital is a venture capital firm investing in AI-first technology and life science companies. We’re an experienced team of investors and founders based in Europe and the US with a shared passion for working with entrepreneurs from the very beginning of their company-building journey.