🎇 Your guide to AI: January 2023
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI research, industry, geopolitics and startups during December 2022. Before we kick off, a couple of news items from us :-)
Check out our 2022 Year in Review at Air Street Capital, State of AI, Spinout.fyi and our other work streams :-)
I joined Jakub from Zeta Alpha for a conversation on what’s hot in AI today, including State of AI Report 2022 predictions and our new Compute Index collaboration.
Why are investors getting so excited about generative AI? I shared my take with Sifted.
Register for next year’s RAAIS, a full-day event in London that explores research frontiers and real-world applications of AI-first technology at the world’s best companies.
As usual, we love hearing what you’re up to and what’s on your mind, just hit reply or forward to your friends :-)
🏥 Life (and) science
Generative AI has captured the minds of many in text and image generation, prompting (pun intended) a deluge of articles in the media, more VC market landscapes and “our (retrofitted) generative AI portfolio” than we can count. But if this can eventually shed more light on important second-order applications of generative modeling such as protein design, we’ll take it. Indeed, generative protein design seemed to garner a lot of attention. Last month, we profiled Boston-based Generate Biomedicines’ Chroma, a diffusion-based generative model that jointly models the 3D structures and sequences of full protein complexes while enabling sampling with diverse requirements without retraining, all with a complexity that scales near-linearly with the system size.
We also flagged another diffusion model for protein design called RoseTTAFold Diffusion, which simultaneously came out of the Baker Lab at University of Washington and other academic institutions. The model is built by fine-tuning RoseTTAFold, a protein structure prediction model (think AlphaFold 2) on denoising. The resulting model “achieves outstanding performance on unconditional and topology-constrained protein monomer design, protein binder design, symmetric oligomer design, enzyme active site scaffolding, and symmetric motif scaffolding for therapeutic and metal-binding protein design.”
This month, Meta’s Fundamental AI Research team, which has been a leader in applying large language models in protein structure prediction, showed that its structure prediction model, ESM2, can also be repurposed for programmable protein structure generation beyond natural proteins. These newly generated proteins were experimentally validated with high success rates (>50%) in the lab in collaboration with the Institute for Protein Design at the University of Washington. The collaborative effort behind the paper, Language models generalize beyond natural proteins, also spans MIT, Harvard University and NYU. A nice thread explaining the work can be found here.
The generative AI for drug discovery buzz was also helped by well-regarded AI journalist Karen Hao, formerly of MIT Tech Review. Her article, How AI That Powers Chatbots and Search Queries Could Discover New Drugs, sheds light on how language models are increasingly used in drug discovery efforts, notably through partnerships between big pharma and startups. The next few years will be critical for AI in drug discovery, with almost 20 assets now in clinical trials. It’ll be exciting to see the FDA (hopefully) approve AI products outside of radiology, which accounts for ¾ of all 521 medical device FDA submissions to date.
🌎 The (geo)politics of AI
After raising $60M at a $2.3B valuation, Shield AI became at least the second multibillion-valued defense tech company, after Anduril, which was valued at more than $7B at its $1.5B raise led by Valor Equity. Traditionally seen as a “faux pas” area for VCs to invest in, defense technology companies are rightfully attracting significant funding. As we highlighted in the State of AI Report 2022 (slide 86), new and diverse products are coming out at increasing speed from established companies and startups. At the same time, democratic nation states around the world are growing their procurement, albeit still slowly.
But like any sector, military AI has its own too-good-too-fast stories. Rebellion Defense’s is apparently one, according to a Vox article. In 2022, the company, founded in 2019, raised $150M and was valued at $1.15B. However, the article reports that the company developed products without regard for which governments they'd be sold to, that employees exchanged classified information on unsecured Slack channels and Google Docs, and that it ran a dysfunctional work environment where decision making was led by DC politics over technology. It seems that Rebellion has been able to win government contracts thanks to its board’s and employees’ ties to the US government, but it’s unclear how market (and battlefield)-ready some of its products are. Still, Rebellion's woes shouldn’t distract from the transformational potential of AI in modern warfare. The latest Washington Post opinion piece on Palantir’s support to the Ukrainian army is a great argument to that effect.
After the US banned most Chinese entities from accessing advanced semiconductors, including a very strategic ban on NVIDIA’s AI chips, China is preparing a plan to save its domestic semiconductor industry. The government is reportedly working on a 1T yuan ($143B) fiscal stimulus package. China will sorely need plans to modernize its semiconductor industry as US allies like the UK, the Netherlands and Japan are following the US’s lead, although sometimes hesitantly, as ASML’s CEO alluded to. There’s already evidence of China scaling up the non-banned older nodes in the meantime.
As with data privacy, recommender systems, and predictive AI systems, China is moving fast to regulate the private sector’s use of generative AI algorithms. Their law will enter into effect as early as January 10, 2023. As usual with regulations on nascent technologies, it is still unclear how stringent the regulation will actually be and with which means the government will enforce it.
We have updated the State of AI Report Compute Index with fresh data from our collaborators at Zeta Alpha on cited chip usage in open source AI papers. The data confirms what we had reported in the Index’s V1. NVIDIA leads the ranking by 1 to 2 orders of magnitude, with 21,452 mentions of NVIDIA chips vs. 740 for FPGAs and 257 for TPUs in 2022. NVIDIA’s V100 GPU is the most popular, and usage of the RTX 3090 and A100 is growing. Among the AI semiconductor startups, Graphcore is leading, followed by Intel Habana and SambaNova. But combined, they account for 125x fewer usage citations than NVIDIA. We’ve got a nice summary of our findings here. To dive deeper into the semiconductor supply chain and reshoring initiatives across the West, the FT had a great piece here.
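For readers who want to sanity-check the size of NVIDIA's lead, here is a minimal Python sketch that turns the three mention counts quoted above into ratios and orders of magnitude. The function name `lead_over` is ours and purely illustrative; the only data used is the 2022 figures cited in this section.

```python
import math

# Mention counts for 2022 as quoted in the text above.
MENTIONS = {"NVIDIA": 21_452, "FPGA": 740, "TPU": 257}


def lead_over(leader: str, other: str) -> tuple[float, float]:
    """Return (ratio, orders of magnitude) by which `leader` out-mentions `other`."""
    ratio = MENTIONS[leader] / MENTIONS[other]
    return ratio, math.log10(ratio)


if __name__ == "__main__":
    for other in ("FPGA", "TPU"):
        ratio, orders = lead_over("NVIDIA", other)
        print(f"NVIDIA vs {other}: {ratio:.1f}x, {orders:.2f} orders of magnitude")
```

On these figures the gap works out to roughly 29x over FPGAs and 83x over TPUs, so the lead sits between one and two orders of magnitude depending on the comparison.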
On the autonomous vehicles front, Cruise and Waymo are expanding the reach of their driverless cars. Cruise announced that it had completed its first paid rides in Austin and Phoenix. With San Francisco, the company is now live in 3 cities, and is following the same strategy of covering small neighborhoods before gradually expanding. After San Francisco, Waymo is following in Cruise’s footsteps, and has now opened a driverless service between Phoenix’s international airport and downtown Phoenix. We tried Cruise in SF and can vouch for the service!
🏭 Big tech
News about OpenAI’s value, revenues, or cash burn is pretty scarce. But Reuters recently reported that according to three sources, “OpenAI's recent pitch to investors said the organization expects $200 million in revenue next year and $1 billion by 2024.” The same sources mentioned a tender offer at a $29B valuation. Recall that Microsoft is reportedly looking into a second investment in the company. Will we see a joint unveiling of GPT-4 and a funding mega-round early this year?
In the meantime, the insane reception of OpenAI’s ChatGPT has had some ripple effects over at Google and DeepMind. According to the NYT, Google executives issued a “code red” over a possible existential threat caused by increasingly powerful language models built by OpenAI. We should note that with a very partial understanding of the cost of running ChatGPT at Google scale (even OpenAI apparently doesn’t know exactly), it’s still unclear how ChatGPT itself can overthrow Google Search. But this is a welcome reaction from Google, whose Search product could use some shaking-up. Nevertheless, plans to integrate ChatGPT into Bing are underway. In more ChatGPT news, Stack Overflow quickly issued a ban on answers using ChatGPT. The company is concerned that the flood of ChatGPT-generated answers can’t be effectively checked and moderated. Similarly, ICML, a top-3 machine learning conference, banned authors from using ChatGPT and other LLMs in their papers.
On a more fundamental level, the success of InstructGPT, the fine-tuned GPT-3 model behind OpenAI’s ChatGPT, shows how instruction fine-tuning has been critical in the latest advances in language modeling. Instruction finetuning is the process of training self-supervised LMs further on datasets phrased as instructions. For example, you’d use text preceded by preambles like “Please answer the following question”, “Give the rationale before answering”, “Answer by reasoning step-by-step”. Google showed the benefits of this approach back in October 2022 with their Flan-PaLM models, in Scaling Instruction-Finetuned Language Models. In this paper, they demonstrated that an already powerful 540B-parameter PaLM significantly benefits from instruction fine-tuning.
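To make "datasets phrased as instructions" concrete, here is a minimal Python sketch that wraps one ordinary question/answer pair in the three preambles quoted above. The `Example` class and `to_instruction_examples` function are hypothetical names of ours, not taken from the Flan-PaLM or InstructGPT code, and real instruction datasets use far more templates (and rationale-style targets where the preamble asks for one).

```python
from dataclasses import dataclass

# Preambles quoted in the paragraph above. Assumption: real datasets use many more.
PREAMBLES = [
    "Please answer the following question",
    "Give the rationale before answering",
    "Answer by reasoning step-by-step",
]


@dataclass
class Example:
    prompt: str  # text the model reads
    target: str  # text the model is trained to produce


def to_instruction_examples(question: str, answer: str) -> list[Example]:
    """Phrase one (question, answer) pair as several instruction-style examples."""
    return [
        Example(prompt=f"{preamble}.\n{question}", target=answer)
        for preamble in PREAMBLES
    ]


if __name__ == "__main__":
    for ex in to_instruction_examples("What is the boiling point of water in Celsius?", "100"):
        print(repr(ex.prompt), "->", repr(ex.target))
```

The fine-tuning step itself is unchanged: the model is still trained to predict `target` given `prompt`. Only the phrasing of the data differs.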
OpenAI had also proven the superiority of InstructGPT over GPT-3 in January 2022. OpenAI went on to show that adding reinforcement learning from human feedback made their models safer and more helpful to users. This was further confirmed by the HELM benchmark, which was last updated in late December 2022 and looks to be establishing itself as a reference leaderboard for academia and the industry for large model performance comparison. As a further datapoint in support of instruction finetuning, Meta released OPT-IML, a version of their OPT-175B model that is instruction fine-tuned on 2000 NLP tasks from 8 existing benchmarks. While it seems to lag behind FLAN and OpenAI’s InstructGPT models, OPT-IML clearly shows the benefits of instruction finetuning and offers methodological insights into trade-offs between “the diversity and distribution of tuning-tasks, the formatting of their prompts, and the objectives used for fine-tuning.” At a higher level, it’s exciting to see reinforcement learning make a comeback too.
Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor, Tel Aviv University, Meta AI. Instruction fine-tuning has so far been a largely manual task. This paper proposes a way to make the process less intensive by leveraging the fact that LLMs generally output well-articulated and diverse text. To generate new instructions, the authors use 3 examples of instructions, and generate a fourth one using an LLM. This process gives them 64,000 new instructions. They further expand this instruction dataset to 240,000 examples by using an LLM to rephrase the generated instructions. Although not all the instructions actually make sense to human readers, the authors find that fine-tuning some LMs on this new synthetic dataset makes them competitive with models trained on fully manually-curated instruction datasets. In an even more direct manner, researchers from University of Washington, Tehran Polytechnic, Arizona State, Johns Hopkins, and Allen Institute for AI, generate “instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model”. In SELF-INSTRUCT: Aligning Language Model with Self Generated Instructions, they show that this leads to a performance similar to the version of InstructGPT that is closest to their experimental setting (which is not the most powerful one). Astute readers will remember that in Large Language Models Can Self-Improve, Google similarly generated labels for unlabeled datasets and used the synthesized labels as ground truth to further train their language models. These are all further confirmations of how self-supervised training on very large datasets has made LLMs (the ones used to generate instructions/labels) extremely powerful.
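The "three examples in, a fourth one out" recipe described above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the authors' code: `llm` is a hypothetical callable that takes a prompt string and returns generated text, and the prompt layout and de-duplication rule are ours.

```python
import random
from typing import Callable


def build_prompt(demos: list[str]) -> str:
    """Number the demonstrations and leave the next slot open for the model."""
    lines = [f"Example {i}: {demo}" for i, demo in enumerate(demos, start=1)]
    lines.append(f"Example {len(demos) + 1}:")
    return "\n".join(lines)


def generate_instructions(
    seeds: list[str],
    llm: Callable[[str], str],  # hypothetical: prompt in, generated text out
    n: int,
    rng: random.Random,
) -> list[str]:
    """Sample 3 seed instructions at a time and ask the model for a new one."""
    seen = set(seeds)
    generated: list[str] = []
    attempts = 0
    while len(generated) < n and attempts < 10 * n:  # cap attempts to avoid looping forever
        attempts += 1
        candidate = llm(build_prompt(rng.sample(seeds, 3))).strip()
        if candidate and candidate not in seen:  # drop empty and duplicate outputs
            seen.add(candidate)
            generated.append(candidate)
    return generated
```

The rephrasing step that expands the dataset could reuse the same loop with a different prompt, for example one asking the model to reword an existing instruction.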
A Generalist Neural Algorithmic Learner, DeepMind, Oxford, IDSIA, USI Lugano, Purdue, Mila. This is the algorithmic solver version of DeepMind’s Gato: A Generalist Agent. Just as the latter was a general model trained on multiple tasks across multiple modalities (text, images, robotics, etc.), researchers introduce here a single graph neural network that can solve a variety of algorithmic tasks (sorting, searching, graphs, etc.) without sacrificing performance. The proposed model improves average single-task performance by over 20% compared to the previous best on the well-established CLRS benchmark.
Constitutional AI: Harmlessness from AI Feedback, Anthropic. Reinforcement Learning from Human Feedback (RLHF) was one of the main tools used to make InstructGPT models less harmful and more aligned with human intent. One issue with this approach is that it requires a lot of high quality human supervision (typically tens of thousands of human preference labels). To remedy this problem, the researchers propose a method called Constitutional AI. Instead of collecting those labels, the authors only set ten principles stated in natural language, which were chosen ad hoc for research purposes. Their model is trained in two stages: (i) The Supervised Stage: an RLHF model generates a response to prompts that elicit harmful samples, then the model is asked to revise its response according to a (randomly drawn) principle in the constitution. The model is then fine-tuned on the revised responses. The resulting model has a better prediction distribution, which reduces the length of training during the second phase. (ii) The RL Stage: this is the RL from AI Feedback (RLAIF) part of training, where the model evaluates the responses according to the constitutional principles. The LM interpretations of the principles are distilled into a hybrid human/AI preference model (as opposed to RLHF which uses only human labels): the model from the first stage generates a pair of responses to harmful prompts, and then is asked to decide which response is best according to a constitutional principle. The answers to all the prompts constitute an AI-generated dataset of harmlessness, which can be used to train a preference model (PM). The PM is eventually used to fine-tune the supervised model via RL. The authors argue that this results in powerful and harmless models that scale better than “traditional” RLHF.
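The two data-generation steps described above can be sketched roughly as follows. This is a simplified Python illustration under our own assumptions, not Anthropic's implementation: `model` is a hypothetical callable mapping a prompt string to generated text, the prompt wording is invented, and the actual fine-tuning, preference-model training and RL steps are left out.

```python
import random
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def critique_and_revise(model: LLM, prompt: str, principles: list[str], rng: random.Random) -> dict:
    """Supervised stage data: draft a response, then revise it against one random principle."""
    draft = model(prompt)
    principle = rng.choice(principles)
    revision = model(
        f"Prompt: {prompt}\nResponse: {draft}\n"
        f"Rewrite the response so that it follows this principle: {principle}"
    )
    # The revised response becomes a supervised fine-tuning target.
    return {"prompt": prompt, "response": revision}


def ai_preference_label(model: LLM, prompt: str, principles: list[str], rng: random.Random) -> dict:
    """RL stage data: sample two responses and let the model pick the better one."""
    first, second = model(prompt), model(prompt)
    principle = rng.choice(principles)
    verdict = model(
        f"Prompt: {prompt}\n(A) {first}\n(B) {second}\n"
        f"Which response better follows this principle: {principle}? Answer A or B."
    )
    if verdict.strip().upper().startswith("A"):
        chosen, rejected = first, second
    else:
        chosen, rejected = second, first
    # Pairs like this form the AI-labeled dataset used to train the preference model.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```

Running `ai_preference_label` over many harmful prompts would yield the AI-generated harmlessness dataset, which then plays the role that human preference labels play in RLHF.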
Cramming: Training a Language Model on a Single GPU in One Day, University of Maryland. It’s all in the title, but the authors find that they can get to a performance close to the original BERT model while training entirely from scratch. The original BERT was trained using 45-136x more FLOPs. By training for 2 days on 8 GPUs, they reach performance levels close to RoBERTa. But it’s Lucas Beyer who talks about it best in this awesome thread, which details all the architecture, optimization, data, and plenty of other choices that the authors experimented with.
Point·E: A System for Generating 3D Point Clouds from Complex Prompts, OpenAI. Another example of work that tries to do as well as possible with less is this system for 3D object generation from text. Generating a single 3D sample typically takes multiple GPU-hours. Here, a very (conceptually) simple approach is used: a first text-to-image diffusion model is used to generate a single view of the object, then a second diffusion model is used to generate a 3D point cloud from that view. The resulting method isn’t quite as good as state-of-the-art 3D object generation models, but it takes only 1-2 minutes on a single GPU.
PubMed GPT: a Domain-Specific Large Language Model for Biomedical Text, MosaicML and Stanford Center for Research on Foundation Models (CRFM). This work seeks to build an LM that’s trained on abstracts and papers from PubMed (retrieved from the Pile dataset) such that the model can answer questions about biomedical data. They use a 2.7B-parameter GPT model and compare it to a similarly sized model trained on the entirety of the Pile (GPT Neo), which contains diverse tokens from Wikipedia and other sources. What’s interesting is that although the overall size of the PubMed dataset is 17.5% of the overall Pile dataset, the custom PubMed GPT outperforms GPT Neo on Q&A tasks, including questions from the Medical Licensing Exams, due to the selection of domain-specific data. Relatedly, researchers at the University of Florida built LMs for clinical and medical relation extraction as well as Q&A on electronic health records in A large language model for electronic health records. They demonstrate that scaling parameter count up from 345M to 8.9B notably improves performance. But not so fast… work from DeepMind and Google in a paper entitled Large Language Models Encode Clinical Knowledge demonstrates that while even larger models such as PaLM and Flan-PaLM (as mentioned earlier) achieve state of the art on medical Q&A benchmarks, there are still key gaps. The authors introduce instruction prompt tuning to align the LM to this medical domain, but the model (Med-PaLM) still underperforms clinicians.
Funding highlight reel
Paris-founded, New York-based data science platform Dataiku raised a $200M Series F led by Wellington Management. The company is now valued at $3.7B.
Zappi, an AI-based market research platform founded in 2012, raised a $170M round led by Sumeru Equity Partners.
Locus Robotics, a company that builds robots for fulfillment centers, raised a $117M Series F at a $2B valuation led by Goldman Sachs Asset Management and G2 Venture Partners.
Enveda Biosciences, an AI-first biopharma company that probes dark chemical space from plants for new medicines, raised a $68M Series B led by Dimension, with participation from Air Street Capital. We profiled the company’s work on transformers for interpreting small molecule mass spectrometry in this year’s State of AI Report (slide 45).
As covered in the geopolitics section, defense startup Shield AI raised a $60M funding round at a $2.3B valuation from the US Innovative Technology Fund.
Accounting automation platform Vic.ai raised a $52M Series C led by GGV Capital and ICONIQ Growth.
Runway, which creates AI tools for image and video generation and editing, raised a $50M Series C led by Felicis.
Hunt Club, one of many talent recruiting platforms that use AI for candidate-matching, raised a $40M Series B led by WestCap and Sator Grove.
Autonomous robotics company Exyn raised a $35M Series B, with a $25M investment from Reliance, an Indian conglomerate.
Stockholm-based Sana Labs, which uses AI to help companies manage information at work, raised a $34M Series B at a $180M valuation led by Menlo Ventures.
Helm.ai, a company building software for advanced driver assistance systems, autonomous driving and robotics, raised a $31M Series C led by Freeman Group.
EnCharge AI, an AI hardware company based on research conducted with DARPA funding, raised a $22M Series A led by Anzu Partners.
Pactum, a company that automates contract negotiations in the increasingly data-heavy logistics sector, raised a $20M funding round led by 3VC.
Automated contract analysis startup LexCheck raised a $17M Series A led by Mayfield Fund.
London-based Chattermill, a platform that gathers possibly unstructured customer data and uses machine learning models to extract insights from it, raised a $26M Series B led by Beringea.
Protect AI, a company building tools to guard ML models in production against malicious attacks, raised a $13.5M seed round led by Acrew Capital and Boldstart Ventures. The company’s CEO, Ian Swanson, was the VP of AI and ML at Oracle.
Twelve Labs, which develops AI tools to search within videos — as opposed to using video metadata like titles, uploaders and other manually input data — raised a $12M seed extension led by Radical Ventures.
Taiwan-based Profet AI, which helps non-AI native companies integrate AI into their processes, raised a $5.6M Series A led by Darwin Ventures.
Not too much exit activity in December 2022, which isn’t all too surprising. We saw the acquisition of Impira, a low-code data entry automation software company, by Figma. And despite the intense interest in AI for code, one of the early movers in this space, Kite, closed its doors. Why? As tends to be the case in developer tools, “We failed to build a business because our product did not monetize, and it took too long to figure that out…Our 500k developers would not pay to use it…Our diagnosis is that individual developers do not pay for tools. Their manager might…but only for discrete new capabilities”.
Nathan Benaich, Othmane Sebbouh, 8 January 2023
Air Street Capital is a venture capital firm investing in AI-first technology and life science companies. We’re an experienced team of investors and founders based in Europe and the US with a shared passion for working with entrepreneurs from the very beginning of their company-building journey.