Hi all!
Welcome to the final issue of your guide to AI of 2023, rounding off the year with this month’s most important developments in AI policy, research, industry, and startups. Before we get going, some news from us:
Fixing the formation of spinouts from breakthrough university research has been close to our hearts. Two years of essays, op-eds, debunkings, data releases (V1, V2), case studies, and contributions to others’ work led the UK Government to conduct an independent review of spinouts, which it has now published. It adopts many of the reforms we proposed - a significant achievement for Air Street, the entire spinout community, and openness. Change happens when we share experiences openly, back them with real data-driven evidence, and are loud when the status quo just isn’t good enough.
In our latest research and policy work, we published a report exploring why Europe’s model of defense procurement isn’t working, the risk this poses to liberal values, and some potential solutions. More to come here and please reach out if you’re interested in helping.
Hiring @ Air Street: We are looking to grow the Air Street team in 2024. We’re on the hunt for a Community Lead to help us build out and engage our community of founders, researchers, and builders. If you or a friend might be interested, check out the description here and drop us a line on careers@airstreet.com.
As usual, we love hearing what you’re up to and what’s on your mind, just hit reply or forward to your friends :-)
We hope you enjoy the holiday season and look forward to catching you next year!
🌎 The geopolitics of AI
Overshadowing perhaps everything over the past month has been the surreal series of events over at OpenAI. The anti-Altman coup that turned into a circular firing squad is one of the more mystifying developments we’ve witnessed in the corporate (AI) world.
Much like OpenAI staffers, we have no insight into what really took place, but it has become another fault-line in the AI safety wars. Fairly or unfairly, the AI safety world and effective altruists were widely depicted as the villains of the tale on X/Twitter, with anti-Altman board members’ ties to Open Philanthropy, 80,000 Hours, and Effective Ventures coming under the microscope.
Media reporting has also pointed to the emerging rift between the increasingly successful for-profit arm of the business and researchers who felt increasingly alarmed by the capabilities of the systems they were producing.
For us, the moral of the story has far more to do with the laws of corporate gravity than existential risk.
Firstly, OpenAI’s corporate structure was unsustainable (but perhaps now is back to “normal”?), attempting as it did to avoid real-world divergence in objectives via legal engineering. If the founding team of an organization fundamentally disagrees, legalese can only paper over the cracks for so long.
Secondly, despite the significant progress made by Team Safety in the air wars in 2023, often with the support of tech companies, Microsoft’s defeat of the board shows that mindshare isn’t everything. There is a naivety among some in the AI world who believe that companies will genuinely finance efforts to sabotage their own bottom line.
Speaking of powerful factions with divergent interests, chaos has broken out on the other side of the Atlantic, threatening the passage of the EU’s AI Act. After apparently reaching a consensus on a tiered approach to regulating foundation models, the agreement came close to falling apart. France, Germany, and Italy proposed removing foundation model-specific regulation from the Act altogether.
Cynics suggested that this is the product of two of these countries birthing foundation model providers - Mistral in France and Aleph Alpha in Germany. If you want a flavor of the debate, check out the exchange between Cédric O (former French digital minister turned Mistral adviser) and Max Tegmark of MIT. It’s safe to say there’s little love lost between the two.
After several dramatic days of negotiations, agreement was reached late on Friday night - a compromise imposing certain transparency requirements on all models, with tougher rules for ‘high impact’ models. We’re yet to see specifics, but the ‘high impact’ designation will be determined through some mixture of size, capabilities, and performance. Details are still emerging.
State support of a very different kind is allegedly occurring over in China. Bloomberg has published a fascinating read, diving into how increasingly stringent US chip sanctions have driven state and industry closer together.
The piece alleges that the Chinese government is providing unprecedented support to Huawei across the chip supply chain - largely in secret, to avoid further US restrictions. State investment funds are backing companies aiding Huawei. In turn, Huawei is lending out its engineering expertise and IP to smaller tech firms, to boost overall domestic chip capabilities. Huawei has also hired ex-ASML technical staff in their quest for lithography knowledge. Huawei denies receiving special government support…
While this level of integration is striking, it touches on a theme we outlined in this year’s State of AI Report. Hundreds of billions of dollars in subsidies have been incinerated by China’s chip industry. Even the much-heralded Kirin 9000S chip produced by SMIC lags the state of the art produced in Taiwan by about half a decade. There’s little reason to think that this new way of routing the money is going to deliver more.
Similarly, even an old ASML EUV machine contains over 450,000 finely arranged components and can be rendered useless by a single flaw. Replicating this capability will require more than poaching a couple of engineers.
The other hotly-contested China development also has a semiconductor angle. A number of US intelligence agencies have reportedly expressed concerns about G42, an Emirati holding company working on a range of government and enterprise AI projects. There are fears that G42, which has partnerships with OpenAI, Microsoft, and Cerebras, is being used to channel US intellectual property to China. Officials are concerned about G42’s ties to Huawei, along with Chinese companies working in fields like surveillance technology and genetic testing. Over the past year, the Biden Administration has unsuccessfully pushed the UAE leadership to compel G42 to sever these ties. Again, G42 insists it fully complies with all regulations…
🍪 Hardware
Cementing its position at the top of the chip world, NVIDIA has unveiled a new AI chip, the H200. Due for launch in Q2 of next year, the H200 will pack 1.4x more memory bandwidth and 1.8x more memory capacity than the H100. This should improve its capacity to handle memory-intensive generative AI work. Jensen has also dialed up his conviction for AI’s importance in biology with a Genentech deal: the company will use NVIDIA's DGX Cloud and BioNeMo platform to scale its models, while NVIDIA will in turn gain insights into how its technology can be used in drug discovery. We’ve long been bullish on AI-first biology, writing in 2019 that biology was experiencing its “AI moment”.
While NVIDIA is riding high from the intense global demand for its chips, it’s not all plain sailing. Its sanctions-compliant A800 and H800 for the Chinese market were recently hit with US restrictions. The H20, designed to replace them, has been delayed, while US Commerce Secretary Gina Raimondo warned that: “If you redesign a chip around a particular cut line that enables them to do AI, I’m going to control it the very next day.”
Raimondo made these comments days after the US government compelled Saudi investor Prosperity7 (tied to Aramco) to exit its significant investment in neuromorphic chip company Rain AI over national security concerns. Rain AI has previously received investment from none other than Sam Altman, and OpenAI signed a letter of intent to acquire $51 million of its chips.
Anduril has unveiled a new reusable jet-powered, AI-controlled drone called Roadrunner. Designed to identify and intercept drones and other missiles, the system will be largely autonomous, but still require a human operator to make decisions on the use of deadly force. With drones used in conflicts such as the Ukraine war increasingly able to evade jamming, Anduril hopes this will remove the need to use expensive missiles to shoot down cheap UAVs.
The UK Government has introduced a new Automated Vehicles Bill, creating one of the first comprehensive legal frameworks in the world to allow the adoption of self-driving vehicles. It will set safety thresholds, clarify legal liability, and establish a testing regime.
🍔 Big model drops
November and early December saw a run of model releases. In early November, Elon announced Grok, an LLM chatbot built by x.ai that is designed to answer questions with a bit of wit (I’d say a lot!) and a rebellious streak. The system has access to real-time knowledge of the world via X and it will “answer spicy questions that are rejected by most other AI systems”. Built in 2 months, the system used an interesting evaluation (amongst several others): the 2023 Hungarian national high school finals in mathematics, which was published after the training dataset’s cutoff date. The team showed that Grok passed the exam with a C (same as Claude-2) while GPT-4 got a B.
About a month later, Google DeepMind’s long-anticipated Gemini was unveiled to the world. Built for multimodality from the start, it demonstrates impressive audio, video, and image recognition (although its language and code performance is stronger). There was some controversy around the marketing of the model’s performance on MMLU (Massive Multitask Language Understanding) - a benchmark that tests world knowledge and problem-solving over 57 tasks, from math through to law and computer science. Using the standard 5-shot MMLU benchmark, GPT-4 scores 86.4% vs. 83.7% for Gemini Ultra, mostly because of categories like professional law and sociology. Using CoT@32 (chain-of-thought prompting with 32 sampled reasoning paths), Gemini Ultra beats GPT-4. And the authors found that if you give the model some time to think, it does beat out human experts. This also points to the need for improved benchmarks - for example, ones measuring success at multiple consecutive skills. A version of Gemini is being integrated into Bard, but the most powerful version will be released in stages over the course of next year. You can check out the launch page here and the team’s technical report here. Buried in this announcement is a short (and detail-light) technical report on AlphaCode 2, powered by Gemini, which performed better than 85% of the human competitive programming participants that originally competed against AlphaCode.
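For the curious, the core of a sample-and-vote evaluation like CoT@32 is simple to sketch. Below is a minimal, illustrative version (Gemini’s actual procedure adds uncertainty routing on top of this, and `ask_model` here is a hypothetical stand-in for an LLM call):

```python
from collections import Counter
from itertools import cycle

def cot_at_k(ask_model, question, k=32):
    """Sample k chain-of-thought completions for a question and
    majority-vote on their final answers (simplified CoT@k)."""
    answers = [ask_model(question) for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in "model": a canned cycle of final answers, 3:1 in favor of "B".
canned = cycle(["B", "B", "A", "B"])
toy_model = lambda q: next(canned)
print(cot_at_k(toy_model, "Which option?", k=8))  # "B" wins the vote 6-2
```

The intuition is that individually noisy reasoning chains tend to converge on the correct answer in aggregate, which is why k=32 sampling can beat a single 5-shot pass.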
Days later, Mistral took the polar opposite approach to marketing its latest release, simply dropping a torrent link with no commentary. The open source release is a mixture-of-experts model built from 8 experts of 7B parameters each, with two experts selected to generate each token. Speaking of France and AI, a new non-profit research organization has been formed in Paris by the name of Kyutai. It is funded with at least $300M from Xavier Niel, Eric Schmidt, and Rodolphe Saadé, and will have access to compute via Niel’s Scaleway. It really does feel like owning your own non-profit AI research lab is now socially cooler than owning a sports team or sitting on the board of an iconic art museum.
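The top-2 expert routing behind Mistral’s release can be sketched in a few lines. This is an illustrative toy (linear gate, experts as plain callables), not Mistral’s actual implementation:

```python
import math

def moe_layer(x, gate_w, experts, top_k=2):
    """Top-k mixture-of-experts routing for a single token vector x.

    gate_w:  one gating weight vector per expert
    experts: one callable per expert, mapping a vector to a vector
    """
    # Gating logits: dot product of the token with each expert's gate weights.
    logits = [sum(xi * wi for xi, wi in zip(x, w)) for w in gate_w]
    # Pick the indices of the top_k highest-scoring experts.
    top = sorted(range(len(logits)), key=lambda e: logits[e])[-top_k:]
    # Softmax over the selected experts only, as in top-k routing.
    exps = [math.exp(logits[e]) for e in top]
    probs = [v / sum(exps) for v in exps]
    # Weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for p, e in zip(probs, top):
        y = experts[e](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out

# Toy demo: 4 "experts" that just scale the input; gates favor experts 2 and 3.
experts = [lambda v, s=s: [s * a for a in v] for s in (1, 2, 3, 4)]
gate_w = [[0.0], [0.0], [4.0], [4.0]]   # 1-dim token, 4 experts
print(moe_layer([1.0], gate_w, experts, top_k=2))  # [3.5]
```

The appeal is that only 2 of the 8 experts run per token, so inference cost scales with ~2x7B active parameters rather than the full parameter count.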
Before the Gemini excitement, Inflection unveiled Inflection-2, which they have described as the “best model in the world for its compute class and the second most capable LLM in the world today”. Trained on 5,000 H100 GPUs, it outperformed Llama-2, PaLM-2, Grok-1, and Claude-2 across a range of reasoning, question-answering, math, and coding benchmarks - losing out consistently to only GPT-4. The model will soon be used to power Pi, Inflection’s chatbot.
It’s been a big month for Chinese LLMs. DeepSeek, founded this year, released the open source DeepSeek 67B. Its base version beats the Llama2 70B base across various reasoning, coding, and Chinese comprehension benchmarks. They also evaluated on new benchmarks, including the Hungarian National High School Exam.
While not present in the open model itself, users have been quick to point to political censorship in DeepSeek’s online chatbot.
Kai-Fu Lee’s start-up 01.AI has released an open source model of its own - Yi-34B. The bilingual base model is significantly smaller than many open source models, including Falcon-180B and Llama2-70B, but currently tops Hugging Face’s pre-trained LLM rankings. Following revelations that the model was based on Llama’s architecture, Lee announced that 01.AI would rename its tensors to reflect this.
Stability AI released Stable Video Diffusion, an open-source text-to-video model, with samples demonstrating state-of-the-art quality. The team has outlined their three-stage approach to training: i) image pretraining on a large text-to-image dataset, ii) video pretraining on a large, curated low-resolution video dataset, and iii) fine-tuning on a smaller, high-resolution video dataset.
Staying on Stability, beyond the world of model releases, the ongoing legal disputes over alleged copyright infringement have started to crystallize over the past month.
A group of visual artists have filed an amended lawsuit in the US against Stability, Midjourney, and DeviantArt over the alleged use of their work in training data. This was after their other arguments - including unfair competition, violation of the right of publicity, and that Stable Diffusion produces “substantially similar” output to their original work - were all rejected by the judge. Stability was less successful in its attempt to dismiss a copyright claim from Getty Images, with a British judge ruling that it could go to trial.
We’ve seen a similar outcome in a Sarah Silverman-led copyright claim brought by authors against Meta. Again, the judge allowed their core claim of copyright infringement to proceed while dismissing their wider arguments.
🔬Research
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives, Meta, Project Aria and 15 university partners. This work created the Ego-Exo4D dataset, a dual-perspective video learning resource designed to enrich AI multimodal perception research. The dataset captures both first-person “egocentric” and third-person “exocentric” perspectives, marking a significant milestone in the study of advanced human skills through video analysis. The dataset is the largest synchronized first- and third-person video collection ever released to the public, comprising over 1,400 hours of footage from a host of skilled contributors across multiple countries. The dataset is rich with multimodal elements, incorporating seven-channel audio, inertial measurement units, gaze tracking, head poses, and 3D environmental reconstructions to foster a holistic AI understanding.
Scaling deep learning for materials discovery, Google DeepMind. Alongside drug discovery, materials discovery is one of the most exciting domains of AI for science. Similar to the problem of drug discovery, materials discovery is rate-limited by our ability to explore the vast search space of potential formulations, identify which have the characteristics we want, and figure out how to make them. To accelerate materials discovery, this paper introduces graph networks for materials exploration (GNoME). The system is built from methods to generate diverse candidate material structures and graph neural networks that operate on these structures or compositions to model and predict material properties. Interestingly, the work uses density functional theory (DFT) - first-principles equations that approximate the physical energies that govern material behavior - both to verify the predictions of GNoME and to act as training data in an active learning loop. Using iterative rounds of material generation, property prediction, and DFT, GNoME discovered >2.2M stable material structures, many of which escaped previous human chemical intuition - the equivalent of roughly 800 years’ worth of discovery. Of these, 380,000 are the most stable and should be amenable to synthesis in the lab. External research labs validated 736 of these structures. Indeed, a group at Berkeley made use of GNoME to create an autonomous materials discovery and testing lab focused on solid-state synthesis of inorganic powders. During 17 days of experimentation, this automated lab generated 41 novel compounds. Furthermore, failed experiments could be used to provide actionable labels to improve future syntheses. Overall, this work from Google DeepMind is tremendously exciting, not least because one of its key authors is a high-school friend of mine. Huge congrats, Sam!
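The active learning loop at the heart of GNoME - generate candidates, screen with the cheap GNN, verify the promising ones with expensive DFT, and fold the results back into training - can be sketched as follows. All the callables here are hypothetical stand-ins, not GNoME’s actual components:

```python
def active_learning_loop(generate_candidates, gnn_predict, dft_verify,
                         retrain, rounds=3, threshold=0.0):
    """Simplified GNoME-style loop: a fast surrogate model screens
    candidates, DFT verifies the promising ones, and verified results
    become new training data for the next round."""
    stable = []
    training_data = []
    for _ in range(rounds):
        candidates = generate_candidates()
        # Keep candidates the surrogate predicts to be stable
        # (in the real system: predicted energy above the convex hull).
        promising = [c for c in candidates if gnn_predict(c) <= threshold]
        for c in promising:
            energy = dft_verify(c)            # expensive first-principles check
            training_data.append((c, energy))  # every DFT result is a new label
            if energy <= threshold:
                stable.append(c)
        retrain(training_data)                # improve the surrogate each round
    return stable

# Toy demo: integers as "materials", stability = shifted value <= 0.
found = active_learning_loop(
    generate_candidates=lambda: [1, 7, 3],
    gnn_predict=lambda c: c - 5,
    dft_verify=lambda c: c - 5,
    retrain=lambda data: None,
    rounds=1,
)
print(found)  # [1, 3]
```

The key economy is that the GNN filters the search space so DFT, the slow oracle, is only spent on candidates worth verifying.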
Along similar lines, a team at Meta AI, Georgia Tech, and Oak Ridge National Laboratory created a dataset of metal-organic framework (MOF) sorbents for direct air capture (DAC) of carbon dioxide to accelerate the discovery of new MOF sorbents. This dataset, Open DAC 2023, contains >38M DFT calculations on >8,400 MOF materials containing adsorbed CO2 and/or H2O. The team makes use of this dataset to train models that can approximate DFT-level calculations, which are typically slow and computationally expensive.
Learning skillful medium-range global weather forecasting, Google DeepMind. In another big paper for the company, DeepMind introduced (and open sourced) GraphCast, a medium-range weather predictor, in the journal Science. In this year’s State of AI Report (slide 57), we reported that learned methods and physics-informed models incorporating relevant priors can deliver performance improvements preferred by professional meteorologists. New benchmark datasets such as Google’s WeatherBench 2 help data-driven weather model development. GraphCast predicts weather up to 10 days into the future. It is trained on 40 years of weather reanalysis data and predicts five variables pertaining to Earth's surface, such as temperature, wind speed and direction, and mean sea-level pressure, along with six atmospheric variables (including specific humidity, wind speed and direction, and temperature) at 37 different altitude levels. At inference time, the model takes as input the state of the weather 6 hours ago and the current state of the weather, then predicts the weather over the next 10 days in less than 1 minute on a single TPUv4 machine (traditional approaches take hours on a supercomputer). GraphCast was shown to be more accurate than the gold-standard deterministic system on >90% of 1,380 test variables and forecast lead times.
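That inference pattern - two past states in, one 6-hour step out, fed back in autoregressively for 10 days - is worth making concrete. A minimal sketch, with `model` a hypothetical stand-in for the learned one-step predictor:

```python
def rollout(model, state_prev, state_now, hours=240, step=6):
    """Roll a learned one-step weather predictor forward autoregressively.

    The model maps (state at t-6h, state at t) -> state at t+6h;
    each prediction is fed back in as input for the next step.
    240 hours / 6-hour steps = a 40-step, 10-day forecast.
    """
    trajectory = []
    for _ in range(hours // step):
        state_next = model(state_prev, state_now)
        trajectory.append(state_next)
        # Slide the two-state window forward by one step.
        state_prev, state_now = state_now, state_next
    return trajectory

# Toy demo: a "model" that just increments the current state.
print(rollout(lambda prev, now: now + 1, 0, 1, hours=24))  # [2, 3, 4, 5]
```

Because every step reuses the same cheap forward pass, the whole 40-step rollout stays fast - which is why a 10-day forecast fits in under a minute on one TPU.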
Prospective implementation of AI-assisted screen reading to improve early detection of breast cancer, Kheiron Medical Technologies, UCSD, Imperial College London. AI-based medical imaging solutions have attracted much anticipation and critique, but large-scale clinical studies remain limited. This work demonstrates a multi-center, live rollout of Kheiron’s AI system for breast cancer detection. The system is based on an ensemble of deep convolutional neural networks trained for malignancy detection to generate a binary recommendation of “recall” or “no recall” according to a desired sensitivity/specificity trade-off. The system was trained on a diverse collection of >1M images from real-world screening programs across different countries, multiple sites, and equipment from different vendors over a period of >10 years. This paper reports results of the AI system being implemented as an additional reader to the standard double reading. Compared to double reading alone, the AI system can improve early detection of breast cancer (i.e. invasive, small-sized lesions) with minimal to no unnecessary recalls.
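The core of that binary recommendation - an ensemble of malignancy scores reduced to “recall” or “no recall” at a chosen operating point - is conceptually simple. A sketch, where the averaging and threshold are illustrative assumptions rather than Kheiron’s published method:

```python
def recall_decision(scores, threshold=0.5):
    """Reduce an ensemble of per-model malignancy scores to a binary
    screening recommendation. In practice the threshold is tuned on
    validation data to hit a target sensitivity/specificity trade-off."""
    mean_score = sum(scores) / len(scores)
    return "recall" if mean_score >= threshold else "no recall"

print(recall_decision([0.9, 0.8, 0.7]))  # recall
print(recall_decision([0.1, 0.2]))       # no recall
```

Moving the threshold is the whole game clinically: lower it and sensitivity rises at the cost of more unnecessary recalls; raise it and the reverse.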
Also on our radar:
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation, Cohere. This paper evaluates several metric-based methods to minimize the amount of human-in-the-loop feedback needed to improve model performance by prioritizing data instances that most effectively distinguish between models.
RoboGen: Towards unleashing infinite data for automated robot learning via generative simulation, CMU, Tsinghua, MIT, UMass Amherst. Much has been said about dwindling data availability for training large-scale AI models. But robotics benefits from simulation technology, which can create data to train behaviors. This work uses generative models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision.
ChatGPT appears to disproportionately improve the performance of lower-skilled professionals at BCG.
Introducing Emu Video and Emu Edit, our latest generative AI research milestones, Meta AI. Tools for video generation and editing are a new frontier that we’re watching.
💰Startups
🚀 Funding highlight reel
AI21 Labs, the full-stack enterprise genAI developer, added $53M to its previously-announced $155 million Series C, with additional funds from Intel Capital and Comcast Ventures.
Aleph Alpha, the privacy-first LLM provider, reportedly raised a $500M Series B, led by Schwarz Group, the owners of Lidl. However, some reports suggest that the round was closer to $100M in equity, with the remainder in various forms of grants and credits.
Atomic Industries, the computational manufacturing company, raised a $17M seed round, led by Narya.
AutogenAI, the company building genAI for procurement bids, raised $39.5M in a round led by Salesforce.
CoreWeave, the cloud computing provider, closed a $642M secondary share sale to investors including Fidelity, Jane Street, JP Morgan Asset Management, Nat Friedman, Daniel Gross, and others. It values the company at $7 billion. Wowza.
Cradle, the AI protein design start-up, raised a $24M Series A, led by existing investor Index Ventures.
Essential AI, the stealth start-up founded by two of the “Attention Is All You Need” authors, raised $36.5M of a potential $51.5M total funding round.
EnCharge AI, the AI-accelerating chip manufacturer, raised a $22.6M funding round, led by The VentureTech Alliance, TSMC’s venture partner.
Kognitos, the business automation start-up, raised a $20M seed round, led by Khosla Ventures.
Layer Health, the MIT spinout working on clinical notes, raised a $4M seed round, led by Google Ventures.
Manifold, the biomedical data and precision health platform, raised a $15M Series A led by TQ Ventures.
OfferFit, the AI-driven testing automation start-up, raised a $25M Series B, led by Menlo Ventures.
Pika, a start-up building an AI-generation platform, raised a $36M Series A, led by Lightspeed Venture Partners.
Phare Health, a start-up building fintech tools for hospitals, raised a £2.5M seed round, led by General Catalyst.
PhysicsX, the AI-powered engineering simulation start-up, raised a $32M Series A, led by General Catalyst.
Replit, the developer platform, sold $20M in an employee tender to Craft Ventures.
Secondmind, the AI for automotive design start-up, raised a $16M venture round, led by Mazda.
Stability AI, the creators of Stable Diffusion, raised a $50M convertible note, led by Intel.
Swarmbotics AI, a start-up building ground swarm robots for industry and defense, raised a pre-seed led by Quiet Capital.
Together, a start-up building open source AI infrastructure, raised a $102.5M Series A, led by Kleiner Perkins.
Tabnine, the code-generation company, raised $24M led by Telstra Ventures.
🤝 Exits
Gameplanner.AI, a stealth AI company, has been acquired by Airbnb, in a deal thought to be valued at a little under $200 million.
Rephrase.ai, the text to video generation start-up, has been acquired by Adobe for an undisclosed sum.
---
Signing off,
Nathan Benaich and Alex Chalmers, 10 December 2023
Air Street Capital | Twitter | LinkedIn | State of AI Report | RAAIS | London.AI
Air Street Capital invests in AI-first technology and life science entrepreneurs from the very beginning of your company-building journey.