👋 Good morning from SF!
Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI policy, research, industry, and startups during October 2023. Before we kick off, a few news items from us :-)
Last month, we released the 6th edition of the State of AI Report with an in-person event in SF featuring leadership from Recursion, Meta AI and Adept. It was a fun evening in celebration of the AI community’s output over the last year and a sneak peek at what’s to come. This Thursday we also ran the latest installment of our Paris.AI meetup with Dust, Meta AI, PhysicsX and Osium.
We announced our latest investment at Air Street, a defense tech company called Lambda Automata, and shared our view on why it is important to invest in defense and security in a world suffering from unacceptable levels of terrorism and war.
Come join a 1-day event in Lisbon on building AI-first products, from science to software, at the Champalimaud Foundation with our friends at Point Nine.
As usual, we love hearing what you’re up to and what’s on your mind, just hit reply or forward to your friends :-)
🌎 The geopolitics of AI
The past week has probably been one of the most anticipated in AI policy to date. In the UK, the AI Safety Summit took place at Bletchley Park, with representatives from 27 countries and a long list of institutions attending.
The Summit resulted in a statement signed by a range of governments, laying out a shared agenda for addressing frontier AI risk and committing to future meetings in France and South Korea. The event defied expectations, with much of the UK’s political media quietly rubbing their hands together with glee at the prospect of an international embarrassment for the government. Instead, the Summit proved to be a powerful sign of the UK’s convening power.
Nevertheless, the UK Government’s wider approach to AI safety has not been without its critics. For example, a recent joint report from Onward, the Tony Blair Institute, and Startup Coalition aired frustrations from early-stage founders that the safety focus was distracting from efforts to harness technology.
The government, however, shows no sign of backing down. It announced plans for a UK AI Safety Institute, responsible for evaluation and exploring a range of risks. Ahead of the Summit, it published a discussion paper, summary, and accompanying Annex on the risks and opportunities of AI. The Annex is probably the best reading - it contains the government’s analysis of a range of hypothetical future AI scenarios (ranging from catastrophe to disappointment), along with a good readout on the state of the frontier.
The UK Government isn’t the only group currently contributing to the sprawling safety literature: a consensus paper entitled Managing AI Risks in the Era of Rapid Progress, authored by Geoff Hinton, Yoshua Bengio and other luminaries, outlines catastrophic risk scenarios ranging from malicious use through to loss of human control.
In the US, President Biden has signed a new Executive Order (official summary here) designed to ensure AI systems are developed in a safe, secure and trustworthy manner. This involves requiring developers of AI systems with certain characteristics to share safety test results with the government, new measures to label AI-generated content, and instructions to various government agencies to expand their work to cover the AI-related issues and threats under their purview. The US has also announced an AI Safety Institute of its own.
Signaling the growing prominence of biorisk in the AI risk conversation, the EO also referenced new standards for biological material synthesis. The fear that open source AI could increase bioterrorism is deeply controversial amongst experts and has started to prompt a mini-backlash of its own. For a solid skeptical take, the anonymous 1a3orn has written an essay casting doubt on the state of the evidence presented by organizations he believes are crying wolf.
It’s been a busy few weeks in the US, as the government has fired the next salvo in the chip wars. In this year’s State of AI Report, we highlighted how NVIDIA and others have created chips that comply with the US sanctions regime. The A800 and H800 chips were designed to fall just below the performance threshold specified by the US ban.
Within days of the report landing, the US government announced its intention to tighten restrictions further within 30 days. Weeks ahead of this deadline, in a surprise move, it instructed NVIDIA to halt shipments of the A800 and H800 immediately. It’s unclear what prompted this acceleration, as there were clear signs that the current regime was effective. China’s biggest memory chip maker has already had to raise billions in fresh capital to stay afloat.
Nevertheless, these measures highlight the perils of using “performance thresholds” of any kind as a regulatory tool, whether for chips or LLM training (as the AI Executive Order does). The market has an uncanny ability to adapt around these challenges, and it places regulators in permanent catch-up mode.
Meanwhile, over in everyone’s favorite regulatory superpower, EU lawmakers are continuing to thrash out major issues in the AI Act through the grindingly slow trilogue process. We won’t attempt to capture every nuance of the arguments here, but there are certain key issues under discussion.
It looks likely that AI developers operating in “high-risk” areas might be exempted from some of the most stringent requirements when their system is “not materially influencing the outcome of decision making” (e.g. narrow procedural tasks or reviewing previous human activity). Similarly, foundation model regulation will likely now be tiered based on risk. How risk will be determined - whether by model size, compute power, number of users or something else - remains up for discussion.
Most controversially, EU member states are seeking exemptions from restrictions around real-time facial recognition and other high-risk applications for national security purposes. The Parliament supports maintaining the proposed ban in its current form.
It is now unclear whether the Act will pass this year. The process demonstrates the challenge of trying to build a single, future-proofed legal framework for a single technology on the assumption that the current paradigm driving progress (compute and large models are all you need) will persist. It will also boost those in the UK who see merit in a more flexible, context-dependent approach.
To complete our regulatory tour, Matt Sheehan of the Carnegie Endowment has shared a fascinating thread breaking down new guidelines exploring how companies should comply with China’s generative AI regulations. In line with the spirit of the regulations, these are focused around controlling illegal and “unhealthy” information.
These measures include random inspections of data points inside a model’s training corpus and test questions on sensitive issues covering religion and politics. As Matt points out, companies might instinctively want to mass block questions on these topics, but to do so would make the censorship too obvious. As a result, refusing to answer ‘safe’ questions too often would also constitute failure in the eyes of the drafters.
If these kinds of tests moved from guidelines to law, they stand to make life very difficult for foundation model developers. China has often managed to defy the conventional wisdom that political control must come at the expense of innovation - these kinds of rules would appear to show gravity reasserting itself.
🍪 Hardware (kinda)
Autonomous driving looked like an incredible success story after Cruise and Waymo were granted legal approval to operate paid services with no safety driver in San Francisco. But Cruise’s permit has since been suspended and the company has paused all operations. This is the result of a series of incidents, most notably one in which a woman was hit by another (non-autonomous) vehicle, thrown into the path of a Cruise vehicle, and trapped underneath it. Waymo, on the other hand, continues operations and is partnering with Uber to offer autonomous rides in Phoenix.
Over in the defense world, Anduril has deepened its footprint in the UK, signing a £17 million contract (with the potential to rise to £24 million) with the Ministry of Defence and Strategic Command. This builds on an earlier contract, which provided Anduril’s autonomous surveillance towers for base security.
NVIDIA’s contenders are still struggling to show any sign of competition. Graphcore, once the UK’s AI champion and valued at around $2.5B in 2020, is now rumored to be seeking an acquirer after failing to secure a deal with the UK government (despite its piqued interest in AI and compute). Graphcore’s filed accounts show that it urgently needs to raise funding. It also seems to have been left out of the £100 million that the government has committed to AI initiatives.
MosaicML, recently acquired by Databricks, continues to champion AMD GPUs. They published a blog post claiming that a cluster of 128 MI250 GPUs interconnected by a high-speed network performs well for distributed LLM training, achieving performance comparable to NVIDIA A100-40GB GPUs.
Voltage Park, an AI infrastructure provider and nonprofit subsidiary, acquired 24k H100 GPUs for $500 million, funded by Jed McCaleb of Ripple.
🏭 Big tech startups
Mistral is rumored to be seeking an additional $300M of funding, despite having raised a $113M seed round less than 6 months ago. The Paris-based GenAI company recently released its first model, Mistral 7B, by tweeting a torrent link to the model’s weights. The model was generally well received given its Apache 2.0 license and performance: at the time of release, it outperformed all 7B models as well as Llama 2 13B on MT-Bench, without using any proprietary data. HuggingFace further finetuned it on the UltraChat dataset and then trained it with direct preference optimization (DPO) on the UltraFeedback dataset. The resulting model, Zephyr 7B Alpha, outperforms Llama 2 70B on MT-Bench.
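For readers curious what the DPO step actually involves: below is a minimal sketch of the DPO objective in PyTorch, assuming you have already computed summed log-probabilities of the chosen and rejected responses under both the policy and a frozen reference model (the function and variable names are ours for illustration, not HuggingFace’s).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss (Rafailov et al., 2023) from per-example summed log-probs.

    Each tensor holds log-probabilities of the chosen/rejected completions
    under the trained policy or the frozen reference model; beta controls
    how far the policy may drift from the reference.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # answer over the rejected one, relative to the reference model.
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the margin: no reward model, no RL loop.
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()

# Toy usage with made-up log-probs for a batch of 3 preference pairs.
loss = dpo_loss(torch.tensor([-12.3, -8.1, -20.4]),
                torch.tensor([-14.0, -9.5, -19.8]),
                torch.tensor([-13.1, -8.0, -20.0]),
                torch.tensor([-13.5, -9.0, -20.1]))
print(loss)  # a scalar; in practice you backpropagate through the policy
```

The appeal over classic RLHF is visible in the code: preference data is consumed directly, with no separate reward model and no PPO loop to stabilize.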
Adept open-sourced Fuyu-8B, a small multimodal model with a simpler architecture (a vanilla decoder-only transformer) and fast training and inference, designed specifically for digital agents. Meanwhile, Google Research announced PaLI-3, a 5B-parameter VLM competitive with the much larger PaLI-X (55B parameters).
Together released RedPajama-Data-v2, a 30 trillion token dataset covering 5 languages, 25x larger than their first dataset. They claim it provides the most complete coverage of CommonCrawl and comes with 40+ quality annotations (the outputs of different ML classifiers and heuristics applied to the data).
TwelveLabs, a San Francisco-based video understanding company that recently raised an additional $10M in funding from NVentures, Intel, Samsung Next and others, released Pegasus-1, an 80B-parameter video-language model, along with their Video-to-Text API.
Elon Musk-backed start-up xAI has released its first AI model, Grok. Unlike other models that have historically relied on archival material, Grok-1 has real-time access to data from X. On a range of benchmarks, xAI claims that Grok-1 surpassed LLaMA 2, GPT-3.5, and Inflection-1, losing out only to models trained on significantly more data.
🔬Research
Since publishing the State of AI Report, there has been significant progress in generating more scalable robot learning data. The Open X-Embodiment dataset unifies robot data from a large number of organizations. Very recent systems such as GAIA-1 from Wayve, UniSim from UC Berkeley, MimicGen from NVIDIA and RoboGen from CMU leverage generative models to produce more diverse environments, tasks and interactions that robots can learn from, without the robots ever interacting with that data directly (or that data even existing in the real world).
Open X-Embodiment dataset and RT-X models, a collaboration between Google DeepMind, Google Research and 24 other (academic) research groups. They release a first-of-its-kind open-source repository of large-scale robotics data together with pre-trained model checkpoints. The Open X-Embodiment dataset unifies and standardizes 60 existing robot datasets from 34 robotics labs, covering 22 different embodiments from single manipulators to dual-arm robots and quadrupeds. In the State of AI Report, we covered RT-2, a robotics model that finetunes a VLM (e.g. PaLM-E or PaLI-X) on low-level robot actions such as positional and rotational changes of the end-effector. Actions are represented as text tokens and trained jointly with vision-language data via next-token prediction. RT-2 was already incredibly impressive, able to perform semantic reasoning (improvising tool use) and detect objects only seen in web data. The new RT-2-X model has exactly the same architecture as RT-2 but is trained on the Open X-Embodiment dataset. Unsurprisingly, it outperforms the original RT-2 in terms of emergent skills. When tasks involve objects and skills not included in the RT-2 dataset, RT-2-X demonstrates transfer from other robot platforms, without any additional effort to reduce the domain gap between platforms or break a task into smaller subtasks. It shows that a single powerful model trained on diverse data can outperform specialist ones at scale, and constitutes a big first step towards a generalist embodied robot.
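To make the “actions as text tokens” idea concrete, here is a small illustrative sketch (our own, not code from the paper; the action bounds below are hypothetical) of how a continuous end-effector action can be mapped to a string of integer tokens that a VLM can emit via next-token prediction:

```python
import numpy as np

# Hypothetical bounds for an action vector: 3 positional deltas (m),
# 3 rotational deltas (rad) and a gripper command, RT-2-style.
ACTION_LOW = np.array([-0.05, -0.05, -0.05, -0.25, -0.25, -0.25, 0.0])
ACTION_HIGH = np.array([0.05, 0.05, 0.05, 0.25, 0.25, 0.25, 1.0])
NUM_BINS = 256  # RT-2 discretizes each action dimension into 256 bins

def action_to_tokens(action: np.ndarray) -> str:
    """Discretize a continuous action into a space-separated token string."""
    normalized = (action - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW)
    bins = np.clip(np.round(normalized * (NUM_BINS - 1)), 0, NUM_BINS - 1)
    # The VLM is finetuned to emit these integers as ordinary text, so
    # robot control reduces to next-token prediction.
    return " ".join(str(int(b)) for b in bins)

def tokens_to_action(tokens: str) -> np.ndarray:
    """Invert the discretization when executing on the robot."""
    bins = np.array([int(t) for t in tokens.split()])
    return ACTION_LOW + bins / (NUM_BINS - 1) * (ACTION_HIGH - ACTION_LOW)

print(action_to_tokens(np.array([0.01, -0.02, 0.0, 0.1, 0.0, -0.1, 1.0])))
```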
UniSim: Learning Interactive Real-World Simulators, UC Berkeley, Google DeepMind, MIT. UniSim is a learned simulator for robotics: a large vision model that predicts what will happen in a scene based on the actions taken. These can be high-level instructions such as "open the drawer" as well as low-level ones such as "move by x, y". It is learned from rich datasets of human activity videos, panorama scans (constructing actions by changing the camera viewpoint) and text-image data (for instance LAION, where text labels often reveal motion information). UniSim is a diffusion model trained directly in pixel space. It can 1) generate large amounts of synthetic training data and 2) enable the training of RL agents. The authors additionally trained a vision-language-action (VLA) model (similar to RT-2) and performed rollouts in UniSim using the actions predicted by the VLA policy. Then, using a proxy reward (learned from the number of steps needed to complete the task), they optimized the VLA policy with reinforcement learning, showing improved performance across 48 tasks compared to a behavior cloning baseline.
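That last step is worth sketching, since the policy never touches a real robot during improvement. Everything below is our own stand-in pseudo-implementation: LearnedSimulator, VLAPolicy and proxy_reward are hypothetical placeholders for the paper’s components, not public APIs.

```python
import random

class LearnedSimulator:
    """Stand-in for UniSim: a video diffusion model that predicts the next
    observation conditioned on past frames and the chosen action."""
    def predict_next_frame(self, frames: list, action: str) -> str:
        return f"frame_after({action})"  # stubbed; really a denoised image

class VLAPolicy:
    """Stand-in for the RT-2-style vision-language-action policy."""
    def act(self, frame: str, instruction: str) -> str:
        return random.choice(["move_left", "move_right", "grasp"])
    def reinforce(self, frames, actions, reward: float) -> None:
        pass  # e.g. a policy-gradient update weighted by the reward

def proxy_reward(frames: list) -> float:
    # Learned from the number of steps a task typically takes; fewer
    # steps to completion means higher reward. Stubbed here.
    return random.random()

def rollout(sim, policy, instruction: str, horizon: int = 10):
    """Roll the policy out entirely inside the learned simulator."""
    frames, actions = ["initial_frame"], []
    for _ in range(horizon):
        action = policy.act(frames[-1], instruction)
        frames.append(sim.predict_next_frame(frames, action))
        actions.append(action)
    return frames, actions

sim, policy = LearnedSimulator(), VLAPolicy()
for _ in range(100):  # policy improvement without a single real-world step
    frames, actions = rollout(sim, policy, "open the drawer")
    policy.reinforce(frames, actions, proxy_reward(frames))
```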
Eureka: Human-Level Reward Design via Coding Large Language Models, NVIDIA. NVIDIA continues its progress on agents, this time learning to perform dexterous tasks by designing reward functions. General-purpose dexterity is the holy grail of robotics. LLMs have been making significant progress in high-level task planning, reasoning over longer horizons and writing code for easier robot interaction, but very few, if any, address the dexterity part of the problem. Eureka leverages an LLM (GPT-4) to write the reward code typically hand-crafted by RL practitioners. The rewards Eureka produces outperform human-engineered ones in 83% of cases across 29 open-source RL environments spanning 10 different robot embodiments (including a quadruped, a quadcopter, a biped and several dexterous hands). Eureka takes as input the source code of the environment, a task description and an initial prompt, and outputs a reward function. Helpfully, the authors don’t prompt the LLM with task-specific formatting or reward-design hints, which help in the short term but hinder the generality of the system. Instead, they query the LLM several times for multiple candidate reward functions and iteratively perform reward mutation on the “best”-performing ones. The “best” is determined by an automated reward reflection procedure: each reward component exposes a scalar value indicating how much it contributes to the final reward. Eureka’s rewards outperform human ones, improve over time and do not necessarily correlate with human rewards (i.e. they are novel). We’d love to see the pen-spinning task performed by a real robot arm.
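The outer loop is simple enough to sketch. The helpers below (query_llm, train_and_reflect) are hypothetical stand-ins we made up for illustration; Eureka itself calls GPT-4 and trains policies in simulation:

```python
import random

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 call that returns reward code."""
    return "def reward(obs): return -abs(obs['target_pos'] - obs['pos'])"

def train_and_reflect(reward_code: str) -> tuple:
    """Stand-in for training an RL policy with a candidate reward.

    Returns a fitness score plus a 'reward reflection': scalar traces of
    how much each reward component contributed during training.
    """
    return random.random(), "distance_term: 0.7, velocity_term: 0.1"

def eureka(env_source: str, task_desc: str,
           generations: int = 5, samples: int = 16) -> str:
    prompt = (f"Environment source:\n{env_source}\n"
              f"Task: {task_desc}\nWrite a reward function.")
    best_code, best_score = None, float("-inf")
    for _ in range(generations):
        # Sample several candidate reward functions per generation...
        candidates = [query_llm(prompt) for _ in range(samples)]
        scored = [(train_and_reflect(code), code) for code in candidates]
        (score, reflection), code = max(scored, key=lambda s: s[0][0])
        if score > best_score:
            best_score, best_code = score, code
        # ...then mutate the best one, feeding the reward reflection back
        # so the LLM can see which components helped or hurt.
        prompt += (f"\nPrevious best reward:\n{code}\n"
                   f"Reflection: {reflection}\nImprove it.")
    return best_code

print(eureka("class PenSpinEnv: ...", "spin the pen 180 degrees"))
```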
Also on our radar:
SmartPlay: A Benchmark for LLMs as Intelligent Agents, Carnegie Mellon University and Microsoft Research. Introduces a benchmark composed of 6 games (including Minecraft) and a methodology to test 9 capabilities of LLMs as agents (such as spatial reasoning or planning ahead). GPT-4 variants outperform other LLMs by significant margins, but still greatly underperform human baselines.
Bridging the Human-AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero, Google DeepMind. They show that it is possible to improve human performance at chess by extracting and transferring concepts from AlphaZero.
AgentTuning: Enabling Generalized Agent Abilities for LLMs, Tsinghua University. Presents a method to enhance agent abilities of LLMs while maintaining their general capabilities. This is achieved by additionally training on a lightweight instruction tuning dataset and open-source instructions from general domains.
Human-like systematic generalization through a meta-learning neural network, New York University, Universitat Pompeu Fabra. They show that neural networks can understand and produce novel combinations from known components by being trained with Meta-Learning for Compositionality (MLC).
PromptBreeder: Self-Referential Self-Improvement Via Prompt Evolution, Google DeepMind. They introduce a general-purpose self-improvement mechanism that adapts self-generated prompts to a given domain. It is based on an evolutionary algorithm that iterates over generations of prompts, subjecting them to various mutations that increase their diversity (see the sketch after this list).
NoMaD: Goal Masking Diffusion Policies for Navigation and Exploration, UC Berkeley. NoMaD combines ViNT (Visual Nav Transformer) with a diffusion action decoder. When the input conditioning goal is not masked, NoMaD performs task-oriented navigation, otherwise it explores the environment in a task-agnostic manner. It matches the performance of DiffusionPolicy, while being additionally able to perform goal-conditioned navigation.
Matryoshka Diffusion Models, Apple. Proposes a diffusion technique that jointly denoises inputs at multiple resolutions with a NestedUNet architecture, enabling high-quality generation of high-dimensional data directly in pixel space.
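As promised above, here is a minimal sketch of a PromptBreeder-style loop (the mutate and fitness helpers are stand-ins we invented; the real system uses an LLM for mutation and, self-referentially, also evolves the mutation prompts themselves):

```python
import random

def mutate(prompt: str) -> str:
    """Stand-in for LLM-driven mutation: PromptBreeder asks an LLM to
    rewrite a task prompt, guided by a separately evolving mutation prompt."""
    return prompt + " Think step by step."

def fitness(prompt: str) -> float:
    """Stand-in: score the prompt on a batch of held-out task examples."""
    return random.random()

def promptbreeder(seed_prompts: list, generations: int = 10) -> str:
    population = list(seed_prompts)
    for _ in range(generations):
        # Binary tournament: two prompts compete; the loser is replaced
        # by a mutated copy of the winner, keeping the population diverse.
        a, b = random.sample(range(len(population)), 2)
        winner, loser = (a, b) if fitness(population[a]) >= fitness(population[b]) else (b, a)
        population[loser] = mutate(population[winner])
    return max(population, key=fitness)

print(promptbreeder(["Solve the problem.", "Answer carefully."]))
```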
💰Startups
🚀 Funding highlight reel
Writer, which builds a generative AI platform for enterprises, raised $100M Series B led by ICONIQ Growth.
Lambda Automata, an AI-first defense software and hardware company, raised a €6 million Seed round, led by Air Street Capital.
Helsing, the defense AI software company, raised a €209 million Series B, led by General Catalyst.
Shield AI, a defense tech company developing AI-powered fighter pilots and drones, raised $200M in a round co-led by the U.S. Innovative Technology Fund and Riot Ventures.
Metropolis, an LA-based checkout-free parking startup, raised $1.7B in debt and equity led by Eldridge and 3L Capital, with the proceeds used to acquire parking services provider SP Plus.
Pony.ai, the autonomous driving company operating in Beijing and Guangzhou, raised $100M from Saudi Arabia’s development fund NEOM in order to develop autonomous driving solutions and infrastructure in the region.
Anthropic is set to receive another $2B from Google, a month after Amazon pledged to invest up to $4B in the company.
Imbue, a company developing autonomous agents, secured another $12M in follow-on funding to its $200M Series B, at a valuation of over $1B.
Crusoe Energy, the GPU datacenter company powered by Crusoe’s Digital Flare Mitigation technology, closed a $200M debt facility backed by NVIDIA GPUs.
Sprout, a London-based insurance claims automation company, raised £5.4 million in a Series A round led by Amadeus Capital Partners and Praetura Ventures. Sprout’s technology uses LLMs for insurance claims assessment and handling.
Unitary, a London-based visual content moderation company, raised a $15M Series A led by Creandum.
🤝 Exits
Splunk, the publicly-traded AI-powered cybersecurity firm, is to be acquired by Cisco for $28 billion, making it one of the largest AI acquisitions ever.
Tessian, a London-based AI-driven email security platform, has entered a definitive agreement to be acquired by Proofpoint, a leading cybersecurity and compliance company.
Nod.ai, the open-source AI software start-up, is being acquired by chip-maker AMD as part of its push to expand its AI software operations.
CoRead, the radiology imaging start-up, was acquired by digital health platform Covera Health. Covera completed the transaction at the same time as its $50 million Series C, led by Insight Partners.
Clearbit, the B2B data provider, announced it was being acquired by HubSpot, a Boston-based marketing software provider and CRM platform.
---
Signing off,
Nathan Benaich, Alex Chalmers, Corina Gurau on 5 November 2023
Air Street Capital | Twitter | LinkedIn | State of AI Report | RAAIS | London.AI
Air Street Capital invests in AI-first technology and life science entrepreneurs from the very beginning of your company-building journey.