🏖️ Your guide to AI: August 2023

Aug 06, 2023

Hi all!

Welcome to the latest issue of your guide to AI, an editorialized newsletter covering key developments in AI policy, research, industry, and startups. This special summer edition (while we’re producing the State of AI Report 2023!) covers our 7th annual Research and Applied AI Summit (RAAIS), which we held in London on 23 June. The in-person event brings together 200 researchers, entrepreneurs, and operators to learn and share best practices for building AI-first products, and to explore where the state of the art is heading next. It supports the RAAIS Foundation, a non-profit that advances education and the funding of open source AI for the common good.

Below are some of our key takeaways from the event. All the talk videos can be found on the RAAIS YouTube channel here. If you’d like to join next year’s event, drop your details here.

As usual, we love hearing what you’re up to and what’s on your mind: just hit reply, or forward this to your friends :-)

Driverless cars have arrived and green batteries are in the making

Self-driving now exceeds human performance thanks to improvements across the entire machine learning stack in recent years, argued Oliver Cameron, Cruise’s former VP Product, in his talk. Until recently, only perception relied on machine learning, while the rest of the driving system was rule-based and failed to scale. Today, leading AV companies use machine learning across the entire self-driving stack, including planning and prediction. But scaling isn’t just about better models. To make an AV service practical in the real world, companies needed to build reliable cloud training infrastructure, remote assistance through teleoperations, fleet management, vehicle sensor modifications, and facility acquisition (yes, parking spots!). Today, Cruise is scaling its service to San Francisco, Phoenix, Austin, Dallas, Houston, Los Angeles, Miami and New York City. For perspective, the 2020 State of AI Report (slide 93+) tracked self-driving progress (or lack thereof): Cruise was clocking 10k miles per disengagement in 2019, and the entire self-driving industry completed 2.9M miles that year. Fast forward to February 2023, and Cruise’s robotaxi service had completed 1M cumulative self-driving miles. Just a few months later, in May 2023, it had reached 2M!

Even so, there is still room to improve the self-driving stack. For instance, London-based Wayve recently unveiled GAIA-1, a generative world model that produces realistic and diverse driving scenes, which can be used to train and test driving policies more effectively. Toronto-based Waabi presented UniSim, a simulator that makes training self-driving vehicles scalable and controllable.

Over in battery land, Sweden’s Northvolt uses machine learning throughout its green battery manufacturing stack, as the company’s ML R&D lead, Sid Khullar, explains in his talk. First, ML is used to understand the ‘anatomy’ of a green battery and to help control what is happening inside it without opening it up. Controlling how materials interact across space and time makes it possible to optimise performance and delay the point at which a battery is retired to second-life use. Although large datasets are scarce and there is little standardization in how electrodes are built and evaluated, the field is rich in problems for ML practitioners. Figuring out how to increase battery quality and reduce cost without a large environmental footprint will be incredibly rewarding.

LLMs for biomedicine have been making strides thanks to benchmarks and evaluation frameworks that reflect real-world clinical workflows. Vivek Natarajan, senior author on Google’s clinical medicine LLM research, explores this work in his talk. Question answering as a proxy task (for instance, the MultiMedQA benchmark) is a powerful way to assess a variety of model capabilities, on both professional medical and lay questions. Responses are best evaluated by humans along multiple axes, including correctness, usefulness, potential for harm, and bias.
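To make multi-axis human evaluation concrete, here is a minimal sketch of how per-axis rater judgements might be tallied into rates. The axis names, the one-dict-per-rating layout and the `axis_rates` helper are hypothetical illustrations, not the rubric or tooling used in Google’s work.

```python
from collections import defaultdict

# Hypothetical rubric axes, loosely following the ones named above.
AXES = ("correct", "useful", "harmful", "biased")


def axis_rates(ratings):
    """Return, for each axis, the fraction of ratings where raters flagged it True.

    `ratings` is a list of dicts mapping axis name -> bool, one dict per
    (answer, rater) pair. Axes missing from a dict are skipped for that rating.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for rating in ratings:
        for axis in AXES:
            if axis in rating:
                totals[axis] += 1
                counts[axis] += int(rating[axis])
    return {axis: counts[axis] / totals[axis] for axis in AXES if totals[axis]}


ratings = [
    {"correct": True, "useful": True, "harmful": False, "biased": False},
    {"correct": False, "useful": True, "harmful": True, "biased": False},
    {"correct": True, "useful": False, "harmful": False, "biased": False},
    {"correct": True, "useful": True, "harmful": False, "biased": False},
]
print(axis_rates(ratings))  # {'correct': 0.75, 'useful': 0.75, 'harmful': 0.25, 'biased': 0.0}
```

Reporting each axis separately, rather than one blended score, keeps a fluent but harmful answer from hiding behind a high usefulness rate.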

From the get-go, Flan-PaLM, an instruction-tuned version of PaLM, comfortably surpassed the approximate pass mark on US Medical Licensing Examination (USMLE)-style questions. Med-PaLM, built on top of Flan-PaLM, used instruction prompt tuning on a small set of exemplar answers written by clinicians to make its responses more medically grounded. Med-PaLM 2 addressed Med-PaLM’s limitations by switching to PaLM 2 as the base model, fine-tuning the LLM end to end on medical data, and improving reasoning through ensemble refinement. Its answers were competitive with those of expert physicians. Google recently introduced Med-PaLM Multimodal (Med-PaLM M), a large multimodal generative model that can reason over clinical language, imaging, and genomics. This is a milestone towards the development of generalist biomedical AI systems.
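Ensemble refinement, as described for Med-PaLM 2, works in two stages: sample several reasoning paths for a question, then condition the model on the question plus those drafts to produce refined answers, and finally take a plurality vote over the refinements. The sketch below shows only that control flow, under the assumption that a model is any function from prompt to answer; `ensemble_refine` and `toy_model` are hypothetical, and a real system would use temperature-sampled LLM calls with chain-of-thought reasoning.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List

# Assumption: a "model" is any function from a prompt string to an answer string.
Model = Callable[[str], str]


def ensemble_refine(model: Model, question: str, n_drafts: int = 4, n_refinements: int = 5) -> str:
    # Stage 1: sample several independent drafts (reasoning paths).
    drafts: List[str] = [model(question) for _ in range(n_drafts)]
    # Stage 2: show the model the question plus every draft, and ask for a refined answer several times.
    context = question + "\n\nCandidate answers:\n" + "\n".join(f"- {d}" for d in drafts)
    refined = [model(context) for _ in range(n_refinements)]
    # Final answer: plurality vote over the refined answers.
    return Counter(refined).most_common(1)[0][0]


# Hypothetical stand-in for a sampled LLM: varied drafts, then a refinement step
# that simply favours the answer most drafts agree on.
_draft_answers = cycle(["B", "A", "B", "C"])


def toy_model(prompt: str) -> str:
    if "Candidate answers:" in prompt:
        listed = [line[2:] for line in prompt.splitlines() if line.startswith("- ")]
        return Counter(listed).most_common(1)[0][0]
    return next(_draft_answers)


print(ensemble_refine(toy_model, "Which option is correct: A, B or C?"))  # B
```

Conditioning on the drafts lets the model reconcile disagreements before voting, which is what distinguishes this from plain majority voting over independent samples (self-consistency).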

Michael Bronstein’s talk took the audience back to the roots of mathematics, arguing that new deep learning architectures need to be designed with geometry in mind. Current approaches to protein design and protein folding rely on modern deep learning techniques while using geometric concepts that date back to ancient Greece. Molecules live in Euclidean space and their atoms have geometric coordinates, yet their physical properties do not change when a molecule is rotated or translated, so models should respect these transformations. An early example is the convolutional network: because it shares the same parameters across positions, it is translation equivariant, meaning that shifting the input shifts the output in the same way. Since protein molecules are geometric objects, it makes sense that, even in a world of incredibly successful data-driven methods, geometry provides the basic scaffolding on which deep learning techniques are built.
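Translation equivariance can be checked in a few lines: because a convolution reuses the same kernel at every position, shifting the input and then convolving gives the same result as convolving and then shifting. This is a minimal pure-Python sketch with wrap-around (circular) padding so the property holds exactly, including at the borders; the function names and example signal are illustrative choices, not code from the talk.

```python
from typing import List


def circular_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """1D convolution (cross-correlation, as in deep learning layers) with wrap-around padding.

    The same kernel weights are reused at every position: this is the parameter sharing.
    """
    n, k = len(x), len(kernel)
    return [sum(kernel[j] * x[(i + j) % n] for j in range(k)) for i in range(n)]


def shift(x: List[float], s: int) -> List[float]:
    """Circularly shift a signal s positions to the right."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


x = [1, 2, 3, 4, 5, 0, 0, 0]
kernel = [1, -1]  # a simple difference filter

# Equivariance: convolving a shifted input equals shifting the convolved output.
for s in range(len(x)):
    assert circular_conv1d(shift(x, s), kernel) == shift(circular_conv1d(x, kernel), s)

print(circular_conv1d(x, kernel))  # [-1, -1, -1, -1, 5, 0, 0, -1]
```

Wrap-around padding makes the property exact; with zero padding it holds only away from the borders. Respecting rotations of 3D molecular coordinates calls for other equivariant architectures, which is the broader theme of geometric deep learning.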

We’re keeping our eyes on Valence Labs, the new semi-independent AI R&D company within Recursion, the NASDAQ-listed clinical-stage techbio company. Valence is working at the frontier of AI-enabled drug discovery. Check out this ICML panel discussion.

More broadly, AI for science is a fascinatingly large opportunity, both in academia and industry (and often in collaboration). To that end, Nathan and Eric Lander held a fireside conversation charting one of history’s most storied scientific feats: the Human Genome Project. Lander, a professor at both Harvard and MIT, founded the Whitehead Institute’s Center for Genome Research, which became the largest contributor to the project and generated the first physical map of the human genome in 1995. We also discussed how to build mission-driven translational research environments, such as the Broad Institute, that marry the best of industry and academia. We ended by projecting how AI’s success in the software world could translate into decoding the complexities of biology through experiments at every scale, from DNA and proteins through to cells and organisms.

LLMs are in all major products and no language should be left behind

Intercom realized the disruptive potential of LLMs for customer service and support before many others. Fergal Reid, Senior Director of ML, runs through the company’s learnings and best practices in his talk. Intercom had already used ML to build its previous generation of customer support chatbots and save people time. While that version relied on clustering, active learning and some hard-coded responses, the new one builds on GPT models. Intercom’s principles of shipping early and fast, starting small and measuring outcomes meant it quickly released features such as summarisation and tone-of-voice editing.

Intercom also built Fin, a chatbot that is reliable in production, keeps hallucination to an acceptable level, and requires minimal configuration. Rather than treating language models as mere token predictors, Intercom treats them as a database with a reasoning engine. Serving a large and diverse customer base, Intercom’s language models now reason and plan, turning chatbots into AI agents.
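One common way to use a language model as “a database with a reasoning engine” is retrieval-augmented generation: fetch the most relevant help-centre passages, then ask the model to answer only from them. The sketch below is an assumption for illustration, not Intercom’s implementation: word-overlap scoring stands in for embedding search, and `retrieve`, `build_prompt` and the sample documents are hypothetical. The resulting prompt would then be sent to whichever LLM you use.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents sharing the most words with the query (ties keep input order)."""
    q = tokenize(query)
    return sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Ground the answer in retrieved passages to limit hallucination."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the customer's question using only the passages below. "
        "If the answer is not there, say you don't know.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


docs = [
    "To reset your password, open Settings and choose Security.",
    "Invoices can be downloaded from the Billing page.",
    "Our support team is available on weekdays.",
]
query = "How do I reset my password?"
print(build_prompt(query, retrieve(query, docs, k=1)))
```

Keeping the "say you don't know" instruction in the prompt is one simple lever for holding hallucination to an acceptable level when retrieval finds nothing relevant.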

In Angela Fan’s talk, you can hear about the progress Meta AI has made on translation for low-resource languages. Existing translation systems such as Google Translate and Microsoft Translator cover only around 100-130 of the world’s ~3,000 written languages. Native speakers of low-resource languages not only worry that their languages are declining but also do not currently benefit from AI technologies that are developed for English first. Even with translation datasets spanning many low-resource languages, rigorous evaluation remains challenging because of the difficulty of finding translators, the lack of language standardization and the existence of multiple local variants of a language. Early translation models made breakthroughs through masked language modeling and multilingual distillation. The current state of the art combines many iterations of back-translation with filtering and noise removal, a token-level mixture of experts to reduce interference between languages, curriculum learning, and scaling.
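A token-level mixture of experts means a router decides, for every token separately, which expert sub-networks process it, so different languages can rely on partly different parameters and interfere less. The toy forward pass below illustrates top-k routing only. It is a hypothetical sketch, not Meta’s implementation: hand-set gate vectors and trivial experts stand in for learned feed-forward layers, and there is no training, load balancing or capacity limit.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(tokens: List[Vector], gate: List[Vector], experts: List[Expert], top_k: int = 1) -> List[Vector]:
    """Route each token independently to its top-k experts and mix their outputs."""
    outputs = []
    for tok in tokens:
        # Router: one score per expert (dot product with that expert's gate vector).
        probs = softmax([sum(w * x for w, x in zip(g, tok)) for g in gate])
        chosen = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in chosen)  # renormalise over the chosen experts
        out = [0.0] * len(tok)
        for e in chosen:
            out = [o + (probs[e] / norm) * y for o, y in zip(out, experts[e](tok))]
        outputs.append(out)
    return outputs


# Two toy experts and a gate that sends tokens with a large first feature to expert 0.
experts: List[Expert] = [lambda v: [2 * x for x in v], lambda v: [-x for x in v]]
gate = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[2.0, 0.0], [0.0, 3.0]]
print(moe_layer(tokens, gate, experts))  # [[4.0, 0.0], [0.0, -3.0]]
```

With `top_k=2`, each token’s output becomes a weighted blend of both experts, using the renormalised gate probabilities as weights.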

Bonus: we held an impromptu panel discussion on turning technology breakthroughs into revenue-generating products for industry with Sid (Northvolt), Alex (Tractable) and David (Enveda).