Issue 61 16th July 2024

Highlights

Can the climate survive AI’s thirst for energy? – podcast

Artificial intelligence companies have lofty ambitions for what the technology could achieve, from curing diseases to eliminating poverty. But the energy required to power these innovations is threatening critical environmental targets. Madeleine Finlay hears from the Guardian’s energy correspondent, Jillian Ambrose, and UK technology editor, Alex Hern, to find out how big AI’s energy problem is, and whether it can be solved before it is too late.

RUBICON: Evaluating conversations between humans and AI systems

RUBICON evaluates AI-driven conversations and improves their quality by learning detailed domain-specific rubrics from minimal data. It gathers insights on AI assistant performance while maintaining user privacy and data security.

Research

Better relation and entity extraction with LLMs

We've talked about building knowledge graphs before. It's still a very new topic, and a lot of new research is coming out every week. This week's paper, for example, explores how to get better results when extracting relationships from documents by combining large language models with traditional entity and relationship recognition methods. Once again, this is one of those papers that is niche but very useful.

Video

Copilot L33t Sp34k - RAI and UX principals in Copilot for Security

Sarah chats to Mark Kendrick and Harmony Mabry from the Copilot for Security product team about how Microsoft's Responsible AI (RAI) principles are realized in the Copilot for Security product to protect against AI harms such as overreliance.

Imitation Intelligence, my keynote for PyCon US 2024

I gave an invited keynote at PyCon US 2024 in Pittsburgh this year. My goal was to say some interesting things about AI, specifically about Large Language Models, both to help people who may not have been paying close attention catch up and to give people who were paying close attention some new things to think about.

Articles

Unified Database: Laying the foundation for large language model vertical applications

Unified databases offer better knowledge transfer between multimodal data types. They provide substantial corpus support for large language models and are poised to drive innovation in underlying hardware, laying the foundation for data-enhanced AI.

Diffusion Model from Scratch in Pytorch

In our implementation of the model, we will start by defining our imports (with possible pip install commands left in comments for reference) and coding our sinusoidal time step embeddings. The authors of the DDPM paper used the U-Net architecture, originally designed for medical image segmentation, to build a model that predicts the noise for the reverse diffusion process.
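
For readers who have not met sinusoidal time step embeddings before, here is a minimal sketch in plain Python. It is not the article's PyTorch code: the function name `sinusoidal_embedding`, the `max_period` default of 10000, and the choice to concatenate all sines followed by all cosines are assumptions made for illustration.

```python
import math


def sinusoidal_embedding(t: int, dim: int, max_period: float = 10000.0) -> list[float]:
    """Return a `dim`-length sinusoidal embedding for the integer time step `t`.

    Hypothetical helper for illustration; real implementations usually
    operate on batched tensors instead of Python lists.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Frequencies decay geometrically from 1 towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # First half of the vector holds sines, second half holds cosines.
    sines = [math.sin(t * f) for f in freqs]
    cosines = [math.cos(t * f) for f in freqs]
    return sines + cosines


embedding = sinusoidal_embedding(t=10, dim=8)
```

Each time step maps to a distinct vector of bounded values, which gives the noise-prediction network a smooth signal for how far along the diffusion process it is.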

Hacking “Codenames” with GloVe Embeddings

In conclusion, this greedy GloVe-based algorithm performs well as both the spymaster and operative in the Codenames game, by offering an effective way to encode and decode words via a clue and number.

Big Opportunities in Small Data

I gave an invited keynote at Citus Con 2023, the PostgreSQL conference. Below is the abstract, video and slides from the presentation.

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Attention, as a core layer of the ubiquitous Transformer architecture, is a bottleneck for large language models and long-context applications. FlashAttention (and FlashAttention-2) pioneered an approach to speed up attention on GPUs by minimizing memory reads/writes, and is now used by most libraries to accelerate Transformer training and inference. This has contributed to a massive increase in LLM context length in the last two years, from 2-4K (GPT-3, OPT) to 128K (GPT-4), or even 1M (Llama 3).
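
The memory saving rests on the observation that softmax attention can be computed one block of keys at a time, keeping only a running maximum, a running denominator and a running weighted sum. The sketch below shows that idea for a single query in plain Python. It is only an illustration of blockwise "online softmax" under assumed names (`naive_attention`, `blockwise_attention`, `block_size`), not the FlashAttention-3 GPU kernel, and it ignores asynchrony and low precision entirely.

```python
import math


def _score(q, k):
    # Scaled dot product between one query and one key.
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


def naive_attention(q, keys, values):
    """Reference version: materializes every score before normalizing."""
    scores = [_score(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but only one block of scores is held at any time."""
    dim = len(values[0])
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [_score(q, k) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new maximum.
        rescale = math.exp(running_max - new_max)  # 0.0 on the first block
        denom *= rescale
        acc = [a * rescale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            denom += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / denom for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.3]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
expected = naive_attention(q, keys, values)
blocked = blockwise_attention(q, keys, values, block_size=2)
```

Both functions return the same output up to floating-point rounding; the blockwise version simply never needs the full list of scores in memory at once.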