Issue 40 20th February 2024

Highlights

Sora: Creating video from text

Research Leads: Bill Peebles & Tim Brooks. Systems Lead: Connor Holmes. Contributors: Clarence Wing Yin Ng, David Schnurr, Eric Luhman, Joe Taylor, Li Jing, Natalie Summers, Ricky Wang, Rohan Sahai, Ryan O’Rourke…

Here's everything you need to know about Gemini 1.5, Google's newly updated AI model that hopes to challenge OpenAI

Google plans to release Gemini 1.5, a next-gen AI model designed to be a personal assistant and business tool that rivals competitors like GPT-4.

Women In AI: Lee Tiedrich, AI expert at the Global Partnership on AI

To give AI-focused women academics and others their well-deserved — and overdue — time in the spotlight, TechCrunch is launching a series of interviews focusing on remarkable women who’ve contributed to the AI revolution. We’ll publish several pieces throughout the year as the AI boom continues, highlighting key work that often goes unrecognized. Read more […]

Video

Jeroen Niesen - Humans in AI

Join us in this episode of "Humans in AI" as Jeroen Niesen, who leads research and development for cybersecurity at Wortell, shares his insights on how AI is changing the world. Jeroen takes us through his journey in AI, highlighting how it makes tasks that were once incredibly complex a lot easier today. In his role, Jeroen is responsible for both the technical and commercial aspects of Wortell's cybersecurity offerings.

Microsoft AI Tour keynote session by Satya Nadella | February 8, 2024

Learn what every developer needs to know about AI today. Watch the live session at 10:05 AM IST. Subscribe to get notified.

AI Show | Build your own copilot with Azure AI Studio - Part 1

🚀 Join us as we explore Azure AI Studio in the first of a 2-part segment! Learn to create generative AI apps like an enterprise chat copilot using prebuilt ...

Articles

This German nonprofit is building an open voice assistant that anyone can use

There have been many attempts at open source AI-powered voice assistants (see Rhasspy, Mycroft and Jasper, to name a few), all established with the goal of creating privacy-preserving, offline experiences that don’t compromise on functionality. But development has proven to be extraordinarily slow. That’s because, in addition to all the usual challenges attendant with open source […]

Research Focus: Week of February 5, 2024

Research Focus: New Research Forum series explores bold ideas in the era of AI; LASER improves reasoning in language models; Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets; Six Microsoft researchers named 2023 ACM Fellows.

‘Humanity’s remaining timeline? It looks more like five years than 50’: meet the neo-luddites warning of an AI apocalypse

From the academic who warns of a robot uprising to the workers worried for their future – is it time we started paying attention to the tech sceptics?

GraphRAG: Unlocking LLM discovery on narrative private data

Perhaps the greatest challenge – and opportunity – of LLMs is extending their powerful capabilities to solve problems beyond the data on which they have been trained, and to achieve comparable results with data the LLM has never seen. This opens new possibilities in data investigation, such as identifying themes and semantic concepts with context […]

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of automatic speech recognition (ASR) output. Specifically, an LLM is used to map the N-best hypotheses list generated by an ASR system directly to the predicted output transcription.

The Shift from Models to Compound AI Systems

AI caught everyone’s attention in 2023 with Large Language Models (LLMs) that can be instructed to perform general tasks, such as translation or coding, just by prompting. This naturally led to an intense focus on models as the primary ingredient in AI application development, with everyone wondering what capabilities new LLMs will bring.

Video generation models as world simulators

We explore large-scale training of generative models on video data. Specifically, we train text-conditional diffusion models jointly on videos and images of variable durations, resolutions and aspect ratios. We leverage a transformer architecture that operates on spacetime patches of video and image latent codes. Our largest model, Sora, is capable of generating a minute of high fidelity video. Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.

Upcoming events

Global AI Bootcamp 2024 | March 2024

The Global AI Bootcamp is an annual worldwide event where developers and AI enthusiasts can learn about AI through workshops, sessions, and discussions. Local chapters of the Global AI Community host the events in locations across the globe. This year, the bootcamp runs throughout March 2024.
