Highlights
VASA-1 - Microsoft Research
We introduce VASA, a framework for generating lifelike talking faces of virtual characters with appealing visual affective skills (VAS), given a single static image and a speech audio clip. Our premier model, VASA-1, is capable of not only producing lip movements that are exquisitely synchronized with the audio, but also capturing a large spectrum of facial nuances and natural head motions that contribute to the perception of authenticity and liveliness.
Meta steps up AI battle with OpenAI and Google with release of Llama 3
Meta Platforms on Thursday released early versions of its latest large language model, Llama 3, and an image generator that updates pictures in real time as users type prompts, as it races to catch up to generative AI market leader OpenAI.
Microsoft hires former Meta exec to bolster AI supercomputing team
Former Meta executive Jason Taylor is joining Microsoft’s AI supercomputing team. In a LinkedIn post on Monday, Microsoft CTO Kevin Scott said Taylor will take on the role of corporate vice president and deputy CTO to help “build the next set of systems that will push the frontier of AI forward.”
Paper of the week
Infinite context windows for LLMs?
As context windows for models like GPT-4 grow to sizes of up to 128K tokens, we should ask ourselves: what if the context were effectively infinite? Google explored this by compressing past attention key-value states into a fixed-size memory inside the LLM. The results are definitely promising!
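The core idea can be illustrated with a toy sketch. This is not Google's implementation — just a minimal numpy illustration of a compressive linear-attention memory (the ELU(x)+1 feature map is a common choice in that line of work; all sizes and data here are made up). The memory stays a fixed d×d matrix no matter how many tokens stream through:

```python
import numpy as np

def sigma(x):
    # ELU(x) + 1: a positive feature map often used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

d = 4                       # head dimension
M = np.zeros((d, d))        # compressive memory: fixed size, independent of context length
z = np.zeros(d)             # running normalization term

rng = np.random.default_rng(0)
for _ in range(3):          # process an arbitrarily long stream, one segment at a time
    K = rng.normal(size=(8, d))   # keys for this segment
    V = rng.normal(size=(8, d))   # values for this segment
    M += sigma(K).T @ V           # fold the segment into the memory
    z += sigma(K).sum(axis=0)

Q = rng.normal(size=(2, d))       # later queries attend to the compressed past
retrieved = (sigma(Q) @ M) / (sigma(Q) @ z)[:, None]
print(M.shape, retrieved.shape)   # memory is still (4, 4) regardless of tokens seen
```

The point of the sketch: per-token key-value pairs are folded into a constant-size summary, so memory cost no longer grows with the context length.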
Video
AI Show | Global Face Grouping with Video Indexer
Global Face Grouping lets Video Indexer (VI) users cross-correlate faces that appear in different videos on the same VI account. The feature works as a recommendation system for automatically tagging a face-recognition catalogue used for custom face identification.
Learn Live: Developing a production-level RAG workflow
Copilots can work alongside you to provide suggestions, generate content, or help you make decisions. Copilots use language models as a form of generative artificial intelligence (AI) and will answer your questions using the data they were trained on. To ensure a copilot retrieves information from a specific source, you can add your own data when building a copilot with the Azure AI Studio (preview).
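The retrieval step that grounds a copilot in your own data can be illustrated in miniature. This is a toy bag-of-words retriever, not the Azure AI Studio API — the documents and query are invented for illustration; a production RAG workflow would use embeddings and a vector index instead:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "Azure AI Studio lets you add your own data to a copilot.",
    "Llama 3 is a large language model released by Meta.",
    "Video Indexer groups faces that appear across videos.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return scored[:k]

context = retrieve("how do I add my own data to a copilot?")
# The retrieved context is then prepended to the prompt sent to the language model,
# so the answer is grounded in your source rather than only the training data.
print(context[0])
```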
Articles
How we built "Ask Learn", the RAG-based knowledge service - Engineering@Microsoft
Learn how Microsoft Learn’s RAG-based generative AI chat system was engineered and their takeaways to help you design a RAG-based chat for your org.
Fine-tune Llama 3 with ORPO
ORPO is a new exciting fine-tuning technique that combines the traditional supervised fine-tuning and preference alignment stages into a single process. This reduces the computational resources and time required for training. Moreover, empirical results demonstrate that ORPO outperforms other alignment methods on various model sizes and benchmarks.
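The combination can be sketched numerically. This is a toy rendering of ORPO's loss shape — an SFT negative log-likelihood plus a weighted odds-ratio preference term — where the log-probabilities and the λ weight are made-up numbers, not values from the paper:

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    # odds(y|x) = p / (1 - p), with p the (length-normalized) sequence probability.
    def log_odds(logp: float) -> float:
        return logp - math.log(1.0 - math.exp(logp))
    # Penalize the model when the rejected response is about as likely as the chosen one.
    return -log_sigmoid(log_odds(logp_chosen) - log_odds(logp_rejected))

lam = 0.1          # weight between the SFT term and the preference term (illustrative)
nll_chosen = 1.2   # ordinary SFT negative log-likelihood on the chosen response
loss = nll_chosen + lam * odds_ratio_loss(-1.0, -3.0)
print(loss)
```

Because both terms are computed in the same forward pass on the same model, no separate reference model or reward model is needed — that is where the compute savings come from.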
Tiny Llama — a Performance Review and Discussion
Learn how you can utilize TinyLlama, a compact large language model, fine-tune it, and achieve strong performance.
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
We're excited to share Jack of All Trades (JAT), a project that aims to move in the direction of a generalist agent. The project started as an open reproduction of the Gato (Reed et al., 2022) work, which proposed to train a Transformer able to perform both vision-and-language and decision-making tasks. We thus started by building an open version of Gato’s dataset. We then trained multi-modal Transformer models on it, introducing several improvements over Gato for handling sequential data and continuous values.
What if ChatGPT is Actually a Tour Guide From Another World? (Part 2)
Finally, just for fun, we’ll create a Minecraft world that uses actual GPT-4 data structures and see how it looks. We’ll kick things off by selecting a few words to experiment with: horse, hammer, apple, and lemon.
torchtune: Easily fine-tune LLMs using PyTorch
We’re pleased to announce the alpha release of torchtune, a PyTorch-native library for easily fine-tuning large language models.
Deep Dive into Self-Attention by Hand✍︎
The attention weight matrix A is obtained by feeding the input features into the Query-Key (QK) module. The attention mechanism helps the model scan all parts of a sequence at each step and determine which elements need to be focused on.
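The mechanism described above fits in a few lines of numpy. A minimal single-head sketch (dimensions and weights are arbitrary; a real model would learn Wq, Wk, Wv and add multiple heads):

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    # Query-Key (QK) step: project the inputs, then score every pair of positions.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Row-wise softmax gives the attention weight matrix A (each row sums to 1):
    # row i says how much position i focuses on every other position.
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A, A @ V   # A @ V mixes the value vectors according to those weights

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                        # 5-token sequence, 8-dim features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
A, out = attention(X, Wq, Wk, Wv)
print(A.shape, np.allclose(A.sum(axis=1), 1.0))    # (5, 5) True
```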
AI for Data Journalism: demonstrating what we can do with this stuff right now
I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.
Quantum computing with subwavelength atomic arrays
Photon-mediated interactions in subwavelength atomic arrays have numerous applications in quantum science. In this paper, we explore the potential of three-level quantum emitters, or “impurities” embedded in a two-dimensional atomic array to serve as a platform for quantum computation. By exploiting the altered behavior of impurities as a result of the induced dipole-dipole interactions mediated by subwavelength arrays, we design and simulate a set of universal quantum gates consisting of the square root iSWAP and single-qubit rotations.
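For readers unfamiliar with the √iSWAP gate named above, its standard matrix form (in the |00⟩, |01⟩, |10⟩, |11⟩ basis — this is textbook material, not taken from the paper) can be checked directly: applying it twice yields iSWAP, and it is unitary.

```python
import numpy as np

s = 1 / np.sqrt(2)
# sqrt(iSWAP): a universal two-qubit entangling gate.
SQRT_ISWAP = np.array([
    [1, 0,      0,      0],
    [0, s,      1j * s, 0],
    [0, 1j * s, s,      0],
    [0, 0,      0,      1],
])

ISWAP = np.array([
    [1, 0,  0,  0],
    [0, 0,  1j, 0],
    [0, 1j, 0,  0],
    [0, 0,  0,  1],
])

squares_to_iswap = np.allclose(SQRT_ISWAP @ SQRT_ISWAP, ISWAP)
is_unitary = np.allclose(SQRT_ISWAP.conj().T @ SQRT_ISWAP, np.eye(4))
print(squares_to_iswap, is_unitary)   # True True
```

Together with arbitrary single-qubit rotations, this gate suffices for universal quantum computation, which is why the paper targets it.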
Podcast
Silicon Minds, Human Hearts - Crafting a Future of Synergy in the AI Era
A series that zooms into the intersection of artificial intelligence (AI) with daily human life. It features interviews with leading AI experts, thinkers, and tech innovators, offering listeners a behind-the-scenes look at the advances in AI. With a conversational tone, the series tackles the big moral dilemmas and the future implications of AI, all while keeping things relatable. It's not just tech talk; it's about what it means to be human in an age where computers are getting smarter.