Scalable and Sustainable AI: Rethinking Hardware and System Architecture

Artificial intelligence (AI) has the potential to drive rapid advances in science and industry. However, existing hardware and system architectures cannot deliver the efficiency, scalability, and sustainability that an “AI-everywhere” world will demand.

Training large generative AI models requires advanced GPUs and significant energy. However, training is an initial investment. As generative AI is embedded in applications across every industry, the resource demands of fine-tuning and inference scale with the number of users. The AI compute industry is working to address these challenges by improving processing efficiency, measured in tokens per second (T/s), at scale.

In this webinar, moderated by EE Times senior reporter Sally Ward-Foxton, we will:

Examine the challenges of scaling up AI workloads on existing distributed architectures, which suffer from bottlenecks in communication, memory, and power usage.
Discuss emerging solutions that can dramatically improve performance, efficiency, and scalability.
Explain how these architectural shifts can reduce costs and the environmental footprint by orders of magnitude.

By the end of the session, attendees will understand how hardware and system design need to evolve to fully deliver on the promise of AI, both in terms of scalability and sustainability.

Watch the Video Now!

Moderator: Sally Ward-Foxton

Senior Reporter, EE Times.com and EE Times Europe magazine

Sally Ward-Foxton covers AI for EE Times.com and EE Times Europe magazine. Sally has spent the last 18 years writing about the electronics industry from London. She has written for Electronic Design, ECN, Electronic Specifier: Design, Components in Electronics, and many more news publications. She holds a master's degree in Electrical and Electronic Engineering from the University of Cambridge.

David Patterson

Distinguished Engineer, Google

David Patterson is a UC Berkeley Pardee professor emeritus and a Google distinguished engineer. His most influential Berkeley projects were likely RISC and RAID. His best-known book is Computer Architecture: A Quantitative Approach. He and his co-author John Hennessy shared the 2017 ACM A.M. Turing Award and the 2022 NAE Charles Stark Draper Prize for Engineering. The Turing Award is often referred to as the “Nobel Prize of Computing,” and the Draper Prize is considered a “Nobel Prize of Engineering.”

John Shalf

Department Head for Computer Science, Lawrence Berkeley National Laboratory

John Shalf is Department Head for Computer Science at Lawrence Berkeley National Laboratory and former deputy director of Hardware Technology for the DOE Exascale Computing Project (ECP). He is a co-author of over 80 publications in the field of parallel computing software and HPC technology, including three best papers and the widely cited report "The Landscape of Parallel Computing Research: A View from Berkeley" (with David Patterson and others). Before joining Berkeley Lab in 2000, he worked at the National Center for Supercomputing Applications at the University of Illinois and was a visiting scientist at the Max-Planck-Institut für Gravitationsphysik/Albert-Einstein-Institut in Potsdam, Germany, where he co-developed the Cactus code framework for computational astrophysics.

Steve Oberlin

CTO, Accelerated Computing, NVIDIA

Steve Oberlin has been innovating in high-performance computing (HPC) since 1980, when he joined Cray Research to bring up CRAY-1 supercomputers. Career highlights include working for Seymour Cray designing the CRAY-2 and CRAY-3 vector supercomputers, and leading the architecture and design of Cray Research's first MPPs, the T3D and T3E, for which he holds 15 patents. In 2000, Steve left supercomputing to co-found and lead a couple of cloud computing startups, but he returned to HPC in 2013, joining NVIDIA as CTO for Accelerated Computing.

Matthew Mattina

VP of AI Hardware & Models, Tenstorrent Inc

Matthew Mattina is the Vice President of AI at Tenstorrent and previously served as a Distinguished Engineer and Senior Director at Arm's Machine Learning Research Lab. He also held significant roles at Tilera and Intel, where he was CTO and a CPU architect, respectively. With over 50 patents and 20+ publications, his contributions span CPU design, efficient neural networks, and multicore processors. Mattina holds a BS from Rensselaer Polytechnic Institute and an MS from Princeton University in related fields.
