PyTorch Conference has ended
October 16-17, 2023
San Francisco, CA

Monday, October 16
 

4:30pm PDT

5:00pm PDT

2.X Poster Presentations
Accelerating Normalization Under Torch.Compile with Tl.Reduce - Peter Bell, Quansight
This poster will showcase my recent (and ongoing) work allowing TorchInductor to calculate the variance in a single pass over the input, which results in large speedups for torch.compile'd normalization layers. This change was made possible by "tl.reduce", the Triton language feature I contributed to support generic block-level reduction operations.
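The single-pass idea can be illustrated in plain Python. The sketch below uses the well-known Welford/Chan merge of (count, mean, M2) partial results, written as an associative combine function that a generic reduction can apply. It is an illustration only, not the TorchInductor or Triton implementation; the names `combine` and `single_pass_variance` are hypothetical.

```python
from functools import reduce


def combine(a, b):
    """Merge two partial results, each a (count, mean, M2) triple.

    M2 is the sum of squared deviations from the mean, so the
    population variance of a partial result is M2 / count.
    """
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)


def single_pass_variance(values):
    """Return (mean, population variance) with one pass over `values`."""
    # Each element starts as a partial result of size 1 with M2 = 0.
    partials = ((1, float(x), 0.0) for x in values)
    n, mean, m2 = reduce(combine, partials, (0, 0.0, 0.0))
    return mean, (m2 / n if n else float("nan"))


mean, var = single_pass_variance([1.0, 2.0, 3.0, 4.0])
```

Because `combine` merges any two partial results, the same function can also merge results computed independently over different blocks of the input, which is the property a block-level reduction needs.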

PyTorch 2 Export Quantization - Jerry Zhang, Meta
We will present a transformation flow for quantizing an exported model based on programmable user APIs to set configurations for nodes in a model, built with flexibility and productivity of both modeling users and backend developers in mind.

Activation Checkpointing with Torch.Compile - Animesh Jain, Meta
Activation checkpointing is a widely used technique to reduce the memory footprint of model training. It recomputes the forward pass during the backward pass, obviating the need to save intermediate tensors. torch.compile is a JIT compiler in PyTorch that has shown promising speedups across a wide range of models. In this poster, we present the design of torch.compile and activation checkpointing, the underlying complexities, and our promising results on Hugging Face models.

TorchInductor in Quantization with PyTorch 2.0 Export: Flexibility and Out-of-Box INT8 Performance - Jiong Gong & Weiwen Xia, Intel
Today we have FX graph mode quantization, which uses symbolic tracing to capture the model into a graph and then performs quantization transformations on top of the captured model. This solution covers most use cases relatively well, but its main problem is a lack of flexibility to support advanced quantization intentions and complicated quantization patterns. Quantization in PyTorch 2.0 Export is a new quantization flow built with flexibility and productivity in mind to resolve these issues. This session introduces TorchInductor as the backend of this new quantization flow for post-training static quantization on x86 CPU devices. With this feature, modeling users can directly leverage the new quantization flow with the in-tree X86InductorQuantizer to generate the reference quantized model and then lower it into the Inductor compiler with the `torch.compile` API. Combining Quantization in PyTorch 2.0 Export and TorchInductor, we get flexibility and productivity from the new quantization frontend and outstanding out-of-box int8 performance from the compiler backend.
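For readers new to static int8 quantization, the arithmetic behind it can be sketched in plain Python. This is a generic affine (scale and zero-point) quantization example under assumed calibration values; it does not use or represent the X86InductorQuantizer API, and the function names are hypothetical.

```python
def choose_qparams(x_min, x_max, qmin=-128, qmax=127):
    """Pick scale and zero point from a calibrated float range."""
    # Extend the range to include zero so that 0.0 maps to an exact integer.
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate range: every value is 0.0
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to int8, clamping values outside the calibrated range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float."""
    return (q - zero_point) * scale


# Assumed calibration range, for illustration only.
scale, zero_point = choose_qparams(-1.0, 3.0)
q = quantize(0.7, scale, zero_point)
approx = dequantize(q, scale, zero_point)
```

In static post-training quantization the range is measured once on calibration data, so `scale` and `zero_point` are fixed before inference.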

Speakers

Jiong Gong

Principal Engineer, Intel
Jiong Gong is a principal engineer and a software architect from Intel working on PyTorch optimizations. He is also a PyTorch module maintainer on CPU performance and compiler frontend.

Animesh Jain

Software Engineer, Meta
Animesh Jain works on PyTorch compilers.

Jerry Zhang

Software Engineer, Meta
Jerry Zhang is a Software Engineer in the PyTorch Architecture Optimization team at Meta. He has been working on PyTorch Quantization for the past few years, trying to provide self-serve tools that let users optimize the inference speed of their models while maintaining accuracy.

Peter Bell

Senior Software Engineer, Quansight
Peter is a Senior Software Engineer at Quansight, PyTorch core reviewer, and active PyTorch contributor since 2019.

Weiwen Xia

AI Frameworks Engineer, Intel Corp.
AI framework engineer at Intel, working on performance optimization on CPU platforms, mostly focused on inference and quantization.


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Application Poster Presentations
MIRTorch: A Differentiable Medical Image Reconstruction Toolbox - Guanhua Wang, University of Michigan
Image reconstruction plays a crucial role in modern medical imaging by converting raw signals into digitized images. In various imaging modalities, including magnetic resonance imaging (MRI), image reconstruction poses an inverse problem that is often characterized by underdetermination and large-scale complexity. Consequently, the pursuit of fast and accurate image reconstruction methods becomes a significant area of research. MIRTorch is an advanced medical image reconstruction toolbox developed using native PyTorch. This toolbox offers ultra-fast physics-informed and data-driven image reconstruction capabilities, accompanied by iterative solvers, which greatly benefit scientific and clinical applications. Researchers can conveniently prototype new model-based and learning-based approaches (e.g., unrolled neural networks) using its operator overloading functionality. With comprehensive support for auto-differentiation, gradient-based methods can be employed to optimize medical imaging protocols, such as sampling patterns and image reconstruction parameters.

ML Based Methodology for Designing Complex Innovative Cyber Physical Systems - Michael Winokur, Holon Institute of Technology
Machine Learning methods can play a crucial role in the development of Cyber-Physical Systems, which are inherently complex, consisting of interconnected physical, computational and software components. Machine Learning based Knowledge Engineering provides a systematic approach to designing, analyzing, and managing these complex systems using models derived from large data sets of previous designs as the primary artifacts. However, the successful implementation of MDE and its promised value must address challenging scientific and engineering issues such as multi-physics modelling and simulation and the use of heterogeneous models of computation. The presentation gives a brief review of new paradigms proposed and implemented for successful modelling of Cyber-Physical Systems that address these challenges and centers on the novel paradigm of a Machine Learning driven methodology for System Design. Expectations of a significant breakthrough in innovation and productivity, and the challenges it presents for the future of Systems Engineering, will be discussed.

Fine-Tuning Very Large Models on Consumer Hardware - Sourab Mangrulkar & Younes Belkada, Hugging Face
Scaling Laws state that the larger the Deep Learning/AI model, the better the performance. As a result, AI models are steadily growing in size. Have you ever wondered if you could fine-tune an LLAMA-7B model on a Colab T4 GPU or an LLAMA-65B model on an A100 40GB GPU? In fact, full fine-tuning of such very large models is infeasible on consumer hardware. In addition, storing and deploying fine-tuned models independently for each downstream task becomes very expensive, because each fine-tuned model is the same size as the original pretrained model. To democratize the capabilities of such large models, we need to reduce the compute requirement from a cluster of high-end GPUs to one or a few GPUs, and to reduce the storage cost of fine-tuned models significantly, while preserving model performance. Enter Parameter-Efficient Fine-tuning (PEFT) approaches, the solution to these challenges. We'll showcase how tools like PEFT, accelerate, and transformers from the HF ecosystem achieve this.
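The storage and compute savings can be made concrete with a little arithmetic. The sketch below uses LoRA, one widely used PEFT method, as the example: a frozen weight matrix of shape d_out x d_in gets two small trainable factors of shapes d_out x r and r x d_in. The abstract does not say which method the poster covers, and the layer size and rank below are assumptions chosen for illustration, not figures from the poster.

```python
def full_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


# Assumed sizes, for illustration only.
hidden = 4096
rank = 8

full = full_params(hidden, hidden)
lora = lora_params(hidden, hidden, rank)
ratio = lora / full

print(f"full fine-tuning: {full:,} trainable parameters per matrix")
print(f"rank-{rank} adapter: {lora:,} trainable parameters per matrix")
print(f"adapter / full = {ratio:.4%}")
```

Only the adapter weights need to be stored per downstream task, which is why the per-task artifact is far smaller than a full copy of the pretrained model.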

Train ESM-2 Protein Folding Models Using Amazon SageMaker and PyTorch FSDP - Shamika Ariyawansa & Shubha Kumbadakone, AWS
ESM-2 is a powerful protein LLM that enables atomic-level structure prediction directly from the sequence. By removing the need for multiple sequence alignment, it accelerates inference and opens up structure prediction to metagenomic proteins. However, training such large models requires applying distributed training techniques across multiple GPU-powered instances to reduce training time. In this session we will dive deep into fine-tuning ESM-2 for your protein prediction task using Amazon SageMaker’s distributed training libraries and PyTorch FSDP.

Using Uncertainty Quantification to Improve Machine Learning Systems - Suraj Muraleedharan, Amazon Web Services
Machine learning models are used in our daily interactions with multiple digital systems. For most ML use cases, we predict outcomes for our customers under uncertainty. The trustworthiness of the result is extremely important because of the impact the decision can have on the customer. Hence, calibrated uncertainty estimates are required to assess the reliability of machine learning systems. In this session, we use Fortuna, an open-source library, and Amazon SageMaker to depict the end-to-end workflow of a machine learning system. We will focus on the ML development and operational challenges for customers while adhering to regulatory requirements.



Speakers

Suraj Muraleedharan

Principal Consultant, Amazon Web Services
Suraj is experienced in building software systems for financial services, public sector and consumer goods industries. He works on distributed systems, platform engineering and machine learning for AWS customers.

Guanhua Wang

Computational Science Engineer, Q Bio, Inc
Guanhua Wang earned his Ph.D. in Biomedical Engineering from the University of Michigan in 2023. Prior to that, he obtained his undergraduate degree from Tsinghua University in 2019. His research focuses on inverse problems and medical imaging, with a particular interest in applying...

MICHAEL WINOKUR

Head, Systems Engineering, Holon Institute of Technology
Dr. Michael Winokur is a Faculty Member at Holon Institute of Technology leading the graduate specialty in Systems Engineering. His research focuses on applications of ML and AI to Design of Complex Systems. Dr. Winokur was Corporate Director of Engineering at Israel Aerospace Industries...

Shamika Ariyawansa

Senior Solutions Architect, Amazon Web Services
Shamika Ariyawansa is a seasoned professional with over a decade of experience in distributed application development. Shamika has a deep and comprehensive understanding of distributed systems and their potential for transformative change. For the past few years, Shamika has turned...

Sourab Mangrulkar

Machine Learning Engineer, creator of 🤗 PEFT, Hugging Face
Sourab Mangrulkar is an ML Open-Source Engineer at Hugging Face. He has over 5 years of experience, including 2 years each at Microsoft and Amazon as an Applied Scientist, and over 1 year at Hugging Face. Sourab has worked on diverse problems such as click-through rate prediction...

Younes Belkada

Machine Learning Engineer, Hugging Face
Younes is an ML Engineer on the Open Source team at Hugging Face. He collaborates with researchers and developers to add exciting new features to the HF ecosystem and has contributed to several of its libraries (transformers, accelerate, PEFT).

Shubha Kumbadakone

Principal GTM Specialist, GenAI, Amazon Web Services


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Backends Poster Presentations
Integrating an NPU with PyTorch 2.0 Compile - Sol Kim & Juho Ha, FuriosaAI
We will share our experience integrating an NPU and its compiler, written in Rust, with 'PyTorch 2.0 Compile' through the following subjects:
1. Leveraging PyTorch 2.0 Features for NPU Accelerable Model Compilation: We will elaborate on how we leveraged PyTorch 2.0 features, such as GraphModule, TorchDynamo, lowered IR (Core Aten, Prims IR), and others, to compile models written in PyTorch into NPU-accelerable models.
2. Utilizing GraphModule as a Frontend and NPU Execution Interface: We will explain how we utilized GraphModule as both a frontend for our compiler and as an interface for NPU execution. Additionally, we'll discuss its capabilities in providing debuggability and programmability.
3. Influence of PyTorch 2.0 Compile on NPU Compiler Design: We will share how PyTorch 2.0 Compile has influenced the design of our NPU compiler. We will also share our expectations as an NPU vendor and a PyTorch community member for future developments in this domain.

ONNX Runtime Web: Running Your PyTorch Model in Browser - Emma Ning, Microsoft
Running machine-learning-powered web applications in browsers has drawn a lot of attention from the AI community. ONNX Runtime Web (ORT Web) enables JavaScript developers to run and deploy machine learning models in browsers. It accelerates model inference in the browser on both CPUs and GPUs, through WebAssembly (WASM) and WebGPU/WebGL backends respectively. Taking MobileNet V2 as an example, CPU inference can be accelerated by 3.4x with the ORT Web WASM backend using two threads with SIMD enabled, compared with pure WebAssembly.

PyTorch to EmitC - Translating PyTorch Models to C++ - Marius Brehler & Simon Camphausen, Fraunhofer IML
Authors: Marius Brehler, Simon Camphausen, Lucas Camphausen. Deploying machine learning (ML) models to microcontrollers is of great interest as it enables intelligent decision-making on low-power embedded devices without relying on a network connection or cloud computing. One common approach to deploying an ML model to a microcontroller is to use TensorFlow Lite. To deploy a PyTorch model, the model can be converted to ONNX, which can be executed with ONNX Runtime or deployed using deepC or cONNXr. A more flexible approach is to translate the PyTorch model into C or C++. This is realized using the MLIR framework, in particular the MLIR dialect EmitC. The approach is of special interest when deploying to bare-metal systems, since it enables the use of non-LLVM-based C and C++ compilers. It is also of interest when dependencies on libraries or runtimes should be reduced: it allows generating C++ code that has no dependencies other than the standard library, enabling further use cases.

Accelerating PyG with Torch.Compile on Intel Xeon CPUs - Mingfei Ma, Intel
torch.compile() is a flagship feature introduced in PyTorch 2.0 to speed up your PyTorch code. torch.compile() can auto-generate optimized kernels across multiple backends/accelerators with minimal code change. torch.compile() now works with PyG models, and Intel has been working closely with Kumo.ai to improve performance and user experience through optimizations in the deep learning compiler TorchInductor for both float32 and bfloat16. Additionally, on the 4th generation of Intel Xeon Scalable Processors (codename Sapphire Rapids), bfloat16 computation throughput is further enhanced through the Advanced Matrix Extensions (Intel AMX) instruction set extension. Currently, all the commonly used message passing patterns in PyG models can be converted to a single optimized kernel with torch.compile(), which reduces the memory access payload and increases cache locality. Benchmark results on popular GNN models such as GCN, GraphSAGE, GIN and EdgeCNN demonstrate a performance improvement of over 300% on Sapphire Rapids!

Advanced PyTorch Model Optimization with Neural Network Compression Framework of OpenVINO - Yamini Nimmagadda & Raymond Lo, Intel
Neural Network Compression Framework (NNCF) is the model optimization library designed to improve inference performance when running Intel® Distribution of OpenVINO™ Toolkit. It’s a cross-framework tool that supports PyTorch, ONNX, TensorFlow and OpenVINO formats as well as a diverse set of optimization methods including Post-Training Quantization (PTQ), Quantization-Aware Training, Mixed-precision Quantization, Structured and Fine-grained Pruning methods, NAS, and Knowledge Distillation. NNCF is a cross-hardware framework that follows an OpenVINO paradigm: write once, deploy everywhere. It is also integrated into the Hugging Face ecosystem so users can benefit from the easy-to-use optimization and inference API for Transformer-based models. Post-Training Quantization is the most demanded and scalable method supported in NNCF. In this talk, we will cover recent advances in Post-Training Quantization that are already integrated into NNCF and how users can leverage them for the efficient model deployment of different types of DNN models on various hardware.

A Journey to Enable Generative AI on a New Hardware Platform with PyTorch 2.0 - Kazuaki Ishizaki, IBM Research - Tokyo
This talk explains our journey in enabling generative AI applications on a new hardware (HW) platform. We are working on running generative AI applications on IBM z from the perspectives of correctness and runtime performance, and we share our experience so that developers can bring PyTorch and its ecosystem to new HW. IBM z is unusual in using big-endian byte order. Most HW platforms are little-endian, so big-endian was not well supported in PyTorch and its pip packages. We added support for both byte orders, for example so that pre-trained models can be exchanged between any platforms, by fixing test and application failures. With our 32 PRs, no tests fail on IBM z. The ecosystem, such as the Hugging Face (HF) transformers framework, now works well. We will share our experience enabling CI for a new HW platform to keep the main branch healthy on it. We enable HW acceleration features, such as SIMD, in the PyTorch runtime and TorchInductor. We will also briefly explain how we exploit an in-core AI accelerator. Here are the takeaways:
- Enabling a new HW platform without test failures in PyTorch and its ecosystem, like HF transformers
- Adding CI for a new HW platform in the upstream
- Enabling performance features for a new HW platform
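The byte-order issue can be reproduced with Python's standard `struct` module. The sketch below is a generic illustration of why a float32 written on a little-endian machine must be read with an explicit byte order on a big-endian one; it is not code from the PyTorch changes described in the talk, and `load_float32` is a hypothetical helper.

```python
import struct

value = 1.0

# The same float32 serialized with each byte order.
little = struct.pack("<f", value)  # little-endian, e.g. x86_64
big = struct.pack(">f", value)     # big-endian, e.g. IBM z


def load_float32(raw, stored_order):
    """Decode 4 bytes as float32 using the byte order they were stored in."""
    fmt = {"little": "<f", "big": ">f"}[stored_order]
    return struct.unpack(fmt, raw)[0]


# Decoding with the recorded byte order recovers the value on any host.
correct = load_float32(little, "little")

# Decoding little-endian bytes as big-endian silently yields a wrong number.
wrong = load_float32(little, "big")

print(little.hex(), big.hex(), correct, wrong)
```

A portable file format therefore either fixes one byte order or records which one was used, so that the reader can swap bytes when its native order differs.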

OneDNN Graph with TorchInductor for Enhanced Graph Fusions and Performance - Ashok Emani, Intel Corporation & Frost Mitchell, University of Utah
The TorchInductor OpenMP backend has demonstrated promising performance on DL inference workloads on CPU, thanks to optimizations like Conv/GEMM + post-op fusions and vectorization. The Graph Extension in oneDNN goes beyond Conv/GEMM post-op fusions and supports aggressive fusion patterns such as multi-head attention, MLP blocks, and more with its graph compiler backend. Other features include low precision. Since PyTorch 1.12, it has been available in the TorchScript JIT fuser path, showing promising performance. This poster showcases how integrating oneDNN Graph with the TorchInductor OpenMP backend enables PyTorch 2.0 torch.compile use cases and opens opportunities for more advanced performance optimizations in the future.



Speakers

Simon Camphausen

Research Assistant, Fraunhofer-Institute for Material Flow and Logistics
Simon Camphausen is with Fraunhofer IML, Dortmund, Germany. His current work includes compilers for machine learning workloads and execution of machine learning models on memory constrained systems. He received a M.Sc. in mathematics from Westfälische Wilhelms-Universität Münster...

Ashok Emani

AI Frameworks Engineer, Intel Corporation
Ashok Emani is an AI Frameworks Engineer working on enabling Intel® Optimizations for PyTorch*

Frost Mitchell

Research Assistant, University of Utah
Frost Mitchell is currently pursuing his PhD at the University of Utah. His research interests include HPC optimizations and ML applications in radio networks.

Marius Brehler

Senior Scientist, Fraunhofer-Institute for Material Flow and Logistics
Marius is a senior scientist at Fraunhofer IML, Dortmund, Germany. His work at Fraunhofer includes compilers for machine learning and machine learning on resource constrained devices. Furthermore, he is part of the OSPO at Fraunhofer IML. He received a Dr.-Ing. degree in electrical...

Emma Ning

Principal PM, Microsoft
Emma Ning is a Principal PM in the Microsoft AI Framework team, focusing on AI model operationalization and acceleration with ONNX Runtime/Olive for open and interoperable AI. She has more than five years of product experience in search engines taking advantage of machine learning...

Juho Ha

SW Engineer, FuriosaAI
Juho Ha is a software engineer at FuriosaAI working on NPU compiler development.

Kazuaki Ishizaki

Senior Researcher, IBM Research
Dr. Kazuaki Ishizaki is a senior technical staff member at IBM Research. He has over 30 years of experience of conducting research and development of compilers for multiple languages. He is an expert in compiler optimizations, runtime systems, and parallel processing. He has been...

Mingfei Ma

Senior Software Engineer, Intel
Mingfei Ma is a senior deep learning software engineer at Intel. He is also the maintainer of the CPU performance module in PyTorch. Mingfei holds a Master's degree from Harbin Institute of Technology, where he majored in Control Science and Technology. Mingfei has 12 years’ experience...

Sol Kim

Software Engineer, FuriosaAI
9 years of experience in software engineering; majored in programming languages.

Yamini Nimmagadda

Principal AI Engineer, Intel
Yamini Nimmagadda is a principal engineer at Intel, currently focused on optimizing AI frameworks with OpenVINO. Her areas of expertise include AI frameworks, runtimes, and cloud technologies. Yamini holds a PhD degree in Electrical and Computer Engineering from Purdue University...

Raymond Lo

AI Software Evangelist, Intel



Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Backends Poster Presentations (Continued)
Optimized GPU Primitives for Deep Learning Recommendation Systems - Mengchi Zhang, Meta Platform Inc.
Deep Learning Recommendation Systems present diverse computational challenges. We present unique challenges behind these workloads and GPU primitives and optimizations to address the challenges. The optimized primitives have been deployed within Meta’s datacenters and open-sourced through FBGEMM-GPU library. The unique nature of those workloads derives from categorical features, stored in embedding tables and referenced by varying-length ID lists. The core primitives are the Embedding/EmbeddingBag PyTorch operators, which collect groups of entries from embedding tables. These ops are fused both horizontally, into TableBatchedEmbedding (TBE) operators to amortize launch overheads and increase GPU utilization, and vertically, by combining backward gradient and optimizers. We optimize TBEs with various techniques and leverage Unified Virtual Memory (UVM) with software cache to serve embedding tables larger than GPU memory even with low-precision data types. We also describe Jagged Tensor operations, the primitives that operate on sparse ID lists. Jagged Tensors require a careful implementation to achieve high percentages of memory bandwidth given the sparse nature of the problem.
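To make the data layout concrete, the sketch below shows in plain Python how varying-length ID lists can be stored as a flat `values` list plus an `offsets` list, and how an EmbeddingBag-style sum pooling walks them. It illustrates the idea only; it is not the FBGEMM-GPU implementation, the function name is hypothetical, and the convention that `offsets` carries one extra trailing entry is an assumption made here for simplicity.

```python
def embedding_bag_sum(table, values, offsets):
    """Sum-pool embedding rows for each bag of IDs.

    table   : list of embedding rows (each a list of floats)
    values  : all IDs from all bags, concatenated
    offsets : bag i covers values[offsets[i]:offsets[i + 1]]
    """
    dim = len(table[0])
    pooled = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        acc = [0.0] * dim
        for idx in values[start:end]:
            row = table[idx]
            for j in range(dim):
                acc[j] += row[j]
        pooled.append(acc)
    return pooled


table = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 2.0],
    [3.0, -1.0],
]

# Three bags of different lengths: [0, 2], [] and [1, 2, 3].
values = [0, 2, 1, 2, 3]
offsets = [0, 2, 2, 5]

pooled = embedding_bag_sum(table, values, offsets)
```

The flat layout avoids padding every bag to the longest length, at the price of irregular memory access, which is the difficulty the abstract attributes to sparse ID lists.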

Torch.Export and Emission to StableHLO - Han Qi, Google Corp LLC & Siyuan Liu, Google
`torch.export` is a PyTorch 2.1 beta feature that can soundly capture a PyTorch model into a graph. StableHLO is an MLIR-based compute IR proposed by OpenXLA, intended as a lowering target for ML frameworks (PyTorch, JAX, TensorFlow). `torch_xla.save_as_stablehlo` is a co-released feature in torch/xla that consumes the artifacts produced by `torch.export` and produces StableHLO. In this session we will describe how to use save_as_stablehlo and the situations in which the generated StableHLO is useful, and we will show snippets that use it for on-server and on-device inference.

PyTorch/XLA Dynamic Shape - Xiongfei Wei, Google
The PyTorch framework plays a vital role in advancing AI research and applications, and PyTorch/XLA enables PyTorch to run on powerful TPU hardware. However, handling variable-sized inputs and dynamic computation graphs remains a challenge. Our poster presents our most recent updates on dynamic shape support in PyTorch/XLA. The most notable improvement we have observed is that, on a multi-layer perceptron model with dynamic shape input, total compilation time drops by nearly 50% and end-to-end training time drops by nearly one third. These improvements match our design and look very promising.

Dynamic Shape Support in Intel Extension for PyTorch with oneDNN Graph - Sanchit Jain & Jian Hui Li, Intel Corporation
Intel Extension for PyTorch is an open source library meant to be a "staging-ground" for optimizations from Intel aimed towards (eventual) landing in PyTorch. With oneDNN Graph, a simple interface to use the open-source oneDNN library for high-performance kernels, Intel Extension for PyTorch delivers high performance for inference workloads with int8 static quantization on x86_64 machines. Dynamic shape is currently supported in an emulated manner, i.e., there are multiple kernels for the same fusion, one compiled for each set of input shapes. We aim to land this support in PyTorch with Inductor; in Intel Extension for PyTorch, we currently use TorchScript.
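The emulated approach, one compiled kernel per set of input shapes, amounts to a cache keyed by shape. The plain-Python sketch below shows that pattern with a stand-in "compiler"; it is not oneDNN Graph or Intel Extension for PyTorch code, and every name in it is hypothetical.

```python
class ShapeSpecializedCache:
    """Keep one compiled kernel per distinct tuple of input shapes."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._kernels = {}
        self.compile_count = 0

    def run(self, *inputs):
        # Inputs are 1-D lists here, so the shape is just the length.
        key = tuple((len(x),) for x in inputs)
        kernel = self._kernels.get(key)
        if kernel is None:
            # First time this shape combination is seen: compile and cache.
            kernel = self._compile_fn(key)
            self._kernels[key] = kernel
            self.compile_count += 1
        return kernel(*inputs)


def compile_add_relu(shapes):
    """Stand-in compiler: returns a fused add + ReLU fixed to one length."""
    (n,), (m,) = shapes
    if n != m:
        raise ValueError("inputs must have the same length")

    def kernel(a, b):
        return [max(0.0, a[i] + b[i]) for i in range(n)]

    return kernel


cache = ShapeSpecializedCache(compile_add_relu)
first = cache.run([1.0, -2.0], [1.0, 1.0])            # compiles for length 2
second = cache.run([0.5, 0.5], [0.5, -3.0])           # reuses the length-2 kernel
third = cache.run([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])   # compiles for length 3
```

The trade-off is visible in `compile_count`: each new shape pays a compilation, so workloads with many distinct shapes spend more time compiling and hold more kernels in memory.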

PyTorch/XLA Runtime - Will Cromar, Google
PyTorch/XLA was started with a straightforward goal: to support running PyTorch workloads on Cloud TPU and GPU. As a result, PyTorch/XLA's runtime stack was fundamentally built around the constraints of the early Cloud TPU product. In the years since PyTorch/XLA's launch, PyTorch, Cloud TPU/GPU, and XLA have all evolved in parallel. More recently, PyTorch/XLA has been replacing its legacy runtime with PJRT, a modern runtime and compiler stack developed for JAX and now also being adopted by TensorFlow. In addition to significantly simplifying PyTorch/XLA's internals, PJRT also delivers substantial performance gains and usability improvements compared to the legacy runtime. This talk will discuss the evolution and challenges of PyTorch/XLA's runtime migration, as well as the opportunities PJRT opens up, such as using the PJRT Device API to allow device vendors to write a plugin to use their device with any of the three major ML frameworks.

Speakers

Jian Hui Li

Principal Engineer, Intel

Han Qi

Software Engineer, Google Cloud
Han is currently a software engineer on the PyTorch/XLA team at Google Cloud. Han is a long-time PyTorch contributor. He has contributed to torch.export, TorchScript, the torch mobile runtime, and the flatbuffer serialization format for torch mobile.

Mengchi Zhang

Research Scientist, Meta
Mengchi Zhang is a Research Scientist in the AI and Systems co-design team at Meta. He focuses on primitives optimizations and sparse data structures on GPU for deep learning recommendation systems. Before joining Meta, he received his PhD degree from AALP lab in Elmore Family School of...

Sanchit Jain

AI Frameworks Engineer, Intel Corporation
AI Frameworks Engineer at Intel Corporation

Siyuan Liu

Software Engineer, Google
Software Engineer on the PyTorch/XLA team.

Will Cromar

Senior Software Engineer, Google
Software Engineer at Google's PyTorch/XLA team

Xiongfei Wei

Software Engineer, Google
Software engineer in PyTorch/XLA team at Google.


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Cloud Poster Presentations
Dealing with I/O Bottlenecks in High Throughput Model Training - Jan Sellner, German Cancer Research Center
Have you ever faced the problem of low GPU utilization due to data loading bottlenecks? I/O heavy workloads require optimized data loading strategies as otherwise the GPU usage is suboptimal and training times increase. This is especially problematic when working with heterogeneous environments and different hardware capabilities (e.g. local machine vs. compute cluster). In my proposed talk, I will share several independent solutions that I developed during the past three years of training deep learning models with high I/O requirements. These include efficient data storage via Blosc compression, appropriate precision settings, the trade-offs between CPU and GPU augmentations and a fixed shared pinned memory buffer for efficient data transfer to the GPU. Each solution will be supported by usage examples leveraging existing libraries. I will show evaluation results from my research project (surgical scene segmentation of medical hyperspectral images), in which I was able to improve GPU utilization from initially around 30 % to 90 %. The solutions are general enough so that people can apply them to other domains as well where similar problems exist.

Lifting the Kubernetes Hood - to Take a Look at Resource Management Evolution in the Context of PyTo - Mike Brown, IBM & Alexander Kanevskiy, Intel
The computer hardware evolution has accelerated. In just the last few years we have seen new ideas on combining heterogeneous CPU cores, multiple types of memory, a renewed focus on power saving, expectations for dynamically changing resource properties, as well as enhancements for security and confidentiality for cloud workloads and data in traditional applications and new AI model processing. This evolution forces us to revisit the resource management domain of Kubernetes. This domain is one of the most fundamental and complex areas in Kubernetes. Under the hood there are several components that Kubernetes relies on: Kubelet, Runtimes, Monitoring Agents, the OS kernel and hardware. With a focus on specifics of the AI workloads and use cases, this presentation will touch on UX changes for end users, interfaces between k8s stack components and how different pluggable algorithms, implemented via code and new policies, can help achieve evolved performance, power and resource utilization and prioritization goals.

I/O Traffic Pattern and Caching Strategies for AI Machine Learning Pipeline - Beinan Wang & Chunxu Tang, Alluxio
I/O is a significant bottleneck in AI/ML pipelines because loading data from storage to the GPU is time-consuming. Data loading can account for nearly 80% of total training time with large or remote datasets. Caching plays an essential role in accelerating data computations, but different stages require distinct cache strategies. In their talk, Beinan and Chunxu will share insights on data access patterns in AI/ML pipelines, based on their experiences with large-scale systems. They'll discuss evaluations of different caching strategies for model training, deployment, and inference, and provide recommendations. They'll cover key practices learned from top tech companies, including: measuring cache efficiency for diverse AI workloads, cache capacity planning based on real-time metrics, and adaptive caching for uncertain traffic patterns.
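Measuring cache efficiency starts with replaying an access trace and counting hits. The plain-Python sketch below does this for a small LRU cache and a trace that mimics two training epochs over the same samples. It is a generic illustration, not Alluxio code; the class, the `replay` helper, and the trace are assumptions made for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that records hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        value = load(key)  # stand-in for a slow read from storage
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def replay(capacity, trace):
    """Replay a trace of sample IDs and return the resulting hit ratio."""
    cache = LRUCache(capacity)
    for key in trace:
        cache.get(key, load=lambda k: f"sample-{k}")
    return cache.hit_ratio


# Two epochs that scan the same four samples in the same order.
trace = [0, 1, 2, 3, 0, 1, 2, 3]

fits = replay(capacity=4, trace=trace)       # whole dataset fits in cache
too_small = replay(capacity=3, trace=trace)  # cache is one sample short
```

With capacity 4 the second epoch is served entirely from cache, while with capacity 3 the sequential scan evicts each sample just before it is needed again, so LRU gets no hits at all. This kind of replay is one way to see why capacity planning and the choice of eviction policy depend on the access pattern.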

PyTorch 2.0 Integration for AWS ML Accelerators with AWS SageMaker - Yi-Hsiang Lai, Amazon Web Services, Inc.
PyTorch 2.0 has brought several benefits such as better graph capturing and better model coverage, all with just an additional API torch.compile(). Meanwhile, AWS has been developing machine learning accelerators such as Inferentia and Trainium that provide high-performance and low-cost solutions to customers. With the above two projects, we propose to integrate PT 2.0 with AWS ML accelerators through AWS SageMaker. The goal is to demonstrate that customers can easily deploy their PyTorch ML models on low-cost AWS accelerators without losing high performance brought by PT 2.0. In this session, we plan to show two things. First, the integration creates a user-friendly interface for ML programmers, where they only need to make minimal changes to their code. The rest of the heavy lifting including compilation and instance deployment is handled by AWS SageMaker. Second, we would like to show cost competitive results. We have already tested the inference integration of PT 2.0 through TorchDynamo and PT/XLA on Inferentia. In addition, our preliminary results show that around half of the models from TorchDynamo benchmark suite achieve better cost per inference over P3 GPU instances.

Out of the Box PyTorch FSDP on EKS Example Using the Do-framework - Alex Iankoulski & Kanwaljit Khurmi, Amazon
Do you want to run FSDP on EKS? Are you looking for a working example that you can start with? Use this aws-do framework open-source project to get started from scratch quickly with just a couple of intuitive commands, then dive deep and customize your distributed training as needed.

Speakers
avatar for Chunxu Tang

Chunxu Tang

Research Scientist, Alluxio
Dr. Chunxu Tang is a Research Scientist at Alluxio and a committer of PrestoDB. Prior to Alluxio, he served as a Senior Software Engineer in Twitter’s data platform team, where he gained extensive experience with a wide range of data systems, including Presto, Zeppelin, BigQuery... Read More →
avatar for Alexander Kanevskiy

Alexander Kanevskiy

Principal Engineer, Cloud Software, Intel
Alexander is currently employed by Intel as Principal Engineer, Cloud Software, focusing on various aspects of Kubernetes: Resource Management, Device plugins for hardware accelerators, Cluster Lifecycle and Cluster APIs. Alexander has over 25 years of experience in areas of Linux... Read More →
avatar for Mike Brown

Mike Brown

Software Engineer/Architect, IBM
OSS Engineer; @containerd maintainer; working @oci, @cncf, @pytorch, and @kubernetes projects
avatar for Beinan Wang

Beinan Wang

Software Engineer, Alluxio
Dr. Beinan Wang is a tech lead manager from Alluxio with extensive experience in data infrastructure. Prior to Alluxio, he was the Tech Lead of the Interactive Query team in Twitter and he built large scale distributed SQL systems for Twitter’s data platform. He has twelve-year... Read More →
avatar for Yi-Hsiang Lai

Yi-Hsiang Lai

Applied Scientist, Amazon Web Services
Yi-Hsiang received his Ph.D. from Cornell ECE, where he was advised by Prof. Zhiru Zhang. He is now an applied scientist at AWS Annapurna Labs. His major interests include high-level synthesis, programming languages, deep learning, and compilers.
avatar for Jan Sellner

Jan Sellner

Doctoral Researcher, German Cancer Research Center
I am a passionate computer scientist and PhD student at the German Cancer Research Center pushing the boundaries of deep learning since 2018. I am working towards autonomous robotic surgery through deep learning-based semantic segmentation of hyperspectral images (github.com/imsy-dkfz/htc... Read More →
avatar for Alex Iankoulski

Alex Iankoulski

Principal Solutions Architect, AWS
Alex Iankoulski is a Principal Solutions Architect, Self-managed Machine Learning at AWS. He’s a full-stack software and infrastructure engineer who likes to do deep, hands-on work. In his role, he focuses on helping customers with containerization and orchestration of ML and AI... Read More →
avatar for Kanwaljit Khurmi

Kanwaljit Khurmi

Principal AI/ML Solutions Architect, AWS
Kanwaljit Khurmi is a Principal Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized... Read More →


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Contributors Poster Presentations
PyTorch and the Wikimedia Ecosystem - Daniel Mietchen, FIZ Karlsruhe — Leibniz Institute for Information Infrastructure
This session will provide a brief tour through the continuously evolving landscape of interactions between the PyTorch and Wikimedia ecosystems. Starting out with a short review of mission alignment and relatively well-known examples - e.g. PyTorch models trained on Wikipedia, or Wikipedia articles about PyTorch - we will then explore some of the more subtle modes of interactions. These include coverage of PyTorch in other Wikimedia projects - e.g. Wikidata, Wikimedia Commons and Wikibooks - or cases in which PyTorch is used to study some aspect of the Wikimedia ecosystem or to contribute to Wikimedia workflows. Yet other perspectives emerge when both PyTorch and Wikimedia resources are used together for third-party purposes, or when considering how both communities address issues like biases, diversity, equity and inclusion, or the community, financial or environmental aspects of sustainability. We will conclude the tour with some thoughts on how certain elements of the Wikimedia community culture - e.g. norms and practices around neutrality, verifiability, consensus building and multilinguality - could inform discussions around responsible AI and vice versa.

Bringing PyTorch to the Masses with Architectural Thinking - David Radley, IBM
I am a middleware developer and architect who has worked in open source as a maintainer for many years. Prior to that I was a middleware developer. I had a lot of success bringing architectural ideals and middleware developer thought processes to my previous open source project, Egeria. I have just joined PyTorch full time. This session details my thoughts on how PyTorch, as a project, can be enhanced and made more accessible by using architectural and middleware development approaches. The idea is not to replace existing working processes, but to enrich them. I hope that the audience and community will buy into this approach and help me realise it. I think this approach is key to moving PyTorch from being research centric towards being more widely adopted. The session will cover:
- my initial impressions as a newcomer to PyTorch.
- the need for a big-picture conceptual overview of PyTorch, at a higher level than the code in the documentation, with many more overview pictures.
- the need for a glossary, so that users have terms defined once and referred to.



Speakers
avatar for David Radley

David Radley

Mr, IBM
David Radley BSc has been working in open source since 2016, as an Apache Atlas committer, then an Egeria maintainer and now as a PyTorch contributor. Prior to open source: IBM Solution Architect, IBM MDM developer, IBM MDM CE technical lead, IBM CICS Transaction Gateway service lead... Read More →
avatar for Daniel Mietchen

Daniel Mietchen

Senior Researcher, FIZ Karlsruhe
Daniel Mietchen is a biophysicist interested in integrating open research and education workflows with the web. With research activities spanning across the temporal and spatial scales of life and into mathematics, he is actively engaged in increasing the interactions between the... Read More →


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Core Tools Poster Presentations
Float8 Support in PyTorch - Vasiliy Kuznetsov, Meta
Overview of the float8 dtype and op support in PyTorch.

Mmap_ninja to Accelerate Machine Learning Training - Hristo Vrigazov, LABS.IO
When training a PyTorch model on a dataset that does not fit into memory, we want to store the dataset in a way that allows very quick filesystem I/O. One way to do this is to use memory maps. A memory-mapped file is a file on disk whose contents are mapped into the application's address space, so that applications can treat the mapped portions as if they were primary memory, allowing very fast I/O. NumPy already supports memory maps, as does the awesome library TensorDict. However, a library that supports sequences of varying shapes (for example, tokenized text in NLP, or stored images) is lacking. In this talk, we introduce mmap_ninja, a library specifically designed to store sequences of varying shapes in a memory-mapped format, leading to big speedups in filesystem I/O during training.
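
To make the idea of memory-mapping sequences of varying lengths concrete, here is a minimal sketch that uses only the Python standard library. It is not mmap_ninja's API. The `write_ragged` helper, the `RaggedReader` class and the file layout (one flat file of native `int` values plus a list of offsets) are assumptions chosen for illustration.

```python
import mmap
import os
import tempfile
from array import array


def write_ragged(path, sequences):
    """Write integer sequences back to back; return start offsets in elements."""
    offsets = [0]
    with open(path, "wb") as f:
        for seq in sequences:
            array("i", seq).tofile(f)  # native C int, same type code used when reading
            offsets.append(offsets[-1] + len(seq))
    return offsets


class RaggedReader:
    """Read variable-length sequences from a memory-mapped flat file."""

    def __init__(self, path, offsets):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._view = memoryview(self._mm).cast("i")  # view the raw bytes as C ints
        self._offsets = offsets

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, i):
        start, end = self._offsets[i], self._offsets[i + 1]
        # Only the pages backing this slice need to be read from disk.
        return self._view[start:end].tolist()

    def close(self):
        self._view.release()
        self._mm.close()
        self._file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tokens.bin")
    offsets = write_ragged(path, [[1, 2, 3], [4], [5, 6]])
    reader = RaggedReader(path, offsets)
    second = reader[1]
    last = reader[2]
    count = len(reader)
    reader.close()
```

Keeping the offsets separate from the flat data file means a sample of any length can be located with two index lookups and read without parsing the rest of the file.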

Performant Optimizers - Jane Xu, Meta
How can one optimize their optimizers? We will dive into some techniques like foreach, fused, zero_grad(set_to_none=True), a potential overlapping optim API, and integration with torch.compile().
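
For readers unfamiliar with these options, the following is a minimal sketch of how two of them appear in user code. The model, data and learning rate are placeholders, and it is not taken from the poster.

```python
import torch

model = torch.nn.Linear(4, 2)

# foreach=True requests the multi-tensor ("foreach") implementation, which
# updates groups of parameters together instead of looping over them one by one.
# Adam also accepts fused=True; at the time of PyTorch 2.1 that path expects
# CUDA parameters, so it is left out of this CPU-only sketch.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=True)

x = torch.randn(8, 4)
loss = model(x).sum()
loss.backward()
optimizer.step()

# set_to_none=True drops the gradient tensors instead of filling them with
# zeros, which avoids a memset per parameter.
optimizer.zero_grad(set_to_none=True)
grads_cleared = all(p.grad is None for p in model.parameters())
```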

Finding Training Inefficiencies with CentML DeepView - Yubo Gao, CentML Inc.
Performance bottlenecks and resource underutilization are common occurrences for deep learning researchers and developers. They slow down the workflows of ML developers and lead to resource waste. The existing ecosystems of DL frameworks and profilers lack a developer-friendly avenue for both understanding the training performance of DL models and finding ways to reduce underutilization and increase performance. In this presentation, we will showcase DeepView, an open source visual profiler developed by CentML and specifically tailored to ML developers. DeepView provides intuitive and convenient performance visualizations and offers hints to ML developers to make their training jobs more efficient. Furthermore, DeepView optimizes deployment targets to meet both budget and time constraints through performance predictions. DeepView seamlessly integrates with PyTorch and Visual Studio Code, and we are actively working on expanding its support for other popular code editors. Through an interactive demo, we walk through optimizing the training of a real model with DeepView, where we gain a manifold increase in training throughput.

Getting the Most Out of TorchVision's New Transforms for Detection, Segmentation and Video Tasks - Nicolas Hug, Meta
Torchvision is extending its transforms API to new computer vision pre-processing tasks. The new transforms aren't limited to image classification anymore: image detection, segmentation, and video tasks are now supported. With a focus on improving the overall user experience, these new transforms offer seamless integration with existing workflows by maintaining full backward compatibility. They're also a lot faster! In this talk, we'll go over these transforms and the new features they unlock. We'll also provide tips and recommendations on how to best speed up your pre-processing pipelines, leveraging the PyTorch 2.0 stack.

Tracing and Compiling NumPy in PyTorch 2.1 - Mario Lezcano, Quansight
In this session we will discuss a new feature in PyTorch 2.1, where torch.compile can be used to trace and compile NumPy programs in a semantics-preserving way.

Speakers
avatar for Vasiliy Kuznetsov

Vasiliy Kuznetsov

software engineer, Meta
Vasiliy is a software engineer on the PyTorch team at Meta focusing on PyTorch support for low precision training and inference.
avatar for Hristo Vrigazov

Hristo Vrigazov

Senior Machine Learning Engineer, LABS.IO
Hristo Vrigazov is a machine learning software engineer, who deeply believes in the extreme programming principles. He has more than 8 years of production software experience, more than 5 years of machine learning experience, and more than 4 years of experience with PyTorch. He's... Read More →
avatar for Jane Xu

Jane Xu

Software Engineer, Meta
I'm Jane and I focus on making our optimizers more...optimal :) in terms of stability, consistency, and performance. My favorite part of PyTorch is the people--they're all smart and cool and fun to learn from! I also like potatoes quite a lot.
avatar for Mario Lezcano

Mario Lezcano

Principal Software Engineer, Quansight
Mario is the team lead at Quansight of a team of 10 PyTorch core and domain libraries devs. He helps maintain areas like torch.compile, torch.linalg, complex number support, and autograd formulas.
avatar for Nicolas Hug

Nicolas Hug

Research Engineer, Meta
Nicolas is a software engineer in the PyTorch team at Meta, where he mainly contributes to the torchvision library. Prior to that, Nicolas was a research scientist at Columbia University, where he became part of the scikit-learn core development team. Nicolas holds a PhD in machine... Read More →
avatar for Yubo Gao

Yubo Gao

Research Software Development Engineer, CentML Inc.
Yubo is a PhD student at the University of Toronto, supervised by Prof. Gennady Pekhimenko. His research interests lie in performance prediction, analysis, and optimization, particularly for deep learning. He is a research engineer at CentML, who is primarily responsible for the... Read More →


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Distributed Poster Presentations
Automatic Pipeline Parallelism via PyTorch 2.0 Compile - Ho Young Jhoo, Seoul National University
As the size of models continues to grow, distributed training using multiple devices has become indispensable in order to train models that do not fit in a single device. While pipeline parallelism allows us to split and distribute the computation across multiple devices, current tools and strategies for pipelining in PyTorch suffer from poor usability and limited support for non-sequential models. In this work, we present a way to automatically build an efficient pipeline from a PyTorch FX graph generated by TorchDynamo, a key component of PyTorch 2.0 Compile. Our approach takes into account the devices’ memory capacity and communication latency, enabling the generation of optimal split points in the graph and pipeline schedules for any model that can be compiled by TorchDynamo. Our approach is transparent to users.

PyTorch/XLA Distributed - Jiewen Tan & Yeounoh Chung, Google Inc
Additional Authors: Alex Spiridonov, Google
We demonstrate how PyTorch/XLA users can unlock the powerful parallelism capability of Google Cloud TPU via (a) a novel distributed algorithm called General and Scalable Parallelization for ML Computation Graphs (GSPMD), (b) Google’s brand new MultiSlice TPU grid technology, and (c) a full-fledged LLM toolchain. GSPMD is an automatic parallelization system within which the XLA compiler will transform the single device program into a partitioned one with proper collectives, based on the user-provided sharding hints. This feature allows developers to write PyTorch programs as if they are on a single large device. MultiSlice is a technology aiming to provide horizontal scaling to support large models that require tens of thousands of TPU chips. Underneath, the XLA compiler implements a new set of collective ops that can efficiently leverage the underlying data center network connecting TPU pods. The LLM toolchain is a set of blessed tools that PyTorch/XLA provides to make the LLM training experience frictionless. It includes: sequential data loader, distributed checkpointing, TPU orchestration, and TPU debuggability. A demo LLM repo will be presented that leverages everything above.

Large Scale Training - Rodrigo Kumpera & Junjie Wang, Meta
Additional Author: Xilun Wu, Meta
We present an overview of our results in scaling up PyTorch distributed to thousands of GPUs and the improvements we shipped in 2023.

Speakers
XW

Xilun Wu

Software Engineer, Meta Platforms, Inc.
avatar for Ho Young Jhoo

Ho Young Jhoo

Graduate Student, Seoul National University
Ho Young Jhoo is a PhD student at the Seoul National University (SNU), who likes to combine programming languages and machine learning frameworks. Previously, he interned at FuriosaAI (a Korean startup) to optimize their compiler for an ML accelerator. He also developed a static tensor... Read More →
avatar for Jiewen Tan

Jiewen Tan

Staff Software Engineer, Google
Jiewen is a staff software engineer on Google's PyTorch team. He has a focus on distributed technologies that can power both large scale training and multi-device inference. He used to work on the PyTorch compiler team at Meta and contributed to LazyTensorCore and TorchDynamo.
avatar for Rodrigo Kumpera

Rodrigo Kumpera

CTO, Outropy
Rodrigo Kumpera is the CTO at Outropy. Previously he was an engineer at Meta working on PyTorch Distributed scalability, before that a Research SWE at MSR New York in the Reinforcement Learning Group and an early engineer at Xamarin, leading the team responsible for a dotnet runtime... Read More →
avatar for Junjie Wang

Junjie Wang

Software Engineer, Meta
Junjie is a software engineer working on PyTorch distributed, especially Tensor Parallel and Sequence Parallel. He worked on multiple projects within Meta before finding his passion in PyTorch.
avatar for Yeounoh Chung

Yeounoh Chung

Software Engineer, Google
Yeounoh supports the PyTorch/XLA team at Google, leading the PyTorch/XLA SPMD project. His work and interests span the PyTorch/XLA framework, Cloud TPU infrastructure and runtime.
avatar for Alexander Spiridonov

Alexander Spiridonov

Group Product Manager, Google
Alex Spiridonov is a Machine Learning Group Product Manager for Google Cloud AI Accelerators. Alex leads product development for Inference, ML Ecosystem integrations, and ML Frameworks (PyTorch, JAX, TensorFlow). Alex is passionate about making Google Cloud the best platform for AI... Read More →


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Higher Level Libs Poster Presentations
XFormers: Building Blocks for Efficient Transformers - Francisco Massa, Meta
We present xFormers, a library of low-level/optimized components to build efficient Transformers. This year, we pushed the multi-head attention even further, adding support for new use-cases. We also present our new components that enable training of larger transformers, leveraging longer sequence lengths, and faster decoding speeds for LLMs.

TorchTNT: A Lightweight Training Framework - Michael Gschwind & Gal Rotem, Meta
We present torchTNT, a lightweight training framework. TorchTNT integrates training infrastructure bindings, such as data fetching and checkpointing, in a simple training loop that provides a consistent interface for PyTorch model trainers. Particular focus is given to model optimization: torchTNT provides instrumentation to enable ML practitioners to efficiently and reliably identify bottlenecks through a combination of collected metrics.

Speakers
GR

Gal Rotem

Software Engineer, Meta
avatar for Michael Gschwind

Michael Gschwind

Director, Meta AI
Dr. Michael Gschwind currently leads AI training at Meta AI. He previously served as lead for GPU and Accelerator Inference at Meta and as lead for the MultiRay project. Prior to joining Meta, Michael held leadership roles at IBM. At IBM, he was a lead for 3 #1 Top500 supercomputers... Read More →
avatar for Francisco Massa

Francisco Massa

Research Engineer, Meta AI


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

On Device/Edge Poster Presentations
On-Device Training with ONNX Runtime for PyTorch Models - Kshama Pawar, Microsoft Corporation
On-Device Training is a new capability in ONNX Runtime to train models on the edge for Federated Learning and Personalization scenarios. ONNX Runtime provides an efficient, local trainer, that trains with device data on the edge. It supports multiple language bindings through simple APIs and supports Windows, Linux, Android and iOS for developers to target multiple platforms. It also supports minimum builds for smaller binaries with a base build of 1.5MB. This session gives an overview of On-Device Training with ONNX Runtime, followed by a technical discussion and a demonstration. We train a PyTorch inference model on the device, by converting it to an ONNX model and preparing it for local training. The model will learn on the edge device using local data and improve its quality to provide a better experience for the end customer. We will demonstrate through a practical application performing an image tagging scenario, where the application learns from local data and gives better predictions for tagging new images. With on-device training, PyTorch model developers can extend inference on edge to include training so as to personalize and improve the quality of their model.

PyTorch Made Efficient for the Edge: WASI-NN - Rishit Dagli, University of Toronto (Vector Institute and DGP Lab), Civo & Shivay Lamba, meilisearch
Running inference with large AI models can be fairly difficult on edge devices because of the heterogeneous nature of edge device architectures and their resource constraints. Given the sheer size of some models, they cannot be run directly on these devices. To overcome these limitations, in this talk we introduce the audience to WASI-NN, a standard proposal for performing ML inference. It allows WebAssembly modules to call the low-level bits required for running the inference. WASI-NN abstracts the module from the underlying system, allowing the host to use any available hardware. Thus the same module can run ML inference on multiple systems. PyTorch has libraries that allow developers to convert some built models to the ONNX format, then run them using an ONNX runtime. With the added support for ONNX in WASI-NN, guest WebAssembly modules can perform highly performant inference in a secure manner for these PyTorch models. The audience for this talk will benefit by learning how to run popular PyTorch models easily, securely, and in an optimized manner on the edge using WASI-NN.

Speakers
avatar for Shivay Lamba

Shivay Lamba

Ambassador, WASMEdge
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development. He is an Open Source Enthusiast and has been part of various programs like Google Code-in and Google Summer of Code as a Mentor and is currently an MLH Fellow. He has also worked... Read More →
avatar for Rishit Dagli

Rishit Dagli

Student, Research Scientist, University of Toronto, Civo
I am a CS Freshman at the University of Toronto. I love researching and working with Machine Learning, especially Computer Vision. I also maintain/contribute extensively to popular open-source projects like TensorFlow, PyTorch, Kubernetes, Kubeflow among others. I also love building... Read More →
avatar for Kshama Pawar

Kshama Pawar

Principal Program Manager, Microsoft
Kshama Pawar is a Program Manager on the AI Platform team at Microsoft. She helps drive Training initiatives for both large language models and on-device training through optimization engines like ONNX Runtime. She is also involved in the Triton community effort to improve developer... Read More →


Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase

5:00pm PDT

Attendee Welcome Reception & Poster Presentations
Tater Tot Truck - Potato & Mashed Potato (GF)
- Classic - Sea Salt & Ketchup
- Kamikaze - Sriracha Aioli, Unagi Glaze, Green Onion, Kimchi, Black Sesame Seeds
- Baked Potato - Bacon, Sour Cream, Chives, Cheddar
- Truffle - Garlic, Shaved Parmesan Cheese, White Truffle Oil, Sea Salt

Sliders
- Falafel Slider - Sun-Dried Tomato Hummus, Gluten Free Bun (VE, NF, GF)
- BBQ Chicken - Southern Coleslaw, Honey Mustard Dressing on Soft Brioche Bun
- Angus Beef - Aged White Cheddar, Bacon Onion Jam and Horseradish Aioli on Pretzel Bun (NF)
- Onion Rings and Garlic Parmesan Potato Chips

Wine & Beer
- Domestic Beer - Budweiser, Bud Light
- Premium & Imported Beer - Modelo, Corona, Heineken, Anchor Steam
- Wine - Canvas, Cabernet Sauvignon, Pinot Noir, Chardonnay
- Soft Drinks/Juices
- Water station

Monday October 16, 2023 5:00pm - 8:00pm PDT
Sponsor Showcase
 
Tuesday, October 17
 

8:00am PDT

Women and Non-Binary in PyTorch Breakfast with Quiana Berry, Red Hat


Navigating the AI Revolution Responsibly in the Open Source Community 
As the AI revolution reshapes the technological landscape, it is imperative to prioritize diversity, equity, and inclusion in open source communities. This engaging breakfast session at the PyTorch Conference invites diverse open source leaders to embark on a reflective journey. Together, we will explore the room for change in the AI ecosystem, establish governance guardrails to shield vulnerable communities from potential harm, and underscore the critical need to protect marginalized voices.
With a compelling blend of thought-provoking insights and creative expression, the session will dive into real-world examples, strategies for ethical AI governance, and showcase initiatives that integrate inclusivity into AI development.
Quiana Berry, a passionate advocate at the intersection of technology and social justice, will present a powerful poetry performance that encapsulates the nuances of the AI revolution and the importance of safeguarding diverse communities.
Join us in this transformative dialogue as we collectively envision an AI landscape that not only embraces innovation but also champions the well-being of every community. Let's harness the power of the open source PyTorch community to ignite change and create a future where diversity is at the heart of responsible AI development.
Key Takeaways:
1. Gain insights into the potential impact of AI on diverse communities and the need for inclusive AI development.
2. Understand the significance of ethical AI governance and how open source communities can shape responsible AI practices.
3. Explore real-life examples and initiatives that prioritize diversity and equity in AI development.
4. Experience a thought-provoking poetry performance that captures the essence of the discussion and inspires action.


Speakers
avatar for Quiana Berry

Quiana Berry

Technical Product Manager, Red Hat
Quiana is an Afro-Latina global citizen and DEI-B advocate with roots in Spanish Harlem NYC. With an educational background in Anthro/Bio/Chemistry from CUNY, she approaches technology from a human-centric lens and uses a research approach to business. She applies her passion for Human-Computer... Read More →



Tuesday October 17, 2023 8:00am - 8:45am PDT
Sequoia A

8:00am PDT

9:00am PDT

9:05am PDT

Keynote: How PyTorch Became the Foundation of the AI Revolution - Joe Spisak, Product Director, Meta
Speakers
avatar for Joe Spisak

Joe Spisak

Product Director, Meta Inc.
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI organization. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, open science and building developer... Read More →


Tuesday October 17, 2023 9:05am - 9:15am PDT
Grand Peninsula Ballroom D/E/F/G

9:15am PDT

Keynote: PyTorch 2.1 Technical Deep Dive
This Deep Dive provides an update on PT2 development since the last conference and dives into the key new features coming in PyTorch 2.1. It will provide high-level updates on compile, distributed, inference, export and edge.

Speakers
avatar for Mario Lezcano

Mario Lezcano

Principal Software Engineer, Quansight
Mario is the team lead at Quansight of a team of 10 PyTorch core and domain libraries devs. He helps maintain areas like torch.compile, torch.linalg, complex number support, and autograd formulas.
avatar for Mark Saroufim

Mark Saroufim

For loop rotator, Meta
Mark Saroufim is a PyTorch Engineer at Meta working on inference, compilers and community.
avatar for Mergen Nachin

Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill... Read More →
avatar for Joe Spisak

Joe Spisak

Product Director, Meta Inc.
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI organization. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, open science and building developer... Read More →
PW

Peng Wu

Engineering Manager, Meta
avatar for Will Constable

Will Constable

Software Engineer, Meta
YC

Yanan Cao

Tech Lead Manager, PyTorch, Meta



Tuesday October 17, 2023 9:15am - 10:15am PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions
  • Slides Attached Yes

10:15am PDT

Coffee Break
- Deconstructed Yogurt Parfait: Greek Yogurt, Seasonal Fruit Compote, Fresh Seasonal Berries, House-Made Granola (VE without yogurt)
- Mini Seasonal Muffins *Includes Gluten-Free Options (VT)

Tuesday October 17, 2023 10:15am - 10:45am PDT
Atrium

10:15am PDT

Sponsor Showcase
Tuesday October 17, 2023 10:15am - 10:45am PDT
Atrium

10:45am PDT

Lightning Talk: Tensor Query Processing - Matteo Interlandi, Microsoft
The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in new hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices, which are now part of the offerings of major cloud providers. Meanwhile, by hiding the low-level complexity through a tensor-based interface, machine learning frameworks such as PyTorch allow data scientists to efficiently exploit the exciting capabilities offered by the new hardware. In this talk, we will present how databases can ride the wave of innovation happening in the AI space thanks to Tensor Query Processor (TQP): a SQL query processor leveraging the tensor interface of PyTorch. TQP can efficiently run the full TPC-H benchmark by implementing novel algorithms for executing relational operators on the specialized tensor routines provided by PyTorch. Meanwhile, TQP can target various hardware while only requiring a fraction of the usual development effort.
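
To give a flavor of what executing relational operators on tensor routines can mean, the sketch below evaluates a query shaped like `SELECT key, SUM(val) FROM t WHERE val > 1 GROUP BY key` with plain PyTorch operations. It is an illustrative assumption and does not reproduce TQP's actual algorithms. The column tensors `keys` and `vals` are made-up data.

```python
import torch

# Two columns of a hypothetical table t(key, val).
keys = torch.tensor([0, 1, 0, 2, 1, 0])
vals = torch.tensor([5.0, 1.0, 2.0, 4.0, 3.0, 0.5])

# WHERE val > 1: a boolean mask selects the qualifying rows.
mask = vals > 1.0
f_keys, f_vals = keys[mask], vals[mask]

# GROUP BY key: map each remaining row to the index of its group.
groups, inverse = torch.unique(f_keys, return_inverse=True)

# SUM(val): scatter-add every row's value into its group's slot.
sums = torch.zeros(groups.numel(), dtype=f_vals.dtype).index_add_(0, inverse, f_vals)
```

Because each step is an ordinary tensor operation, the same code can run on whatever device the tensors live on. That is the kind of hardware portability the abstract refers to.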

Speakers
avatar for Matteo Interlandi

Matteo Interlandi

Principal Scientist, Microsoft
I am a Principal Scientist in the Gray Systems Lab at Microsoft, working at the intersection between Machine Learning and Database systems. My work has received an honorable mention at SIGMOD 2021, a best demo award at VLDB 2022, a best paper runner-up at VLDB 2023, and it was featured... Read More →



Tuesday October 17, 2023 10:45am - 10:55am PDT
Sequoia A
  Applications
  • Slides Attached Yes

10:45am PDT

Lightning Talk: State of PyTorch - Alban Desmaison, Meta
It takes a village to build an open-source framework, all thanks to our awesome community of contributors, partners and ecosystem tools. This talk gives a run-through of who builds PyTorch, new and upcoming improvements to the framework and how to get involved.

Speakers
avatar for Alban Desmaison

Alban Desmaison

Research Engineer, Meta



Tuesday October 17, 2023 10:45am - 10:55am PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

10:45am PDT

What's New for Dynamic Shapes in PyTorch 2.1 - Edward Yang, Meta
Last year, we announced dynamic shapes support in PT2. We have come a long way since that announcement, with a lot of new features, case studies and debugging tools for using dynamic shapes. This talk will dive into all of the new developments for dynamic shapes in PT2, arguably one of the newest and most unusual features of the PT2 compiler stack.

Speakers
avatar for Edward Yang

Edward Yang

Research Engineer, Meta
Edward Yang has worked on PyTorch at Meta since nearly the very beginning. Currently, he works on all aspects of PT2, but with a particular focus on dynamic shapes support across the stack.



Tuesday October 17, 2023 10:45am - 11:10am PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed
  • Audience Experience Level Beginner
  • Slides Attached Yes

10:45am PDT

Introducing ExecuTorch from PyTorch Edge: On-Device AI Stack and Ecosystem, and Our Unique Differentiators - Mergen Nachin & Orion Reblitz-Richardson, Meta
ExecuTorch announcement at the PyTorch Conference 2023.

This presentation focuses on the technological advancements in PyTorch Edge, our on-device AI stack. We will provide an overview of the current market landscape and delve into ExecuTorch’s architecture, unique differentiators, and design trade-offs. Discover how PyTorch Edge in general, and ExecuTorch in particular, bridge the gap between research and production, offering performance, portability, and productivity for on-device AI applications.

Speakers
avatar for Orion Reblitz-Richardson

Orion Reblitz-Richardson

Engineering Manager, PyTorch, Meta
Orion supports multiple PyTorch core teams at Meta including privacy, on-device, edge, and developer productivity. His passion lies at the intersection of people and AI, be that egocentric AI, developer productivity enhancement, or model interpretability. He has an EECS BS/MEng from... Read More →
avatar for Mergen Nachin

Mergen Nachin

Software Engineer, Meta
Mergen Nachin is a Software Engineer specializing in creating rich AI experiences on low latency, high performance, and privacy-aware embedded systems. With a background in distributed systems, developer infrastructure, remote sensing, and localization, he brings a versatile skill... Read More →



Tuesday October 17, 2023 10:45am - 11:10am PDT
Grand Peninsula Ballroom A/B/C
  Production

11:00am PDT

Lightning Talk: Energy-Efficient Deep Learning with PyTorch and Zeus - Jae-Won Chung, University of Michigan
Until now, we just wanted to make things faster and faster. However, especially with the recent growth of GenAI, Deep Learning has become one of the primary workloads of cloud datacenters, which already take up 2-3% of the world's electricity usage. Therefore we ask: How much room is there for energy optimization? Can we get free energy reduction without slowdown? What knobs do we have available? How do we even measure energy consumption? In this talk, I aim to persuade the audience of the importance of regarding energy as a first-class metric for deep learning, and present the current state of deep learning energy optimization with Zeus (https://ml.energy/zeus). Integrated with PyTorch, Zeus provides convenient tools for GPU time and energy measurement inside user training scripts, and transparently profiles and optimizes GPU-side knobs to maximize energy efficiency. Finally, I'll share our vision towards making sustainable deep learning as easy as possible while being mindful of existing important metrics such as speed and model quality.

Speakers
avatar for Jae-Won Chung

Jae-Won Chung

PhD Student, University of Michigan
Jae-Won Chung is a third year PhD student at the University of Michigan. His research interests are at the intersection of software systems and deep learning, with a recent focus on sustainability issues like energy and carbon. He leads the ML.ENERGY (https://ml.energy) initiativ... Read More →



Tuesday October 17, 2023 11:00am - 11:10am PDT
Sequoia A
  Applications
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:00am PDT

Lightning Talk: TorchFix - a Linter for PyTorch-Using Code with Autofix Support - Sergii Dymchenko, Meta
TorchFix is a Python code static analysis tool - a linter with autofix capabilities - for users of PyTorch. It can be used to find and fix issues like usage of deprecated PyTorch functions and non-public symbols, and to adopt PyTorch best practices in general.
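As a rough illustration of what a linter with autofix does, here is a minimal sketch built on Python's standard `ast` module. It is not TorchFix's implementation: the rename table (`torch.old_fn` to `torch.new_fn`) and the helper names `dotted_name`, `lint` and `autofix` are hypothetical, and the textual fix is deliberately naive.

```python
import ast

# Hypothetical rename table, for illustration only (not TorchFix's real rules).
RENAMES = {"torch.old_fn": "torch.new_fn"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def lint(source):
    """Find calls to renamed functions; return sorted (line, old, new) findings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RENAMES:
                findings.append((node.lineno, name, RENAMES[name]))
    return sorted(findings)


def autofix(source):
    """Naive autofix: textual replacement restricted to the flagged lines."""
    lines = source.splitlines()
    for lineno, old, new in lint(source):
        lines[lineno - 1] = lines[lineno - 1].replace(old + "(", new + "(")
    return "\n".join(lines)


sample = "import torch\ny = torch.old_fn(x)\nz = torch.add(x, y)\n"
findings = lint(sample)
fixed = autofix(sample)
print(findings)
print(fixed)
```

The sketch only parses the source and never imports or runs it, which is what makes this kind of analysis static. A production tool would likely edit a syntax tree that preserves formatting rather than patch lines of text.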

Speakers
avatar for Sergii Dymchenko

Sergii Dymchenko

Research Engineer, Meta
Sergii is a Research Engineer at Meta, working on improving the PyTorch developer experience.



Tuesday October 17, 2023 11:00am - 11:10am PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:15am PDT

Lightning Talk: Tensor and 2D Parallelism - Rodrigo Kumpera & Junjie Wang, Meta
Speakers
avatar for Junjie Wang

Junjie Wang

Software Engineer, Meta
Junjie is a software engineer working on PyTorch distributed, especially Tensor Parallel and Sequence Parallel. He worked on multiple projects within Meta before eventually finding his passion in PyTorch.
avatar for Rodrigo Kumpera

Rodrigo Kumpera

CTO, Outropy
Rodrigo Kumpera is the CTO at Outropy. Previously he was an engineer at Meta working on PyTorch Distributed scalability, before that a Research SWE at MSR New York in the Reinforcement Learning Group and an early engineer at Xamarin, leading the team responsible for a dotnet runtime... Read More →



Tuesday October 17, 2023 11:15am - 11:25am PDT
Sequoia A
  Applications

11:15am PDT

Lightning Talk: CUDAGraph in a Partial Graph World - Elias Ellison, Meta
CUDAGraph made safe and easy to use in torch.compile.

Speakers
avatar for Elias Ellison

Elias Ellison

Software Engineer, Meta
Elias has been working on the PyTorch team for four years, most recently on the torch.compile stack.



Tuesday October 17, 2023 11:15am - 11:25am PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

11:15am PDT

What's New for PyTorch Developer Infrastructure - Eli Uriegas & Omkar Salpekar, Meta
A few updates on the state of the world for PyTorch Developer Infrastructure as well as a lookahead into future projects. Also some info on how we do releases for all of PyTorch at scale.

Speakers
avatar for Omkar Salpekar

Omkar Salpekar

Software Engineer, Meta AI
Omkar has worked on PyTorch since 2019, focusing on Distributed Training and Developer Infrastructure. He's worked on PyTorch's Data- and Model-Parallel training support for large models, elastic training for large-scale jobs, automating PyTorch's package release infrastructure... Read More →
avatar for Elias Uriegas

Elias Uriegas

Engineering Manager, Meta
Eli Uriegas currently supports the PyTorch Developer Infrastructure team. Previously he had worked on Developer Infrastructure at Docker, as well as Rackspace. Eli has interests in enabling developer productivity at scale and without breaking the bank and has been at the forefront... Read More →



Tuesday October 17, 2023 11:15am - 11:40am PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:15am PDT

PyTorch Edge: Developer Journey for Deploying AI Models Onto Edge Devices - Mengwei Liu & Angela Yi, Meta
This technical session caters to on-device AI engineers, providing insights into developer flows with PyTorch Edge. Through our semi-live demos and code showcases, attendees will embark on a journey to understand the research-to-production workflow using PyTorch Edge. Discover how our APIs and tooling facilitate productivity while ensuring portability across diverse hardware platforms, ranging from mobile phones to embedded devices and laptops.

Speakers
avatar for Mengwei Liu

Mengwei Liu

Software Engineer, Meta
I'm a software engineer in Meta AI, focusing on building a platform that assists users to deploy PyTorch models to edge devices easily.
avatar for Angela Yi

Angela Yi

Software Engineer, Meta
I'm a software engineer on the PyTorch Compilers team, working mainly on export and integrating it into runtimes like ExecuTorch and AOTInductor!



Tuesday October 17, 2023 11:15am - 11:40am PDT
Grand Peninsula Ballroom A/B/C
  Production

11:30am PDT

Lightning Talk: Seismic Data to Subsurface Models with OpenFWI - Benjamin Consolvo, Intel
Obtaining an accurate "picture" of the subsurface is not as simple as snapping a picture on a smartphone. Seismic exploration is a key component to creating images of the subsurface and finding essential minerals and oil and gas. The process of building images of the subsurface is akin to ultrasound technology used to image the human body. One of the best known physics-based methods to create geologically-accurate images of the subsurface is called full-waveform inversion (FWI). It is a process by which we can take raw seismic data and through applying an iterative physics-based approach recreate the velocities of sound waves in the subsurface (which can be understood as an image). However, one of the challenges of this physics-based approach is that it is computationally expensive and it typically relies heavily on a good initial velocity model that is close to the answer. I will walk you through how I quickly trained a neural network with PyTorch on the latest 4th Gen. Xeon CPU, going directly from seismic data to a subsurface model and bypassing the need for an accurate starting model.

Speakers
avatar for Benjamin Consolvo

Benjamin Consolvo

AI Software Engineering Manager, Intel
Ben Consolvo is an AI Solutions Engineering Manager at Intel. He has been building a team and a program around Intel’s AI technology paired with Intel’s hardware offerings. He brings a background and passion in data science, particularly in deep learning (DL) and computer vision... Read More →



Tuesday October 17, 2023 11:30am - 11:40am PDT
Sequoia A
  Applications
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:30am PDT

Lightning Talk: AOTInductor: Ahead-of-Time Compilation for PT2 Exported Models - Bin Bao, Meta
This talk introduces AOTInductor, an Ahead-Of-Time version of TorchInductor. AOTInductor enables the compilation of exported models into shared libraries, offering a seamless and high-performance deployment solution for PyTorch models in non-Python environments. By leveraging torch.export, AOTInductor ensures a streamlined workflow for efficient PyTorch model deployment.

Speakers
avatar for Bin Bao

Bin Bao

Software Engineer, Meta
Bin Bao is a software engineer working with the PyTorch Compiler team at Meta. He focuses on developing AOTInductor, an Ahead-of-Time compiler for the PyTorch2 export path. Before joining Meta in 2019, Bin gained experience working on compilers at companies like Adobe, Qualcomm... Read More →



Tuesday October 17, 2023 11:30am - 11:40am PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

11:45am PDT

Lightning Talk: MultiRay: An Accelerated Embedding Service for Content Understanding - Michael Gschwind, Meta
We present an overview of MultiRay, our high performance embedding service. The service offers a shared inference architecture that provides embeddings for content, and serves a small set of foundation models shared by all use cases. At present, we are serving 3 embedding services, for text understanding (TextRay), image understanding (ImageRay) and multi-modal whole-post understanding (text, image, etc) (PostRay). The system serves over 800B requests daily, with up to 20M queries per second, serving over 125 different use cases.
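To relate the two throughput figures quoted above, the following back-of-the-envelope sketch converts the daily volume into an average rate and compares it with the stated peak. It assumes exactly 800B requests per day, whereas the abstract says "over 800B", so the average is a lower bound and the ratio an upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_requests = 800e9  # "over 800B requests daily" (treated as exactly 800B here)
peak_qps = 20e6         # "up to 20M queries per second"

# Average request rate implied by the daily volume.
average_qps = daily_requests / SECONDS_PER_DAY

# How far the stated peak sits above that average.
peak_to_average = peak_qps / average_qps

print(f"average: {average_qps / 1e6:.2f}M requests/s")
print(f"peak / average: {peak_to_average:.2f}x")
```

Under that assumption the average comes to roughly 9.26M requests per second, so the quoted peak is a little over twice the average load.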

Speakers
avatar for Michael Gschwind

Michael Gschwind

Director, Meta AI
Dr. Michael Gschwind currently leads AI training at Meta AI. He previously served as lead for GPU and Accelerator Inference at Meta and as lead for the MultiRay project. Prior to joining Meta, Michael held leadership roles at IBM. At IBM, he was a lead for 3 #1 Top500 supercomputers... Read More →



Tuesday October 17, 2023 11:45am - 11:55am PDT
Sequoia A
  Applications
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:45am PDT

Lightning Talk: Accelerating Inference on CPU with Torch.Compile - Jiong Gong, Intel
For the torch.compile CPU backend, we have optimized the static shapes of the float32 path and achieved good performance speedups on popular models. Starting with PyTorch 2.0, we have further enhanced this feature by addressing several issues and optimizing the bfloat16 precision path. The dynamic shape path is also supported, which allows users to get good performance on dynamic shape models, such as GPTJ and Llama, as well as using low precision bfloat16 data type to further improve performance on the 4th generation of Intel Xeon Scalable Processors (Sapphire Rapids) using Advanced Matrix Extensions (AMX) instruction set extension and lower memory footprint. In this talk, we will introduce the key optimization technologies used in the CPU inference path of torch.compile, such as GEMM fusions, vectorization of low precision bfloat16 path, and constant folding with freezing path. We will also discuss how to solve issues that arose when supporting the path of the dynamic shape. Currently, the dynamic shape and bfloat16 paths work as well as the static shape path. The geometric mean speedup of the bfloat16 path can range from 1.4x to 2.3x compared to eager mode on Sapphire Rapids.
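For readers unfamiliar with the metric, a geometric mean speedup is the n-th root of the product of per-model speedups. It suits ratios better than an arithmetic mean because a single outlier model skews it less. The sketch below uses made-up per-model numbers purely for illustration; they are not results from the talk.

```python
import statistics

# Hypothetical per-model speedups over eager mode (illustrative values only).
speedups = [1.2, 1.5, 2.0, 2.4]

# Geometric mean: n-th root of the product of the ratios.
geomean = statistics.geometric_mean(speedups)

# Arithmetic mean shown for comparison; it is never smaller than the geometric mean.
arith = statistics.mean(speedups)

print(f"geometric mean speedup: {geomean:.3f}x")
print(f"arithmetic mean speedup: {arith:.3f}x")
```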

Speakers
avatar for Jiong Gong

Jiong Gong

Principal Engineer, Intel
Jiong Gong is a principal engineer and a software architect from Intel working on PyTorch optimizations. He is also a PyTorch module maintainer on CPU performance and compiler frontend.



Tuesday October 17, 2023 11:45am - 11:55am PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:45am PDT

PyTorch Korea User Group: The Beginning, Present, and Future - Junghwan Park, PyTorch Korea User Group
As a lead maintainer, I'd like to talk about the PyTorch community in Korea. Here are some agenda items I'm currently thinking about:
- How we got started (since 2018)
- How we're growing (during Covid-19)
- What we want to do in the future
- A guide for people who want to start a local community
I’d like to talk about what inspired me to start the community, what I've learned along the way, and what other maintainers and I wish we had done more of. I'll also include some tips and suggestions for those who want to start a local community like the one in Korea. With this talk, I hope to make more people aware of how the PyTorch user community in Korea has been self-sustaining, and give them the courage to get involved.

Speakers
avatar for Junghwan Park

Junghwan Park

Lead Maintainer, PyTorch Korea User Group
- Data engineer at a telecommunication company in Korea
- Lead maintainer at PyTorch Korea User Group
- Interested in open-source, community, and time-series forecasting



Tuesday October 17, 2023 11:45am - 12:10pm PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

11:45am PDT

PyTorch Edge: Vendor Integration Journey for Compilers and Backends - Kimish Patel & Chen Lai, Meta Platforms
Join this technical session tailored for partners and hardware vendors aiming to provide high-performance solutions to their customers, including on-device PyTorch users. We will showcase how to integrate backend and compiler toolchain natively with PyTorch Edge IR without compromising on performance. Explore our well-defined entry points and API for integrating compiler passes, delegates, and custom kernel implementations, all without any intermediate conversions.

Speakers
avatar for Chen Lai

Chen Lai

Software Engineer, Meta
Chen is a Software Engineer on the PyTorch Edge team and has been focusing on the delegate framework to bring up different backends including CPU, DSP, NPU and more. In the past, she drove the release for Mobile Interpreter in PyTorch 1.9 and mainly focused on the mobile runtime... Read More →
avatar for Kimish Patel

Kimish Patel

Software Engineer, Meta Platforms
Kimish has worked on enabling PyTorch on Meta's family of apps, primarily focusing on performance optimizations. His past experiences include hardware/software co-design, CPU architecture, and CPU/GPU performance optimization.



Tuesday October 17, 2023 11:45am - 12:10pm PDT
Grand Peninsula Ballroom A/B/C
  Production
  • Audience Experience Level Beginner
  • Slides Attached Yes

12:00pm PDT

Lightning Talk: Uplink Interference Optimizer, How to Optimize a Cellular Network in a Single Shot with GNNs - Oscar Llorente Gonzalez, Ericsson
Optimizing cellular networks has been a very difficult task for a long time. In these networks multiple problematic issues appear, and the high number of parameters and variables to optimize makes it a difficult problem even for radio experts. Here an optimizer for a cellular network is presented, the Uplink Interference Optimizer. Specifically, the uplink interference problem (degradation of the signal transmitted from a user terminal to a base station) will be solved by constructing a model that predicts a variable that reflects the interference level (SINR) and then optimizing the parameters of the cellular network, based on the model that has been built, to reduce it. That way, we achieve the optimization in a single step, improving on previous RL-based solutions that must iterate over the real cellular network for several weeks. The simulator model will be based on Graph Neural Networks (constructed with PyTorch and PyTorch Geometric), allowing us to consider the neighborhood of a cell to make a prediction, which enhances the prediction accuracy over all older models. Then, any algorithm could be run to improve the parameters based on the simulator model we have constructed.
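The "predict, then optimize against the model" idea can be sketched in a few lines. Everything below is a toy under stated assumptions: `surrogate_sinr` is a hypothetical closed-form stand-in for the trained GNN simulator, the single tunable parameter `power_mw` and the interference and noise constants are invented for illustration, and a grid search stands in for "any algorithm". Only the SINR definition, signal over interference plus noise, is standard.

```python
import math


def sinr_db(signal_mw, interference_mw, noise_mw):
    """SINR in decibels: 10 * log10(S / (I + N))."""
    return 10 * math.log10(signal_mw / (interference_mw + noise_mw))


def surrogate_sinr(power_mw):
    """Hypothetical stand-in for the learned simulator.

    Toy assumption: raising uplink power raises the useful signal linearly,
    but the interference it causes in neighbouring cells grows quadratically.
    """
    interference_mw = 0.02 * power_mw ** 2
    return sinr_db(power_mw, interference_mw, noise_mw=0.5)


# Single-shot optimization: search candidate settings on the model only,
# never touching the live network during the search.
candidates = [0.5 * k for k in range(1, 41)]  # 0.5 mW .. 20.0 mW
best_power = max(candidates, key=surrogate_sinr)
best_sinr = surrogate_sinr(best_power)

print(f"best power setting: {best_power} mW, predicted SINR: {best_sinr:.2f} dB")
```

The point of the design is that every candidate is scored by the model, so the search costs compute time rather than weeks of trial and error on the real network. The quality of the result then depends entirely on how accurate the simulator is.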

Speakers
avatar for Oscar Llorente Gonzalez

Oscar Llorente Gonzalez

Data Scientist, Ericsson
I am a Data Scientist at Ericsson Cognitive Software. I am passionate about Artificial Intelligence and my research interests are Explainable Artificial Intelligence and Geometric Deep Learning.



Tuesday October 17, 2023 12:00pm - 12:10pm PDT
Sequoia A
  Applications

12:00pm PDT

Lightning Talk: Lessons from Using PyTorch 2.0 Compile in IBM's Watsonx.AI Inference - Antoni Viros i Martin, IBM Research
In this talk we will cover lessons learned about PT 2.0 compile after using it in IBM’s Watsonx.AI stack with NVIDIA GPUs and custom IBM accelerators as the main inference acceleration solution. Specifically, we will cover the results of our latency and throughput experiments with a range of LLMs, including encoder-only, encoder-decoder, and decoder-only transformer models. We will talk about performance comparisons with other approaches in the field as well as our collaboration with the core PyTorch team to fix some of the bugs we have encountered when using features such as dynamic shapes and CUDA graph trees. We will also comment on how we have been using the torch.compile() API to compile and run models on IBM’s AIU accelerator and why we have made that choice. Finally, we will also cover the interaction of parallel approaches such as Tensor Parallel for bigger models combined with Compile for inference workloads.

Speakers
avatar for Antoni Viros i Martin

Antoni Viros i Martin

Research Scientist, IBM Research
Antoni is currently a Research Scientist at IBM Research, investigating optimization approaches for ML inference and training, with a focus on open-source technologies such as PyTorch. He holds a PhD in Aerospace Engineering from Texas A&M University, and has previously worked at... Read More →



Tuesday October 17, 2023 12:00pm - 12:10pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

12:10pm PDT

Lunch Break
Grab and Go Lunch

- Apple
- Tri-Color Tortellini Pasta Salad

3 Sandwiches to Choose From:
- Italian on Seeded Hoagie with Mortadella, Salami, Capicola, Pepperoni, Olive Tapenade, Balsamic Glaze
- Chicken pesto on focaccia with roasted chicken breast, pesto, garlic aioli, lettuce, tomato, roasted red pepper
- Tofu banh mi wrap with pickled and julienned jicama, carrots and daikon, cilantro, sliced jalapeno and vegan sriracha aioli (GF, Vegan)

- Chocolate Chip Cookies



Tuesday October 17, 2023 12:10pm - 1:10pm PDT
Atrium

1:10pm PDT

Keynote: Welcome & Opening Remarks - Joe Spisak, Product Director, Meta
Speakers
avatar for Joe Spisak

Joe Spisak

Product Director, Meta Inc.
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI organization. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, open science and building developer... Read More →


Tuesday October 17, 2023 1:10pm - 1:25pm PDT
Grand Peninsula Ballroom D/E/F/G

1:25pm PDT

Keynote: Refik Anadol Studio: Rainforest AI Research - Christian Burke, Lead Data Scientist & Refik Anadol, Media Artist & Director, Refik Anadol Studio
During PyTorch 2022, Refik Anadol and Christian Burke discussed the Studio's history in AI art and unveiled its future endeavors in researching the applications of AI to model and preserve the Amazon Rainforest. This year, we would like to discuss and demonstrate our custom AI Rainforest models, showcasing generative fauna, flora, and funga from the Amazon Rainforest, as well as highlight our research in AI-based Language Preservation for an indigenous tribe, the Yawanawa. In addition, we would like to exhibit how PyTorch enables us to conduct this research and how we are using it to create AI models to preserve the rainforest and humanity.

Speakers
avatar for Refik Anadol

Refik Anadol

Director, Refik Anadol Studio
Refik Anadol (b. 1985, Istanbul, Turkey) is an internationally renowned media artist, director, and pioneer in the aesthetics of machine intelligence. He currently resides in Los Angeles, California, where he owns and operates Refik Anadol Studio and RAS LAB, the Studio’s research... Read More →
avatar for Christian Burke

Christian Burke

Lead Data Scientist, Refik Anadol Studio
Christian Burke, based in Los Angeles, CA, operates as the Lead Data Scientist at Refik Anadol Studio (RAS). RAS fuses data, AI, and machine learning to craft captivating public art displays. In 2018, Burke was celebrated for amassing the then-largest artistic dataset during his work... Read More →


Tuesday October 17, 2023 1:25pm - 1:40pm PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions

1:40pm PDT

Keynote: AMD & PyTorch: A Powerful Combination for Generative AI - Negin Oliver, Sr. Director, Data Center GPU & Accelerated Processing, AMD
Artificial Intelligence (AI) is a rapidly evolving field with diverse applications, and AMD is at the forefront of this revolution, offering a wide-ranging portfolio of AI solutions. In this keynote talk, learn about AMD’s extensive portfolio of AI solutions from Cloud to Edge to Endpoints and their support for the PyTorch framework. We will also showcase the growing AI ecosystem around AMD solutions facilitating a rich experience for AI users.
By the end of this talk, you will learn how to leverage the synergy of AMD and PyTorch to create amazing generative AI applications with ease and efficiency.

Speakers
avatar for Negin Oliver

Negin Oliver

Sr. Director, Data Center GPU Business Development, AMD
Negin Oliver is a Senior Director of Business Development at AMD, where she leads the strategy and execution of AMD’s data center graphics solutions for Machine Learning & High-Performance Computing. Negin has over 25 years of experience in the semiconductor industry, spanning product... Read More →


Tuesday October 17, 2023 1:40pm - 1:45pm PDT
Grand Peninsula Ballroom D/E/F/G

1:45pm PDT

Keynote: Building an Interoperable Ecosystem for Generative AI - Stella Biderman, Lead Scientist at Booz Allen Hamilton & Executive Director, EleutherAI
Generative AI can be daunting given the rapid growth and new information flooding the broader community. Leveraging her extensive experience in the construction of Generative AI ecosystems, Stella Biderman will highlight areas of the open-source AI community ready to offer invaluable support to industry beginners. Her keynote will discuss the endeavors of over a dozen organizations collaboratively dedicated to providing such resources and methods to become actively engaged in the field, as well as the importance of interoperability.

Speakers
avatar for Stella Biderman

Stella Biderman

Executive Director, Eleuther AI
I am a mathematician and artificial intelligence researcher at Booz Allen Hamilton and EleutherAI who specializes in natural language processing, ML interpretability, and AI ethics. Over the past several years my work has focused on making cutting edge AI technologies more widely accessible... Read More →



Tuesday October 17, 2023 1:45pm - 2:00pm PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions
  • Slides Attached Yes

2:00pm PDT

Keynote: The Promise of PyTorch as a General-Purpose Array-Oriented Computational Backend - Travis Oliphant, Founder & CEO, Quansight
Array-oriented programming is a key paradigm of the SciPy and PyData, or SciPyData, ecosystem. Most operations in science, engineering, and AI/ML are naturally based around N-dimensional array, or tensor, operations. Domain experts can typically write the algorithms they are implementing using high-level tensor primitives. For over 28 years this has been done in Python primarily using NumPy (previously Numeric) as the foundational Nd-array object. As many additional array objects have been built over the past 8 years to support the growing interest in deep learning, the SciPyData ecosystem has produced the Array API as part of the Data APIs initiative (https://data-apis.org) to assist in meeting the needs of that community. New features in PyTorch have helped PyTorch implement this Array API, which has enabled libraries like SciPy and scikit-learn to add support for PyTorch as the backend. Quansight has worked closely with Meta and other PyTorch sponsors to enable libraries using the Array API to reliably use PyTorch in their general-purpose workflows outside of just deep learning. This enables the entire scientific community to potentially take advantage of PyTorch investment in run-times on GPUs, TPUs, and other parallel hardware. The promise of array-oriented computing has always been that by writing at a high level, the code can be run with optimizations on a variety of hardware. With Data APIs and PyTorch, this promise is becoming a reality.

Speakers
avatar for Travis Oliphant

Travis Oliphant

CEO, Quansight
Dr. Oliphant has a Ph.D. in Biomedical Engineering from the Mayo Clinic, and M.S. and B.S. degrees in Electrical Engineering (and Math) from Brigham Young University. Travis has worked extensively with Python for numerical and scientific programming since 1997, and was the primary... Read More →



Tuesday October 17, 2023 2:00pm - 2:05pm PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions
  • Slides Attached Yes

2:10pm PDT

Keynote: How to Leverage PyTorch to Scale AI Training and Inferencing - Raghu Ganti, Principal Research Scientist, IBM Research
As generative AI models grow larger and more complex, the ability to scale these models becomes a critical challenge facing enterprises today. How can developers leverage PyTorch to maximize the value of these large, multi-billion parameter models to make them run faster, more efficiently, and more affordably both on-prem and in the cloud? This keynote will highlight various levers that PyTorch FSDP provides to scale AI model training on hundreds of GPUs and how IBM applied them to obtain state-of-the-art training throughput in models with up to 70 billion parameters. It will also discuss how we combined the latest advancements in PyTorch compile with a custom tensor parallel implementation to achieve significantly reduced inferencing latency.



Speakers
avatar for Raghu Ganti

Raghu Ganti

Principal Research Scientist, IBM
I am a Principal Research Scientist at IBM T. J. Watson Research Center, Yorktown Heights. I co-lead the Foundation Models AI training and validation platform, built on OpenShift. My team primarily contributes to the PyTorch training and inference components, with the mission of democratizing... Read More →



Tuesday October 17, 2023 2:10pm - 2:25pm PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions
  • Slides Attached Yes

2:25pm PDT

Keynote: The Value of Open Source for the Enterprise - Priya Nagpurkar, Vice President, Hybrid Cloud Platform and Developer Productivity, IBM Research
Open-source communities accelerate innovation by empowering members to harness collective insights and build on a vast prior body of work. However, achieving a successful and responsible open-source community, and how enterprise companies should contribute to these communities, can be a delicate balance. Priya Nagpurkar, who leads the strategy for AI and cloud platforms at IBM Research, will discuss what IBM looks for in open-source collaborators, how PyTorch forwards IBM's strategic goals, and the role open-source technologies will play in generative AI’s future.

Speakers
avatar for Priya Nagpurkar

Priya Nagpurkar

Vice President, Hybrid Cloud and AI Platform, IBM Research
Priya Nagpurkar leads the Hybrid Cloud Platform and Developer Productivity mission at IBM's T.J. Watson Research Center. She leads a global team of distributed systems, programming models, and AI experts working on all interesting aspects of the cloud platform -- from serverless computing... Read More →


Tuesday October 17, 2023 2:25pm - 2:30pm PDT
Grand Peninsula Ballroom D/E/F/G

2:30pm PDT

Keynote: Intel and PyTorch: Enabling AI Everywhere with Ubiquitous Hardware and Open Software - Fan Zhao, Senior Director, Deep Learning Frameworks and Technology, Intel
Generative AI technologies have accelerated our journey to an “AI Everywhere” reality and enabling greater access to AI has great societal value. Ubiquitous hardware and open software are the keys to democratizing AI and its benefits. Intel’s rich AI hardware and software portfolios in conjunction with optimized open frameworks such as PyTorch and its ecosystem libraries provide compelling options for any business or entity looking to innovate, develop, and deploy AI applications at scale. In this talk, we are going to provide a brief overview of Intel’s AI solutions, explore how we are working with the PyTorch community to advance “AI Everywhere”, and showcase how you can easily leverage hardware and software AI acceleration to seamlessly optimize your applications.

Speakers
avatar for Fan Zhao

Fan Zhao

Senior Director, Intel
Fan Zhao is a Senior Engineering Director of Deep Learning frameworks and Technology in AI & Analytics (AIA) at Intel. She has nearly two decades of experience in open-source projects and currently leads Intel CPU and GPU enabling and optimizations in deep learning frameworks, libraries... Read More →



Tuesday October 17, 2023 2:30pm - 2:35pm PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions
  • Slides Attached Yes

2:35pm PDT

Keynote: PyTorch Lightning: Powering the GenAI Revolution from Research to the Enterprise - William Falcon, Founder and CEO, Lightning AI
Since its release in 2019, PyTorch Lightning has boosted the adoption of PyTorch in both research and the enterprise, and it is now at the center of the GenAI revolution. William Falcon, CEO at Lightning AI, introduces PyTorch Lightning 2.1, the latest release bringing key new features to Trainer and Fabric specifically targeted at large model support. He lays down the vision for the future and announces the latest initiative around Lightning AI's commitment to PyTorch and its community.

Speakers
avatar for William Falcon

William Falcon

Founder and CEO, Lightning AI
William Falcon is the creator of PyTorch Lightning, a deep learning framework. He is also the founder and CEO of Lightning AI, and was previously a co-founder and CTO of NextGenVest. He began working on these projects while completing a Ph.D. at NYU, which was funded by Google DeepMind... Read More →



Tuesday October 17, 2023 2:35pm - 2:40pm PDT
Grand Peninsula Ballroom D/E/F/G
  Keynote Sessions
  • Slides Attached Yes

2:45pm PDT

Keynote: The Llama Ecosystem: Past, Present and Future - Joe Spisak, Product Director, Meta
Join Meta on a journey through the evolution of Llama — our new open source large language model (LLM). Understand its growth and the range of available models including the most recently released Code Llama. Discover the ways developers can harness Llama’s potential and learn about the significant community impact Llama has had on the AI ecosystem.


Speakers
avatar for Joe Spisak

Joe Spisak

Product Director, Meta Inc.
Joe Spisak is Product Director and Head of Open Source in Meta’s Generative AI organization. A veteran of the AI space with over 10 years of experience, Joe led product teams at Meta/Facebook, Google and Amazon where he focused on open source AI, open science and building developer... Read More →


Tuesday October 17, 2023 2:45pm - 2:50pm PDT
Grand Peninsula Ballroom D/E/F/G

2:50pm PDT

Coffee Break
Tuesday October 17, 2023 2:50pm - 3:10pm PDT
Atrium

2:50pm PDT

Sponsor Showcase
Tuesday October 17, 2023 2:50pm - 3:10pm PDT
Atrium

3:10pm PDT

Lightning Talk: Triton Compiler - Thomas Raoux, OpenAI
Triton is a language and compiler for writing highly efficient custom deep learning primitives. The aim of Triton is to provide an open-source environment to write fast code with higher productivity than CUDA, but also with greater flexibility than other existing DSLs. Triton has been adopted as a fundamental component of TorchInductor to synthesize efficient kernels targeting GPUs. This has multiple advantages compared to traditional library usage. It allows for the creation of a wide variety of fusions, it can be tuned independently, and it has a smaller memory footprint. This talk will present the Triton compiler and describe the process that enables it to generate lightning-fast kernels with minimal user effort.

Speakers

Thomas Raoux

Member of technical staff, OpenAI
Thomas works on the Triton compiler at OpenAI. He has extensive experience in writing ML and GPU compilers at Intel and Google.



Tuesday October 17, 2023 3:10pm - 3:20pm PDT
Cypress A/B/C
  Community/Integrations

3:10pm PDT

Lightning Talk: The Fastest Path to Production: PyTorch Inference in Python - Mark Saroufim, Meta
Historically, for inference, users have had to rewrite their models to be JIT-scriptable, which required model rewrites and familiarity with C++ services. This is frustrating, especially when the vast majority of real-world PyTorch users actually deploy Python in production. When torch.compile was introduced, it encouraged a UX of gradual model rewrites to optimize models, but users would get value even without any. A C++-based option still represents a steep difficulty jump, and torch.compile still suffers from long compile times, which make it unsuited for server-side inference where cold start times are critical. In this talk we introduce the options users have for the quickest possible path to production, including new APIs to cache compilation artifacts across devices, so users can compile models once for both training and inference, and Python bindings for AOT Inductor. We'll also end with some real-world case studies inspired by users who faced the above problems within the context of TorchServe. By that point we hope you'll be fully convinced that it's possible to deploy Python in production and retain performance.

Speakers

Mark Saroufim

For loop rotator, Meta
Mark Saroufim is a PyTorch Engineer at Meta working on inference, compilers and community.



Tuesday October 17, 2023 3:10pm - 3:20pm PDT
Grand Peninsula Ballroom A/B/C
  Production
  • Audience Experience Level Beginner
  • Slides Attached Yes

3:10pm PDT

Accelerating Generative AI - Christian Puhrsch & Horace He, Meta
There is a Cambrian explosion of performant and efficient methods to train and serve generative AI models within the community. The PyTorch team will present optimizations to transformer-based generative AI models, using pure, native PyTorch. In this talk we aim to cover new techniques in PyTorch for driving efficiency gains and to showcase how they can be composed on popular generative AI models. Highlights will include methods spanning torch.compile, quantization, sparsity, memory-efficient attention, and padding reduction.

Speakers

Christian Puhrsch

Software Engineer, Meta

Horace He

Software Engineer, Meta



Tuesday October 17, 2023 3:10pm - 3:35pm PDT
Sequoia A
  Applications
  • Audience Experience Level Advanced
  • Slides Attached Yes

3:10pm PDT

Accelerating Explorations in Vision and Multimodal AI Using PyTorch Libraries - Nicolas Hug, Philip Bontrager, Evan Smothers & Peng Chen, Meta
PyTorch Libraries provide building blocks (data processing transforms, modeling components, loss functions, etc.) on top of PyTorch as well as examples and tutorials on how to use these building blocks for training SoTA Models. In this talk, we’ll provide insights into ongoing work to accelerate exploration in multimodal understanding and generative AI using TorchMultimodal. We'll also present TorchVision's new transforms API, with added support for image detection, segmentation, and video tasks.

Speakers

Nicolas Hug

Research Engineer, Meta
Nicolas is a software engineer in the PyTorch team at Meta, where he mainly contributes to the torchvision library. Prior to that, Nicolas was a research scientist at Columbia University, where he became part of the scikit-learn core development team. Nicolas holds a PhD in machine... Read More →

Philip Bontrager

Machine Learning Engineer, Meta

Evan Smothers

Software Engineer, Meta
Evan Smothers is a software engineer on the PyTorch Domains team at Meta. His work focuses on supporting researchers building state-of-the-art multimodal models, and helping to scale these models to billions of parameters. Prior to joining Meta, Evan was a data scientist at Uber and... Read More →



Tuesday October 17, 2023 3:10pm - 3:35pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

3:25pm PDT

Lightning Talk: Harnessing NVIDIA Tensor Cores: An Exploration of CUTLASS & OpenAI Triton - Matthew Nicely, NVIDIA
Discover the power of NVIDIA Tensor Cores and accelerate your PyTorch development using two cutting-edge open-source libraries: CUTLASS and OpenAI Triton. This presentation aims to inspire both novice and seasoned PyTorch developers to unlock new efficiencies and capabilities in their work. We delve into the architecture of CUTLASS, revealing its diverse use cases and value proposition. Simultaneously, we explore Triton's transformative capabilities for NVIDIA Tensor Cores, and the roadmaps for these essential tools. I hope to inspire the audience to engage with these tools for their projects, contributing to a high-performing PyTorch community.

Speakers

Matthew Nicely

Deep Learning Compilers Product Manager, NVIDIA
Matthew Nicely is Senior Product Manager at NVIDIA, covering Deep Learning Compilers. He manages cuDNN, CUTLASS, and contributions to XLA and OpenAI Triton. Matt received his Ph.D. in computer engineering focusing on algorithm optimizations on GPUs from the University of Alabama in... Read More →



Tuesday October 17, 2023 3:25pm - 3:35pm PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

3:25pm PDT

Lightning Talk: Exploring PiPPy, Tensor Parallel and TorchServe for Large Model Inference - Hamid Shojanazeri, Meta
Here, we talk about large model inference with TorchServe using PiPPy and Tensor Parallel, the challenges of distributed inference, and available solutions. We also discuss the features that TorchServe provides today for serving LLMs in production.

Speakers

Hamid Shojanazeri

partner engineer, Meta
Hamid Shojanazeri is a Partner Engineer at PyTorch working on OSS high-performance model optimization and distributed training. Hamid holds a Ph.D. in computer vision and worked as a researcher in multimedia labs in Australia and Malaysia, and as NLP lead at Opus.ai. He is mostly interested... Read More →



Tuesday October 17, 2023 3:25pm - 3:35pm PDT
Grand Peninsula Ballroom A/B/C
  Production

3:40pm PDT

Lightning Talk: TorchRL - RLHF Support - Vincent Moens, Meta
RLHF is notoriously hard to implement, requiring technical knowledge across RL and other domains. For this reason, people often revert to packaged solutions with single entry points and complex configurations that leave little room for custom development. We present new RLHF support in TorchRL that solves this problem by giving developers full control over the training pipeline at a reduced development cost on the RL side. This new set of primitives allows users to quickly prototype and train generative models across domains (language, CV and others). With the TorchRL-HF tooling, RL-specific classes and recipes are easily blended within one's code base, and multiple solutions (preprocessing techniques or RL algorithms) can seamlessly be implemented without the need for an in-depth understanding of the RL machinery. We demonstrate how this works in practice with examples from diverse domains, including LLMs and drug design.

Speakers

Vincent Moens

Research Engineer, Meta
Vincent Moens has a diverse background in medicine, neuroscience, and machine learning. Vincent pursued a Ph.D. in computational neuroscience, where his research focused on the intricate relationship between habitual behavior, Bayesian statistics, generative AI, and reinforcement... Read More →



Tuesday October 17, 2023 3:40pm - 3:50pm PDT
Sequoia A
  Applications

3:40pm PDT

Lightning Talk: PyTorch 2.0 on the ROCm Platform - Douglas Lehr, AMD
This talk covers the current state of PyTorch on the ROCm platform, including efforts to achieve day-0 support for Triton on PyTorch 2.0, as well as performance improvements, work with Hugging Face, and other areas.

Speakers

Douglas Lehr

Software Engineer, AMD
I'm a husband and father of four who is passionate about video games, woodworking, and sports. I am a Principal Engineer and Technical Lead at AMD, where I lead the AI Model Performance team for PyTorch on AMD GPUs. My primary role centers around enhancing operators for both language... Read More →


Tuesday October 17, 2023 3:40pm - 3:50pm PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

3:40pm PDT

Lightning Talk: Standardizing CPU Benchmarking with TorchBench for PyTorch Community - Xu Zhao, Meta & Mingfei Ma, Intel
TorchBench is a community-driven open-source benchmark for PyTorch that covers a wide range of popular and critical models. However, the default setup of TorchBench does not cover all CPU-specific features of PyTorch, and its benchmarking methodology is not optimized for CPU devices. This makes it difficult to use TorchBench to benchmark new CPU features such as FX INT8 and AMP, as well as CPU-specific scenarios such as core binding. We worked closely with the PyTorch community to enable benchmarking for major CPU optimizations and features. We also leveraged the userbenchmark design to improve and standardize the benchmarking methodology for CPU, aligning it with common CPU benchmarking practices. In addition, we increased model coverage by adding new models, including GNN models, and by fixing bugs in several models on CPU devices. With these improvements, TorchBench can be used to track regressions, prove the performance benefits of new optimizations, and easily replicate results on CPU devices. We will continue to improve TorchBench in line with the PyTorch roadmap, making it a valuable tool for improving PyTorch quality and showcasing PyTorch performance on CPU.

Speakers

Mingfei Ma

Senior Software Engineer, Intel
Mingfei Ma is a senior deep learning software engineer at Intel. He is also the maintainer of the CPU performance module in PyTorch. Mingfei holds a Master's degree from Harbin Institute of Technology, where he majored in Control Science and Technology. Mingfei has 12 years’ experience... Read More →

Xu Zhao

Research Scientist, Meta
Xu Zhao is a research scientist working at Meta Platforms Inc. His focus is on PyTorch benchmarking and performance optimization. Prior to joining Meta, Xu earned his Ph.D. and Master's degree from the University of Toronto.



Tuesday October 17, 2023 3:40pm - 3:50pm PDT
Grand Peninsula Ballroom A/B/C
  Production

3:40pm PDT

Getting Started with PyTorch 2.0 and Hugging Face Transformers - Philipp Schmid, Hugging Face
The session will highlight the new features of PyTorch 2.0 and how to get started with PyTorch 2.0 and Hugging Face Transformers today. It will cover how to fine-tune a BERT model for Text Classification using the newest PyTorch 2.0 features.

Speakers

Philipp Schmid

Technical Lead, Hugging Face
Philipp Schmid is a Technical Lead at Hugging Face with the mission to democratize good machine learning through open source and open science. Philipp is passionate about productionizing cutting-edge & generative AI machine learning models.



Tuesday October 17, 2023 3:40pm - 4:05pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

3:55pm PDT

Lightning Talk: Diffusers: Bringing Cutting-Edge Diffusion Models to the Masses - Lysandre Debut, Hugging Face
Diffusers is a go-to library for state-of-the-art pretrained diffusion models, allowing users to generate images, audio, and 3D structures of molecules effortlessly. With a focus on usability, simplicity, and customization, Diffusers is a modular toolbox for both simple inference and training of diffusion models. By seamlessly integrating with PyTorch, Diffusers streamlines the development process, making model implementation, training, and evaluation a breeze. Leveraging the power of PyTorch 2.0, Diffusers takes advantage of new features and improvements, empowering researchers and practitioners to push the boundaries of their projects. The library incorporates optimizations such as accelerated transformers and `torch.compile()`, enhancing accessibility, inference speed, memory efficiency, and resource utilization. Through practical demonstrations, Diffusers showcases its potential in real-world scenarios, providing an indispensable tool for researchers and developers. Join us to explore how Diffusers democratizes diffusion models with PyTorch integration, optimizations, and practical demonstrations.

Speakers

Lysandre Debut

Hugging Face



Tuesday October 17, 2023 3:55pm - 4:05pm PDT
Sequoia A
  Applications

3:55pm PDT

Lightning Talk: Enhancements Made to MPS Backend in PyTorch for Applications Running on Mac Platforms - Kulin Seth, Apple
Since PyTorch 2.0, MPS backend has qualified for “beta” stage which provides wider operator support (300+) and network coverage. We will provide details about new features introduced in MPS backend such as how to add custom operations to your network and profiling applications using MPSProfiler & Instruments. Finally we will provide some debugging tips and best practices on MPS device and conclude with performance results on popular benchmarks.

Speakers

Kulin Seth

Engineering manager, Apple
Provide Metal acceleration to ML Frameworks for running workloads efficiently on Mac platforms



Tuesday October 17, 2023 3:55pm - 4:05pm PDT
Cypress A/B/C
  Community/Integrations
  • Slides Attached Yes

3:55pm PDT

Lightning Talk: Profiling and Memory Debugging Tools for Distributed ML Workloads on GPUs - Aaron Shi, Meta
This talk gives an overview of PyTorch profiling tools and features (Profiler and Kineto), followed by a practical dive into our extensive GPU memory debugging tools. For the PyTorch Profiler, we will introduce the Memory Profiler for a better understanding of GPU memory, as well as newly released OSS repos such as Holistic Trace Analysis (used to understand distributed profiler traces and provide useful views) and Dynolog (used for triggering on-demand traces). We then look into new GPU memory debugging tools for PyTorch: Memory Snapshot and Reference Cycle Detector. The talk may take a practical approach to understanding memory leaks, fragmentation and reference cycles.
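To make the reference-cycle problem mentioned in this abstract concrete, here is a minimal sketch that uses only Python's standard `gc` and `weakref` modules. It is not the PyTorch Reference Cycle Detector; the `Buffer` class is a hypothetical stand-in for an object that owns a large (for example GPU) allocation.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for an object that owns a large allocation."""

    def __init__(self, name):
        self.name = name
        self.other = None  # will point at another Buffer, forming a cycle


def make_cycle():
    a, b = Buffer("a"), Buffer("b")
    a.other, b.other = b, a      # a -> b -> a: a reference cycle
    return weakref.ref(a)        # weak reference lets us observe a's lifetime


gc.disable()                     # no automatic collection during the demo
probe = make_cycle()
alive_before = probe() is not None   # reference counting alone cannot free a cycle
collected = gc.collect()             # the cycle collector finds and frees it
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after, collected)
```

Until the cycle collector runs, both objects stay alive even though nothing can reach them, which is why a cycle that holds device memory can look like a leak.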

Speakers

Aaron Shi

Software Engineer, Meta
Focused on Tooling (such as the Kineto Profiler) for Distributed ML Training Models. A core member of the PyTorch Perf Infra Team at Meta.



Tuesday October 17, 2023 3:55pm - 4:05pm PDT
Grand Peninsula Ballroom A/B/C
  Production

4:10pm PDT

Lightning Talk: Adding Backends for TorchInductor: Case Study with Intel GPU - Eikan Wang, Intel
- There are two integration levels for adding a new backend to the PyTorch compiler: the ATen/Prims IR level and the Inductor loop IR level. The ATen/Prims IR integration is already available through the custom backend registration infrastructure (https://pytorch.org/docs/stable/dynamo/custom-backends.html). The latter offers an option to integrate a backend compiler at the lower loop-level IR, which can benefit from Inductor's existing compiler infrastructure, such as loop fusion and memory planning. We developed a dynamic registration mechanism on the Inductor side for new backends. The mechanism allows a backend to register its codegen for a particular device at runtime, so the new backend only needs to focus on generating optimal code for the device.
- Case study: Intel GPU backend for Inductor. We take the Intel GPU backend for Inductor as an example of how to support Intel GPUs through the proposed registration mechanism and to prove the idea. The Intel GPU backend for Inductor is built on top of Triton, as we have enabled Triton to support any new hardware backend. In this context, the case study will show the power of “Inductor + Triton” to easily support any new accelerator.
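The general idea of registering a device-specific code generator at runtime can be sketched in plain Python as follows. This is only an illustration of the pattern, not Inductor's API: `register_codegen`, `generate`, `toy_gpu_codegen` and the `"toy_gpu"` device name are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a device name to a code generator.
_CODEGEN_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register_codegen(device: str, codegen: Callable[[str], str]) -> None:
    """Register (or replace) the code generator used for `device`."""
    _CODEGEN_REGISTRY[device] = codegen


def generate(device: str, loop_ir: str) -> str:
    """Dispatch loop-level IR to whichever backend was registered."""
    try:
        codegen = _CODEGEN_REGISTRY[device]
    except KeyError:
        raise ValueError(f"no codegen registered for device {device!r}") from None
    return codegen(loop_ir)


def toy_gpu_codegen(loop_ir: str) -> str:
    # A real backend would emit Triton or C++ here; this only tags the IR.
    return f"// toy_gpu kernel\n{loop_ir}"


# The backend registers itself at runtime; the compiler core stays unchanged.
register_codegen("toy_gpu", toy_gpu_codegen)
kernel = generate("toy_gpu", "for i in range(n): out[i] = a[i] + b[i]")
print(kernel)
```

The shared passes (fusion, memory planning) would run before `generate`, so a new backend only supplies the final code-emission step.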

Speakers

Eikan Wang

AI Frameworks Engineer, Intel
Eikan is a staff engineer at Intel and a DL framework tech lead with full-stack experience in DL, from various AI applications to framework, library, and DL compiler. He is actively optimizing the torch.compile stack for Intel platforms, including optimizing Inductor C++/OpenMP... Read More →



Tuesday October 17, 2023 4:10pm - 4:20pm PDT
Cypress A/B/C
  Community/Integrations
  • Audience Experience Level Beginner
  • Slides Attached Yes

4:10pm PDT

Lightning Talk: Building Intermediate Logging for PyTorch - Kunal Bhalla, Meta
One of the best ways to understand what's going on with your model is to actually look at all the numbers flowing through it. In this talk I'll walk through implementing an API to capture all values as they flow through a PyTorch model: Module arguments, parameters, buffers, return values -- as well as gradients -- in a way that Just Works even if you're using TorchScript, torch.compile, transforming the model with Torch Fx, distributing it with torch.package or just pickling it and passing it around. We'll end up talking about several Python and PyTorch internals along the way. Having all the numbers available opens up a lot of opportunities to understand and debug your model as well, and I'll also talk through some case studies, though I'm far from a domain expert there.

Speakers

Kunal Bhalla

Software Engineer, Meta
I've been moving up and down the stack at Meta for a decade: working across site reliability, mobile battery optimization, smoothly animated maps, custom notebook implementations and, most recently, making debugging models easier.



Tuesday October 17, 2023 4:10pm - 4:20pm PDT
Grand Peninsula Ballroom A/B/C
  Production

4:10pm PDT

Into Generative AI with PyTorch Lightning 2.0 - Luca Antiga & Carlos Mocholí, Lightning AI
Four years after its initial release, PyTorch Lightning has become one of the most used deep learning frameworks. It has been adopted by tens of thousands of companies and academic groups, helping to drive the adoption of PyTorch to where it is today. The advent of generative AI has led to new challenges in training large models, and PyTorch Lightning has empowered the industry by making many of the latest innovations accessible and robust. PyTorch Lightning powers state-of-the-art generative AI models like Stable Diffusion and SDXL. It has been adopted in exciting new directions for LLMs like Hyena Hierarchy and State Space models, as well as the new RWKV recurrent architecture. On top of all that, the PyTorch Lightning-based NVIDIA NeMo, which includes NeMo Megatron, is enabling companies to train LLMs with up to hundreds of billions of parameters. In this talk we will explore generative AI applications powered by PyTorch Lightning, and cover the latest PyTorch Lightning 2.0 features that make working with large models easy. We will also discuss how Lightning Fabric powers lit-gpt, which has been adopted as the starter kit for the recent LLM Efficiency Challenge at NeurIPS 2023.

Speakers

Luca Antiga

CTO, Lightning AI
CTO @ Lightning AI, Founder (Orobix, Tensorwerk), early PyTorch core contributor, Manning Author (Deep Learning with PyTorch).

Carlos Mocholí

Research Engineer, Lightning AI
Starting as a contributor who needed to fix a few bugs in the library, I became PyTorch Lightning's technical lead. I am passionate about making training code faster, more efficient, more maintainable, and more composable, especially using open source.


Tuesday October 17, 2023 4:10pm - 4:35pm PDT
Sequoia A
  Applications

4:10pm PDT

Training a LLaMA in your Backyard: Fine-tuning Very Large Models on Consumer Hardware - Sourab Mangrulkar & Younes Belkada, Hugging Face
Speakers

Sourab Mangrulkar

Machine Learning Engineer, creator of 🤗 PEFT, Hugging Face
Sourab Mangrulkar is an ML Open-Source Engineer at Hugging Face. He has over 5 years of experience, including 2 years each at Microsoft and Amazon as an Applied Scientist, and over 1 year at Hugging Face. Sourab has worked on diverse problems such as click-through rate prediction... Read More →

Younes Belkada

Machine Learning Engineer, Hugging Face
Younes is an ML Engineer working on the Open Source team at Hugging Face. He collaborates with researchers and developers to add exciting new features to the HF ecosystem and has contributed to various libraries in it (transformers, accelerate, PEFT).



Tuesday October 17, 2023 4:10pm - 4:35pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed
  • Slides Attached Yes

4:25pm PDT

Lightning Talk: Accelerated Inference in PyTorch 2.X with Torch-TensorRT - George Stefanakis & Dheeraj Peri, NVIDIA
Torch-TensorRT accelerates the inference of deep learning models in PyTorch targeting NVIDIA GPUs. Torch-TensorRT now leverages Dynamo, the graph capture technology introduced in PyTorch 2.0, to offer a new and more Pythonic user experience as well as to upgrade the existing compilation workflow. The new user experience includes Just-In-Time compilation and support for arbitrary Python code (like dynamic control flow, complex I/O, and external libraries) used within your model, while still accelerating performance. A single line of code provides easy and robust acceleration of your model with full flexibility to configure the compilation process without ever leaving PyTorch: `torch.compile(model, backend="tensorrt")`. The existing API has also been revamped to use Dynamo export under the hood, providing you with the same Ahead-of-Time whole-graph acceleration with fallback for custom operators and dynamic shape support as in previous versions: `torch_tensorrt.compile(model, inputs=example_inputs)`. We will present descriptions of both paths as well as features coming soon. All of our work is open source and available at https://github.com/pytorch/TensorRT.

Speakers

Dheeraj Peri

Senior Deep Learning Software Engineer, NVIDIA
Dheeraj Peri works as a deep learning software engineer at NVIDIA. Before that, he was a graduate student at Rochester Institute of Technology in New York, working on deep learning-based approaches for content retrieval and handwriting recognition tasks. Dheeraj's research interests... Read More →

George Stefanakis

Deep Learning Performance Engineer, NVIDIA
George Stefanakis is a performance software engineer working on accelerating deep learning networks on the TensorRT team at NVIDIA. He is currently working with Torch-TensorRT to develop a novel framework for improving model inference within Torch. Prior to joining NVIDIA, he was... Read More →



Tuesday October 17, 2023 4:25pm - 4:35pm PDT
Cypress A/B/C
  Community/Integrations
  • Slides Attached Yes

4:25pm PDT

Lightning Talk: PT2 Export - A Sound Full Graph Capture Mechanism for PyTorch - Avik Chaudhuri, Meta
In this session, we will introduce torch.export(), an Ahead-of-Time full graph capture mechanism with soundness guarantees. It is built on top of the same foundational technology as torch.compile(). We recommend torch.export() for vendor integration and for users who need to run PyTorch programs without Python.

Speakers

Avik Chaudhuri

Software Engineer, Meta
Creator of @flowtype. Machine learning explorer. Rusty programming language researcher. Amateur chef. Soccer dad. Website: https://avikchaudhuri.github.io/ Twitter: @__avik Blog: https://mathydad.wordpress.com/



Tuesday October 17, 2023 4:25pm - 4:35pm PDT
Grand Peninsula Ballroom A/B/C
  Production

4:35pm PDT

Coffee Break
- Jumbo Bavarian Soft Pretzels: Stout Beer Mustard, Stone-Ground Mustard, Sharp Cheddar Cheese Sauce, Deli Mustard (NF, VT)
- Fresh Baked Cookies and More: Chocolate Chip, Peanut Butter, White Chocolate Macadamia Nut, Gluten-Free Cookies Chocolate Chip Cookies, Biscotti, Coconut, Macaroon Garnish (Includes 3% Gluten-Free Options)
- Locally Sourced Whole Fresh Fruit (per dozen) Seasonal Selection, Fully Ripened (VE)

Tuesday October 17, 2023 4:35pm - 4:55pm PDT
Grand Peninsula Foyer

4:35pm PDT

Sponsor Showcase
Tuesday October 17, 2023 4:35pm - 4:55pm PDT
Atrium

4:55pm PDT

Lightning Talk: Large-Scale Distributed Training with Dynamo and PyTorch/XLA SPMD - Yeounoh Chung & Jiewen Tan, Google
In this talk we cover the PyTorch/XLA distributed API in relation to TorchDynamo. Specifically, we discuss the new PyTorch/XLA SPMD API for automatic parallelization and our latest LLaMA2 training results. PyTorch/XLA SPMD makes it simple for PyTorch developers to distribute their ML workloads (e.g., training and inference with Dynamo) with an easy-to-use API, and uses XLA GSPMD, a high-performance automatic parallelization system. Under the hood, it transforms the user's single-device program into a partitioned one. We will share how we enabled advanced 2D sharding strategies for LLaMA2 using PyTorch/XLA SPMD.
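As a rough illustration of what 2D sharding means, the sketch below partitions a small matrix over a 2 x 2 device mesh in plain Python. It does not use the PyTorch/XLA SPMD API; the function `shard_2d` and the mesh shape are assumptions made for this example.

```python
def shard_2d(matrix, mesh_rows, mesh_cols):
    """Partition `matrix` into mesh_rows x mesh_cols blocks, one per device."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    if n_rows % mesh_rows or n_cols % mesh_cols:
        raise ValueError("matrix dimensions must divide evenly over the mesh")
    block_h, block_w = n_rows // mesh_rows, n_cols // mesh_cols
    placement = {}
    for r in range(mesh_rows):
        for c in range(mesh_cols):
            rows = matrix[r * block_h:(r + 1) * block_h]
            # Device (r, c) holds row block r and column block c.
            placement[(r, c)] = [row[c * block_w:(c + 1) * block_w] for row in rows]
    return placement


# A 4 x 4 matrix with entries 0..15, sharded over a 2 x 2 mesh.
matrix = [[4 * i + j for j in range(4)] for i in range(4)]
placement = shard_2d(matrix, mesh_rows=2, mesh_cols=2)

for device, block in sorted(placement.items()):
    print(device, block)
```

Each device stores a quarter of the matrix; an SPMD partitioner additionally inserts the communication needed to compute on such blocks, which this sketch leaves out.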

Speakers

Jiewen Tan

Staff Software Engineer, Google
Jiewen is a staff software engineer at Google's PyTorch team. He has a focus on distributed technologies that can power both large scale training and multi-device inference. He used to work at the PyTorch compiler team at Meta and contributed to LazyTensorCore and TorchDynamo.

Yeounoh Chung

Software Engineer, Google
Yeounoh supports the PyTorch/XLA team at Google, leading the PyTorch/XLA SPMD project. His work and interests span the PyTorch/XLA framework, Cloud TPU infrastructure and runtime.



Tuesday October 17, 2023 4:55pm - 5:05pm PDT
Cypress A/B/C
  Community/Integrations

4:55pm PDT

TorchBench: Guarding the Performance of the PyTorch Ecosystem with Continuous Benchmarking - Xu Zhao, Meta
This talk will present new features of TorchBench that let us achieve even higher coverage of PyTorch and model benchmarking: more SOTA models in workloads; integration with PT2 and other backends; and improvements to the OSS service and infra, including userbenchmarks, model stability improvements, and GCP A100 support.

Speakers

Xu Zhao

Research Scientist, Meta
Xu Zhao is a research scientist working at Meta Platforms Inc. His focus is on PyTorch benchmarking and performance optimization. Prior to joining Meta, Xu earned his Ph.D. and Master's degree from the University of Toronto.



Tuesday October 17, 2023 4:55pm - 5:20pm PDT
Sequoia A
  Applications
  • Slides Attached Yes

4:55pm PDT

Distributed Checkpoint - Iris Zhang & Chien-Chin Huang, Meta
This talk will present checkpoint features for distributed training. Distributed checkpoint supports saving and loading from multiple ranks in parallel. It handles load-time resharding, which enables saving in one cluster topology and loading into another. It also supports saving with one parallelism strategy and loading into another. It is currently adopted by IBM, Mosaic, and XLA for FSDP checkpointing, and it is also being used for checkpointing support in the Shampoo OSS release. We will talk about what distributed checkpoint supports today and what is coming up next.
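Load-time resharding can be pictured with a small plain-Python sketch: a flat parameter list is saved as shards from 4 ranks and then loaded onto 3 ranks. This is not the PyTorch Distributed Checkpoint API; the functions `shard`, `save` and `load` and the checkpoint layout (a list of `offset`/`data` records) are assumptions made for illustration.

```python
def shard(values, world_size):
    """Split a flat list into `world_size` contiguous, near-equal shards."""
    base, extra = divmod(len(values), world_size)
    shards, start = [], 0
    for rank in range(world_size):
        size = base + (1 if rank < extra else 0)
        shards.append(values[start:start + size])
        start += size
    return shards


def save(values, world_size):
    """Each rank 'saves' its shard together with its global offset."""
    checkpoint, offset = [], 0
    for piece in shard(values, world_size):
        checkpoint.append({"offset": offset, "data": piece})
        offset += len(piece)
    return checkpoint


def load(checkpoint, world_size):
    """Reshard at load time: each new rank reads only the slices it needs."""
    total = sum(len(entry["data"]) for entry in checkpoint)
    wanted = shard(list(range(total)), world_size)  # global indices per rank
    result = []
    for indices in wanted:
        lo, hi = (indices[0], indices[-1] + 1) if indices else (0, 0)
        local = []
        for entry in checkpoint:
            start = entry["offset"]
            end = start + len(entry["data"])
            # Overlap between the saved shard and what this rank wants.
            s, e = max(lo, start), min(hi, end)
            if s < e:
                local.extend(entry["data"][s - start:e - start])
        result.append(local)
    return result


weights = list(range(10))
ckpt = save(weights, world_size=4)      # saved from 4 ranks
resharded = load(ckpt, world_size=3)    # loaded onto 3 ranks
print(resharded)
```

Because every saved shard carries its global offset, a loader with a different rank count can work out which saved pieces overlap its own range without gathering the full tensor first.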

Speakers

Chien-Chin Huang

Software Engineer, Meta
PyTorch Distributed

Iris Zhang

Software Engineer, Meta
PyTorch Distributed



Tuesday October 17, 2023 4:55pm - 5:20pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

4:55pm PDT

Llama V2 in Azure AI for Finetuning, Evaluation and Deployment from the Model Catalog - Swati Gharse, Microsoft
Llama 2 is now available in the model catalog in Azure Machine Learning. The model catalog in AzureML is your hub for foundation models. Azure native support for Llama 2 in the model catalog enables you to use these models without having to manage any of the infrastructure or environment dependencies. It provides out-of-the-box support for model fine-tuning and evaluation, and includes options for optimizer libraries like DeepSpeed and ORT (ONNX Runtime), which speed up fine-tuning, and LoRA (Low-Rank Adaptation of Large Language Models), which greatly reduces memory and compute requirements for fine-tuning. Deployments of Llama 2 models in Azure have Azure AI Content Safety integrated by default, offering a built-in layered approach to safety and following responsible AI best practices.
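The reason LoRA cuts fine-tuning cost can be shown with a short parameter count. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors of shapes d_out x r and r x d_in instead. The layer size and rank below are assumed values chosen only for illustration, not figures from the talk.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 4096   # assumed layer size, for illustration only
rank = 8              # assumed LoRA rank

full = d_out * d_in                               # parameters updated by full fine-tuning
lora = lora_trainable_params(d_out, d_in, rank)   # parameters updated by LoRA
fraction = lora / full

print(f"full: {full:,}  lora: {lora:,}  fraction: {fraction:.4%}")
```

For these assumed sizes, the trainable parameters for this one layer drop to well under 1% of the full matrix, and gradients and optimizer state shrink along with them.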

Speakers

Swati Gharse

Principal Product Manager, Microsoft
PM for Foundation models in Azure AI



Tuesday October 17, 2023 4:55pm - 5:20pm PDT
Grand Peninsula Ballroom A/B/C
  Production

5:10pm PDT

Lightning Talk: Streamlining Model Export with the New ONNX Exporter - Maanav Dalal & Aaron Bockover, Microsoft
Join us for a 10-minute talk on the new TorchDynamo-based ONNX exporter, which redefines how we convert machine learning models to the ONNX format with the power of PyTorch 2.0. The talk covers exporting with torch.onnx.dynamo_export, inference with ONNX Runtime, and a variety of state-of-the-art models being converted easily. Learn how ONNX enables more PyTorch models to be cross-platform, the benefits of the ONNX standard, and the other features we have baked into the new exporter, including:
- Symbolic model tracing: export large models without the computation cost of the original data/parameters, saving time and preventing OOM issues.
- ONNX Script: a new way to architect ONNX models and the backbone of the Dynamo exporter.
- Preserving the original module structure as ONNX functions: allowing for a better, layered resulting model.
- Better diagnostics: clearly identify the root of your conversion issues faster.

Speakers

Aaron Bockover

Principal Software Engineer, Microsoft

Maanav Dalal

Program Manager, Microsoft
PM @Microsoft, working on the ONNX Exporter team. I adore learning about consumer tech and experimenting with bleeding edge software. I'm passionate about creating delightful user experiences.



Tuesday October 17, 2023 5:10pm - 5:20pm PDT
Cypress A/B/C
  Community/Integrations

5:25pm PDT

Lightning Talk: Simulating Quantum Systems with PyTorch - Pierre Guilmin, Alice & Bob
In this talk, I propose to explain why simulating quantum systems is a formidable challenge, and how leveraging modern hardware and software can result in notable performance improvement. PyTorch is ideally suited for this task, first because running solvers on GPUs results in a significant speed-up, and second because numerous tasks related to the calibration and control of quantum systems require the computation of gradients based on the time-evolved quantum state. The emerging research effort to develop quantum computers heavily relies on such tools. The dynamiqs library (https://github.com/dynamiqs/dynamiqs) is a Python library powered by PyTorch, designed to address this challenge. It provides differentiable solvers for the *Schrödinger Equation* which governs closed quantum systems, the *Lindblad Master Equation* for open quantum systems and the *Stochastic Master Equation* for continuously measured quantum systems. Gradients can be computed with PyTorch’s automatic differentiation, or using a constant memory cost method. The library is being developed by several PhD students in physics, most of whom have substantial experience in software development.
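To give a feel for what solving the Schrödinger equation numerically involves, here is a minimal plain-Python sketch for a single two-level system. It is not the dynamiqs API and does not use PyTorch or GPUs. It assumes units with ħ = 1, a Hamiltonian H = (ω/2)·σ_x, and a hand-written fourth-order Runge-Kutta integrator; all function names are hypothetical.

```python
import math


def deriv(psi, omega):
    """Right-hand side of d(psi)/dt = -i * H * psi with H = (omega / 2) * sigma_x."""
    a, b = psi
    return (-0.5j * omega * b, -0.5j * omega * a)


def rk4_step(psi, omega, dt):
    """One classical fourth-order Runge-Kutta step."""
    def add(p, k, h):
        return (p[0] + h * k[0], p[1] + h * k[1])

    k1 = deriv(psi, omega)
    k2 = deriv(add(psi, k1, dt / 2), omega)
    k3 = deriv(add(psi, k2, dt / 2), omega)
    k4 = deriv(add(psi, k3, dt), omega)
    return tuple(
        psi[i] + dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(2)
    )


def evolve(omega, t_final, steps):
    """Integrate from the state |0> up to time t_final."""
    psi = (1 + 0j, 0j)
    dt = t_final / steps
    for _ in range(steps):
        psi = rk4_step(psi, omega, dt)
    return psi


omega = 2.0                           # assumed Rabi frequency (arbitrary units)
psi = evolve(omega, t_final=math.pi / omega, steps=1000)
p_excited = abs(psi[1]) ** 2          # analytic value: sin^2(omega * t / 2) = 1
norm = abs(psi[0]) ** 2 + abs(psi[1]) ** 2

print(p_excited, norm)
```

Written with tensors under automatic differentiation, the same loop would let `p_excited` be differentiated with respect to `omega`, which is the kind of gradient that calibration and control tasks need.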

Speakers

Pierre Guilmin

PhD Student, Alice & Bob and Mines Paris - PSL
Former software developer (ex-Datadog) currently doing a PhD in theoretical quantum physics in Paris. I'm doing my PhD in collaboration with the French start-up Alice & Bob, whose aim is to build a universal fault-tolerant quantum computer.


slides pdf

Tuesday October 17, 2023 5:25pm - 5:35pm PDT
Sequoia A
  Applications

5:25pm PDT

Lightning Talk: Efficient Inference at the Edge: Performance You Need at the Lowest Power You Deserve - Felix Baum, Qualcomm
Most AI algorithms created for edge applications are initially developed on workstations. Developers then often struggle to get these workloads running on edge devices and to achieve the performance levels required for new and innovative use cases. This holds true for a wide range of applications, from IoT to automotive to XR to mobile to compute. In this session we will cover the results of the collaborative effort between the PyTorch and Qualcomm teams to integrate the Qualcomm AI Stack into the PyTorch 2.0 workflow, and how we streamlined the path for developers from initial algorithm development to edge deployment. This makes it easy to re-target algorithms to edge hardware by supporting the frameworks and data types that PyTorch developers are familiar with. We also provide a set of tools that empower developers to extract the best performance and energy efficiency from their Android handsets, enabling advanced use cases with premium features, performance boosts and power savings.

Speakers

Felix Baum

Director of Product Management, Qualcomm
Felix Baum spent 20+ years in the embedded industry, both as an embedded developer and as a product manager. Currently he is responsible for AI Software Products at Qualcomm. Prior to that, he led marketing and product management efforts for various real-time operating system technologies... Read More →



Tuesday October 17, 2023 5:25pm - 5:35pm PDT
Cypress A/B/C
  Community/Integrations

5:25pm PDT

Composable Distributed PT2(D) - Wanchao Liang, Meta Platforms, Inc.
In this session, we will explore the technology advancements of PyTorch Distributed, and dive into the details of how multi-dimensional parallelism is made possible to train Large Language Models by composing different PyTorch native distributed training APIs.

Speakers

Wanchao Liang

Software Engineer, Meta Platforms, Inc.
Software Engineer at Meta, PyTorch Team. Tech Lead in PyTorch Distributed training. Author of DTensor, a fundamental distributed abstraction to perform distributed computation. Previously worked on the TorchScript compiler, ONNX.



Tuesday October 17, 2023 5:25pm - 5:50pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed
  • Audience Experience Level Advanced
  • Slides Attached Yes

5:25pm PDT

The Evolving Landscape of Dataloading - Laurence Rouesnel, Meta
Data loading is a critical component in every ML training system. This session covers data loading in and around PyTorch today, the pain points we hear from users, and industry trends. We focus on how we are thinking about our systems as models, datasets, and hardware continue to scale.

Speakers

Laurence Rouesnel

Software Engineering Manager, Meta
Laurence Rouesnel is an engineering manager at Meta where he supports the PyTorch Extended team who build open-source libraries like TorchVision, TorchAudio, TorchRec and TorchMultimodal. He is interested in deep learning, distributed systems, and helping to scale model authoring... Read More →



Tuesday October 17, 2023 5:25pm - 5:50pm PDT
Grand Peninsula Ballroom A/B/C
  Production
  • Slides Attached Yes

5:40pm PDT

Lightning Talk: A Novel Domain Generalization Technique for Medical Imaging Using PyTorch - Dinkar Juyal, PathAI
Domain generalization is critical for real-world applications of machine learning models to medical imaging. Variation in histopathology images arises through a complex combination of factors relating to tissue collection and laboratory processing, as well as factors intrinsic to patient samples. Therefore, augmentation-based methods of domain generalization that require domain identifiers and manual fine-tuning are inadequate in this setting. To overcome this challenge, we introduce ContriMix, a domain generalization technique that learns to generate synthetic images by disentangling and permuting the biological content ("content") and technical variations ("attributes") in images. ContriMix does not rely on domain identifiers or handcrafted augmentations and makes no assumptions about the input characteristics of images. ContriMix produces SOTA results on the Camelyon17 dataset in the Stanford WILDS public leaderboard. ContriMix is developed entirely in PyTorch. The modular nature of PyTorch enables the use of ContriMix as an easy and intuitive plug-and-play setup to generate realistic synthetic medical images at the time of model training. Inference code is available.

Speakers

Dinkar Juyal

Senior Machine Learning Engineer, PathAI
Dinkar Juyal is a Senior Machine Learning Engineer at PathAI. His work has centered around topics such as generalization, interpretability and robustness of ML models, as well as building ML products for BioTech. He received his master's degree in Operations Research from University... Read More →



Tuesday October 17, 2023 5:40pm - 5:50pm PDT
Sequoia A
  Applications

5:40pm PDT

Lightning Talk: Accelerating LLM Training on Cerebras Wafer-Scale Cluster - Mark Browning; Natalia Vassilieva; Behzad Abghari & Emad Barsoum, Cerebras
Large Language Models (LLMs) have taken the world by storm; however, only a handful of companies can train such foundational models. In this talk, we will discuss the integration of Cerebras Wafer-Scale Clusters with the PyTorch 2.0 LTC backend and the technical challenges of training such large models efficiently and seamlessly, so that the cluster acts as a single accelerator regardless of the number of systems used. Another crucial piece of this integration is our collaboration with the open-source community on Torch-MLIR, which benefits the PyTorch community at large, especially by canonicalizing multiple PyTorch backends to a unified ATen MLIR dialect. This enables integration of multiple hardware backends with multiple lowering frontends (e.g., TorchScript, LTC, TorchDynamo). Furthermore, we present our architecture for representing weight sparsity with both static and dynamic model pruning. A few convenient PyTorch utilities enable practitioners to take advantage of our sparsity-first hardware to decrease training time and enable efficient model deployment.
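
As a rough illustration of static weight pruning in general, the sketch below uses PyTorch's built-in `torch.nn.utils.prune` module. This is not the Cerebras utility set mentioned in the abstract, which is not described here; the layer size and the 50% pruning amount are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# A small layer standing in for one weight matrix of a larger model.
layer = nn.Linear(8, 4)

# Zero out the 50% of weights with the smallest magnitude (L1 criterion).
# This registers `weight_orig` and a `weight_mask` buffer on the module.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Fraction of weights that are now exactly zero.
sparsity = (layer.weight == 0).float().mean().item()

# Make the pruning permanent: drop the mask and keep the sparse weights.
prune.remove(layer, "weight")
buffer_names = dict(layer.named_buffers())
```

The mask fixes the sparsity pattern once, which corresponds to static pruning. Dynamic pruning would update the mask during training, which this sketch does not attempt.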

Speakers

Behzad Abghari

Tech Lead, Cerebras Systems
Behzad Abghari is a senior engineer with experience in designing and supporting various deep learning frameworks, such as Caffe, Theano, TensorFlow, and PyTorch, on different hardware backends, including Qualcomm SNPE, Android NNAPI, and Cerebras Wafer-Scale Cluster. He is now a technical... Read More →

Emad Barsoum

Senior Director of AI, Cerebras
Emad Barsoum is an experienced executive, architect and researcher in the areas of Computer Vision, NLP, Deep Learning and AI frameworks. He is currently the Head of AI at Cerebras leading a great team of applied researchers and engineers to enable large language model and high-resolution... Read More →

Mark Browning

Distinguished Engineer, Cerebras Systems
ML compiler developer

Natalia Vassilieva

Sr. Director of Product, Cerebras Systems
Natalia Vassilieva is a Sr. Director of Product at Cerebras Systems, a computer systems company dedicated to accelerating deep learning. She leads the vision and strategy for Cerebras products, market, application, and algorithm analysis for machine learning use cases. Prior to Cerebras... Read More →



Tuesday October 17, 2023 5:40pm - 5:50pm PDT
Cypress A/B/C
  Community/Integrations

5:55pm PDT

Lightning Talk: Leveraging PyTorch 2.0 for Bias Reduction in AI - Christina Zhu, Visier
As a software developer, it's staggering to think about the incredible capabilities of AI, but it's vital not to overlook the ethical dimensions that surround its application. One of these ethical dimensions revolves around bias—an issue that, if left unchecked, can inadvertently reinforce societal prejudices and deepen existing inequalities, especially when it comes to inclusivity. I'm excited to explore this topic at this year's PyTorch conference, focusing on the usage of PyTorch 2.0 to reduce bias in AI models. It should be a quick lightning talk with some easy tips on how to do it while training your AI models. The talk aims to underscore the silent challenge of bias in AI. It's an issue I feel warrants more attention, and I hope to contribute to that discussion. My hope is that attendees will walk away with a heightened understanding of bias in AI and the challenges it presents. More critically, I hope to arm them with the knowledge of how tools like PyTorch 2.0 can make a real difference in this fight.

Speakers

Christina Zhu

Developer Relations Manager, Visier
Christina is a Developer Relations Manager at Visier, a hackNY Fellow, a mentor at Girls Who Code, and the Co-Founder of HackDavis. She has previously worked in Developer Experience at Square and at Amazon and has been awarded with the F8 Scholarship by Facebook, Grace Hopper Scholar... Read More →



Tuesday October 17, 2023 5:55pm - 6:05pm PDT
Sequoia A
  Applications
  • Audience Experience Level Beginner
  • Slides Attached Yes

5:55pm PDT

Lightning Talk: Accelerating PyTorch Performance with OpenVINO - Yamini Nimmagadda, Devang Aggarwal & Mustafa Cavus, Intel
Intel® Distribution of OpenVINO™ Toolkit optimizes performance and efficiency of deep learning inference across diverse and heterogeneous hardware like CPUs, Intel integrated and discrete GPUs, and VPUs, with a simplified “write once, deploy everywhere” approach. In this session, we will show the benefits of optimizing PyTorch models with OpenVINO. Converting PyTorch models to ONNX and subsequently loading them into the OpenVINO runtime for optimized inference has been adopted by developers for a while. More recently, we have developed a PyTorch frontend that enables direct consumption of PyTorch models with OpenVINO, without needing the conversion to ONNX. Additionally, with the advent of PyTorch 2.0, we have pushed the boundaries further by seamlessly incorporating OpenVINO as a TorchDynamo backend with torch.compile to simplify the development process further while running inference with PyTorch APIs. During our presentation, we will demonstrate the practical implementation of each of these techniques by providing example usage of the relevant APIs. We will also highlight the accelerated performance of state-of-the-art PyTorch models using OpenVINO across a range of Intel devices.

Speakers

Mustafa Cavus

AI Frameworks Engineer, Intel

Yamini Nimmagadda

Principal AI Engineer, Intel
Yamini Nimmagadda is a principal engineer at Intel, currently focused on optimizing AI frameworks with OpenVINO. Her areas of expertise include AI frameworks, runtimes, and cloud technologies. Yamini holds a PhD degree in Electrical and Computer Engineering from Purdue University... Read More →

Devang Aggarwal

Product Manager, Intel



Tuesday October 17, 2023 5:55pm - 6:05pm PDT
Cypress A/B/C
  Community/Integrations

5:55pm PDT

Lessons Learned in WatsonX Training: Scaling Cloud-Native PyTorch FSDP to 20B Parameters - Davis Wertheimer & Supriyo Chakraborty, IBM
In this talk we will cover lessons learned along our almost year-and-a-half journey scaling up the WatsonX.AI stack for foundation model pretraining. Starting from 100M parameters on bare metal, we scaled PyTorch training to 20B parameters on cloud-based multi-node systems. We'll discuss the challenges encountered along the way, as well as the solutions we employed. This includes working with the PyTorch team to field test Fully-Sharded and Hybrid-Shard Data Parallel update protocols (FSDP/HSDP), as well as handling the associated communication vs computation bottlenecks, which are not always straightforward. We'll also review our collaboration on cloud-native distributed checkpointing, and development of a stateful and scalable distributed dataloader, allowing us to restart unstable jobs mid-epoch without revisiting stale data. And finally, we'll cover ongoing and upcoming challenges, like maintaining job stability and tensor parallelism integration.
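
The idea of restarting a job mid-epoch without revisiting data can be sketched with a sampler whose position is saved in a checkpoint. This is a single-process illustration only, not IBM's distributed dataloader; the class name `ResumableSampler` and its `state_dict`/`load_state_dict` methods are hypothetical, modeled on the usual PyTorch checkpointing convention.

```python
import random


class ResumableSampler:
    """Yield dataset indices in a seeded shuffled order with a checkpointable position."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0
        self.position = 0  # how many indices of this epoch were already served

    def _order(self):
        # The order depends only on (seed, epoch), so it can be rebuilt after a restart.
        rng = random.Random(self.seed + self.epoch)
        order = list(range(self.num_samples))
        rng.shuffle(order)
        return order

    def __iter__(self):
        order = self._order()
        while self.position < self.num_samples:
            index = order[self.position]
            self.position += 1
            yield index
        # Epoch finished: advance to the next epoch and reset the position.
        self.epoch += 1
        self.position = 0

    def state_dict(self):
        return {"epoch": self.epoch, "position": self.position}

    def load_state_dict(self, state):
        self.epoch = state["epoch"]
        self.position = state["position"]


# Consume part of an epoch, then checkpoint.
sampler = ResumableSampler(num_samples=10, seed=123)
iterator = iter(sampler)
seen = [next(iterator) for _ in range(4)]
state = sampler.state_dict()

# Simulate a restart: a fresh sampler resumes where the old one stopped.
resumed = ResumableSampler(num_samples=10, seed=123)
resumed.load_state_dict(state)
rest = list(resumed)
```

Here `seen` and `rest` together cover every index exactly once. A production loader also has to account for multiple workers, prefetched batches, and sharding across ranks, none of which this sketch handles.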

Speakers

Supriyo Chakraborty

Senior Research Scientist, IBM Research
Dr. Supriyo Chakraborty is a senior research scientist working with the Distributed Training group at IBM Research. In this role, he is responsible for designing efficient architectures and training mechanisms for large foundation models.

Davis Wertheimer

Research Scientist, IBM
Davis Wertheimer comes from a research background in few-shot learning and machine learning under constraints. He now researches and develops AI models for IBM, training and accelerating large language models (learning under a very different set of constraints!). Also a fractal artist... Read More →



Tuesday October 17, 2023 5:55pm - 6:20pm PDT
Grand Peninsula Ballroom D/E/F/G
  Distributed

5:55pm PDT

Cost Effectively Deploy Thousands of Fine Tuned Gen AI Models Like Llama Using TorchServe on AWS - Saurabh Trikande, Amazon Web Services (AWS); Li Ning, Amazon
As Generative AI adoption accelerates across industry, organizations want to deliver hyper-personalized experiences to end users. For building such experiences, thousands of models are being developed by fine-tuning pre-trained large models. To meet their stringent latency and throughput goals, organizations use GPU instances to deploy such models. However, inference costs can add up quickly when deploying thousands of models and provisioning dedicated hardware for each. TorchServe offers features like an open platform, deferred distribution initialization, model sharing and heterogeneous deployment that make it easy for users to deploy fine-tuned large models and save cost. Learn how organizations can use these features in conjunction with fine-tuning techniques like PEFT (Parameter Efficient Fine Tuning) and use Amazon SageMaker Multi-Model Endpoint (MME) to deploy multiple GenAI models on the same GPU, share GPU instances across thousands of GenAI models, and dynamically load/unload models based on incoming traffic, all of which helps you significantly reduce cost. Finally, we showcase example code for deploying multiple Llama-based models that are fine-tuned using PEFT on MME.
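
Dynamically loading and unloading models based on traffic can be pictured as a bounded cache. The sketch below is a generic least-recently-used cache in plain Python, not the TorchServe or SageMaker MME mechanism; the `ModelCache` class, the LRU eviction policy, and the `fake_loader` function are assumptions for illustration.

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` models loaded, evicting the least recently used."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader  # callable: model name -> loaded model
        self._models = OrderedDict()

    def get(self, name):
        if name in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load the model, then evict the oldest one if over capacity.
        model = self.loader(name)
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)
        return model

    def loaded(self):
        return list(self._models)


load_log = []


def fake_loader(name):
    # Stand-in for loading fine-tuned weights onto a GPU.
    load_log.append(name)
    return f"model:{name}"


cache = ModelCache(capacity=2, loader=fake_loader)
cache.get("a")
cache.get("b")
cache.get("a")           # hit, no reload
result = cache.get("c")  # miss, evicts "b" (least recently used)
```

With a capacity of two, requesting a third model unloads the one that was used longest ago, so hardware is shared across many models at the price of occasional reloads.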

Speakers

Li Ning

Sr. Software Engineer, Amazon
Li is a senior software engineer at AWS with a specialization in building large-scale AI solutions. As a tech lead for TorchServe, a project jointly developed by AWS and Meta, her passion lies in leveraging PyTorch and AWS SageMaker to help customers embrace AI for the greater good... Read More →

Saurabh Trikande

Senior Product Manager, Amazon Web Services (AWS AI)
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models... Read More →



Tuesday October 17, 2023 5:55pm - 6:20pm PDT
Grand Peninsula Ballroom A/B/C
  Production
  • Audience Experience Level Advanced
  • Slides Attached Yes

6:10pm PDT

Lightning Talk: Dinosaur Bone Hunt - Bob Chesebrough, Intel
In this talk I will describe how to build an AI fossil-hunting tool based on PyTorch and the Intel AI Analytics Toolkit to create a bone likelihood map to help guide your next dinosaur hunt! All of this will be shown on CPUs using PyTorch. To start with, I will walk through dinosaur bone identification and the importance of sediment deposits in locating dinosaur bones. Then I will show you how to decompose an image classification problem, like dinosaur fossil hunting, into a few key components: building context from data, proper data representation for the model, model definition/training, and producing actionable insights from model predictions. The author has used this model to find new dinosaur bones in the Morrison Formation!

Speakers

Bob Chesebrough

Sr Solution Architect, Intel
Bob Chesebrough's industry experience is software development/AI solution engineering for Fortune 100 companies and national laboratories for over three decades. He is also a hobbyist who has logged over 800 miles and 1000 hours in the field finding dinosaur bones. He and his sons... Read More →



Tuesday October 17, 2023 6:10pm - 6:20pm PDT
Sequoia A
  Applications
  • Audience Experience Level Beginner
  • Slides Attached Yes

6:10pm PDT

Lightning Talk: Orchestrating Machine Learning on Edge Devices with PyTorch and WebAssembly - Rishit Dagli, University of Toronto (Vector Institute and DGP Lab), Civo & Shivay Lamba, meilisearch
Edge computing is becoming increasingly popular for applications that require low latency and high bandwidth. However, building machine learning runtimes for edge computing infrastructure that are both scalable and optimized can be challenging. In this talk, we explore how PyTorch and WebAssembly (Wasm) can be used to build efficient edge computing runtimes. We discuss how Wasm, a low-level bytecode format that provides a lightweight and portable runtime environment for edge applications, can be used to run PyTorch models effectively, and which specific optimizations one can make to models so that they run on the edge with Wasm. We also introduce Akri, an open-source project that allows us to easily discover edge devices on which to run the PyTorch model. We will also cover some use cases where PyTorch and Wasm can be used together, such as building machine learning models that can run on edge devices or processing sensor data in real time. We also share some best practices by showing how we run Neural Radiance Fields on the edge using this setup. The audience will gain a better understanding of how they can use PyTorch to run scalable and optimized machine learning on the edge.

Speakers

Rishit Dagli

Student, Research Scientist, University of Toronto, Civo
I am a CS freshman at the University of Toronto. I love researching and working with Machine Learning, especially Computer Vision. I also maintain/contribute extensively to popular open-source projects like TensorFlow, PyTorch, Kubernetes, Kubeflow among others. I also love building... Read More →

Shivay Lamba

Ambassador, WASMEdge
Shivay Lamba is a software developer specializing in DevOps, Machine Learning and Full Stack Development. He is an Open Source Enthusiast and has been part of various programs like Google Code-in and Google Summer of Code as a Mentor and is currently an MLH Fellow. He has also worked... Read More →



Tuesday October 17, 2023 6:10pm - 6:20pm PDT
Cypress A/B/C
  Community/Integrations
 