
Agenda
The theme of the SOS27 Workshop is “Fostering Innovation at Scale Beyond the Flops.”
The conference will be held at the Hotel Bellevue Terminus, Engelberg, Switzerland.
Monday, March 17, 2025
Independent arrival and check-in.
18:30 – 19:00 Registration – Foyer Hotel Bellevue Terminus, Engelberg
18:30 – 20:00 Welcome Reception – Hotel Bellevue Terminus, Engelberg
Tuesday, March 18, 2025
Program
08:00 – 08:10 Welcome and Overview, Joost VandeVondele, Swiss National Supercomputing Centre
Session I: Running Machine Learning at Scale (Session Chair: Stefano Schuppli, Swiss National Supercomputing Centre)
08:10 – 08:30 “ORBIT: AI Foundation Model for Earth System Modeling”, Dan Lu, Oak Ridge National Laboratory
08:30 – 08:50 “Production-Scale Generative AI for Science at NERSC”, Steven Farrell, Lawrence Berkeley National Laboratory
08:50 – 09:10 “Observations from Planning and Operating Very Large Scale AI Infrastructure”, Andrew Jones, Microsoft
09:10 – 09:30 “SAILOR: Fast, Cost-Effective ML Training”, Foteini Strati, ETH Zurich
09:30 – 09:50 “Experiences with Scaling AI Workloads on Aurora”, Väinö Hatanpää, Argonne National Laboratory
09:50 – 10:00 Panel Discussion
10:00 – 10:30 Coffee Break
10:30 – 10:50 “AI Challenges and Opportunities for Future Computing”, Christoph Hagleitner, IBM Research Europe – Zurich
Session II: The Role of Non-GPU Accelerators and Chiplets in Future HPC Systems (Session Chair: Andrew Younge, Sandia National Laboratories)
10:50 – 11:10 “NextSilicon Intelligent Compute Accelerator”, Oded Margalit, NextSilicon
11:10 – 11:30 “The Unfortunate Economics of Highly-Specialized Chiplets”, Ben Feinberg, Sandia National Laboratories
11:30 – 11:50 “Unlocking Custom Silicon through Chiplet Standards Ecosystem”, Joshua Randall, Arm
11:50 – 12:10 “Rethinking the Control Plane for Chiplet-Based Heterogeneous Systems”, Matthew D. Sinclair, University of Wisconsin
12:10 – 12:30 Panel Discussion
12:30 – 17:20 Lunch and Afternoon Free (ski/side meetings)
17:20 – 17:40 “Technologies for Future HPC and AI Systems”, Robert W. Wisniewski, HPE
Session III: DevSecOps in HPC: Towards Automatic Deployment and Zero Downtime (Session Chair: Maxime Martinasso, Swiss National Supercomputing Centre)
17:40 – 18:00 “Google Cluster Toolkit”, Carlos Boneti, Google Cloud
18:00 – 18:20 “OpenCHAMI and IaC: DevSecOps Workflows for HPC Cluster Management”, Alex Lovell-Troy, Los Alamos National Laboratory
18:20 – 18:40 “Service Management and the vCluster Technology”, Miguel Gila, Swiss National Supercomputing Centre
18:40 – 19:00 “Towards DevSecOps: Cybersecurity and IAM for AI and HPC Digital Research Infrastructures”, Anna Price, Bristol Centre for Supercomputing (BriCS)
19:00 – 19:20 Panel Discussion
19:20 Independent Evening Program
Wednesday, March 19, 2025
Program
08:00 – 08:20 “Energy Efficient Computing for HPC and AI”, Michael Schulte, AMD
Session IV: Trustworthy and Energy Efficient Foundation Models for Science (Session Chair: Prasanna Balaprakash, Oak Ridge National Laboratory)
08:20 – 08:40 “The Climate System Foundation Model Landscape, Trustworthiness, Challenges and State of the Art”, Sebastian Schemm, ETH Zurich
08:40 – 09:00 “Sandia’s Journey toward Trustworthy and Efficient Foundation Models”, Justin Newcomer, Sandia National Laboratories
09:00 – 09:20 “Scalable Training of Trustworthy and Energy-Efficient Predictive Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN”, Massimiliano Lupo Pasini, Oak Ridge National Laboratory
09:20 – 09:40 “Explainable AI for Climate with a Focus on Extreme Events”, Gianmarco Mengaldo, National University of Singapore
09:40 – 10:00 Panel Discussion
10:00 – 10:30 Coffee Break
10:30 – 10:50 “Maintaining HPC Debugging and Performance Engineering Leadership”, Rudy Shand, Linaro
Session V: HPC Software Sustainability and Stewardship (Session Chair: Ron Brightwell, Sandia National Laboratories)
10:50 – 11:10 “Software Sustainability through Community Building”, Christian Trott, Sandia National Laboratories
11:10 – 11:30 “Sustainable HPC Software: Lessons from the Trenches (A Maintainer’s Perspective)”, Damien Lebrun-Grandie, Oak Ridge National Laboratory
11:30 – 11:50 “S4PST, Stewardship for Programming Systems”, Keita Teranishi, Oak Ridge National Laboratory
11:50 – 12:10 “Embedding Weather and Climate Applications in the Python Ecosystem: the Case of GT4Py”, Mauro Bianco, Swiss National Supercomputing Centre
12:10 – 12:30 Panel Discussion
Lunch and Afternoon Free (ski/side meetings)
18:30 Night Out Event
Thursday, March 20, 2025
Program
08:00 – 08:20 “HPC Datacenter of the Future”, Robert Triendl, DDN
Session VI: Sensitive Data in HPC: Technology and Processes for Medical Platforms (Session Chair: Pim Witlox, Swiss National Supercomputing Centre)
08:20 – 08:40 “Trusted Research Environments for Handling Sensitive Data in Research: A Swiss Experience and Perspective”, Sergio Maffioletti, ETH Zurich
08:40 – 09:00 “Democratizing AI for Cancer with Privacy Preserving Synthetic Data Generation for Cancer Case Identification”, John Gounley, Oak Ridge National Laboratory
09:00 – 09:20 “Toward a Health Data Commons for Arizona”, Arthur “Barney” Maccabe, University of Arizona
09:20 – 09:40 “Challenges in Hosting Sensitive Data Workflows on Shared HPC Systems”, Fredrik Robertsén, CSC – IT Center for Science
09:40 – 10:00 Panel Discussion
10:00 – 10:30 Coffee Break
Session VII: Incorporating New Technologies and Emerging Workloads into Large Scale HPC in the Next Decade (Session Chair: Christopher Zimmer, Oak Ridge National Laboratory)
10:30 – 10:40 Introduction, Christopher Zimmer, Oak Ridge National Laboratory
10:40 – 11:00 “Ultra Ethernet: An HPC and AI Interconnection Network Specification to Empower the Ethernet Ecosystem”, Torsten Hoefler, ETH Zurich
11:00 – 11:20 “Is 95%+ FLOPS or Compute Utilization a Relevant Metric for AI Workloads?”, Sadaf Alam, University of Bristol
11:20 – 11:40 “An Open-Source Framework to Integrate Quantum Computing with HPC”, Frank Mueller, North Carolina State University
11:40 – 12:00 “What’s Needed to Make our HPC Centers More Cloudy?”, Todd Gamblin, Lawrence Livermore National Laboratory
12:00 – 12:20 “The Federated Computing Environment for Autonomous Smart Laboratories”, Christian Engelmann, Oak Ridge National Laboratory
12:20 – 12:50 Session: Crystal Ball (Session Chair: Joost VandeVondele, Swiss National Supercomputing Centre)
Douglas Kothe, Sandia National Laboratories
Tjerk P. Straatsma, Oak Ridge National Laboratory
Joost VandeVondele, Swiss National Supercomputing Centre
12:50 – 13:00 Invitation to SOS28, Ron Brightwell, Sandia National Laboratories
13:00 – 14:15 Lunch