Machine Learning Seminar
About Joint Machine Learning Seminar Series: Our newly launched Joint Machine Learning Seminar Series is a collaborative initiative across three schools at the University of Sydney, co-organized by Dr. Chang Xu (School of Computer Science), Prof. Dmytro Matsypura (Business School), and Prof. Yiming Ying (School of Mathematics & Statistics). The goal of this initiative is to foster interdisciplinary interaction and collaboration on cutting-edge research in Machine Learning (ML) and Artificial Intelligence (AI). We welcome suggestions of potential future speakers for this seminar series.
Future Seminars: To maintain a high-quality seminar series, we aim to feature speakers with impactful contributions to ML and AI research. If no suitable speaker is available for a given session, we will instead organize seminar talks within the School of Mathematics and Statistics, focusing on the mathematical and statistical aspects of machine learning, ensuring continuous engagement with fundamental and advanced topics in the field. We invite researchers, faculty, and students across disciplines to join us for these engaging talks and networking opportunities over coffee!
Please direct enquiries about this seminar series to Yiming Ying.
Seminars
Monday, Apr 20, 2026, 1pm-2pm SMRI Seminar Room (A12-03-301) A12 Macleay Building, Level 3, Room 301.
Speaker: Dong Gong (UNSW)
Title: A Thousand Faces of Continual Learning: Self-Improving AI with Modular Learning and Memory
Abstract:
No matter how capable, a static AI cannot meet the demands of real-world deployment, where tasks shift, knowledge ages, users differ, and environments evolve. Yet the vast majority of machine learning methods operate under an essentially static paradigm: train once, freeze, deploy. Enabling models, including today's foundation models, to keep learning is the central promise of continual learning, and, more ambitiously, of *self-improving AI*. It is also where one of the field's most stubborn challenges lives: catastrophic forgetting, the tendency of models to lose what they have already learned as soon as they acquire something new. Continual learning is therefore not optional but necessary. And yet, in the current landscape, it is often dismissed from two opposite directions — either declared impossible, or quietly claimed to be "already solved". In this talk, I want to present a more realistic picture — *a thousand faces of continual learning*. It is not a single algorithm but a spectrum of mechanisms, spanning training-time updates, parameter editing, modular expansion, associative memory, and test-time adaptation, embodying different expectations about an AI system. At its core, continual learning is fundamentally a question of *learning* and *memory*: what to change, what to preserve, and how knowledge lives. I will share my perspective on modular learning and memory that localise change, mitigate catastrophic forgetting, and scale naturally to foundation models. I will then present several of our recent works that instantiate this philosophy: dynamic Mixture-of-Experts for model expansion, rank-1 fine-grained memory for precise knowledge injection, on-demand expansion driven by task difficulty, and dynamic test-time self-adaptation. Together, these span LLMs, multimodal LLMs, diffusion models, and agentic systems, pointing toward AI that continues to evolve after pre-training ends.
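As a toy illustration of the "localise change" idea above (not the speaker's actual method), a rank-1 edit to a linear layer can retarget one chosen input direction while provably leaving all orthogonal directions untouched:

```python
import numpy as np

# Toy rank-1 parameter edit: W' = W + u v^T is chosen so that a single
# "key" input maps to a new target output, while any input orthogonal to
# the key is mapped exactly as before -- the change is localised by
# construction. All values below are illustrative.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))            # original layer weights

key = np.array([1.0, 0.0, 0.0, 0.0])       # input direction whose output we edit
target = np.array([1.0, 2.0, 3.0, 4.0])    # desired new output for the key

u = target - W @ key                       # required output correction
v = key / (key @ key)                      # normalised so that v^T key = 1
W_edit = W + np.outer(u, v)                # the rank-1 edit

other = np.array([0.0, 1.0, -1.0, 0.5])    # orthogonal to key: unaffected
```

Since `v @ other == 0`, the edited layer agrees with the original on every input orthogonal to the key; only the one targeted direction changes.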
Speaker Bio: Dr. Dong Gong is a Senior Lecturer and ARC DECRA Fellow (2023–2026) at the School of Computer Science and Engineering, University of New South Wales (UNSW), where he leads a research group. He also holds an adjunct position at the Australian Institute for Machine Learning (AIML), where he was previously a Research Fellow after completing his PhD in December 2018. His research focuses on building self-improving AI systems and continual learning methods that can learn continuously, adapt reliably, and scale efficiently in open-ended and non-stationary environments, with a particular interest in realistic applications to LLMs, VLMs, and diffusion generative models. He has actively served the research community as an Area Chair and reviewer for conferences such as CVPR, NeurIPS, ICML, ICLR, ICCV, and ACM MM, and was recognized as an outstanding reviewer for NeurIPS'18 and an outstanding Area Chair for ACM MM'24. Homepage: https://donggong1.github.io/
Monday, Apr 13, 2026, 1pm-2pm SMRI Seminar Room (A12-03-301) A12 Macleay Building, Level 3, Room 301.
Speaker: Li Chen (USYD)
Title: Managing Inventory and Pricing with Contextual Robust Optimization
Abstract:
Multiproduct inventory and pricing problems are traditionally approached by estimating a presumed "sufficiently accurate" demand model and then optimizing with it to determine optimal inventory and pricing decisions. However, obtaining an accurate demand model is nearly impossible due to unobservable parameters, resulting in parameter uncertainty; meanwhile, the unknown distribution of the error term in the stochastic demand model introduces residual ambiguity. Additionally, the predicted demand is endogenously affected by pricing, leading to decision-dependent predictions that often bring about intractable bilinear optimization problems. We introduce a contextual robust optimization model that addresses these challenges simultaneously. Our proposed model possesses attractive finite-sample performance guarantees and can be effectively approached using an enhanced affine recourse adaptation to address intractability. Our framework can be readily extended to broader contextual decision-making problems under mild conditions. Extensive numerical studies demonstrate the effectiveness of our approach, showing that it outperforms the conventional estimate-then-optimize approach and the residual-based robust optimization approach that does not account for parameter uncertainty, particularly when the available data is limited. Notably, our proposed model exhibits greater resilience when contextual information is disregarded, reflecting practical situations in which collecting such information might be impossible or costly.
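For readers new to robust optimization, a much simpler cousin of the talk's model, a single-product robust newsvendor with interval demand uncertainty, can be sketched as follows (all numbers illustrative; the talk's contextual model is far richer):

```python
import numpy as np

# Robust newsvendor sketch: demand is only known to lie in [d_lo, d_hi].
# We choose the order quantity q maximising the worst-case profit
#     min_{d in [d_lo, d_hi]}  p * min(q, d) - c * q
# by grid search. Since sales min(q, d) are increasing in d, the worst
# case is d = d_lo, and the robust optimum orders exactly d_lo.

p, c = 10.0, 4.0             # unit selling price and unit cost (illustrative)
d_lo, d_hi = 20.0, 60.0      # demand uncertainty interval

qs = np.linspace(0.0, 100.0, 1001)

def worst_profit(q):
    d_grid = np.linspace(d_lo, d_hi, 101)
    return np.min(p * np.minimum(q, d_grid) - c * q)

q_robust = qs[int(np.argmax([worst_profit(q) for q in qs]))]
```

The contextual version in the talk replaces the fixed interval with a data-driven, covariate-dependent uncertainty set and handles price-dependent (endogenous) demand, which is where the bilinear difficulty arises.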
Monday, Mar 23, 2026, 1pm-2pm SMRI Seminar Room (A12-03-301) A12 Macleay Building, Level 3, Room 301.
Speaker: Dingxuan Zhou (USYD)
Title: Mathematical theory of structured deep neural networks
Abstract:
Deep learning has been widely applied and has brought breakthroughs in speech recognition, computer vision, natural language processing, and many other domains. The involved deep neural network architectures and computational issues have been well studied in machine learning. But there is much less theoretical understanding of the modelling, approximation, or generalization abilities of deep learning models with network architectures. Important families of structured deep neural networks include deep convolutional neural networks induced by convolutions and transformers induced by attention. These architectures create essential differences between such structured networks and fully-connected ones. This talk describes some approximation and generalization analysis of deep convolutional neural networks and transformers.
Monday, Mar 16, 2026, 1pm-2pm SMRI Seminar Room (A12-03-301) A12 Macleay Building, Level 3, Room 301.
Speaker: Pulin Gong (USYD)
Title: Shared principles of biological and artificial neural networks
Abstract:
Biological neural networks in the brain and deep neural networks (DNNs) in AI are both built from the interactions of large numbers of basic units (neurons), from which powerful information-processing capabilities emerge. In this presentation, I will discuss several principles shared by these two classes of complex systems. First, I will show that heterogeneous synaptic weights, as observed in the brain and in pretrained DNNs, give rise to a distinct dynamical regime in which neural representations display a multiscale mixture of localized and delocalized features. This suggests a common organizational principle underlying representation and computation in both biological and artificial networks. I will then turn to a second shared principle: rich neural sampling dynamics. In the brain, such dynamics support flexible cognitive functions such as attention, while in deep learning they help optimizers navigate complex loss landscapes and find solutions with good generalization performance.
Monday, March 2, 2026, 1pm-2pm SMRI Seminar Room (A12-03-301) A12 Macleay Building, Level 3, Room 301.
Speaker: Dr. Andi Han (University of Sydney)
Title: Random submanifold methods for efficient optimization on matrix manifolds
Abstract:
Riemannian optimization provides a principled framework for solving constrained optimization problems. Its central idea is to compute a search direction in the tangent space and then update the variables via a retraction. A key computational bottleneck, however, is the retraction step, which is necessary to ensure the feasibility of updates on the manifold. As the problem dimension grows, this cost can limit the applicability of Riemannian optimization to large-scale problems. This talk introduces approaches that perform each update on randomly selected submanifolds, thereby significantly reducing the per-iteration computational complexity while achieving similar convergence guarantees.
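The tangent-space-step-plus-retraction pattern described above can be sketched on the simplest matrix manifold, the unit sphere (this is plain Riemannian gradient descent for illustration, not the randomized submanifold method of the talk):

```python
import numpy as np

# Riemannian gradient descent on the unit sphere S^{n-1}, minimizing the
# Rayleigh quotient f(x) = x^T A x. The minimizer is an eigenvector for the
# smallest eigenvalue of A. Each iteration: Euclidean gradient -> project
# onto the tangent space -> step -> retract (renormalize) onto the sphere.
# (Illustrative only; the talk restricts updates to random submanifolds.)

def riemannian_gd_sphere(A, x0, lr=0.05, iters=5000):
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2.0 * A @ x                 # Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x     # projection onto tangent space at x
        x = x - lr * rgrad                  # step in the tangent space
        x = x / np.linalg.norm(x)           # retraction: pull back onto the sphere
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2.0                         # symmetric test matrix
x = riemannian_gd_sphere(A, rng.standard_normal(5))
```

The renormalization in the last line of the loop is exactly the retraction whose cost the talk's randomized submanifold approach seeks to reduce in high dimensions.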
Speaker Bio: Andi Han recently joined the School of Mathematics and Statistics, University of Sydney, as a Lecturer in Data Science. Before joining USYD, he was a Postdoctoral Researcher at RIKEN AIP, Continuous Optimization Team. He completed his PhD in Business Analytics at USYD. His research broadly covers large generative models, optimization (on manifolds), efficiency of foundation models, and graph neural networks with applications to biology and chemistry.
Monday, Feb 23, 2026, 1pm-2pm SMRI Seminar Room (A12-03-301) A12 Macleay Building, Level 3, Room 301.
Speaker: Rafael Oliveira (CSIRO)
Title: Thompson Sampling in Function Spaces via Neural Operators
Abstract:
We propose an extension of Thompson sampling to optimisation problems in function spaces where the objective is a known functional of an unknown operator's output. We assume that queries to the operator (such as running a high-fidelity simulator or physical experiment) are costly, while functional evaluations given the operator's output are inexpensive. Our algorithm employs a sample-then-optimise approach using neural operator surrogates. This strategy avoids explicit uncertainty quantification by treating trained neural operators as approximate samples from a Gaussian process (GP) posterior. We derive regret bounds and theoretical results connecting neural operators with GPs in infinite-dimensional settings. Experiments benchmark our method against other Bayesian optimisation baselines on functional optimisation tasks involving partial differential equations of physical systems, demonstrating better sample efficiency and significant performance gains.
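The sample-then-optimise loop can be illustrated with a Bayesian linear model standing in for the neural-operator surrogate (a minimal sketch; the quadratic feature map, prior, and toy objective are assumptions for illustration only):

```python
import numpy as np

# Thompson sampling, sample-then-optimise style: each round we draw ONE
# posterior sample of the surrogate's parameters, optimise that sampled
# model over the candidate set, pay for a single costly query there, and
# update the posterior. A conjugate Bayesian linear model plays the role
# of the neural-operator surrogate from the talk.

rng = np.random.default_rng(1)
X = np.linspace(-2.0, 2.0, 41)
Phi = np.stack([np.ones_like(X), X, X**2], axis=1)   # toy feature map
theta_true = np.array([0.5, 1.0, -1.0])              # truth: f(x) = 0.5 + x - x^2
noise = 0.1

A = np.eye(3)                                        # posterior precision (prior N(0, I))
b = np.zeros(3)
for t in range(50):
    cov = np.linalg.inv(A)
    theta = rng.multivariate_normal(cov @ b, cov)    # one posterior sample
    i = int(np.argmax(Phi @ theta))                  # optimise the sampled model
    y = Phi[i] @ theta_true + noise * rng.standard_normal()  # costly query
    A += np.outer(Phi[i], Phi[i]) / noise**2         # conjugate Bayesian update
    b += Phi[i] * y / noise**2

best = X[int(np.argmax(Phi @ np.linalg.solve(A, b)))]  # posterior-mean recommendation
```

In the talk, the posterior sample is replaced by a trained neural operator treated as an approximate draw from a GP posterior, and the inner argmax becomes an optimisation of a known functional of the operator's output.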
Monday, August 18, 2025, 1pm-2pm (Different time this semester!) Carslaw 375
Speaker: Dr. Sergey Dolgov (University of Bath)
Title: Low-rank approximations for large-scale nonlinear feedback control
Abstract:
Computation of the optimal feedback law for general (nonlinear/unstable/stochastic) dynamical systems requires solving the Hamilton-Jacobi-Bellman Partial Differential Equation (PDE), which suffers from the curse of dimensionality. We develop a unified framework for computing a fast surrogate model of the feedback control function based on low-rank decompositions of matrices and tensors. Firstly, we propose a Statistical Proper Orthogonal Decomposition (SPOD) for Model Order Reduction of very high-dimensional systems, such as the discretized Navier-Stokes equation or other PDEs, by compressing snapshots corresponding to random samples of all parameters in the system, initial condition and time. Secondly, we compute a low-rank Functional Tensor Train (TT) approximation of the feedback control function for the reduced model. The pre-trained TT representation of the control function of the reduced state can then be used for real-time online generation of the control signal. Using the proposed SPOD and TT approximations, we demonstrate a controller computable in milliseconds that achieves lower vorticity of the Navier-Stokes flow with random inflow compared to using the mean inflow to produce reduced bases or controllers.
Speaker Bio: TBA
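The snapshot-compression step underlying POD can be sketched with a truncated SVD (a generic illustration on synthetic data; the SPOD described in the talk additionally randomises over system parameters, the initial condition, and time when collecting snapshots):

```python
import numpy as np

# POD sketch: stack snapshots of a high-dimensional state into a matrix,
# take a truncated SVD, and keep the leading left singular vectors as the
# reduced basis. The synthetic snapshots below live in a 3-dimensional
# subspace of a 1000-dimensional state space, so POD recovers rank 3.

rng = np.random.default_rng(0)
n, m = 1000, 50                                  # state dimension, snapshot count
t = np.linspace(0.0, 1.0, n)
modes = np.stack([np.sin((k + 1) * np.pi * t) for k in range(3)])  # (3, n)
S = rng.standard_normal((m, 3)) @ modes          # (m, n) snapshot matrix

U, sv, _ = np.linalg.svd(S.T, full_matrices=False)
r = int(np.sum(sv > 1e-10 * sv[0]))              # numerical rank of the snapshots
V = U[:, :r]                                     # reduced basis, shape (n, r)

# a new state in the same subspace is reconstructed (almost) exactly
x_new = modes.T @ np.array([1.0, -2.0, 0.5])
err = np.linalg.norm(x_new - V @ (V.T @ x_new)) / np.linalg.norm(x_new)
```

The feedback law is then trained on the r-dimensional reduced coordinates `V.T @ x` rather than the full state, which is what makes the subsequent TT approximation tractable.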
Monday, June 2, 2025, 11am-12pm Carslaw 173
Speaker: Dr. Pinak Mandal (University of Sydney)
Title: Learning Dynamical Systems with Hit-and-Run Random Feature Maps
Abstract:
Forecasting chaotic dynamical systems is a central challenge across science and engineering. In this talk, we will explore how random feature maps can be adapted to deliver remarkably strong performance on this task. A key ingredient for successful forecasting is ensuring that the features produced by the model lie in the nonlinear region of the activation function. We will see how this can be achieved through careful selection of the internal weights in a data-driven way using a hit-and-run algorithm. With a few additional modifications, such as increasing the depth of the model and introducing localization, we achieve state-of-the-art forecasting results on a variety of high-dimensional chaotic systems, reaching up to 512 dimensions. Our method produces accurate short-term trajectory predictions, as well as reliable estimates of long-term statistical behavior in the test cases.
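The basic recipe, fixed random internal weights plus a trained linear readout, can be sketched on a simple chaotic system (a generic random-feature fit; the hit-and-run internal-weight selection that is the subject of the talk is not implemented here):

```python
import numpy as np

# Random-feature surrogate for a one-step map: learn the chaotic logistic
# map x_{n+1} = 4 x_n (1 - x_n) with fixed random tanh features and a
# ridge-regression readout. Only the readout weights c are trained; the
# internal weights W, b are drawn once and frozen.

rng = np.random.default_rng(0)
N = 2000
x = np.empty(N + 1)
x[0] = 0.3
for n in range(N):
    x[n + 1] = 4.0 * x[n] * (1.0 - x[n])         # training trajectory

D = 300                                          # number of random features
W = rng.uniform(-4.0, 4.0, size=D)               # internal weights: never trained
b = rng.uniform(-2.0, 2.0, size=D)

def features(u):
    return np.tanh(np.outer(u, W) + b)           # (len(u), D)

Phi = features(x[:-1])
reg = 1e-6                                       # ridge regularisation
c = np.linalg.solve(Phi.T @ Phi + reg * np.eye(D), Phi.T @ x[1:])

pred = features(np.array([0.6])) @ c             # one-step forecast from x = 0.6
# exact next value is 4 * 0.6 * 0.4 = 0.96
```

In the talk's setting, the internal weights are instead selected in a data-driven way so that the features land in the nonlinear region of the activation, which is what makes the approach competitive on high-dimensional systems.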
Speaker Bio: Pinak Mandal is a postdoctoral researcher at the University of Sydney, specializing in machine learning and dynamical systems. His work spans several topics, including unlearning in generative models, learning dynamical systems from data, deep learning for solving PDEs, and data assimilation. He also develops open-source tools for scientific computing and visualization.
Monday, May 19, 2025, 11:00am, J12 (CS building) lecture theatre 123
Speaker: Dino Sejdinovic (University of Adelaide)
Title: Squared Neural Probabilistic Models
Abstract:
We describe a new class of probabilistic models, squared families, where densities are defined by squaring a linear transformation of a statistic and normalising with respect to a base measure. Key quantities, such as the normalising constant and certain statistical divergences, admit a helpful parameter-integral decomposition giving a closed form normalising constant in many cases of interest. Parametrising the statistic using neural networks results in highly expressive yet tractable models, with universal approximation properties. This approach naturally extends to other probabilistic settings, such as modelling point processes. We illustrate the effectiveness of squared neural probabilistic models on a variety of tasks, demonstrating their ability to represent complex distributions while maintaining analytical and computational advantages. Joint work with Russell Tsuchida, Jiawei Liu, and Cheng Soon Ong.
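In the simplest scalar-statistic sketch consistent with the abstract (our notation, not necessarily the speakers'; the papers treat a more general vector-valued form), the density and its closed-form normalising constant read:

```latex
% \varphi: the chosen statistic, \mu: the base measure, \theta: parameters
p_\theta(x) \;=\; \frac{\bigl(\theta^\top \varphi(x)\bigr)^{2}}{Z(\theta)},
\qquad
Z(\theta) \;=\; \theta^\top M \,\theta,
\qquad
M \;=\; \int \varphi(x)\,\varphi(x)^\top \,\mathrm{d}\mu(x).
```

The parameter-integral decomposition is visible here: the integral defining the second-moment matrix M does not involve the parameters, so it can be evaluated once (often in closed form), after which Z(θ) is a cheap quadratic form for every θ.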
Speaker Bio: Dino Sejdinovic is a Professor of Statistical Machine Learning at the University of Adelaide (since 2022), where he is affiliated with the Australian Institute for Machine Learning (AIML) and the Responsible AI Research Centre (RAIR). He also holds visiting appointments at Nanyang Technological University, Singapore, and the Institute of Statistical Mathematics, Tokyo. He was previously an Associate Professor at the Department of Statistics, University of Oxford, and a Turing Faculty Fellow of the Alan Turing Institute. He held postdoctoral positions at University College London and the University of Bristol and received a PhD in Electrical and Electronic Engineering from the University of Bristol (2009). His research spans a wide variety of topics at the interface between machine learning and statistical methodology, including large-scale nonparametric and kernel methods, robust and trustworthy machine learning, causal inference, and uncertainty quantification.
