I am interested in emergence, AI, and physics. My current focus is on 1) empirical theories and understanding of deep learning, and 2) applying deep learning methods to the physical sciences, especially astrophysics and quantum physics. I’m especially keen on neural network scaling & its benefits.

20. Adversarial vulnerability of powerful near out-of-distribution detection
Stanislav Fort

Large models and massive pretraining have significantly improved near-OOD detection. Despite that, we show that even the best-performing methods are very brittle when we design adversarial attacks specifically against their OOD score. The code is available on GitHub.
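
A minimal sketch of the kind of attack involved, assuming a PyTorch-style detector exposed as a differentiable `ood_score_fn` (a placeholder, not the paper's exact setup): a signed-gradient PGD loop that perturbs an input within an epsilon-ball so as to drive its OOD score down.

```python
import torch

def attack_ood_score(x, ood_score_fn, eps=8 / 255, step=1 / 255, n_steps=20):
    """PGD-style attack that perturbs x to lower a scalar OOD score,
    making an out-of-distribution input look in-distribution.
    ood_score_fn maps a batch of images to per-example OOD scores
    (higher = more OOD); it stands in for any differentiable detector."""
    x_adv = x.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        score = ood_score_fn(x_adv).sum()
        grad, = torch.autograd.grad(score, x_adv)
        # Signed gradient step *against* the OOD score.
        x_adv = x_adv.detach() - step * grad.sign()
        # Project back into the epsilon-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```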

19. How many degrees of freedom do we need to train deep networks: a loss landscape perspective
Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli

Deep neural networks can be trained, and generalize well, within many low-dimensional subspaces of their weight space. We explain this phenomenon by first examining the success probability of hitting a training-loss sublevel set when training within a random subspace of a given dimensionality, using Gordon's escape theorem.
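
A toy NumPy sketch of the setup studied here, with illustrative names rather than the paper's code: the parameters are constrained to theta0 + A z for a fixed random projection A, and only the low-dimensional vector z is optimized (here via finite-difference directional derivatives on a toy quadratic loss).

```python
import numpy as np

def train_in_random_subspace(loss_fn, theta0, d, n_steps=500, lr=0.1, seed=0):
    """Optimize theta = theta0 + A @ z, where A is a fixed random D x d
    projection and only the d-dimensional vector z is trained."""
    rng = np.random.default_rng(seed)
    D = theta0.size
    A = rng.standard_normal((D, d)) / np.sqrt(d)   # fixed random basis
    z = np.zeros(d)
    for _ in range(n_steps):
        theta = theta0 + A @ z
        # Finite-difference directional derivatives give the gradient w.r.t. z.
        g = np.array([(loss_fn(theta + 1e-5 * A[:, i]) - loss_fn(theta)) / 1e-5
                      for i in range(d)])
        z -= lr * g
    return theta0 + A @ z

# Toy usage: a quadratic loss in D = 100 dimensions, trained in only d = 5.
D = 100
target = np.ones(D)
loss = lambda th: np.mean((th - target) ** 2)
theta_star = train_in_random_subspace(loss, np.zeros(D), d=5)
```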

18. A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
Jie Ren, Stanislav Fort, Jeremiah Liu, Abhijit Guha Roy, Shreyas Padhy, Balaji Lakshminarayanan

We analyze the failure modes of the Mahalanobis distance method for near-OOD detection and propose a simple fix called relative Mahalanobis distance (RMD) which improves performance and is more robust to hyperparameter choice.

Accepted at the Uncertainty & Robustness in Deep Learning workshop at ICML 2021.
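
A small NumPy sketch of the relative Mahalanobis distance idea described above, under the usual assumptions (per-class means with a shared covariance, plus a single class-agnostic background Gaussian); variable names and fitting details are illustrative.

```python
import numpy as np

def fit_gaussians(feats, labels):
    """Per-class means with a shared covariance (standard Mahalanobis setup),
    plus a class-agnostic 'background' Gaussian fit to all features."""
    classes = np.unique(labels)
    mus = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    centered = np.concatenate([feats[labels == c] - mus[i]
                               for i, c in enumerate(classes)])
    shared_prec = np.linalg.inv(np.cov(centered, rowvar=False))
    mu0 = feats.mean(axis=0)
    prec0 = np.linalg.inv(np.cov(feats, rowvar=False))
    return mus, shared_prec, mu0, prec0

def relative_mahalanobis_score(x, mus, prec, mu0, prec0):
    """RMD_k(x) = MD_k(x) - MD_0(x); the OOD score is min_k RMD_k(x)."""
    md_k = np.array([(x - mu) @ prec @ (x - mu) for mu in mus])
    md_0 = (x - mu0) @ prec0 @ (x - mu0)
    return np.min(md_k - md_0)
```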

17. Exploring the Limits of Out-of-Distribution Detection
Stanislav Fort, Jie Ren, Balaji Lakshminarayanan

We improve the AUROC from 85% (current SOTA) to more than 96% using Vision Transformers pre-trained on ImageNet-21k. On a challenging genomics OOD detection benchmark, we improve the AUROC from 66% to 77% using transformers and unsupervised pre-training. For multi-modal image-text pre-trained transformers such as CLIP, we explore a new way of using just the names of outlier classes as the sole source of information, without any accompanying images.

Accepted at the Uncertainty & Robustness in Deep Learning workshop at ICML 2021.
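
A hedged sketch of the label-only idea mentioned above: given an L2-normalized image embedding and text embeddings of the in-distribution and candidate outlier class names (e.g. from a CLIP-style model), score how much softmax mass the image places on the in-distribution names. The temperature and the exact scoring rule are illustrative, not the paper's precise recipe.

```python
import numpy as np

def in_distribution_score(img_emb, in_name_embs, out_name_embs, temp=0.01):
    """Zero-shot OOD scoring from class names only: compare the image
    embedding to text embeddings of in-distribution and outlier class names,
    then return the softmax mass assigned to the in-distribution names.
    All embeddings are assumed to be L2-normalized vectors."""
    sims = np.concatenate([in_name_embs, out_name_embs]) @ img_emb
    probs = np.exp(sims / temp)
    probs /= probs.sum()
    return probs[: len(in_name_embs)].sum()   # high = looks in-distribution
```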

16. Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
Stanislav Fort, Andrew Brock, Razvan Pascanu, Soham De, Samuel L. Smith

By applying augmentation multiplicity to the recently proposed NFNet model family, we achieve a new ImageNet SotA of 86.8% top-1 accuracy without extra data after just 34 epochs of training with an NFNet-F5 using the SAM optimizer.
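
A minimal PyTorch-style sketch of augmentation multiplicity, with placeholder names (`augment`, `model`, `loss_fn` are stand-ins): each image contributes several independently augmented views to the batch, and the loss is averaged over all of them.

```python
import torch

def multi_augmentation_batch(images, labels, augment, multiplicity=4):
    """Draw several independent augmentations of each image and repeat the
    labels accordingly; the training loss then averages over all views."""
    views = torch.cat([augment(images) for _ in range(multiplicity)], dim=0)
    targets = labels.repeat(multiplicity)
    return views, targets

# Usage inside a training step (model, loss_fn, augment are placeholders):
# views, targets = multi_augmentation_batch(images, labels, augment)
# loss = loss_fn(model(views), targets)
```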

15. Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse

Linear interpolation from initial to final neural net params typically decreases the loss monotonically. We investigate this phenomenon empirically and theoretically.
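
A tiny sketch of the measurement itself, assuming flattened parameter vectors and any `loss_fn` that closes over the data: evaluate the loss along the straight line between the initial and final weights.

```python
import numpy as np

def interpolate_losses(loss_fn, theta_init, theta_final, n_points=50):
    """Loss along the straight line from initial to final parameters.
    Monotonic linear interpolation means this curve typically decreases
    monotonically as alpha goes from 0 to 1."""
    alphas = np.linspace(0.0, 1.0, n_points)
    losses = np.array([loss_fn((1 - a) * theta_init + a * theta_final)
                       for a in alphas])
    return alphas, losses
```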

14. Identifying charged particle background events in X-ray imaging detectors with novel machine learning algorithms
D. R. Wilkins, S. W. Allen, E. D. Miller, M. Bautz, T. Chattopadhyay, S. Fort, C. E. Grant, S. Herrmann, R. Kraft, R. G. Morris, P. Nulsen

Using machine learning algorithms for the identification of background (noise) charged-particle events in X-ray imaging detectors, with a particular emphasis on the proposed Athena X-ray observatory's WFI science products module.

Accepted for publication at Proceedings of the SPIE, Astronomical Telescopes and Instrumentation, Space Telescopes and Instrumentation 2020: Ultraviolet to Gamma Ray.

13. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli

We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. We do so through a large-scale phenomenological analysis of training, synthesizing diverse measures characterizing loss landscape geometry and NTK dynamics.

Accepted for publication at NeurIPS 2020 in Vancouver as a poster.
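
A minimal PyTorch sketch of the data-dependent (empirical) NTK for a scalar-output model, where K[i, j] is the inner product of the per-example parameter gradients; recomputing it at checkpoints along training is one simple way to track its time evolution. The toy MLP and inputs are illustrative.

```python
import torch

def empirical_ntk(model, xs):
    """Empirical NTK: K[i, j] = <d f(x_i)/d theta, d f(x_j)/d theta>
    for a scalar-output model f."""
    grads = []
    for x in xs:
        out = model(x.unsqueeze(0)).sum()            # scalar output per example
        g = torch.autograd.grad(out, list(model.parameters()))
        grads.append(torch.cat([gi.reshape(-1) for gi in g]))
    J = torch.stack(grads)                           # (n_examples, n_params)
    return J @ J.T

# Toy usage with a small MLP on random inputs:
model = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
xs = torch.randn(8, 10)
K = empirical_ntk(model, xs)
```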

12. Training independent subnetworks for robust prediction
Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran

Using a multi-input multi-output (MIMO) configuration, we can utilize a single model's capacity to train multiple subnetworks that independently learn the task at hand.
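
A minimal MIMO-style sketch (illustrative, not the paper's exact architecture): M inputs are concatenated into a single forward pass and the head produces M independent sets of logits, so M subnetworks share one set of weights.

```python
import torch
import torch.nn as nn

class MIMOClassifier(nn.Module):
    """Toy multi-input multi-output (MIMO) model: M inputs are stacked along
    the feature dimension and the head emits M independent predictions."""
    def __init__(self, in_dim, n_classes, M=3, hidden=256):
        super().__init__()
        self.M, self.n_classes = M, n_classes
        self.body = nn.Sequential(nn.Linear(in_dim * M, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_classes * M))

    def forward(self, xs):                 # xs: (batch, M, in_dim)
        out = self.body(xs.flatten(1))     # concatenate the M inputs
        return out.view(-1, self.M, self.n_classes)  # one output per member

# During training each member sees an independently shuffled example; at test
# time the same input is repeated M times and the M predictions are averaged.
```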

11. The Break-Even Point on the Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

In the early phase of training of deep neural networks there exists a "break-even point" which determines properties of the entire optimization trajectory.

Accepted as a spotlight talk at the International Conference on Learning Representations 2020 (ICLR) in Addis Ababa, Ethiopia.

10. Deep Ensembles: A Loss Landscape Perspective
Stanislav Fort, Huiyi Hu, Balaji Lakshminarayanan

Exploring the consequences of the neural network loss landscape structure for ensembling, Bayesian methods, and calibration.

Accepted as a contributed talk at Bayesian Deep Learning workshop at NeurIPS 2019 in Vancouver.

9. Emergent properties of the local geometry of neural loss landscapes
Stanislav Fort, Surya Ganguli

By modelling logit gradient clustering and the effect of training as logit scale growth, we constructed a simple analytical model of the gradient and Hessian of neural networks in classification problems. From this minimal model, we successfully recovered 4 previously observed surprising empirical phenomena related to the local structure of neural network loss landscapes, demonstrating that their origin is likely very generic and not specific to natural data distributions, neural networks, or gradient descent, as previously conjectured.

8. Large Scale Structure of Neural Network Loss Landscapes
Stanislav Fort, Stanislaw Jastrzebski

We built a unified phenomenological model of the low-loss manifold in neural network loss landscapes that incorporates 1) mode connectivity, 2) the surprising ease of optimizing on low-dimensional cuts through the weight space, and 3) the existence of long directions in the loss landscape. Using this model, we made new predictions about the loss landscape and verified them empirically.

Accepted for publication at NeurIPS 2019 in Vancouver as a poster.

A subset accepted at the Understanding and Improving Generalization in Deep Learning workshop at ICML 2019 as a spotlight talk and a poster, and at the Theoretical Physics for Deep Learning workshop at ICML 2019 as a poster. I also delivered invited talks at Uber AI Labs and Google Brain.

7. Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan

We defined the concept of stiffness, showed how it provides a useful perspective on generalization in neural networks, observed how it varies with learning rate, and used it to define the concept of a dynamical critical length.
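
One simple reading of the stiffness measure, sketched below with placeholder names (`model`, `loss_fn` are stand-ins): the cosine similarity between the per-example loss gradients of two inputs, positive when a gradient step on one example also reduces the loss on the other.

```python
import torch

def cosine_stiffness(model, loss_fn, x1, y1, x2, y2):
    """Cosine similarity between the per-example loss gradients of two
    (input, label) pairs; one simple stiffness-style measure."""
    def flat_grad(x, y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([gi.reshape(-1) for gi in g])
    g1, g2 = flat_grad(x1, y1), flat_grad(x2, y2)
    return torch.nn.functional.cosine_similarity(g1, g2, dim=0)
```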

6. Adaptive Quantum State Tomography with Neural Networks
Stanislav Fort (equal contributions), Yihui Quek (equal contributions), Hui Khoon Ng

Learning to learn about quantum states using neural networks, swarm optimization and particle filters. We develop a new algorithm for quantum state tomography that learns to perform the state reconstruction directly from data and achieves orders of magnitude computational speedup while retaining state-of-the-art reconstruction accuracy.

A subset accepted at the 4th Seefeld Workshop on Quantum Information, at the 22nd Annual Conference on Quantum Information Processing (QIP 2019) as a poster, at the 3rd Quantum Techniques in Machine Learning (QTML 2019) in Korea as a talk, and at the McGill Physics-AI conference in Montreal as a talk.

5. The Goldilocks zone: Towards better understanding of neural network loss landscapes
Stanislav Fort, Adam Scherlis

A connection between optimization on random low-dimensional hypersurfaces and local convexity in the neural network loss landscape.

Accepted for publication at AAAI 2019 in Hawaii as an oral presentation and a poster.

A subset accepted at the Modern Trends in Nonconvex Optimization for Machine Learning workshop at ICML 2018 and BayLearn 2018 as The Goldilocks zone: Empirical exploration of the structure of the neural network loss landscapes. Accepted as an oral presentation at the Theoretical Physics for Machine Learning Aspen winter conference.
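
A small NumPy sketch of one such local-convexity probe, under illustrative assumptions: project the Hessian onto a random d-dimensional subspace and measure the fraction of positive eigenvalues there.

```python
import numpy as np

def positive_curvature_fraction(hessian, d, seed=0):
    """Fraction of positive eigenvalues of the Hessian restricted to a random
    d-dimensional subspace; a simple proxy for local convexity."""
    rng = np.random.default_rng(seed)
    A, _ = np.linalg.qr(rng.standard_normal((hessian.shape[0], d)))  # (D, d)
    eigs = np.linalg.eigvalsh(A.T @ hessian @ A)
    return (eigs > 0).mean()
```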

4. The ATHENA WFI science products module
David N Burrows, Steven Allen, Marshall Bautz, Esra Bulbul, Julia Erdley, Abraham D Falcone, Stanislav Fort, Catherine E Grant, Sven Herrmann, Jamie Kennea, Robert Klar, Ralph Kraft, Adam Mantz, Eric D Miller, Paul Nulsen, Steve Persyn, Pragati Pradhan, Dan Wilkins

A paper on the proposed Athena X-ray observatory's WFI science products module. My part involved exploring the use of AI techniques on board.

Published at the Proceedings Volume 10699, Space Telescopes and Instrumentation 2018: Ultraviolet to Gamma Ray.

3. Towards understanding feedback from supermassive black holes using convolutional neural networks
Stanislav Fort

A novel approach to detection of X-ray cavities in clusters of galaxies using convolutional neural architectures.

Accepted at the Deep Learning for Physical Sciences workshop at NIPS 2017.

2. Gaussian Prototypical Networks for Few-Shot Learning on Omniglot
Stanislav Fort

An architecture capable of dealing with uncertainties for few-shot learning on the Omniglot dataset.

Accepted and presented at BayLearn 2017.
Accepted at the Bayesian Deep Learning workshop at NIPS 2017.

Essential code available on GitHub.
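
A rough NumPy sketch of the Gaussian-prototype idea (a diagonal-covariance variant with illustrative names, not the exact architecture): each support embedding comes with a predicted per-dimension precision, prototypes are precision-weighted means, and queries are classified by precision-weighted distance.

```python
import numpy as np

def gaussian_prototypes(embeddings, precisions, labels):
    """Class prototypes as precision-weighted means of support embeddings;
    uncertain support examples (low precision) contribute less."""
    protos, proto_prec = {}, {}
    for c in np.unique(labels):
        e, s = embeddings[labels == c], precisions[labels == c]
        proto_prec[c] = s.sum(axis=0)
        protos[c] = (s * e).sum(axis=0) / proto_prec[c]
    return protos, proto_prec

def classify(query, protos, proto_prec):
    """Assign the query to the class with the smallest precision-weighted
    squared distance to its prototype."""
    dists = {c: np.sum(proto_prec[c] * (query - protos[c]) ** 2)
             for c in protos}
    return min(dists, key=dists.get)
```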


1. Discovery of Gamma-ray Pulsations from the Transitional Redback PSR J1227-4853
T. J. Johnson, P. S. Ray, J. Roy, C. C. Cheung, A. K. Harding, H. J. Pletsch, S. Fort, F. Camilo, J. Deneva, B. Bhattacharyya, B. W. Stappers, M. Kerr

A detection of gamma-ray pulsations from the transitional redback pulsar PSR J1227-4853.