Dissertation Committee Chair: Maissam Barkeshli
Committee:
Andrey Gromov (advisor)
Victor Albert
Tom Goldstein
Christopher Jarzynski (Dean’s representative)
Abstract: As artificial intelligence (AI) systems grow increasingly powerful and permeate every aspect of our lives, their impact on both individuals and society is an urgent concern. Questions of safety and robustness in AI stem largely from our limited understanding of deep learning. Research in this domain has traditionally followed two parallel paths: an empirical approach that prioritizes practical advances, and a theoretical approach that seeks a mathematical understanding from first principles. Despite notable progress, a significant gap remains between deep learning practice and its theoretical underpinnings. This dissertation advocates a phenomenological approach to understanding AI systems, one that integrates empirical observations with theoretical model-building. This methodology has been instrumental in the physical sciences, and it holds similar promise for advancing the science of deep learning. In two parts, this work demonstrates the effectiveness of this approach in characterizing model architectures and their emergent capabilities.
In the first part, we explore how signal propagation analysis in large-N limits can inform the design and initialization of model architectures. We develop a diagnostic observable that distinguishes between ordered and chaotic behaviors in neural networks, guiding optimal parameter initialization for training. Our analysis establishes the theoretical soundness of this observable in simple networks and confirms its empirical utility in state-of-the-art architectures. The findings reveal an architecture design paradigm that eliminates the need for careful initialization, shedding light on widely used heuristic practices. Additionally, we introduce an algorithm that automates initialization across diverse model architectures, enhancing their trainability.
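To make the order-to-chaos picture concrete, the following minimal numpy sketch propagates two nearby inputs through a random deep network and tracks their distance across layers. It is illustrative only: the plain tanh network, the layer width and depth, and the variance values are assumptions for this sketch, not the dissertation's actual diagnostic observable.

# Illustrative sketch (not the dissertation's specific observable): signal
# propagation in a deep, randomly initialized tanh network. Depending on the
# weight variance sigma_w^2, the distance between two nearby inputs either
# contracts with depth (ordered phase) or grows until the bounded activations
# saturate it (chaotic phase); criticality sits between the two.
import numpy as np

def propagate_distance(sigma_w, width=512, depth=50, seed=0):
    """Track ||h1 - h2|| across layers for two nearby inputs."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(width)
    x2 = x1 + 1e-3 * rng.standard_normal(width)  # small perturbation
    h1, h2 = x1, x2
    distances = []
    for _ in range(depth):
        # i.i.d. Gaussian weights with variance sigma_w^2 / width
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        h1, h2 = np.tanh(W @ h1), np.tanh(W @ h2)
        distances.append(np.linalg.norm(h1 - h2))
    return distances

for sigma_w in (0.5, 1.0, 2.0):  # ordered, near-critical, chaotic
    d = propagate_distance(sigma_w)
    print(f"sigma_w={sigma_w}: layer-1 dist={d[0]:.2e}, layer-{len(d)} dist={d[-1]:.2e}")

Initializing at or near the critical variance is what keeps such signals from vanishing or exploding with depth, which is the basic intuition behind the initialization analysis in this part.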
In the second part, we highlight the importance of the system identification approach for characterizing AI systems. We explore several stylized setups in which model capabilities emerge as a function of compute, data quantity, and data diversity. Using arithmetic and cryptographic tasks as examples, we demonstrate that emergent abilities such as grokking and in-context learning arise alongside the formation of interpretable structures within the model’s parameters, hidden representations, and outputs. Through targeted experiments, we identify these structures using (i) black-box probing, which examines model responses to characteristic inputs, and (ii) open-box analysis, which leverages curated task-specific observables and metrics to study internal model states.
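As a concrete instance of the stylized setups mentioned above, the sketch below builds the standard modular-addition dataset commonly used in grokking studies. The modulus, the split fraction, and the function names are illustrative assumptions for this sketch, not details taken from the dissertation.

# Hypothetical sketch of a standard grokking setup: learn (a + b) mod p from
# a random subset of all input pairs. Names and split fraction are illustrative.
import itertools
import random

def modular_addition_dataset(p=97, train_fraction=0.3, seed=0):
    """All pairs (a, b) with label (a + b) % p, split into train/test."""
    pairs = [(a, b, (a + b) % p) for a, b in itertools.product(range(p), repeat=2)]
    random.Random(seed).shuffle(pairs)
    cut = int(train_fraction * len(pairs))
    return pairs[:cut], pairs[cut:]

train, test = modular_addition_dataset()
print(len(train), len(test))  # 2822 train / 6587 test examples for p=97

In setups of this kind, a small model typically fits the training pairs quickly, while test accuracy jumps much later; that delayed jump coincides with the formation of interpretable structure (e.g., Fourier-like features) in the learned weights, which is the kind of structure the open-box observables described above are designed to detect.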
This dissertation promotes a paradigm for understanding deep learning that complements both heuristic-driven and hypothesis-driven approaches. By integrating experimental methodologies and analytical tools from established scientific disciplines, this framework has the potential to steer the field toward safer, more robust, and more efficient AI systems.