Tianyu He
Kalra, D., He, T., & Barkeshli, M. (2025). Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. arXiv:2311.02076. Retrieved from https://arxiv.org/abs/2311.02076 (Original work published February 2025)
Tao, T., Doshi, D., Kalra, D., He, T., & Barkeshli, M. (2025). (How) Can Transformers Predict Pseudo-Random Numbers? arXiv:2502.10390. Retrieved from https://arxiv.org/abs/2502.10390 (Original work published February 2025)
He, T., Doshi, D., Das, A., & Gromov, A. (2024). Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks. arXiv:2406.02550. Retrieved from https://arxiv.org/abs/2406.02550 (Original work published November 2024)