Tianyu He
Kalra, D., He, T., & Barkeshli, M. (2025). Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. arXiv:2311.02076. Retrieved from https://arxiv.org/abs/2311.02076 (Original work published February 2025)
Tao, T., Doshi, D., Kalra, D., He, T., & Barkeshli, M. (2025). (How) Can Transformers Predict Pseudo-Random Numbers? arXiv:2502.10390. Retrieved from https://arxiv.org/abs/2502.10390 (Original work published February 2025)
He, T., Doshi, D., Das, A., & Gromov, A. (2024). Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks. arXiv:2406.02550. Retrieved from https://arxiv.org/abs/2406.02550 (Original work published November 2024)