Darshil Doshi
Tao, T., Doshi, D., Kalra, D., He, T., & Barkeshli, M. (2025). (How) Can Transformers Predict Pseudo-Random Numbers? arXiv. Retrieved from https://arxiv.org/abs/2502.10390 (Original work published February 2025)
He, T., Doshi, D., Das, A., & Gromov, A. (2024). Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks. arXiv. Retrieved from https://arxiv.org/abs/2406.02550 (Original work published November 2024)