Darshil Doshi
Tao, T., Doshi, D., Kalra, D., He, T., & Barkeshli, M. (2025). (How) Can Transformers Predict Pseudo-Random Numbers? arXiv. Retrieved from https://arxiv.org/abs/2502.10390 (Original work published February 2025)
He, T., Doshi, D., Das, A., & Gromov, A. (2024). Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks. arXiv. Retrieved from https://arxiv.org/abs/2406.02550 (Original work published November 2024)