Making asynchronous stochastic gradient descent work for transformers (2019)
Attributed to: Peta-5: A National Facility for Petascale Data Intensive Computation and Analytics, funded by EPSRC
Abstract
No abstract provided
Bibliographic Information
Type: Other
Parent Publication: EMNLP-IJCNLP 2019 - Proceedings of the 3rd Workshop on Neural Generation and Translation