【1】Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008.
【2】Vaswani A, Bengio S, Brevdo E, et al. Tensor2Tensor for neural machine translation[J]. arXiv preprint arXiv:1803.07416, 2018.
【3】Conneau A, Lample G, Denoyer L, Ranzato M A, Jégou H. Word translation without parallel data.
【4】Dai Z, Yang Z, Yang Y, et al. Transformer-XL: Attentive language models beyond a fixed-length context[J]. arXiv preprint arXiv:1901.02860, 2019.
【5】Graves A. Sequence transduction with recurrent neural networks[J]. arXiv preprint arXiv:1211.3711, 2012.
【6】https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer