[1]Kim Y. Convolutional neural networks for sentence classification[J]. arXiv preprint arXiv:1408.5882, 2014.[2] Devlin J, Chang M W, Lee K, et al. Bert: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.[3] Liu Y, Ott M, Goyal N, et al. Roberta: A robustly optimized bert pretraining approach[J]. arXiv preprint arXiv:1907.11692, 2019.[4]https://github.com/ymcui/Chinese-BERT-wwm[5]https://github.com/brightmart/roberta_zh[6]Gururangan S, Marasović A, Swayamdipta S, et al. Don\'t Stop Pretraining: Adapt Language Models to Domains and Tasks[J]. arXiv preprint arXiv:2004.10964, 2020.[7]Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network[J]. arXiv preprint arXiv:1503.02531, 2015.[8]Lan Z, Chen M, Goodman S, et al. Albert: A lite bert for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942, 2019.[9] Clark K, Luong M T, Le Q V, et al. Electra: Pre-training text encoders as discriminators rather than generators[J]. arXiv preprint arXiv:2003.10555, 2020. 推荐阅读:Taro2.x 跨端开发实践 基于Flink构建实时数仓实践从Mach-O角度谈谈Swift和OC的存储差异 人物 | 罗景:多业务融合推荐场景下的深度学习实践 58同城无侵入改造业务库为Dynamic Feature工程的探索和实践