References
[1] Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks?[C]//Advances in Neural Information Processing Systems. 2014: 3320-3328.
[2] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[3] Dodge J, Ilharco G, Schwartz R, et al. Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping[J]. arXiv preprint arXiv:2002.06305, 2020.
[4] Zhang T, Wu F, Katiyar A, et al. Revisiting few-sample BERT fine-tuning[J]. arXiv preprint arXiv:2006.05987, 2020.
[5] Howard J, Ruder S. Universal language model fine-tuning for text classification[J]. arXiv preprint arXiv:1801.06146, 2018.
[6] Clark K, Luong M T, Le Q V, et al. ELECTRA: Pre-training text encoders as discriminators rather than generators[J]. arXiv preprint arXiv:2003.10555, 2020.