No. 467 overall
No. 037 of 2021
1 Background
2 Method Overview
2.1 Data Augmentation
2.2 Semi-Supervised Learning
2.3 Ensemble Learning + Self-Training
2.4 Domain Transfer
2.5 Active Learning
3 Application Practice
3.1 Experimental Results
3.2 Applications in Meituan's Business
4 Future Outlook
References
Keep improving existing models and exploring new ones. The current experimental results still leave considerable room for improvement, so we will continue to iterate on our models; at the same time, we will explore more domain-transfer models and apply them in production, so that business teams can reach the best results with the least labeled data.
Experiment on more task types. So far our experiments have mainly covered single-sentence classification and sentence-pair classification; we plan to further extend to machine reading comprehension (MRC) and named entity recognition models.
Explore domain transfer in depth and train general-purpose models. Because we serve many business teams, we have accumulated text-classification and sentence-pair datasets from quite a few domains, and we hope to train a single general model for all tasks in a given domain, so that a new business can reach good results with far less data. For example, using Facebook's EFL model [21], both the text-classification and sentence-pair tasks in a domain can be reformulated as textual entailment tasks to train one general model, which can then be transferred directly to a new business.
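To make the EFL-style reformulation concrete, the sketch below converts one text-classification example into binary entailment examples, one per candidate label, so that a single entailment model can serve many tasks. The label names and hypothesis template here are hypothetical illustrations, not the templates used in the EFL paper or in Meituan's system.

```python
def to_entailment_pairs(text, gold_label, label_descriptions):
    """Rewrite one classification example as (premise, hypothesis, label) triples.

    Each candidate class becomes a natural-language hypothesis; the pair is
    labeled entailment (1) only for the gold class, non-entailment (0) otherwise.
    """
    pairs = []
    for candidate, description in label_descriptions.items():
        hypothesis = f"This sentence is about {description}."
        pairs.append((text, hypothesis, 1 if candidate == gold_label else 0))
    return pairs


# Hypothetical two-class sentiment task.
label_descriptions = {
    "pos": "a positive review",
    "neg": "a negative review",
}
pairs = to_entailment_pairs("The food was great!", "pos", label_descriptions)
```

At inference time, the entailment model scores every (sentence, hypothesis) pair and the label whose hypothesis receives the highest entailment probability is predicted, which is what lets one model transfer across tasks that share this format.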
Build a few-shot learning platform. We are currently integrating few-shot learning capabilities into the company's unified BERT platform, making them flexibly available to all business teams. Later, after deeper exploration of few-shot learning, we will try to build a dedicated few-shot learning platform offering more low-resource learning capabilities.
[1] Wei J, Zou K. EDA: Easy data augmentation techniques for boosting performance on text classification tasks[J]. arXiv preprint arXiv:1901.11196, 2019.
[2] Kobayashi S. Contextual augmentation: Data augmentation by words with paradigmatic relations[J]. arXiv preprint arXiv:1805.06201, 2018.
[3] Anaby-Tavor A, Carmeli B, Goldbraich E, et al. Not Enough Data? Deep Learning to the Rescue![J]. arXiv preprint arXiv:1911.03118, 2019.
[4] Wei J, Huang C, Vosoughi S, et al. Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning[J]. arXiv preprint arXiv:2103.07552, 2021.
[5] Andreas J. Good-enough compositional data augmentation[J]. arXiv preprint arXiv:1904.09545, 2019.
[6] Zhang H, Cisse M, Dauphin Y N, et al. Mixup: Beyond empirical risk minimization[J]. arXiv preprint arXiv:1710.09412, 2017.
[7] Guo D, Kim Y, Rush A M. Sequence-level mixed sample data augmentation[J]. arXiv preprint arXiv:2011.09039, 2020.
[8] Zhang R, Yu Y, Zhang C. SeqMix: Augmenting active sequence labeling via sequence mixup[J]. arXiv preprint arXiv:2010.02322, 2020.
[9] Verma V, Lamb A, Beckham C, et al. Manifold mixup: Better representations by interpolating hidden states[C]//International Conference on Machine Learning. PMLR, 2019: 6438-6447.
[10] Miyato T, Dai A M, Goodfellow I. Adversarial training methods for semi-supervised text classification[J]. arXiv preprint arXiv:1605.07725, 2016.
[11] Liang X, Wu L, Li J, et al. R-Drop: Regularized Dropout for Neural Networks[J]. arXiv preprint arXiv:2106.14448, 2021.
[12] Laine S, Aila T. Temporal ensembling for semi-supervised learning[J]. arXiv preprint arXiv:1610.02242, 2016.
[13] Tarvainen A, Valpola H. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results[J]. arXiv preprint arXiv:1703.01780, 2017.
[14] Miyato T, Maeda S, Koyama M, et al. Virtual adversarial training: a regularization method for supervised and semi-supervised learning[J]. IEEE transactions on pattern analysis and machine intelligence, 2018, 41(8): 1979-1993.
[15] Berthelot D, Carlini N, Goodfellow I, et al. MixMatch: A holistic approach to semi-supervised learning[J]. arXiv preprint arXiv:1905.02249, 2019.
[16] Chen J, Yang Z, Yang D. MixText: Linguistically-informed interpolation of hidden space for semi-supervised text classification[J]. arXiv preprint arXiv:2004.12239, 2020.
[17] Xie Q, Dai Z, Hovy E, et al. Unsupervised data augmentation for consistency training[J]. arXiv preprint arXiv:1904.12848, 2019.
[18] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//International Conference on Machine Learning. PMLR, 2017: 1126-1135.
[19] Chen Y, Wang X, Liu Z, et al. A new meta-baseline for few-shot learning[J]. arXiv preprint arXiv:2003.04390, 2020.
[20] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[21] Wang S, Fang H, Khabsa M, et al. Entailment as Few-Shot Learner[J]. arXiv preprint arXiv:2104.14690, 2021.