[1] Distilling the Knowledge in a Neural Network
[2] Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT
[3] Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
[4] FitNets: Hints for Thin Deep Nets
[5] TinyBERT: Distilling BERT for Natural Language Understanding
[6] Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher