Intuitively, modeling information across multiple scenarios and across multiple tasks belongs to different levels of optimization and should therefore be handled hierarchically. Motivated by this, we propose a Hierarchical information extraction Network (HiNet). Specifically, we design an end-to-end, two-layer information extraction framework that jointly models information sharing and collaboration both across scenarios and across tasks.
Considering that users interleave their behavior across scenarios and that items overlap among scenarios, the data from the multiple scenarios in the in-store dining (到餐) business contain valuable shared information. We therefore design a scenario-shared expert network. Inspired by the Mixture-of-Experts (MoE) architecture, the scenario-shared expert network is produced by a Sub-Expert Integration module (SEI, see Figure 2-(C)).
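For concreteness, below is a minimal PyTorch-style sketch of an SEI-like module: several sub-expert MLPs process the same input, and an input-conditioned Softmax gate weights and sums their outputs. The class name, expert count, and hidden size are illustrative assumptions, not the exact configuration used in production.

```python
import torch
import torch.nn as nn


class SubExpertIntegration(nn.Module):
    """MoE-style sub-expert integration (SEI) sketch.

    Each sub-expert is a small MLP; a Softmax gate conditioned on the
    input produces one weight per sub-expert, and the module outputs the
    weighted sum of the sub-expert representations.
    Hyperparameters here (num_experts, hidden_dim) are assumptions.
    """

    def __init__(self, input_dim: int, hidden_dim: int = 128, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            for _ in range(num_experts)
        ])
        self.gate = nn.Sequential(nn.Linear(input_dim, num_experts), nn.Softmax(dim=-1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack sub-expert outputs: (batch, num_experts, hidden_dim)
        expert_out = torch.stack([expert(x) for expert in self.experts], dim=1)
        # Gate weights per sub-expert: (batch, num_experts, 1)
        weights = self.gate(x).unsqueeze(-1)
        # Weighted sum over sub-experts: (batch, hidden_dim)
        return (weights * expert_out).sum(dim=1)
```

In HiNet, one such module can serve as the scenario-shared expert, while structurally identical modules can act as scenario-specific experts, differing only in which data they are trained on.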
The Scenario-aware Attentive Network (SAN) module takes two inputs: (a) $F_{s_i}$, the embedding vector generated from the scenario indicator, which is fed through a gate network with a Softmax function to compute the importance weights of the other scenarios' information representations for the current scenario; (b) $\{E_{s_j}(x)\}_{j \neq i}$, the set of information representations generated by the other scenarios. The output of SAN for the $i$-th scenario is the weighted sum of these scenario representations:

$$A_i = \sum_{j=1,\, j \neq i}^{K} \mathrm{Gate}(F_{s_i})_j \cdot E_{s_j}(x),$$

where $F_{s_i} \in \mathbb{R}^{d}$ denotes the scenario indicator projected into an embedding vector, $\mathrm{Gate}(\cdot)$ denotes the weight-based gate network with a Softmax output, $d$ is the dimension of $F_{s_i}$, and $K$ is the number of scenarios.
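A small sketch of this weighting step follows, under the same PyTorch assumptions as above (names and dimensions are illustrative): the scenario indicator is embedded, a linear gate plus Softmax yields one weight per other scenario, and the output is the weighted sum of their representations.

```python
import torch
import torch.nn as nn


class ScenarioAwareAttention(nn.Module):
    """SAN-style weighting sketch for scenario i.

    The scenario indicator is embedded, projected by a gate network, and
    passed through Softmax to obtain one weight for each of the other
    K - 1 scenarios; the output is the weighted sum of those scenarios'
    representations. Names and dimensions are assumptions.
    """

    def __init__(self, num_scenarios: int, indicator_dim: int):
        super().__init__()
        self.indicator_emb = nn.Embedding(num_scenarios, indicator_dim)
        # One gate logit per "other" scenario (K - 1 of them).
        self.gate = nn.Linear(indicator_dim, num_scenarios - 1)

    def forward(self, scenario_id: torch.Tensor, other_reprs: torch.Tensor) -> torch.Tensor:
        # scenario_id: (batch,); other_reprs: (batch, K - 1, repr_dim)
        e = self.indicator_emb(scenario_id)                       # (batch, d)
        w = torch.softmax(self.gate(e), dim=-1).unsqueeze(-1)     # (batch, K - 1, 1)
        # Weighted sum of the other scenarios' representations.
        return (w * other_reprs).sum(dim=1)                       # (batch, repr_dim)
```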