1. T. Brown et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877-1901, 2020.
2. H. Touvron et al., “LLaMA: Open and Efficient Foundation Language Models,” arXiv preprint, 2023.
3. A. Chowdhery et al., “PaLM: Scaling language modeling with pathways,” Journal of Machine Learning Research, vol. 24, no. 240, pp. 1-113, 2023.
4. C. Raffel et al., “Exploring the limits of transfer learning with a unified text-to-text transformer,” Journal of Machine Learning Research, vol. 21, no. 140, pp. 1-67, 2020.
5. Y. Zhu et al., “Can Large Language Models Understand Context?,” in Findings of the Association for Computational Linguistics: EACL 2024, 2024, pp. 2004-2018.
6. D. Khurana, A. Koli, K. Khatter, and S. Singh, “Natural language processing: state of the art, current trends, challenges,” Multimedia Tools and Applications, vol. 82, pp. 3713-3744, 2023, https://doi.org/10.1007/s11042-022-13428-4.
7. D. Hupkes et al., “A taxonomy and review of generalization research in NLP,” Nature Machine Intelligence, vol. 5, pp. 1161-1174, 2023, https://doi.org/10.1038/s42256-023-00729-y.
8. P. P. Ray, “ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope,” Internet of Things and Cyber-Physical Systems, vol. 3, pp. 121-154, 2023.
9. Y. Yang and Z. Xue, “Training Heterogeneous Features in Sequence to Sequence Tasks: Latent Enhanced Multi-filter Seq2Seq Model,” in Intelligent Systems and Applications, 2023, pp. 103-117.
10. Y. Sun et al., “ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation,” arXiv preprint, 2021.
11. D. Zmitrovich et al., “A Family of Pretrained Transformer Language Models for Russian,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, pp. 507-524.
12. M. Song and Y. Zhao, “Enhance RNNLMs with Hierarchical Multi-Task Learning for ASR,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6102-6106.
13. A. Vaswani et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems, 2017, vol. 30, pp. 5998-6008.
14. V. Sanh et al., “Multitask Prompted Training Enables Zero-Shot Task Generalization,” arXiv preprint, 2022.
15. Y. Tay et al., “UL2: Unifying Language Learning Paradigms,” arXiv preprint, 2023.
16. Y. Bengio, J. Louradour, R. Collobert, and J. Weston, “Curriculum Learning,” in Proceedings of the 26th International Conference on Machine Learning, 2009, pp. 41-48.
17. I. Misra, A. Shrivastava, A. Gupta, and M. Hebert, “Cross-stitch networks for multi-task learning,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 3994-4003.
18. Y. Sun et al., “ERNIE 2.0: A continual pre-training framework for language understanding,” in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, vol. 34, no. 05, pp. 8968-8975.
19. L. Xue et al., “mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021, pp. 483-498, https://doi.org/10.18653/v1/2021.naacl-main.41.
20. S. Wang et al., “ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation,” arXiv preprint, 2021.
21. J. Pfeiffer, A. Kamath, A. Rücklé, K. Cho, and I. Gurevych, “AdapterFusion: Non-Destructive Task Composition for Transfer Learning,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021, pp. 487-503, https://doi.org/10.18653/v1/2021.eacl-main.39.
22. W. Fedus, B. Zoph, and N. Shazeer, “Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,” Journal of Machine Learning Research, vol. 23, no. 120, pp. 1-39, 2022.
23. S. Longpre et al., “The flan collection: Designing data and methods for effective instruction tuning,” in Proceedings of the International Conference on Machine Learning, 2023, pp. 22631-22648.
24. N. Houlsby et al., “Parameter-efficient transfer learning for NLP,” in Proceedings of the International Conference on Machine Learning, 2019, pp. 2790-2799.
25. H. A. A. Al-Khamees, M. E. Manaa, Z. H. Obaid, and N. A. Mohammedali, “Implementing Cyclical Learning Rates in Deep Learning Models for Data Classification,” in Proceedings of the International Conference on Forthcoming Networks and Sustainability in the AIoT Era, 2024, pp. 205-215.
26. A. Koloskova, H. Hendrikx, and S. U. Stich, “Revisiting gradient clipping: Stochastic bias and tight convergence guarantees,” in Proceedings of the International Conference on Machine Learning, 2023, pp. 17343-17363.