1. Aizawa, A., 2003. An information-theoretic perspective of tf-idf measures. Information Processing and Management, 39(1), 45-65. DOI: 10.1016/S0306-4573(02)00021-3
2. Boban, I., Doko, A., & Gotovac, S., 2020. Sentence retrieval using Stemming and Lemmatization with different length of the queries. Advances in Science, Technology and Engineering Systems, 5(3). DOI: 10.25046/aj050345 EDN: VXJKXQ
3. Choi, J., & Lee, S. W., 2020. Improving FastText with inverse document frequency of subwords. Pattern Recognition Letters, 133. DOI: 10.1016/j.patrec.2020.03.003 EDN: UJMUQJ
4. Cover, T. M., & Thomas, J. A., 2005. Elements of Information Theory. John Wiley and Sons. DOI: 10.1002/047174882X EDN: SSWPAV
5. Dagdelen, J., Dunn, A., Lee, S., Walker, N., Rosen, A. S., Ceder, G., Persson, K. A., & Jain, A., 2024. Structured information extraction from scientific text with large language models. Nature Communications, 15(1). DOI: 10.1038/S41467-024-45563-X EDN: UEKRXN
6. Dey, R. K., & Das, A. K., 2023. Modified term frequency-inverse document frequency based deep hybrid framework for sentiment analysis. Multimedia Tools and Applications, 82(21). DOI: 10.1007/s11042-023-14653-1
7. Di, Y., Zhang, Y., Zhang, L., Tao, T., & Lu, H., 2017. MdFDIA: A Mass Defect Based Four-Plex Data-Independent Acquisition Strategy for Proteome Quantification. Analytical Chemistry, 89(19), 10248-10255. DOI: 10.1021/acs.analchem.7b01635
8. Friedman, R., 2023. Tokenization in the Theory of Knowledge. Encyclopedia, 3(1). DOI: 10.3390/encyclopedia3010024
9. Gandhi, A. B., Joshi, J. B., Kulkarni, A. A., Jayaraman, V. K., & Kulkarni, B. D., 2008. SVR-based prediction of point gas hold-up for bubble column reactor through recurrence quantification analysis of LDA time-series. International Journal of Multiphase Flow, 34(12), 1099-1107. DOI: 10.1016/j.ijmultiphaseflow.2008.07.001
10. Huang, G.-B., Zhou, H., Ding, X., & Zhang, R., 2012. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(2), 513-529. DOI: 10.1109/TSMCB.2011.2168604
11. Huang, Q., Zhang, H., Chen, J., & He, M., 2017. Quantile Regression Models and Their Applications: A Review. Journal of Biometrics & Biostatistics, 08(03). DOI: 10.4172/2155-6180.1000354
12. Jelodar, H., Wang, Y., Yuan, C., Feng, X., Jiang, X., Li, Y., & Zhao, L., 2019. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools and Applications, 78(11). DOI: 10.1007/s11042-018-6894-4
13. Mahmoud, H. A. H., Hafez, A. M., & Alabdulkreem, E., 2023. Language-Independent Text Tokenization Using Unsupervised Deep Learning. Intelligent Automation and Soft Computing, 35(1). DOI: 10.32604/iasc.2023.026235 EDN: HWALDD
14. Mestre, G., Portela, J., Rice, G., Muñoz San Roque, A., & Alonso, E., 2021. Functional time series model identification and diagnosis by means of auto- and partial autocorrelation analysis. Computational Statistics & Data Analysis, 155, 107108. DOI: 10.1016/J.CSDA.2020.107108 EDN: JQSQTX
15. Minogue, C. E., Hebert, A. S., Rensvold, J. W., Westphall, M. S., Pagliarini, D. J., & Coon, J. J., 2015. Multiplexed quantification for data-independent acquisition. Analytical Chemistry, 87(5), 2570-2575. DOI: 10.1021/AC503593D
16. Ozturkmenoglu, O., & Alpkocak, A., 2012. Comparison of different lemmatization approaches for information retrieval on Turkish text collection. International Symposium on Innovations in Intelligent SysTems and Applications. DOI: 10.1109/INISTA.2012.6246934
17. Peng, H., Long, F., & Ding, C., 2005. Feature selection based on mutual information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8), 1226-1238. DOI: 10.1109/TPAMI.2005.159
18. Shantal, M., Othman, Z., & Bakar, A. A., 2023. A Novel Approach for Data Feature Weighting Using Correlation Coefficients and Min-Max Normalization. Symmetry, 15(12). DOI: 10.3390/sym15122185 EDN: FWKAWN
19. Singh, D., & Singh, B., 2020. Investigating the impact of data normalization on classification performance. Applied Soft Computing, 97. DOI: 10.1016/j.asoc.2019.105524 EDN: SZRRRM
20. Toporkov, O., & Agerri, R., 2023. On the Role of Morphological Information for Contextual Lemmatization. Computational Linguistics, 50(1). DOI: 10.1162/coli_a_00497
21. Trewartha, A., Walker, N., Huo, H., Lee, S., Cruse, K., Dagdelen, J., Dunn, A., Persson, K. A., Ceder, G., & Jain, A., 2022. Quantifying the advantage of domain-specific pre-training on named entity recognition tasks in materials science. Patterns (New York, N.Y.), 3(4). DOI: 10.1016/J.PATTER.2022.100488 EDN: INZSMY
22. Zhang, B., Käll, L., & Zubarev, R. A., 2016. DeMix-Q: Quantification-Centered Data Processing Workflow. Molecular & Cellular Proteomics: MCP, 15(4), 1467-1478. DOI: 10.1074/MCP.O115.055475
23. Zhang, W., Wang, Q., Kong, X., Xiong, J., Ni, S., Cao, D., Niu, B., Chen, M., Li, Y., Zhang, R., Wang, Y., Zhang, L., Li, X., Xiong, Z., Shi, Q., Huang, Z., Fu, Z., & Zheng, M., 2024. Fine-tuning large language models for chemical text mining. Chemical Science, 15(27), 10600-10611. DOI: 10.1039/d4sc00924j EDN: UCJEJA
24. Zhang, Z., Lei, Y., Xu, J., Mao, X., & Chang, X., 2019. TFIDF-FL: Localizing faults using term frequency-inverse document frequency and deep learning. IEICE Transactions on Information and Systems, E102D(9). DOI: 10.1587/transinf.2018EDL8237
25. Kadiev, I. P., & Kadiev, P. A., 2016. Homogeneous register environments with programmable structure (in Russian). Вестник Дагестанского Государственного Технического Университета. Технические Науки, 35(4), 108-112. DOI: 10.21822/2073-6185-2014-35-4-108-112
26. Puchkov, E. V., & Belyavsky, G. I., 2018. Application of local trends for preprocessing time series in forecasting problems (in Russian). Международный Журнал Программные Продукты и Системы, 29, 751-756. DOI: 10.15827/0236-235X.124.751-756
27. Savzikhanova, S. A., 2023. Big Data: a winning innovation for forecasting future trends (in Russian). УЭПС: управление, экономика, политика, социология, 69-75. DOI: 10.24412/2412-2025-2023-2-69-76