DEVELOPING DIGITAL RESOURCES OF SHAHMUKHI PUNJABI
Abstract
This paper examines the morphology of Punjabi, a language spoken widely in both India and Pakistan, covering gender, numerals, affixes, adjectives, and case. It highlights the role of WordNet in natural language processing (NLP), particularly in part-of-speech (POS) tagging, a core NLP task. It discusses the limitations of current POS taggers, such as those based on bidirectional Long Short-Term Memory (Bi-LSTM) networks, and proposes Transformer-based models with self-attention mechanisms as a potentially better alternative. It also presents the USAS Tagger, a software tool for automatic semantic analysis of Punjabi text that uses a hierarchical semantic tag set to analyze spoken and written data. A review of existing WordNet projects, including English WordNet, EuroWordNet, and Hindi WordNet, underlines the need for language-specific resources. The research addresses five questions: how to design an effective methodology for Punjabi WordNet development, which techniques suit POS tagging in Punjabi, what design considerations apply to a Rule-Based Stemmer, how to develop a Morphological Analyzer, and how to develop a Punjabi USAS Tagger.
The methodology section details the development of the Punjabi WordNet application, which combines a lexical database structure with a user interface for retrieving comprehensive word information. The USAS Tagger is a Python application that takes a dictionary-based approach to tagging Punjabi text and provides a Tkinter-based interface. The Punjabi POS Tagger supports file operations, text tagging, and word highlighting. The Rule-Based Stemmer and Morphological Analyzer application performs stemming and morphological analysis of Shahmukhi Punjabi words; its GUI has a tab for each function, where users can enter words, run the analysis, and save the results.
The results section reports the outcomes for each developed resource and its contribution to Punjabi language processing. In conclusion, this research offers insights into developing digital resources for Shahmukhi Punjabi that account for linguistic nuances and script-specific features. The proposed methodologies and tools contribute to the advancement of natural language processing applications for the Punjabi language.
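The dictionary-based tagging approach mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `tag_text`, the three-word lexicon, the tag assignments, and the whitespace tokenization are all assumptions made for illustration. The only detail taken from the USAS scheme itself is `Z99`, its tag for unmatched words.

```python
# Toy lexicon: Shahmukhi word -> USAS-style semantic tag.
# These entries are illustrative assumptions, not the paper's dictionary.
LEXICON = {
    "پانی": "O1.2",  # "water": substances and materials, liquid
    "کتاب": "Q4.1",  # "book": the media, books
    "گھر": "H1",     # "house": architecture, houses and buildings
}

UNMATCHED = "Z99"  # USAS tag for words not found in the lexicon


def tag_text(text, lexicon=LEXICON):
    """Return (token, tag) pairs using whitespace tokenization and exact lookup."""
    return [(token, lexicon.get(token, UNMATCHED)) for token in text.split()]


if __name__ == "__main__":
    for token, tag in tag_text("گھر کتاب پانی"):
        print(token, tag)
```

A lookup of this kind is only as good as its lexicon: inflected forms miss unless they are listed or first reduced to a stem, which is one reason a tagger of this type is typically paired with a stemmer or morphological analyzer.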
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.