DIFFICULTIES OF CREATING A LINGUISTIC CORPUS: CHALLENGES IN MODERN CORPUS LINGUISTICS

Authors

  • Parpieva Shakhnoza Muratovna, Teacher at Uzbekistan State World Languages University

Abstract

The construction of linguistic corpora presents numerous challenges that span technical, methodological, legal, and theoretical domains. This article examines the primary difficulties encountered in corpus creation, including data collection complexities, quality control issues, representativeness concerns, and ethical considerations. Through an analysis of the contemporary corpus linguistics literature, we identify key obstacles that researchers face and discuss potential solutions. The findings show that although technological advances have facilitated corpus construction, fundamental challenges persist in ensuring balanced, representative, and ethically sound linguistic datasets.

Published

2025-06-23