Grammar Error Correction in Morphologically Rich Languages: The Case of Russian

Alla Rozovskaya, Dan Roth; Grammar Error Correction in Morphologically Rich Languages: The Case of Russian. Transactions of the Association for Computational Linguistics 2019; 7 1–17. doi:

Abstract

Until now, most research in grammar error correction has focused on English, and the problem has hardly been explored for other languages. We address the task of correcting writing mistakes in morphologically rich languages, with a focus on Russian. We present a corrected and error-tagged corpus of Russian learner writing and develop models that make use of existing state-of-the-art methods that have been well studied for English. Although impressive results have recently been achieved for grammar error correction of non-native English writing, these results are limited to domains where plentiful training data are available. Because annotation is extremely costly, these approaches are not suitable for the majority of domains and languages. We thus focus on methods that use “minimal supervision”; that is, those that do not rely on large amounts of annotated training data, and show how existing minimal-supervision approaches extend to a highly inflectional language such as Russian. The results demonstrate that these methods are particularly useful for correcting mistakes in grammatical phenomena that involve rich morphology.

1 Introduction

This paper addresses the task of correcting errors in text. Most research in the area of grammar error correction (GEC) has focused on correcting mistakes made by English language learners. One standard approach to dealing with these errors, which proved highly successful in text correction competitions (Dale and Kilgarriff, 2011; Dale et al., 2012; Ng et al., 2013, 2014; Rozovskaya et al., 2017), makes use of a machine-learning classifier paradigm and is based on the methodology for correcting context-sensitive spelling mistakes (Golding and Roth, 1996, 1999; Banko and Brill, 2001). In this approach, classifiers are trained for a particular mistake type: for example, preposition, article, or noun number (Tetreault et al., 2010; Gamon, 2010; Rozovskaya and Roth, 2010c,b; Dahlmeier and Ng, 2012). Originally, classifiers were trained on native English data. As several annotated learner datasets became available, models were also trained on annotated learner data.
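To make the classifier paradigm concrete, the following is a minimal sketch of a confusion-set classifier for a single mistake type (prepositions), trained on native text where the word the writer used serves as the label. Everything in it is an assumption made for illustration: the three-word confusion set, the one-word context window, the naive Bayes model with add-one smoothing, and the toy `NATIVE` sentences are hypothetical and are not the features, learner, or data of the cited systems.

```python
import math
from collections import Counter, defaultdict

# Hypothetical confusion set for one mistake type (prepositions).
CONFUSION_SET = {"in", "on", "at"}


def context_features(tokens, i):
    """Return the words immediately left and right of position i, tagged by side."""
    left = tokens[i - 1] if i > 0 else "<s>"
    right = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [f"L={left}", f"R={right}"]


class ConfusionSetClassifier:
    """Naive Bayes over context features with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, sentences):
        # Native text needs no annotation: the word the writer chose is the label.
        for sentence in sentences:
            tokens = sentence.lower().split()
            for i, tok in enumerate(tokens):
                if tok in CONFUSION_SET:
                    self.label_counts[tok] += 1
                    for feat in context_features(tokens, i):
                        self.feature_counts[tok][feat] += 1
                        self.vocab.add(feat)

    def predict(self, tokens, i):
        # Score every candidate seen in training; fall back to the original word
        # if the classifier has not been trained.
        total = sum(self.label_counts.values())
        best_label, best_score = tokens[i], -math.inf
        for label in sorted(self.label_counts):
            score = math.log(self.label_counts[label] / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in context_features(tokens, i):
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def correct(self, sentence):
        # Re-predict every confusable word; a changed prediction is a correction.
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in CONFUSION_SET:
                tokens[i] = self.predict(tokens, i)
        return " ".join(tokens)


# Toy "native" training data, invented for this example.
NATIVE = [
    "we met at noon",
    "she left at noon",
    "he lives in london",
    "they work in london",
    "the book is on the table",
    "the keys are on the table",
]

clf = ConfusionSetClassifier()
clf.train(NATIVE)
print(clf.correct("he arrived in noon"))  # prints "he arrived at noon"
```

A separate classifier of this kind would be built for each mistake type, each with its own confusion set. Training on annotated learner data instead would additionally expose the word the learner originally wrote, which this sketch does not use.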

More recently, statistical machine translation (MT) methods, including neural MT, have gained considerable popularity thanks to the availability of large annotated corpora of learner writing (e.g., Yuan and Briscoe, 2016; Chollampatt and Ng, 2018). Classification methods work very well on well-defined types of errors, whereas MT is good at correcting interacting and complex types of mistakes, which makes these approaches complementary in some respects (Rozovskaya and Roth, 2016).

Thanks to the availability of large (in-domain) datasets, substantial gains in performance have been made in English grammar correction. Unfortunately, research on other languages has been scarce. Previous work includes efforts to create annotated learner corpora for Arabic (Zaghouani et al., 2014), Japanese (Mizumoto et al., 2011), and Chinese (Yu et al., 2014), and shared tasks on Arabic (Mohit et al., 2014; Rozovskaya et al., 2015) and Chinese error detection (Lee et al., 2016; Rao et al., 2017). However, building robust models in other languages has been a challenge, because an approach that relies on heavy supervision is not viable across languages, genres, and learner backgrounds. Moreover, for languages that are morphologically complex, we may need more data to address the lexical sparsity.

This work focuses on Russian, a highly inflectional language from the Slavic group. Russian has over 260M speakers, for 47% of whom Russian is not their native language. We corrected and error-tagged over 200K words of non-native Russian texts. We use this dataset to build several grammar correction systems that draw on and extend the methods that showed state-of-the-art performance on English grammar correction. Because the size of our annotation is limited, compared with what is used for English, one of the goals of our work is to quantify the effect of having limited annotation on the existing approaches. We evaluate both the MT paradigm, which requires large amounts of annotated learner data, and the classification approaches that can work with any amount of supervision.
