2005 IntroToTheSpecIssOnMWE


Subject Headings: Compound Word.

Quotes

Abstract

A multiword expression (MWE) is an expression for which the syntactic or semantic properties of the whole expression cannot be derived from its parts. This definition covers a large number of related but distinct phenomena, such as phrasal verbs (e.g., add up), nominal compounds (e.g., telephone box), institutionalised phrases (e.g., salt and pepper), and many others. They are used frequently in everyday language, usually to express, with precision, ideas and concepts that cannot be compressed into a single word. They are syntactically and/or semantically idiosyncratic in nature but can have a great deal of flexibility and variation in their form, with complex interrelations between their components. For instance, some MWEs are fixed in the sense that they do not present internal variation, such as by and large and ad hoc, whilst others are much more flexible and allow different degrees of internal variability and modification, such as touch a nerve (touch/find a raw nerve) and spill the beans (spill/spilt the/several/some of/all the beans).

MWEs are a major part of language. In English, Jackendoff (1997, 156) estimates that the number of MWEs in a speaker's lexicon is of the same order of magnitude as the number of single words. This is borne out by most on-line lexical resources where almost half of the entries are multiword expressions. For example, in WordNet 1.7 (Fellbaum, 1998b), 41% of the entries are multiword.

MWEs are challenging for both linguistic and computational work because their heterogeneous characteristics pose problems for successful (computational) linguistic treatment (Sag et al., 2002). However, the importance of MWEs and their impact on linguistics and natural language processing (NLP) have long been recognised. In linguistics, for example, they have often been used to validate the properties of grammatical theories (e.g., should a syntactic theory include transformational operations or not? (Nunberg et al., 1994)). In NLP applications such as machine translation, recognising MWEs is necessary for systems to preserve meaning, produce appropriate translations, and avoid generating unnatural or nonsensical sentences in the target language.

2. What are MWEs?

The term “Multiword Expression” has been defined slightly differently by different researchers. (Footnote 1: Other terms used to refer to MWEs include “multiwords”, “multiword units” (Dias et al., 2004) and “fixed expressions and idioms” (Moon, 1998).) Calzolari et al. (2002) give a general definition as “a sequence of words that acts as a single unit at some level of linguistic analysis”, which in addition must exhibit (some of) the following characteristics to a greater or lesser extent:

  • (1) reduced syntactic and semantic transparency;
  • (2) reduced or lack of compositionality;
  • (3) more or less frozen or fixed status;
  • (4) possible violation of some otherwise general syntactic patterns or rules;
  • (5) a high degree of lexicalisation (depending on pragmatic factors);
  • (6) a high degree of conventionality.

Calzolari et al. (2002) consider MWEs to include fixed or semi-fixed phrases, compounds, support verbs, idioms, phrasal verbs, collocations, etc.

Alegria et al. (2004) define MWEs as referring to “both semantically compositional and noncompositional combinations, and both syntactically regular and idiosyncratic phrases”, including idioms, proper names, compounds, lexical and grammatical collocations, institutionalised phrases, and date and number expressions. Frequency factors can also be considered in the definition of MWEs, as in Pereira et al. (2004): “sequences of words that co-occur more often than expected by chance”.
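
The frequency-based view can be made concrete with an association measure. The sketch below uses pointwise mutual information (PMI) over adjacent word pairs, which is one common way to compare an observed co-occurrence rate with the rate expected if the two words were independent. PMI is an illustrative choice here, not necessarily the measure used by Pereira et al. (2004); the function name `bigram_pmi` and the toy sentence are assumptions made for the example.

```python
import math
from collections import Counter


def bigram_pmi(tokens):
    """Return {(w1, w2): PMI in bits} for every adjacent word pair in tokens.

    PMI compares the observed probability of a pair with the probability
    expected if the two words occurred independently of each other.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_independent = (unigrams[w1] / n_unigrams) * (unigrams[w2] / n_unigrams)
        scores[(w1, w2)] = math.log2(p_pair / p_independent)
    return scores


# Toy corpus, invented for illustration only (hypothetical data).
tokens = "salt and pepper on the table and salt and pepper in the pan".split()
scores = bigram_pmi(tokens)

# A positive score means the pair occurs more often than chance predicts.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A positive score indicates that a pair co-occurs more often than independence would predict. On corpora this small the scores are not meaningful, and PMI is known to overrate rare pairs, so practical extraction pipelines usually add a minimum frequency threshold or use a different association measure.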

Sag et al. (2002) define MWEs as “idiosyncratic interpretations that cross word boundaries (or spaces)”. The focus of this definition is on the mismatch between the interpretation of an MWE as a whole and the standard meaning of the individual words that compose the expression. Within MWEs, they include fixed and semi-fixed expressions, idioms, compound nominals, proper names, verb-particle constructions, institutionalised phrases, and light verbs. This is the definition we adopt in this special issue.

References


Anna Korhonen, Aline Villavicencio, Francis Bond, and Diana McCarthy. (2005). “Introduction to the Special Issue on Multiword Expressions: Having a crack at a hard nut.” In: Special issue on Multiword Expression, Computer Speech & Language. doi:10.1016/j.csl.2005.05.001. http://dx.doi.org/10.1016/j.csl.2005.05.001