This chapter presents all of the types of words we will aim to distinguish. A word is distinguished by saying that it belongs to a particular word class. In addition, some words, notably verbs, take a grammatical code that supplements their word class information.
As seen in the queries of section 1.7, we capture the parse consequences for a word by having it as an item in a word list together with its word class and any grammatical code. The work of this chapter is to establish the initial parse outcome of encountering a class(code)/word pairing in a word list.
We start off with the noun rules of (2.1) for establishing the contribution of the common noun words and proper noun words of Table 2.1.
Table 2.1: Tags for common noun words and proper noun words
NS | plural common noun (e.g., children, revelations, times, wishes) |
N | common noun not subclassified as NS, that is, either singular (e.g., child, revelation, time, wish), or neutral for number (e.g., committee, fish, information) |
NPRS | plural proper noun (e.g., Clintons, Koreas) |
NPR | proper noun not subclassified as NPRS, that is, either singular (e.g., Clinton, Tokyo), or neutral for number (e.g., Andes, IBM) |
Prolog query (2.2) asks whether a word list with w('NS','boys') has content for a successful parse of noun structure.
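To give a feel for this kind of check, here is a minimal sketch, assuming SWI-Prolog. It is not the rule set of (2.1): the predicate names noun_tag_sketch/3 and noun_sketch//2, the node/2 term shape, and the kind and number labels are all hypothetical, introduced only for illustration. The sketch consumes one w(Tag,Word) item carrying a tag from Table 2.1 and builds a tree node for it.

```prolog
% Hypothetical sketch, not the noun rules of (2.1).
% Tags come from Table 2.1; the kind and number labels are illustrative only.
noun_tag_sketch('NS',   common, plural).
noun_tag_sketch('N',    common, non_plural).
noun_tag_sketch('NPRS', proper, plural).
noun_tag_sketch('NPR',  proper, non_plural).

% noun_sketch(-Kind, -Node)// consumes one noun item from the word list
% and returns a node with the tag dominating the word.
noun_sketch(Kind, node(Tag, [node(Word, [])])) -->
    [w(Tag, Word)],
    { noun_tag_sketch(Tag, Kind, _Number) }.

% Example queries:
% ?- phrase(noun_sketch(Kind, Node), [w('NS','boys')]).
% ?- phrase(noun_sketch(_, _), [w('PUNC','.')]).   % fails: not a noun tag
```

With this sketch, the first query succeeds with Kind bound to common, while the second fails because PUNC is not one of the tags of Table 2.1.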
When we connect the noun rules of (2.1) to rules for parsing the internal content of noun phrases in section 3.2, we will find that they contribute nouns which can have a role within the wider noun phrase that makes them either:
There are also the words of Table 2.2 that are noun phrase head words that cannot have premodifiers. The noun_head rules of (2.3) distinguish this kind of word at the rule level.
Table 2.2: Tags for noun phrase head words that cannot have premodifiers
Q;_nphd_ | indefinite pronoun with quantification, which can be a compound (e.g., everybody, nothing), or a word which often occurs with the preposition of (e.g., much, many, a_lot) |
D;_nphd_ | indefinite pronoun not subclassified as Q (e.g., someone, anything, another), and demonstrative pronoun (e.g., this, that, these, those) |
Additionally, there are the words of Table 2.3 that essentially contribute a full noun phrase by themselves, that is, they provide a noun phrase head word that cannot itself have modifiers. To distinguish these words, there are the noun_head_full rules of (2.4).
Table 2.3: Tags for noun phrase head words that cannot have modifiers
PNX | reflexive pronoun (e.g., myself, yourself, itself, ourselves), or reciprocal pronoun (e.g., each_other) |
PRO | personal pronoun (e.g., I, you, them, ours) |
PRO;_ppge_ | nominal possessive personal pronoun (e.g., mine, yours) |
WPRO | wh-pronoun (what, who, whom) |
RPRO | relative pronoun (which, who, whom, that) |
Note how rules 4 and 5 of (2.4) return annotation for a genitive case marked noun phrase (NP-GENV) that is projected by PRO;_ppge_ and that will go on to be the only element of a containing noun phrase, as seen with NP-SBJ in the parse result of (2.5).
We should also note how the noun_head_full rules of (2.4) fall into four different types that depend on the value of the first parameter:
What do the results of (2.6)–(2.21) tell us about where words with the different word classes of Table 2.3 can be used?
The det rules of (2.22) establish the contribution of the determiners of Table 2.4. These words can only serve as noun head premodifiers. Outside of coordination, there can be at most one occurrence of such a word per noun phrase instance.
Table 2.4: Tags for determiner words
Q | quantifier (e.g., every, no) |
D | determiner, which includes articles (e.g., a, the) and demonstratives (e.g., this, that) |
WD | wh-determiner (e.g., which, what, whichever) |
RD | relative determiner of a relative clause (e.g., what, whatever) |
Like the words detected by the noun_head_full rules of (2.4), words detected by the det rules of (2.22) fall into four different types that depend on the value of the first parameter:
A genitive noun phrase which itself acts as the premodifier of an external noun phrase head is signalled by the presence of either a genitive marker or genitive pronoun word from Table 2.5. Phrase structure integration is achieved with genm of (2.23) and pronoun_genm of (2.24).
Table 2.5: Tags for genitive markers
GENM | genitive marker (<apos>s or <apos>) |
PRO;_genm_ | possessive pronoun, pre-nominal (my, your, our) |
WPRO;_genm_ | genitive wh-pronoun (whose) |
RPRO;_genm_ | genitive relative pronoun (whose) |
Words detected by the pronoun_genm rules of (2.24) fall into three different types that depend on the value of the first parameter:
Except when the type parameter of a finite clause is set to imperative_clause, there is always the requirement that a subject should be introduced with the clause. The subject of a non-imperative finite clause is captured in a parse with the rules of (2.25), provided the clause is not a main clause interrogative whose subject contains an interrogative word.
The first four rules of (2.25) allow for the subject to be a formal word from Table 2.6 that will set the value of the SbjType parameter of the clause to:
Table 2.6: Tags for subject indicating words
EX | existential there, i.e., there of the there is ... or there are ... construction co-occurring with an existential subject (NP-ESBJ) |
PRO;_cleft_ | cleft it occurring as part of a cleft construction (so it was you that got them together) |
PRO;_expletive_ | expletive it, e.g., occurring in a weather construction (it's raining) |
PRO;_provisional_ | provisional it occurring with extraposition (it bothered her that she probably would never know) |
The subject is more typically a contentful noun phrase, where this noun phrase will correspond to the ‘do-er’, ‘be-er’ or ‘have-er’ of the verb. Such a typical noun phrase subject is identified with rule 5 of (2.25), which will also set the SbjType parameter of the clause to the value of:
Also, in the case of a tough-construction (see section 6.5), a contentful noun phrase can lead to the SbjType parameter of the clause being set to the value of:
Table 2.7 sets out the range of support for different kinds of adverbs. Parse integration is achieved with the adv rules of (2.26).
Table 2.7: Tags for adverb words
ADV | general adverb (e.g., often, well, really) |
ADVR | comparative adverb (e.g., more, less, farther) |
ADVS | superlative adverb (e.g., most, least, farthest) |
RP | adverbial particle (e.g., up, off, out) |
ADV;_cat_ | catenative adverb (about in I was about to say; enough in People feel confident enough to do it) |
WADV | wh-adverb (e.g., how, when, where, why) |
RADV | relative adverb of a relative clause (e.g., how, when, where, whereby) |
The adv rules of (2.26) fall into four different types that depend on the value of the first parameter:
Table 2.8 sets out the range of support for different kinds of adjectives. Parse integration is achieved with the adj rules of (2.27).
Table 2.8: Tags for adjective words
ADJ | general adjective (e.g., old, good, male) |
ADJR | comparative adjective (e.g., older, better) |
ADJS | superlative adjective (e.g., oldest, best) |
ADJ;_cat_ | catenative adjective (able in be able to, willing in be willing to) |
The adj rules of (2.27) fall into two different types that depend on the value of the first parameter:
This section covers how verb words are integrated into a parse. The verb rule of (2.28) matches verb words from the word list, collecting information for three parameters: Tag, Code, and Word.
The collected information is checked for consistency:
Once matched and licensed, the verb word information is added to the parse tree information accumulated with L.
The remainder of this section distinguishes different verb classes, with tags that vary to mark form and inflection information (section 2.8.1), and then the grammatical codes that are compatible with the different combinations of verb form and inflection information (section 2.8.2). Complement selection for grammatical codes is the topic of chapter 4.
The aim of this section is to distinguish different classes of verb words with tags. Distinctions are made on the basis of form, with each form divided further to distinguish inflection.
Table 2.9 gives tags for the different forms of lexical verbs.
Table 2.9: Tags for lexical verb words
VBP | present tense form of lexical verbs (e.g., reaches, supports, writes, sinks, puts, reach, support, write, sink, put) |
VBD | past tense form of lexical verbs (e.g., reached, supported, wrote, sank, put) |
VB | infinitive form of lexical verbs (e.g., reach, support, write, sink, put) |
VAG | present participle ({ing}) form of lexical verbs (used in the progressive construction) (e.g., reaching, supporting, writing, sinking, putting) |
VVN | past participle ({ed}/{en}) form of lexical verbs (used in the perfect construction and the passive construction) (e.g., reached, supported, written, sunk, put) |
Table 2.10 gives tags for the different forms of DO.
DOP | present tense forms of the verb DO: do, does, <apos>s |
DOD | past tense form of the verb DO: did |
DO | infinitive form of the verb DO: do |
DAG | present participle form of the verb DO: doing |
DON | past participle form of the verb DO: done |
Table 2.11 gives tags for the different forms of HAVE.
HVP | present tense forms of the verb HAVE: have, <apos>ve, has, <apos>s |
HVD | past tense form of the verb HAVE: had, <apos>d |
HV | infinitive form of the verb HAVE: have |
HAG | present participle form of the verb HAVE: having |
HVN | past participle form of the verb HAVE: had |
Table 2.12 gives tags for the different forms of BE.
BEP | present tense forms of the verb BE: is, am, are, <apos>m, <apos>re, and <apos>s |
BED | past tense forms of the verb BE: was and were |
BE | infinitive form of the verb BE: be |
BAG | present participle form of the verb BE: being |
BEN | past participle form of the verb BE: been |
We can access the verb tag information of tables 2.9–2.12 on the basis of inflection information with verb_tag of (2.29).
The verb_tag rules of (2.29) are called by verb of (2.28) above to ensure a tag compatible with the inflection of the Infl parameter when seeking to match a verb word from the word list.
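The lookup from verb form and inflection to tag can be pictured as a table of facts. The following is a minimal sketch, assuming SWI-Prolog; it is not verb_tag of (2.29). The predicate name verb_tag_sketch/3 and the atoms used for forms and inflections are hypothetical, while the tags themselves are those of Tables 2.9–2.12.

```prolog
% Hypothetical sketch, not verb_tag of (2.29).
% verb_tag_sketch(?Form, ?Infl, ?Tag)
% Form is lexical, do, have or be; the Infl atoms are names assumed here.
verb_tag_sketch(lexical, present,            'VBP').
verb_tag_sketch(lexical, past,               'VBD').
verb_tag_sketch(lexical, infinitive,         'VB').
verb_tag_sketch(lexical, present_participle, 'VAG').
verb_tag_sketch(lexical, past_participle,    'VVN').
verb_tag_sketch(do,      present,            'DOP').
verb_tag_sketch(do,      past,               'DOD').
verb_tag_sketch(do,      infinitive,         'DO').
verb_tag_sketch(do,      present_participle, 'DAG').
verb_tag_sketch(do,      past_participle,    'DON').
verb_tag_sketch(have,    present,            'HVP').
verb_tag_sketch(have,    past,               'HVD').
verb_tag_sketch(have,    infinitive,         'HV').
verb_tag_sketch(have,    present_participle, 'HAG').
verb_tag_sketch(have,    past_participle,    'HVN').
verb_tag_sketch(be,      present,            'BEP').
verb_tag_sketch(be,      past,               'BED').
verb_tag_sketch(be,      infinitive,         'BE').
verb_tag_sketch(be,      present_participle, 'BAG').
verb_tag_sketch(be,      past_participle,    'BEN').

% Example queries:
% ?- verb_tag_sketch(have, past_participle, Tag).   % Tag = 'HVN'
% ?- verb_tag_sketch(Form, Infl, 'BAG').            % Form = be, Infl = present_participle
% ?- findall(Tag, verb_tag_sketch(_, past, Tag), Tags).
```

Because the relation is stated as plain facts, it can be queried in either direction: from an inflection to the tags that are compatible with it, or from a tag found in the word list back to its form and inflection.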
This section outlines verb codes as tag label extensions. The codes distinguish verbs according to the selection criteria each verb has for its complements, as detailed in chapter 4.
To form a handle on the complement information for main verbs, we adopt the verb code system from the fourth edition of the Oxford Advanced Learner's Dictionary (OALD4; Cowie 1989). In this dictionary, the verb codes are associated with word sense definitions. The system is a mnemonic-based reworking of the earlier system of Hornby (1975). A code from the system has:
For example, the La code marks clause structure (L) with a linking verb + a subject predicative constituent that is an adjective phrase (a). As an example with the dot character, the Cn.a code marks a complex-transitive verb in clause structure (C) with the complex-transitive verb + a direct object constituent that is a noun phrase (n) + an object predicative constituent that is an adjective phrase (a).
Table 2.13: Capital letters L, I, T, C, and D
L | Linking verb | is associated with a subject predicative (-PRD2), an element which provides information about the subject of the clause. | |
I | Intransitive verb | is NOT associated with a subject predicative or an object, although it may be associated with an adverbial, an element which tells us about time, place, manner, etc. of the action of the verb. | |
T | Transitive verb | Mono-transitive verb | is associated with a direct object (-OB1), an element which often refers to the person or thing affected by the action of the verb. |
C | Complex-transitive verb | is associated with both a direct object (-OB1) and an object predicative (-PRD), an element which provides more information about the direct object. Note: in the code, a dot divides information about the realisation of the direct object from information about the realisation of the object predicative. | |
D | Ditransitive verb | is associated with both a direct object (-OB1) and an indirect object (-OB2), an element which refers to a person who receives something or benefits from an action. Note: in the code, a dot divides information about the realisation of the direct object from information about the realisation of the indirect object. |
Table 2.14: Lower case letters a, n, p, pr, n/pr, n/a, t, f, w, g, i and r
a | adjective phrase |
n | noun phrase |
p | adverb particle |
pr | preposition phrase |
n/pr | noun phrase/preposition phrase |
n/a | as + noun phrase/adjective phrase |
t | non-finite clause (to-infinitive) (IP-INF with to tagged TO and verb tagged VB) |
f | that-clause (CP-THT) |
w | finite or non-finite clause with wh element (CP-QUE) |
g | participial clause ({ing} form) (IP-PPL with verb tagged VAG) |
i | non-finite clause (bare infinitive) (IP-INF with verb tagged VB but no TO tagged word) |
r | utterance |
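The way a code is read off from Tables 2.13 and 2.14 can be sketched as a small decoder. This is only an illustration, assuming SWI-Prolog (it relies on the split mode of atomic_list_concat/3); the predicate names and the atoms naming clause types and realisations are hypothetical. It works on bare codes such as La and Cn.a, and it treats everything between dots as a single entry of Table 2.14, so codes that run several lower case letters together without a dot are outside its scope.

```prolog
% Hypothetical decoder for bare codes such as 'La' or 'Cn.a'.
% Assumes SWI-Prolog (split mode of atomic_list_concat/3).

% clause_type_sketch(?Letter, ?Type): capital letters of Table 2.13.
clause_type_sketch('L', linking).
clause_type_sketch('I', intransitive).
clause_type_sketch('T', transitive).
clause_type_sketch('C', complex_transitive).
clause_type_sketch('D', ditransitive).

% complement_sketch(?Letters, ?Realisation): lower case letters of Table 2.14.
complement_sketch(a,      adjective_phrase).
complement_sketch(n,      noun_phrase).
complement_sketch(p,      adverb_particle).
complement_sketch(pr,     preposition_phrase).
complement_sketch('n/pr', noun_or_preposition_phrase).
complement_sketch('n/a',  as_noun_or_adjective_phrase).
complement_sketch(t,      to_infinitive_clause).
complement_sketch(f,      that_clause).
complement_sketch(w,      wh_clause).
complement_sketch(g,      ing_participial_clause).
complement_sketch(i,      bare_infinitive_clause).
complement_sketch(r,      utterance).

% decode_code_sketch(+Code, -Type, -Complements)
% The first character selects the clause type; the remainder is split
% at dots and each part is looked up as a complement realisation.
decode_code_sketch(Code, Type, Complements) :-
    sub_atom(Code, 0, 1, _, Letter),
    clause_type_sketch(Letter, Type),
    sub_atom(Code, 1, _, 0, Rest),
    (   Rest == ''
    ->  Complements = []
    ;   atomic_list_concat(Parts, '.', Rest),
        maplist(complement_sketch, Parts, Complements)
    ).

% Example queries:
% ?- decode_code_sketch('La', Type, Cs).
% ?- decode_code_sketch('Cn.a', Type, Cs).
```

For the two codes discussed above, the first query gives Type = linking with Cs = [adjective_phrase], and the second gives Type = complex_transitive with Cs = [noun_phrase, adjective_phrase].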
Also, we source three codes directly from Hornby (1975):
In addition to Cowie (1989) and Hornby (1975) codes, further verb codes distinguish:
The different verb forms (lexical, DO, HAVE, or BE) allow for different verb codes. verb_code of (2.30) determines compatible verb codes, taking a capital letter as the value for its first parameter to identify the verb form: V for lexical verbs, D for DO verbs, H for HAVE verbs, or B for BE verbs. A verb_code call will then return through the Code parameter a code picked from the corresponding list for compatible codes.
Note how returned codes sometimes depend on inflection information from the Infl parameter.
What does the Prolog query of (2.31) achieve?
Modal verbs have the tags and verb codes of Table 2.15.
Table 2.15: Modal verbs with tags and verb codes
Present tense | Past tense |
w('MD',';~cat_Vi','shall') | w('MD',';~cat_Vi','should') |
w('MD',';~cat_Vi','will') | w('MD',';~cat_Vi','would') |
w('MD',';~cat_Vi','can') | w('MD',';~cat_Vi','could') |
w('MD',';~cat_Vi','may') | w('MD',';~cat_Vi','might') |
w('MD',';~cat_Vi','must') | |
w('MD',';~cat_Vt','ought') | |
w('MD',';~cat_Vi','need') | |
w('MD',';~cat_Vi','dare') | |
| w('MD',';~cat_Vt','used') |
The modal rules of (2.32) support the integration of the modal words of Table 2.15.
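As an illustration of matching the three-place w(Tag,Code,Word) items of Table 2.15, here is a minimal sketch, assuming SWI-Prolog. It is not the modal rules of (2.32): the names modal_sketch//2 and modal_complement_sketch/2 and the node/2 term shape are hypothetical. The reading of the two codes as selecting a bare infinitive or a to-infinitive is an assumption suggested by the letters i and t of Table 2.14.

```prolog
% Hypothetical sketch, not the modal rules of (2.32).

% modal_complement_sketch(?Code, ?Complement)
% Assumed reading of the two modal codes, following letters i and t
% of Table 2.14.
modal_complement_sketch(';~cat_Vi', bare_infinitive).
modal_complement_sketch(';~cat_Vt', to_infinitive).

% modal_sketch(-Complement, -Node)// consumes one modal item from the
% word list and reports the kind of complement its code is assumed to select.
modal_sketch(Complement, node('MD', [node(Word, [])])) -->
    [w('MD', Code, Word)],
    { modal_complement_sketch(Code, Complement) }.

% Example queries:
% ?- phrase(modal_sketch(C, Node), [w('MD',';~cat_Vi','shall')]).
% ?- phrase(modal_sketch(C, Node), [w('MD',';~cat_Vt','ought')]).
```

Under these assumptions, the first query binds C to bare_infinitive and the second binds C to to_infinitive, which matches the difference between shall go and ought to go.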
Besides verbs, other clause level components are words with the tags of Table 2.16.
Table 2.16: Tags for other clause level words
NEG | negative particle not |
NEG;_clitic_ | negative clitic particle n<apos>t |
TO | infinitive marker to |
CONJ;_cl_ | discourse coordination (e.g., And, But) |
INTJ | interjection (e.g., aah, eh, ummmmm) |
REACT | reaction signal (e.g., good_grief, really, yes, wow) |
FRM | formulaic expression (e.g., good_afternoon, you_see, thank_you) |
The optional_clitic_negation, neg and to rules of (2.33) and the adverbial rules of (2.34) support the integration of the words of Table 2.16.
adverbial calls pick up:
So far we have considered words that serve as components of either phrases or clauses. There is a further class of words with the tags of Table 2.17 that serve as the means to connect phrases and clauses.
Table 2.17: Tags for connective words
CONJ | Coordinating conjunction (and, or, but) |
C | The complementizer that |
WQ | Marker of indirect question (whether or if) |
P-CONN | Subordinating conjunction (e.g., although, when, in_order) |
P-ROLE | Role preposition (e.g., in, of, under) |
The conj, comp, comp_wq, conn, and role rules of (2.35) support the integration of the different connective words of Table 2.17.
Punctuation points are treated as words for the purposes of word tagging with the tags of Table 2.18.
Table 2.18: Tags for punctuation
PUNC | punctuation: general separating mark (? . ! ,) |
PULQ | punctuation: left quotation mark (<ldquo> <lsquo>) |
PURQ | punctuation: right quotation mark (<rdquo> <rsquo>) |
This makes punctuation part of a sentence in its own right. With the creation of constituent structure, punctuation is placed as high as possible. For example, a full stop that ends a sentence is treated as the last constituent of the highest clause layer.
The punc rules of (2.36) give support for the integration of punctuation, with an initial parameter to distinguish types:
Optional punctuation is achieved with (2.37).
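The general idea of optional punctuation can be sketched as a rule with two alternatives: either a punctuation item is consumed and contributes a node, or nothing is consumed and nothing is contributed. The following minimal sketch assumes SWI-Prolog and is not the code of (2.36) or (2.37). The predicate names, the type atoms general, left_quote and right_quote, and the node/2 term shape are hypothetical; the tags are those of Table 2.18, and nodes are accumulated with a difference list in the manner of the L parameter mentioned for verbs.

```prolog
% Hypothetical sketch, not the rules of (2.36) and (2.37).

% punc_tag_sketch(?Type, ?Tag): the type atoms are names assumed here.
punc_tag_sketch(general,     'PUNC').
punc_tag_sketch(left_quote,  'PULQ').
punc_tag_sketch(right_quote, 'PURQ').

% punc_sketch(?Type, -L0, ?L)// consumes one punctuation item and places
% its node at the front of the difference list L0-L.
punc_sketch(Type, [node(Tag, [node(Word, [])])|L], L) -->
    [w(Tag, Word)],
    { punc_tag_sketch(Type, Tag) }.

% optional_punc_sketch(?Type, -L0, ?L)// takes punctuation when it is
% present; otherwise it consumes nothing and leaves the list unchanged.
optional_punc_sketch(Type, L0, L) --> punc_sketch(Type, L0, L).
optional_punc_sketch(_Type, L, L) --> [].

% Example queries:
% ?- phrase(optional_punc_sketch(general, Nodes, []), [w('PUNC','.')]).
% ?- phrase(optional_punc_sketch(general, Nodes, []), []).
```

In this sketch, the first query returns a one-element list holding the PUNC node for the full stop, and the second returns the empty list, so a caller can use the same call whether or not punctuation is present in the word list.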