This chapter presents all the types of words we aim to distinguish, introducing the word classes to which words belong. Some words, notably verbs, also take a grammatical code that supplements their word class information.
As seen in the queries of section 1.7, we capture the parse consequences for a word by having it as an item in a word list together with its word class and any grammatical code. The work of this chapter is to establish the initial parse outcome of encountering a class(code)/word pairing in a word list.
We start off with the noun rules of (2.1) for establishing the contribution of the common noun words and proper noun words of Table 2.1.
Table 2.1: Tags for common noun words and proper noun words
NS | plural common noun (e.g., children, revelations, times, wishes) |
N | common noun not subclassified as NS, that is, either singular (e.g., child, revelation, time, wish) or neutral for number (e.g., committee, fish, information) |
NPRS | plural proper noun (e.g., Clintons, Koreas) |
NPR | proper noun not subclassified as NPRS, that is, either singular (e.g., Clinton, Tokyo) or neutral for number (e.g., Andes, IBM) |
Prolog query (2.2) asks whether a word list with w('NS','boys') has content for a successful parse of noun structure.
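As a rough illustration of what such a query checks, the following Python sketch models a word list as (tag, word) pairs and tests whether it consists of a single item carrying one of the noun tags of Table 2.1. It is not the noun rules of (2.1); the pair encoding and the function name `is_noun_word_list` are assumptions made for the example.

```python
# Noun tags of Table 2.1.
NOUN_TAGS = {"NS", "N", "NPRS", "NPR"}


def is_noun_word_list(word_list):
    """Return True when word_list is a single (tag, word) pair with a noun tag.

    Hypothetical stand-in for a noun-structure query; the pair encoding
    mirrors w(Tag, Word) items but is an assumption of this sketch.
    """
    if len(word_list) != 1:
        return False
    tag, _word = word_list[0]
    return tag in NOUN_TAGS


print(is_noun_word_list([("NS", "boys")]))  # True
print(is_noun_word_list([("D", "the")]))    # False
```

The sketch only inspects tags; the actual rules also build parse structure, which is not modelled here.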
When we connect the noun rules of (2.1) to rules for parsing the internal content of noun phrases in section 3.2, we will find that they contribute nouns which can have a role within the wider noun phrase that makes them either:
There are also the words of Table 2.2 that are noun phrase head words that cannot have premodifiers. The noun_head rules of (2.3) distinguish this kind of word at the rule level.
Table 2.2: Tags for noun phrase head words that cannot have premodifiers
Q;_nphd_ | indefinite pronoun with quantification, which can be a compound (e.g., everybody, nothing) or a word which often occurs with the preposition of (e.g., much, many, a_lot) |
D;_nphd_ | indefinite pronoun not subclassified as Q (e.g., someone, anything, another) and demonstrative pronoun (e.g., this, that, these, those) |
Additionally, there are the noun phrase head words of Table 2.3 that give full content for noun phrases that cannot have modifiers of the noun head. To distinguish these head words, there are the noun_head_full rules of (2.4).
Table 2.3: Tags for noun phrase head words that are full content for noun phrases
PNX | reflexive pronoun (e.g., myself, yourself, itself, ourselves) or reciprocal pronoun (each_other, one_another) |
PRO | personal pronoun (e.g., I, you, them, us) |
PRO;_ppge_ | nominal possessive personal pronoun (e.g., mine, yours, ours) |
WPRO | wh-pronoun (what, who, whom) |
RPRO | relative pronoun (which, who, whom, that) |
Note how [rules 4 and 5] of (2.4) involve the word class together with code information (PRO;_ppge_), which results in the projection of a genitive case-marked noun phrase (NP-GENV). This NP-GENV goes on to be the only element of a containing noun phrase, as seen with NP-SBJ in the parse result of (2.5).
Section 2.4 below will introduce other words that lead to genitive case marked noun phrases, with the difference that the projected genitive noun phrases will themselves need to occur inside noun phrases where there is head content to modify.
We should also note how the noun_head_full rules of (2.4) are of four types that depend on the value of the first parameter:
What do the results of (2.6)–(2.21) tell us about where words with the different word classes of Table 2.3 can occur?
The det rules of (2.22) establish the contribution of the determiners of Table 2.4. These words can only serve as noun head premodifiers. Outside of coordination, there can be at most one occurrence of such a word per noun phrase instance.
Table 2.4: Tags for determiner words
Q | quantifier (e.g., every, no) |
D | determiner, which includes articles (e.g., a, the) and demonstratives (e.g., this, that) |
WD | wh-determiner (e.g., which, what, whichever) |
RD | relative determiner of a relative clause (e.g., what, whatever) |
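The at-most-one constraint stated for determiners can be pictured with a small Python sketch. It is not the det rules of (2.22); the pair encoding of word list items and the function name `determiner_count_ok` are assumptions of the example, and coordination is deliberately ignored.

```python
# Determiner tags of Table 2.4.
DET_TAGS = {"Q", "D", "WD", "RD"}


def determiner_count_ok(np_words):
    """Return True when a flat noun phrase has at most one determiner.

    np_words is a list of (tag, word) pairs, a hypothetical encoding of
    w(Tag, Word) items. Coordination is not modelled.
    """
    count = sum(1 for tag, _word in np_words if tag in DET_TAGS)
    return count <= 1


print(determiner_count_ok([("D", "the"), ("N", "child")]))                  # True
print(determiner_count_ok([("D", "the"), ("Q", "every"), ("N", "child")]))  # False
```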
Like the words detected by the noun_head_full rules of (2.4), words detected by the det rules of (2.22) are of four types that depend on the value of the first parameter:
A genitive noun phrase which itself acts as the premodifier of an external noun phrase head is signalled by the presence of either a genitive marker or genitive pronoun word from Table 2.5. Phrase structure integration follows from genm of (2.23) and pronoun_genm of (2.24).
Table 2.5: Tags for genitive markers
GENM | genitive marker (<apos>s or <apos>) |
PRO;_genm_ | possessive pronoun, pre-nominal (my, your, our) |
WPRO;_genm_ | genitive wh-pronoun (whose) |
RPRO;_genm_ | genitive relative pronoun (whose) |
Words detected by the pronoun_genm rules of (2.24) are of four types that depend on the value of the first parameter:
Except when the clause Type setting is imperative_clause, there is always the requirement that a finite clause should include a subject. The rules of (2.25) integrate the subject requirement, provided the clause is not a main clause interrogative whose subject contains an interrogative word.
The first four rules of (2.25) allow for the subject to be a formal word from Table 2.6 that will set the value of the clause SbjType parameter to:
Table 2.6: Tags for subject indicating words
EX | existential there, i.e., there of the there is ... or there are ... construction co-occurring with an existential subject (NP-ESBJ) |
PRO;_cleft_ | cleft it occurring as part of a cleft construction (so it was you that got them together) |
PRO;_expletive_ | expletive it, e.g., occurring in a weather construction (it's raining) |
PRO;_provisional_ | provisional it occurring with extraposition (it bothered her that she probably would never know) |
The subject is more typically a contentful noun phrase, where this noun phrase will correspond to the ‘do-er’, ‘be-er’ or ‘have-er’ of the verb. Such a typical noun phrase subject is identified with [rule 5] of (2.25), which will also set the clause SbjType parameter to the value of:
Also, in the case of a tough-construction (see section 6.6), a contentful noun phrase can lead to the clause SbjType parameter being set to the value of:
Table 2.7 sets out the range of support for adverb words. Parse integration follows from the adv rules of (2.26).
Table 2.7: Tags for adverb words
ADV | general adverb (e.g., often, well, really) |
ADVR | comparative adverb (e.g., more, less, farther) |
ADVS | superlative adverb (e.g., most, least, farthest) |
WADV | wh-adverb (e.g., how, when, where, why) |
RADV | relative adverb of a relative clause (e.g., how, when, where, whereby) |
RP | adverbial particle (e.g., up, off, out) |
The adv rules of (2.26) are of five types that depend on the value of the first parameter:
Table 2.8 sets out the range of support for adjective words. Parse integration follows from the adj rules of (2.27).
Table 2.8: Tags for adjective words
ADJ | general adjective (e.g., old, good, male) |
ADJR | comparative adjective (e.g., older, better) |
ADJS | superlative adjective (e.g., oldest, best) |
ADJ;_cat_ | catenative adjective (able in be able to, willing in be willing to) |
The adj rules of (2.27) are of two types that depend on the value of the first parameter:
This section covers how verb words are integrated into a parse. The verb rule of (2.28) matches verb words from the word list, collecting information for three parameters: Tag, Code and Word.
The collected information is checked for consistency:
Once matched and licensed, the verb word information goes into the parse tree information accumulated with L.
The remainder of this section recognises verb classes with tags that further distinguish verb form (section 2.8.1), and establishes the grammatical codes that are compatible with combinations of verb classes and inflection information (section 2.8.2). Complement selection for grammatical codes is the topic of chapter 4.
The aim of this section is to provide tags to distinguish the following classes of verb words:
Tags of the same class will have the same initial letter (V, D, H or B) and then vary to distinguish form:
Table 2.9 gives tags for the different forms of lexical verbs.
Table 2.9: Tags for lexical verb words
VBP | present tense form of lexical verbs (e.g., reaches, supports, writes, sinks, puts, reach, support, write, sink, put) |
VBD | past tense form of lexical verbs (e.g., reached, supported, wrote, sank, put) |
VB | infinitive form of lexical verbs (e.g., reach, support, write, sink, put) |
VAG | present participle ({ing}) form of lexical verbs (used in the progressive construction) (e.g., reaching, supporting, writing, sinking, putting) |
VVN | past participle ({ed}/{en}) form of lexical verbs (used in the perfect construction and the passive construction) (e.g., reached, supported, written, sunk, put) |
Table 2.10 gives tags for the different forms of DO.
DOP | present tense forms of the verb DO: do, does, <apos>s |
DOD | past tense form of the verb DO: did |
DO | infinitive form of the verb DO: do |
DAG | present participle form of the verb DO: doing |
DON | past participle form of the verb DO: done |
Table 2.11 gives tags for the different forms of HAVE.
HVP | present tense forms of the verb HAVE: have, <apos>ve, has, <apos>s |
HVD | past tense form of the verb HAVE: had, <apos>d |
HV | infinitive form of the verb HAVE: have |
HAG | present participle form of the verb HAVE: having |
HVN | past participle form of the verb HAVE: had |
Table 2.12 gives tags for the different forms of BE.
BEP | present tense forms of the verb BE: is, am, are, <apos>m, <apos>re, <apos>s |
BED | past tense forms of the verb BE: was and were |
BE | infinitive form of the verb BE: be |
BAG | present participle form of the verb BE: being |
BEN | past participle form of the verb BE: been |
We can access the verb tag information of tables 2.9–2.12 on the basis of inflection information with verb_tag of (2.29).
When seeking to match a verb word together with tag and verb code from the word list, the verb rule of (2.28) above calls the verb_tag rules of (2.29) to ensure that the tag is compatible with the inflection inherited from the Infl parameter.
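The relationship between inflection and the tags of Tables 2.9 to 2.12 can be sketched as a lookup in Python. This is only an illustration of the compatibility check, not the verb_tag rules of (2.29): the inflection labels (`present`, `past`, `infinitive`, `ing_participle`, `en_participle`) and the function names are hypothetical, since the values taken by the Infl parameter are not listed here.

```python
# Tags transcribed from Tables 2.9-2.12, keyed by the initial letter of
# the verb class (V, D, H or B). The inner keys are hypothetical
# inflection labels chosen for this sketch.
VERB_TAGS = {
    "V": {"present": "VBP", "past": "VBD", "infinitive": "VB",
          "ing_participle": "VAG", "en_participle": "VVN"},
    "D": {"present": "DOP", "past": "DOD", "infinitive": "DO",
          "ing_participle": "DAG", "en_participle": "DON"},
    "H": {"present": "HVP", "past": "HVD", "infinitive": "HV",
          "ing_participle": "HAG", "en_participle": "HVN"},
    "B": {"present": "BEP", "past": "BED", "infinitive": "BE",
          "ing_participle": "BAG", "en_participle": "BEN"},
}


def verb_tags_for(infl):
    """Return the tag of each verb class for the given inflection label."""
    return {cls: forms[infl] for cls, forms in VERB_TAGS.items()}


def tag_is_compatible(tag, infl):
    """Return True when tag is one of the tags listed for infl."""
    return tag in verb_tags_for(infl).values()


print(verb_tags_for("past"))             # {'V': 'VBD', 'D': 'DOD', 'H': 'HVD', 'B': 'BED'}
print(tag_is_compatible("VB", "past"))   # False
```

An unknown inflection label raises `KeyError`, which keeps mistakes in the hypothetical labels visible rather than silently returning no tags.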
This section outlines verb codes as tag label extensions. The codes distinguish verbs according to the selection criteria each verb has for its complements, which chapter 4 details.
To get a handle on the complement information for main verbs, we adopt the verb code system from the fourth edition of the Oxford Advanced Learner's Dictionary (OALD4; Cowie 1989). In this dictionary, verb codes are matched to word sense definitions. The system is a mnemonic-based reworking of the earlier system of Hornby (1975). A code from the system has:
For example, the La code marks clause structure (L) with a linking verb + a subject predicative constituent that is an adjective phrase (a). As an example with the dot character, the Cn.a code marks a complex-transitive verb in clause structure (C) with the complex-transitive verb + a direct object constituent that is a noun phrase (n) + an object predicative constituent that is an adjective phrase (a).
Table 2.13: Capital letters L, I, T, C and D
L | Linking verb | selects a subject predicative (-PRD2), an element which provides information about the subject of the clause. |
I | Intransitive verb | there is no selection of a subject predicative or an object, although there may be selection of an adverbial, an element which tells us about time, place, manner, etc., of the action of the verb. |
T | Transitive verb (mono-transitive verb) | selects a direct object (-OB1), an element which often refers to the person or thing affected by the action of the verb. |
C | Complex-transitive verb | selects both a direct object (-OB1) and an object predicative (-PRD), an element which provides more information about the direct object. Note: in the code, a dot divides information about the realisation of the direct object from information about the realisation of the object predicative. |
D | Ditransitive verb | selects both a direct object (-OB1) and an indirect object (-OB2), an element which refers to a person who receives something or benefits from an action. Note: in the code, a dot divides information about the realisation of the direct object from information about the realisation of the indirect object. |
Table 2.14: Lower case letters a, n, p, pr, n/pr, n/a, t, f, w, g, i and r
a | adjective phrase |
n | noun phrase |
p | adverb particle |
pr | preposition phrase |
n/pr | noun phrase/preposition phrase |
n/a | as + noun phrase/adjective phrase |
t | non-finite clause (to-infinitive) (IP-INF with to tagged TO and verb tagged VB) |
f | that-clause (CP-THT) |
w | finite or non-finite clause with wh element (CP-QUE) |
g | participial clause ({ing} form) (IP-PPL with verb tagged VAG) |
i | non-finite clause (bare infinitive) (IP-INF with verb tagged VB but no TO tagged word) |
r | utterance |
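The letter readings of Tables 2.13 and 2.14 compose mechanically, which the following Python sketch tries to show for codes such as La and Cn.a. It is a reading aid under stated assumptions, not part of the parser: the function name `decode` is hypothetical, and the sketch only handles codes whose lower case part is empty or made of dot-separated segments that are each a single symbol from Table 2.14. Anything else raises `ValueError`.

```python
# Capital letters of Table 2.13.
CLAUSE_TYPES = {
    "L": "linking",
    "I": "intransitive",
    "T": "transitive",
    "C": "complex-transitive",
    "D": "ditransitive",
}

# Lower case symbols of Table 2.14.
COMPLEMENTS = {
    "a": "adjective phrase",
    "n": "noun phrase",
    "p": "adverb particle",
    "pr": "preposition phrase",
    "n/pr": "noun phrase/preposition phrase",
    "n/a": "as + noun phrase/adjective phrase",
    "t": "non-finite clause (to-infinitive)",
    "f": "that-clause",
    "w": "finite or non-finite clause with wh element",
    "g": "participial clause ({ing} form)",
    "i": "non-finite clause (bare infinitive)",
    "r": "utterance",
}


def decode(code):
    """Split a verb code into a clause type and a list of complement readings."""
    if not code or code[0] not in CLAUSE_TYPES:
        raise ValueError(f"unknown clause letter in {code!r}")
    rest = code[1:]
    segments = rest.split(".") if rest else []
    for segment in segments:
        if segment not in COMPLEMENTS:
            raise ValueError(f"unsupported segment {segment!r} in {code!r}")
    return CLAUSE_TYPES[code[0]], [COMPLEMENTS[s] for s in segments]


print(decode("La"))    # ('linking', ['adjective phrase'])
print(decode("Cn.a"))  # ('complex-transitive', ['noun phrase', 'adjective phrase'])
```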
Also, we source three codes directly from Hornby (1975):
In addition to Cowie (1989) and Hornby (1975) codes, further verb codes distinguish:
Different verb classes allow for different verb codes. verb_code of (2.30) determines compatible verb codes, taking a capital letter as the value for its first parameter to identify verb class:
A verb_code call will then return through the Code parameter a code picked from the corresponding list for compatible codes.
Note how returned codes sometimes depend on inflection information from the Infl parameter.
What does the Prolog query of (2.31) achieve?
Modal verbs have the tags and verb codes of Table 2.15.
Table 2.15: Modal verbs with tags and verb codes
Present tense | Past tense |
w('MD',';~cat_Vi','shall') | w('MD',';~cat_Vi','should') |
w('MD',';~cat_Vi','will') | w('MD',';~cat_Vi','would') |
w('MD',';~cat_Vi','can') | w('MD',';~cat_Vi','could') |
w('MD',';~cat_Vi','may') | w('MD',';~cat_Vi','might') |
w('MD',';~cat_Vi','must') | |
w('MD',';~cat_Vt','ought') | |
w('MD',';~cat_Vi','need') | |
w('MD',';~cat_Vi','dare') | |
| w('MD',';~cat_Vt','used') |
The modal rules of (2.32) support the integration of the modal words of Table 2.15.
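Table 2.15 can also be read as a lookup from a modal word to its word list item. The Python sketch below transcribes the table and builds (tag, code, word) triples that mirror the three-place w items; the names `MODAL_ROWS`, `MODAL_CODES` and `modal_item` are hypothetical, and the sketch does not reproduce the modal rules of (2.32).

```python
# (present, past, code) rows transcribed from Table 2.15.
# None marks an empty cell in the table.
MODAL_ROWS = [
    ("shall", "should", ";~cat_Vi"),
    ("will", "would", ";~cat_Vi"),
    ("can", "could", ";~cat_Vi"),
    ("may", "might", ";~cat_Vi"),
    ("must", None, ";~cat_Vi"),
    ("ought", None, ";~cat_Vt"),
    ("need", None, ";~cat_Vi"),
    ("dare", None, ";~cat_Vi"),
    (None, "used", ";~cat_Vt"),
]

# Map every modal word, present or past, to its verb code.
MODAL_CODES = {
    word: code
    for present, past, code in MODAL_ROWS
    for word in (present, past)
    if word is not None
}


def modal_item(word):
    """Return a (tag, code, word) triple for a modal word, or None otherwise."""
    code = MODAL_CODES.get(word)
    return ("MD", code, word) if code is not None else None


print(modal_item("ought"))  # ('MD', ';~cat_Vt', 'ought')
print(modal_item("walk"))   # None
```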
Besides verbs, other clause level components are words with the tags of Table 2.16.
Table 2.16: Tags for other clause level words
NEG | negative particle not |
NEG;_clitic_ | negative clitic particle n<apos>t |
TO | infinitive marker to |
CONJ;_cl_ | discourse coordination (e.g., And, But) |
INTJ | interjection (e.g., aah, eh, ummmmm) |
REACT | reaction signal (e.g., good_grief, really, yes, wow) |
FRM | formulaic expression (e.g., good_afternoon, you_see, thank_you) |
The optional_clitic_negation, neg and to rules of (2.33) and the initial_adverbial rules of (2.34) support the integration of the words of Table 2.16.
As single word content, initial_adverbial calls can pick up:
initial_adverbial calls can also pick up:
So far, we have considered words that serve as components of either phrases or clauses. There is a further class of words with the tags of Table 2.17 that serve as the means to connect phrases and clauses.
Table 2.17: Tags for connective words
CONJ | Coordinating conjunction (and, or, but) |
C | The complementizer that |
WQ | Marker of indirect question (whether or if) |
P-CONN | Subordinating conjunction (e.g., although, when, in_order) |
P-ROLE | Role preposition (e.g., in, of, under) |
The conj, comp, comp_wq, conn and role rules of (2.35) support the integration of the connective words of Table 2.17.
Punctuation points are treated as words for the purposes of word tagging with the tags of Table 2.18.
Table 2.18: Tags for punctuation
PUNC | punctuation: general separating mark (? . ! ,) |
PULQ | punctuation: left quotation mark (<ldquo> <lsquo>) |
PURQ | punctuation: right quotation mark (<rdquo> <rsquo>) |
This makes punctuation part of a sentence in its own right. With the creation of constituent structure, punctuation occurs as high as possible. For example, a full stop that ends a sentence is the last constituent of the highest clause layer.
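The tagging of Table 2.18 can be pictured with a short Python sketch that turns punctuation tokens into (tag, word) pairs. The function name `tag_punctuation` and the pair encoding are assumptions of the example; the sketch covers only the marks listed in the table and says nothing about where punctuation attaches in constituent structure.

```python
# Punctuation marks of Table 2.18 with their tags.
PUNC_TAGS = {
    "?": "PUNC", ".": "PUNC", "!": "PUNC", ",": "PUNC",
    "<ldquo>": "PULQ", "<lsquo>": "PULQ",
    "<rdquo>": "PURQ", "<rsquo>": "PURQ",
}


def tag_punctuation(tokens):
    """Return (tag, token) pairs for the tokens that are punctuation marks.

    Tokens not listed in Table 2.18 are skipped, since their tags depend
    on word class information outside this sketch.
    """
    return [(PUNC_TAGS[token], token) for token in tokens if token in PUNC_TAGS]


print(tag_punctuation(["<ldquo>", "Stop", "!", "<rdquo>"]))
# [('PULQ', '<ldquo>'), ('PUNC', '!'), ('PURQ', '<rdquo>')]
```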
The punc rules of (2.36) give support for the integration of punctuation, with an initial parameter to distinguish types:
Optional non-final punctuation follows from (2.37).