Building tree annotation with a spreadsheet

Alastair Butler
Hirosaki University





A spreadsheet consists of a table of cells arranged in columns and rows. Columns are conventionally labelled with letters (“A”, “B”, “C”, etc.) and rows with numbers (“1”, “2”, “3”, etc.), so a single cell can be referenced by its column and row. For example, A5 references the cell containing the value “jumps” in spreadsheet (2) below.

    Spreadsheets also have the concept of a range, a grouping of cells written as two corner cells separated by a colon. For example, the first ten cells in the first column of spreadsheet (2) form the range A1:A10, which is filled with the words of sentence (1). Note that each word fills a single cell in a row of its own, in sentence order, and that punctuation is treated like a word. Finally, cell A11 contains the label ID, indicating that cell B11 holds an identifier for the sentence.

(1)
The quick brown fox jumps over the lazy dog.
(2)
    A         B         C         D         E         F         G
 1  The
 2  quick
 3  brown
 4  fox
 5  jumps
 6  over
 7  the
 8  lazy
 9  dog
10  .
11  ID        1_id
12
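To make the cell and range notation concrete, here is a minimal Python sketch. It assumes a spreadsheet modelled as a list of rows, each a list of strings, with missing cells read as empty; the helpers `parse_cell`, `parse_range` and `get` are hypothetical and not the API of any spreadsheet application.

```python
import re

def parse_cell(ref: str) -> tuple[int, int]:
    """Convert an A1-style reference such as "B11" to 0-based (row, col)."""
    m = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", ref)
    if m is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = m.groups()
    col = 0
    for ch in letters:  # bijective base 26: A=1, ..., Z=26, AA=27, ...
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

def parse_range(rng: str) -> list[tuple[int, int]]:
    """Expand a range such as "A1:A10" into its cells, row by row."""
    first, last = rng.split(":")
    (r1, c1), (r2, c2) = parse_cell(first), parse_cell(last)
    return [(r, c)
            for r in range(min(r1, r2), max(r1, r2) + 1)
            for c in range(min(c1, c2), max(c1, c2) + 1)]

def get(sheet: list[list[str]], ref: str) -> str:
    """Return the value of a cell, or "" for a cell that holds nothing."""
    r, c = parse_cell(ref)
    row = sheet[r] if r < len(sheet) else []
    return row[c] if c < len(row) else ""

# Spreadsheet (2): one word of sentence (1) per row in A1:A10, then the ID row.
words = "The quick brown fox jumps over the lazy dog .".split()
sheet = [[w] for w in words] + [["ID", "1_id"]]

print(get(sheet, "A5"))                                # jumps
print(" ".join(sheet[r][c] for r, c in parse_range("A1:A10")))
print(get(sheet, "A11"), get(sheet, "B11"))            # ID 1_id
```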

    We now need to add word class tags to the words of spreadsheet (2). We first insert cells to the left of A1:A10, so that the contents of A1:A10 move to B1:B10 and A1:A10 become empty, as in spreadsheet (3). This leaves room for the word class tags.

(3)
    A         B         C         D         E         F         G
 1            The
 2            quick
 3            brown
 4            fox
 5            jumps
 6            over
 7            the
 8            lazy
 9            dog
10            .
11  ID        1_id
12
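The insertion step can be sketched on the same list-of-rows model (an assumption of the sketch, not how any spreadsheet application stores its cells). The hypothetical helper `insert_left` only inserts before column A, which is the only column at which cells are inserted in this walkthrough.

```python
def insert_left(sheet: list[list[str]], first: int, last: int) -> None:
    """Insert an empty cell before column A in rows first..last (1-based,
    inclusive). The cells of those rows move one column to the right;
    all other rows are left untouched."""
    for r in range(first - 1, last):
        sheet[r].insert(0, "")

# Spreadsheet (2): the words in A1:A10 and the ID row in A11:B11.
words = "The quick brown fox jumps over the lazy dog .".split()
sheet = [[w] for w in words] + [["ID", "1_id"]]

insert_left(sheet, 1, 10)  # spreadsheet (3): A1:A10 empty, words in B1:B10
for n, row in enumerate(sheet, start=1):
    print(n, row)
```

Row 11 lies outside the range, so the ID row keeps its cells in A11:B11, as in spreadsheet (3).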

    We add word class tags to the cells of A1:A10, resulting in spreadsheet (4).

(4)
    A         B         C         D         E         F         G
 1  D         The
 2  ADJ       quick
 3  ADJ       brown
 4  N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7  D         the
 8  ADJ       lazy
 9  N         dog
10  PUNC      .
11  ID        1_id
12
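Filling the newly emptied cells is the complementary operation. In this sketch the hypothetical `fill_column_a` writes one tag per row and refuses to overwrite a non-empty cell; that check reflects the order of steps above (cells are inserted before they are filled) rather than any rule stated in the text.

```python
def fill_column_a(sheet: list[list[str]], first: int, values: list[str]) -> None:
    """Write values down column A from row `first` (1-based), refusing to
    overwrite a cell that is not empty."""
    for offset, value in enumerate(values):
        row = sheet[first - 1 + offset]
        if row[0]:
            raise ValueError(f"A{first + offset} already holds {row[0]!r}")
        row[0] = value

# Spreadsheet (3): empty A1:A10, words in B1:B10, ID row unchanged.
words = "The quick brown fox jumps over the lazy dog .".split()
sheet = [["", w] for w in words] + [["ID", "1_id"]]

# The word class tags of spreadsheet (4), one per word.
tags = ["D", "ADJ", "ADJ", "N", "VBP;~Ipr", "P-ROLE", "D", "ADJ", "N", "PUNC"]
fill_column_a(sheet, 1, tags)
```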

    Now that every word has a word class tag, we can begin creating phrase structure. Let us start by adding ADJP structures. This first requires inserting cells to the left of the ADJ word class tags (A2, A3 and A8), resulting in spreadsheet (5).

(5)
    A         B         C         D         E         F         G
 1  D         The
 2            ADJ       quick
 3            ADJ       brown
 4  N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7  D         the
 8            ADJ       lazy
 9  N         dog
10  PUNC      .
11  ID        1_id
12
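Inserting cells only where a given tag occurs, as for the ADJ tags here, amounts to selecting rows by the value in column A. A sketch under the same list-of-rows assumption; `insert_left_of_tag` is hypothetical and matches tags exactly, so ADJP would not be selected by ADJ.

```python
def insert_left_of_tag(sheet: list[list[str]], tag: str) -> list[int]:
    """Insert an empty cell before column A in every row whose column A
    holds exactly `tag`; return the affected 1-based row numbers."""
    affected = []
    for n, row in enumerate(sheet, start=1):
        if row and row[0] == tag:
            row.insert(0, "")
            affected.append(n)
    return affected

# Spreadsheet (4): word class tags in column A, words in column B.
tags = ["D", "ADJ", "ADJ", "N", "VBP;~Ipr", "P-ROLE", "D", "ADJ", "N", "PUNC"]
words = "The quick brown fox jumps over the lazy dog .".split()
sheet = [[t, w] for t, w in zip(tags, words)] + [["ID", "1_id"]]

print(insert_left_of_tag(sheet, "ADJ"))  # [2, 3, 8]; sheet is now spreadsheet (5)
```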

    We can now fill the empty cells created in spreadsheet (5) with the ADJP tag, resulting in spreadsheet (6). Note that the second instance of ADJP carries extra marking (“;@3”). This is because the two ADJP instances occur consecutively (A2 and A3) yet must remain distinct phrases: without the marking, they would be read as a single ADJP. Aside from ensuring distinctness, the choice of number for such marking is not important, but it helps to follow a convention. The convention followed in spreadsheet (6) is to use “;@n” for an n-th sister node. Why “;@3” marks a third sister node will become clear from spreadsheet (7) onwards, where this ADJP ends up as the third daughter of NP-SBJ.

(6)
    A         B         C         D         E         F         G
 1  D         The
 2  ADJP      ADJ       quick
 3  ADJP;@3   ADJ       brown
 4  N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7  D         the
 8  ADJP      ADJ       lazy
 9  N         dog
10  PUNC      .
11  ID        1_id
12
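The “;@n” convention can be mechanised under an assumption the text does not spell out: only a node that repeats the tag of the sister immediately before it is marked, with n its 1-based position among its sisters. `mark_repeated_sisters` is a hypothetical helper; in the walkthrough the marking is added by hand, already at step (6), before the NP-SBJ that makes this ADJP a third sister has been built.

```python
def mark_repeated_sisters(labels: list[str]) -> list[str]:
    """Append ';@n' to each label that repeats the label of the sister node
    immediately before it, n being its 1-based position among the sisters
    (an assumed reading of the convention, not a documented rule)."""
    marked = []
    for n, label in enumerate(labels, start=1):
        if n > 1 and label == labels[n - 2]:
            marked.append(f"{label};@{n}")
        else:
            marked.append(label)
    return marked

# The daughters of the subject noun phrase of sentence (1): D, ADJP, ADJP, N.
print(mark_repeated_sisters(["D", "ADJP", "ADJP", "N"]))
# ['D', 'ADJP', 'ADJP;@3', 'N']
```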

    We next insert the cell structure for a subject noun phrase, that is, empty cells to the left of A1:A4, resulting in spreadsheet (7).

(7)
    A         B         C         D         E         F         G
 1            D         The
 2            ADJP      ADJ       quick
 3            ADJP;@3   ADJ       brown
 4            N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7  D         the
 8  ADJP      ADJ       lazy
 9  N         dog
10  PUNC      .
11  ID        1_id
12

    We next fill the subject noun phrase cell structure created in spreadsheet (7) with the NP-SBJ tag, resulting in spreadsheet (8). Note that the consecutive instances of NP-SBJ in A1:A4 all belong to the same phrase.

(8)
    A         B         C         D         E         F         G
 1  NP-SBJ    D         The
 2  NP-SBJ    ADJP      ADJ       quick
 3  NP-SBJ    ADJP;@3   ADJ       brown
 4  NP-SBJ    N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7  D         the
 8  ADJP      ADJ       lazy
 9  N         dog
10  PUNC      .
11  ID        1_id
12
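The rule that consecutive identical cells in a column form one phrase can be sketched as grouping runs. `column_spans` is a hypothetical helper on the list-of-rows model; it looks at one column over the whole sheet, whereas a full conversion to a tree would also have to stop runs at the boundaries of the phrases in the column to their left. The example shows that without the “;@3” marking the two ADJP cells of column B would be read as one phrase.

```python
def column_spans(sheet: list[list[str]], col: int) -> list[tuple[str, int, int]]:
    """Group maximal runs of identical non-empty values in column `col`
    (0-based) into (value, first_row, last_row) spans, rows 1-based."""
    spans: list[tuple[str, int, int]] = []
    for n, row in enumerate(sheet, start=1):
        value = row[col] if col < len(row) else ""
        if not value:
            continue
        if spans and spans[-1][0] == value and spans[-1][2] == n - 1:
            spans[-1] = (value, spans[-1][1], n)  # extend the current run
        else:
            spans.append((value, n, n))           # start a new run
    return spans

# Spreadsheet (8) as a list of rows.
SHEET_8 = [
    ["NP-SBJ", "D", "The"],
    ["NP-SBJ", "ADJP", "ADJ", "quick"],
    ["NP-SBJ", "ADJP;@3", "ADJ", "brown"],
    ["NP-SBJ", "N", "fox"],
    ["VBP;~Ipr", "jumps"],
    ["P-ROLE", "over"],
    ["D", "the"],
    ["ADJP", "ADJ", "lazy"],
    ["N", "dog"],
    ["PUNC", "."],
    ["ID", "1_id"],
]

print(column_spans(SHEET_8, 0)[0])    # ('NP-SBJ', 1, 4)
print(column_spans(SHEET_8, 1)[1:3])  # [('ADJP', 2, 2), ('ADJP;@3', 3, 3)]

unmarked = [list(row) for row in SHEET_8]
unmarked[2][1] = "ADJP"               # drop the ;@3 marking from B3
print(column_spans(unmarked, 1)[1])   # ('ADJP', 2, 3): the two ADJPs merge
```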

    We next insert the cell structure for a second noun phrase, that is, empty cells to the left of A7:A9, resulting in spreadsheet (9).

(9)
    A         B         C         D         E         F         G
 1  NP-SBJ    D         The
 2  NP-SBJ    ADJP      ADJ       quick
 3  NP-SBJ    ADJP;@3   ADJ       brown
 4  NP-SBJ    N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7            D         the
 8            ADJP      ADJ       lazy
 9            N         dog
10  PUNC      .
11  ID        1_id
12

    We next fill the noun phrase cell structure created in spreadsheet (9) with the NP tag, resulting in spreadsheet (10).

(10)
    A         B         C         D         E         F         G
 1  NP-SBJ    D         The
 2  NP-SBJ    ADJP      ADJ       quick
 3  NP-SBJ    ADJP;@3   ADJ       brown
 4  NP-SBJ    N         fox
 5  VBP;~Ipr  jumps
 6  P-ROLE    over
 7  NP        D         the
 8  NP        ADJP      ADJ       lazy
 9  NP        N         dog
10  PUNC      .
11  ID        1_id
12

    We next insert the cell structure for a prepositional phrase, that is, empty cells to the left of A6:A9, resulting in spreadsheet (11).

(11)
    A         B         C         D         E         F         G
 1  NP-SBJ    D         The
 2  NP-SBJ    ADJP      ADJ       quick
 3  NP-SBJ    ADJP;@3   ADJ       brown
 4  NP-SBJ    N         fox
 5  VBP;~Ipr  jumps
 6            P-ROLE    over
 7            NP        D         the
 8            NP        ADJP      ADJ       lazy
 9            NP        N         dog
10  PUNC      .
11  ID        1_id
12

    We next fill the prepositional phrase cell structure created in spreadsheet (11) with the PP-CLR tag, resulting in spreadsheet (12).

(12)
    A         B         C         D         E         F         G
 1  NP-SBJ    D         The
 2  NP-SBJ    ADJP      ADJ       quick
 3  NP-SBJ    ADJP;@3   ADJ       brown
 4  NP-SBJ    N         fox
 5  VBP;~Ipr  jumps
 6  PP-CLR    P-ROLE    over
 7  PP-CLR    NP        D         the
 8  PP-CLR    NP        ADJP      ADJ       lazy
 9  PP-CLR    NP        N         dog
10  PUNC      .
11  ID        1_id
12

    We next insert the cell structure for a matrix clause, that is, empty cells to the left of A1:A10, resulting in spreadsheet (13).

(13)
    A         B         C         D         E         F         G
 1            NP-SBJ    D         The
 2            NP-SBJ    ADJP      ADJ       quick
 3            NP-SBJ    ADJP;@3   ADJ       brown
 4            NP-SBJ    N         fox
 5            VBP;~Ipr  jumps
 6            PP-CLR    P-ROLE    over
 7            PP-CLR    NP        D         the
 8            PP-CLR    NP        ADJP      ADJ       lazy
 9            PP-CLR    NP        N         dog
10            PUNC      .
11  ID        1_id
12

    We next fill the matrix clause cell structure created in spreadsheet (13) with the IP-MAT tag, resulting in the completed tree information of spreadsheet (14).

(14)
    A         B         C         D         E         F         G
 1  IP-MAT    NP-SBJ    D         The
 2  IP-MAT    NP-SBJ    ADJP      ADJ       quick
 3  IP-MAT    NP-SBJ    ADJP;@3   ADJ       brown
 4  IP-MAT    NP-SBJ    N         fox
 5  IP-MAT    VBP;~Ipr  jumps
 6  IP-MAT    PP-CLR    P-ROLE    over
 7  IP-MAT    PP-CLR    NP        D         the
 8  IP-MAT    PP-CLR    NP        ADJP      ADJ       lazy
 9  IP-MAT    PP-CLR    NP        N         dog
10  IP-MAT    PUNC      .
11  ID        1_id
12
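Since every step from (2) to (14) is one of two operations, inserting empty cells before column A in some rows and filling those cells, the whole build can be replayed as a short script. This is a minimal sketch on an assumed list-of-rows model; `insert_left`, `fill` and `add_phrase` are hypothetical helpers, not features of any spreadsheet application. The printed rows should match the cell contents of spreadsheet (14).

```python
def insert_left(sheet: list[list[str]], rows) -> None:
    """Insert an empty cell before column A in each listed row (1-based)."""
    for n in rows:
        sheet[n - 1].insert(0, "")

def fill(sheet: list[list[str]], rows, values) -> None:
    """Write values into the empty column A cells of the listed rows."""
    rows, values = list(rows), list(values)
    if len(rows) != len(values):
        raise ValueError("need exactly one value per row")
    for n, value in zip(rows, values):
        if sheet[n - 1][0]:
            raise ValueError(f"A{n} is not empty")
        sheet[n - 1][0] = value

def add_phrase(sheet: list[list[str]], first: int, last: int, tag: str) -> None:
    """Insert cell structure for rows first..last, then fill it with tag."""
    rows = range(first, last + 1)
    insert_left(sheet, rows)
    fill(sheet, rows, [tag] * len(rows))

words = "The quick brown fox jumps over the lazy dog .".split()
tags = ["D", "ADJ", "ADJ", "N", "VBP;~Ipr", "P-ROLE", "D", "ADJ", "N", "PUNC"]

sheet = [[w] for w in words] + [["ID", "1_id"]]      # (2)
insert_left(sheet, range(1, 11))                     # (3)
fill(sheet, range(1, 11), tags)                      # (4)
insert_left(sheet, [2, 3, 8])                        # (5)
fill(sheet, [2, 3, 8], ["ADJP", "ADJP;@3", "ADJP"])  # (6)
add_phrase(sheet, 1, 4, "NP-SBJ")                    # (7) and (8)
add_phrase(sheet, 7, 9, "NP")                        # (9) and (10)
add_phrase(sheet, 6, 9, "PP-CLR")                    # (11) and (12)
add_phrase(sheet, 1, 10, "IP-MAT")                   # (13) and (14)

# Print with ten-character columns, one spreadsheet row per line.
for n, row in enumerate(sheet, start=1):
    print(f"{n:>2}  " + "".join(f"{cell:<10}" for cell in row).rstrip())
```

Because each insertion goes before column A, phrases have to be added from the innermost outwards, as in the walkthrough: adding PP-CLR before NP would put NP to the left of PP-CLR in rows 7 to 9, making NP the higher node there.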

    Presenting the completed tree information of spreadsheet (14) in the traditional bracketed (Penn/CorpusSearch) format gives (15). Because each constituent in this format is enclosed in its own pair of brackets, consecutive constituents with the same node tag stay distinct. Consequently, extra marking for distinct nodes, such as the “;@3” of “ADJP;@3” introduced in spreadsheet (6), is no longer required.

(15)
( (IP-MAT (NP-SBJ (D The)
                  (ADJP (ADJ quick))
                  (ADJP (ADJ brown))
                  (N fox))
          (VBP jumps)
          (PP-CLR (P-ROLE over)
                  (NP (D the)
                      (ADJP (ADJ lazy))
                      (N dog)))
          (PUNC .))
  (ID 1_id))
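One way to derive (15) from spreadsheet (14) mechanically: within the rows of a parent constituent, a maximal run of identical tags in the next column forms one constituent, and a single-row run whose next cell is the last in its row is a word-level node. This is a sketch of that reading of the format, not the tool used for this article. The names `build` and `to_bracketed` are hypothetical; the sketch removes only the “;@n” distinctness marking, keeping other extensions such as “;~Ipr” (which (15) shows as plain VBP), and prints the tree on one line rather than with the indentation of (15).

```python
import re

# Spreadsheet (14): one row per word, tags from the root rightwards, then the ID row.
SHEET_14 = [
    ["IP-MAT", "NP-SBJ", "D", "The"],
    ["IP-MAT", "NP-SBJ", "ADJP", "ADJ", "quick"],
    ["IP-MAT", "NP-SBJ", "ADJP;@3", "ADJ", "brown"],
    ["IP-MAT", "NP-SBJ", "N", "fox"],
    ["IP-MAT", "VBP;~Ipr", "jumps"],
    ["IP-MAT", "PP-CLR", "P-ROLE", "over"],
    ["IP-MAT", "PP-CLR", "NP", "D", "the"],
    ["IP-MAT", "PP-CLR", "NP", "ADJP", "ADJ", "lazy"],
    ["IP-MAT", "PP-CLR", "NP", "N", "dog"],
    ["IP-MAT", "PUNC", "."],
    ["ID", "1_id"],
]

def strip_marking(tag: str) -> str:
    """Drop ';@n' distinctness marking, e.g. 'ADJP;@3' -> 'ADJP'."""
    return re.sub(r";@\d+", "", tag)

def build(sheet: list[list[str]], rows: list[int], col: int) -> list[str]:
    """Bracket the constituents that column `col` defines over `rows` (0-based)."""
    out = []
    i = 0
    while i < len(rows):
        tag = sheet[rows[i]][col]
        j = i
        while j + 1 < len(rows) and sheet[rows[j + 1]][col] == tag:
            j += 1  # extend the run of identical tags
        group = rows[i:j + 1]
        if len(group) == 1 and len(sheet[group[0]]) == col + 2:
            # word-level node: the next cell is the word itself
            out.append(f"({strip_marking(tag)} {sheet[group[0]][col + 1]})")
        elif any(len(sheet[r]) < col + 3 for r in group):
            rows_1based = [r + 1 for r in group]
            raise ValueError(f"cannot bracket run of {tag!r} in rows {rows_1based}; "
                             "consecutive identical tags may need ';@n' marking")
        else:
            inner = " ".join(build(sheet, group, col + 1))
            out.append(f"({strip_marking(tag)} {inner})")
        i = j + 1
    return out

def to_bracketed(sheet: list[list[str]]) -> str:
    """Wrap the top-level constituents in the outer unlabelled bracket."""
    return "( " + " ".join(build(sheet, list(range(len(sheet))), 0)) + ")"

print(to_bracketed(SHEET_14))
```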

The completed tree information of spreadsheet (14) can also be presented as the graphical tree (16).

(16)
IP-MAT
├─ NP-SBJ
│  ├─ D The
│  ├─ ADJP
│  │  └─ ADJ quick
│  ├─ ADJP
│  │  └─ ADJ brown
│  └─ N fox
├─ VBP jumps
├─ PP-CLR
│  ├─ P-ROLE over
│  └─ NP
│     ├─ D the
│     ├─ ADJP
│     │  └─ ADJ lazy
│     └─ N dog
└─ PUNC .
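A rough plain-text stand-in for a graphical tree can be generated from the bracketed format by parsing it into nested lists and printing one node per line, indented by depth, with each word on the line of its word class tag. A minimal sketch; `parse` and `render` are hypothetical helpers, and the parser assumes well-formed input like (15).

```python
import re

# The bracketed tree (15), reflowed; whitespace is not significant.
TREE_15 = """( (IP-MAT (NP-SBJ (D The) (ADJP (ADJ quick)) (ADJP (ADJ brown)) (N fox))
          (VBP jumps)
          (PP-CLR (P-ROLE over) (NP (D the) (ADJP (ADJ lazy)) (N dog)))
          (PUNC .))
  (ID 1_id))"""

def parse(text: str) -> list:
    """Parse a bracketed tree into nested [label, child, ...] lists,
    where a word is a plain string. Assumes well-formed input."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node() -> list:
        nonlocal pos
        pos += 1                            # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):   # the outer wrapper has no label
            label = tokens[pos]
            pos += 1
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                            # consume ")"
        return [label, *children]

    return node()

def render(tree: list, depth: int = 0) -> list[str]:
    """One line per node; a word-level node is printed as 'TAG word'."""
    label, *children = tree
    pad = "  " * depth
    if len(children) == 1 and isinstance(children[0], str):
        return [f"{pad}{label} {children[0]}"]
    lines = [pad + label] if label else []
    inner = depth + 1 if label else depth
    for child in children:
        if isinstance(child, str):
            lines.append("  " * inner + child)
        else:
            lines.extend(render(child, inner))
    return lines

print("\n".join(render(parse(TREE_15))))
```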