Building tree annotation with a spreadsheet

Alastair Butler

Hirosaki University



A spreadsheet consists of a table of cells arranged into columns and rows. The columns are normally represented by letters (“A”, “B”, “C”, etc.), while the rows are normally represented by numbers (“1”, “2”, “3”, etc.). A single cell is referenced by its column and row: A5 is the cell containing the value “jumps” in table (2) below. Spreadsheets also have the concept of a range, a group of cells that is normally contiguous. For example, the first ten cells in the first column of (2) form the range “A1:A10”, which is filled with the words of the example sentence (1). Each consecutive word fills a single cell in its own consecutive row, and punctuation is treated like a word. Cell B11 contains an ID for the sentence, and the row of an ID cell must be followed by at least one completely blank row (row 12 in (2)).

(1)
The quick brown fox jumps over the lazy dog.
(2)
    A        B        C        D        E        F        G
1   The
2   quick
3   brown
4   fox
5   jumps
6   over
7   the
8   lazy
9   dog
10  .
11  ID       example
12
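
The layout of (2) can be mimicked outside a spreadsheet. The following is a minimal sketch, assuming the table is held as a list of rows (each row a list of cell values) and that words and punctuation can be split with a simple regular expression. The helper names `sentence_to_rows` and `cell` are hypothetical and only serve the illustration.

```python
import re

def sentence_to_rows(sentence, sentence_id):
    """Lay out one token per row in column A, followed by an ID row and a blank row."""
    # Assumption: a word is a run of word characters; any other
    # non-space character is a punctuation token of its own.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    rows = [[token] for token in tokens]
    rows.append(["ID", sentence_id])  # the ID row
    rows.append([])                   # the completely blank row after the ID row
    return rows

def cell(rows, ref):
    """Look up a cell by a reference such as 'A5' (column letter, then row number)."""
    match = re.fullmatch(r"([A-Z])(\d+)", ref)
    col = ord(match.group(1)) - ord("A")
    row = rows[int(match.group(2)) - 1]
    return row[col] if col < len(row) else ""

rows = sentence_to_rows("The quick brown fox jumps over the lazy dog.", "example")
print(cell(rows, "A5"))   # jumps
print(cell(rows, "B11"))  # example
```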

    We need to add word class tag information to the words of (2). We first insert cells to the left of A1:A10, so that the contents of A1:A10 move to B1:B10 and A1:A10 is left empty, as in (3). There is now room for adding word class tags.

(3)
    A        B        C        D        E        F        G
1            The
2            quick
3            brown
4            fox
5            jumps
6            over
7            the
8            lazy
9            dog
10           .
11  ID       example
12

    We add word class tags to the cells of A1:A10, resulting in (4).

(4)
    A        B        C        D        E        F        G
1   D        The
2   ADJ      quick
3   ADJ      brown
4   N        fox
5   VBP      jumps
6   P-ROLE   over
7   D        the
8   ADJ      lazy
9   N        dog
10  PUNC     .
11  ID       example
12

    Now that every word has a word class tag, we can start projecting phrase structure. In particular, let's start by projecting ADJP structures. This requires that we first insert cells to the left of the ADJ word tags, resulting in (5).

(5)
    A        B        C        D        E        F        G
1   D        The
2            ADJ      quick
3            ADJ      brown
4   N        fox
5   VBP      jumps
6   P-ROLE   over
7   D        the
8            ADJ      lazy
9   N        dog
10  PUNC     .
11  ID       example
12

    We can now fill the empty cells created in (5) with the ADJP tag, resulting in (6). Note that two instances of ADJP are given the indexes “;@2” and “;@3”. This is because these ADJP instances occur in consecutive rows and yet need to be distinct ADJP projections. Aside from making consecutive projections distinct, the chosen numbering is not important.

(6)
    A        B        C        D        E        F        G
1   D        The
2   ADJP;@2  ADJ      quick
3   ADJP;@3  ADJ      brown
4   N        fox
5   VBP      jumps
6   P-ROLE   over
7   D        the
8   ADJP     ADJ      lazy
9   N        dog
10  PUNC     .
11  ID       example
12
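
The two spreadsheet operations used so far, inserting cells that shift a row's content to the right and then filling the new cells, can be sketched as follows. This assumes the table is held as a list of rows; `insert_left` and `fill` are hypothetical helper names, not spreadsheet functions.

```python
def insert_left(rows, row_numbers, col=0):
    """Insert an empty cell at column index `col` in each listed row (1-based),
    shifting that row's existing cells one column to the right."""
    for n in row_numbers:
        rows[n - 1].insert(col, "")

def fill(rows, row_numbers, values, col=0):
    """Write one value per listed row into column index `col`."""
    for n, value in zip(row_numbers, values):
        rows[n - 1][col] = value

words = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."]
rows = [[w] for w in words] + [["ID", "example"], []]

# (3) and (4): make room in column A for rows 1-10, then add word class tags.
insert_left(rows, range(1, 11))
tags = ["D", "ADJ", "ADJ", "N", "VBP", "P-ROLE", "D", "ADJ", "N", "PUNC"]
fill(rows, range(1, 11), tags)

# (5) and (6): only the ADJ rows shift; the consecutive ADJPs get distinct indexes.
insert_left(rows, [2, 3, 8])
fill(rows, [2, 3, 8], ["ADJP;@2", "ADJP;@3", "ADJP"])

print(rows[0])  # ['D', 'The']
print(rows[1])  # ['ADJP;@2', 'ADJ', 'quick']
```

Because only the listed rows are shifted, the rows end up with different lengths, which matches the ragged shape of (6).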

    We next insert cell structure for a subject noun phrase, resulting in (7).

(7)
    A        B        C        D        E        F        G
1            D        The
2            ADJP;@2  ADJ      quick
3            ADJP;@3  ADJ      brown
4            N        fox
5   VBP      jumps
6   P-ROLE   over
7   D        the
8   ADJP     ADJ      lazy
9   N        dog
10  PUNC     .
11  ID       example
12

    We next fill the subject noun phrase cell structure created in (7) with the NP-SBJ tag, resulting in (8). There is no indexing with NP-SBJ because all consecutive instances of NP-SBJ (A1:A4) are the same NP-SBJ projection.

(8)
    A        B        C        D        E        F        G
1   NP-SBJ   D        The
2   NP-SBJ   ADJP;@2  ADJ      quick
3   NP-SBJ   ADJP;@3  ADJ      brown
4   NP-SBJ   N        fox
5   VBP      jumps
6   P-ROLE   over
7   D        the
8   ADJP     ADJ      lazy
9   N        dog
10  PUNC     .
11  ID       example
12
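
The difference between the indexed ADJP cells of (6) and the unindexed NP-SBJ cells of (8) comes down to how consecutive cells in one column are grouped. A minimal sketch, assuming that identical consecutive cells belong to one projection and that a “;@n” suffix only serves to make cells differ (the function name `projections` is hypothetical):

```python
from itertools import groupby

def projections(column):
    """Group consecutive identical cells of one column into projections.
    Returns (label without index, number of rows covered) for each projection."""
    return [(label.split(";@")[0], len(list(cells)))
            for label, cells in groupby(column)]

# Indexed cells differ, so each row is its own ADJP projection.
print(projections(["ADJP;@2", "ADJP;@3"]))  # [('ADJP', 1), ('ADJP', 1)]

# Without indexes the two rows would collapse into a single ADJP.
print(projections(["ADJP", "ADJP"]))        # [('ADJP', 2)]

# A1:A4 of (8): four identical cells form one NP-SBJ projection.
print(projections(["NP-SBJ"] * 4))          # [('NP-SBJ', 4)]
```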

    We next insert cell structure for a second noun phrase, resulting in (9).

(9)
    A        B        C        D        E        F        G
1   NP-SBJ   D        The
2   NP-SBJ   ADJP;@2  ADJ      quick
3   NP-SBJ   ADJP;@3  ADJ      brown
4   NP-SBJ   N        fox
5   VBP      jumps
6   P-ROLE   over
7            D        the
8            ADJP     ADJ      lazy
9            N        dog
10  PUNC     .
11  ID       example
12

    We next fill the noun phrase cell structure created in (9) with the NP tag, resulting in (10).

(10)
    A        B        C        D        E        F        G
1   NP-SBJ   D        The
2   NP-SBJ   ADJP;@2  ADJ      quick
3   NP-SBJ   ADJP;@3  ADJ      brown
4   NP-SBJ   N        fox
5   VBP      jumps
6   P-ROLE   over
7   NP       D        the
8   NP       ADJP     ADJ      lazy
9   NP       N        dog
10  PUNC     .
11  ID       example
12

    We next insert cell structure for a prepositional phrase, resulting in (11).

(11)
    A        B        C        D        E        F        G
1   NP-SBJ   D        The
2   NP-SBJ   ADJP;@2  ADJ      quick
3   NP-SBJ   ADJP;@3  ADJ      brown
4   NP-SBJ   N        fox
5   VBP      jumps
6            P-ROLE   over
7            NP       D        the
8            NP       ADJP     ADJ      lazy
9            NP       N        dog
10  PUNC     .
11  ID       example
12

    We next fill the prepositional phrase cell structure created in (11) with the PP-DIR tag, resulting in (12).

(12)
    A        B        C        D        E        F        G
1   NP-SBJ   D        The
2   NP-SBJ   ADJP;@2  ADJ      quick
3   NP-SBJ   ADJP;@3  ADJ      brown
4   NP-SBJ   N        fox
5   VBP      jumps
6   PP-DIR   P-ROLE   over
7   PP-DIR   NP       D        the
8   PP-DIR   NP       ADJP     ADJ      lazy
9   PP-DIR   NP       N        dog
10  PUNC     .
11  ID       example
12

    We next insert cell structure for a matrix clause, resulting in (13).

(13)
    A        B        C        D        E        F        G
1            NP-SBJ   D        The
2            NP-SBJ   ADJP;@2  ADJ      quick
3            NP-SBJ   ADJP;@3  ADJ      brown
4            NP-SBJ   N        fox
5            VBP      jumps
6            PP-DIR   P-ROLE   over
7            PP-DIR   NP       D        the
8            PP-DIR   NP       ADJP     ADJ      lazy
9            PP-DIR   NP       N        dog
10           PUNC     .
11  ID       example
12

    We next fill the matrix clause cell structure created in (13) with the IP-MAT tag, resulting in the completed tree information of (14).

(14)
    A        B        C        D        E        F        G
1   IP-MAT   NP-SBJ   D        The
2   IP-MAT   NP-SBJ   ADJP;@2  ADJ      quick
3   IP-MAT   NP-SBJ   ADJP;@3  ADJ      brown
4   IP-MAT   NP-SBJ   N        fox
5   IP-MAT   VBP      jumps
6   IP-MAT   PP-DIR   P-ROLE   over
7   IP-MAT   PP-DIR   NP       D        the
8   IP-MAT   PP-DIR   NP       ADJP     ADJ      lazy
9   IP-MAT   PP-DIR   NP       N        dog
10  IP-MAT   PUNC     .
11  ID       example
12

    The completed tree information of (14) can be represented in traditional bracketed (Penn/CorpusSearch) format as (15). Note that the bracketing keeps consecutive constituents with the same node tag distinct, so that the node indexing introduced in (6) is no longer required.

(15)
( (IP-MAT (NP-SBJ (D The)
                  (ADJP (ADJ quick))
                  (ADJP (ADJ brown))
                  (N fox))
          (VBP jumps)
          (PP-DIR (P-ROLE over)
                  (NP (D the)
                      (ADJP (ADJ lazy))
                      (N dog)))
          (PUNC .))
  (ID example))
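
One way to see how (14) determines (15) is to convert the rows mechanically. The sketch below is not the tool behind this method; it assumes each word row reads left to right as phrase labels, then a word class tag, then the word, and that consecutive rows sharing the same cell in a column belong to the same projection. The function name `to_brackets` is hypothetical, and the output is a single line rather than the indented layout of (15).

```python
def to_brackets(rows, depth=0):
    """Turn rows of the form [label, ..., word class tag, word] into bracketed text."""
    parts = []
    i = 0
    while i < len(rows):
        row = rows[i]
        label = row[depth]
        if len(row) == depth + 2:
            # Only a tag and a word remain: this row is a terminal node.
            parts.append(f"({label} {row[depth + 1]})")
            i += 1
            continue
        # Collect the consecutive rows sharing this cell: one projection.
        j = i
        while j < len(rows) and len(rows[j]) > depth + 2 and rows[j][depth] == label:
            j += 1
        inner = to_brackets(rows[i:j], depth + 1)
        # The ';@n' index has done its job and is dropped from the output.
        parts.append(f"({label.split(';@')[0]} {inner})")
        i = j
    return " ".join(parts)

rows = [
    ["IP-MAT", "NP-SBJ", "D", "The"],
    ["IP-MAT", "NP-SBJ", "ADJP;@2", "ADJ", "quick"],
    ["IP-MAT", "NP-SBJ", "ADJP;@3", "ADJ", "brown"],
    ["IP-MAT", "NP-SBJ", "N", "fox"],
    ["IP-MAT", "VBP", "jumps"],
    ["IP-MAT", "PP-DIR", "P-ROLE", "over"],
    ["IP-MAT", "PP-DIR", "NP", "D", "the"],
    ["IP-MAT", "PP-DIR", "NP", "ADJP", "ADJ", "lazy"],
    ["IP-MAT", "PP-DIR", "NP", "N", "dog"],
    ["IP-MAT", "PUNC", "."],
]
tree = to_brackets(rows)
print(f"( {tree} (ID example))")
```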

The completed tree information of (14) can be represented as the graphical tree (16).

(16)
IP-MAT
  NP-SBJ
    D The
    ADJP
      ADJ quick
    ADJP
      ADJ brown
    N fox
  VBP jumps
  PP-DIR
    P-ROLE over
    NP
      D the
      ADJP
        ADJ lazy
      N dog
  PUNC .
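
A tree display can also be derived from the bracketed format. The following is a rough sketch that reads the IP-MAT tree of (15), without the outer wrapper and the ID node, into nested lists and prints it as an indented outline. The names `parse` and `outline` are hypothetical, and the sketch assumes well-formed brackets with no parentheses inside words.

```python
import re

def parse(text):
    """Parse bracketed text into nested lists: [label, child, ...] or [tag, word]."""
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip the opening "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # skip the closing ")"
        return items

    return node()

def outline(tree, indent=0):
    """Return the lines of an indented outline, one node per line."""
    label, rest = tree[0], tree[1:]
    if len(rest) == 1 and isinstance(rest[0], str):
        # Terminal node: word class tag followed by the word.
        return [" " * indent + f"{label} {rest[0]}"]
    lines = [" " * indent + label]
    for child in rest:
        lines.extend(outline(child, indent + 2))
    return lines

text = ("(IP-MAT (NP-SBJ (D The) (ADJP (ADJ quick)) (ADJP (ADJ brown)) (N fox)) "
        "(VBP jumps) "
        "(PP-DIR (P-ROLE over) (NP (D the) (ADJP (ADJ lazy)) (N dog))) "
        "(PUNC .))")
print("\n".join(outline(parse(text))))
```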