Building tree annotation with a spreadsheet

Alastair Butler

Hirosaki University





A spreadsheet consists of a table of cells arranged into columns and rows. The columns are normally labelled with letters, “A”, “B”, “C”, etc., while the rows are normally labelled with numbers, “1”, “2”, “3”, etc. A single cell can be referenced by its column and row. For example, A5 refers to the cell containing the value “jumps” in table (2) below.

    Additionally, spreadsheets have the concept of a range as a grouping of cells. For example, the first ten cells in the first column of (2) form the range “A1:A10”, which is filled with the words of the example sentence (1). Note that each consecutive word fills a single cell of a distinct consecutive row. Also, note that punctuation is treated like a word. Finally, cell A11 holds the label “ID”, which tells us that cell B11 contains an ID for the sentence.

(1)
The quick brown fox jumps over the lazy dog.
(2)
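| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | The | | | | | | |
| 2 | quick | | | | | | |
| 3 | brown | | | | | | |
| 4 | fox | | | | | | |
| 5 | jumps | | | | | | |
| 6 | over | | | | | | |
| 7 | the | | | | | | |
| 8 | lazy | | | | | | |
| 9 | dog | | | | | | |
| 10 | . | | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |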
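
    As a rough illustration of cell references and ranges, the layout of (2) can be sketched in Python. This is only a sketch under assumptions: the dictionary `sheet` and the helpers `tokenize`, `fill_column` and `cell_range` are hypothetical names introduced here, not part of any spreadsheet program.

```python
import re

def tokenize(sentence):
    """Split a sentence into words, treating each punctuation mark as a word."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def fill_column(sheet, column, words, start_row=1):
    """Place each word in its own cell on consecutive rows of one column."""
    for offset, word in enumerate(words):
        sheet[f"{column}{start_row + offset}"] = word

def cell_range(sheet, column, first_row, last_row):
    """Return the values of a single-column range such as A1:A10."""
    return [sheet.get(f"{column}{row}", "")
            for row in range(first_row, last_row + 1)]

# The sheet is modelled as a mapping from cell references to values.
sheet = {}
fill_column(sheet, "A", tokenize("The quick brown fox jumps over the lazy dog."))
sheet["A11"] = "ID"        # label cell
sheet["B11"] = "example"   # sentence ID

print(sheet["A5"])                      # the cell holding "jumps"
print(cell_range(sheet, "A", 1, 10))    # the range A1:A10
```

    Here the full stop ends up in A10 as a cell of its own, matching the treatment of punctuation in (2).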

    We need to add word class tag information to the words of (2). We first insert cells to the left of A1:A10, so that the contents of A1:A10 move to B1:B10 and the cells of A1:A10 become empty, as in (3). There is now room for adding word class tags.

(3)
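| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | | The | | | | | |
| 2 | | quick | | | | | |
| 3 | | brown | | | | | |
| 4 | | fox | | | | | |
| 5 | | jumps | | | | | |
| 6 | | over | | | | | |
| 7 | | the | | | | | |
| 8 | | lazy | | | | | |
| 9 | | dog | | | | | |
| 10 | | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |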
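
    The insert step that takes (2) to (3) can be sketched as follows, assuming the sheet is held as a list of rows, each row being a list of cell values starting at column A. The function name `insert_cells` is hypothetical; it mimics an "insert cells, shift right" operation restricted to a span of rows.

```python
def insert_cells(rows, first_row, last_row, col=0):
    """Insert one empty cell at column index `col` (0 is column A) in each of
    the rows first_row..last_row (1-based, inclusive), shifting cells right."""
    for r in range(first_row - 1, last_row):
        rows[r].insert(col, "")

words = ["The", "quick", "brown", "fox", "jumps",
         "over", "the", "lazy", "dog", "."]
rows = [[word] for word in words] + [["ID", "example"]]

insert_cells(rows, 1, 10)   # make room to the left of A1:A10

for number, row in enumerate(rows, start=1):
    print(number, row)
```

    Only rows 1 to 10 are shifted, so the ID label and value of row 11 stay in columns A and B.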

    We add word class tags to the cells of A1:A10, resulting in (4).

(4)
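| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | D | The | | | | | |
| 2 | ADJ | quick | | | | | |
| 3 | ADJ | brown | | | | | |
| 4 | N | fox | | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | D | the | | | | | |
| 8 | ADJ | lazy | | | | | |
| 9 | N | dog | | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |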

    Now that every word has a word class tag, we can start creating phrase structure. In particular, let's start by adding ADJP structures. This requires that we first insert cells to the left of the ADJ word tags, resulting in (5).

(5)
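| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | D | The | | | | | |
| 2 | | ADJ | quick | | | | |
| 3 | | ADJ | brown | | | | |
| 4 | N | fox | | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | D | the | | | | | |
| 8 | | ADJ | lazy | | | | |
| 9 | N | dog | | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |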

    We can now fill the empty cells created in (5) with the ADJP tag, resulting in (6). Note that the second instance of ADJP is given an index (“;@2”). This is because two ADJP instances occur consecutively and need to be kept distinct. Aside from ensuring distinctness, the chosen numbering is not important.

(6)
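| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | D | The | | | | | |
| 2 | ADJP | ADJ | quick | | | | |
| 3 | ADJP;@2 | ADJ | brown | | | | |
| 4 | N | fox | | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | D | the | | | | | |
| 8 | ADJP | ADJ | lazy | | | | |
| 9 | N | dog | | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |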
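
    The insert-then-fill steps of (5) and (6), including the choice of an index, can be sketched as below. This is an illustration under assumptions rather than a prescribed procedure: the helpers `split_index` and `add_phrase` are hypothetical, an unindexed tag is treated as if it had index 1, and a new index is picked only to differ from an identically tagged cell directly above or below.

```python
import re

def split_index(cell):
    """Split 'ADJP;@2' into ('ADJP', 2); an unindexed tag counts as index 1."""
    match = re.fullmatch(r"(.*);@(\d+)", cell)
    return (match.group(1), int(match.group(2))) if match else (cell, 1)

def add_phrase(rows, first_row, last_row, tag, col=0):
    """Insert a cell at `col` in rows first_row..last_row (1-based, inclusive)
    and fill it with `tag`, adding ';@N' when a neighbouring row already
    carries the same tag in that column."""
    used = set()
    for r in (first_row - 2, last_row):   # rows just above and just below
        if 0 <= r < len(rows) and len(rows[r]) > col:
            name, index = split_index(rows[r][col])
            if name == tag:
                used.add(index)
    index = 1
    while index in used:
        index += 1
    label = tag if index == 1 else f"{tag};@{index}"
    for r in range(first_row - 1, last_row):
        rows[r].insert(col, label)

# Rows of (4): word class tag in column A, word in column B.
rows = [["D", "The"], ["ADJ", "quick"], ["ADJ", "brown"], ["N", "fox"],
        ["VBP;~Ipr", "jumps"], ["P-ROLE", "over"], ["D", "the"],
        ["ADJ", "lazy"], ["N", "dog"], ["PUNC", "."], ["ID", "example"]]

for row_number in (2, 3, 8):              # each ADJ word gets its own ADJP
    add_phrase(rows, row_number, row_number, "ADJP")

for number, row in enumerate(rows, start=1):
    print(number, row)
```

    Calling `add_phrase` once per row keeps the three adjective phrases separate; a single call over a span of rows would instead create one phrase covering that span.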

    We next insert cell structure for a subject noun phrase, resulting in (7).

(7)
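| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | | D | The | | | | |
| 2 | | ADJP | ADJ | quick | | | |
| 3 | | ADJP;@2 | ADJ | brown | | | |
| 4 | | N | fox | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | D | the | | | | | |
| 8 | ADJP | ADJ | lazy | | | | |
| 9 | N | dog | | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |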

    We next fill the subject noun phrase cell structure created in (7) with the NP-SBJ tag, resulting in (8). There is no indexing with NP-SBJ because all consecutive instances of NP-SBJ (A1:A4) belong to the same phrase.

(8)
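| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | NP-SBJ | D | The | | | | |
| 2 | NP-SBJ | ADJP | ADJ | quick | | | |
| 3 | NP-SBJ | ADJP;@2 | ADJ | brown | | | |
| 4 | NP-SBJ | N | fox | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | D | the | | | | | |
| 8 | ADJP | ADJ | lazy | | | | |
| 9 | N | dog | | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |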
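
    To see why NP-SBJ needs no index while the second ADJP does, it may help to list the runs of identical cells in a column, since each run is read as one constituent. The sketch below assumes the list-of-rows representation of the sheet; `column_runs` is a hypothetical helper name.

```python
from itertools import groupby

def column_runs(rows, col=0):
    """Return (cell, first_row, last_row) for every run of identical
    consecutive cells in column index `col`; row numbers are 1-based."""
    runs = []
    row_number = 1
    key = lambda row: row[col] if col < len(row) else ""
    for cell, group in groupby(rows, key=key):
        size = len(list(group))
        runs.append((cell, row_number, row_number + size - 1))
        row_number += size
    return runs

# Rows 1 to 5 of (8).
rows = [["NP-SBJ", "D", "The"],
        ["NP-SBJ", "ADJP", "ADJ", "quick"],
        ["NP-SBJ", "ADJP;@2", "ADJ", "brown"],
        ["NP-SBJ", "N", "fox"],
        ["VBP;~Ipr", "jumps"]]

print(column_runs(rows))             # column A: NP-SBJ is one run over rows 1-4
print(column_runs(rows[:4], col=1))  # column B: ADJP and ADJP;@2 stay apart
```

    Without the index on the second ADJP, rows 2 and 3 would form a single run in column B and so be read as one adjective phrase.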

    We next insert cell structure for a second noun phrase, resulting in (9).

(9)
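| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | NP-SBJ | D | The | | | | |
| 2 | NP-SBJ | ADJP | ADJ | quick | | | |
| 3 | NP-SBJ | ADJP;@2 | ADJ | brown | | | |
| 4 | NP-SBJ | N | fox | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | | D | the | | | | |
| 8 | | ADJP | ADJ | lazy | | | |
| 9 | | N | dog | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |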

    We next fill the noun phrase cell structure created in (9) with the NP tag, resulting in (10).

(10)
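| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | NP-SBJ | D | The | | | | |
| 2 | NP-SBJ | ADJP | ADJ | quick | | | |
| 3 | NP-SBJ | ADJP;@2 | ADJ | brown | | | |
| 4 | NP-SBJ | N | fox | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | P-ROLE | over | | | | | |
| 7 | NP | D | the | | | | |
| 8 | NP | ADJP | ADJ | lazy | | | |
| 9 | NP | N | dog | | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |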

    We next insert cell structure for a preposition phrase, resulting in (11).

(11)
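| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | NP-SBJ | D | The | | | | |
| 2 | NP-SBJ | ADJP | ADJ | quick | | | |
| 3 | NP-SBJ | ADJP;@2 | ADJ | brown | | | |
| 4 | NP-SBJ | N | fox | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | | P-ROLE | over | | | | |
| 7 | | NP | D | the | | | |
| 8 | | NP | ADJP | ADJ | lazy | | |
| 9 | | NP | N | dog | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |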

    We next fill the preposition phrase cell structure created in (11) with the PP-CLR-DIR tag, resulting in (12).

(12)
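| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | NP-SBJ | D | The | | | | |
| 2 | NP-SBJ | ADJP | ADJ | quick | | | |
| 3 | NP-SBJ | ADJP;@2 | ADJ | brown | | | |
| 4 | NP-SBJ | N | fox | | | | |
| 5 | VBP;~Ipr | jumps | | | | | |
| 6 | PP-CLR-DIR | P-ROLE | over | | | | |
| 7 | PP-CLR-DIR | NP | D | the | | | |
| 8 | PP-CLR-DIR | NP | ADJP | ADJ | lazy | | |
| 9 | PP-CLR-DIR | NP | N | dog | | | |
| 10 | PUNC | . | | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |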

    We next insert cell structure for a matrix clause, resulting in (13).

(13)
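| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | | NP-SBJ | D | The | | | |
| 2 | | NP-SBJ | ADJP | ADJ | quick | | |
| 3 | | NP-SBJ | ADJP;@2 | ADJ | brown | | |
| 4 | | NP-SBJ | N | fox | | | |
| 5 | | VBP;~Ipr | jumps | | | | |
| 6 | | PP-CLR-DIR | P-ROLE | over | | | |
| 7 | | PP-CLR-DIR | NP | D | the | | |
| 8 | | PP-CLR-DIR | NP | ADJP | ADJ | lazy | |
| 9 | | PP-CLR-DIR | NP | N | dog | | |
| 10 | | PUNC | . | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |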

    We next fill the matrix clause cell structure created in (13) with the IP-MAT tag, resulting in the completed tree information of (14).

(14)
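| | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| 1 | IP-MAT | NP-SBJ | D | The | | | |
| 2 | IP-MAT | NP-SBJ | ADJP | ADJ | quick | | |
| 3 | IP-MAT | NP-SBJ | ADJP;@2 | ADJ | brown | | |
| 4 | IP-MAT | NP-SBJ | N | fox | | | |
| 5 | IP-MAT | VBP;~Ipr | jumps | | | | |
| 6 | IP-MAT | PP-CLR-DIR | P-ROLE | over | | | |
| 7 | IP-MAT | PP-CLR-DIR | NP | D | the | | |
| 8 | IP-MAT | PP-CLR-DIR | NP | ADJP | ADJ | lazy | |
| 9 | IP-MAT | PP-CLR-DIR | NP | N | dog | | |
| 10 | IP-MAT | PUNC | . | | | | |
| 11 | ID | example | | | | | |
| 12 | | | | | | | |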

    The completed tree information of (14) in traditional bracketed (Penn/CorpusSearch) format is (15). Note that the bracketing keeps consecutive constituents with the same node tag distinct, so that the node indexing introduced in (6) is no longer required.

(15)
( (IP-MAT
    (NP-SBJ (D The)
      (ADJP (ADJ quick))
      (ADJP (ADJ brown))
      (N fox))
    (VBP jumps)
    (PP-CLR-DIR (P-ROLE over)
      (NP (D the)
        (ADJP (ADJ lazy))
        (N dog)))
    (PUNC .))
  (ID example))
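
    The step from the rows of (14) to a bracketed string like (15) can be sketched as a short recursive function. This is one possible reading of the table, not necessarily how any existing converter works: `strip_index` and `to_brackets` are hypothetical names, each row is assumed to list its cells from column A up to the word, and only the “;@N” index is removed. Other tag extensions are left untouched, so the sketch prints VBP;~Ipr where (15) shows VBP, and it puts everything on one line.

```python
import re
from itertools import groupby

def strip_index(tag):
    """Remove a trailing ';@N' index, which is only needed in the table."""
    return re.sub(r";@\d+$", "", tag)

def to_brackets(rows):
    """Turn rows (paths from the topmost tag down to the word) into brackets.
    Consecutive rows sharing the same first cell form one constituent."""
    parts = []
    for cell, group in groupby(rows, key=lambda row: row[0]):
        rest = [row[1:] for row in group]
        if not any(rest):
            parts.append(cell)   # nothing to the right: the cell is a word
        else:
            parts.append(f"({strip_index(cell)} {to_brackets(rest)})")
    return " ".join(parts)

# Rows 1 to 11 of (14).
rows = [["IP-MAT", "NP-SBJ", "D", "The"],
        ["IP-MAT", "NP-SBJ", "ADJP", "ADJ", "quick"],
        ["IP-MAT", "NP-SBJ", "ADJP;@2", "ADJ", "brown"],
        ["IP-MAT", "NP-SBJ", "N", "fox"],
        ["IP-MAT", "VBP;~Ipr", "jumps"],
        ["IP-MAT", "PP-CLR-DIR", "P-ROLE", "over"],
        ["IP-MAT", "PP-CLR-DIR", "NP", "D", "the"],
        ["IP-MAT", "PP-CLR-DIR", "NP", "ADJP", "ADJ", "lazy"],
        ["IP-MAT", "PP-CLR-DIR", "NP", "N", "dog"],
        ["IP-MAT", "PUNC", "."],
        ["ID", "example"]]

bracketed = "( " + to_brackets(rows) + ")"
print(bracketed)
```

    Because ADJP and ADJP;@2 are different cell values, they are grouped separately and come out as two sibling ADJP constituents once the index is stripped.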

The completed tree information of (14) can be represented as the graphical tree (16).

(16)
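IP-MAT
├── NP-SBJ
│   ├── D
│   │   └── The
│   ├── ADJP
│   │   └── ADJ
│   │       └── quick
│   ├── ADJP
│   │   └── ADJ
│   │       └── brown
│   └── N
│       └── fox
├── VBP
│   └── jumps
├── PP-CLR-DIR
│   ├── P-ROLE
│   │   └── over
│   └── NP
│       ├── D
│       │   └── the
│       ├── ADJP
│       │   └── ADJ
│       │       └── lazy
│       └── N
│           └── dog
└── PUNC
    └── .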