Treebanks: Building and Using Parsed Corpora

By Ann Taylor, Mitchell Marcus, Beatrice Santorini (auth.), Anne Abeillé (eds.)

Linguists and engineers in Natural Language Processing tend to use electronic corpora more and more. Most research has long been limited to raw (unannotated) texts or to tagged texts (annotated with parts of speech only), but these approaches suffer from a word-by-word perspective. A new line of research involves corpora with richer annotations such as clauses and major constituents, grammatical functions and dependency links. The first parsed corpora were the English Lancaster treebank and Penn Treebank. New ones have recently been developed for other languages.
This book:

provides a state of the art on work being done with parsed corpora;

gathers 21 papers on building and using parsed corpora, raising many relevant questions;

deals with a variety of languages and a variety of corpora;

is for those working in linguistics, computational linguistics, natural language, syntax, and grammar.


Best nonfiction_7 books

Pharmaceutical photostability and stabilization technology

Based on a training course developed by Dr. Joseph T. Piechocki and other experts in this field, whose contributions appear in this book, for two international meetings on the Photostability of Drugs and Drug Products, this text clarifies the guidelines set by the International Conference on Harmonization (ICH) and provides a comprehensive background in the scientific principles involved in photostability testing.

Microscopy of Semiconducting Materials 2007

The 15th international conference on Microscopy of Semiconducting Materials took place in Cambridge, UK on 2-5 April 2007. It was organised by the Institute of Physics, with co-sponsorship by the Royal Microscopical Society and endorsement by the Materials Research Society. The conference focused on the latest advances in the study of the structural and electronic properties of semiconducting materials by the application of transmission and scanning electron microscopy, scanning probe microscopy and X-ray-based methods.

Electrochemistry of Immobilized Particles and Droplets

Immobilizing particles or droplets on electrodes is a novel and most powerful technique for studying the electrochemical reactions of three-phase systems. It gives access to a wealth of information, ranging from quantitative and phase analysis to thermodynamic and kinetic data of electrode processes.

Additional resources for Treebanks: Building and Using Parsed Corpora

Sample text

If an adjunct is topicalized, the fronted element does not leave a trace, since the level of attachment is the same and only the word order is different. Topicalized arguments, on the other hand, are always marked by a null element:

(S (NP-TPC-5 This) (NP-SBJ every man) (VP contains (NP *T*-5) (PP-LOC within (NP him))))

Again, this makes predicate-argument interpretation straightforward if the null element is simply replaced by the constituent with which it is co-indexed.

With only a skeletal parse, as used in the first phase of the Treebank project, many otherwise clear argument/adjunct relations cannot be recovered, due to its essentially context-free representation.
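The replacement of a null element by its co-indexed constituent can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Treebank project's own tooling: the helper names (`parse`, `collect_fillers`, `resolve`, `words`) are hypothetical, the co-index is assumed to be a numeral (written here as 5), and the fronted constituent is assumed to be dropped from its surface position once its trace has been filled.

```python
import re

# Example bracketing from the excerpt; the numeric index 5 is an assumption.
EXAMPLE = ("(S (NP-TPC-5 This) (NP-SBJ every man) "
           "(VP contains (NP *T*-5) (PP-LOC within (NP him))))")

TRACE = re.compile(r"^\*T\*-(\d+)$")


def parse(text):
    """Parse a bracketing into nested lists [label, child, ...]; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()


def index_of(label):
    """Return the trailing numeric index of a label ("NP-TPC-5" -> "5"), or None."""
    m = re.search(r"-(\d+)$", label)
    return m.group(1) if m else None


def words(tree):
    """Flatten a tree into its list of leaf tokens."""
    out = []
    for child in tree[1:]:
        out.extend(words(child) if isinstance(child, list) else [child])
    return out


def collect_fillers(tree, fillers):
    """Map each index to the constituent whose label carries it."""
    idx = index_of(tree[0])
    if idx is not None:
        fillers[idx] = tree
    for child in tree[1:]:
        if isinstance(child, list):
            collect_fillers(child, fillers)
    return fillers


def resolve(tree, fillers):
    """Copy the tree, filling each trace with its filler's words and
    dropping the filler from its fronted position (a simplification)."""
    out = [tree[0]]
    for child in tree[1:]:
        if isinstance(child, list):
            if index_of(child[0]) in fillers:
                continue              # skip the fronted constituent
            out.append(resolve(child, fillers))
        else:
            m = TRACE.match(child)
            if m and m.group(1) in fillers:
                out.extend(words(fillers[m.group(1)]))
            else:
                out.append(child)
    return out


tree = parse(EXAMPLE)
resolved = resolve(tree, collect_fillers(tree, {}))
print(" ".join(words(resolved)))
```

After resolution the object position of "contains" holds the word "This", so the argument structure can be read directly off the tree. A fuller treatment would also need to handle the other null-element types, which this sketch ignores.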

Beatrice Santorini, and Mary Ann Marcinkiewicz. (1993). Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19(2):313-330.

, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger. (1994). The Penn Treebank: Annotating predicate-argument structure. In ARPA Human Language Technology Workshop.

Mateer, Marie, and Ann Taylor. (1995). Disfluency Annotation Stylebook for the Switchboard Corpus. , Department of Computer and Information Science, University of Pennsylvania.

In Fig. 1, the next-to-rightmost field contains the words of a speech turn uttered by the speaker whose CHRISTINE code name is Gemma006, and the rightmost field shows the tree structure in which the words occur, displayed on successive lines as segments of a labelled bracketing. Gemma's second word, you, is a noun phrase (N) functioning as subject (:s) of its clause, whose verb group (V) is the single word want. The object of want is an infinitival clause (Ti:o), whose understood logical subject is again you; hence a "ghost" element s101 is inserted in the word stream with an index number, 101, which marks it as identical to the subject of the main clause, and so on.
