Automatic Content Extraction Program
The Automatic Content Extraction (ACE) Program is a research program intended to advance natural language processing, particularly information extraction from text.
- AKA: ACE Program.
- Context:
- It can include Annotation Tasks, such as Entity Detection and Tracking (EDT), Relation Detection and Characterization (RDC), and Event Detection and Characterization (EDC).
- It can include a Speech task http://www.nist.gov/speech/tests/ace/index.htm
- It can include an ACE Performance Metric definition.
- Example(s):
- Counter-Example(s):
- MUC.
- TREC.
- TIDES Program.
- See: NLP Research.
References
2017A
- (ACE Project, 2017) ⇒ http://projects.ldc.upenn.edu/ace/ Retrieved: 2017-06-24
- The objective of the Automatic Content Extraction (ACE) Program was to develop extraction technology to support automatic processing of source language data (in the form of natural text and as text derived from ASR and OCR). Automatic processing, defined at that time, included classification, filtering, and selection based on the language content of the source data, i.e., based on the meaning conveyed by the data. Thus the ACE program required the development of technologies that automatically detect and characterize this meaning. The ACE research objectives were viewed as the detection and characterization of Entities, Relations, and Events.
LDC developed annotation guidelines, corpora and other linguistic resources to support the ACE Program. Some of these resources were developed in cooperation with the TIDES Program in support of TIDES Extraction evaluations.
ACE annotators tagged broadcast transcripts, newswire and newspaper data in English, Chinese and Arabic, producing both training and test data for common research task evaluations. There were three primary ACE annotation tasks corresponding to the three research objectives: Entity Detection and Tracking (EDT), Relation Detection and Characterization (RDC) and Event Detection and Characterization (EDC). A fourth annotation task, Entity Linking (LNK), grouped all references to a single entity and all its properties together into a Composite Entity.
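The EDT and LNK tasks described above can be illustrated with a small sketch. The field names below are assumptions for illustration, not the official ACE DTD; the entity types (PER, ORG, GPE) and mention levels (NAM, NOM, PRO) follow the ACE guidelines.

```python
from dataclasses import dataclass, field

# Illustrative sketch of ACE-style annotation records: entity mentions
# found by EDT, grouped by the LNK task into one composite entity.
# Field names here are assumptions, not the official ACE format.

@dataclass
class EntityMention:
    text: str          # surface string, e.g. "the president"
    start: int         # character offset in the source document
    end: int
    mention_type: str  # ACE mention levels: NAM (name), NOM (nominal), PRO (pronoun)

@dataclass
class Entity:
    entity_id: str
    entity_type: str   # ACE entity types, e.g. PER, ORG, GPE
    mentions: list = field(default_factory=list)

# LNK groups all coreferent mentions into one composite entity.
e = Entity("E1", "PER")
e.mentions.append(EntityMention("George W. Bush", 0, 14, "NAM"))
e.mentions.append(EntityMention("the president", 40, 53, "NOM"))
e.mentions.append(EntityMention("he", 80, 82, "PRO"))
```

In the real ACE data these records also carry entity subtypes and mention extents versus heads; the sketch keeps only the grouping idea.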
2017B
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Automatic_Content_Extraction Retrieved: 2017-06-24
- Automatic Content Extraction (ACE) is a research program for developing advanced information extraction technologies, convened by NIST from 1999 to 2008, succeeding MUC and preceding the Text Analysis Conference.
2007A
- (Surdeanu and Ciaramita, 2007) ⇒ Mihai Surdeanu and Massimiliano Ciaramita. (2007). “Robust Information Extraction with Perceptrons.” In: Proceedings of NIST 2007 Automatic Content Extraction Workshop.
- Abstract: We present a system for the extraction of entity and relation mentions. Our work focused on robustness and simplicity: all system components are modeled using variants of the Perceptron algorithm (Rosenblatt, 1958) and only partial syntactic information is used for feature extraction. Our approach has two novel ideas. First, we define a new large-margin Perceptron algorithm tailored for class-unbalanced data, which dynamically adjusts its margins according to the generalization performance of the model. Second, we propose a novel architecture that lets classification ambiguities flow through the system and solves them only at the end. The system achieves competitive accuracy on the ACE English EMD and RMD tasks.
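The large-margin idea in the abstract above can be sketched as follows. This is a simplification, not the authors' system: it uses fixed class-specific (uneven) margins rather than the dynamically adjusted margins the paper describes, with a larger margin threshold for the rarer positive class.

```python
# Sketch of a margin perceptron for class-unbalanced binary data.
# Simplification of the dynamic-margin variant described in the abstract:
# margins are fixed per class, larger for the rare positive class.

def train_margin_perceptron(data, n_features, tau_pos=1.0, tau_neg=0.1,
                            epochs=20, lr=0.1):
    """data: list of (feature_vector, label) pairs with label in {+1, -1}."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Update whenever the functional margin y*score falls below
            # the class-specific threshold.
            tau = tau_pos if y > 0 else tau_neg
            if y * score < tau:
                for i in range(n_features):
                    w[i] += lr * y * x[i]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy, linearly separable data (illustrative only): one positive, three negatives.
data = [([2.0, 2.0], 1), ([-1.0, -1.0], -1), ([-2.0, 0.0], -1), ([0.0, -2.0], -1)]
w, b = train_margin_perceptron(data, 2)
```

The asymmetric thresholds push the decision boundary away from the minority class, which is the motivation for uneven-margin perceptrons on unbalanced extraction data.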
2007B
- (Bunescu & Mooney, 2007) ⇒ Razvan C. Bunescu, and Raymond Mooney. (2007). “Extracting Relations from Text: From Word Sequences to Dependency Paths.” In: Anne Kao, and Steve Poteet (eds.), [http://cefarhangi.iust.ac.ir/upload/files/623/Natural_Language_Processing_and_Text_Mining.pdf Natural Language Processing and Text Mining]. Springer.
- Introduction: In this chapter, we present two recent approaches to relation extraction that differ in terms of the kind of linguistic information they use:
- 1. In the first method (Section 2), each potential relation is represented implicitly as a vector of features, where each feature corresponds to a word sequence anchored at the two entities forming the relationship. A relation extraction system is trained based on the subsequence kernel from [2]. This kernel is further generalized so that words can be replaced with word classes, thus enabling the use of information coming from POS tagging, named entity recognition, chunking or Wordnet [3].
- 2. In the second approach (Section 3), the representation is centered on the shortest dependency path between the two entities in the dependency graph of the sentence. Because syntactic analysis is essential in this method, its applicability is limited to domains where syntactic parsing gives reasonable accuracy.
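The second approach can be sketched with a toy example. The dependency parse below is hand-written and purely illustrative (not the authors' example); treating the parse as an undirected graph, breadth-first search recovers the shortest dependency path between the two entity heads.

```python
from collections import deque

# Sketch: shortest dependency path between two entity heads, treating
# the (assumed, hand-written) dependency parse as an undirected graph.

def shortest_dep_path(edges, source, target):
    """edges: list of (head, dependent) token pairs; returns the token path."""
    graph = {}
    for h, d in edges:
        graph.setdefault(h, set()).add(d)
        graph.setdefault(d, set()).add(h)
    # BFS finds the shortest path in an unweighted graph.
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hand-written parse of "His sister lives in Paris" (illustrative only).
edges = [("lives", "sister"), ("sister", "His"), ("lives", "in"), ("in", "Paris")]
```

Here the path between the entities "sister" and "Paris" runs through "lives" and "in", which is exactly the kind of compact relational evidence the path-based kernel exploits.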
2007C
- (Jiang and Zhai, 2007) ⇒ J. Jiang and C. Zhai. (2007). “A Systematic Exploration of the Feature Space for Relation Extraction.” In: Proceedings of Human Language Technologies (HLT-2007).
- Abstract: Relation extraction is the task of finding semantic relations between entities from text. The state-of-the-art methods for relation extraction are mostly based on statistical learning, and thus all have to deal with feature selection, which can significantly affect the classification performance. In this paper, we systematically explore a large space of features for relation extraction and evaluate the effectiveness of different feature subspaces. We present a general definition of feature spaces based on a graphic representation of relation instances, and explore three different representations of relation instances and features of different complexities within this framework. Our experiments show that using only basic unit features is generally sufficient to achieve state-of-the-art performance, while overinclusion of complex features may hurt the performance. A combination of features of different levels of complexity and from different sentence representations, coupled with task-oriented feature pruning, gives the best performance.
2006A
- (Zhang et al., 2006a) ⇒ M. Zhang, J. Zhang, J. Su, and G. Zhou. (2006). “A Composite Kernel to Extract Relations between Entities with Both Flat and Structured Features.” In: Proceedings of COLING-ACL 2006.
2006B
- (Zhang et al., 2006b) ⇒ M. Zhang, J. Zhang, J. Su (2006). “Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel.” In: Proceedings of HLT-2006.
2006C
- (Hassan et al., 2006) ⇒ H. Hassan, A. Hassan, and S. Noeman. (2006). “Graph Based Semi-Supervised Approach for Information Extraction.” In: Proceedings of the Workshop on Graph-based Methods for NLP at HLT/NAACL-2006.
2005A
- (Srihari, 2005) ⇒ Rohini K. Srihari. (2005). “Evaluation Methodology for IE Tasks.” Tutorial on Evaluation of Information Extraction Systems presented at ICON-2005.
- Reviews how the task is evaluated
2005B
- (Harabagiu et al., 2005)
- Used this dataset to discover twenty-four (24) relation types, e.g. At_located, At_Residence, Role_Staff, Role_Owner, Role_Client, … (these labels could not be confirmed on the corpus page; check with the authors).
2005C
- (Zhao and Grishman, 2005) ⇒ S. Zhao and R. Grishman. (2005). “Extracting Relations with Integrated Information Using Kernel Methods.” In: Proceedings of ACL-2005.
2005D
- (Zhou et al., 2005) ⇒ GuoDong Zhou, Jian Su, Jie Zhang, and Min Zhang. (2005). “Exploring Various Knowledge in Relation Extraction.” In: Proceedings of ACL Conference (ACL 2005).
2004A
- (Weischedel, 2004) ⇒ Ralph Weischedel. (2004). “ACE – TIDES Tasks 2004.” Presentation at ACE Workshop.
2004B
- (Doddington et al., 2004) ⇒ George Doddington, A. Mitchell, M. Przybocki, L. Ramshaw, S. Strassel, and Ralph Weischedel. (2004). “The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation.” In: Proceedings of Conference on Language Resources and Evaluation (LREC 2004).
- Abstract: The objective of the ACE program is to develop technology to automatically infer from human language data the entities being mentioned, the relations among these entities that are directly expressed, and the events in which these entities participate. Data sources include audio and image data in addition to pure text, and Arabic and Chinese in addition to English. The effort involves defining the research tasks in detail, collecting and annotating data needed for training, development, and evaluation, and supporting the research with evaluation tools and research workshops. This program began with a pilot study in (1999). The next evaluation is scheduled for September 2004.
2004C
- (Kambhatla, 2004) ⇒ Nanda Kambhatla. (2004). “Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations.” Poster. In: Proceedings of ACL-2004.
2003
- (Maynard et al., 2003) ⇒ Diana Maynard, K. Bontcheva, and Hamish Cunningham. (2003). “Towards a Semantic Extraction of Named Entities.” In: Recent Advances in Natural Language Processing.
- QUOTE: The ACE program began in September 1999, administered by NSA, NIST, and the CIA. It was designed as “a program to develop technology to extract and characterise meaning from human language”. Formal evaluations of ACE algorithm performance are held at approximately 6 month intervals, and are open to all sites who wish to participate, but the results of the evaluations are closed. For this reason we can only publish here details of internal evaluations rather than official scores. ACE includes both Entity Detection and Tracking (EDT) and Relation Detection and Characterisation (RDC). EDT is broadly comparable with the MUC Named Entity (NE) task, while RDC is broadly comparable with the MUC template elements task, although both ACE tasks are more challenging than their MUC forerunners.