2010 IdentifyingTheInfStructOfSciAbstracts


Subject Headings: Scientific Paper Abstract, Scientific Paper.

Notes

Cited By

Quotes

Abstract

  • Many practical tasks require accessing specific types of information in scientific literature; e.g. information about the objective, methods, results or conclusions of the study in question. Several schemes have been developed to characterize such information in full journal papers. Yet many tasks focus on abstracts instead. We take three schemes of different type and granularity (those based on section names, argumentative zones and conceptual structure of documents) and investigate their applicability to biomedical abstracts. We show that even for the finest-grained of these schemes, the majority of categories appear in abstracts and can be identified relatively reliably using machine learning. We discuss the impact of our results and the need for subsequent task-based evaluation of the schemes.


2010 IdentifyingTheInfStructOfSciAbstracts

  • Authors: Anna Korhonen, Maria Liakata, Yufan Guo, Ilona Silins, Lin Sun, Ulla Stenius
  • Title: Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes
  • Published in: Proceedings of the BioNLP Workshop on Linking Natural Language Processing and Biology
  • URL: http://www.aclweb.org/anthology/W/W10/W10-1913.pdf
  • Year: 2010