2002 ExtensiveFeatureDetOfNtermProSortSignals

From GM-RKB

Subject Headings: Scientific Knowledge Discovery, Protein Sorting Process.

Notes

Quotes

  • Motivation: The prediction of localization sites of various proteins is an important and challenging problem in the field of molecular biology. TargetP, by Emanuelsson et al. (J. Mol. Biol., 300, 1005–1016, 2000), is a neural network-based system which is currently the best predictor in the literature for N-terminal sorting signals. One drawback of neural networks, however, is that it is generally difficult to understand and interpret how and why they make such predictions. In this paper, we aim to generate simple and interpretable rules as predictors, and still achieve a practical prediction accuracy. We adopt an approach which consists of an extensive search for simple rules and various attributes which is partially guided by human intuition.
  • Results: We have succeeded in finding rules whose prediction accuracies come close to that of TargetP, while still retaining a very simple and interpretable form. We also discuss and interpret the discovered rules.
  • Availability: An (experimental) web service using rules obtained by our method is provided at http://hypothesiscreator.net

Overview of Our Approach

  • Several very important aspects in the process of scientific knowledge discovery are: 1) the generation or discovery of good attributes, and ways of looking at the data, which are then used to explain the data, 2) the incorporation of and reflection on existing knowledge, and 3) the trial and error interaction between the expert and the problem.
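
The idea of searching extensively over candidate attributes and simple rules can be illustrated with a small sketch. This is not the authors' method or their rule language: the attribute family (fraction of residues from a chosen set within an N-terminal prefix), the residue groups, the prefix lengths, the function names, and the toy sequences are all assumptions made up for illustration. The sketch only shows the general shape of such a search: enumerate attributes, enumerate thresholds, and keep the single rule of the form "attribute >= threshold" that classifies the labelled examples best.

```python
from itertools import product

# Hypothetical residue groups; a domain expert would choose these.
HYDROPHOBIC = set("AILMFVW")
POSITIVE = set("KR")


def fraction_in(residues, length):
    """Build an attribute: fraction of the first `length` residues found in `residues`."""
    def attribute(seq):
        prefix = seq[:length]
        return sum(c in residues for c in prefix) / max(len(prefix), 1)
    return attribute


def candidate_attributes():
    """Enumerate every (residue group, prefix length) combination as a named attribute."""
    groups = [("hydrophobic", HYDROPHOBIC), ("positive", POSITIVE)]
    for (name, residues), length in product(groups, [5, 10, 15]):
        yield f"{name}[:{length}]", fraction_in(residues, length)


def best_rule(examples):
    """Score every rule 'attribute(seq) >= threshold' and return the most accurate one.

    `examples` is a list of (sequence, label) pairs with boolean labels.
    Returns (accuracy, attribute_name, threshold); ties keep the first rule found.
    """
    best = None
    for name, attr in candidate_attributes():
        values = [attr(seq) for seq, _ in examples]
        # Only observed values need to be tried as thresholds.
        for threshold in sorted(set(values)):
            correct = sum(
                (value >= threshold) == label
                for value, (_, label) in zip(values, examples)
            )
            accuracy = correct / len(examples)
            if best is None or accuracy > best[0]:
                best = (accuracy, name, threshold)
    return best


# Invented toy strings, not real protein sequences.
examples = [
    ("MKLLLLALAVLLAAGSA", True),
    ("MALLVVLAAIFLLGSAQ", True),
    ("MSDEKRKEDSTNQGEKR", False),
    ("MTEKDRSQDNEKPSTGR", False),
]

accuracy, rule_name, threshold = best_rule(examples)
print(f"best rule: {rule_name} >= {threshold:.2f} (accuracy {accuracy:.2f})")
```

Because each rule is a single named attribute and a threshold, the result can be read and questioned directly by an expert, who can then add or remove attribute families and rerun the search.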

  • Authors: Hideo Bannai, Yoshinori Tamada, Osamu Maruyama, Kenta Nakai, Satoru Miyano
  • Title: Extensive feature detection of N-terminal protein sorting signals
  • URL: http://hc.ims.u-tokyo.ac.jp/caml-ipsort/iPSORT.pdf