Noisy Dataset

From GM-RKB

A Noisy Dataset is a dataset whose data records contain measurement error (or measurement uncertainty).
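As a rough illustration of this definition, the sketch below turns a small clean dataset into a noisy one by adding zero-mean Gaussian measurement error to every recorded value. The function name `add_measurement_noise`, the `sigma` parameter, and the sample records are hypothetical choices for illustration only. Gaussian noise is just one common assumption about how measurement error behaves.

```python
import random

def add_measurement_noise(records, sigma, seed=0):
    """Return a copy of records with zero-mean Gaussian noise added to each value.

    `sigma` is the assumed standard deviation of the measurement error.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[value + rng.gauss(0.0, sigma) for value in record]
            for record in records]

# Hypothetical clean records (two attributes each).
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# The noisy dataset has the same shape, but each value is perturbed.
noisy = add_measurement_noise(clean, sigma=0.1)
```

With `sigma=0.0` the function returns the clean values unchanged, which is one way to see that the noise level is the only difference between the two datasets.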



References

2017

In addition to errors, training examples may have missing attribute values. That is, the values of some attributes are not recorded.

Noisy data can cause learning algorithms to fail to converge to a concept description or to build a concept description that has poor classification accuracy on unseen examples. This is often due to overfitting.
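The overfitting effect described above can be sketched with a toy experiment. Everything here is an illustrative assumption rather than part of the cited source: the threshold concept in `true_label`, the 20% label-flip rate, and the choice of a 1-nearest-neighbor learner, which memorizes its training set. The learner reproduces the noisy training labels exactly, yet it misclassifies clean unseen examples that fall next to a mislabeled training point.

```python
import random

def true_label(x):
    """Assumed target concept: positive when x is at least 0.5."""
    return 1 if x >= 0.5 else 0

def make_data(n, flip_prob, rng):
    """Sample n points in [0, 1); flip each label with probability flip_prob."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = true_label(x)
        if rng.random() < flip_prob:
            y = 1 - y  # label noise: the recorded class is wrong
        data.append((x, y))
    return data

def predict_1nn(train, x):
    """Predict the label of the single closest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, examples):
    """Fraction of examples whose label matches the 1-NN prediction."""
    hits = sum(predict_1nn(train, x) == y for x, y in examples)
    return hits / len(examples)

rng = random.Random(0)
train = make_data(200, 0.2, rng)   # noisy training set
test = make_data(1000, 0.0, rng)   # clean unseen examples

train_acc = accuracy(train, train)  # the learner memorizes the noise
test_acc = accuracy(train, test)    # the memorized noise hurts here

print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A learner that ignored individual points and fit only the threshold would be less sensitive to the flipped labels, which is the usual motivation for regularization or pruning on noisy data.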

2009

  • (Hu et al., 2009) ⇒ Xiaohua Hu, Xiaodan Zhang, Caimei Lu, E. K. Park, and Xiaohua Zhou. (2009). “Exploiting Wikipedia as External Knowledge for Document Clustering.” In: Proceedings of ACM SIGKDD Conference (KDD-2009). doi:10.1145/1557019.1557066
    • … There are two major issues for this approach: (1) the coverage of the ontology is limited, even for WordNet or Mesh, (2) using ontology terms as replacement or additional features may cause information loss, or introduce noise.

2008