Data Binning Task

From GM-RKB

A Data Binning Task is a data preprocessing task that groups the values of a continuous data set into a smaller number of discrete intervals (data bins).



References

2018a

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Data_binning Retrieved: 2018-05-20.
    • Data binning or bucketing is a data pre-processing technique used to reduce the effects of minor observation errors. The original data values which fall in a given small interval, a bin, are replaced by a value representative of that interval, often the central value. It is a form of quantization.

      Statistical data binning is a way to group a number of more or less continuous values into a smaller number of "bins". For example, if you have data about a group of people, you might want to arrange their ages into a smaller number of age intervals. It can also be used in multivariate statistics, binning in several dimensions at once.
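The age example in the quote above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `bin_to_centers` and `count_per_bin`, the lower edge of 10, and the bin width of 10 years are all hypothetical choices made for the example. It uses equal-width, half-open bins and replaces each value with the central value of its bin.

```python
def bin_to_centers(values, low, width):
    """Replace each value with the center of the equal-width bin it falls in.

    Bins are half-open: [low, low + width), [low + width, low + 2 * width), ...
    (hypothetical helper, written for illustration only).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    centers = []
    for v in values:
        if v < low:
            raise ValueError(f"value {v} lies below the lower edge {low}")
        index = int((v - low) // width)  # index of the bin the value lands in
        centers.append(low + (index + 0.5) * width)
    return centers


def count_per_bin(values, low, width):
    """Count the values in each occupied bin, keyed by (bin_start, bin_end)."""
    counts = {}
    for v in values:
        index = int((v - low) // width)
        interval = (low + index * width, low + (index + 1) * width)
        counts[interval] = counts.get(interval, 0) + 1
    return dict(sorted(counts.items()))


ages = [23, 37, 41, 18, 65, 29, 34, 52]  # made-up sample data
print(bin_to_centers(ages, low=10, width=10))
print(count_per_bin(ages, low=10, width=10))
```

Replacing values with bin centers corresponds to the quantization view of binning, while the per-bin counts correspond to the grouping (histogram) view; which one is wanted depends on the downstream analysis.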

2018b

2018c

  • (NMRProcFlow, 2018) ⇒ Bucketing. In: NMRProcFlow Quick Tutorial. Retrieved: 2018-05-20
    • QUOTE: An NMR spectrum may contain several thousands of points, and therefore of variables. In order to reduce the data dimensionality binning is commonly used. In binning the spectra are divided into bins (so-called buckets) and the total area within each bin is calculated to represent the original spectrum. The more simple approach consists to divide all the spectra with uniform areas width (typically 0.04 ppm). Due to the arbitrary division of peaks, one bin may contain pieces from two or more peaks which may affect the data analysis. We have chosen to implement the Adaptive, Intelligent Binning method (De Meyer et al. 2008[1]) that attempt to split the spectra so that each area common to all spectra contains the same resonance, i.e. belonging to the same metabolite. In such methods, the width of each area is then determined by the maximum difference of chemical shift among all spectra.

  1. De Meyer, T., Sinnaeve, D., Van Gasse, B., Tsiporkova, E., Rietzschel, E. R., De Buyzere, M. L., … & Van Criekinge, W. (2008). "NMR-based characterization of metabolic alterations in hypertension using an adaptive, intelligent binning algorithm". Analytical Chemistry, 80(10), 3783-3790.
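The simpler uniform bucketing described in the NMRProcFlow quote (fixed-width buckets, typically 0.04 ppm) can be sketched as follows. This is not the Adaptive, Intelligent Binning method and not NMRProcFlow code; it is only the fixed-width baseline. The function name `uniform_buckets` is hypothetical, and the sketch assumes the plain sum of intensities is an acceptable stand-in for the area within each bucket, which holds up to a constant factor when the points are evenly spaced.

```python
import math


def uniform_buckets(ppm, intensity, width=0.04):
    """Sum intensities over fixed-width buckets along the chemical-shift axis.

    Returns a list of (bucket_center_ppm, summed_intensity) ordered by ppm.
    Hypothetical helper: the summed intensity is used as a proxy for the area.
    """
    if len(ppm) != len(intensity):
        raise ValueError("ppm and intensity must have the same length")
    if width <= 0:
        raise ValueError("width must be positive")
    start = min(ppm)
    totals = {}
    for x, y in zip(ppm, intensity):
        # Small tolerance so points sitting on a bucket edge are not pushed
        # into the previous bucket by floating-point rounding.
        index = math.floor((x - start) / width + 1e-9)
        totals[index] = totals.get(index, 0.0) + y
    return [(start + (i + 0.5) * width, totals[i]) for i in sorted(totals)]


# Made-up spectrum fragment: eight points spaced 0.01 ppm apart.
ppm = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07]
intensity = [1, 2, 3, 4, 5, 6, 7, 8]
for center, total in uniform_buckets(ppm, intensity):
    print(f"{center:.2f} ppm: {total}")
```

Because the bucket edges are fixed in advance, a peak that straddles an edge is split between two buckets, which is the limitation the quote attributes to uniform bucketing.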