Censored Dataset

A Censored Dataset is a dataset with missing data in which some observations are censored, i.e., their values are only partially known.



References

2018

  • (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Censoring_(statistics) Retrieved:2018-3-5.
    • In statistics, engineering, economics, and medical research, censoring is a condition in which the value of a measurement or observation is only partially known.

      For example, suppose a study is conducted to measure the impact of a drug on mortality rate. In such a study, it may be known that an individual's age at death is at least 75 years (but may be more). Such a situation could occur if the individual withdrew from the study at age 75, or if the individual is currently alive at the age of 75.

      Censoring also occurs when a value occurs outside the range of a measuring instrument. For example, a bathroom scale might only measure up to a fixed maximum weight. If an individual heavier than that maximum is weighed using the scale, the observer would only know that the individual's weight is at least the maximum reading.

      The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown.

      Censoring should not be confused with the related idea truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval. With truncation, observations never result in values outside a given range: values in the population outside the range are never seen or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.
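
The distinction between censoring and truncation can be made concrete with a small sketch based on the bathroom-scale example. The weights and the scale limit are hypothetical values chosen only for illustration, and `censor` and `truncate` are illustrative helper names, not functions from any library.

```python
# Hypothetical illustration: the weights and the scale limit are made-up values.
SCALE_LIMIT = 150.0  # assumed upper limit of the measuring instrument

true_weights = [62.0, 88.5, 149.0, 163.0, 171.5]


def censor(values, limit):
    """Right-censor: keep every observation, cap the value at the limit,
    and record a flag saying whether the value was censored."""
    return [(min(v, limit), v > limit) for v in values]


def truncate(values, limit):
    """Truncate: observations above the limit are never recorded at all."""
    return [v for v in values if v <= limit]


censored = censor(true_weights, SCALE_LIMIT)
truncated = truncate(true_weights, SCALE_LIMIT)

print(censored)   # five records, two of them flagged as censored
print(truncated)  # only the three records within range
```

With censoring, the dataset still has one record per individual and the analyst knows which records are lower bounds. With truncation, the out-of-range individuals leave no trace in the dataset.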

2013

  • http://en.wikipedia.org/wiki/Censoring_%28statistics%29#Types
    • Left censoring – a data point is below a certain value but it is unknown by how much
    • Interval censoring – a data point is somewhere on an interval between two values
    • Right censoring – a data point is above a certain value but it is unknown by how much
    • Type I censoring occurs if an experiment has a set number of subjects or items and stops the experiment at a predetermined time, at which point any subjects remaining are right-censored.
    • Type II censoring occurs if an experiment has a set number of subjects or items and stops the experiment when a predetermined number are observed to have failed; the remaining subjects are then right-censored.
    • Random (or non-informative) censoring is when each subject has a censoring time that is statistically independent of their failure time. The observed value is the minimum of the censoring and failure times; subjects whose failure time is greater than their censoring time are right-censored.
  • Censoring should not be confused with the related idea truncation. With censoring, observations result either in knowing the exact value that applies, or in knowing that the value lies within an interval. With truncation, observations never result in values outside a given range – values in the population outside the range are never seen or never recorded if they are seen. Note that in statistics, truncation is not the same as rounding.

    The problem of censored data, in which the observed value of some variable is partially known, is related to the problem of missing data, where the observed value of some variable is unknown.

    Interval censoring can occur when observing a value requires follow-ups or inspections. Left and right censoring are special cases of interval censoring, with the beginning of the interval at zero or the end at infinity, respectively.

    Left-censored data are observed, for example, in environmental analytical data where trace concentrations of chemicals may indeed be present in an environmental sample (e.g., groundwater, soil) but are "non-detectable," i.e., below the detection limit of the analytical instrument or laboratory method. Estimation methods for left-censored data vary, and not every method is applicable to, or the most reliable for, every data set.[1]

  1. Helsel, D. Much ado about next to Nothing: Incorporating Nondetects in Science, Ann. Occup. Hyg., Vol. 54, No. 3, pp. 257-262, 2010
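
The censoring types listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the sample failure and censoring times are hypothetical, and each observation is stored as a pair of the observed value and a flag that is `True` when the value is right-censored.

```python
import math


def as_interval(kind, a=None, b=None):
    """Return the interval known to contain the true value.
    Following the text, left censoring starts the interval at zero
    and right censoring ends it at infinity."""
    if kind == "left":       # true value is below b
        return (0.0, b)
    if kind == "right":      # true value is above a
        return (a, math.inf)
    if kind == "interval":   # true value lies between a and b
        return (a, b)
    raise ValueError(f"unknown censoring kind: {kind}")


def random_right_censoring(failure_times, censoring_times):
    """Observed value is min(failure, censoring); flag is True when censored."""
    return [(min(t, c), t > c) for t, c in zip(failure_times, censoring_times)]


def type_i_censoring(failure_times, stop_time):
    """Stop at a predetermined time; subjects still running are right-censored."""
    return [(min(t, stop_time), t > stop_time) for t in failure_times]


def type_ii_censoring(failure_times, n_failures):
    """Stop once n_failures failures are observed; the rest are right-censored.
    Assumes no tied failure times at the stopping point."""
    stop_time = sorted(failure_times)[n_failures - 1]
    return [(min(t, stop_time), t > stop_time) for t in failure_times]


failures = [2.0, 5.0, 1.0, 8.0, 3.0]          # hypothetical true failure times
censor_at = [3.0, 6.0, 0.5, 4.0, 10.0]        # hypothetical censoring times

type_i = type_i_censoring(failures, stop_time=4.0)
type_ii = type_ii_censoring(failures, n_failures=2)
random_c = random_right_censoring(failures, censor_at)
```

In a real study the true failure times of censored subjects are of course not available; they appear here only so that the three schemes can be compared on the same data.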

2012

2011

  • (Singh & Mukhopadhyay, 2011) ⇒ Ritesh Singh, and Keshab Mukhopadhyay. (2011). “Survival Analysis in Clinical Trials: Basics and Must Know Areas.” Perspectives in clinical research, 2(4) doi:10.4103/2229-3485.86872.
    • QUOTE: … Most survival analyses consider a key analytical problem called censoring. It occurs when we have some information about individual survival time, but we do not know the survival time exactly. Three reasons of censoring are: When a person does not experience the event before the study ends, when a person is lost to follow-up during the study period, and when a person withdraws from the study because of death (if death is not the event of the interest) or some other reason like adverse drug reaction. Censoring is of two types, right and left.[6] We generally encounter right-censored data. Left-censored data can occur when a person's survival time becomes incomplete on the left side of the follow-up period for the person. As an example, we may follow up a patient for any infectious disorder from the time of his or her being tested positive for the infection. We may never know the exact time of exposure to the infectious agent. …
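
One common way survival analyses use right-censored observations is the Kaplan-Meier product-limit estimator. The quoted passage does not name this method, so the sketch below is an added illustration rather than the authors' procedure; the function name and the sample data are hypothetical. A censored subject counts as "at risk" up to its censoring time and then leaves the risk set without counting as an event.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : observed time for each subject
    events : True if the event was observed, False if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data: False marks a right-censored subject.
times = [1, 2, 2, 3, 4, 5]
events = [True, False, True, True, False, True]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Dropping the censored subjects entirely, or treating their censoring times as event times, would both give a different curve, which is why the censoring flag has to be carried along with each observed time.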

2010

2008

  • (Upton & Cook, 2008) ⇒ Graham Upton, and Ian Cook. (2008). “A Dictionary of Statistics, 2nd edition revised.” Oxford University Press. ISBN:0199541450
    • QUOTE: Censored Data: Data items in which the true value is replaced by some other value. For example, suppose a set of components are being monitored to see how long they last before breaking. If the monitoring stops before all the components have broken, then the information concerning the lifetimes of the unbroken components has been right-censored. The score of a cricketer who is not out is an example of censored data, since it is not known what score would have been achieved if the cricketer's innings had been allowed to continue. In both cases the value used is the largest value so far achieved for that data item. To avoid bias, subsequent calculations should take account of the censoring.
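
The bias mentioned at the end of the quote can be illustrated with the cricket example. The innings below are hypothetical. The sketch compares a naive mean, which treats not-out scores as if they were complete, with the conventional batting average (total runs divided by dismissals). Under an assumed exponential model for the uncensored score, that ratio is also the maximum-likelihood estimate of the mean with right-censored data; the exponential model is an assumption made here for illustration, not a claim from the dictionary entry.

```python
# Hypothetical innings: (runs, not_out). not_out=True means the score is
# right-censored, because the innings ended before the batter was dismissed.
innings = [(30, False), (12, False), (75, True), (8, False), (50, True)]


def naive_mean(records):
    """Ignore censoring: average the recorded scores as if all were complete."""
    return sum(runs for runs, _ in records) / len(records)


def censoring_aware_mean(records):
    """Total runs divided by the number of dismissals (uncensored records)."""
    dismissals = sum(1 for _, not_out in records if not not_out)
    if dismissals == 0:
        raise ValueError("no uncensored observations; the mean is not estimable this way")
    return sum(runs for runs, _ in records) / dismissals


naive = naive_mean(innings)
adjusted = censoring_aware_mean(innings)
print(naive, adjusted)
```

The naive mean is pulled downward because each not-out score is only a lower bound on what the batter would have scored, while the adjusted figure credits those runs without counting the unfinished innings as completed ones.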

1976