Probabilistic Database System
Jump to navigation
Jump to search
A Probabilistic Database System is a Database System that can manage a Probabilistic Data Record Set.
References
2009
- (Dalvi et al., 2009) ⇒ Nilesh Dalvi, Christopher Ré, and Dan Suciu. (2009). “Probabilistic Databases: diamonds in the dirt.” In: Communications of the ACM, 52(7). doi:10.1145/1538788.1538810
- The de facto formal semantics of a probabilistic database is the possible worlds model.[Dalvi & Suciu, 2007] By contrast, there is no agreement on a representation system, instead there are several approaches covering a spectrum between expressive power and usability.[Benjelloun et al., 2008] A key concept in most representation systems is that of lineage, which is derived from early work on incomplete databases by Immelinski and Lipski.[Imielinski & Lipski, 1984]
- Possible Worlds Semantics. In its most general form, a probabilistic database is a probability space over the possible contents of the database. It is customary to denote a (conventional) relational database instance with the letter I. Assuming there is a single table in our database, I is simply a set of tuples (records) representing that table; this is a conventional database. A probabilistic database is a discrete probability space PDB = (W, P), where W = {I1,I2, ..., In} is a set of possible instances, called possible worlds, and P: W → [0, 1] is such that ∑j=1,nP(Ij) = 1. In the terminology of networks of belief, there is one random variable for each possible tuple whose values are 0 (meaning that the tuple is not present) or 1 (meaning that the tuple is present), and a probabilistic database is a joint probability distribution over the values of these random variables. *...
- Consider some tuple t (we use interchangeably the terms tuple and record in this article). The probability that the tuple belongs to a randomly chosen world is P(t) = ∑j:t∈Ij P(Ij), and is also called the marginal probability of the tuple t. Similarly, if we have two tuples t1, t2, we can examine the probability that both are present in a randomly chosen world, denoted P(t1t2). When the latter is P(t1)P(t2), we say that t1, t2 are independent tuples; if it is 0 then we say that t1, t2 are disjoint tuples or exclusive tuples. If none of these hold, then the tuples are correlated in a nonobvious way. Consider a query Q, expressed in some relational query language like SQL, and a possible tuple t in the query's answer. P(t ∈ Q) denotes the probability that, in a randomly chosen world, t is an answer to Q. The job of a probabilistic database system is to return all possible tuples t1, t2, ...together with their probabilities P(t1 ∈ Q), P(t2 ∈ Q),....
2007
- (Dalvi & Suciu, 2007) ⇒ Nilesh Dalvi, and Dan Suciu. (2007). “Efficient query evaluation on probabilistic databases.” In: VLDB Journal, 16(4). doi:10.1007/s00778-006-0004-3