Co-Occurrence Matrix
A Co-Occurrence Matrix is a square matrix that represents co-occurrence statistics (of co-occurrence relations) for some co-occurrence data.
- AKA: Coöccurrence Distribution.
- Context:
- It can range from being a Symmetrical Co-Occurrence Matrix to being an Asymmetrical Co-Occurrence Matrix.
- It can (typically) be a Domain-Specific Co-Occurrence Matrix, such as a Word-Word Co-Occurrence Matrix, a Citation Co-Occurrence Matrix, or a Pixel-Pixel Co-Occurrence Matrix.
- Example(s):
- a Co-Occurrence Count Matrix, such as: [math]\displaystyle{ \begin{bmatrix} 21 & 1 & 0 \\1 & 57 & 3 \\0 & 3 & 18 \end{bmatrix}. }[/math]
- a Pointwise Mutual Information (PMI) Matrix.
- a Domain-Specific Co-Occurrence Matrix, such as a Lexical Co-Occurrence Matrix, a Citation Co-Occurrence Matrix, or a Pixel-Pixel Co-Occurrence Matrix.
- …
- Counter-Example(s):
- an Adjacency Matrix.
- a Covariance Matrix.
- an Identity Matrix.
- See: Rotational Invariance, Distribution (Mathematics), Digital Image, Binary Numeral System, Color Mapping, Pearson Correlation Matrix.
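The Co-Occurrence Count Matrix and PMI Matrix examples above can be made concrete with a short sketch. It assumes whitespace tokenization, a symmetric window of `window` tokens on each side, and base-2 logarithms; the helper names `cooccurrence_counts` and `pmi_matrix` are hypothetical and do not come from any library. Zero counts are mapped to negative infinity here; many applications clip negative values to 0 (positive PMI) instead.

```python
import math


def cooccurrence_counts(tokens, window=2):
    """Hypothetical helper: symmetric word-word co-occurrence counts.

    Two tokens co-occur when they are at most `window` positions apart.
    """
    vocab = sorted(set(tokens))
    index = {w: k for k, w in enumerate(vocab)}
    size = len(vocab)
    counts = [[0] * size for _ in range(size)]
    for pos, word in enumerate(tokens):
        # Scan only to the right, then mirror the count so the matrix stays symmetric.
        for other in tokens[pos + 1 : pos + 1 + window]:
            i, j = index[word], index[other]
            counts[i][j] += 1
            if i != j:  # a repeated word inside the window is counted once on the diagonal
                counts[j][i] += 1
    return vocab, counts


def pmi_matrix(counts):
    """Hypothetical helper: PMI(i, j) = log2(p(i, j) / (p(i) * p(j))) from raw counts."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    pmi = []
    for i, row in enumerate(counts):
        out = []
        for j, c in enumerate(row):
            if c == 0:
                out.append(float("-inf"))  # log2(0); often clipped to 0 (PPMI)
            else:
                # p(i,j) / (p(i) p(j)) simplifies to c * total / (row_i * col_j)
                out.append(math.log2(c * total / (row_sums[i] * col_sums[j])))
        pmi.append(out)
    return pmi


vocab, counts = cooccurrence_counts("the cat sat on the mat".split(), window=1)
pmi = pmi_matrix(counts)
```

With this toy sentence and `window=1`, the count matrix holds 10 directed counts in total, and PMI(mat, the) = log2(1 · 10 / (1 · 3)) ≈ 1.74, because "mat" appears only next to "the".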
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/co-occurrence_matrix Retrieved: 2015-2-8.
- A co-occurrence matrix or co-occurrence distribution (less often coöccurrence matrix or coöccurrence distribution) is a matrix or distribution that is defined over an image to be the distribution of co-occurring values at a given offset. Mathematically, a co-occurrence matrix C is defined over an n × m image I, parameterized by an offset (Δx,Δy), as:
[math]\displaystyle{ C_{\Delta x, \Delta y}(i,j)=\sum_{p=1}^n\sum_{q=1}^m\begin{cases} 1, & \mbox{if }I(p,q)=i\mbox{ and }I(p+\Delta x,q+\Delta y)=j \\ 0, & \mbox{otherwise}\end{cases} }[/math]
where i and j are the image intensity values of the image, p and q are the spatial positions in the image I, and the offset (Δx,Δy) depends on the direction [math]\displaystyle{ \theta }[/math] and the distance d at which the matrix is computed. The 'value' of the image originally referred to the grayscale value of the specified pixel, but could be anything, from a binary on/off value to 32-bit color and beyond. Note that 32-bit color will yield a [math]\displaystyle{ 2^{32} \times 2^{32} }[/math] co-occurrence matrix!
Really any matrix or pair of matrices can be used to generate a co-occurrence matrix, though their main applicability has been in the measuring of texture in images, so the typical definition, as above, assumes that the matrix is in fact an image.
It is also possible to define the matrix across two different images. Such a matrix can then be used for color mapping.
Note that the (Δx,Δy) parameterization makes the co-occurrence matrix sensitive to rotation. We choose one offset vector, so a rotation of the image not equal to 180 degrees will result in a different co-occurrence distribution for the same (rotated) image. This is rarely desirable in the applications co-occurrence matrices are used in, so the co-occurrence matrix is often formed using a set of offsets sweeping through 180 degrees (i.e. 0, 45, 90, and 135 degrees) at the same distance to achieve a degree of rotational invariance.
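The definition quoted above translates almost line by line into code. This is a minimal sketch, assuming integer pixel values already quantized to `levels` gray levels, 0-based indices, and that pairs whose offset position falls outside the image are simply not counted (the quoted formula leaves the boundary implicit). The helper names and the optional second-image parameter `other` (for the two-image variant) are hypothetical.

```python
def cooccurrence_matrix(image, dx, dy, levels, other=None):
    """C_{dx,dy}(i, j) from the quoted formula, with 0-based indices.

    `image` is a list of rows of integers in range(levels). As in the formula,
    dx shifts the first (row) index p and dy the second (column) index q.
    `other` (hypothetical parameter) is an optional second image of the same
    shape that supplies the value j; by default it is the image itself.
    """
    target = image if other is None else other
    n, m = len(image), len(image[0])
    C = [[0] * levels for _ in range(levels)]
    for p in range(n):
        for q in range(m):
            pp, qq = p + dx, q + dy
            # Pairs whose offset position falls outside the image are not counted.
            if 0 <= pp < n and 0 <= qq < m:
                C[image[p][q]][target[pp][qq]] += 1
    return C


def multi_angle_matrix(image, d, levels):
    """Hypothetical helper: sum of the matrices for 0, 45, 90 and 135 degrees at distance d."""
    # (dx, dy) with rows growing downward: 0 deg = right, 45 = up-right, 90 = up, 135 = up-left.
    offsets = [(0, d), (-d, d), (-d, 0), (-d, -d)]
    total = [[0] * levels for _ in range(levels)]
    for dx, dy in offsets:
        C = cooccurrence_matrix(image, dx, dy, levels)
        for i in range(levels):
            for j in range(levels):
                total[i][j] += C[i][j]
    return total


img = [[0, 0, 1],
       [0, 1, 2]]
horizontal = cooccurrence_matrix(img, 0, 1, levels=3)
combined = multi_angle_matrix(img, 1, levels=3)
```

Summing is only one way to combine the four directions; averaging them, or adding each matrix to its transpose to make it symmetric, are common variants. The values must be small integers, which is why images are usually quantized to a few gray levels first: with raw 32-bit values the matrix would have 2^64 ≈ 1.8 × 10^19 cells.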
2006
- (Leydesdorff & Vaughan, 2006) ⇒ Loet Leydesdorff, and Liwen Vaughan. (2006). “Co-occurrence Matrices and their Applications in Information Science: Extending ACA to the Web Environment.” In: Journal of the American Society for Information Science and Technology, 57(12): 1616-1628.
- QUOTE: Co-occurrence matrices, such as co-citation, co-word, and co-link matrices, provide us with useful data for mapping and understanding the structures in the underlying document sets. Various types of analysis have been carried out on this data and a significant body of literature has been built up, making it an important area of information science (e.g., White & McCain, 1998). However, confusion persists about the nature of these matrices and the kinds of analysis that are appropriate. For example, the debate between Ahlgren, Jarneving, & Rousseau (2003, 2004a and b), White (2003, 2004), and Bensman (2004) on the use of the Pearson correlation coefficient or the cosine in the case of author cocitation analysis (ACA) shows some of these problems. In our opinion, co-occurrence matrices like the ones used in ACA are proximity data which do not require conversion before mapping. We shall argue that it is advisable to use, if possible, the asymmetrical matrices of documents versus attributes from which the co-occurrence matrices can be derived for mapping purposes.
… In summary: using the Pearson correlation on a symmetrical co-occurrence matrix distorts the information contained in the co-occurrence data.
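The asymmetrical documents-versus-attributes matrix mentioned in the quote relates to the symmetric co-occurrence matrix by C = A^T A, where A has one row per document and one column per attribute (for example, cited authors or words). The sketch below, with hypothetical helper names and a toy binary A, derives C and then the cosine between attribute columns of A, one of the two measures in the quoted debate. Because C = A^T A, its diagonal holds the squared column norms, so the cosine can be read off C as C[j][k] / sqrt(C[j][j] · C[k][k]).

```python
import math


def cooccurrence_from_occurrence(A):
    """C = A^T A: C[j][k] sums row[j] * row[k] over the documents (rows) of A.

    For a binary A this is the number of documents containing both attributes.
    """
    n_attr = len(A[0])
    return [[sum(row[j] * row[k] for row in A) for k in range(n_attr)]
            for j in range(n_attr)]


def cosine_from_cooccurrence(C):
    """Cosine between columns j and k of A, using C = A^T A (diagonal = squared norms)."""
    size = len(C)
    S = [[0.0] * size for _ in range(size)]
    for j in range(size):
        for k in range(size):
            denom = math.sqrt(C[j][j] * C[k][k])
            if denom > 0:  # attributes that never occur keep cosine 0
                S[j][k] = C[j][k] / denom
    return S


# Toy data: 4 documents (rows) by 3 attributes (columns), so A is not square.
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 1],
     [0, 0, 1]]
C = cooccurrence_from_occurrence(A)
S = cosine_from_cooccurrence(C)
```

This only shows how the symmetric matrix is obtained from the asymmetric one; it does not reproduce the authors' analysis or their comparison with the Pearson correlation.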