Sparse Matrix

From GM-RKB
Jump to navigation Jump to search

A Sparse Matrix is a matrix that is populated mostly with a single value (typically zero).



References

2014

  • (Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/Sparse_matrix Retrieved:2014-11-18.
    • In numerical analysis, a sparse matrix is a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The fraction of zero elements (non-zero elements) in a matrix is called the sparsity (density).

      Conceptually, sparsity corresponds to systems which are loosely coupled. Consider a line of balls connected by springs from one to the next: this is a sparse system as only adjacent balls are coupled. By contrast, if the same line of balls had springs connecting each ball to all other balls, the system would correspond to a dense matrix. The concept of sparsity is useful in combinatorics and application areas such as network theory, which have a low density of significant data or connections.

      Large sparse matrices often appear in scientific or engineering applications when solving partial differential equations.

      When storing and manipulating sparse matrices on a computer, it is beneficial and often necessary to use specialized algorithms and data structures that take advantage of the sparse structure of the matrix. Operations using standard dense-matrix structures and algorithms are slow and inefficient when applied to large sparse matrices as processing and memory are wasted on the zeroes. Sparse data is by nature more easily compressed and thus require significantly less storage. Some very large sparse matrices are infeasible to manipulate using standard dense-matrix algorithms.

2001

  • (Hand et al., 2001) ⇒ David J. Hand, Heikki Mannila, Padhraic Smyth. (2001). “Principles of Data Mining.” In: MIT Press. ISBN:026208290X
    • QUOTE: A collection of text documents can also be viewed as a matrix, in which the rows represent documents and the columns represent words. The entry (d;w), corresponding to document d and word w, can be the number of times [math]\displaystyle{ w }[/math] occurs in d, or simply 1 if [math]\displaystyle{ w }[/math] occurs in d and 0 otherwise. With this approach we lose the ordering of the words in the document (and, thus, much of the semantic content), but still retain a reasonably good representation of the document’s contents. For a document collection, the number of rows is the number of documents, and the number of columns is the number of distinct words. Thus, large multilingual document collections may have millions of rows and hundreds of thousands of columns. Note that such a data matrix will be very sparse; that is, most of the entries will be zeroes.