Subgraph Isomorphism Search Task
A Subgraph Isomorphism Search Task is a Search Task for determining whether the first of two undirected graphs contains a subgraph that is isomorphic to the second one.
- AKA: Subgraph Isomorphism Problem, Subgraph Matching Task.
- Context:
- Task Requirement: 2 undirected graphs $(G,H)$.
- Task Output: All subgraphs of $G$ that are isomorphic to $H$.
- It can be solved by a Subgraph Isomorphism System that implements a Subgraph Isomorphism Algorithm.
- It is a generalization of both Graph Isomorphism and Maximum Clique Problems.
- It can range from being a Subgraph Isomorphism Backtracking-based Search Task to being a Subgraph Isomorphism Graph-Index Based Search Task.
- Example(s):
- a G-Index Subgraph Matching Task (Yan et al., 2004),
- a GraphQL Subgraph Matching Task (He & Singh, 2008),
- a Local All Different (LAD) Filtering Task (Solnon, 2010),
- a Neural Subgraph Isomorphism Counting Task (Liu et al., 2019),
- a RI Subgraph Isomorphism Search Task (Bonnici et al., 2013),
- a Triangle Subgraph Isomorphism Search Task (Williams, 2016),
- a TurboISO Subgraph Isomorphism Search Task (Han et al., 2013)
- an Ullmann's Subgraph Isomorphism Task (Ullman 2011, 1976),
- a VF Subgraph Matching Task (Carletti et al., 2018, Cordella et al. 2004, 1999).
- …
- Counter-Example(s):
- See: Maximum Common Induced Subgraph, Maximum Common Edge Subgraph, Constraint Satisfaction Problem, 3-SAT, Theoretical Computer Science, Undirected Graph, Graph Theory , Subgraph, Graph Isomorphism, Clique Problem, Hamiltonian Cycle, NP-Complete, Cook–Levin Theorem, Search Space Tree.
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem Retrieved:2020-3-7.
- In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H.
Subgraph isomorphism is a generalization of both the maximum clique problem and the problem of testing whether a graph contains a Hamiltonian cycle, and is therefore NP-complete. [1] However certain other cases of subgraph isomorphism may be solved in polynomial time[2].
Sometimes the name subgraph matching is also used for the same problem. This name puts emphasis on finding such a subgraph as opposed to the bare decision problem.
- In theoretical computer science, the subgraph isomorphism problem is a computational task in which two graphs G and H are given as input, and one must determine whether G contains a subgraph that is isomorphic to H.
- ↑ The original paper that proves the Cook–Levin theorem already showed subgraph isomorphism to be NP-complete, using a reduction from 3-SAT involving cliques.
- ↑ (1999); Nedetril & Ossona de Mendez (2012)
2019
- (Liu et al., 2019) ⇒ Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, and Xin Jiang (2019). "Neural Subgraph Isomorphism Counting". ArXiv:1912.11589.
- QUOTE: Subgraph Isomorphism Problems. Given a pattern graph and a data graph, the subgraph isomorphism search aims to find all occurrences of the pattern in the data graph with bijection mapping functions. Subgraph isomorphism is an NP-complete problem among different types of graph matching problems (monomorphism, isomorphism, and subgraph isomorphism). Most subgraph isomorphism algorithms are based on backtracking. They first obtain a series of candidate vertices and update a mapping table, then recursively revoke their own subgraph searching functions to match one vertex or one edge at a time. Ullmann’s algorithm (Ullmann, 1976), VF2 (Cordella et al., 2004), and GraphQL (He & Singh, 2008) belong to this type of algorithms. However, it is still hard to perform search when either the pattern or the data graph grows since the search space grows exponentially as well. Some other algorithms are designed based on graph-index, such as Index (Yan et al., 2004), which can be used as filters to prune out many unnecessary graphs. However, graph-index based algorithms have a problem that the time and space in indexing also increase exponentially with the growth of the graphs (Sun et al., 2012). TurboISO (Han et al., 2013) and VF3 (Carletti et al., 2018) add some weak rules to find candidate subregions and then call the recursive match procedure on subregions. These weak rules can significantly reduce the searching space in most cases.
2018
- (Carletti et al., 2018) ⇒ Vincenzo Carletti, Pasquale Foggia, Alessia Saggese, and Mario Vento (2018). “Challenging The Time Complexity Of Exact Subgraph Isomorphism For Huge And Dense Graphs With VF3". In: IEEE Transactions on Pattern Analysis and Machine Intelligence 40(4):804–818, 2018. DOI: 10.1109/TPAMI.2017.2696940.
- QUOTE: In this paper we introduce a novel subgraph isomorphism algorithm, VF3, particularly efficient in the challenging case of graphs with thousands of nodes and a high edge density. Its performance, both in terms of time and memory, has been assessed on a large dataset of 12, 700 random graphs with a size up to 10,000 nodes, made publicly available. VF3 has been compared with four other state-of-the-art algorithms, and the huge experimentation required more than two years of processing time. The results confirm that VF3 definitely outperforms the other algorithms when the graphs become huge and dense, but also has a very good performance on smaller or sparser graphs.
2016
- (Williams, 2016) ⇒ Virginia Williams (2016). "Algorithms for Fixed Subgraph Isomorphism". In: CS267 Lecture 1, MIT Computer Science & Artificial Intelligence Lab.
- QUOTE: Reducing Subgraph Isomorphism to triangle detection. The triangle subgraph isomorphism problem seems to be the hardest out of all of the subgraph isomorphism problems whenever $k = 3$. There is another sense in which triangle finding is “hard”. Nesetril and Poljak considered the subgraph isomorphism problem for arbitrary constant size subgraphs $H$ and showed that if one has a really good algorithm for finding triangles, one can also obtain good (though slightly slower) algorithms for finding a copy of $H$ in a given graph.
2013a
- (Bonnici et al., 2013) ⇒ Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, and Alfredo Ferro (2013). "A Subgraph Isomorphism Algorithm And Its Application To Biochemical Data". In: BMC Bioinformatics 14(7). DOI:10.1186/1471-2105-14-S7-S13
- QUOTE: In this paper we present a novel subgraph isomorphism algorithm, called RI (http://ferrolab.dmi.unict.it/ri.html). It creates a search strategy based only on the pattern graph topology. The order is chosen to create constraints as early as possible in the matching phase. Roughly, vertices having high valence and that are highly connected with vertices previously present in the ordering tend to come early in the final variable-ordering. During the matching phase, RI does not apply any computationally costly pruning or inference rules. This is the first paper that compares all the most recent and used algorithms (LAD, FocusSearch, VFlib). We analyze algorithmic aspects including the size of search space, the memory requirement, the timeout of the algorithms, the matching time and the total time, varying the density and dimension of pattern and target graphs, the number and the distribution of the labels. Dataset characteristics are typical of molecular biological data. We also used the synthetic data analyzed in the previous work by authors of LAD and VFLib. In order to validate our strategy, we compare RI and two versions of RI, called RI-Ds and RI-DsPm. RI-Ds computes, after defining the variable order of pattern vertices and before the subgraph isomorphism starts, an initial domain assignment. For each pattern vertex, RI-Ds computes its domain and verifies that pattern edges are compatible in the target domains. It does not apply inference or domain reduction during backtracking. This low-priced verification helps in large dense targets, because it reduces the number of candidates to be verified during backtracking. RI-DsPm, in addition to RI-Ds, uses the prematch phase defined in FocusSearch, i.e. filters domains by using vertex invariants based on neighbor labels and topology.
2013b
- (Han et al., 2013) ⇒ Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee (2013, June). “TurboISO: Towards Ultrafast And Robust Subgraph Isomorphism Search In Large Graph Databases". In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. DOI: 10.1145/2463676.2465300.
- QUOTE: In this paper, we present an efficient and robust subgraph search solution, called TurboISO, which is turbo-charged with two novel concepts, candidate region exploration and the combine and permute strategy (in short, Comb/Perm). The candidate region exploration identifies on-the-fly candidate subgraphs (i.e, candidate regions), which contain embeddings, and computes a robust matching order for each candidate region explored. The Comb/Perm strategy exploits the novel concept of the neighborhood equivalence class (NEC). Each query vertex in the same NEC has identically matching data vertices. During subgraph isomorphism search, Comb/Perm generates only combinations for each NEC instead of permutating all possible enumerations. Thus, if a chosen combination is determined to not contribute to a complete solution, all possible permutations for that combination will be safely pruned. Extensive experiments with many real datasets show that TurboISO consistently and significantly outperforms all competitors by up to several orders of magnitude.
2012
- (Sun et al., 2012) ⇒ Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li (2012). "Efficient Subgraph Matching On Billion Node Graphs". In: Proceedings of the VLDB Endowment (PVLDB), 5(9):788–799, 2012.
- QUOTE: The subgraph matching problem is defined as follows: For a data graph $G$ and a query graph $Q$, retrieve all subgraphs of $G$ that are isomorphic to $Q$. Subgraph matching is one of the most fundamental operators in many applications that handle graphs, including protein-protein interaction networks (...), knowledge bases (....), and program analysis (...)
2011
- (Ullmann, 2011) ⇒ Julian R. Ullmann (2011). "Bit-vector Algorithms for Binary Constraint Satisfaction and Subgraph Isomorphism". In: Journal of Experimental Algorithmics (JEA), 15, 1-1. DOI:10.1145/1671970.1921702.
- QUOTE: This article revises and updates bit-vector algorithms published in the 1970's, and introduces focus search, which is a new bit-vector algorithm relying more on search and less on domain-editing than previous algorithms. Focus search is competitive within a limited family of constraint satisfaction problems.
Determination of subgraph isomorphism is a specialized binary constraint satisfaction problem for which bit-vector algorithms have been widely used since the 1980s, particularly for matching molecular structures. This article very substantially updates the author's 1976 subgraph isomorphism algorithm, and reports experimental results with random and real-life data.
- QUOTE: This article revises and updates bit-vector algorithms published in the 1970's, and introduces focus search, which is a new bit-vector algorithm relying more on search and less on domain-editing than previous algorithms. Focus search is competitive within a limited family of constraint satisfaction problems.
2010
- (Solnon, 2010) ⇒ Christine Solnon (2010). "Alldifferent-based Filtering For Subgraph Isomorphism". In: Artificial Intelligence, 174(12-13. DOI:10.1016/j.artint.2010.05.002.
- QUOTE: The subgraph isomorphism problem involves deciding if there exists a copy of a pattern graph in a target graph. This problem may be solved by a complete tree search combined with filtering techniques that aim at pruning branches that do not contain solutions. We introduce a new filtering algorithm based on local all different constraints. We show that this filtering is stronger than other existing filterings — i.e., it prunes more branches — and that it is also more efficient — i.e., it allows one to solve more instances quicker.
2008
- (He & Singh, 2008) ⇒ Huahai He, and Ambuj K. Singh (2008). "Graphs-At-A-Time: Query Language And Access Methods For Graph Databases". In: Proceedings of the 2008 ACM SIGMOD International Conference On Management Of Data (SIGMOD '08). DOI:10.1145/1376616.1376660
- QUOTE: Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude (...)
In this paper, we propose GraphQL, a graph query language that uses a graph pattern as the basic operational unit. A graph pattern consists of a graph structure and a predicate on attributes of the graph. To allow flexible manipulation on graph structures (useful for definition of graph queries as well as database graphs), we introduce the notion of formal languages for graphs. The core of GraphQL is a graph algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs.
- QUOTE: Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude (...)
2004a
- (Cordella et al., 2004) ⇒ Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento (2004). "A (Sub)Graph Isomorphism Algorithm for Matching Large Graphs". In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (10). DOI:1367-1372. 10.1109/TPAMI.2004.75.
- QUOTE: In this paper, we propose a deterministic matching method for verifying both isomorphism and subgraph isomorphism. The algorithm has general validity since no constraints are imposed on graph topology. A state space representation (SSR) of the matching process is used and a set of five feasibility rules for pruning the search tree are introduced. The adopted representation allows one to simultaneously carry out the syntactic and semantic comparison of the pairs of nodes to be matched. With respect to a preliminary version of the algorithm, described in Cordella et al. (1999). and referred to as the VF algorithm, the main improvement introduced is that the data structures employed during the exploration of the search space are organized in such a way to significantly reduce memory requirements. Thus, the algorithm is suitable for matching graphs with thousands of nodes and branches.
2004b
- (Yan et al., 2004) ⇒ Xifeng Yan, Philip S. Yu, and Jiawei Han (2004|). "Graph Indexing: A Frequent Structure-Based Approach". In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. DOI: 10.1145/1007568.1007607
- QUOTE: Different from the existing path-based methods, our approach, called Index, makes use of frequent substructure as the basic indexing feature.
1999
- (Codella et al., 1999) ⇒ Luigi P. Cordella, Pasquale Foggia, Carlo Sansone, and Mario Vento (1999). “Performance evaluation of the VF graph matching algorithm". In: Proceedings of the 10th International Conference on Image Analysis and Processing. DOI: 10.1109/ICIAP.1999.797762.
- QUOTE: The paper discusses the performance of a graph matching algorithm tailored for dealing with large graphs in computer vision without using information about the topology of the graphs to be matched. The algorithm, presented in more detail in other papers (and publicly available on the WWW as VF), is now discussed with reference to its computational complexity and memory requirement.
1976
- (Ullmann, 1976) ⇒ Julian R. Ullmann. (1976). “An Algorithm for Subgraph Isomorphism.” In: Journal of the ACM (JACM). DOI:10.1145/321921.321925
- QUOTE: Subgraph isomorphism can be determined by means of a brute-force tree-search enumeration procedure. In this paper a new algorithm is introduced that attains efficiency by inferentially eliminating successor nodes in the tree search. To assess the time actually taken by the new algorithm, subgraph isomorphism, clique detection, graph isomorphism, and directed graph isomorphism experiments have been carried out with random and with various nonrandom graphs.