2009 AddressStandardizationwithLaten

(Guo et al., 2009) ⇒ Honglei Guo, Huijia Zhu, Zhili Guo, XiaoXun Zhang, and Zhong Su. (2009). “Address Standardization with Latent Semantic Association.” In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2009). doi:10.1145/1557019.1557144

Subject Headings: Mail Address Canonicalization.

Notes

Cited By

Quotes

Author Keywords

Abstract

Address standardization is a very challenging task in data cleansing. To provide better customer relationship management and business intelligence for customer-oriented corporations, millions of free-text addresses need to be converted to a standard format for data integration, de-duplication and householding. Existing commercial tools usually employ many hand-crafted, domain-specific rules and reference data dictionaries of cities, states, etc. These rules work best for the region they were designed for. However, rule-based methods usually require more human effort to rewrite these rules for each new domain, since address data are very irregular and vary across countries and regions. Supervised learning methods are usually more adaptable than rule-based approaches. However, supervised methods need large-scale labeled training data, and building a large-scale annotated corpus for each target domain is labor-intensive and time-consuming. To minimize human effort and the size of the labeled training set, we present a free-text address standardization method with latent semantic association (LaSA). A LaSA model is constructed to capture latent semantic associations among words from the unlabeled corpus. The original term space of the target domain is first projected to a concept space using the LaSA model; then the address standardization model is actively learned from LaSA features and informative samples. The proposed method effectively captures the data distribution of the domain. Experimental results on large-scale English and Chinese corpora show that the proposed method significantly enhances standardization performance with less effort and training data.
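The two steps named in the abstract (projecting terms to a concept space, then selecting informative samples for labeling) can be illustrated with a rough sketch. This is not the paper's implementation: the `WORD_TO_CONCEPT` dictionary is a hand-written, hypothetical stand-in for a LaSA model that the authors learn from an unlabeled corpus, and lowest-confidence (uncertainty) sampling is assumed here because the abstract does not say how informative samples are chosen. All function names are illustrative.

```python
# Hypothetical word -> concept table standing in for a learned LaSA model.
# In the paper this association is derived from an unlabeled corpus.
WORD_TO_CONCEPT = {
    "st": "STREET_TYPE", "street": "STREET_TYPE",
    "ave": "STREET_TYPE", "avenue": "STREET_TYPE",
    "apt": "UNIT", "suite": "UNIT",
    "ny": "STATE", "ca": "STATE",
}


def to_concept_features(address):
    """Project each address token from the term space onto a concept label."""
    features = []
    for token in address.lower().replace(",", " ").split():
        token = token.strip(".")
        if token in WORD_TO_CONCEPT:
            features.append(WORD_TO_CONCEPT[token])
        elif token.isdigit():
            features.append("NUMBER")
        else:
            features.append("OTHER")
    return features


def toy_confidence(address):
    """Assumed confidence score: fraction of tokens mapped to a known concept."""
    features = to_concept_features(address)
    known = sum(1 for f in features if f != "OTHER")
    return known / len(features) if features else 0.0


def select_informative(pool, confidence, k):
    """Uncertainty sampling: pick the k addresses the model is least sure about."""
    return sorted(pool, key=confidence)[:k]


if __name__ == "__main__":
    print(to_concept_features("12 Main St., Apt 4, NY"))
    pool = ["12 Main St", "Foo Bar Baz", "5 Elm Ave"]
    print(select_informative(pool, toy_confidence, k=1))
```

In an active-learning loop, the selected addresses would be sent to a human annotator and added to the training set before the model is retrained; a real system would replace `toy_confidence` with the standardization model's own confidence estimate.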

References


Author: Honglei Guo, Huijia Zhu, Zhili Guo, Zhong Su, XiaoXun Zhang
Title: Address Standardization with Latent Semantic Association
Journal: KDD-2009 Proceedings
DOI: 10.1145/1557019.1557144
Year: 2009
