2020 TheTextClassificationofTheftCri
- (Qi, 2020) ⇒ Zhang Qi. (2020). “The Text Classification of Theft Crime based on TF-IDF and XGBoost Model.” In: 2020 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA-2020). DOI:10.1109/ICAICA50127.2020.9182555.
Subject Headings: Theft Crime Prediction, TF-IDF Feature Generation, Supervised Text Classification Algorithm.
Notes
- It uses 2621 preprocessed theft crime records, selected at random from one city's data spanning 2009-2019, aiming to enhance crime prediction accuracy using text classification.
- It employs the TF-IDF (Term Frequency-Inverse Document Frequency) model for feature extraction, determining the relevance of words in the crime data documents.
- It incorporates the Jieba tool for word segmentation in preprocessing, crucial for accurately extracting meaningful features from Chinese language text data.
- It evaluates and compares multiple machine learning algorithms: XGBoost, KNN (K-Nearest Neighbors), Naive Bayes, SVM (Support Vector Machine), and GBDT (Gradient Boosting Decision Tree).
- It demonstrates that the XGBoost algorithm provides superior performance in terms of precision, recall, and F1-score, improving the classification accuracy of theft crime data.
- It emphasizes the importance of data quality in machine learning: slightly adjusting the category definitions improved the accuracy of every algorithm by 2-5 percentage points.
- It suggests future research directions, advocating for the use of classified data in spatial and temporal analysis to advance the prediction and prevention of theft crimes.
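The TF-IDF weighting and the precision, recall, and F1-score metrics named in the notes above can be illustrated with a minimal, standard-library sketch. It is not the paper's implementation: the toy documents, the class labels, and the function names `tfidf` and `precision_recall_f1` are hypothetical, the tokens are assumed to be already segmented (the paper uses Jieba for that step), and the weighting assumes the plain variant tf = count / document length, idf = ln(N / df), which may differ from the variant used in the paper.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / doc_length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(y_true, y_pred, label):
    """Compute precision, recall, and F1-score for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical, already-segmented toy documents (not the paper's data).
docs = [
    ["stolen", "bicycle", "street"],
    ["stolen", "wallet", "bus"],
    ["stolen", "bicycle", "garage"],
]
weights = tfidf(docs)

# Hypothetical true and predicted class labels for four cases.
y_true = ["burglary", "burglary", "pickpocket", "pickpocket"]
y_pred = ["burglary", "pickpocket", "pickpocket", "pickpocket"]
scores = precision_recall_f1(y_true, y_pred, "pickpocket")
```

A term that appears in every document ("stolen" here) receives zero weight, while a term unique to one document ("wallet") receives the highest weight, which is why TF-IDF features help separate categories of theft reports. In a full pipeline the resulting vectors would be passed to a classifier such as XGBoost and the metrics computed per class on held-out data.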
Cited By
Quotes
Abstract
Classifying theft crime data of a city from 2009 to 2019 based on text classification technology. Firstly, manually classifying and defining theft crimes based on legal view and criminal practice view, then selecting 2621 data at random from the whole data. Extracting features from pre-processed sample data by TF-IDF model, then training and testing text classification model by XGBoost algorithm, and comparing the test results of KNN algorithm, Naïve Bayes algorithm, SVM algorithm and GBDT algorithm. The results show that the XGBoost algorithm are better than KNN, Naïve Bayes, SVM and GBDT. Adjusting slightly various categories to improve the accuracy of classification, and the accuracy of each algorithm is improved by 2-5 percentage points and the accuracy of XGBoost is highest. So, the results show that, 1. XGBoost algorithm is best to use as classifying the whole data. 2. The influence of data quality on classification accuracy is obvious and can improve the accuracy of algorithms rapidly. The classified theft crime data of 2009-2019 through XGBoost algorithm can be used as based data for the prediction of various types of crimes.
References
| | Author | title | year |
|---|---|---|---|
| 2020 TheTextClassificationofTheftCri | Zhang Qi | The Text Classification of Theft Crime based on TF-IDF and XGBoost Model | 2020 |