Machine Learning (ML) Feature Space Design Task
A Machine Learning (ML) Feature Space Design Task is a data structure design task that designs the ML features used by an ML model design task.
- AKA: Feature Engineering.
- Context:
- It can (often) be followed by an ML Feature Development Task.
- It can (often) be an Iterative Software Design Task.
- It can range from being a Manual Feature Engineering Task to being an Automated Feature Engineering Task.
- It can include an ML Feature Contribution Evaluation Task.
- …
- Example(s):
- designing features during the investigative phase of a predictive ML task.
- …
- Counter-Example(s):
- See: Data Model Designing, Data Preparation, Feature Store, Statistical Model Selection.
References
2017
- "Using Machine Learning to Predict Value of Homes On Airbnb." 2017-07-17
- QUOTE: … One of the first steps of any supervised machine learning project is to define relevant features that are correlated with the chosen outcome variable, a process called feature engineering. For example, in predicting LTV, one might compute the percentage of the next 180 calendar dates that a listing is available or a listing’s price relative to comparable listings in the same market. ...
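The two features this quote mentions can be sketched directly. The snippet below is a minimal illustration in pandas, not Airbnb's pipeline; the table layouts, column names, and the toy 4-day window (standing in for the 180-day one) are all assumptions.

```python
import pandas as pd

# Hypothetical per-listing calendar: one row per (listing_id, date) over the
# upcoming window; the quote uses the next 180 days, a 4-day toy window here.
calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "available":  [1, 0, 1, 1, 0, 0, 1, 0],
})

# Feature 1: fraction of upcoming calendar dates on which a listing is available.
availability = (
    calendar.groupby("listing_id")["available"]
    .mean()
    .rename("pct_available")
    .reset_index()
)

# Hypothetical listings table with prices and a market identifier.
listings = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "market":     ["sf", "sf", "nyc"],
    "price":      [200.0, 100.0, 300.0],
})

# Feature 2: a listing's price relative to comparable listings in its market,
# here taken as the ratio to the market median price.
listings["relative_price"] = (
    listings["price"] / listings.groupby("market")["price"].transform("median")
)

feature_table = listings.merge(availability, on="listing_id", how="left")
print(feature_table)
```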
2014
- (Razavian et al., 2014) ⇒ Ali Sharif Razavian, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson. (2014). “CNN Features Off-the-Shelf: An Astounding Baseline for Recognition.” In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops. ISBN:978-1-4799-4308-1 doi:10.1109/CVPRW.2014.131
- QUOTE: Recent results indicate that the generic descriptors extracted from the convolutional neural networks are very powerful. This paper adds to the mounting evidence that this is indeed the case. We report on a series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13.
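The off-the-shelf recipe the paper describes is easy to sketch with a modern pretrained network. The snippet below substitutes torchvision's ResNet-18 for OverFeat (the paper's network is not distributed through torchvision), and the image path is a placeholder.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# The paper extracted generic descriptors from OverFeat; as a stand-in, this
# sketch uses a pretrained ResNet-18 and takes the activations just before
# the final classification layer.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

# Drop the final fully connected layer so the forward pass yields features.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "image.jpg" is a placeholder path.
image = preprocess(Image.open("image.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    features = feature_extractor(image).flatten(1)  # shape: (1, 512)

# These fixed descriptors can then feed a simple classifier (e.g., a linear
# SVM), mirroring the paper's off-the-shelf recipe.
print(features.shape)
```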
2012
- (Domingos, 2012) ⇒ Pedro Domingos. (2012). “A Few Useful Things to Know About Machine Learning.” In: Communications of the ACM, 55(10). doi:10.1145/2347736.2347755
- QUOTE: … Easily the most important factor is the features used. Learning is easy if you have many independent features that each correlate well with the class. On the other hand, if the class is a very complex function of the features, you may not be able to learn it. Often, the raw data is not in a form that is amenable to learning, but you can construct features from it that are. This is typically where most of the effort in a machine learning project goes. It is often also one of the most interesting parts, where intuition, creativity and "black art" are as important as the technical stuff.
First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and preprocess it, and how much trial and error can go into feature design. Also, machine learning is not a one-shot process of building a dataset and running a learner, but rather an iterative process of running the learner, analyzing the results, modifying the data and/or the learner, and repeating. Learning is often the quickest part of this, but that is because we have already mastered it pretty well! Feature engineering is more difficult because it is domain-specific, while learners can be largely general purpose. However, there is no sharp frontier between the two, and this is another reason the most useful learners are those that facilitate incorporating knowledge.
Of course, one of the holy grails of machine learning is to automate more and more of the feature engineering process. One way this is often done today is by automatically generating large numbers of candidate features and selecting the best by (say) their information gain with respect to the class. But bear in mind that features that look irrelevant in isolation may be relevant in combination. For example, if the class is an XOR of k input features, each of them by itself carries no information about the class. (If you want to annoy machine learners, bring up XOR.) On the other hand, running a learner with a very large number of features to find out which ones are useful in combination may be too time-consuming, or cause overfitting. So there is ultimately no replacement for the smarts you put into feature engineering.
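Both points Domingos makes here, that ranking candidates by a univariate score is a common way to automate feature selection and that XOR-style features defeat it, show up in a small sketch. The snippet below uses scikit-learn's mutual information estimator as the univariate score; the data are synthetic.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)

# Two binary input features; the class is their XOR. A third, partially
# informative feature is included for contrast.
n = 5000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
y = x1 ^ x2
x3 = np.where(rng.random(n) < 0.8, y, 1 - y)  # noisy copy of the class

X = np.column_stack([x1, x2, x3])

# Univariate mutual information with the class, one feature at a time.
scores = mutual_info_classif(X, y, discrete_features=True, random_state=0)
for name, score in zip(["x1", "x2", "x3"], scores):
    print(f"{name}: {score:.3f}")
```

Running this, x1 and x2 each score near zero while the noisy x3 ranks first, even though x1 and x2 jointly determine the class exactly, which is the in-combination relevance Domingos warns univariate selection will miss.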
2011
- (Dinakar et al., 2011) ⇒ Karthik Dinakar, Roi Reichart, and Henry Lieberman. (2011). “Modeling the Detection of Textual Cyberbullying.” In: The Social Mobile Web, 11(02).
- QUOTE: … The feature space design for the two experiments can be categorized into two kinds: general features that are common for all three labels and specific features for the detection of each label. ...
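A minimal sketch of the split the quote describes, general features shared across all labels plus per-label specific features, follows; the label names and feature names below are hypothetical placeholders, not the paper's actual feature set.

```python
# General features are computed for every label's detector; each label also
# gets its own specific features. All names here are illustrative.
general_features = ["unigram_counts", "profanity_count", "pronoun_density"]

label_specific_features = {
    "label_a": ["lexicon_a_hits"],
    "label_b": ["lexicon_b_hits"],
    "label_c": ["lexicon_c_hits"],
}

def feature_space(label: str) -> list[str]:
    """Return the feature names used by the detector for one label."""
    return general_features + label_specific_features[label]

print(feature_space("label_a"))
```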
2010
- (Mayfield & Penstein-Rosé, 2010) ⇒ Elijah Mayfield, and Carolyn Penstein-Rosé. (2010). “Using Feature Construction to Avoid Large Feature Spaces in Text Classification.” In: Proceedings of the 12th annual conference on Genetic and evolutionary computation. ISBN:978-1-4503-0072-8 doi:10.1145/1830483.1830714
- QUOTE: Feature space design is a critical part of machine learning. This is an especially difficult challenge in the field of text classification, where an arbitrary number of features of varying complexity can be extracted from documents as a preprocessing step.
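The feature-space blow-up the quote refers to is easy to demonstrate. The sketch below counts n-gram features on a toy corpus with scikit-learn's CountVectorizer; it illustrates the problem the paper addresses, not the paper's feature-construction method.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus; a placeholder for a real document collection.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs play on mats",
]

# Unigram features alone span the whole vocabulary...
unigrams = CountVectorizer(ngram_range=(1, 1)).fit(docs)
print(len(unigrams.vocabulary_))   # e.g., 12 distinct terms here

# ...and adding bigrams and trigrams multiplies the feature count,
# showing how quickly text feature spaces grow during preprocessing.
trigrams = CountVectorizer(ngram_range=(1, 3)).fit(docs)
print(len(trigrams.vocabulary_))   # several times larger
```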
1999
- (Scott & Matwin, 1999) ⇒ Sam Scott, and Stan Matwin. (1999). “Feature Engineering for Text Classification.” In: Proceedings of 16th International Conference on Machine Learning (ICML 1999).