Compare the differences between the Filter Method, the Wrapper Method, and the Embedded Method.
We can broadly divide feature selection into 3 categories: the Filter Method, the Wrapper Method, and the Embedded Method. Let’s discuss what each method does, along with its pros and cons.
Filter Method
Definition: Selects features based on statistical tests or correlations.
Pros:
- Easy and fast to implement
- Does not require fitting a model
- Can handle a large number of features
- Can be used as a pre-processing step before other methods
- Interpretability of the selected features
Cons:
- May not capture complex relationships between the features and the target variable
- May select irrelevant features
Common techniques:
- Chi-square test
- Information Gain
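As a minimal sketch of the filter approach, the chi-square test above can be applied with scikit-learn's `SelectKBest` (the dataset and `k` are illustrative assumptions, not from the original text):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score each feature independently with the chi-square statistic, then keep
# the k highest-scoring features -- no predictive model is fitted at all.
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
```

Because each feature is scored in isolation, this runs quickly even with many features, which is exactly the trade-off described above.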
Wrapper Method
Definition: Uses a predictive model to evaluate subsets of features.
Pros:
- Can capture complex relationships between features and the target variable
- Can select a subset of features that work well together for the given model
Cons:
- Computationally expensive
- May overfit the model
- May miss important features
- May not be easily interpretable
Common techniques:
- Forward selection
- Backward selection
- Stepwise selection
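Forward selection can be sketched with scikit-learn's `SequentialFeatureSelector` (the estimator, dataset, and target feature count are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Forward selection: start with no features, greedily add the one that most
# improves cross-validated score, repeating until 2 features are chosen.
# Note the cost: the model is re-fitted many times across candidate subsets.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=2,
    direction="forward",
)
sfs.fit(X, y)

print(sfs.get_support())  # boolean mask over the original features
```

Switching `direction="backward"` gives backward elimination; the repeated model fitting is why wrapper methods are the most expensive of the three families.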
Embedded Method
Definition: Learns feature importance as part of the model training process.
Pros:
- Can capture complex relationships between features and the target variable
- Considers feature interactions
- Can be more efficient than wrapper methods for large datasets
Cons:
- Computationally expensive during model training
- May not perform well with certain types of models or data
- May miss important features
- May not be easily interpretable, especially for complex models
Here are some examples of each method, which may help you judge when each one is a good fit.
Filter Method Example:
Suppose you have a dataset with many features, and you want to select only the most relevant ones for predicting the target variable. You could use a filter method such as the correlation-based feature selection (CFS) algorithm. CFS evaluates the relevance of each feature by computing its correlation with the target variable, and also evaluates the redundancy between features by computing their correlation with each other. It then selects a subset of features that are highly relevant to the target variable, but also minimizes the redundancy among features.
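The relevance-versus-redundancy idea behind CFS can be sketched in a few lines. This is a simplified greedy version on synthetic data (the 0.9 redundancy threshold and the data itself are illustrative assumptions, not the full CFS merit function):

```python
import numpy as np

# Synthetic data: x1 drives the target, x2 is nearly a copy of x1 (redundant),
# and x3 is pure noise (irrelevant).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
y = 2 * x1 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2, x3])

# Relevance: absolute correlation of each feature with the target.
relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(3)])

# Greedily take features in order of relevance, skipping any candidate that
# is highly correlated (> 0.9) with an already-selected feature (redundancy).
selected = []
for j in np.argsort(-relevance):
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= 0.9 for k in selected):
        selected.append(int(j))

print(selected)  # one of the redundant pair {x1, x2} is dropped; x3 survives
```

Note that x3 is kept here because the sketch only filters redundancy; real CFS also penalizes features with low relevance, which would drop x3 as well.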
Wrapper Method Example:
Suppose you want to train a machine learning model to predict whether a customer will churn (i.e., cancel their subscription) based on their demographic and behavioral data. You could use a wrapper method such as recursive feature elimination (RFE) with a logistic regression model. RFE starts with all the features and trains the logistic regression model, then removes the least important feature and trains the model again, and repeats this process until a desired number of features is reached. The features selected by RFE are those that work best with the logistic regression model for the given prediction task.
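The RFE loop described above maps directly onto scikit-learn's `RFE` class. Here is a sketch on synthetic churn-like data (the sample sizes and feature counts are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for demographic/behavioral churn data:
# 10 features, of which only 3 actually carry signal.
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

# RFE fits the logistic regression, drops the feature with the smallest
# coefficient magnitude, refits, and repeats until 3 features remain.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print(rfe.support_)  # boolean mask of the surviving features
print(rfe.ranking_)  # 1 = selected; higher numbers were eliminated earlier
```

Because the model is refit at every elimination step, RFE's cost grows with the number of features, illustrating the wrapper method's main drawback.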
Embedded Method Example:
Suppose you want to train a deep neural network to classify images of cats and dogs. You could use an embedded method such as dropout regularization to learn the feature importance as part of the training process. Dropout randomly drops out some of the neurons during each training epoch, forcing the network to learn redundant representations of the features. The features that are more important for the classification task will tend to have a stronger presence in the network’s learned representations, and will thus be more robust to dropout. Dropout can thus be seen as a form of feature selection that is integrated into the learning process.
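To make the dropout mechanism concrete, here is a minimal NumPy sketch of "inverted" dropout, the variant most frameworks use; the function name and rates are illustrative, not from any specific library:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero each unit with probability p during training,
    scaling the survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time the layer is an identity function."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

a = np.ones(8)
print(dropout(a, p=0.5))           # roughly half the units zeroed, rest scaled to 2.0
print(dropout(a, training=False))  # inference: returned unchanged
```

Because any unit can vanish on a given step, the network cannot rely on one fragile feature, which is what pushes it toward the redundant, robust representations described above.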