Feature selection
function: scorecardbundle.feature_selection.FeatureSelection.selection_with_iv_corr()
Return a table of each feature's IV and its highly correlated features to help users select features.
Parameters
trans_woe: scorecardbundle.feature_encoding.WOE.WOE_Encoder object,
The fitted WOE_Encoder object
encoded_X: numpy.ndarray or pandas.DataFrame,
The encoded features data
threshold_corr: float, optional(default=0.6)
The threshold of the Pearson correlation coefficient. Exceeding
this threshold means the features are highly correlated.
Return
result_selection: pandas.DataFrame,
A table with 4 columns: column factor contains the
feature names, column IV contains the IV of each feature,
column woe_dict contains the WOE values of each feature, and
column corr_with contains the features that are highly correlated
with this feature together with their correlation coefficients.
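To make the shape of result_selection concrete, here is a minimal pandas sketch that builds a similar table from plain inputs. It is not the library's implementation: the function name iv_corr_table is hypothetical, it takes an IV dictionary and a WOE dictionary directly instead of a fitted WOE_Encoder, it compares the absolute Pearson coefficient against threshold_corr (an assumption about how "exceeding" is applied), and it sorts rows by IV purely for convenience.

```python
import numpy as np
import pandas as pd


def iv_corr_table(encoded_X, dict_iv, dict_woe, threshold_corr=0.6):
    """Hypothetical sketch (not the scorecardbundle code): one row per feature
    with its IV, its WOE mapping, and the features whose |Pearson r| with it
    exceeds threshold_corr."""
    X = pd.DataFrame(encoded_X)
    corr = X.corr(method="pearson")
    rows = []
    for feature in X.columns:
        # Correlations with every other feature (self-correlation removed)
        others = corr[feature].drop(labels=feature)
        # Assumption: compare the absolute coefficient against the threshold
        high = others[others.abs() > threshold_corr]
        rows.append({
            "factor": feature,
            "IV": dict_iv[feature],
            "woe_dict": dict_woe[feature],
            "corr_with": {k: round(float(v), 3) for k, v in high.items()},
        })
    # Convenience choice in this sketch: most predictive features first
    return pd.DataFrame(rows).sort_values("IV", ascending=False).reset_index(drop=True)


# Usage with synthetic WOE-encoded data
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X_woe = pd.DataFrame({
    "x1": a,
    "x2": a + rng.normal(scale=0.1, size=200),  # nearly a copy of x1
    "x3": rng.normal(size=200),                  # independent noise
})
ivs = {"x1": 0.30, "x2": 0.25, "x3": 0.05}
woes = {c: {} for c in X_woe.columns}  # placeholder WOE mappings
table = iv_corr_table(X_woe, ivs, woes)
print(table[["factor", "IV", "corr_with"]])
```

Because x2 is nearly a copy of x1, each shows up in the other's corr_with entry, which signals that only the higher-IV one (x1) is worth keeping.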
function: scorecardbundle.feature_selection.FeatureSelection.unstacked_corr_table()
Return the unstacked correlation table to help analyze the collinearity problem.
Parameters
encoded_X: numpy.ndarray or pandas.DataFrame,
The encoded features data
dict_iv: python dictionary,
The dictionary where the keys are feature names and the values are the information values (IV)
Return
corr_unstack: pandas.DataFrame,
The unstacked correlation table
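"Unstacking" here means turning the square correlation matrix into a long table with one row per feature pair. The sketch below shows that idea with pandas, attaching each feature's IV so the pairs can be compared. The function name unstack_corr and its column names (feature_a, feature_b, corr, iv_a, iv_b) are hypothetical; the library's own table may use different columns or ordering.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def unstack_corr(encoded_X, dict_iv):
    """Hypothetical sketch: long-format table of feature pairs with their
    Pearson correlation and both features' IVs. Each unordered pair is
    listed once and the diagonal (self-correlation) is skipped."""
    X = pd.DataFrame(encoded_X)
    corr = X.corr(method="pearson")
    rows = [
        {"feature_a": a, "feature_b": b, "corr": corr.loc[a, b],
         "iv_a": dict_iv[a], "iv_b": dict_iv[b]}
        for a, b in combinations(X.columns, 2)
    ]
    long = pd.DataFrame(rows)
    # Strongest correlations (by absolute value) first
    return long.sort_values("corr", key=lambda s: s.abs(),
                            ascending=False).reset_index(drop=True)


# Usage with synthetic WOE-encoded data
rng = np.random.default_rng(1)
a = rng.normal(size=300)
X_woe = pd.DataFrame({
    "x1": a,
    "x2": a + rng.normal(scale=0.2, size=300),  # strongly tied to x1
    "x3": rng.normal(size=300),                  # independent
})
pairs = unstack_corr(X_woe, {"x1": 0.30, "x2": 0.25, "x3": 0.05})
print(pairs)
```

Listing each pair once, sorted by absolute correlation, puts the most suspicious pairs at the top, where their IVs can be compared side by side.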
function: scorecardbundle.feature_selection.FeatureSelection.identify_colinear_features()
Identify the highly correlated feature pairs that may cause collinearity problems.
Parameters
encoded_X: numpy.ndarray or pandas.DataFrame,
The encoded features data
dict_iv: python dictionary,
The dictionary where the keys are feature names and the values are the information values (IV)
threshold_corr: float, optional(default=0.6)
The threshold of the Pearson correlation coefficient. Exceeding
this threshold means the features are highly correlated.
Return
features_to_drop_auto: python list,
The features with lower IVs in highly correlated pairs.
features_to_drop_manual: python list,
The features with equal IVs in highly correlated pairs.
corr_auto: pandas.DataFrame,
The Pearson correlation coefficients and information values (IV)
of highly correlated feature pairs where the feature with the lower IV
will be dropped.
corr_manual: pandas.DataFrame,
The Pearson correlation coefficients and information values (IV)
of highly correlated feature pairs where the features have equal IV values
and human intervention is required to choose the feature to drop.
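The auto/manual split described above can be sketched as follows: for every pair whose correlation exceeds the threshold, the lower-IV member is queued for automatic removal, while tied pairs are set aside for a human decision. This is a minimal illustration, not the library's code. The function name split_colinear_pairs and the DataFrame column names are hypothetical, the absolute-value comparison is an assumption, and the sketch treats each pair independently (the library may handle chains of correlated features differently).

```python
from itertools import combinations

import numpy as np
import pandas as pd


def split_colinear_pairs(encoded_X, dict_iv, threshold_corr=0.6):
    """Hypothetical sketch: returns (drop_auto, drop_manual, corr_auto, corr_manual)."""
    X = pd.DataFrame(encoded_X)
    corr = X.corr(method="pearson")
    auto_rows, manual_rows = [], []
    for a, b in combinations(X.columns, 2):
        r = corr.loc[a, b]
        if abs(r) <= threshold_corr:
            continue  # not highly correlated, nothing to resolve
        row = {"feature_a": a, "feature_b": b, "corr": r,
               "iv_a": dict_iv[a], "iv_b": dict_iv[b]}
        if dict_iv[a] == dict_iv[b]:
            manual_rows.append(row)  # tie: no IV-based tie-breaker, defer to a human
        else:
            auto_rows.append(row)    # the lower-IV feature can be dropped automatically
    # Lower-IV member of each auto pair (dict.fromkeys de-duplicates, keeps order)
    drop_auto = list(dict.fromkeys(
        r["feature_a"] if r["iv_a"] < r["iv_b"] else r["feature_b"] for r in auto_rows))
    # Both members of each tied pair are candidates for manual review
    drop_manual = list(dict.fromkeys(
        f for r in manual_rows for f in (r["feature_a"], r["feature_b"])))
    cols = ["feature_a", "feature_b", "corr", "iv_a", "iv_b"]
    return (drop_auto, drop_manual,
            pd.DataFrame(auto_rows, columns=cols),
            pd.DataFrame(manual_rows, columns=cols))


# Usage with synthetic WOE-encoded data
rng = np.random.default_rng(2)
a, b = rng.normal(size=300), rng.normal(size=300)
X_woe = pd.DataFrame({
    "x1": a,
    "x2": a + rng.normal(scale=0.2, size=300),  # highly correlated with x1
    "x3": b,
    "x4": b + rng.normal(scale=0.2, size=300),  # highly correlated with x3
    "x5": rng.normal(size=300),                 # independent
})
ivs = {"x1": 0.30, "x2": 0.25, "x3": 0.12, "x4": 0.12, "x5": 0.05}
drop_auto, drop_manual, corr_auto, corr_manual = split_colinear_pairs(X_woe, ivs)
print(drop_auto)    # lower-IV member of the x1/x2 pair
print(drop_manual)  # both members of the tied x3/x4 pair
```

Here x2 is dropped automatically because its IV is lower than that of x1, while x3 and x4 have identical IVs, so IV alone cannot decide between them and they are returned for manual review.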