Feature selection

function: scorecardbundle.feature_selection.FeatureSelection.selection_with_iv_corr()

Retrun a table of each feature' IV and their highly correlated features to help users select features.

Parameters

trans_woe: scorecardbundle.feature_encoding.WOE.WOE_Encoder object,
        The fitted WOE_Encoder object

encoded_X: numpy.ndarray or pandas.DataFrame,
        The encoded features data

threshold_corr: float, optional(default=0.6)
        The threshold of Pearson correlation coefficient. Exceeding
        This threshold means the features are highly correlated.

Return

result_selection: pandas.DataFrame,
        The table that contains 4 columns. column factor contains the 
        feature names, column IV contains the IV of features, 
        column woe_dict contains the WOE values of features and 
        column corr_with contains the feature that are highly correlated
        with this feature together with the correlation coefficients.

function: scorecardbundle.feature_selection.FeatureSelection.unstacked_corr_table()

Return the unstacked correlation table to help analyze the colinearity problem.

Parameters

encoded_X: numpy.ndarray or pandas.DataFrame,
        The encoded features data

dict_iv: python dictionary.
        The ditionary where the keys are feature names and values are the information values (iv)

Return

corr_unstack: pandas.DataFrame,
        The unstacked correlation table

function: scorecardbundle.feature_selection.FeatureSelection.identify_colinear_features()

Identify the highly-correlated features pair that may cause colinearity problem.

Parameters

encoded_X: numpy.ndarray or pandas.DataFrame,
        The encoded features data

dict_iv: python dictionary.
        The ditionary where the keys are feature names and values are the information values (iv)

threshold_corr: float, optional(default=0.6)
        The threshold of Pearson correlation coefficient. Exceeding
        This threshold means the features are highly correlated.

Return

features_to_drop_auto: python list,
        The features with lower IVs in highly correlated pairs.

features_to_drop_manual: python list,
        The features with equal IVs in highly correlated pairs.

corr_auto: pandas.DataFrame,
        The Pearson correlation coefficients and information values (IV) 
        of highly-correlated features pairs where the feature with lower IV
        will be dropped.

corr_manual: pandas.DataFrame,
        The Pearson correlation coefficients and information values (IV) 
        of highly-correlated features pairs where the features have equal IV values
        and human intervention is required to choose the feature to drop.