Quick Start
This part introduces the main features of Scorecard-Bundle, including feature discretization, WOE encoding, discretization adjustment, feature selection, scorecard training/adjustment, and model interpretation. Users can quickly develop their first scorecard model by following the instructions in Quick Start, but in practice some tricks are often needed to deal with various challenges (e.g. performing ChiMerge discretization on all features is computationally expensive, and adjusting discretization is very time-consuming and labor-intensive without automation tools). Please refer to the complete code examples for best practices in building scorecard models.
- Like Scikit-Learn, Scorecard-Bundle basically has two types of objects, transformers and predictors, which comply with the fit-transform and fit-predict convention;
- Complete code examples showing how to build a scorecard with Scorecard-Bundle can be found in Example Notebooks;
- See more details in the API Reference;
- Note that the feature intervals in Scorecard-Bundle are open on the left and closed on the right.
Load Scorecard-Bundle
from scorecardbundle.feature_discretization import ChiMerge as cm
from scorecardbundle.feature_discretization import FeatureIntervalAdjustment as fia
from scorecardbundle.feature_encoding import WOE as woe
from scorecardbundle.feature_selection import FeatureSelection as fs
from scorecardbundle.model_training import LogisticRegressionScoreCard as lrsc
from scorecardbundle.model_evaluation import ModelEvaluation as me
from scorecardbundle.model_interpretation import ScorecardExplainer as mise
import pandas as pd  # used later for the IV table and for loading scorecard rules
import numpy as np   # used later for the score quantiles in model evaluation
Feature Discretization (ChiMerge)
Scorecard-Bundle applies the ChiMerge algorithm (introduced by Randy Kerber in "ChiMerge: Discretization of Numeric Attributes") for feature discretization. ChiMerge is a bottom-up discretization algorithm based on the feature's distribution and the target classes' relative frequencies in each feature value. As a result, it keeps statistically significantly different intervals and merges similar ones. The discretization step turns numerical features into intervals and merges similar values of ordinal features. Categorical features should be encoded into ordinal features before this step (e.g. encodings based on event rate rankings).
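For instance, the snippet below is a minimal sketch of turning a categorical feature into an ordinal one by ranking its event rates before discretization; the column name 'cat_col' is hypothetical, the mapping is one possible approach rather than a function of the library, and it assumes X is a DataFrame and y a pandas Series with aligned indexes.
# Hypothetical categorical column 'cat_col': rank its categories by event rate
event_rate = y.groupby(X['cat_col']).mean()            # event rate of y per category
rank_map = event_rate.rank(method='dense').to_dict()   # higher event rate -> higher rank
X['cat_col'] = X['cat_col'].map(rank_map)               # now ordinal, ready for ChiMerge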
By default, equal-frequency binning is used as a pre-step to ChiMerge, since applying ChiMerge directly tends to output highly-imbalanced feature values (some values of the feature have few samples). As shown below, we can initialize a ChiMerge instance and fit it to data. max_intervals and min_intervals control the number of unique output intervals for each feature, and decimal controls the number of decimals of the boundaries. Check the API Reference for the rest of ChiMerge's parameters.
trans_cm = cm.ChiMerge(max_intervals=10, min_intervals=2, decimal=3, output_dataframe=True)
result_cm = trans_cm.fit_transform(X, y)
trans_cm.boundaries_ # see the interval boundaries for each feature
Like any transformer in sklearn, ChiMerge supports:
- fit(): fit the data and learn the optimized discretization for each feature, e.g. trans_cm.fit(X,y);
- transform(): transform the original features into discretized ones using the learnt discretization, e.g. trans_cm.transform(X);
- fit_transform(): fit and transform the features at once, e.g. trans_cm.fit_transform(X,y).
Feature Encoding (WOE)
Weight of Evidence (WOE) is the logarithm of the ratio between the dependent variable's local distribution on each feature value and its global distribution. It represents the difference between the local event rate of each feature value and the global event rate. This means the values of the WOE encoding are linear with respect to the discriminative power of the feature values, and therefore performing WOE encoding enables regression models to better capture non-linear patterns.
Information value (IV) is calculated for each feature as a byproduct of WOE. IV is a commonly-used metric that represents a feature's discriminative power towards a binary target. It can be accessed from the iv_ attribute after training a WOE_Encoder. Check the API Reference for calculation details and an introduction to the parameters.
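For intuition, the sketch below spells out the standard WOE/IV formulas on a single discretized feature. It is a rough manual calculation, not the library's internal implementation, and intervals with zero events or zero non-events would need extra handling.
def manual_woe_iv(x, y):
    # x: one discretized feature (e.g. a column of result_cm); y: the binary target
    df = pd.DataFrame({'x': x, 'y': y})
    grouped = df.groupby('x')['y'].agg(['sum', 'count'])
    event = grouped['sum']                      # events in each interval
    non_event = grouped['count'] - event        # non-events in each interval
    p_event = event / event.sum()               # local share of all events
    p_non_event = non_event / non_event.sum()   # local share of all non-events
    woe = np.log(p_event / p_non_event)         # WOE per interval
    iv = ((p_event - p_non_event) * woe).sum()  # information value of the feature
    return woe.to_dict(), iv

manual_woe_iv(result_cm['HouseAge'], y)  # compare against the encoder's output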
trans_woe = woe.WOE_Encoder(output_dataframe=True)
result_woe = trans_woe.fit_transform(result_cm, y)
print(trans_woe.iv_) # information value (iv) for each feature
print(trans_woe.result_dict_) # woe dictionary and iv value for each feature
# IV result
res_iv = pd.DataFrame.from_dict(trans_woe.iv_, orient='index').sort_values(0,ascending=False).reset_index()
res_iv.columns = ['feature','IV']
Discretization Adjustment
For a scorecard model, we usually have the following expectations of a feature:
- The feature should have an acceptable level of discriminative power (e.g. IV > 0.02);
- The feature distribution shouldn't be highly imbalanced (e.g. no feature value should have too few samples);
- The event rate curve of the feature should usually be monotonic or quadratic, so that the pattern can be easily interpreted by humans;
- The trend of the event rate curve should not simply follow the trend of the feature value distribution. Statistically speaking, assuming a feature has no discriminative power upon a dependent variable, a feature value that accounts for a small proportion of the total samples is usually more likely to have a lower event rate than the values that dominate the distribution, especially when the dependent variable is imbalanced. Therefore, when the direction of the event rate curve is consistent with the direction of the feature value distribution, we cannot be certain whether the feature has some sort of discriminative power, or whether the low event rates in less-dominant feature values are simply due to the fact that these values cover a smaller region of the feature value distribution and are therefore less likely to encounter samples with a positive dependent variable;

In the Discretization Adjustment step, we check the sample distribution and event rate distribution for each feature, and then adjust the feature intervals to meet the above expectations (a rough programmatic check is sketched below).
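The following sketch checks these expectations programmatically for one feature; the helper name and thresholds are illustrative, and it complements rather than replaces the visual check in the next step.
def check_feature_expectations(x, y, iv, min_iv=0.02, min_share=0.05):
    # x: a discretized feature; y: the binary target; iv: the feature's information value
    df = pd.DataFrame({'x': x, 'y': y})
    stats = df.groupby('x')['y'].agg(event_rate='mean', share='size')
    stats['share'] = stats['share'] / len(df)                 # sample proportion per interval
    return {
        'iv_ok': iv > min_iv,                                  # acceptable discriminative power
        'distribution_ok': (stats['share'] >= min_share).all(),  # no interval is too small
        'event_rate': stats['event_rate'].to_dict(),           # inspect for a monotonic/quadratic shape
        'sample_share': stats['share'].to_dict(),              # compare its trend with the event rate curve
    }

check_feature_expectations(result_cm['HouseAge'], y, trans_woe.iv_['HouseAge'])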
Check the sample distribution and event rate distribution
The plot_event_dist() function can easily visualize a feature's distribution, including the sample size and event rate of each feature value.
col = 'HouseAge'
fia.plot_event_dist(result_cm[col],y,x_rotation=60)
Adjust the feature intervals
Now that new boundaries for feature discretization have been determined based on the plot above, we can use the assign_interval_str() function to apply user-defined boundaries to the original feature values and obtain the new discretized feature.
new_x = cm.assign_interval_str(X[col].values,[24,36,45]) # apply new interval boundaries to the feature
woe.woe_vector(new_x, y.values) # check the WOE values and information value of the feature with the new intervals
({'-inf~24.0': -0.37674091199664517,
'24.0~36.0': -0.0006838162136153891,
'36.0~45.0': 0.16322806760041855,
'45.0~inf': 0.7012457415969229},
0.12215245735367213)
Check the distributions again
fia.plot_event_dist(new_x,y
,title=f'Feature distribution of {col}'
,x_label=col
,y_label='More valuable than Q90'
,x_rotation=60
,save=False # Set to True if you want to save the plot locally
,file_name=col # filename used when saving locally
,table_vpos=-0.6 # The smaller the value, the further down the table's position will be
)
Update the dataset of discretized features
Update the dataset of discretized features. Once all features have been adjusted, this dataset will be encoded with WOE and then fitted to the logistic regression model.
result_cm[col] = new_x # Update with adjusted features
feature_list.append(col) # The list that records the selected features
WOE on the interval-adjusted feature data
After finishing the interval adjustments for all features, perform WOE encoding on the adjusted feature data.
trans_woe = woe.WOE_Encoder(output_dataframe=True)
result_woe = trans_woe.fit_transform(result_cm[feature_list], y)
result_woe.head()
| | Latitude | HouseAge | Population | Longitude | AveRooms |
|---|---|---|---|---|---|
| 0 | 0.016924 | 0.163228 | 0.060771 | -0.374600 | -0.660410 |
| 1 | 0.016924 | -0.376741 | -0.231549 | -0.374600 | -0.660410 |
| 2 | 0.016924 | 0.163228 | -0.231549 | -0.374600 | -0.660410 |
| 3 | -0.438377 | -0.000684 | 0.060771 | 0.402336 | 0.724149 |
| 4 | 0.016924 | 0.163228 | -0.231549 | -0.374600 | -0.660410 |
trans_woe.iv_ # the information value (iv) for each feature
{'Latitude': 0.08626935922214038,
'HouseAge': 0.12215245735367213,
'Population': 0.07217596403800937,
'Longitude': 0.10616009747356592,
'AveRooms': 0.7824038737089276}
Feature Selection
The purpose of the feature selection step is mainly to mitigate the collinearity problem caused by correlated features in regression models. After identifying highly-correlated feature pairs, the feature with the lower IV within each pair is dropped. Scorecard-Bundle provides three tools built for this purpose:
- Function selection_with_iv_corr() sorts features by their IVs and identifies the other features that are highly correlated with each feature;
- Function identify_colinear_features() identifies highly-correlated feature pairs and the feature to drop within each pair;
- Function unstacked_corr_table() returns all feature pairs sorted by their correlation coefficients.

Note that the Pearson correlation coefficient is used to measure the correlation between features.
fs.selection_with_iv_corr(trans_woe, result_woe,threshold_corr=0.7) # column 'corr_with' lists the other features that are highly correlated with the feature
| | factor | IV | woe_dict | corr_with |
|---|---|---|---|---|
| 2 | AveRooms | 47.130083 | {0.8461538461538461: -23.025850929940457, 1.0:… | {'MedInc': 0.886954762287924, 'AveBedrms': 0.8… |
| 5 | AveOccup | 45.534320 | {1.0892678034102308: -23.025850929940457, 1.08… | {'MedInc': 0.8654189306421639, 'AveRooms': 0.9… |
| 0 | MedInc | 41.907477 | {0.4999: -23.025850929940457, 0.536: -23.02585… | {'AveRooms': 0.886954762287924, 'AveBedrms': 0… |
| 3 | AveBedrms | 37.630560 | {0.4444444444444444: -23.025850929940457, 0.5:… | {'MedInc': 0.7527641328927565, 'AveRooms': 0.8… |
| 4 | Population | 16.181549 | {5.0: -23.025850929940457, 6.0: -23.0258509299… | {} |
| 7 | Longitude | 8.396207 | {-124.35: -23.025850929940457, -124.27: -23.02… | {} |
| 6 | Latitude | 8.314223 | {32.54: -23.025850929940457, 32.55: -23.025850… | {} |
| 1 | HouseAge | 0.236777 | {1.0: -23.025850929940457, 2.0: 0.341285523686… | {} |
# result_woe_raw / trans_woe_raw refer to WOE results fitted on the full original feature set (before interval adjustment)
features_to_drop_auto,features_to_drop_manual,corr_auto,corr_manual = fs.identify_colinear_features(result_woe_raw,trans_woe_raw.iv_,threshold_corr=0.7)
print('The features with lower IVs in highly correlated pairs: ',features_to_drop_auto)
print('The features with equal IVs in highly correlated pairs: ',features_to_drop_manual)
corr_auto # highly correlated feature pairs (with unequal IVs)
| | feature_a | feature_b | corr_coef | iv_feature_a | iv_feature_b | to_drop |
|---|---|---|---|---|---|---|
| 0 | MedInc | AveRooms | 0.886955 | 41.907477 | 47.130083 | MedInc |
| 1 | MedInc | AveBedrms | 0.752764 | 41.907477 | 37.630560 | AveBedrms |
| 2 | MedInc | AveOccup | 0.865419 | 41.907477 | 45.534320 | MedInc |
| 3 | AveRooms | MedInc | 0.886955 | 47.130083 | 41.907477 | MedInc |
| 4 | AveRooms | AveBedrms | 0.827336 | 47.130083 | 37.630560 | AveBedrms |
| 5 | AveRooms | AveOccup | 0.952849 | 47.130083 | 45.534320 | AveOccup |
| 6 | AveBedrms | MedInc | 0.752764 | 37.630560 | 41.907477 | AveBedrms |
| 7 | AveBedrms | AveRooms | 0.827336 | 37.630560 | 47.130083 | AveBedrms |
| 8 | AveBedrms | AveOccup | 0.811198 | 37.630560 | 45.534320 | AveBedrms |
| 9 | AveOccup | MedInc | 0.865419 | 45.534320 | 41.907477 | MedInc |
| 10 | AveOccup | AveRooms | 0.952849 | 45.534320 | 47.130083 | AveOccup |
| 11 | AveOccup | AveBedrms | 0.811198 | 45.534320 | 37.630560 | AveBedrms |
# Return the unstacked correlation table for all features to help analyze the colinearity problem
fs.unstacked_corr_table(result_woe,trans_woe.iv_)
| | feature_a | feature_b | corr_coef | abs_corr_coef | iv_feature_a | iv_feature_b |
|---|---|---|---|---|---|---|
| 12 | Longitude | Latitude | -0.464314 | 0.464314 | 0.106160 | 0.086269 |
| 2 | Latitude | Longitude | -0.464314 | 0.464314 | 0.086269 | 0.106160 |
| 5 | HouseAge | Population | 0.258828 | 0.258828 | 0.122152 | 0.072176 |
| … | … | … | … | … | … | … |
Based on the analysis results above, 'MedInc', 'AveBedrms', and 'AveOccup' are dropped.
feature_list = list(set(features)-set(['MedInc', 'AveBedrms', 'AveOccup'])) # features: the list of all original feature names
print(feature_list)
Model Training
Scorecard-Bundle puts the scorecard transformation tools together with sklearn's logistic regression in the LogisticRegressionScoreCard class, so users can train a scorecard model directly with this class (fit) and make predictions (predict).
A scorecard model uses the parameters basePoints and PDO to control the center and variability of the score distribution. Default choices can be PDO=-20, basePoints=100 or PDO=-10, basePoints=60.
- baseOdds: the base odds is a ratio criterion (positive class / negative class); users can pass their own base odds via the baseOdds parameter (e.g. define defaults/normal as 1:60). If no value is passed, the number of y=1 samples divided by the number of y=0 samples in the dependent variable will be used.
- PDO: the score change when the base odds double. Its absolute value is proportional to the variability of the score distribution. When PDO is negative, the higher the model score, the higher the probability that the sample belongs to the positive class (this applies to most binary classification tasks); when PDO is positive, the higher the model score, the lower the probability that the sample belongs to the positive class (this applies to conventional credit modeling, where a high score means a low-risk sample).
- basePoints: the expected score at the base odds. It determines the center of the score distribution.
Also, the LogisticRegressionScoreCard class accepts all parameters of sklearn.linear_model.LogisticRegression, and its fit() function accepts all parameters of the fit() of sklearn.linear_model.LogisticRegression (including sample_weight). Thus in practice we can use methods like grid search to find an optimal set of parameters for the logistic regression and pass these parameters to LogisticRegressionScoreCard to train a scorecard model.
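For example, below is a rough sketch of such a workflow, assuming scikit-learn is available; the parameter grid and scoring choice are illustrative.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Tune the underlying Logistic Regression on the WOE-encoded features
param_grid = {'C': [0.1, 0.6, 1.0], 'class_weight': [None, 'balanced']}
gs = GridSearchCV(LogisticRegression(), param_grid, cv=5, scoring='roc_auc')
gs.fit(result_woe, y)

# Pass the tuned parameters on to the scorecard model
model = lrsc.LogisticRegressionScoreCard(trans_woe, PDO=-20, basePoints=100,
                                         verbose=True, **gs.best_params_)
model.fit(result_woe, y)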
Now we train a scorecard model.
model = lrsc.LogisticRegressionScoreCard(trans_woe, PDO=-20, basePoints=100, verbose=True)
model.fit(result_woe, y)
If users want to define the base odds as 1:60:
# Users can use the `baseOdds` parameter to set the base odds.
# The default is None, in which case the base odds will be calculated as the number of positive samples divided by the number of negative samples in y.
# Assuming users want the base odds to be 1:60 (positive:negative):
model = lrsc.LogisticRegressionScoreCard(trans_woe, PDO=-20, basePoints=100, baseOdds=1/60,
verbose=True,C=0.6,penalty='l2')
model.fit(result_woe, y)
After fitting the model, users can access the scoring rules via the woe_df_ attribute. Note that the feature intervals (the value column in the rules table) are all open on the left and closed on the right (e.g. 34.0~37.6 means (34.0, 37.6]).
model.woe_df_ # the scorecard rules
'''
feature: feature name;
value: feature intervals that are open to the left and closed to the right;
woe: the woe encoding for each feature interval;
beta: the regression coefficients for each feature in the Logistic regression;
score: the assigned score for each feature interval;
'''
| | feature | value | woe | beta | score |
|---|---|---|---|---|---|
| 0 | Latitude | -inf~34.1 | 0.016924 | 1.907463 | 34.0 |
| 1 | Latitude | 34.1~34.47 | 0.514342 | 1.907463 | 61.0 |
| 2 | Latitude | 34.47~37.59 | 0.097523 | 1.907463 | 38.0 |
| 3 | Latitude | 37.59~inf | -0.438377 | 1.907463 | 9.0 |
| 4 | HouseAge | -inf~24.0 | -0.376741 | 1.640162 | 15.0 |
| 5 | HouseAge | 24.0~36.0 | -0.000684 | 1.640162 | 33.0 |
| 6 | HouseAge | 36.0~45.0 | 0.163228 | 1.640162 | 40.0 |
| 7 | HouseAge | 45.0~inf | 0.701246 | 1.640162 | 66.0 |
| 8 | Population | -inf~420.0 | 0.168914 | 0.464202 | 35.0 |
| 9 | Population | 1274.0~2812.0 | -0.231549 | 0.464202 | 30.0 |
| 10 | Population | 2812.0~inf | -0.616541 | 0.464202 | 24.0 |
| 11 | Population | 420.0~694.0 | 0.277570 | 0.464202 | 36.0 |
| 12 | Population | 694.0~877.0 | 0.354082 | 0.464202 | 37.0 |
| 13 | Population | 877.0~1274.0 | 0.060771 | 0.464202 | 34.0 |
| 14 | Longitude | -118.37~inf | -0.374600 | 1.643439 | 15.0 |
| 15 | Longitude | -121.59~-118.37 | 0.056084 | 1.643439 | 35.0 |
| 16 | Longitude | -inf~-121.59 | 0.402336 | 1.643439 | 52.0 |
| 17 | AveRooms | -inf~5.96 | -0.660410 | 1.124053 | 11.0 |
| 18 | AveRooms | 5.96~6.426 | 0.120843 | 1.124053 | 37.0 |
| 19 | AveRooms | 6.426~6.95 | 0.724149 | 1.124053 | 56.0 |
| 20 | AveRooms | 6.95~7.41 | 1.261640 | 1.124053 | 74.0 |
| … | … | … | … | … | … |
Scorecard Adjustment
Users can manually adjust the scorecard rules (as shown below, or output the rules to a local Excel file, edit it in Excel, and load it back into Python), and use the load_scorecard parameter of predict() to load the adjusted rules table. See the documentation of load_scorecard for details.
Assume we want to change the highest score for AveRooms from 92 to 91.
sc_table = model.woe_df_.copy()
sc_table.loc[(sc_table.feature=='AveRooms') & (sc_table.value=='7.41~inf'), 'score'] = 91  # use .loc to avoid chained-indexing warnings
sc_table
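Alternatively, a minimal sketch of the export-edit-reload route mentioned above (the file name is illustrative):
# Export the rules to Excel, adjust the scores there, then load them back
model.woe_df_.to_excel('scorecard_rules.xlsx', index=False)
sc_table = pd.read_excel('scorecard_rules.xlsx')  # the adjusted rules table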
Apply the Scorecard
The scorecard should be applied to the original feature values, i.e. the features before discretization and WOE encoding.
result = model.predict(X[feature_list], load_scorecard=sc_table) # Scorecard should be applied on the original feature values
result_val = model.predict(X_val[feature_list], load_scorecard=sc_table) # Scorecard should be applied on the original feature values
result.head() # if model object's verbose parameter is set to False, predict will only return Total scores
| | Latitude | HouseAge | Population | Longitude | AveRooms | TotalScore |
|---|---|---|---|---|---|---|
| 0 | 34.0 | 40.0 | 34.0 | 15.0 | 11.0 | 134.0 |
| 1 | 34.0 | 15.0 | 30.0 | 15.0 | 11.0 | 105.0 |
| 2 | 34.0 | 40.0 | 30.0 | 15.0 | 11.0 | 130.0 |
| 3 | 9.0 | 33.0 | 34.0 | 52.0 | 56.0 | 184.0 |
| 4 | 34.0 | 40.0 | 30.0 | 15.0 | 11.0 | 130.0 |
Loading a scorecard rules file from a local path in a new kernel:
# OR: load the scorecard rules from a local file
sc_table = pd.read_excel('rules')
model = lrsc.LogisticRegressionScoreCard(woe_transformer=None, verbose=True) # Initialize an empty model instance. Pass None to woe_transformer because the predict function does not need it
result = model.predict(X[feature_list], load_scorecard=sc_table) # Scorecard should be applied on the original feature values
result_val = model.predict(X_val[feature_list], load_scorecard=sc_table) # Scorecard should be applied on the original feature values
result.head() # if model object's verbose parameter is set to False, predict will only return Total scores
Model Evaluation
Classification performance on different levels of model scores (precision/recall/f1/etc.):
me.pref_table(y_val,result_val['TotalScore'].values,thresholds=result['TotalScore'].quantile(np.arange(1,10)/10).values)
| | y_pred_group | event_num | sample_size | cum_event_num | cum_sample_size | cum_sample_pct | cum_precision | cum_recal | cum_f1 |
|---|---|---|---|---|---|---|---|---|---|
| 9 | (192.0, inf] | 337 | 794 | 337 | 794 | 0.096172 | 0.424433 | 0.408485 | 0.416306 |
| 8 | (174.0, 192.0] | 143 | 794 | 480 | 1588 | 0.192345 | 0.302267 | 0.581818 | 0.397845 |
| 7 | (163.0, 174.0] | 96 | 891 | 576 | 2479 | 0.300266 | 0.232352 | 0.698182 | 0.348668 |
| 6 | (152.0, 163.0] | 79 | 862 | 655 | 3341 | 0.404675 | 0.196049 | 0.793939 | 0.314450 |
| 5 | (144.0, 152.0] | 53 | 793 | 708 | 4134 | 0.500727 | 0.171263 | 0.858182 | 0.285541 |
| 4 | (134.0, 144.0] | 42 | 740 | 750 | 4874 | 0.590359 | 0.153878 | 0.909091 | 0.263204 |
| 3 | (129.0, 134.0] | 24 | 709 | 774 | 5583 | 0.676235 | 0.138635 | 0.938182 | 0.241573 |
| 2 | (123.0, 129.0] | 25 | 762 | 799 | 6345 | 0.768532 | 0.125926 | 0.968485 | 0.222873 |
| 1 | (109.0, 123.0] | 18 | 1053 | 817 | 7398 | 0.896076 | 0.110435 | 0.990303 | 0.198711 |
| 0 | (-inf, 109.0] | 8 | 858 | 825 | 8256 | 1.000000 | 0.099927 | 1.000000 | 0.181698 |
Performance plots (K-S curve / ROC curve / precision-recall curve):
# Validation
evaluation = me.BinaryTargets(y_val, result_val['TotalScore'])
evaluation.plot_all()
Model Interpretation
Interpret the model score of an individual instance.
# Features that contribute 80%+ of total score
imp_fs = mise.important_features(result_val
,feature_names=list(sc_table.feature.unique())
,col_totalscore='TotalScore'
,threshold_method=0.8, bins=None)
result_val['important_features'] = imp_fs
# Features with the top n highest scores
imp_fs = mise.important_features(result_val
,feature_names=list(sc_table.feature.unique())
,col_totalscore='TotalScore'
,threshold_method=2, bins=None)
result_val['top2_features'] = imp_fs
# Define the prediction threshold based on classification performance
result_val['y_pred'] = result_val['TotalScore'].map(lambda x: 1 if x>152 else 0)
result_val
| | Latitude | HouseAge | Population | Longitude | AveRooms | TotalScore | important_features | top2_features | y_pred |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 38.0 | 15.0 | 30.0 | 35.0 | 11.0 | 129.0 | {'Latitude': 38.0, 'Longitude': 35.0, 'Populat… | {'Latitude': 38.0, 'Longitude': 35.0} | 0 |
| 1 | 9.0 | 15.0 | 30.0 | 35.0 | 11.0 | 100.0 | {'Longitude': 35.0, 'Population': 30.0} | {'Longitude': 35.0, 'Population': 30.0} | 0 |
| 2 | 9.0 | 15.0 | 34.0 | 52.0 | 11.0 | 121.0 | {'Longitude': 52.0, 'Population': 34.0} | {'Longitude': 52.0, 'Population': 34.0} | 0 |
| 3 | 9.0 | 66.0 | 36.0 | 52.0 | 11.0 | 174.0 | {'HouseAge': 66.0, 'Longitude': 52.0} | {'HouseAge': 66.0, 'Longitude': 52.0} | 1 |
| 4 | 34.0 | 15.0 | 37.0 | 15.0 | 91.0 | 192.0 | {'AveRooms': 91.0, 'Population': 37.0} | {'AveRooms': 91.0, 'Population': 37.0} | 1 |
| … | … | … | … | … | … | … | … | … | … |
| 8251 | 34.0 | 33.0 | 37.0 | 15.0 | 37.0 | 156.0 | {'Population': 37.0, 'AveRooms': 37.0, 'Latitu… | {'Population': 37.0, 'AveRooms': 37.0} | 1 |
| 8252 | 9.0 | 40.0 | 30.0 | 52.0 | 11.0 | 142.0 | {'Longitude': 52.0, 'HouseAge': 40.0} | {'Longitude': 52.0, 'HouseAge': 40.0} | 0 |
| 8253 | 34.0 | 15.0 | 24.0 | 15.0 | 37.0 | 125.0 | {'AveRooms': 37.0, 'Latitude': 34.0, 'Populati… | {'AveRooms': 37.0, 'Latitude': 34.0} | 0 |
| 8254 | 9.0 | 15.0 | 36.0 | 35.0 | 11.0 | 106.0 | {'Population': 36.0, 'Longitude': 35.0} | {'Population': 36.0, 'Longitude': 35.0} | 0 |
| 8255 | 34.0 | 40.0 | 30.0 | 15.0 | 11.0 | 130.0 | {'HouseAge': 40.0, 'Latitude': 34.0} | {'HouseAge': 40.0, 'Latitude': 34.0} | 0 |
Now we can interpret the model scores based on the analysis above. For example, the 4th entry (index=3) has a total score of 174. The primary drivers of house value in this case are the housing age ('HouseAge') and the location ('Longitude'), because together they contribute more than 80% of the total score (they are also the two features with the highest scores).