Model training
class: scorecardbundle.model_training.LogisticRegressionScoreCard
Take encoded features, fit a regression and turn it into a scorecard
Parameters
woe_transformer: WOE transformer object from WOE module.
C: float, optional(Default=1.0)
regularization parameter in linear regression. Default value is 1.
A smaller value implies more regularization.
See details in scikit-learn document.
class_weight: dict, optional(default=None)
weights for each class of samples (e.g. {class_label: weight})
in linear regression. This is to deal with imbalanced training data.
Setting this parameter to 'auto' will aotumatically use
class_weight function from scikit-learn to calculate the weights.
The equivalent codes are:
>>> from sklearn.utils import class_weight
>>> class_weights = class_weight.compute_class_weight('balanced',
np.unique(y), y)
random_state: int, optional(default=None)
random seed in linear regression. See details in scikit-learn doc.
PDO: int, optional(default=-20)
Points to double odds. One of the parameters of Scorecard.
Default value is -20.
A positive value means the higher the scores, the lower
the probability of y being 1.
A negative value means the higher the scores, the higher
the probability of y being 1.
basePoints: int, optional(default=100)
the score for base odds(# of y=1/ # of y=0).
baseOdds: float, optional(default=None)
The ratio of the number of positive class(y=1) divided by that of negative class(y=0)
Leave this parameter to None means baseOdds will be automatically calculated
using the number of positive class divided by the number of negative class in y.
decimal: int, optional(default=0)
Control the number of decimals that the output scores have.
Default is 0 (no decimal).
start_points: boolean, optional(default=False)
There are two types of scorecards, with and without start points.
True means the scorecard will have a start poitns.
output_option: string, optional(default='excel')
Controls the output format of scorecard. For now 'excel' is
the only option.
output_path: string, optional(default=None)
The location to save the scorecard. e.g. r'D:\\Work\\jupyter\\'.
verbose: boolean, optioanl(default=False)
When verbose is set to False, the predict() method only returns
the total scores of samples. In this case the output of predict()
method will be numpy.array;
When verbose is set to True, the predict() method will return
the total scores, as well as the scores of each feature. In this case
The output of predict() method will be pandas.DataFrame in order to
specify the feature names.
delimiter: string, optional(default='~')
The feature interval is representated by string (i.e. '1~2'),
which takes the form lower+delimiter+upper. This parameter
is the symbol that connects the lower and upper boundaries..
**kargs: other keyword arguments in sklearn.linear_model.LogisticRegression()
Attributes
woe_df_: pandas.DataFrame, the Scorecard scoring rules.
The table contains 5 columns (feature, value, woe, beta and score).
- 'feature' column: feature names
- 'value' column: feature intervals (right-closed)
- 'woe' column: WOE encodings of feature intervals
- 'score' column: the score for the feature interval respectively
An example would be as followed,
feature value woe beta score
x1 30~inf 0.377563 0.631033 5.0
x1 20~-30 1.351546 0.631033 37.0
x1 -inf~20 1.629890 0.631033 -17.0
AB_ : A and B when converting regression to scorecard.
Methods
fit(woed_X, y,**kargs):
fit the Scorecard model.
predict(X_beforeWOE, load_scorecard=None):
Apply the model to the original feature
(before discretization and woe encoding).
If user choose to upload their own Scorecard,
user can pass a pandas.DataFrame to `load_scorecard`
parameter. The dataframe should contain columns such as
feature, value, woe, beta and score. An example would
be as followed (value is the range of feature values, woe
is the WOE encoding of that range, and score is the socre
for that range):
feature value woe beta score
x1 30~inf 0.377563 0.631033 5.0
x1 20~-30 1.351546 0.631033 37.0
x1 -inf~20 1.629890 0.631033 -17.0