
Credit Scoring with R

Location
Prague, NH Hotel Prague
Price
N/A
Lecturer
N/A
Language
English
Evaluation
N/A
Overview of scorecard development process
Analysis and transformation of characteristics
Logistic regression: theory and practice
Selection of characteristics for building scoring models
Methods of assessing the predictive power of scoring systems
Calculation of scorepoints
Reject inference: taking into account rejected applications
Using scoring systems
What will you learn?
  • You will be able to build a scoring model, even if you start with no prior knowledge of the topic.
  • Learn all the stages of the scoring system development process: from gathering data, through selecting the best characteristics and determining scorepoints, to quality assessment and monitoring of a working system.
  • Learn how to preprocess data for the development of scoring systems.
  • Learn which statistical methods are applied at each stage.
  • Learn how to solve the problem of missing information about rejected applications (reject inference).
  • Gain knowledge and skills in assessing the quality of scoring models.
  • Learn essential basics of R.
  • Work on these topics hands-on with a computer: we use R and RStudio.
Methodology:
The training is based on presentations and hands-on exercises in R, so afterwards you will be able to work on your own data with this tool. As your experience grows, you will assemble your own set of methods and R functions for building models, one that suits you: effective, convenient, built on powerful methods you like to use, and appropriate for the specifics of the credit portfolios and data you work with. Even if you don't use R, you will benefit from the training: the methods introduced are best practices and are available in many statistical tools, with R serving as an illustrative vehicle. By using R during the training, you will gain knowledge and practical skills independent of any commercial software.

There will be many hands-on computer exercises, so participants are required to bring laptops. We will use RStudio through a web browser, which means all you need is a web browser and MS Excel.

Who should attend?
Employees of credit risk, CRM, audit, and IT departments who:
  • build scoring models or want to start building them,
  • monitor working scoring models,
  • validate existing models,
  • are credit risk analysts,
  • are, for any reason, interested in learning how a scoring system works and how to build one.
Materials:
You will receive printouts of the slides, along with R scripts that allow you to work independently on your own data after the course.

Program of the seminar: Credit Scoring with R

The seminar timetable follows Central European Time (CET).

09.00 - 09.15 Welcome and Introduction

09.15 - 12.00

Short introduction to R and RStudio

  • introduction to R
  • using RStudio
  • basics of R: data types and data structures
    • objects and their main properties (vectors, matrices, strings, lists and data frames)
    • basic operations on objects
  • elements of programming in R language
    • basics of R language
    • controlling code flow
    • writing own scripts and functions
  • data input and output
  • basic data wrangling with dplyr
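
To give a flavour of the basics listed above, here is a minimal base-R sketch with made-up example data: a data frame, a user-defined function, and simple wrangling (in the course, dplyr's filter() plays the role of subset()).

```r
# Hypothetical application data: names and values are illustrative only.
apps <- data.frame(
  age    = c(25, 40, 33, 58),
  income = c(1800, 3200, 2500, 4100),
  bad    = c(1, 0, 0, 0)        # 1 = bad client, 0 = good client
)

# Writing your own function: the bad rate of a portfolio.
bad_rate <- function(flag) mean(flag)

bad_rate(apps$bad)              # 0.25
subset(apps, age > 30)          # base-R analogue of dplyr::filter()
```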

Overview of scorecard development process

  • organization of the project (including definition of a business goal)
  • preliminary data analysis
  • definition of project parameters
    • definition of good and bad client: transformation of business goal into a statistical goal
    • application window and performance window
    • exclusions
    • segmentation
  • data preparation
    • characteristics used in credit scoring
    • selection of a development sample
    • gathering and cleaning data
  • building a scorecard
    • analysis and transformation of characteristics used to build the scoring system
    • logistic regression
    • selection of characteristics for building scoring models
    • methods of assessing the predictive power of scoring systems
    • reject inference
  • using scoring systems
    • summary of the process: scorecard management reports
    • implementation of a scorecard (including cut-off point selection: iso-risk, iso-acceptance)
    • monitoring

12:00 - 13:00 Lunch

13:00 - 16:30

Analysis and transformation of characteristics used to build the scoring system

  • analysis of single characteristics
    • Weight of Evidence, odds
    • distributions of characteristics (contingency tables, histograms)
    • handling of missing data and outliers
    • quality control and cleaning of data
    • preliminary choice of characteristics for building a model: analysis of their discriminative power
  • binning (discretization) for numeric characteristics
    • role of binning
    • using weight of evidence (WoE)
    • using classification trees
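
As a sketch of the binning step, the following base-R function computes the Weight of Evidence per bin of a discretized numeric characteristic; the data and bin boundaries are simulated, illustrative assumptions only.

```r
# WoE_i = log( (goods_i / total goods) / (bads_i / total bads) )
score_woe <- function(x, bad, breaks) {
  bin   <- cut(x, breaks = breaks, include.lowest = TRUE)
  goods <- tapply(1 - bad, bin, sum)   # goods per bin
  bads  <- tapply(bad,     bin, sum)   # bads per bin
  log((goods / sum(goods)) / (bads / sum(bads)))
}

set.seed(1)
age <- round(runif(1000, 18, 70))
bad <- rbinom(1000, 1, plogis(2 - 0.08 * age))  # younger applicants riskier
score_woe(age, bad, breaks = c(18, 30, 45, 70))
```

With risk decreasing in age, the WoE values rise monotonically across the three bins, which is exactly the pattern binning is meant to expose.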

09:00 - 09:15 Recap

09:15 - 12:00

Logistic regression: theory and practice

  • an introduction to logistic regression
  • statistical basics
  • modeling using three approaches: dummy variables, WoE encoding, using continuous variables
  • building of a model
  • model diagnostics: statistical tests and plots
  • statistical inference for logistic regression
  • other methods of building scoring systems and their pros and cons (classification trees, random forest, neural networks)
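
A minimal illustration of fitting a scoring model with logistic regression in base R (glm with a binomial family); the variables and coefficients below are simulated, not real portfolio data.

```r
# Simulated applicants: probability of "bad" falls with income and age.
set.seed(42)
n      <- 500
income <- rnorm(n, 3000, 800)
age    <- runif(n, 18, 70)
bad    <- rbinom(n, 1, plogis(1 - 0.0005 * income - 0.02 * age))

model <- glm(bad ~ income + age, family = binomial)
summary(model)                           # coefficients and Wald z-tests

pd <- predict(model, type = "response")  # fitted probability of "bad"
range(pd)                                # always inside (0, 1)
```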

12:00 - 13:00 Lunch

13:00 - 16:30

Selection of characteristics for building scoring models

  • introduction to assessing the predictive power of scoring models
  • criteria of using characteristics in scoring models: statistical, business, operational
  • Information Value of a characteristic
  • exhaustive search
  • stepwise methods based on the AIC criterion
  • using random forest
  • handling correlated variables
  • analysis of dependencies between characteristics and construction of generated (cross) characteristics
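
The Information Value criterion mentioned above can be sketched in a few lines of base R; the two-bin data set is a toy example, and the rule-of-thumb thresholds for "strong" IV vary between practitioners.

```r
# IV = sum over bins of (goods share - bads share) * WoE of the bin.
information_value <- function(bin, bad) {
  goods <- tapply(1 - bad, bin, sum)
  bads  <- tapply(bad,     bin, sum)
  gs <- goods / sum(goods)
  bs <- bads  / sum(bads)
  sum((gs - bs) * log(gs / bs))
}

bin <- factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B"))
bad <- c(0, 1, 1, 1, 0, 0, 0, 0, 1)
information_value(bin, bad)   # roughly 1.37: a strongly separating split
```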

Sampling

  • model complexity vs. model generalization ability
  • learn/test split
  • cross validation
  • stratified sampling
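
The learn/test split with stratification can be sketched in base R as follows (simulated portfolio; packages such as caret or rsample offer richer tooling for the same task).

```r
set.seed(7)
n   <- 1000
bad <- rbinom(n, 1, 0.2)               # simulated good/bad flags

# Stratified 70/30 split: sample goods and bads separately so that
# both subsamples preserve the portfolio bad rate.
idx_bad  <- which(bad == 1)
idx_good <- which(bad == 0)
train_id <- c(sample(idx_bad,  round(0.7 * length(idx_bad))),
              sample(idx_good, round(0.7 * length(idx_good))))

train <- bad[train_id]
test  <- bad[-train_id]
c(train = mean(train), test = mean(test))   # bad rates stay aligned
```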

09:00 - 09:15 Recap

09:15 - 12:00

Methods of assessing the predictive power of scoring systems

  • goodness of fit criteria (AIC, R^2)
  • analysis of the predictive power of a model
  • distributions of scoring points
  • assessment of classification quality: confusion matrix
  • assessment of discriminative power: ROC curve, AR, KS, and divergence measures
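
Two of the discriminative-power measures listed above, the area under the ROC curve (from which AR = 2 * AUC - 1) and the KS statistic, can be computed in a few lines of base R; the scores below are a toy example.

```r
auc_ks <- function(score, bad) {
  # AUC via the rank (Mann-Whitney) formulation.
  r   <- rank(score)
  n0  <- sum(bad == 0)                 # goods
  n1  <- sum(bad == 1)                 # bads
  auc <- (sum(r[bad == 0]) - n0 * (n0 + 1) / 2) / (n0 * n1)
  # KS: maximum distance between the two empirical score CDFs.
  grid <- sort(unique(score))
  ks   <- max(abs(ecdf(score[bad == 1])(grid) - ecdf(score[bad == 0])(grid)))
  c(AUC = auc, AR = 2 * auc - 1, KS = ks)
}

score <- c(10, 20, 30, 40, 50, 60)   # higher score = better client
bad   <- c(1, 1, 0, 1, 0, 0)
auc_ks(score, bad)                   # AUC = 8/9, AR = 7/9, KS = 2/3
```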

12:00 - 13:00 Lunch

13:00 - 16:15

Calculation of scorepoints

  • scaling and shift
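
The standard scaling-and-shift approach maps model log-odds to scorepoints via two anchors; the target values below (odds 50:1 at 600 points, 20 points to double the odds) are illustrative conventions, not fixed standards.

```r
pdo        <- 20                            # points to double the odds
pts_factor <- pdo / log(2)                  # points per unit of log-odds
pts_offset <- 600 - pts_factor * log(50)    # anchor: odds 50:1 maps to 600

to_points <- function(log_odds) pts_offset + pts_factor * log_odds

to_points(log(50))    # 600 by construction
to_points(log(100))   # 620: odds doubled, score up by exactly PDO points
```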

Reject inference: taking into account rejected applications

  • an idea of reject inference
  • overview of reject inference methods
    • define as bad
    • extrapolation
    • augmentation
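
One simple reject-inference scheme, extrapolation with parceling, can be sketched as follows: score the rejects with the model built on accepted applications (the "known good/bad" model) and assign inferred labels in proportion to the predicted probability of default. All data here are simulated.

```r
set.seed(3)
accepted <- data.frame(x = rnorm(300))
accepted$bad <- rbinom(300, 1, plogis(-1 + accepted$x))
rejected <- data.frame(x = rnorm(100, mean = 1))   # rejects look riskier

kgb    <- glm(bad ~ x, family = binomial, data = accepted)  # known good/bad model
pd_rej <- predict(kgb, newdata = rejected, type = "response")

# Parceling: draw an inferred good/bad label per reject from its PD.
rejected$bad_inferred <- rbinom(nrow(rejected), 1, pd_rej)
mean(rejected$bad_inferred)   # inferred bad rate among rejects
```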

Using scoring systems

  • choosing a cut-off
  • monitoring the effectiveness of scoring systems, and reporting
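
Choosing an iso-acceptance cut-off, i.e. the score threshold that keeps the acceptance rate at a target level, reduces to a quantile of the score distribution; the scores below are simulated.

```r
set.seed(9)
score  <- round(rnorm(1000, 600, 50))      # simulated applicant scores
cutoff <- quantile(score, probs = 0.30)    # accept the top 70% of applicants
mean(score >= cutoff)                      # realized acceptance rate, ~0.70
```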

16:15 - 16:30 Evaluation and Closing of the Seminar
