
Credit Scoring with R

Location
Prague, NH Hotel Prague
Price
N/A
Lecturer
N/A
Language
English
Evaluation
N/A
Overview of scorecard development process
Analysis and transformation of characteristics
Logistic regression: theory and practice
Selection of characteristics for building scoring models
Methods of assessing the predictive power of scoring systems
Calculation of scorepoints
Reject inference: taking into account rejected applications
Using scoring systems
What will you learn?
  • You will be able to build a scoring model, even if you start with no prior knowledge of the topic.
  • Learn all the stages of the scoring system development process: from gathering data, through selecting the best characteristics and determining scorepoints, to quality assessment and monitoring of a working system.
  • Learn how to preprocess data for the development of scoring systems.
  • Learn which statistical methods are applied at each stage.
  • Learn how to solve the problem of missing information about rejected applications (reject inference).
  • Gain knowledge and skills in assessing the quality of scoring models.
  • Learn essential basics of R.
  • Work on these topics hands-on with a computer: we use R and RStudio.
Methodology:
The training is based on presentations and hands-on exercises in R, so afterwards you will be able to work on your own data with this tool. As your experience grows, you will assemble your own set of methods and R functions for building models, one that suits you: effective, convenient, built on powerful methods you like to use, and appropriate for the specifics of the credit portfolios and data you work with. Even if you don't use R, you will benefit from the training: the methods introduced are best practices and are available in many statistical tools, with R serving as an illustrative vehicle. By using R during the training, you will gain knowledge and practical skills independent of any commercial software.

There will be many hands-on computer exercises, so participants are required to bring laptops. We will use RStudio through a web browser, which means all you need is a web browser and MS Excel.

Who should attend?
Employees of credit risk, CRM, audit, and IT departments who:
  • build scoring models or want to start building them,
  • monitor working scoring models,
  • validate existing models,
  • are credit risk analysts,
  • are, for any reason, interested in learning how a scoring system works and how to build one.
Materials:
You will receive printouts of the slides, along with R scripts that allow you to work independently on your own data after the course.

Program of the seminar: Credit Scoring with R

The seminar timetable follows Central European Time (CET).

09.00 - 09.15 Welcome and Introduction

09.15 - 12.00

Short introduction to R and RStudio

  • introduction to R
  • using RStudio
  • basics of R: data types and data structures
    • objects and their main properties (vectors, matrices, strings, lists and data frames)
    • basic operations on objects
  • elements of programming in R language
    • basics of R language
    • controlling code flow
    • writing own scripts and functions
  • data input and output
  • basic data wrangling with dplyr
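
To give a flavour of the basics listed above, here is a minimal base-R sketch with made-up example data: a data frame, a user-defined function, and simple wrangling (in the course, dplyr's filter() plays the role of subset()).

```r
# Hypothetical application data: names and values are illustrative only.
apps <- data.frame(
  age    = c(25, 40, 33, 58),
  income = c(1800, 3200, 2500, 4100),
  bad    = c(1, 0, 0, 0)        # 1 = bad client, 0 = good client
)

# Writing your own function: the bad rate of a portfolio.
bad_rate <- function(flag) mean(flag)

bad_rate(apps$bad)              # 0.25
subset(apps, age > 30)          # base-R analogue of dplyr::filter()
```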

Overview of scorecard development process

  • organization of the project (including definition of a business goal)
  • preliminary data analysis
  • definition of project parameters
    • definition of good and bad client: transformation of business goal into a statistical goal
    • application window and performance window
    • exclusions
    • segmentation
  • data preparation
    • characteristics used in credit scoring
    • selection of a development sample
    • gathering and cleaning data
  • building a scorecard
    • analysis and transformation of characteristics used to build the scoring system
    • logistic regression
    • selection of characteristics for building scoring models
    • methods of assessing the predictive power of scoring systems
    • reject inference
  • using scoring systems
    • summary of the process: scorecard management reports
    • implementation of a scorecard (including cut-off point selection: iso-risk, iso-acceptance)
    • monitoring

12:00 - 13:00 Lunch

13:00 - 16:30

Analysis and transformation of characteristics used to build the scoring system

  • analysis of single characteristics
    • Weight of Evidence, odds
    • distributions of characteristics (contingency tables, histograms)
    • handling of missing data and outliers
    • quality control and cleaning of data
    • preliminary choice of characteristics for building a model: analysis of their discriminative power
  • binning (discretization) for numeric characteristics
    • role of binning
    • using weight of evidence (WoE)
    • using classification trees
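
As a sketch of the binning step, the following base-R function computes the Weight of Evidence per bin of a discretized numeric characteristic; the data and bin boundaries are simulated, illustrative assumptions only.

```r
# WoE_i = log( (goods_i / total goods) / (bads_i / total bads) )
score_woe <- function(x, bad, breaks) {
  bin   <- cut(x, breaks = breaks, include.lowest = TRUE)
  goods <- tapply(1 - bad, bin, sum)   # goods per bin
  bads  <- tapply(bad,     bin, sum)   # bads per bin
  log((goods / sum(goods)) / (bads / sum(bads)))
}

set.seed(1)
age <- round(runif(1000, 18, 70))
bad <- rbinom(1000, 1, plogis(2 - 0.08 * age))  # younger applicants riskier
score_woe(age, bad, breaks = c(18, 30, 45, 70))
```

With risk decreasing in age, the WoE values rise monotonically across the three bins, which is exactly the pattern binning is meant to expose.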

09:00 - 09:15 Recap

09:15 - 12:00

Logistic regression: theory and practice

  • an introduction to logistic regression
  • statistical basics
  • modeling using three approaches: dummy variables, WoE encoding, using continuous variables
  • building of a model
  • model diagnostics: statistical tests and plots
  • statistical inference for logistic regression
  • other methods of building scoring systems and their pros and cons (classification trees, random forest, neural networks)
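
A minimal illustration of fitting a scoring model with logistic regression in base R (glm with a binomial family); the variables and coefficients below are simulated, not real portfolio data.

```r
# Simulated applicants: probability of "bad" falls with income and age.
set.seed(42)
n      <- 500
income <- rnorm(n, 3000, 800)
age    <- runif(n, 18, 70)
bad    <- rbinom(n, 1, plogis(1 - 0.0005 * income - 0.02 * age))

model <- glm(bad ~ income + age, family = binomial)
summary(model)                           # coefficients and Wald z-tests

pd <- predict(model, type = "response")  # fitted probability of "bad"
range(pd)                                # always inside (0, 1)
```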

12:00 - 13:00 Lunch

13:00 - 16:30

Selection of characteristics for building scoring models

  • introduction to assessing the predictive power of scoring models
  • criteria of using characteristics in scoring models: statistical, business, operational
  • Information Value of a characteristic
  • exhaustive search
  • stepwise methods based on the AIC criterion
  • using random forest
  • handling correlated variables
  • analysis of dependencies between characteristics and construction of generated (cross) characteristics
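
The Information Value criterion mentioned above can be sketched in a few lines of base R; the two-bin data set is a toy example, and the rule-of-thumb thresholds for "strong" IV vary between practitioners.

```r
# IV = sum over bins of (goods share - bads share) * WoE of the bin.
information_value <- function(bin, bad) {
  goods <- tapply(1 - bad, bin, sum)
  bads  <- tapply(bad,     bin, sum)
  gs <- goods / sum(goods)
  bs <- bads  / sum(bads)
  sum((gs - bs) * log(gs / bs))
}

bin <- factor(c("A", "A", "A", "A", "B", "B", "B", "B", "B"))
bad <- c(0, 1, 1, 1, 0, 0, 0, 0, 1)
information_value(bin, bad)   # roughly 1.37: a strongly separating split
```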

Sampling

  • model complexity vs. model generalization ability
  • learn/test split
  • cross validation
  • stratified sampling
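
The learn/test split with stratification can be sketched in base R as follows (simulated portfolio; packages such as caret or rsample offer richer tooling for the same task).

```r
set.seed(7)
n   <- 1000
bad <- rbinom(n, 1, 0.2)               # simulated good/bad flags

# Stratified 70/30 split: sample goods and bads separately so that
# both subsamples preserve the portfolio bad rate.
idx_bad  <- which(bad == 1)
idx_good <- which(bad == 0)
train_id <- c(sample(idx_bad,  round(0.7 * length(idx_bad))),
              sample(idx_good, round(0.7 * length(idx_good))))

train <- bad[train_id]
test  <- bad[-train_id]
c(train = mean(train), test = mean(test))   # bad rates stay aligned
```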

09:00 - 09:15 Recap

09:15 - 12:00

Methods of assessing the predictive power of scoring systems

  • goodness of fit criteria (AIC, R^2)
  • analysis of the predictive power of a model
  • distributions of scoring points
  • assessment of classification quality: confusion matrix
  • assessment of discriminative power: ROC curve, AR, KS, and divergence measures
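
Two of the discriminative-power measures listed above, the area under the ROC curve (from which AR = 2 * AUC - 1) and the KS statistic, can be computed in a few lines of base R; the scores below are a toy example.

```r
auc_ks <- function(score, bad) {
  # AUC via the rank (Mann-Whitney) formulation.
  r   <- rank(score)
  n0  <- sum(bad == 0)                 # goods
  n1  <- sum(bad == 1)                 # bads
  auc <- (sum(r[bad == 0]) - n0 * (n0 + 1) / 2) / (n0 * n1)
  # KS: maximum distance between the two empirical score CDFs.
  grid <- sort(unique(score))
  ks   <- max(abs(ecdf(score[bad == 1])(grid) - ecdf(score[bad == 0])(grid)))
  c(AUC = auc, AR = 2 * auc - 1, KS = ks)
}

score <- c(10, 20, 30, 40, 50, 60)   # higher score = better client
bad   <- c(1, 1, 0, 1, 0, 0)
auc_ks(score, bad)                   # AUC = 8/9, AR = 7/9, KS = 2/3
```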

12:00 - 13:00 Lunch

13:00 - 16:15

Calculation of scorepoints

  • scaling and shift
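
The standard scaling-and-shift approach maps model log-odds to scorepoints via two anchors; the target values below (odds 50:1 at 600 points, 20 points to double the odds) are illustrative conventions, not fixed standards.

```r
pdo        <- 20                            # points to double the odds
pts_factor <- pdo / log(2)                  # points per unit of log-odds
pts_offset <- 600 - pts_factor * log(50)    # anchor: odds 50:1 maps to 600

to_points <- function(log_odds) pts_offset + pts_factor * log_odds

to_points(log(50))    # 600 by construction
to_points(log(100))   # 620: odds doubled, score up by exactly PDO points
```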

Reject inference: taking into account rejected applications

  • an idea of reject inference
  • overview of reject inference methods
    • define as bad
    • extrapolation
    • augmentation
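
One simple reject-inference scheme, extrapolation with parceling, can be sketched as follows: score the rejects with the model built on accepted applications (the "known good/bad" model) and assign inferred labels in proportion to the predicted probability of default. All data here are simulated.

```r
set.seed(3)
accepted <- data.frame(x = rnorm(300))
accepted$bad <- rbinom(300, 1, plogis(-1 + accepted$x))
rejected <- data.frame(x = rnorm(100, mean = 1))   # rejects look riskier

kgb    <- glm(bad ~ x, family = binomial, data = accepted)  # known good/bad model
pd_rej <- predict(kgb, newdata = rejected, type = "response")

# Parceling: draw an inferred good/bad label per reject from its PD.
rejected$bad_inferred <- rbinom(nrow(rejected), 1, pd_rej)
mean(rejected$bad_inferred)   # inferred bad rate among rejects
```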

Using scoring systems

  • choosing a cut-off
  • monitoring the effectiveness of scoring systems, and reporting
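
Choosing an iso-acceptance cut-off, i.e. the score threshold that keeps the acceptance rate at a target level, reduces to a quantile of the score distribution; the scores below are simulated.

```r
set.seed(9)
score  <- round(rnorm(1000, 600, 50))      # simulated applicant scores
cutoff <- quantile(score, probs = 0.30)    # accept the top 70% of applicants
mean(score >= cutoff)                      # realized acceptance rate, ~0.70
```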

16:15 - 16:30 Evaluation and Closing of the Seminar
