# Predicting Credit Worthiness Using Tabular Data

The data files required for this tutorial are available in the following file.

{% file src="/files/CnojRjnyXhRaZTPLvjKg" %}
Credit Worthiness Tutorial files
{% endfile %}

### Credit Worthiness Introduction

A bank customer applies for a line of credit. To assess the customer's creditworthiness, the bank retrieves data from its own warehouse and from several third-party API endpoints, then uses this data to predict creditworthiness.

### Motivation

Predicting creditworthiness accurately is a cornerstone of banking operations: it lets financial institutions manage risk and helps ensure customers are not overburdened with unmanageable credit. With data-driven decision making, banks can draw on large amounts of data to improve these predictions.

This tutorial demonstrates the use of dbnl on tabular data to keep creditworthiness predictions consistent, through continuous testing of third-party endpoints and testing of a live production system. These steps matter because the prediction integrates data from several sources, including the bank's own data warehouse and several third-party API endpoints. In a live environment the system also interacts with real-time data, which adds further complexity.

## Defining the Credit Worthiness System

The creditworthiness prediction system incorporates several third-party endpoints. To prevent potential harm to consumers, it's crucial to regularly test these endpoints for expected behavior. In this tutorial, we run monthly tests on these third-party endpoints against a pre-established baseline.

To complete this tutorial, the run config, test payload, and data (as stored in runs) are needed. All can be found using the following [link](broken://pages/MuOH2XN2DzsYD7hqZu0m).

The Credit Worthiness system consists of 11 different components.

The system used throughout the Credit Worthiness demo.

### Component Definitions

For each component, we specify its columns and identify the respective owner.

**Component: Credit\_request**

A bank customer submits a credit request that includes details about the intended purchase, its price, the repayment plan, and whether there's a co-signer for the line of credit.

**Columns:**

* **Purpose** - Purpose for credit
* **Guarantors** - Guarantors
* **Instalment\_per\_cent** - Installment %
* **Duration\_of\_Credit\_\_month\_** - Duration of credit (months)
* **Credit\_Amount** - Amount of credit

***

**Component: Unique\_ID**

An identifier used by the bank to associate customer information with its internal records and to retrieve data from third-party applications.

**Columns:**

* **SSN** - Unique ID

***

**Component: Data Warehouse**

The bank maintains a repository of customer-related information, which, for simplicity, is categorized into account information and personal information. In the context of continuous testing, the actual creditworthiness classification for each customer, based on their request, is known.

**Columns:**

* **Target\_Worthiness** - True classes

***

**Component: Account Information**

Subset of the Data Warehouse containing information about the value of the assets in a customer account.

**Columns:**

* **Account\_Balance** - Account balance
* **Value\_Savings\_Stocks** - Savings/stock value

***

**Component: Personal Information**

Subset of the Data Warehouse containing personal information about a customer of the bank.

**Columns:**

* **Sex\_\_\_Marital\_Status** - Sex/Marital status
* **Most\_valuable\_available\_asset** - Most valuable available asset
* **Age\_\_years\_** - Age (years)
* **No\_of\_dependents** - Number of dependents

***

**Component: API:Credit\_History**

Third-party API providing credit history information for a given bank customer.

**Columns:**

* **Payment\_Status\_of\_Previous\_Credit** - Payment status of previous credit
* **No\_of\_Credits\_at\_this\_Bank** - Number of credits at this bank

***

**Component: API:Credit\_Report**

Third-party API providing a FICO (credit) score for a given bank customer.

**Columns:**

* **FICO\_score** - FICO score

***

**Component: API:Employment\_Veri. (verification)**

Third-party API providing employment verification for a given bank customer.

**Columns:**

* **Length\_of\_current\_employment** - Length of current employment
* **Foreign\_Worker** - Foreign worker
* **Occupation** - Occupation

***

**Component: API:Rental\_History**

Third-party API providing information about the rental history of a given customer.

**Columns:**

* **Duration\_in\_Current\_address** - Duration in current address
* **Type\_of\_apartment** - Type of apartment

***

**Component: XGB:Classifier**

Output of the XGBoost classifier used to predict whether a line of credit should be approved for a given customer.

**Columns:**

* **Predicted\_Worthiness** - Predicted classes
* **Probability\_Bad** - Probability for class bad
* **Probability\_Good** - Probability for class good
* **Latency\_\_ms** - Model latency (ms)

***

**Component: Evaluation**

Run-level metrics obtained by comparing the predicted creditworthiness class to the true creditworthiness class.

**Scalars:**

* **Model\_Accuracy** - Accuracy for the model
* **Model\_F1** - F1-score for the model
* **Model\_Precision** - Precision for the model
* **Model\_Recall** - Recall for the model
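The Evaluation scalars can be illustrated with a minimal sketch that compares `Target_Worthiness` to `Predicted_Worthiness`. The tutorial does not state how these metrics are computed, so the details here are assumptions: classes are encoded as the strings `"good"` and `"bad"`, `"good"` is treated as the positive class, and the function name `evaluation_scalars` is hypothetical.

```python
def evaluation_scalars(y_true, y_pred, positive="good"):
    """Compute run-level metrics from true and predicted classes.

    Assumption: binary classes, with `positive` as the positive label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard every ratio against an empty denominator.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "Model_Accuracy": accuracy,
        "Model_F1": f1,
        "Model_Precision": precision,
        "Model_Recall": recall,
    }


# Toy data, made up for illustration only.
target = ["good", "good", "bad", "good", "bad", "bad"]
predicted = ["good", "bad", "bad", "good", "good", "bad"]
scalars = evaluation_scalars(target, predicted)
```

Each key in the returned dictionary matches one of the scalar names listed for the Evaluation component, so the result maps directly onto one run's scalar values.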

### Example Run Configuration

The complete run configuration is in the tutorial zip file.

**Example Column Configuration**

```json
{
  "component": "Account_information",
  "description": "Account balance",
  "name": "Account_Balance",
  "type": "category"
},
{
  "component": "Account_information",
  "description": "Savings/stock value",
  "name": "Value_Savings_Stocks",
  "type": "category"
},
{
  "component": "Personal_information",
  "description": "Sex/Marital status",
  "name": "Sex___Marital_Status",
  "type": "category"
},
{
  "component": "Personal_information",
  "description": "Most valuable available asset",
  "name": "Most_valuable_available_asset",
  "type": "category"
}
```

**Example Scalar Configuration**

```json
{
  "component": "Evaluation",
  "description": "Accuracy for the model",
  "name": "Model_Accuracy",
  "type": "float"
},
{
  "component": "Evaluation",
  "description": "F1-score for the model",
  "name": "Model_F1",
  "type": "float"
},
{
  "component": "Evaluation",
  "description": "Precision for the model",
  "name": "Model_Precision",
  "type": "float"
},
{
  "component": "Evaluation",
  "description": "Recall for the model",
  "name": "Model_Recall",
  "type": "float"
}
```

**Example Component DAG**

```json
"API:Credit_History": [
  "XGB:Classifier"
],
"API:Credit_Report": [
  "XGB:Classifier"
],
"API:Employment_Veri.": [
  "XGB:Classifier"
],
"API:Rental_History": [
  "XGB:Classifier"
],
"Account_information": [
  "XGB:Classifier"
],
```
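Before uploading a run configuration, it can help to check that the column entries and the component DAG agree on component names. The following is a minimal sketch of such a check on the excerpts above. It is not part of the dbnl SDK and not a stated dbnl requirement: the helper names are hypothetical, and the `description` fields are omitted for brevity.

```python
from collections import defaultdict

# Column entries copied from the excerpt above (descriptions omitted).
columns = [
    {"component": "Account_information", "name": "Account_Balance", "type": "category"},
    {"component": "Account_information", "name": "Value_Savings_Stocks", "type": "category"},
    {"component": "Personal_information", "name": "Sex___Marital_Status", "type": "category"},
    {"component": "Personal_information", "name": "Most_valuable_available_asset", "type": "category"},
]

# Component DAG excerpt: each key feeds the components in its list.
dag = {
    "API:Credit_History": ["XGB:Classifier"],
    "API:Credit_Report": ["XGB:Classifier"],
    "API:Employment_Veri.": ["XGB:Classifier"],
    "API:Rental_History": ["XGB:Classifier"],
    "Account_information": ["XGB:Classifier"],
}


def columns_by_component(columns):
    """Group column names under the component that owns them."""
    grouped = defaultdict(list)
    for col in columns:
        grouped[col["component"]].append(col["name"])
    return dict(grouped)


def dag_nodes(dag):
    """Collect every component that appears as a source or a target."""
    nodes = set(dag)
    for targets in dag.values():
        nodes.update(targets)
    return nodes


def components_missing_from_dag(columns, dag):
    """Return components that own columns but never appear in the DAG."""
    return sorted(set(columns_by_component(columns)) - dag_nodes(dag))


missing = components_missing_from_dag(columns, dag)
```

On these excerpts the check reports `Personal_information`, because the DAG excerpt does not list that component. Run against the complete configuration from the tutorial zip file, a non-empty result would point to a naming mismatch worth investigating.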

## Creating dbnl Tests

To test the credit worthiness system, 7 different groupings of tests are used. These are denoted using the dbnl [test tagging](https://docs.dbnl.com/creating-tests-in-distributional/dbnl-testing-objects) strategy.

### Change in probability

This set of tests covers the shape of the probability score distributions. Using the scaled [Kolmogorov-Smirnov statistic](https://docs.dbnl.com/creating-tests-in-distributional/suggested-testing-strategies/tests-that-columns-are-similarly-distributed), a test fails if the difference between the baseline and experimental probability distributions exceeds 0.5.
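To make the idea behind this test concrete, the sketch below computes the plain two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs, and compares it to a threshold. This is an illustration only: dbnl uses a scaled variant whose scaling is not reproduced here, so the 0.5 threshold is not directly comparable, and the function names and sample values are made up.

```python
from bisect import bisect_right


def ks_statistic(baseline, experiment):
    """Plain (unscaled) two-sample KS statistic.

    Returns the maximum absolute difference between the empirical CDFs
    of the two samples, a value between 0 and 1.
    """
    a, b = sorted(baseline), sorted(experiment)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to evaluate the gap at every distinct sample point.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def probability_shift_test(baseline, experiment, threshold=0.5):
    """Pass (True) when the statistic does not exceed the threshold."""
    return ks_statistic(baseline, experiment) <= threshold


# Hypothetical Probability_Good values for a baseline and two later runs.
baseline = [0.1, 0.2, 0.3, 0.4, 0.5]
similar = [0.15, 0.25, 0.3, 0.45, 0.5]
shifted = [0.6, 0.7, 0.8, 0.9, 0.95]

passes_similar = probability_shift_test(baseline, similar)
passes_shifted = probability_shift_test(baseline, shifted)
```

The `similar` run stays close to the baseline and passes, while the `shifted` run has no overlap with the baseline, giving the maximum statistic of 1 and a failed test.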