Here, the agent is an insurer or risk manager
1. DDDM applied to the agent's role
- the actuary's role has become more vital in this era of data-driven decision making (DDDM)
- traditional: internal and external data, analyzed largely by humans
- modern: data analytics has improved the data available and enabled DDDM and predictive modelling
- today, a wide variety of methods for analyzing data exist
- Predictive modelling can help an insurer with:
- accurate underwriting
- precise pricing
- efficient claims processing (including fraud detection)
- a wide variety of other business problems
Improve Business Results
- Automated decision making: e.g., online quotes for personal auto generated by a computer algorithm
- Organizing large volumes of new data: e.g., telematics data on speed, braking patterns, left turns, and distance traveled
- Discovering new relationships in data: e.g., identifying the characteristics of workers who have never had an accident and using that information to improve safety for all workers
- Exploring new sources of data: e.g., text mining claim adjusters' notes to feed an automated system that predicts claim severity (see the sketch below)
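A minimal sketch of that last idea, text mining adjusters' notes to predict claim severity. The file `claims.csv` and the columns `adjuster_notes` and `severity` are hypothetical names, and the TF-IDF plus Ridge pipeline is just one plausible modelling choice, not the method described in the source.

```python
# Sketch only: predict claim severity from free-text adjusters' notes.
# `claims.csv`, `adjuster_notes`, and `severity` are hypothetical names.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

claims = pd.read_csv("claims.csv")
X_train, X_test, y_train, y_test = train_test_split(
    claims["adjuster_notes"], claims["severity"], test_size=0.2, random_state=0
)

# TF-IDF turns the notes into numeric features; Ridge fits a simple severity model.
model = make_pipeline(TfidfVectorizer(min_df=5, stop_words="english"), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print("Holdout R^2:", round(model.score(X_test, y_test), 3))
```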
(Diagram: Planning an Insurer Data Modeling Project)
Data engineering and processing
- allow management of data too large for conventional systems
- models are developed to analyze data in the context of the insurer's area of interest
DDDM
- solve business problems
- achieve greater efficiency
- gain a competitive advantage
Two approaches
Descriptive
- we have a specific problem that can be solved through data science
- a use-once-and-discard approach, tailor-made for the problem
- a one-time analysis; once the problem is solved, the approach is not reused
Predictive
- can be used repeatedly to provide information for DDDM
- e.g., automated underwriting
- e.g., scoring data received from sensors on required safety equipment (see the sketch below)
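To make the "used repeatedly" point concrete, here is a small sketch of a model fitted once on historical data and then scored on every new batch of sensor data, in contrast with a one-time descriptive analysis. The feature names (`avg_speed`, etc.) and the `had_claim` flag are illustrative assumptions.

```python
# Sketch of the reusable predictive approach; column names are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["avg_speed", "hard_brakes_per_100mi", "miles_driven"]  # hypothetical telematics features

def train_model(history: pd.DataFrame) -> LogisticRegression:
    """Fit a claim-likelihood model once on historical data containing a `had_claim` flag."""
    return LogisticRegression(max_iter=1000).fit(history[FEATURES], history["had_claim"])

def score_new_batch(model: LogisticRegression, sensor_batch: pd.DataFrame) -> pd.Series:
    """Reused for every incoming batch of sensor data; no fresh analysis is needed each time."""
    return pd.Series(model.predict_proba(sensor_batch[FEATURES])[:, 1], index=sensor_batch.index)
```

The same fitted model object keeps feeding DDDM (e.g., automated underwriting decisions) as new sensor data arrives.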
Data-driven solutions can create a real impact, e.g., analyzing snow slip-and-fall claims and preventing slips by adopting safety mechanisms.
A Model for Data-Driven Decision Making in Risk Management and Insurance
Rules of thumb:
- Define the agent's problem (without business context, the analysis is ineffective)
2. Applying Data Quality Principles to an insurer data modeling project
- GIGO (garbage in, garbage out)
- actuaries need to be familiar with data quality issues and data management principles
What is data quality?
- how well the data meets predictive modelling needs and expectations
- e.g., what is the right amount of premium at a given level of deductible?
- collect enough data to get statistically significant results (see the sample-size sketch below)
So, data quality means the right data for the purpose, in an appropriate volume.
Ensuring data quality requires a systematic and purpose-driven review of data.
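On "enough data for statistically significant results", a back-of-the-envelope sketch using the standard sample-size formula for a proportion; the 5% claim frequency and ±1% margin of error are illustrative assumptions, not figures from the notes.

```python
# Rough sample-size check: how many records are needed to estimate a claim
# frequency p within a margin of error e at ~95% confidence (z = 1.96)?
from math import ceil

def required_sample_size(p: float = 0.05, e: float = 0.01, z: float = 1.96) -> int:
    """Classic proportion formula: n = z^2 * p * (1 - p) / e^2."""
    return ceil(z**2 * p * (1 - p) / e**2)

print(required_sample_size())  # ~1825 policies for a 5% frequency within +/-1%
```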
Characteristics of Quality Data
- first understand how the data will ultimately be applied in the predictive context
- if the goal is model prediction, data quality ultimately depends on how effective the model turns out to be; but we can't know that at the outset, when we still need to select the data to use
- so how do we ensure data quality? There is no clear-cut answer
- data quality can be expressed as a percentage or degree
Factors
Mnemonic to remember the factors: C.R.T.L.A.V.C (Completeness, Reasonability, Timeliness, Lineage, Accuracy, Validity, Consistency)
Context: Wildfire-related homeowner claims
Validity
- relevance or suitability for an application or project
- different from accuracy, though validity can be described as accuracy within predefined or accepted parameters
- e.g., in the wildfire example, a dataset that included auto claims might not be considered valid for homeowners claims (little predictive value)
Accuracy
- how well the true values or measurements are represented
- how true is the business information
- e.g., zip code 10021 is valid for New York City, but the data is not accurate if it is intended to represent a loss exposure in rural upstate New York
Completeness
- whether all the variables are present
- e.g., results for February are unreadable or absent
Reasonability
- given the applicable business conditions, does the data make sense?
- e.g., the age of an injured party is listed as over 100 years
Timeliness
- how current the data is relative to its intended use
- e.g., if a new coverage has been offered only in the last six months, omit data from before that six-month time frame
Lineage
- tracing data from its source to explain unexpectedly inconsistent data
- identifying errors
- miscodings
- different reporting systems
- e.g., data generated and collected by organizations before appropriate business-handling rules were put in place
Consistency
- to what extent do data stored in multiple locations match each other? (see the check sketch below)
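A rough sketch tying several of these factors to concrete checks on a hypothetical wildfire homeowner-claims file; every column name, file name, and cutoff date below is an assumption for illustration. Lineage and consistency checks would additionally require comparing records against the source systems, which is not shown.

```python
# Record-level flags for several quality factors; schema and cutoff are hypothetical.
import pandas as pd

df = pd.read_csv("wildfire_homeowner_claims.csv", parse_dates=["loss_date"])

flags = pd.DataFrame(index=df.index)
flags["invalid_line"] = df["line_of_business"] != "homeowners"                    # Validity: auto claims don't belong here
flags["unreasonable_zip"] = ~df["zip_code"].astype(str).str.fullmatch(r"\d{5}")   # Reasonability: U.S. zips are five digits
flags["missing_amount"] = df["paid_amount"].isna()                                # Completeness: required variable present?
flags["implausible_age"] = df["insured_age"] > 100                                # Reasonability: does the value make sense?
flags["stale_record"] = df["loss_date"] < pd.Timestamp("2024-04-01")              # Timeliness: keep only the recent window

print(flags.mean().sort_values(ascending=False))  # share of records failing each check
```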
Richard Morales
- We need to get data quality right
- There is so much data out there
- but we should ensure the data is right (where it is coming from, etc.) before making key business decisions
Data Quality Management
- we need metrics!
- metrics must be quantifiable
- and should demonstrate their applicability to the correlated predictive models (a metrics sketch follows the table below)
Validity: is the data within the acceptable range expected by the business?
- correctly stored and formatted
- conforms to internal data governance standards
Accuracy: is the data correct? Is its form unambiguous and consistent?
Completeness: how comprehensive the data is with respect to what it claims to represent, and the extent to which it can be used for that purpose
- can be considered complete even if optional information (irrelevant to the purpose of the study) is missing
Reasonability: does the data have a level of consistency that mirrors business conditions? (e.g., a U.S. zip code containing letters would fail this test)
Timeliness: is the data current enough for the period being analyzed?
| Inconsistency | Category | Reason |
|---|---|---|
| U.S. zip code containing letters | Reasonability | Anyone with domain knowledge knows this is not possible |
| Middle names are missing | Completeness | Still complete, because middle names are not used in the study |
| Date of birth is 01-02-2002 | Accuracy | Ambiguous: we don't know whether it is U.S. (MM-DD-YYYY) or day-first (DD-MM-YYYY) format |
| Age is 100 years | Validity | It cannot be, since we are looking at athletes |
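A sketch of turning those dimensions into the quantifiable metrics called for above, reported as pass-rate percentages and loosely mirroring the table's athlete examples; all column names, the date format, the file name, and the age range are illustrative assumptions.

```python
# Data quality metrics as pass-rate percentages; the schema is hypothetical.
import pandas as pd

def quality_scorecard(df: pd.DataFrame) -> pd.Series:
    checks = {
        "Reasonability (zip code has no letters)":
            df["zip_code"].astype(str).str.fullmatch(r"\d{5}"),
        "Completeness (required name fields present; middle name optional)":
            df[["first_name", "last_name"]].notna().all(axis=1),
        "Accuracy (date of birth in one unambiguous ISO format)":
            pd.to_datetime(df["date_of_birth"], format="%Y-%m-%d", errors="coerce").notna(),
        "Validity (age within the range expected for athletes)":
            df["age"].between(16, 45),
    }
    # Each metric = share of records passing its check, as a percentage.
    return pd.Series({name: 100 * passed.mean() for name, passed in checks.items()}).round(1)

print(quality_scorecard(pd.read_csv("athlete_data.csv")))  # hypothetical file
```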