Here, "agent" refers to an insurer or risk manager.

1. How DDDM Applies to the Agent's Role

  • The actuary's role has become more vital in this era of data-driven decision making[^1]
  • Traditional: internal and external data, analyzed by humans
  • Modern: data analytics has improved the available data and enabled DDDM and predictive modelling
  • Previously, data analysis was done by humans; today, a variety of analytical methods exist
  • Predictive modelling can help an insurer with:
    • more accurate underwriting
    • more precise pricing
    • more efficient claims processing (including fraud detection)
    • a wide variety of other business problems

Improve Business Results

  1. Automated decision making: e.g., online quotes for personal auto generated by a computer algorithm
  2. Organizing large volumes of new data: e.g., telematics data on speed, braking patterns, left turns[^2], and distance traveled
  3. Discovering new relationships in data: e.g., identify the characteristics of workers who have never had an accident and use that information to improve safety for all workers
  4. Exploring new sources of data: e.g., text mining claim adjusters' notes to build an automated system that predicts a claim's severity (see the sketch below)
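
A minimal sketch of item 4, assuming scikit-learn; the adjuster notes, the labels, and the binary notion of "severity" are invented toy data, not the source's actual method.

```python
# Text mining adjusters' notes to predict claim severity: vectorize the
# free text, fit a classifier, then score new notes as they arrive.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "minor bumper scratch, no injuries reported",
    "vehicle totaled, driver hospitalized overnight",
    "small water stain on ceiling, quick repair",
    "structure fire, total loss of dwelling",
]
severity = [0, 1, 0, 1]  # 1 = severe claim (toy labels)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(notes, severity)

# Score an incoming claim note for severity
print(model.predict_proba(["kitchen fire, extensive smoke damage"])[0, 1])
```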

(Embedded diagram: Planning an Insurer Data Modeling Project)

Data Engineering and Processing

  • allow management of data too large for conventional systems
  • models are developed to analyze the data in the context of the insurer's area of interest

DDDM

  • solve business problems
  • achieve greater efficiency
  • gain a competitive advantage

Two approaches

Descriptive

  • used when there is a specific problem that can be solved through data science
  • a use-once, tailor-made, one-time analysis; once the problem is solved, the approach is discarded (a minimal sketch follows)
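
A minimal sketch of what a use-once descriptive analysis might look like, assuming pandas; the claims data and the business question are invented.

```python
# A one-time descriptive analysis: answer one specific business question,
# report the result, and discard the script.
import pandas as pd

claims = pd.DataFrame({
    "region": ["NE", "NE", "SW", "SW"],
    "paid": [1200, 800, 5000, 300],
})

# One-off question: which region drives the highest average paid claim?
print(claims.groupby("region")["paid"].mean().sort_values(ascending=False))
```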

Predictive

  • a model that can be used repeatedly to provide information for DDDM (see the sketch below)
  • e.g., automated underwriting
  • e.g., continuously receiving data from sensors (monitoring required safety equipment)
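
By contrast, a minimal sketch of the predictive approach: train once, then reuse the model each time new sensor data arrives. Assumes scikit-learn; the features, data, and model choice are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical historical features per worksite:
# [share of time harness worn, guardrail gaps found, inspections passed]
X_hist = [[0.95, 0, 12], [0.40, 3, 5], [0.88, 1, 10], [0.30, 4, 3]]
y_hist = [0, 1, 0, 1]  # 1 = an injury claim later occurred

model = RandomForestClassifier(random_state=0).fit(X_hist, y_hist)

def score_site(reading):
    """Reusable scoring step, called for each new sensor reading."""
    return model.predict_proba([reading])[0, 1]

print(score_site([0.50, 2, 6]))  # estimated probability of a future claim
```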

Data-driven solutions can have a real impact, e.g., analyzing snow slip-and-fall claims and preventing future slips by adopting safety mechanisms.

A Model for Data-Driven Decision Making in Risk Management and Insurance: rules of thumb.

Define the agent's problem first (without business context, the analysis is ineffective).

2. Applying Data Quality Principles to an Insurer Data Modeling Project

  • GIGO: garbage in, garbage out
  • Actuaries need to be familiar with data quality issues and data management principles

What is data quality?

  • how well the data meets predictive modelling needs and expectations
  • e.g., what is the right amount of premium at a given level of deductible?
  • collect enough data to get statistically significant results

So, data quality means the right data (fit for purpose) in the appropriate volume.

Ensuring data quality requires a systematic and purpose-driven review of data.

Characteristics of Quality Data

  • First, understand how the data will ultimately be applied in the predictive context
    • If the goal is model prediction, the quality of the data ultimately depends on how effective the model is. BUT we won't know that at the beginning, when we are still selecting the data to use
  • So how do we ensure data quality up front? There is no clear-cut answer
  • Data quality can be expressed as a percentage or degree

Factors

Mnemonic to remember the factors: C.R.T.L.A.V.C (Completeness, Reasonability, Timeliness, Lineage, Accuracy, Validity, Consistency)

Running example for the factors below: wildfire-related homeowners claims.

Validity

  • relevance or suitability of the data for an application or project
  • different from accuracy, though validity can be described as accuracy within predefined or accepted parameters
  • e.g., in the wildfire example, a dataset that includes auto claims might not be considered valid for homeowners claims (little predictive value)

Accuracy

  • how well the data represents the true values or measurements
  • how true the business information is
  • e.g., ZIP code 10021 is valid for New York City, but the data is not accurate if it's intended to represent a loss exposure in rural upstate New York

Completeness

  • whether all the variables are present[^3]
  • e.g., the results for February were unreadable or absent

Reasonability

  • given the applicable business conditions, does the data make sense?
  • e.g., an injured party's age listed as over 100 years

Timeliness

  • appropriateness of the data's recency for the analysis
  • e.g., if a new coverage has been offered only in the last six months, omit data from before that six-month time frame

Lineage

  • tracing data back to its source to explain unexpectedly inconsistent data:
    • identifying errors
    • miscodings
    • different reporting systems
  • e.g., data generated and collected by organizations before appropriate business-handling rules were put in place

Consistency

  • to what extent do data stored in multiple locations match each other?
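
A minimal sketch applying several of these factors to a hypothetical wildfire homeowners-claims table, assuming pandas; every column name, value, and cutoff is invented for illustration.

```python
import pandas as pd

claims = pd.DataFrame({
    "policy_type": ["HO", "AUTO", "HO", "HO"],
    "claimant_age": [45, 38, 102, 29],
    "zip_code": ["90210", "10021", "9021A", "95610"],
    "loss_date": pd.to_datetime(
        ["2024-07-01", "2023-01-10", "2024-08-15", "2024-09-01"]),
})

valid = claims["policy_type"] == "HO"                # validity: homeowners only
reasonable = claims["claimant_age"] <= 100           # reasonability: plausible ages
zip_ok = claims["zip_code"].str.fullmatch(r"\d{5}")  # reasonability: 5-digit ZIP
timely = claims["loss_date"] >= "2024-04-01"         # timeliness: last six months

# Keep only rows passing the purpose-driven checks
print(claims[valid & reasonable & zip_ok & timely])
```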

Richard Morales

  • We need to get data quality right
  • There is a huge amount of data out there
  • But before making key business decisions, we should ensure the data is right (where it comes from, etc.)

Data Quality Management

  • we need metrics!
    • quantifiable
    • each metric should demonstrate its applicability to the correlated predictive models

Validity: is the data within the acceptable range expected by the business?

  • correctly stored and formatted
  • conforms to internal data governance standards

Accuracy: is the data correct? Is the form of the data unambiguous and consistent?

Completeness: how comprehensive is the data with respect to what it claims to represent, and to what extent can it be used for that purpose?

  • data can be considered complete even if optional info (irrelevant to the purpose of the study) is missing

Reasonability: does the data have a level of consistency that mirrors business conditions? (e.g., a U.S. ZIP code containing letters fails this test)

Timeliness: is the data recent enough for the time frame the analysis covers? (see the six-month coverage example above)

| Inconsistency | Category | Reason |
| --- | --- | --- |
| U.S. ZIP code containing letters | Reasonability | Anyone with domain knowledge knows this is not possible |
| Middle names are missing | Completeness | Still complete, because middle names are not used in the study |
| Date of birth is 01-02-2002 | Accuracy | Ambiguous form: we can't tell US (month-first) from India (day-first) format |
| Age is 100 years | Validity | Outside the acceptable range; we are looking at athletes |
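
Since the notes above say quality factors should be quantifiable (a percentage or degree), a minimal sketch of expressing two of them as percentage metrics over toy values:

```python
# Toy ages with one missing value; a real check would run on the full dataset
ages = [45, 38, 102, None, 29]

present = [a for a in ages if a is not None]
completeness_pct = 100 * len(present) / len(ages)                        # values present
reasonability_pct = 100 * sum(a <= 100 for a in present) / len(present)  # plausible ages

print(f"completeness: {completeness_pct:.0f}%")    # 80%
print(f"reasonability: {reasonability_pct:.0f}%")  # 75%
```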

3.

Footnotes

[^1]: Let's call it DDDM for fun.

[^2]: Apparently, left turns are illegal in some countries, except when there are traffic lights and in a few more scenarios.

[^3]: That it purports to contain.