
Evolving into a Python Chainladder Ninja

Aggregating a Triangle from Data

Task

We wish to find the OYDY (origin year by development year) triangle for the Workers' Comp line of business (LOB).

Let's start by loading data.

import chainladder as cl

clrd = cl.load_sample('clrd') #(1)
clrd #(2)
  1. Load the dataset (It is already a triangle in this case)
  2. Check its summary to familiarize yourself with the columns and indices.
clrd['LOB'].unique() #(1)
  1. We want to select wkcomp, so first list the existing lines of business by selecting the 'LOB' index.
clrd = clrd[clrd['LOB'] == 'wkcomp']['CumPaidLoss'] #(1)
clrd #(2)
  1. Select the appropriate triangle
  2. Check the summary. Notice that there are 132 rows, one for each unique GRNAME in the wkcomp LOB, each representing a 10x10 triangle.
clrd.sum() #(1)
  1. This gives us the aggregated triangle for wkcomp LOB summed across all GRNAME
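
To see what summing across the index does, here is a minimal sketch in plain pandas rather than chainladder. The company names and loss amounts are made up for illustration; only the idea (one triangle per GRNAME, added together cell by cell) comes from the steps above.

```python
import pandas as pd

# Hypothetical long-format data: two companies, cumulative paid losses.
records = pd.DataFrame({
    "GRNAME": ["A", "A", "A", "B", "B", "B"],
    "origin": [1988, 1988, 1989, 1988, 1988, 1989],
    "development": [12, 24, 12, 12, 24, 12],
    "CumPaidLoss": [100.0, 150.0, 120.0, 40.0, 70.0, 50.0],
})

# One origin x development triangle, summed across every GRNAME.
aggregated = records.pivot_table(
    index="origin", columns="development", values="CumPaidLoss", aggfunc="sum"
)
print(aggregated)
```

The cell for origin 1989 at development 24 stays empty because neither company has reported it yet, which is what gives the result its triangular shape.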
Indices

model.std_residuals_.shape → (1, 1, 9, 9)

  • The first two axes select which triangle you want.
    • axis 1: The index along which data can be aggregated
    • axis 2: The column (e.g. incurred losses or paid losses)
  • The last two axes are the triangle itself.
    • axis 3: The origin period (accident year)
    • axis 4: The development period

So, if we do

model.std_residuals_.iloc[0,0,1,3]

We get

          48
1982  1.5900

and

model.std_residuals_.iloc[0,0]

gives us the full triangle.
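
The same four-axis layout can be sketched with a plain NumPy array. This is only an analogy under stated assumptions: the numbers are placeholders, and unlike chainladder, NumPy returns a bare number for a single cell instead of keeping the origin and development labels.

```python
import numpy as np

# Stand-in for a (index, column, origin, development) block of values.
values = np.arange(1 * 1 * 9 * 9, dtype=float).reshape(1, 1, 9, 9)

single_cell = values[0, 0, 1, 3]  # second origin period, fourth development period
full_triangle = values[0, 0]      # fixes the first two axes, leaving a 9 x 9 block

print(values.shape, full_triangle.shape)
```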

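What does model.std_residuals_.iloc[...,:-1].mean('origin') mean?

In our example, the ... (ellipsis) stands for all the leading axes, so model.std_residuals_.iloc[...,:-1] is the same as model.std_residuals_.iloc[:,:,:,:-1]: we select everything except the last development period (because it is empty). Calling .mean('origin') then averages what remains across the origin periods.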
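
The ellipsis shorthand behaves the same way on a NumPy array, which makes it easy to check. A small sketch with an arbitrary four-axis array (the shape is chosen only for illustration):

```python
import numpy as np

values = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

with_ellipsis = values[..., :-1]    # all leading axes, then drop the last entry of the final axis
spelled_out = values[:, :, :, :-1]  # the same selection written in full

print(with_ellipsis.shape)
```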

Aggregations

Aggregation refers to methods like sum, mean, product, etc.

model.std_residuals_.iloc[0,0].mean('origin') will give us

          12      24      36      48      60      72      84      96  108
1981  0.3612  0.0872  0.0671  0.0876  0.0864  0.0156  0.0643  0.0248

which means that, for each development period, we take the average of the standardized residuals across the origin periods in that development period.

model.std_residuals_.iloc[0,0].mean('development') will give us a single column wherein we have averaged out the residuals from all development periods within an origin period.
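
The two directions of averaging can be sketched with NumPy on a tiny made-up triangle. The assumption here is that empty cells are skipped rather than counted, which np.nanmean does explicitly; the residual values are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 3 residual triangle; NaN marks cells with no value.
residuals = np.array([
    [1.0, 2.0, 3.0],
    [3.0, 4.0, np.nan],
    [5.0, np.nan, np.nan],
])

# Average over origin periods (rows): one value per development period.
mean_by_development = np.nanmean(residuals, axis=0)

# Average over development periods (columns): one value per origin period.
mean_by_origin = np.nanmean(residuals, axis=1)

print(mean_by_development, mean_by_origin)
```

Averaging along 'origin' collapses the rows and leaves a single row, while averaging along 'development' collapses the columns and leaves a single column.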

Plotting

We often want the development period on the x-axis, but .plot() puts the rows on the x-axis and our triangles have the origin periods as rows. Thus we transpose the triangle before plotting it.

model.std_residuals_.T.plot( #(1)
    style='.', color='gray', legend=False, grid=True, ax=ax00,
    xlabel='Development Month', ylabel='Weighted Standardized Residuals',
)
  1. .T gives us the transpose and .plot() plots it using the matplotlib library. ax00 is one of the axes created with plt.subplots in the Subplots section.
Plotting Style Shortcuts

Component   Shortcut  Description
Color       b         Blue (default)
            g         Green
            r         Red
            c         Cyan
            m         Magenta
            y         Yellow
            k         Black
            w         White
Marker      .         Point marker
            o         Circle marker
            v         Triangle down marker
            ^         Triangle up marker
            <         Triangle left marker
            >         Triangle right marker
            s         Square marker
            *         Star marker
            +         Plus marker
            x         X marker
Line Style  -         Solid line
            --        Dashed line
            -.        Dash-dot line
            :         Dotted line
            (none)    No line (markers only)
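
The shortcuts can be combined into one format string. A minimal sketch using matplotlib directly, with made-up data points; the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# "g--o" = green color, dashed line, circle markers
(line,) = ax.plot([12, 24, 36], [0.4, 0.1, -0.2], "g--o")

print(line.get_color(), line.get_linestyle(), line.get_marker())
```

A string such as '.' contains only a marker, which is why the residual plots in this section show points with no connecting line.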
Subplots
import matplotlib.pyplot as plt

fig, ((ax00, ax01), (ax10, ax11)) = plt.subplots(ncols=2, nrows=2, figsize=(10,8)) #(1)

model.std_residuals_.T.plot(
    style='.', color='gray', legend=False, grid=True, ax=ax00, #(2)
    xlabel='Development Month', ylabel='Weighted Standardized Residuals',
)


model.std_residuals_.plot(
    style='.', color='gray', legend=False, ax=ax01,
    xlabel='Origin Period', sharey=True #(3)
)
  1. plt.subplots returns four axes objects, one for each panel of the 2x2 grid.
  2. Pass ax=ax00 to choose the panel you want to plot on.
  3. Use sharey=True when you want the y-axis values to be in sync.
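
Another way to keep the y-axes in sync, sketched here with plain matplotlib and made-up points, is to ask for linked axes when the figure is created. This is an alternative under stated assumptions, not necessarily how the example above behaves.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# sharey=True on plt.subplots links the y-axis limits of every panel.
fig, (left, right) = plt.subplots(ncols=2, sharey=True, figsize=(10, 4))

left.plot([12, 24, 36], [-1.5, 0.2, 2.0], ".")
right.plot([1981, 1982, 1983], [-0.3, 0.1, 0.4], ".")

print(left.get_ylim() == right.get_ylim())
```

Because the limits are linked, the right panel is drawn on the wider range needed by the left panel, so points at the same height in both panels represent the same residual value.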
Joining Tables
import pandas as pd

fitted = (raa[raa.valuation < raa.valuation_date] * model.ldf_.values).unstack().rename('Fitted Values') #(1)

residuals = model.std_residuals_.unstack().rename('Residual')

pd.concat([fitted, residuals], axis=1) #(2)
  1. The .unstack() method moves the columns into the row index, flattening the table into a single Series.
  2. axis=1 joins the Series side by side as columns.

This joins the columns as follows:

                 Fitted Values      Residual
12 - 1/1/1981      15032.78556  -0.572151204
12 - 1/1/1982       317.932017   2.307508065
12 - 1/1/1983        10227.813   -0.12673661
12 - 1/1/1984      16961.37317  -0.430542702
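
The unstack-then-concat pattern can be tried on two small pandas frames. The frames below are hypothetical stand-ins for the fitted values and residuals; the numbers are invented.

```python
import pandas as pd

# Hypothetical origin x development frames standing in for the two triangles.
fitted_frame = pd.DataFrame({12: [10.0, 20.0], 24: [15.0, 30.0]}, index=[1981, 1982])
residual_frame = pd.DataFrame({12: [-0.5, 2.3], 24: [0.1, -0.4]}, index=[1981, 1982])

# unstack() turns each frame into a Series keyed by (development, origin).
fitted = fitted_frame.unstack().rename("Fitted Values")
residuals = residual_frame.unstack().rename("Residual")

# axis=1 lines the two Series up side by side on their shared index.
joined = pd.concat([fitted, residuals], axis=1)
print(joined)
```

Each row of the result is one (development, origin) cell, with the fitted value and its residual next to each other.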

Mental Notes

  1. pd.concat is very useful when you need to put tables together. We work in two paradigms: one is working with triangles (actuarial work), and the other is taking the results and doing some data science with them. For the second, use .to_frame() to convert the triangle into a data frame, after which the conventional data science tooling applies.