Evolving into a Python Chainladder Ninja

Aggregating a Triangle from Data

Task

We want to build the OYDY (origin year by development year) triangle for the Workers' Compensation (wkcomp) line of business (LOB).

Let's start by loading the data.

import chainladder as cl

clrd = cl.load_sample('clrd') #(1)
clrd #(2)

(1) Load the sample dataset. In this case it is already a Triangle.
(2) Display its summary to familiarize yourself with the columns and the index.

clrd['LOB'].unique() #(1)

(1) We want to select wkcomp, so first check which lines of business exist by looking at the unique values of the 'LOB' index.

clrd = clrd[clrd['LOB'] == 'wkcomp']['CumPaidLoss'] #(1)
clrd #(2)

(1) Keep only the wkcomp rows of the index and the CumPaidLoss column.
(2) Check the summary. There are 132 rows, one for each unique GRNAME (company) in the wkcomp LOB, and each row holds a 10x10 triangle.

clrd.sum() #(1)

(1) This gives the aggregated triangle for the wkcomp LOB, summed across all GRNAME values.
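The same filter-then-sum logic can also be written in plain pandas on long-format records, one row per company, origin and lag. This is a minimal sketch and not chainladder's implementation. The column names (GRNAME, LOB, AccidentYear, DevelopmentLag, CumPaidLoss) are assumed to resemble the raw CLRD fields, and the numbers are made up.

```python
import pandas as pd

# Hypothetical long-format records (one row per company, LOB, origin, lag).
# Column names are assumed to resemble the raw CLRD fields; values are made up.
data = pd.DataFrame({
    "GRNAME": ["A", "A", "A", "B", "B", "B", "C"],
    "LOB": ["wkcomp"] * 6 + ["comauto"],
    "AccidentYear": [1988, 1988, 1989, 1988, 1988, 1989, 1988],
    "DevelopmentLag": [1, 2, 1, 1, 2, 1, 1],
    "CumPaidLoss": [100, 150, 120, 50, 80, 60, 999],
})

# Keep only the Workers' Comp line of business.
wkcomp = data[data["LOB"] == "wkcomp"]

# Sum across companies (GRNAME), then pivot lags into columns:
# rows are origin years, columns are development lags (the OYDY layout).
triangle = (
    wkcomp.groupby(["AccidentYear", "DevelopmentLag"])["CumPaidLoss"]
    .sum()
    .unstack("DevelopmentLag")
)
print(triangle)
```

Each row of the result is an origin year and each column is a development lag. The cell that has not developed yet stays NaN, which is the familiar triangle shape.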

Indices

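Suppose model is a fitted chainladder development estimator, for example model = cl.Development().fit(raa) with raa = cl.load_sample('raa'). Its std_residuals_ attribute is a Triangle of weighted standardized residuals, and

model.std_residuals_.shape → (1, 1, 9, 9)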

The first two axes select which triangle you want:
- axis 0 (index): the segment, e.g. company (GRNAME) or line of business. This is the axis that sum() aggregates over by default.
- axis 1 (columns): the measure, e.g. incurred losses or paid losses.
The last two axes are the triangle itself:
- axis 2 (origin): the origin period (accident year)
- axis 3 (development): the development period

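So, if we do

model.std_residuals_.iloc[0,0,1,3]

we select the first index, the first column, the second origin period (1982) and the fourth development period (48 months), and get a 1x1 triangle: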

	48
1982	1.5900

and

model.std_residuals_.iloc[0,0]

gives the full 9x9 origin-by-development triangle for the first index and column.
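The four-axis layout and positional indexing can be mimicked with a plain NumPy array. This sketch assumes that iloc positions map onto array axes in the order index, column, origin, development. The values are synthetic and chosen so that each entry encodes its own position.

```python
import numpy as np

# Hypothetical stand-in for a Triangle's values: 3 companies (index),
# 2 measures (columns), 9 origin periods and 9 development periods.
# Entry (i, c, o, d) equals i*1000 + c*100 + o*10 + d, so positions are easy to read.
i, c, o, d = np.indices((3, 2, 9, 9))
values = (i * 1000 + c * 100 + o * 10 + d).astype(float)

# Positional indexing in axis order: index, column, origin, development.
cell = values[0, 0, 1, 3]   # second origin, fourth development period
grid = values[0, 0]         # full 9x9 origin-by-development grid

# Aggregating over the index axis. keepdims=True keeps all four axes,
# much like clrd.sum() still returns a four-axis Triangle.
total = values.sum(axis=0, keepdims=True)

print(cell)         # 13.0
print(grid.shape)   # (9, 9)
print(total.shape)  # (1, 2, 9, 9)
```

NumPy returns a bare scalar or 2D array here. Judging by the labelled output shown earlier, chainladder's iloc keeps the result wrapped as a Triangle instead.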

What does model.std_residuals_.iloc[...,:-1].mean('origin') mean?

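The ... (Ellipsis) stands for "all of the remaining axes". On our four-axis triangle, model.std_residuals_.iloc[...,:-1] is therefore the same as model.std_residuals_.iloc[:,:,:,:-1]. It keeps everything except the last development period, because that period is empty. The .mean('origin') then averages what is left over the origin axis.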
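NumPy uses the same Ellipsis convention, so the equivalence can be checked on a synthetic array shaped like std_residuals_. The values are arbitrary.

```python
import numpy as np

# Hypothetical 4D array shaped like std_residuals_: (index, column, origin, development).
resid = np.arange(1 * 1 * 9 * 9, dtype=float).reshape(1, 1, 9, 9)

# `...` expands to as many full slices (`:`) as needed. On a 4D array, both
# selections keep the first three axes whole and drop the last development period.
a = resid[..., :-1]
b = resid[:, :, :, :-1]

print(a.shape)               # (1, 1, 9, 8)
print(np.array_equal(a, b))  # True
```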

Aggregations

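Aggregations are methods such as sum, mean and product that collapse an axis of the triangle. You can name the axis to aggregate over, e.g. 'origin' or 'development'.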

model.std_residuals_.iloc[0,0].mean('origin') will give us

	12	24	36	48	60	72	84	96	108
1981	0.3612	0.0872	0.0671	0.0876	0.0864	0.0156	0.0643	0.0248

That is, for each development period we average the standardized residuals across all origin periods. The 108 column is blank because that development period has no residuals.

model.std_residuals_.iloc[0,0].mean('development') gives a single column in which, for each origin period, the residuals from all development periods are averaged.
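Both aggregations can be sketched with NumPy on a small, made-up residual grid. We assume that empty cells behave like NaN and are skipped when averaging, which is what np.nanmean does. The empty last development column is dropped first, as with iloc[..., :-1].

```python
import numpy as np

nan = np.nan
# Made-up residual grid: rows are origin periods, columns are development periods.
# NaN marks cells with no residual; the last development column is empty.
resid = np.array([
    [ 0.5, -1.0,  0.3, nan],
    [ 1.5,  0.4,  nan, nan],
    [-0.5,  nan,  nan, nan],
])

trimmed = resid[:, :-1]  # drop the empty last development period

# Like mean('origin'): one average per development period (down each column).
by_development = np.nanmean(trimmed, axis=0)
# Like mean('development'): one average per origin period (across each row).
by_origin = np.nanmean(trimmed, axis=1)

print(by_development)  # ≈ [0.5, -0.3, 0.3]
print(by_origin)       # ≈ [-0.0667, 0.95, -0.5]
```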

Plotting

We usually want the development period on the x-axis, but a triangle stores development periods as columns. pandas plots the row index on the x-axis and draws one series per column, so we transpose the triangle before plotting it.

model.std_residuals_.T.plot( #(1)
    style='.', color='gray', legend=False, grid=True,
    xlabel='Development Month', ylabel='Weighted Standardized Residuals',
)

(1) .T gives the transpose, and .plot() draws it with pandas plotting, which uses matplotlib under the hood.
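To see why the transpose matters, here is a sketch with an ordinary pandas DataFrame standing in for the triangle. The values are made up. It relies only on standard pandas behaviour: DataFrame.plot puts the row index on the x-axis and draws one series per column.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import pandas as pd

nan = float("nan")
# Made-up triangle: origin years as rows, development months as columns.
tri = pd.DataFrame(
    [[1.0, 1.5, 1.6],
     [1.2, 1.7, nan],
     [0.9, nan, nan]],
    index=[1981, 1982, 1983],
    columns=[12, 24, 36],
)

# DataFrame.plot uses the row index as the x-axis and draws one series per
# column. Transposing makes development months the index (x-axis) and turns
# each origin year into its own series.
flipped = tri.T
ax = flipped.plot(style=".", color="gray", legend=False, grid=True,
                  xlabel="Development Month", ylabel="Value")

print(list(flipped.index))    # [12, 24, 36]
print(list(flipped.columns))  # [1981, 1982, 1983]
```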

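Plotting Style Shortcuts

The style argument accepts matplotlib format-string codes. Combine at most one code from each component, e.g. 'r.' for red points or 'k--' for a black dashed line.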

Component	Shortcut	Description
Color	`b`	Blue
	`g`	Green
	`r`	Red
	`c`	Cyan
	`m`	Magenta
	`y`	Yellow
	`k`	Black
	`w`	White
Marker	`.`	Point marker
	`o`	Circle marker
	`v`	Triangle down marker
	`^`	Triangle up marker
	`<`	Triangle left marker
	`>`	Triangle right marker
	`s`	Square marker
	`*`	Star marker
	`+`	Plus marker
	`x`	X marker
Line Style	`-`	Solid line
	`--`	Dashed line
	`-.`	Dash-dot line
	`:`	Dotted line
	(none)	No line (markers only)
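A short matplotlib sketch of combining the codes above into format strings. The data points are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [12, 24, 36, 48]         # arbitrary development months
y = [0.4, -0.2, 0.1, -0.3]   # arbitrary residuals

fig, ax = plt.subplots()
# Each format string combines at most one code from each component of the table.
points, = ax.plot(x, y, "r.")    # red point markers, no line
dashed, = ax.plot(x, y, "k--")   # black dashed line, no markers
circles, = ax.plot(x, y, "bo-")  # blue circle markers joined by a solid line
```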

Subplots

import matplotlib.pyplot as plt

fig, ((ax00, ax01), (ax10, ax11)) = plt.subplots( #(1)
    ncols=2, nrows=2, figsize=(10,8),
    sharey='row', #(3)
)

model.std_residuals_.T.plot(
    style='.', color='gray', legend=False, grid=True, ax=ax00, #(2)
    xlabel='Development Month', ylabel='Weighted Standardized Residuals',
)

model.std_residuals_.plot(
    style='.', color='gray', legend=False, ax=ax01,
    xlabel='Origin Period',
)

(1) plt.subplots returns the figure and a 2x2 grid of Axes, one per panel, so you get four ax values.
(2) Pass ax=ax00 (or ax01, ax10, ax11) to choose which panel to draw on.
(3) sharey='row' keeps the y-axes of panels in the same row in sync. pandas documents the sharey argument of .plot() as applying only when subplots=True, so set it on plt.subplots instead.
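Here is a self-contained version of the same layout, with a small made-up DataFrame in place of model.std_residuals_. It shows sharey='row' keeping the two top panels on one y-scale. Only the top row is drawn; ax10 and ax11 stay empty.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

nan = float("nan")
# Made-up residuals standing in for model.std_residuals_:
# origin years as rows, development months as columns.
resid = pd.DataFrame(
    [[0.5, -1.0, 0.3],
     [1.5, 0.4, nan],
     [-0.5, nan, nan]],
    index=[1981, 1982, 1983],
    columns=[12, 24, 36],
)

# sharey="row" links the y-axes of panels in the same row.
fig, ((ax00, ax01), (ax10, ax11)) = plt.subplots(
    ncols=2, nrows=2, figsize=(10, 8), sharey="row")

# Left panel: development on the x-axis (transposed).
resid.T.plot(style=".", color="gray", legend=False, grid=True, ax=ax00,
             xlabel="Development Month", ylabel="Residual")
# Right panel: origin on the x-axis (not transposed).
resid.plot(style=".", color="gray", legend=False, ax=ax01,
           xlabel="Origin Period")
```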

Joining Tables

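import pandas as pd

# raa is the triangle the model was fit on, e.g. raa = cl.load_sample('raa')
fitted = (raa[raa.valuation<raa.valuation_date] * model.ldf_.values).unstack().rename('Fitted Values') #(1)

residuals = model.std_residuals_.unstack().rename('Residual')

pd.concat([fitted, residuals], axis=1) #(2)

(1) raa[raa.valuation<raa.valuation_date] drops the latest diagonal. Multiplying each remaining cell by its age-to-age factor (model.ldf_) gives the fitted value at the next development age. .unstack() then flattens the 2D triangle into a long Series indexed by (development, origin) pairs, and .rename() sets the Series name, which becomes the column header.
(2) axis=1 places the Series side by side as columns, aligned on their shared index.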

The result has one row per (development, origin) pair, shown here as development - origin:

	Fitted Values	Residual
12 - 1/1/1981	15032.78556	-0.572151204
12 - 1/1/1982	317.932017	2.307508065
12 - 1/1/1983	10227.813	-0.12673661
12 - 1/1/1984	16961.37317	-0.430542702
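The unstack-and-concat step can be sketched on two small made-up DataFrames standing in for the fitted-value and residual triangles. This is plain pandas, not chainladder's own code path.

```python
import pandas as pd

# Two made-up 2x2 grids (origin rows x development columns), standing in for
# the fitted-value and residual triangles.
fitted_grid = pd.DataFrame([[100.0, 150.0], [110.0, 160.0]],
                           index=[1981, 1982], columns=[12, 24])
resid_grid = pd.DataFrame([[0.5, -0.2], [1.1, 0.3]],
                          index=[1981, 1982], columns=[12, 24])

# DataFrame.unstack() on a single-level index returns a long Series
# indexed by (column, row) pairs, i.e. (development, origin).
fitted = fitted_grid.unstack().rename("Fitted Values")
residuals = resid_grid.unstack().rename("Residual")

# axis=1 aligns the Series on their shared index as side-by-side columns;
# axis=0 would instead stack them end to end.
joined = pd.concat([fitted, residuals], axis=1)
print(joined)
```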

Mental Notes

pd.concat is very useful whenever you want to put tables together. We work in two paradigms. One is working with triangles, which is the actuarial work. The other is taking the results and doing data science with them. For the second, we use .to_frame() to convert a triangle into a pandas DataFrame, after which the usual data science tooling applies.