Evolving into a Python Chainladder Ninja
Aggregating a Triangle from Data
Task
We wish to find the OYDY (origin-year by development-year) triangle for the Workers' Comp line of business (LOB).
Let's start by loading data.
- Load the dataset (it is already a triangle in this case).
- Check its summary to familiarize yourself with the columns and indices.
- We wish to select `wkcomp`, so check the existing lines of business by selecting the `'LOB'` index.
- Select the appropriate triangle.
- See the summary; notice that there are `132` rows corresponding to each unique GRNAME in the `wkcomp` LOB, each representing a `10x10` triangle.
- Summing the selection gives us the aggregated triangle for the `wkcomp` LOB across all GRNAME.
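The steps above can be sketched with plain pandas on made-up data. This is not the chainladder API: the records, company names, and amounts below are hypothetical, and the sketch only illustrates what selecting the `wkcomp` LOB and summing across GRNAME does to each cell of the triangle.

```python
import pandas as pd

# Hypothetical long-format records: one row per company, origin year and development age.
records = pd.DataFrame(
    {
        "LOB": ["wkcomp"] * 6 + ["comauto"] * 3,
        "GRNAME": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
        "origin": [1988, 1988, 1989] * 3,
        "development": [12, 24, 12] * 3,
        "paid": [100.0, 150.0, 110.0, 40.0, 70.0, 50.0, 9.0, 9.0, 9.0],
    }
)

# Check which lines of business exist before selecting one.
lobs = sorted(records["LOB"].unique())

# Select the wkcomp rows, then sum across companies cell by cell.
wkcomp = records[records["LOB"] == "wkcomp"]
triangle = wkcomp.pivot_table(
    index="origin", columns="development", values="paid", aggfunc="sum"
)

print(lobs)
print(triangle)
```

The cell for origin 1989 at age 24 stays empty because no company has reported it yet, which is what gives the result its triangular shape.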
Indices
`model.std_residuals_.shape` \(\to\) `(1, 1, 9, 9)`
- The first two axes select the triangle you want:
    - axis 1: the `index` along which data should be aggregated
    - axis 2: the `column` (e.g. incurred losses or paid losses)
- The next two axes correspond to the triangle itself:
    - axis 3: the `origin` period (accident year)
    - axis 4: the `development` period
So, if we do
We get
| | 48 |
|---|---|
| 1982 | 1.5900 |
and
gives us the full triangle.
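What does `model.std_residuals_.iloc[..., :-1].mean('origin')` mean?
In our example, the `...` stands for all the leading axes, so the selection is equivalent to `model.std_residuals_.iloc[:, :, :, :-1]`: we select everything but the last development period (because it is empty). `.mean('origin')` then averages what remains over the origin periods.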
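A minimal sketch of the same positional indexing, using a NumPy array as a stand-in for the triangle's values. The assumption is that `.iloc` follows ordinary positional slicing over the four axes, as described above; the fill values are made up and only encode the position of each cell.

```python
import numpy as np

# Stand-in for a (1, 1, 9, 9) triangle: NaN marks empty cells.
values = np.full((1, 1, 9, 9), np.nan)
for i in range(9):                      # origin position
    for j in range(min(9 - i, 8)):      # development position; last column stays empty
        values[0, 0, i, j] = 10 * i + j

single_cell = values[0, 0, 1, 3]        # one origin/development cell
full_triangle = values[0, 0]            # the whole 9x9 triangle
trimmed = values[..., :-1]              # drop the last development period
explicit = values[:, :, :, :-1]         # same selection, every axis spelled out

print(single_cell, full_triangle.shape, trimmed.shape)
```

The ellipsis form is shorter and keeps working if the number of leading axes changes, which is why it is convenient for "everything but the last development period".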
Aggregations
Aggregation refers to methods like sum, mean, product, etc.
`model.std_residuals_.iloc[0,0].mean('origin')` will give us
| | 12 | 24 | 36 | 48 | 60 | 72 | 84 | 96 | 108 |
|---|---|---|---|---|---|---|---|---|---|
| 1981 | 0.3612 | 0.0872 | 0.0671 | 0.0876 | 0.0864 | 0.0156 | 0.0643 | 0.0248 | |
which means that, for each development period, we take the average of the standardized residuals across the origin periods in that development period.
`model.std_residuals_.iloc[0,0].mean('development')` will give us a single column in which the residuals from all development periods within each origin period have been averaged.
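The two directions of averaging can be illustrated with NumPy on a made-up 3x3 triangle. This is a sketch rather than the library's implementation, and it assumes that empty cells are skipped instead of being counted as zero.

```python
import numpy as np

# Made-up residuals: rows are origin periods, columns are development periods.
residuals = np.array(
    [
        [0.5, -0.2, 0.1],
        [-0.3, 0.4, np.nan],
        [0.7, np.nan, np.nan],
    ]
)

# Averaging over the origin axis (rows) leaves one value per development period.
mean_over_origin = np.nanmean(residuals, axis=0)

# Averaging over the development axis (columns) leaves one value per origin period.
mean_over_development = np.nanmean(residuals, axis=1)

print(mean_over_origin)
print(mean_over_development)
```

The first result is a single row and the second a single column, matching the shapes described above.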
Plotting
We often want the development period on the x-axis, but `.plot()` puts the rows of the triangle (the origin periods) there. We therefore transpose the triangle before plotting whenever development should run along the x-axis.
model.std_residuals_.T.plot( #(1)
style='.', color = 'gray', legend=False, grid=True, ax=ax00,
xlabel='Development Year', ylabel='Weighted Standardized Residuals',
)
1. `.T` gives us the transpose, and `.plot()` plots it using the matplotlib library.
Plotting Style Shortcuts
| Component | Shortcut | Description |
|---|---|---|
| Color | `b` | Blue (default) |
| | `g` | Green |
| | `r` | Red |
| | `c` | Cyan |
| | `m` | Magenta |
| | `y` | Yellow |
| | `k` | Black |
| | `w` | White |
| Marker | `.` | Point marker |
| | `o` | Circle marker |
| | `v` | Triangle down marker |
| | `^` | Triangle up marker |
| | `<` | Triangle left marker |
| | `>` | Triangle right marker |
| | `s` | Square marker |
| | `*` | Star marker |
| | `+` | Plus marker |
| | `x` | X marker |
| Line Style | `-` | Solid line |
| | `--` | Dashed line |
| | `-.` | Dash-dot line |
| | `:` | Dotted line |
| | (none) | No line (markers only) |
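As a small illustration of how the shortcuts in the table combine, the sketch below draws two made-up series directly with matplotlib. The data points are hypothetical; only the format strings matter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# 'g--o' combines a colour (green), a line style (dashed) and a marker (circle).
(dashed,) = ax.plot([12, 24, 36], [0.4, 0.1, -0.2], "g--o")

# '.' alone gives point markers with no connecting line; the colour is passed separately.
(points,) = ax.plot([12, 24, 36], [0.3, 0.0, -0.1], ".", color="gray")

print(dashed.get_color(), dashed.get_linestyle(), dashed.get_marker())
```

Passing `style='.'` together with `color='gray'`, as in the residual plots here, corresponds to the second call.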
Subplots
import matplotlib.pyplot as plt
fig, ((ax00, ax01), (ax10, ax11)) = plt.subplots(ncols=2, nrows=2, figsize=(10,8)) #(1)
model.std_residuals_.T.plot(
style='.', color='gray', legend=False, grid=True, ax=ax00, #(2)
xlabel='Development Month', ylabel='Weighted Standardized Residuals',
)
model.std_residuals_.plot(
style='.', color='gray', legend=False, ax=ax01,
xlabel='Origin Period', sharey=True #(3)
)
1. Now you have four `ax` values, which correspond to the different panels of the subplots.
2. Just pass in `ax=ax00` to specify exactly which panel you wish to plot to.
3. Use `sharey=True` when you want the y-axis values to be in sync.
Joining Tables
fitted = (raa[raa.valuation<raa.valuation_date] * model.ldf_.values).unstack().rename('Fitted Values') #(1)
residuals = model.std_residuals_.unstack().rename('Residual')
pd.concat([fitted, residuals], axis=1) #(2)
1. The `.unstack()` method flattens the columns.
2. `axis=1` joins by columns.

This joins the columns in the following fashion:
| | Fitted Values | Residual |
|---|---|---|
| 12 - 1/1/1981 | 15032.78556 | -0.572151204 |
| 12 - 1/1/1982 | 317.932017 | 2.307508065 |
| 12 - 1/1/1983 | 10227.813 | -0.12673661 |
| 12 - 1/1/1984 | 16961.37317 | -0.430542702 |
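The same pattern can be reproduced on two small pandas tables. The names `fitted_wide` and `residual_wide` and their values are hypothetical stand-ins for the triangle-derived tables above.

```python
import pandas as pd

# Made-up wide tables: rows are origin periods, columns are development ages.
fitted_wide = pd.DataFrame({12: [150.0, 300.0], 24: [210.0, 390.0]}, index=[1981, 1982])
residual_wide = pd.DataFrame({12: [-0.5, 2.0], 24: [0.25, -0.75]}, index=[1981, 1982])

# unstack() turns each wide table into one long Series keyed by (development, origin).
fitted = fitted_wide.unstack().rename("Fitted Values")
residuals = residual_wide.unstack().rename("Residual")

# axis=1 places the two Series side by side, aligned on their shared index.
joined = pd.concat([fitted, residuals], axis=1)

print(joined)
```

Because `pd.concat` aligns on the index, each fitted value ends up next to the residual for the same development age and origin period.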
Mental Notes
- When you wish to put tables together, `pd.concat` will be very useful.
- There are two paradigms in which we work. One is using triangles (doing actuarial work). The other is taking the results and doing some data science with them. For the latter, we use `.to_frame()` to convert the triangle into a data frame, after which we can do our conventional data science work.