Publications

Prediction of the outcome of insurance claims with deep neural networks

Publication pending, 2023

In this paper, we develop a methodology to predict the outcome of an open insurance claim (RBNS, Reported But Not Settled) from a set of complex covariates with various structures (structured and unstructured data). The technique combines different deep neural network architectures (such as Long Short-Term Memory networks for text data) with survival analysis prediction methods (to predict the time of settlement of the claim). The deep learning methods are used to extract features from the complex data, and hence to perform dimension reduction. These features may be plugged into a final neural network predictor, or combined with more intelligible models such as a Generalized Linear Model when interpretability matters more than raw predictive accuracy. A real data analysis illustrates the technique.
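To make the hybrid design concrete, here is a minimal Keras sketch of the feature-extraction idea. All sizes, names, and layer choices are illustrative assumptions, not the architecture used in the paper: an LSTM branch compresses each tokenized claim report into a low-dimensional feature vector, which is concatenated with the structured covariates and fed to a final predictor.

```python
from tensorflow.keras import layers, Model

# Illustrative sizes (assumptions, not from the paper): reports padded to
# 200 tokens over a 5,000-word vocabulary, plus 10 structured covariates.
VOCAB, MAXLEN, N_STRUCT = 5000, 200, 10

# Text branch: an LSTM reduces each claim report to a 32-dimensional
# feature vector (the dimension-reduction step).
text_in = layers.Input(shape=(MAXLEN,), name="report_tokens")
x = layers.Embedding(VOCAB, 64)(text_in)
text_features = layers.LSTM(32)(x)

# Structured covariates enter the model directly.
struct_in = layers.Input(shape=(N_STRUCT,), name="covariates")

# Final neural network predictor on the concatenated features.
h = layers.Concatenate()([text_features, struct_in])
h = layers.Dense(16, activation="relu")(h)
out = layers.Dense(1, activation="sigmoid", name="outcome")(h)

model = Model([text_in, struct_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# The extracted text features can also be exported and plugged into a more
# intelligible model such as a GLM when interpretation is the priority.
feature_extractor = Model(text_in, text_features)
```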


Statistical analysis of the evolution of severe claims for bodily injury cover.

Published in Sorbonne Université, 2022

The objective of this thesis is to show how the richness of the data available on insurance claims can be used to significantly improve the prediction of the final amount of a claim (or of its outcome, when the aim is to classify claims according to their level of severity). To process such a large volume of data - some of it textual, still uncommon in insurance - we use statistical learning techniques (in particular deep neural networks such as Convolutional Neural Networks and Long Short-Term Memory networks) both as predictors and as feature extractors. Bodily injury claims require specific treatment because of their characteristics and the extreme volatility of their cost. Extreme Value Theory tools allowed us to analyze the tail distribution of the claim amount and to determine a severity threshold. Another specificity of our approach was to take the temporal development of claims into account, which is particularly important in a long-tailed line of business such as third-party liability. Throughout this thesis, we repeatedly used IPCW (Inverse Probability of Censoring Weighting) weights to deal with censoring, which leaves the available information incomplete.
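To make the role of the IPCW weights concrete, here is a minimal sketch (the toy data and variable names are assumptions, not the thesis code): the survival function of the censoring time is estimated with a Kaplan-Meier estimator applied to the reversed event indicator, and each fully settled claim is reweighted by the inverse of its estimated probability of remaining uncensored.

```python
import numpy as np
from lifelines import KaplanMeierFitter

# Toy stand-in for claim histories: T is the observed duration and
# settled == True means the claim closed before being censored.
rng = np.random.default_rng(0)
T = rng.exponential(5.0, size=500)
settled = rng.random(500) < 0.7

# Estimate the survival function G of the censoring time by running
# Kaplan-Meier with the event indicator reversed (the standard IPCW trick).
km_censor = KaplanMeierFitter()
km_censor.fit(T, event_observed=~settled)
G_hat = km_censor.survival_function_at_times(T).to_numpy()

# IPCW weights: settled claims receive weight 1 / G_hat(T_i), censored
# claims weight 0, so weighted estimators compensate for the lost outcomes.
weights = np.where(settled, 1.0 / np.clip(G_hat, 1e-6, None), 0.0)
```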

Recommended citation: Isaac Cohen Sabban. Analyse statistique de l'évolution des sinistres graves pour une garantie risque corporel. PhD thesis in Mathematics, Sorbonne Université, 2022. In French.

Method for automatically detecting anomalies in log files.

Published in European Patent Office, 2021

The present invention relates to a method for monitoring log files related to the operation of one or more components, such as applications, services, sensors, or instruments. Modern software systems, such as an application executed on a computer, can generate log files comprising messages that may be used to troubleshoot the program. The messages in log files are often unstructured free-form text strings, which record events or states of interest and capture the intent of the developer. These log files can be read by a developer or a user to detect events, states, and other occurrences of interest. Usually, when the execution of a program fails, for example when it does not perform according to expectations, system operators examine the recorded log files to gain insight into the failure and identify potential root causes. The present invention relates to a method for automatically detecting anomalies in one or more log files using a neural network trained beforehand. The invention also relates to a system for implementing such a method, and finally to a method for training the neural network used in the anomaly detection.
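As a rough illustration of neural anomaly detection on logs, here is a minimal sketch in the spirit of next-event prediction over parsed log templates (a common approach in the literature, e.g., DeepLog). All names, sizes, and the detection rule below are illustrative assumptions and do not describe the patented method.

```python
import numpy as np
from tensorflow.keras import layers, Model

N_TEMPLATES, WINDOW = 50, 10  # parsed message templates, sliding-window length

# Next-template predictor: from the ids of the last WINDOW log messages,
# output a probability distribution over the next template id.
inp = layers.Input(shape=(WINDOW,))
x = layers.Embedding(N_TEMPLATES, 16)(inp)
x = layers.LSTM(32)(x)
out = layers.Dense(N_TEMPLATES, activation="softmax")(x)
model = Model(inp, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

def is_anomalous(model, window, next_id, top_k=5):
    """Flag a log line whose template id is not among the model's top_k
    most likely continuations of the preceding window of messages."""
    probs = model.predict(np.asarray([window]), verbose=0)[0]
    return next_id not in np.argsort(probs)[-top_k:]
```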

Automatic analysis of insurance reports through deep neural networks to identify severe claims.

Published in Annals of Actuarial Science, 2021

In this paper, we develop a methodology to automatically classify claims using the information contained in the text reports written when claims are opened. From this automatic analysis, the aim is to predict whether a claim is expected to be particularly severe. The difficulty is the rarity of such extreme claims in the database, which makes it hard for classical prediction techniques such as logistic regression to predict the outcome accurately. Since the data is unbalanced (too few observations carry a positive label), we propose different rebalancing algorithms to deal with this issue. We also discuss the embedding methodologies used to process the text data and the role of the network architectures.
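As one simple instance of the rebalancing idea, the sketch below duplicates minority-class observations until the labels are balanced. The paper compares several schemes; this particular routine is an illustrative assumption, not necessarily one of those studied.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Rebalance a binary dataset by resampling the rare positive class
    (here, severe claims) with replacement until both classes match in
    size. Assumes the positive class is the minority."""
    rng = rng or np.random.default_rng(0)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([neg, pos, extra])
    rng.shuffle(idx)
    return X[idx], y[idx]

# Example: 1,000 claims with a 3% severe-claim rate become a balanced sample.
X = np.random.randn(1000, 8)
y = (np.random.rand(1000) < 0.03).astype(int)
X_bal, y_bal = random_oversample(X, y)
```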

Recommended citation: Cohen Sabban, I., Lopez, O., & Mercuzot, Y. (2021). Automatic analysis of insurance reports through deep neural networks to identify severe claims. Annals of Actuarial Science, 1-26. doi:10.1017/S174849952100004X.