Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Page Not Found

Page not found. Your pixels are in another canvas.

Jupyter notebook markdown generator

Posts

Why should you use Transfer Learning for your Image Recognition App?

Less than 1 minute read

Updated: June 05, 2020

Image recognition is one of the most important deep learning topics today. It is used in medicine, in the automotive industry, on social networks, and even on our phones. The problem with image recognition is that it requires a lot of computing resources and a lot of data. For example, if tomorrow you decide to build an app that distinguishes a dog from a human or identifies a dog's breed, doing so with little data and limited resources can be difficult. That's why we use transfer learning.

Exploratory Data Analysis for Natural Language Processing

Less than 1 minute read

Updated: May 22, 2020

As a data scientist for an insurance company, I found myself working with text data. Text is unstructured data that can carry a lot of information, and analyzing it statistically can surface useful insights. Text analysis allows companies to automatically extract and classify information from text. Popular text analysis techniques include sentiment analysis, topic detection, and keyword extraction. Today I'm going to show you three tools that I use every time I need to extract and classify information from text.
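Keyword extraction, one of the techniques listed above, can be illustrated with a TF-IDF score: a term ranks highly in a document when it is frequent there but rare across the other documents. The sketch below is a minimal standard-library illustration under our own assumptions (a tiny hand-written stop-word list, regex tokenization, and the hypothetical helpers `tokenize` and `top_keywords`); it is not necessarily one of the three tools the post presents.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was", "on", "for", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(docs: list[str], k: int = 3) -> list[list[str]]:
    """Return the k highest TF-IDF terms of each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        if not tokens:
            results.append([])
            continue
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically so output is deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results


if __name__ == "__main__":
    claims = [
        "The car hit the wall and the driver was injured",
        "Water damage in the kitchen after a pipe burst",
        "The driver of the car fled after the accident",
    ]
    for keywords in top_keywords(claims):
        print(keywords)
```

Terms shared by several reports (here "car" or "driver") get a lower IDF and drop down the ranking, which is why TF-IDF surfaces words specific to each document rather than merely frequent ones.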

mooc

portfolio

Portfolio item number 1

Updated: December 23, 2022

Short description of portfolio item number 1

Portfolio item number 2

Updated: December 23, 2022

Short description of portfolio item number 2

publications

Automatic analysis of insurance reports through deep neural networks to identify severe claims.

Published in Annals of Actuarial Science, 2021

In this paper, we develop a methodology to automatically classify claims using the information contained in text reports (written when the claim is opened). From this automatic analysis, the aim is to predict whether a claim is expected to be particularly severe. The difficulty lies in the rarity of such extreme claims in the database, which makes it hard for classical prediction techniques such as logistic regression to predict the outcome accurately. Since the data are unbalanced (too few observations have a positive label), we propose different rebalancing algorithms to deal with this issue. We discuss the different embedding methods used to process text data and the role of the network architectures.
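The abstract does not name the rebalancing algorithms it compares. As an assumption, random oversampling, one standard option, can be sketched as follows: minority-class reports are duplicated at random (with replacement) until every class matches the majority count. The function name `random_oversample` is hypothetical, and in practice only the training split should be resampled.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, y in zip(texts, labels) if y == label]
        # Draw, with replacement, the extra copies needed to reach the majority count.
        extra = rng.choices(pool, k=target - count)
        out_texts.extend(extra)
        out_labels.extend([label] * len(extra))
    return out_texts, out_labels


if __name__ == "__main__":
    reports = [f"report {i}" for i in range(10)]
    severe = [0] * 8 + [1] * 2  # 1 marks a (rare) severe claim
    _, balanced = random_oversample(reports, severe)
    print(Counter(balanced))
```

Oversampling keeps every original observation, which matters when positives are scarce; the trade-off is that exact duplicates can encourage overfitting to the few severe reports.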

Recommended citation: Cohen Sabban, I., Lopez, O., & Mercuzot, Y. (2021). Automatic analysis of insurance reports through deep neural networks to identify severe claims. Annals of Actuarial Science, 1-26. doi:10.1017/S174849952100004X.

Method for automatically detecting anomalies in log files.

Published in European Patent Office, 2021

The present invention relates to a method for monitoring log files related to the operation of one or more components, such as applications, services, sensors, or instruments. Modern software systems, such as an application executed on a computer, can generate log files comprising several messages that may be used to troubleshoot the program. Messages contained in log files can be unstructured free-form text strings, which record events or states of interest and capture the intent of the developer. These log files can be read by a developer or a user to detect events, states, and other interesting occurrences. Usually, when the execution of a program fails, for example when it does not perform as expected, system operators examine the recorded log files to gain insight into the failure and identify its potential root causes. The present invention relates to a method for automatically detecting anomalies in one or more log files by using a previously trained neural network. The invention also relates to a system for implementing such a method. Finally, the invention relates to a method for training the neural network used in the method for automatically detecting anomalies in one or more log files.

Statistical analysis of the evolution of extreme claims for bodily risk cover.

Published in Sorbonne Université, 2022

The objective of this thesis is to show how the richness of the available data on insurance claims can be used to significantly improve the prediction of a claim's final amount (or of its outcome, when we are interested in classifying a claim according to its severity level). To process such a large volume of data, some of which is textual (not very common in insurance), we use statistical learning techniques, in particular deep neural networks such as Convolutional Neural Networks and Long Short-Term Memory networks, both as predictors and as information extractors. Bodily injury claims require specific treatment because of their characteristics and the extreme volatility of their cost. Extreme Value Theory tools allowed us to analyze the tail distribution of claim amounts and also to determine a severity threshold. Another specificity of our approach was to take into account the temporal flow of claims, which is particularly important for a long-tailed line of insurance such as third-party liability. Throughout this thesis, we used IPCW (Inverse-Probability-of-Censoring Weighting) weights several times to deal with censoring, which makes the available information incomplete.
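A common textbook form of IPCW gives each fully observed claim i the weight delta_i / G(T_i-), where T_i is the observed time, delta_i is 1 if the claim is settled (uncensored) and 0 otherwise, and G is the Kaplan-Meier estimate of the censoring survival function. Censored claims receive weight 0, and the weight they would have carried is redistributed to comparable observed claims. The plain-Python sketch below assumes that standard definition; the function names are hypothetical, and the thesis may use a different variant (for example, a censoring model conditional on covariates).

```python
def censoring_survival_left(times, events, t):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censoring (event == 0) as the 'event'."""
    g = 1.0
    censor_times = sorted({u for u, e in zip(times, events) if e == 0})
    for s in censor_times:
        # Only censoring strictly before t enters G(t-), so a claim censored at the
        # same time as a settlement does not reduce that settlement's G.
        if s >= t:
            break
        at_risk = sum(1 for u in times if u >= s)
        censored = sum(1 for u, e in zip(times, events) if u == s and e == 0)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_weights(times, events):
    """IPCW weight delta_i / G(T_i-): 0 for censored claims, >= 1 for observed ones."""
    return [
        1.0 / censoring_survival_left(times, events, t) if e == 1 else 0.0
        for t, e in zip(times, events)
    ]


if __name__ == "__main__":
    # Four claims; the second is still open (censored) at time 2.
    print(ipcw_weights([1, 2, 3, 4], [1, 0, 1, 1]))
```

In this example the two claims settled after the censoring time each receive a weight of about 1.5, so the total weight still equals the number of claims. The quadratic loops keep the logic explicit; a production version would sort once and accumulate risk sets.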

Recommended citation: Isaac Cohen Sabban. Analyse statistique de l'évolution des sinistres graves pour une garantie risque corporel. Mathematics. Sorbonne Université, 2022. In French.

Prediction of the outcome of insurance claims with deep neural networks

Publication pending, 2023

In this paper, we develop a methodology to predict the outcome of an open insurance claim (RBNS, Reported But Not Settled) from a set of complex covariates with various structures (structured and unstructured data). The technique combines different deep neural network architectures (such as Long Short-Term Memory networks for text data) with survival analysis methods (to predict the time at which the claim is settled). The deep learning methods are used to extract features from the complex data, and hence to perform dimension reduction. These features may be fed into a final neural network predictor, or combined with more interpretable models such as a Generalized Linear Model when interpretability matters more than predictive accuracy. A real data analysis illustrates the technique.

Recommended citation:

teaching

Intelligence artificielle pour l’actuariat (Artificial Intelligence for Actuarial Science)

Master's course, ENSAE, 2019

In 2019 and 2020.

Modèles de durée (Duration Models)

MOOC, Fun-MOOC, 2019

From 2019 to 2021.

Isaac Cohen Sabban