101-B
Unsupervised Learning Methods Applied to Property-Casualty Databases
Unsupervised learning is a kind of analysis in which there is no explicit dependent variable. Examples of unsupervised learning in insurance include modeling of questionable claims (for some action such as referral to a Special Investigation Unit) and the construction of territories by grouping together records that are geographically “close” to each other. Databases used for questionable claims analysis often do not contain a fraud indicator as a dependent variable, and unsupervised learning methods are often used to address this limitation. Two unsupervised learning methods will be introduced: PRIDIT (Principal Components of RIDITS) and Random Forest (a tree-based data-mining method). We will apply the methods to an automobile insurance database to model questionable[1] claims.
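To make the PRIDIT idea concrete, the following is a minimal sketch, not the session's actual implementation. It assumes each claim is described by ordinal "red flag" indicators coded so that higher values are more suspicious, uses the common RIDIT variant in which a category's score is the proportion of records in lower categories minus the proportion in higher categories, and takes the first principal component of the scored matrix as the weights. The function names `ridit_scores` and `pridit`, the sign convention, and the toy data are all hypothetical.

```python
import numpy as np

def ridit_scores(column):
    """RIDIT-score one ordinal variable.

    Assumed variant: score = P(lower category) - P(higher category),
    so each scored column has mean zero by construction.
    """
    column = np.asarray(column)
    values, counts = np.unique(column, return_counts=True)  # sorted categories
    p = counts / counts.sum()
    below = np.cumsum(p) - p        # proportion in strictly lower categories
    above = 1.0 - np.cumsum(p)      # proportion in strictly higher categories
    score_by_value = dict(zip(values.tolist(), (below - above).tolist()))
    return np.array([score_by_value[v] for v in column.tolist()])

def pridit(indicators):
    """Return (claim scores, variable weights) from the first principal
    component of the RIDIT-scored indicators.

    Assumes every column takes at least two distinct values; a constant
    column would make the correlation matrix undefined.
    """
    X = np.asarray(indicators)
    R = np.column_stack([ridit_scores(X[:, j]) for j in range(X.shape[1])])
    corr = np.corrcoef(R, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    weights = eigvecs[:, -1]                  # first principal component
    if weights.sum() < 0:                     # sign is arbitrary; assumed convention
        weights = -weights
    Z = (R - R.mean(axis=0)) / R.std(axis=0)  # standardize to match corr-based PCA
    return Z @ weights, weights

# Hypothetical toy data: 6 claims, 3 binary red-flag indicators (1 = flag present).
indicators = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
scores, weights = pridit(indicators)
print(np.round(weights, 3))
print(np.round(scores, 3))
```

Under these assumptions, claims with larger scores have more of the correlated indicators present and could be ranked for review. No fraud label is used at any point, which is what makes the approach unsupervised.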
A simulated database containing features observed in actual questionable claims data was developed for this research. The database is available from the author.
[1] The simulated data is based on a research database originally constructed to investigate claims that were suspected not to be legitimate, such as staged accidents and inflated damages. The term “fraudulent” is generally not used in referring to such claims, because claims that meet the definition of criminal fraud are a very small percentage of claims.