Insurance carriers use third-party data to validate the information auto insurance customers submit when requesting a quote, because getting rateable factors wrong almost guarantees that a policy will be a loser. Purchased data gives carriers an independent check on what the customer provides.
While useful, this approach has several pitfalls that can lure carriers into a false sense of security:
The Problem of Data Errors
All data sets have errors: mis-keys, lagging data, missing data, and so on. In the real world, a data set with 95% accuracy is almost unheard of. Third-party data sometimes combines multiple sources, which makes things even worse. For example, a head-of-household data set may be merged with a college marketing data set to identify young drivers living in a household. When data vendors create this type of 'synthetic data', the errors multiply. Imagine two nearly perfect (95% accurate) data sets being combined. If their errors are independent, a merged record is correct only when both sources are correct, so the accuracy of the new 'synthetic' data drops to about 90% (0.95 * 0.95). Combine three data sets and it falls to about 86% (0.95 * 0.95 * 0.95).
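The compounding above is simple multiplication, which a few lines of Python make concrete. This is a minimal sketch that assumes each source's errors are independent of the others; real vendor data sets often share upstream sources, so their errors may be correlated. The function name `combined_accuracy` is hypothetical and illustrative only.

```python
from math import prod


def combined_accuracy(accuracies):
    """Accuracy of a synthetic record built by merging several sources.

    Assumption (illustrative): errors in each source are independent, so a
    merged record is correct only if every contributing source is correct.
    """
    return prod(accuracies)


# Merge one, two and three sources that are each 95% accurate.
for n in (1, 2, 3):
    acc = combined_accuracy([0.95] * n)
    print(f"{n} source(s): {acc:.1%} accurate")
```

Each additional 95%-accurate source removes roughly another five percentage points, so the synthetic data falls to roughly 90% with two sources and roughly 86% with three.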
Consumer errors. Most consumers shop for insurance online, even if they eventually use an agent to complete the transaction. This means that consumer errors from mis-keys, misunderstandings and carelessness frequently enter the process, and these errors magnify the error problems already present in third-party data.
Fraud and rate manipulation are low-probability events. There are many ways consumers can manipulate data to reduce their premiums or obtain payments they don't deserve, but each of these, taken in isolation, is a small-probability event: a few percent at most. With purchased data having error rates of at least 5% and typically 10%, and with consumers making mistakes of their own, the combined number of false positives and false negatives on any given data diagnostic can easily reach five or ten times the number of true positives.
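The arithmetic behind that claim is a base-rate calculation. The sketch below is illustrative only: it assumes a simplified, symmetric error model in which the data check misclassifies any quote, honest or not, with the same probability (10%, the typical error rate cited above), and it uses prevalences of 1% and 2% as stand-ins for "a few percent at most." The names `screen` and `DiagnosticOutcome` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticOutcome:
    true_positives: float   # misrepresented quotes correctly flagged
    false_positives: float  # honest quotes wrongly flagged
    false_negatives: float  # misrepresented quotes the check misses

    @property
    def errors_per_true_positive(self) -> float:
        # How many mistakes (both kinds) accompany each correct catch.
        return (self.false_positives + self.false_negatives) / self.true_positives

    @property
    def precision(self) -> float:
        # Share of flagged quotes that really were misrepresented.
        return self.true_positives / (self.true_positives + self.false_positives)


def screen(population: int, prevalence: float, error_rate: float) -> DiagnosticOutcome:
    """Expected outcome of flagging quotes with an imperfect data check.

    Simplifying assumption: the check misclassifies any quote, honest or
    not, with the same probability `error_rate`.
    """
    bad = population * prevalence
    good = population - bad
    return DiagnosticOutcome(
        true_positives=bad * (1 - error_rate),
        false_positives=good * error_rate,
        false_negatives=bad * error_rate,
    )


for prevalence in (0.01, 0.02):
    out = screen(10_000, prevalence, error_rate=0.10)
    print(f"prevalence {prevalence:.0%}: "
          f"{out.errors_per_true_positive:.1f} errors per true positive, "
          f"precision {out.precision:.0%}")
```

Under these assumptions, 10,000 quotes with 2% misrepresentation yield 180 true positives against 980 false positives and 20 false negatives, about 5.6 errors per correct catch; at 1% the ratio rises to about 11. Most flagged customers are honest, which is why a purchased-data hit alone is a weak basis for action.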
So you can see how impractical it is to use purchased data alone to judge whether a specific customer has submitted accurate data. More work must be done to validate the initial data diagnosis. And critically, most of this work must be done immediately, during the quote session.
It is important to note that this problem isn't unique to insurance. If you follow medical research, you'll constantly hear that "studies show" this or that food or activity is healthy or deadly. If you live long enough, you'll hear it described both ways. Using large, complex data sets to diagnose small-probability events is inherently difficult, so difficult that government regulators won't allow drugs, diagnostics or medical devices to be marketed based solely on this form of 'epidemiological' analysis. They require double-blind controlled experiments.
Such experiments aren't really possible when quoting insurance online. So what can carriers do to reduce data diagnostic error rates enough to detect, deter and defeat fraud without also rejecting a large proportion of good customers? There are three keys to diagnosing and eliminating misrepresentation and errors in the insurance quote and application process:
Leverage all sources of data, not just purchased ones.