Big data? How about starting with just ‘data’?
March 2, 2018 at 11:51 am by Glenn McGillivray
We live in an age of ‘Big Data’. And while it seems to me that the actual term is being used less and less these days (indicating, perhaps, that the concept has become mainstreamed), the overall notion is alive and well and thriving in some Canadian p&c insurance companies.
There are many sides to Big Data, but one of the foundational concepts is the idea that there is a wealth of untapped and unstructured information available, both within organizations and elsewhere, that could be mined in order to identify key business trends and the like. Many of these data can be found in non-traditional places and, sometimes, have to be analyzed using non-traditional means.
The concept is intriguing: Going to non-traditional sources of data that might be able to confirm old assumptions or tell you wholly new things about your insureds, about individual or entire categories of risks, about the industry and so on – data that probably went ignored for many decades because no one saw its intrinsic value or had the means to collect and analyze it.
But before we plow headlong into this brave new world, shouldn’t we first try to do the best we can with traditional industry data?
The truth is we have a big problem with data quality in the Canadian p&c insurance industry. I mean, for a G7 country with an industry that wrote $53 billion in direct written premiums (DWP) in 2016 (all private insurers, IBC Facts Book 2017), we make some really big decisions based on often very lousy inputs. Garbage in, garbage out.
Take disaster data for one. Insured catastrophe data has been systematically collected in this country for only about a decade. The cat data that goes back further than this, while helpful to show an overall tendency, has been cobbled together from various sources and is not very robust.
Then there is the normal, everyday data that insurers depend on. Data about individual insureds and risks gleaned by some brokers during the normal course of their interactions with clients often has more holes in it than a block of Swiss cheese.
The same can be said of information collected by some claims adjusters. It is commonplace to see individual claims in company claims management systems with key fields left blank (including entire claims notes fields). Those fields either stay empty or default to ridiculous (and very unhelpful) presets – like setting a roof’s age at 75 years for a home that is only 25 years old.
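To make the point concrete, here is a minimal sketch of the kind of automated check a claims system could run to catch blank fields and implausible presets. The record layout and field names (`year_built`, `roof_age_years`, `claim_notes`) are hypothetical and are not taken from any real claims management system; the only rule encoded is the one implied above, that a roof cannot be older than the home it sits on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClaimRecord:
    # Hypothetical fields, for illustration only.
    claim_id: str
    year_built: Optional[int]
    roof_age_years: Optional[int]
    claim_notes: Optional[str]


def audit_claim(record: ClaimRecord, loss_year: int) -> List[str]:
    """Return a list of data-quality issues found on one claim record."""
    issues = []

    # Flag fields that were left blank (None or whitespace-only text).
    for name in ("year_built", "roof_age_years", "claim_notes"):
        value = getattr(record, name)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"{name} is blank")

    # Flag a roof age that exceeds the age of the home at time of loss,
    # which usually signals an unedited system default.
    if record.year_built is not None and record.roof_age_years is not None:
        home_age = loss_year - record.year_built
        if record.roof_age_years > home_age:
            issues.append(
                f"roof_age_years ({record.roof_age_years}) "
                f"exceeds home age ({home_age})"
            )

    return issues
```

Run against a claim on a 25-year-old home with a preset roof age of 75 and an empty notes field, `audit_claim` would report both problems, so the record could be sent back for completion rather than silently feeding bad inputs into later decisions.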
According to some sources (including a broker I know), these data quality issues could, at least partially, be responsible for the trend of insurers leaning on such data points as credit score to determine insurability and client desirability. Without access to proper information about the insured and the risk, it has been argued that insurers have been forced to turn to other data points to make business decisions. If this charge is even just partially true, brokers could be doing a great disservice to themselves and to their continued viability by remaining stingy on data provision.
Many companies, it seems, are not enforcing requirements that brokers and adjusters fill out all relevant fields in the underwriting and claims systems they use and, consequently, many insurers are flying blind with some of the business decisions they are making.
An additional, and really acute, problem in my mind is the issue of company claims coding and industry data aggregation.
A paper co-written by a colleague explains the issue quite well:
Historical loss data are widely used by numerous stakeholders within the insurance industry to help understand and assess risk. Databases maintained by CGI’s Insurance Information Services (IIS) for example, including the Habitational Information Tracking Systems (HITS) and Commercial Tracking System (CTS), are used to access personal and commercial property and liability claims information, including claims histories for specific properties.
Current loss codes, however, used by insurers to populate the CGI IIS HITS and CTS databases have limited ability to reflect the nuances of many types of personal and commercial property losses. Loss codes are highly aggregated, group many types of relatively distinct perils, and limit the ability of insurers to fully understand property level exposure to natural disaster risk based on historical claims. Aggregated loss codes also limit the ability of the insurance industry to participate in key public policy discussions surrounding mitigation of natural disaster risk and climate change adaptation.
The paper recommends refinements “aimed at increasing the granularity of the [loss] codes for pertinent insured perils” to collect more detailed claims information on:
- Plumbing failures resulting in water damage, including failures related to appliances, sprinkler systems and pipe freeze;
- Water damage associated with ice damming;
- Water damage associated with sump pump failure;
- Sewer backup, including differentiation between isolated and regional sewer backup events;
- Seepage and groundwater related water damage;
- Overland water influx;
- Structural/urban fire and wildland fire, and;
- Wind and hail.
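As a rough illustration of what more granular coding could look like in practice, the sketch below pairs each detailed sub-code with the broader peril it rolls up into. The code names and groupings are invented for this example, loosely following the list above; they are not the actual HITS or CTS codes, nor the specific scheme proposed in the paper.

```python
from collections import Counter
from enum import Enum


class LossCode(Enum):
    # Each value is (broad peril, detailed sub-code). Illustrative labels only.
    PLUMBING_APPLIANCE = ("water", "plumbing failure: appliance")
    PLUMBING_SPRINKLER = ("water", "plumbing failure: sprinkler system")
    PLUMBING_PIPE_FREEZE = ("water", "plumbing failure: pipe freeze")
    ICE_DAMMING = ("water", "ice damming")
    SUMP_PUMP_FAILURE = ("water", "sump pump failure")
    SEWER_BACKUP_ISOLATED = ("water", "sewer backup: isolated")
    SEWER_BACKUP_REGIONAL = ("water", "sewer backup: regional")
    SEEPAGE_GROUNDWATER = ("water", "seepage and groundwater")
    OVERLAND_WATER = ("water", "overland water influx")
    FIRE_STRUCTURAL = ("fire", "structural/urban fire")
    FIRE_WILDLAND = ("fire", "wildland fire")
    WIND = ("wind/hail", "wind")
    HAIL = ("wind/hail", "hail")

    @property
    def peril(self) -> str:
        return self.value[0]

    @property
    def detail(self) -> str:
        return self.value[1]


def summarize(codes):
    """Count claims at both the broad peril level and the detailed level."""
    by_peril = Counter(code.peril for code in codes)
    by_detail = Counter(codes)
    return by_peril, by_detail
```

The design point is that nothing is lost by going granular: detailed codes can always be rolled up into the broad categories used today, whereas a claim recorded only as "water" can never be split back out into sump pump failure versus regional sewer backup.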
Recently, I had the opportunity to see data from one insurance company’s new product offering where the company used the launch as an opportunity to design its data system and set out its data collection methods from scratch. The company developed a very detailed range of risk and loss codes and sub-codes, and uses state-of-the-art GIS/heat mapping technology to plot exposures and losses. In a few years’ time, when the new product becomes more ubiquitous, the company will have fantastic datasets on insureds, risks, and loss trends that will give it a significant edge over the competition.
It is not too late for the industry to change its ways with regard to data collection. But if it refuses to address the issue, it will hit the point of no return and every company will be on its own. The largest players will then have a key competitive edge simply because of the sheer volume of internal data and data analytics sources available to them.
It seems to me that it only makes sense for the industry to get its bread and butter data together first. Then, any findings from Big Data analysis can be used to augment and improve what companies already have and already know.
Before we go to wild and woolly places to get new data, and new forms of data, companies should first get their traditional data house in order. This will be particularly key if reliance on such data points as credit score proves not to be the long-term panacea that some hope it will be.
A simple roadmap to better company and industry data
- Cat data: Companies should subscribe to CatIQ and feed data into the database when requested. Though we can’t change cat data from the past, we can move forward in a systemized, coordinated way to ensure that future cat data is as close to unimpeachable as possible.
- Underwriting data: Companies should work with producers to ensure that underwriting questionnaires are kept up-to-date and that all relevant information is fed into appropriate systems.
- Claims data: Companies should work with claims adjusters, both internal and independent, and others to ensure that all proper information gets recorded into claims management systems.
- Loss codes and data aggregation: Companies should get more granular with loss codes and sub-codes. Data fields labelled ‘Other’ should be shunned whenever possible. Ideally, a new industry standard should be set out and companies should adhere to it.