
Historically, organizations used analytics in situations where they had lots of internal data. The data was the outcome of well-structured, repetitive processes supported by transactional systems. If, for example, you had sold a lot of products at a lot of different prices, you could do pricing analytics on data in your order management system. If you'd hired a lot of people, you could do analytics on your hiring process, using data from your HR system. If you had done a lot of promotions, you could do analytics to understand which ones worked best using data from marketing systems.
However, there are many processes for which you don't have lots of internal data. What if, for example, your company is planning to introduce a new product or service that is unlike any one you've ever brought to market? You are unlikely to have data on how such offerings fare in the marketplace. Many organizations, faced with such uncertainty, simply "go with their gut" and introduce the new product or service without data or analytics.
External Data to the Rescue
It doesn't have to be that way, however. In circumstances where you have little or no internal data, there is likely to be some external data that will help you make a better decision. A whole host of external providers and curators of external data is available to you. Let me provide a couple of examples, and then I'll tell you what I think you should do with external data.
I've been an advisor for many years to a company called Signals Analytics, which was founded by former Israeli military intelligence officers. They specialize in what might be called "important but infrequent" decisions about processes like innovation and new product development. They find and curate a wide variety of unstructured external information sources: social networks, blogs, online forums, competitor announcements, job listings, and so forth. Consumer products companies pursuing innovation projects, for example, get access to data and analytics to "understand consumer needs," "uncover emerging trends," and "assess early concepts." Signals also helps companies with unstructured decision-making about marketing and strategy, two other areas where internal data are often hard to come by.
Companies are also increasingly interested in understanding demand and supply for their offerings, which often also requires external data. In my book The AI Advantage, I wrote about the air charter firm XOJET, which has over 1,300 private jets available for charter. XOJET once used a simple set of spreadsheet rules derived from internal data to set prices. Now, however, with help from the machine learning technology company Noodle.ai, XOJET creates models based on external data to assess supply and demand and price their charter trips. The external datasets include industry-wide flight activity and aircraft location to establish competitive supply, and data on major demand-driving events, seasonal patterns, and booking curve observations to predict demand. Upon installing the new algorithm, the company's revenue per occupied flight rose 5%.
How to Manage External Data
External data is unlike internal data in many ways, so it often needs to be managed with different methods. External data isn't under an organization's control, so it doesn't make much sense to try to subdue it with tools like master data management. My view is that those top-down modeling tools don't work all that well even on internal data. But there is an alternative approach.
Catalogs, Not Models
Instead of creating a data model or set of master data management rules, organizations should create a catalog of their external data: a straightforward listing of what external data exists in the organization, where it resides, who's responsible for it, and so forth. A catalog effort often reveals that both internal and external data are chaotic: duplicated, going under multiple names, old, expired, etc. It's not easy to face up to all of the informational chaos that a cataloging effort can reveal. Perhaps needless to say, however, cataloging data is worth the trouble and the initial shock at the outcome. A data catalog that lists what data the organization has or has access to, what it's called, where it's stored, who's responsible for it, and other key metadata can easily be the most valuable information offering that an IT group or Chief Data Office can create.
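To make the idea concrete, here is a minimal sketch in Python of what a catalog record might hold and how a simple pass over the catalog could surface the same data living under multiple names. Everything in it is an assumption for illustration: the `CatalogEntry` fields, the `find_duplicates` helper, and the provider names and locations are hypothetical, not taken from any particular catalog tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One row in a hypothetical data catalog."""
    name: str        # what the dataset is called internally
    source: str      # provider the data came from
    location: str    # where the data resides
    owner: str       # team or person responsible for it
    is_external: bool = True


def find_duplicates(entries):
    """Group entries whose provider name matches after normalization."""
    by_source = defaultdict(list)
    for entry in entries:
        # Normalize case and whitespace so near-identical names collide.
        by_source[entry.source.strip().lower()].append(entry)
    return {src: group for src, group in by_source.items() if len(group) > 1}


# Illustrative entries only; the names and locations are made up.
catalog = [
    CatalogEntry("weather_daily", "Example Weather Provider",
                 "s3://example-bucket/weather/", "marketing-analytics"),
    CatalogEntry("wx_history", "example weather provider ",
                 "/shared/finance/wx/", "finance"),
    CatalogEntry("flight_activity", "Example Aviation Feed",
                 "warehouse.ext.flights", "pricing"),
]

duplicates = find_duplicates(catalog)
for source, group in duplicates.items():
    owners = ", ".join(entry.owner for entry in group)
    print(f"{source}: held by {owners}")
```

Matching on a normalized provider name is a deliberately crude choice. A real cataloging effort would likely need fuzzier matching and more metadata, but even a listing this simple shows which teams hold the same external data.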
Given that IT organizations have been more preoccupied with modeling the future than describing the present, enterprise vendors haven't really addressed the catalog tool space to a significant degree. There are several catalog tools for individuals and small businesses, and several vendors of ETL (extract, transform, and load) tools have some cataloging capabilities built into their own tools. Some also tie a catalog to a data governance process, although in my experience, "governance" is right up there with "bureaucracy" as a term that makes many people wince.
At least a few data providers and vendors are actively pursuing catalog work, however. One company, Enigma, has created a catalog for public external data, for example. The company has compiled a set of public databases, and you can simply browse through its catalog (for free if you are an individual) and check out what data you can access and analyze. That's a great model for what private enterprises should be developing, and I know of some companies (including Tamr, Informatica, Paxata, and Trifacta) that are developing tools to help companies develop their own catalogs for both internal and external data.
External Data People
Beyond an information catalog, organizations in ardent pursuit of external data are likely to need some human experts on the topic. Unfortunately, jobs involving finding, assessing, and wrangling external data are not common in our society. I've done some internet searching and found a few "external data analyst," "external data coordinator," and "external data specialist" roles, but you may have challenges staffing such positions. Universities don't train people for them, so qualified candidates are probably going to be autodidacts.
The external data wranglers you do manage to hire will have some different responsibilities than the average data person. Since external data is often sold and bought in the marketplace, they'll have to be knowledgeable about negotiating deals for it and reading licensing agreements. In a large company, it's likely that multiple functions or business units will have bought the same data, so tracking down those redundant purchases is another important task. External data also often has quality problems, despite the fact that you've paid something for it, so managing data quality is another skill that your external data folks will need to possess.
Turning Free Data into Dollars
Of course, the best external data is that which you don't have to pay for but can turn into valuable products and services. Governments, as you might imagine, are the leading source of free external data (or at least, as a taxpayer, you've already paid for your share of it). Many governments, including the United States, are increasingly listing available public data, often called "open data"; in the U.S., the relevant site is Data.gov. There are also sites that collect and sell public data; PublicData.com is an example.
There are examples of companies that have made quite a successful business out of open data. Climate Corporation, for example, was a startup that used government data to make recommendations to farmers about the best ways to grow crops. They dug up, according to a Harvard Business School case study, 30 years of National Weather Service data; 60 years of crop yield data and 14 terabytes of soil type data from the U.S. Department of Agriculture; and satellite images, topography maps, and weather data from 1 million U.S. locations gathered by the U.S. government. They used all this external data, most of which was free, to build a "digital agriculture" business that was acquired by Monsanto (now Bayer) for $1.1 billion.
You may not strike external data gold to quite this degree, but chances are good that external data can help your company achieve its analytical objectives. Whether you have to pay for it or not, external data can tell you much about the state of the world outside your organization. And that is an undeniably useful thing to know.