Back to the future … If you’re not managing your data, you can’t use AI
AI offers us an exciting new capability, but making it work requires a fairly old-school competence: effectively managing data. Some organizations have historically done a good job at knowledge management; others, let’s say, have not really made it a priority. Either way … getting the most out of the new technology will require businesses to refocus (or simply focus) on doing the data basics well.
Data is the new oil (Clive Humby, 2006). It’s easy to see the parallels – data, like oil once was, is a vast, largely untapped asset. In its raw form it can be messy, confusing and difficult to work with, but it is immensely valuable once it has been extracted and processed. However, unlike oil, data is effectively an infinite resource, with the total volume of data out there roughly doubling every couple of years. While the end of the age of oil is imminent, at the current rate the volume of data is expected to increase by a factor of over 30 million in the next 50 years (doubling every two years compounds to 25 doublings, or roughly a 33-million-fold increase, over 50 years).
More data has been generated in the last ten minutes than existed in total before 2003.
There is no sign of this data avalanche slowing down either. The “internet of things” means that in the not-so-distant future every electronic object may be generating data, and there may be new types of data we haven’t yet considered.
(For more on this, listen to ‘A conversation with technologist, Anthony Day’ October 3, 2023 – link at the end of the article)[1]
The complexity and variability of this unstructured data will present a management challenge all of its own; it will be hard to use, search, integrate and otherwise exploit unless appropriate data management protocols are put in place.
It’s more important than ever for businesses to understand their data and what they can do with it. The scale of available data helped develop the latest generative AI models, which in turn may help organizations extract new value from the data available to them. How data is used will be a key point of difference between organizations that maximize their potential and those that do not. Really successful organizations will be knowledge-based. Data-driven businesses are 23 times more likely to acquire customers than those that are not[2], and insight-driven businesses are growing at an average of 30% each year, taking $1.8 trillion annually from their less-informed industry competitors[3].
In a recent interview (FT, December 19, 2023) Accenture Chief Executive, Julie Sweet, observed that “Corporate executives are keen to deploy the technology to understand data across their organisation better […] the thing that is going to hold it back, though, is most companies do not have mature data capabilities and if you can’t use your data, you can’t use AI.”
Age-old data management challenges (visibility, silos, complexity, compatibility and inconsistency) have always limited our ability to use data to make better, faster decisions and to improve day-to-day workflows. Now, in the AI age, not being able to effectively marshal quality, actionable data will leave organizations at a significant disadvantage.
The evolution of AI models and capabilities will be characterized by discontinuous change, i.e. it will move forward in leaps and bounds, and maximizing the utility of the latest developments for your organization will be a constant challenge. However, to give yourself the best chance of effectively using what comes next, you need to establish robust data foundations within your business that will allow you to flex and respond to these developments to best advantage. Building the data foundations that ensure your organization can get the most out of AI starts with these six steps.
ENSURE VISIBILITY
Former Hewlett-Packard CEO, Lew Platt, famously said: “If HP knew what HP knows, we'd be three times more productive.”
All too often there is no central repository or catalogue of available data. It sits in departmental silos, if it is stored at all, and there is no system for collecting, collating, indexing and integrating data to provide a central hub where the ‘corporate memory’ can be accessed. Challenge number one is to make ‘corporate knowledge’ visible and searchable. Visibility is the key to having the right data available to the right people at the right time.
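By way of illustration only, a catalogue does not need to be sophisticated to be useful: even a simple, searchable index of what exists, who owns it and where it lives goes a long way. The sketch below is a hypothetical minimal version in Python; the field names and the naive keyword search are assumptions, not a recommendation of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """One row in a hypothetical central data catalogue."""
    title: str          # human-readable name of the dataset or document
    owner: str          # team or person accountable for it
    location: str       # where it actually lives (share, database, SaaS tool)
    description: str    # what it contains and why it exists
    keywords: list[str] = field(default_factory=list)


def search(catalogue: list[CatalogueEntry], term: str) -> list[CatalogueEntry]:
    """Naive keyword search across titles, descriptions and keywords."""
    term = term.lower()
    return [
        e for e in catalogue
        if term in e.title.lower()
        or term in e.description.lower()
        or any(term in k.lower() for k in e.keywords)
    ]


catalogue = [
    CatalogueEntry(
        title="2023 customer satisfaction survey",
        owner="Insights team",
        location="research-drive/surveys/2023",
        description="Questionnaire, raw tables and final report for the annual CSAT wave",
        keywords=["survey", "customer", "satisfaction"],
    ),
]

print([e.title for e in search(catalogue, "survey")])
```

In practice this role is usually played by a dedicated data catalogue or knowledge-management platform, but the principle is the same: one searchable index of the corporate memory.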
SET A DATA GOVERNANCE POLICY
Will the organization be using public AI tools and sending data externally, or creating its own LLM (or SLM) to side-step any questions of confidentiality? What data goes into this ‘walled garden’?
Whilst making corporate memory more widely accessible supports better decision-making, it is not a ‘free-for-all’. The policy should address what data gets stored, and when, where and how. Who has access to it? What data is used in AI tools, and by whom? Thought needs to be given to the nature of the data available and to a governance framework covering data protection, confidentiality and an ethical dimension. Some data may be sensitive and need to be held within a more secure and controlled domain of the repository, with access appropriately restricted.
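As a rough sketch of how such a policy can be made operational, the snippet below checks whether data of a given sensitivity may be sent to a given destination before it leaves the organization. The sensitivity tiers, destinations and rules are hypothetical placeholders; a real framework would be set by legal, security and data-protection stakeholders.

```python
# Hypothetical governance rules: sensitivity tier -> allowed AI destinations.
POLICY = {
    "public":       {"public_llm", "internal_llm"},
    "internal":     {"internal_llm"},
    "confidential": set(),  # never leaves the walled garden without review
}


def may_share(sensitivity: str, destination: str) -> bool:
    """Return True if data of this sensitivity may be sent to this destination."""
    return destination in POLICY.get(sensitivity, set())


print(may_share("internal", "public_llm"))  # False
print(may_share("public", "public_llm"))    # True
```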
HAVE THE RIGHT SENSORS IN THE GROUND
What data is useful for the types of decision that we need to make? We need to ensure that our repository or walled data garden is populated with ‘useful’ data … updated in a timely fashion.
According to a PwC survey of 1,000 senior executives, highly data-driven organizations are three times more likely to report significant improvements in decision-making compared to those who rely less on data.[4] However the quality of decision-making is a function of the quality of the data that supports it.
CLEAN HOUSE
A good data repository isn’t just a place to shove any data, document or report, uploaded and forgotten on the basis that ‘more is more’. It needs to be actively managed.
As all industries become more data intensive, redundant, obsolete and trivial (ROT) data (i.e. digital data or information that has little value to the organization but is still stored) needs to be regularly cleaned out. It not only costs money to maintain, but may also adversely colour the decision-making process (AI or otherwise). ‘File and forget’ is not a good data strategy.
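A minimal sketch of what ‘actively managed’ can mean in practice: periodically flagging items that have not been touched for a long time so that a human can decide whether to archive or delete them. The two-year threshold and record fields below are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Illustrative threshold: anything untouched for two years is a ROT candidate.
STALE_AFTER = timedelta(days=2 * 365)

records = [
    {"name": "2015 tracker data tables", "last_accessed": datetime(2016, 3, 1)},
    {"name": "Current brand guidelines",  "last_accessed": datetime(2024, 1, 10)},
]

now = datetime(2024, 2, 1)  # fixed 'today' so the example is reproducible
candidates = [r["name"] for r in records if now - r["last_accessed"] > STALE_AFTER]

# Candidates are flagged for human review, not deleted automatically.
print(candidates)  # ['2015 tracker data tables']
```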
MORE TAGGING
Data becomes far more usable with the addition of some simple meta tags that enable greater search capabilities. Tags might cover the nature of the data (e.g. final output, questionnaire, discussion guide, contract/proposal, data tables), its date and scope, through to topic, category and brand.
We have also found the human element important and recommend tagging data with the names of the internal stakeholders who generated or used it. This human context can be an important filter or screen when reviewing any AI-generated output that draws on the data.
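To make this concrete, the sketch below shows the kind of simple tag set described above, including the human context. The field names are hypothetical; what matters is that the same handful of tags is applied consistently.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataTags:
    """Hypothetical meta tags attached to an item in the repository."""
    data_type: str                 # e.g. "final output", "questionnaire", "contract/proposal"
    created: date
    scope: str                     # e.g. market, region or business unit
    topic: str
    category: str
    brand: str
    stakeholders: list[str] = field(default_factory=list)  # who generated or used the data


tags = DataTags(
    data_type="final output",
    created=date(2023, 11, 2),
    scope="UK",
    topic="customer churn",
    category="subscription services",
    brand="ExampleBrand",
    stakeholders=["J. Smith (Insights)", "A. Patel (Marketing)"],
)
print(tags)
```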
AUDIT DATA IN USE
A pell-mell rush to share the organization’s data with an external LLM in order to get some outputs is a bad idea. Inputs and outputs need to be managed and filtered: in sharing data the organization may expose itself in terms of data protection, privacy and/or security, and whatever outputs are generated need to be considered through a human lens … results may be inaccurate or imprecise. Remember ‘garbage in, garbage out’. There needs to be some mechanism for auditing inputs and sense-checking outputs.
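A minimal sketch of what such a mechanism could look like: obvious personal data is redacted before a prompt leaves the organization, and every exchange is logged for human review. The regular expression and log structure are illustrative assumptions, not a complete data-protection control.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def redact(prompt: str) -> str:
    """Strip obvious personal data before the prompt leaves the organization."""
    return EMAIL.sub("[REDACTED EMAIL]", prompt)


def log_exchange(prompt: str, output: str, audit_log: list[dict]) -> None:
    """Keep a record of inputs and outputs so a human can review them later."""
    audit_log.append({"prompt": prompt, "output": output, "reviewed": False})


audit_log: list[dict] = []
safe_prompt = redact("Summarise feedback from jane.doe@example.com about product X")
log_exchange(safe_prompt, "<model output goes here>", audit_log)
print(safe_prompt)
```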
Attention should also be paid to the amount and sequencing of the data in use, ensuring that the ‘right’ data (and the right amount of it) is being used. A study by Stanford University and the University of California, Berkeley[5] found that curating the right data, in the right amount, matters for AI models in a ‘less is more’ kind of way:
“Current language models do not robustly make use of information in long input contexts […] performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models”
We cannot just load up the AI with data and expect it to make sense of it and deliver an ‘oven-ready’ output. The process needs good data management protocols.
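One practical response to the ‘lost in the middle’ finding, sketched below under the assumption that a retrieval step has already scored candidate chunks of data for relevance, is to cap how much context is sent and to place the most relevant material at the start and end of the prompt rather than burying it in the middle.

```python
def arrange_context(chunks: list[tuple[float, str]], max_chunks: int = 6) -> list[str]:
    """Keep the top-scoring chunks and alternate them between the front and back,
    so the most relevant material sits at the edges of the prompt."""
    top = sorted(chunks, key=lambda c: c[0], reverse=True)[:max_chunks]
    front, back = [], []
    for i, (_, text) in enumerate(top):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]


# Hypothetical (score, text) pairs from a retrieval step.
chunks = [(0.9, "A"), (0.2, "B"), (0.8, "C"), (0.5, "D"), (0.7, "E")]
print(arrange_context(chunks, max_chunks=4))  # ['A', 'E', 'D', 'C']
```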
A June 2023 survey by talent marketplace Upwork[6] found that 62 percent of midsize companies and 41 percent of large companies are leveraging generative AI technology. Another study[7], by IT consultancy Insight and research agency Harris, found that most business leaders from Fortune 500 companies (72%) plan to incorporate generative AI within the next three years to improve employee productivity. And in their August 2023 Global Survey[8], McKinsey reported that of the global executives surveyed …
One-third said their organizations are using gen AI regularly in at least one business function.
Nearly one-quarter said they are personally using gen AI tools for work.
More than one-quarter from companies using AI said gen AI is already on their boards’ agendas.
40 percent said their organizations will increase their investment in AI overall because of advances in gen AI.
However large or small your business, and wherever it is on the AI-adoption spectrum … AI is going to make its presence felt sooner or later. But if you can’t use your data, you can’t use AI, and so reaping the benefits of the latest developments that AI will offer requires you to get to grips with the rather prosaic world of data management. It’s the plumbing you can’t see that is key to the successful adoption of what comes next.
[1] ‘A conversation with technologist, Anthony Day’ October 3, 2023
https://polymathmind.substack.com/publish/posts/detail/137156068?referrer=%2Fpublish%2Fposts
[2] https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/data-driven-marketing
[3] https://www.forrester.com/report/InsightsDriven+Businesses+Set+The+Pace+For+Global+Growth/-/E-RES130848
[4] https://www.pwc.com/us/en/services/consulting/analytics/big-decision-survey.html
[5] https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf
[6] www.upwork.com/blog/generative-ai-disconnect
[7] https://www.insight.com/en_US/content-and-resources/gated/beyond-hypotheticals--understanding-the-real-possibilities-of-generative-ai-ac1293.html
[8] https://www.mckinsey.com/featured-insights/mckinsey-global-surveys