ANGELINE CORVAGLIA

AI is great, but without data quality, don’t get your hopes up

[Image: a broken toy, brought to you by bad data quality]

The AI train is on the tracks and heading full speed ahead, and it will surely make great things happen over the next few decades. Yet I have a critical message for business leaders before they shift their investment budgets to the promise of artificial intelligence: don’t use AI until you know how to solve data quality issues! It is possible to work with AI models trained only on publicly available information, but the real advantage comes when a business adds its own data to the model, and that data is only an advantage if it is good quality.

Companies are ready to spend on AI, but data quality is a big problem

A company’s data represents its collective wisdom, built up over the years, and not using it would be a huge step back when implementing AI. Technology should augment human capabilities, not start over and replace them. The problem is that, in the vast majority of cases, data is stored in ways that seriously limit its usability for digital initiatives.

Data quality can make or break a positive return on investment in AI, and the starting point in most cases is not good. According to a 2022 survey by Great Expectations [1], 77% of data professionals said they have data quality issues, and 91% said those issues affect their company’s performance.

Put that finding next to the results of a recent survey on AI adoption, Zeitgeist: The 2023 AI Readiness Report [2] by Scale, and it looks like trouble ahead. According to the poll, 69% of companies surveyed said AI will be critical or highly critical to their businesses over the next three years, and between 64% and 80% of companies across the industries surveyed planned to increase their investment in AI over the same period.

Companies have overwhelmingly recognized the importance of AI, and they are willing to put money into benefiting from this technological advancement. Yet most data professionals already say that data quality is hurting their companies’ performance. AI, like every computer before it, no matter how sophisticated, processes what it is given; unless the data readiness problem is solved first, it will be just another case of garbage in, garbage out.

Why is it so hard to get data quality right?

As I mentioned, data quality issues are nothing new in the digital world. I have never encountered a digital initiative that didn’t involve clearing up bad-quality data that was slowing down the implementation of a new way of working. I have also never encountered a situation where people were quickly convinced that taking an active role in fixing the problem was their responsibility or worth their time. Even once the problems are identified, it is hard to move forward in solving them.

At the beginning of the data quality process, there are usually two types of people involved: those who create the data and those who are experts at managing it. The ones who manage it usually find the data quality issues and immediately understand that something must be done; after all, garbage in, garbage out. The ones who create the data agree, but under no circumstances do they believe that they should be the ones fixing it. They have other things to do that they perceive as having greater strategic significance for the company and for their remuneration.

The critical thing that needs to happen at this point is a mindset change in those creators: they need to fully appreciate the value of a quality data set, and their sense of what brings value, personally and professionally, must shift accordingly. That’s why it’s so hard to get data quality right. Mindset changes are seldom straightforward and must be handled carefully if they are to bring the desired change in behavior. Educating people on what is new and expected takes a lot of focus and leading by example. Data leaders can also be helpful allies at lower levels of the organization (see “Know your data leaders! Transformation is impossible without them.” on corvaglia.me).

The most crucial step in improving data quality is to get people to own the problem

The most common data quality issues come from:

  • Inaccurate data
  • Incomplete data
  • Duplicate records
  • Out-of-date data
  • Data that is inconsistent between sources (or within a single field or table)
  • Compromised or corrupted data
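
To make these categories concrete, here is a minimal sketch of how a few of them could be detected in an existing dataset, written in Python with pandas. The file and column names (customers.csv, customer_id, email, last_updated) are hypothetical, chosen purely for illustration.

```python
import pandas as pd

# Hypothetical customer table; file and column names are illustrative only.
df = pd.read_csv("customers.csv", parse_dates=["last_updated"])

# Incomplete data: rows missing values in required fields.
incomplete = df[df["customer_id"].isna() | df["email"].isna()]

# Duplicates: the same customer_id appearing more than once.
duplicates = df[df.duplicated(subset="customer_id", keep=False)]

# Out-of-date data: records not updated in over two years.
cutoff = pd.Timestamp.now() - pd.DateOffset(years=2)
stale = df[df["last_updated"] < cutoff]

print(f"{len(incomplete)} incomplete, {len(duplicates)} duplicate, {len(stale)} stale rows")
```

Dedicated profiling tools go much further than this, but even checks this simple make the scale of the problem visible to the people who need to own it.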

There are several ways to set up processes to fix these issues, but the most critical first step, in any case, is to clarify roles and responsibilities. Data must always have responsible owners, both at the operational level and in the company’s top leadership. The owner should be whoever produces the data, and they need to be given tools that make it easy for them to manage it.

Senior leadership is responsible for bringing about the mindset change needed to make producing quality data a priority. At the same time, data experts must feel accountable for putting proper system controls in place so that data is checked at the moment of creation. Examples are making fields mandatory, setting up format checks, and adding controls that flag whether a data point (such as a customer) already exists.
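
As a rough illustration of what such creation-time controls might look like, here is a small Python sketch covering the three examples above: a mandatory-field check, a format check, and a duplicate-existence check, all run before a new record is accepted. The record layout and the existing_ids set are assumptions made for the example, not a prescribed design.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_new_customer(record: dict, existing_ids: set) -> list[str]:
    """Return a list of data quality errors; an empty list means the record is accepted."""
    errors = []

    # Mandatory fields: reject records with missing required values.
    for field in ("customer_id", "name", "email"):
        if not record.get(field):
            errors.append(f"missing required field: {field}")

    # Format check: the email must at least look like an email address.
    email = record.get("email", "")
    if email and not EMAIL_RE.match(email):
        errors.append(f"badly formatted email: {email}")

    # Existence check: refuse duplicates of an already-known customer.
    if record.get("customer_id") in existing_ids:
        errors.append(f"customer already exists: {record['customer_id']}")

    return errors

# Example: the second record trips both the format and duplicate checks.
known = {"C-001"}
print(validate_new_customer({"customer_id": "C-002", "name": "Ada", "email": "ada@example.com"}, known))
print(validate_new_customer({"customer_id": "C-001", "name": "Bob", "email": "not-an-email"}, known))
```

In a real system these checks would live in the data entry application or the database layer, so that the people creating the data get immediate feedback instead of leaving the cleanup for later.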

These three groups must own their uniquely critical roles in this process, and they must work together to find solutions for the challenges that will inevitably come up. Otherwise, there will be a lot of inefficiency and blame games! So please, companies ready to get on the AI train: don’t use AI until you know how to solve data quality issues. If you don’t, you (and your pocketbooks) will regret it.

Data quality must come before the use of AI

In conclusion, as the AI revolution accelerates full steam ahead, an essential message resonates for business leaders considering their investment strategies in artificial intelligence: before shifting budgets toward AI, put the effort into resolving data quality issues. Working with AI models trained on publicly available data can offer benefits, but the true competitive advantage emerges when a company integrates its own data into AI models, and that is only possible if the data is of sufficient quality. Ultimately, successful AI integration is contingent on a culture of data ownership, accountability, and cooperation. That culture ensures data integrity and opens the doors to AI’s transformative capabilities.