AI Readiness: Addressing your data should be step #1

Chris Weidemann

Many executives, especially in industries that traditionally experience slower technical disruption, currently find themselves at a technical crossroads. They know they must plan for AI yet are unsure where to begin. For many companies, this crossroads is less confusing than it initially seems. The hype and mystery surrounding AI have created a lot of buzz, leaving business decision-makers longing for big, transformative custom AI systems.

While we are excited to see clients pursuing these transformative projects, we do encourage every client to spend a few hours with us first, strategizing on what the most efficient entry into AI for their organization might be. In doing so, we tend to discover low-hanging fruit that is too valuable to overlook, enabling the organization to quickly gain AI wins while still making progress on the larger transformative projects. In this three-part series, I will outline the three most common projects we uncover in this process. These are options every organization should consider, regardless of size, industry, or technical strategy:

  • A private large language model – similar to ChatGPT but tailored specifically for your company.
  • Combining Enterprise Search and Generative AI – technically known as Retrieval-Augmented Generation (RAG), which combines the retrieval of relevant documents or information with generative AI models (like LLMs).
  • Addressing data health before embarking on a larger custom AI project – as high-quality data is the foundation for accurate, reliable, and effective AI models.

In this article, which is part one of three, I’ll focus on the third possibility: addressing your organization’s data health. In parts two and three I’ll provide insights into the other options.

Addressing Data Health and Permissions:

The success of AI initiatives hinges on one critical factor: the quality and accessibility of your organization’s data. Before reaping the benefits of large transformative AI systems, companies must first tackle the foundational step of scrubbing and preparing their data. This blog post aims to guide business decision makers (nontechnical or technical) through the essential steps to make their data AI-ready, in preparation for the transformative AI systems they have been dreaming up.

Why It Matters

Enterprises embarking on AI projects often encounter the challenge of poor data health and inadequate data permissions. While data suggests that at least 3 in 5 businesses will need to address their data issues before embarking on custom AI, it has been my personal experience that every business can receive measurable benefit from addressing these issues when trying to implement AI systems. Those that don’t address data issues prior to implementation have experienced inaccurate insights, data leaks, inefficient operations, and missed opportunities.

Data health refers to the accuracy, completeness, and reliability of your data, while proper data permissions ensure that data access is managed and compliant with policies and regulations. Not addressing either of these aspects can result in several negative consequences, including:

  • Inaccurate AI Models: Poor data quality leads to unreliable AI outputs, which can misinform decision-making processes.
  • Compliance Risks: Inadequate data permissions can result in regulatory breaches, leading to legal and financial penalties.
  • Operational Inefficiencies: Time and resources wasted on cleaning and managing data during AI project execution can delay benefits and reduce ROI.
  • Data Leaks: If data permissions are not properly managed, AI systems can expose sensitive data to anyone who knows how to look.

Data Permission Trimming for AI

Many organizations do not realize they have data permissions problems until they try to use AI. For example, a new prospective client found us after they discovered that employees could see sensitive data immediately upon activating Microsoft Office 365’s Copilot. Data permissions issues can prevent organizations from fully benefiting from AI, causing inefficiencies, security risks, and lost chances for innovation.

Data permissions are the rules and policies that determine who can access, modify, or share your data. Properly applying permissions to your data is crucial for ensuring data security, privacy, and compliance, as well as avoiding data leaks that can compromise your AI systems or expose sensitive information. Data permissions can also affect the quality and availability of your data, which can impact the performance and accuracy of your AI models. Therefore, it is essential to audit and correct any data permission issues before using your data for AI. In this guide, we will provide the high-level steps required for a business decision maker to begin auditing and correcting data permission issues.

How to Begin:

Defining data permissions governance is not a new concept. Most large enterprises already have data access and permissions policies and procedures in place to protect their data assets and comply with regulations. Proper data permissions ensure not only data security and accessibility but also regulatory compliance. Fortunately, it’s not hard to implement data permissions governance. It is just a matter of implementing well-defined processes and using tools that suit your organization’s data needs and goals. Here is a high-level guide for non-technical decision-makers to lead their organizations through addressing data permissions:

Define Data Access Policies

Overview: Establish clear data access policies that outline who can access what data and under what conditions.

Action: Collaborate with IT and legal teams to draft data access policies that are aligned with business needs and regulatory requirements. Ensure policies are documented and communicated to all employees.

Conduct a Data Access Audit

Overview: Evaluate current data access levels to identify who has access to different types of data within the organization.

Action: Request IT to conduct a comprehensive audit of current data access permissions. This should include an inventory of data assets and a list of users with access to each asset. Most organizations will need to invest in software to produce a truly comprehensive list of all assets and access.
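For readers who want to picture what such an inventory looks like, here is a minimal Python sketch. It assumes IT can export access grants as simple (user, asset, permission) rows; the sample rows, asset paths, and the function name `build_access_inventory` are hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical export of access grants: (user, asset, permission).
# Real data would come from your file shares, databases, or audit software.
GRANTS = [
    ("alice", "hr/salaries.xlsx", "read"),
    ("bob", "hr/salaries.xlsx", "write"),
    ("carol", "sales/pipeline.csv", "read"),
    ("alice", "sales/pipeline.csv", "read"),
]


def build_access_inventory(grants):
    """Group grants by asset so each asset lists who can reach it and how."""
    inventory = defaultdict(dict)
    for user, asset, permission in grants:
        inventory[asset][user] = permission
    return dict(inventory)


inventory = build_access_inventory(GRANTS)
for asset, users in sorted(inventory.items()):
    print(asset, "->", sorted(users))
```

Even a simple per-asset view like this tends to make it obvious which sensitive files are reachable by more people than expected.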

Identify and Remove Unnecessary Access

Overview: Determine which employees have unnecessary access to sensitive or critical data and remove these permissions to minimize security risks.

Action: Based on the audit results, work with IT to revoke access that is not essential for specific roles. Implement the principle of least privilege, where users have the minimum level of access necessary for their job functions.
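As a rough illustration of the least-privilege check, the sketch below compares what each user can actually reach against what their role is supposed to allow. The role names, users, and asset paths are made up for the example, and `find_excess_access` is a hypothetical helper, not part of any specific product.

```python
# Hypothetical policy: the assets each role is allowed to access.
ROLE_ALLOWED = {
    "hr_analyst": {"hr/salaries.xlsx"},
    "sales_rep": {"sales/pipeline.csv"},
}

# Hypothetical audit results: each user's role and current access.
USER_ROLES = {"alice": "hr_analyst", "carol": "sales_rep"}
ACTUAL_ACCESS = {
    "alice": {"hr/salaries.xlsx", "sales/pipeline.csv"},
    "carol": {"sales/pipeline.csv"},
}


def find_excess_access(actual, user_roles, role_allowed):
    """Return, per user, the assets they can reach beyond what their role allows."""
    excess = {}
    for user, assets in actual.items():
        allowed = role_allowed.get(user_roles.get(user), set())
        extra = assets - allowed
        if extra:
            excess[user] = extra
    return excess


excess = find_excess_access(ACTUAL_ACCESS, USER_ROLES, ROLE_ALLOWED)
for user, assets in sorted(excess.items()):
    print(f"{user} has access beyond their role: {sorted(assets)}")
```

Users with no known role end up with an empty allowed set, so everything they can reach is flagged for review, which is the conservative default.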

Set Up Regular Access Reviews

Overview: Regularly review and update data access permissions to ensure they remain appropriate as roles and responsibilities change.

Action: Schedule periodic access reviews (e.g., quarterly) where managers and IT review and confirm current access levels for their teams. Implement automated tools to assist in monitoring and managing access permissions.
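One simple way to support a quarterly review is to flag any grant that has not been confirmed recently. The sketch below assumes each grant record carries a `last_reviewed` date; that field, the 90-day window, and the sample records are assumptions for illustration only.

```python
from datetime import date, timedelta

# Hypothetical grant records with the date each was last confirmed by a manager.
GRANTS = [
    {"user": "alice", "asset": "hr/salaries.xlsx", "last_reviewed": date(2024, 1, 10)},
    {"user": "carol", "asset": "sales/pipeline.csv", "last_reviewed": date(2024, 5, 20)},
]


def grants_due_for_review(grants, today, max_age_days=90):
    """Return grants whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [g for g in grants if g["last_reviewed"] < cutoff]


overdue = grants_due_for_review(GRANTS, today=date(2024, 6, 30))
for grant in overdue:
    print(f"Review needed: {grant['user']} -> {grant['asset']}")
```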

Educate Employees on Data Security and Permissions

Overview: Ensure that all employees understand the importance of data security and the policies governing data access.

Action: Conduct regular training sessions and provide resources to educate employees on data security best practices and the importance of adhering to data access policies. Make it clear that data access violations can have serious consequences.

Monitor and Respond to Access Violations

Overview: Continuously monitor for unauthorized access attempts and respond promptly to any violations to protect sensitive data.

Action: Implement monitoring tools to detect and alert on suspicious access activities. Establish a response plan for investigating and addressing access violations, including potential disciplinary actions.
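To make the idea of alerting on suspicious activity more concrete, here is a small sketch that flags users with repeated denied access attempts. The log format, the threshold of three, and the name `flag_suspicious_users` are assumptions; a real monitoring tool would use richer signals than a simple count.

```python
from collections import Counter

# Hypothetical access log entries; a real log would come from your monitoring tool.
ACCESS_LOG = [
    {"user": "dave", "asset": "hr/salaries.xlsx", "outcome": "denied"},
    {"user": "dave", "asset": "finance/payroll.csv", "outcome": "denied"},
    {"user": "dave", "asset": "legal/contracts.docx", "outcome": "denied"},
    {"user": "erin", "asset": "sales/pipeline.csv", "outcome": "allowed"},
    {"user": "erin", "asset": "hr/salaries.xlsx", "outcome": "denied"},
]


def flag_suspicious_users(events, denied_threshold=3):
    """Return users with at least denied_threshold denied access attempts."""
    denied = Counter(e["user"] for e in events if e["outcome"] == "denied")
    return sorted(user for user, count in denied.items() if count >= denied_threshold)


flagged = flag_suspicious_users(ACCESS_LOG)
print("Users to investigate:", flagged)
```

A single denied attempt is usually an honest mistake, which is why the sketch only flags repeated attempts; the right threshold depends on your organization.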

Ensure Compliance with Data Protection Regulations

Overview: Align data access policies and practices with relevant data protection regulations (e.g., GDPR, CCPA) to avoid legal penalties.

Action: Work with legal and compliance teams to ensure that data access policies are compliant with all applicable regulations. Regularly review and update policies to reflect changes in regulatory requirements.

BONUS - Implement Role-Based Access Control (RBAC)

Overview: Role-Based Access Control standardizes data access permissions based on job roles, instead of by user. While RBAC is not a foundational requirement of proper permissioning, it is a worthy investment that will reinforce the concepts above and create a more efficient access control mechanism, reducing governance costs moving forward.

Action: Create a matrix of the different roles within the organization and the data they should have access to. Ensure that new employees are assigned appropriate access levels upon onboarding and that changes in roles trigger updates to data permissions.
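The role matrix and the "role change triggers a permission update" idea can be sketched in a few lines of Python. The roles, asset paths, and the `plan_role_change` helper below are hypothetical; they only show how a matrix turns a role change into a list of access to grant and revoke.

```python
# Hypothetical role matrix: each role maps to the data it should have access to.
ROLE_MATRIX = {
    "hr_analyst": {"hr/salaries.xlsx", "hr/policies.pdf"},
    "sales_rep": {"sales/pipeline.csv"},
    "sales_manager": {"sales/pipeline.csv", "sales/forecast.xlsx"},
}


def plan_role_change(old_role, new_role, matrix=ROLE_MATRIX):
    """Work out which access to grant and revoke when someone changes roles.

    Pass old_role=None for a new employee who has no access yet.
    """
    old_access = matrix.get(old_role, set())
    new_access = matrix[new_role]
    return {
        "grant": new_access - old_access,
        "revoke": old_access - new_access,
    }


# A promotion adds the forecast file and revokes nothing.
promotion = plan_role_change("sales_rep", "sales_manager")
print("Grant:", sorted(promotion["grant"]), "Revoke:", sorted(promotion["revoke"]))

# Onboarding grants everything the role allows.
onboarding = plan_role_change(None, "hr_analyst")
print("Grant:", sorted(onboarding["grant"]))
```

Because access is derived from the role rather than assigned person by person, a transfer between departments revokes the old access automatically instead of letting it accumulate.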

By following these steps, decision-makers can effectively lead their organizations in addressing data permissions issues. Proper data permissions not only enhance data security and regulatory compliance but also ensure that data is accessible to those who need it, enabling more effective and efficient operations.

Data Cleaning for AI

Identifying and correcting errors, inconsistencies, and missing values in the data is a worthy endeavor for every organization that is planning for AI. Data cleaning can improve the quality, reliability, and usability of the data, as well as prevent potential problems such as operational inefficiencies or AI hallucinations. However, data cleaning can also be a complex and time-consuming process that requires careful planning and execution. Below you will find high-level topics that can help you start your data cleaning journey in an efficient and effective way. This guide outlines essential steps for non-technical decision-makers to lead their organizations through resolving data quality issues, ensuring a solid foundation for AI initiatives.

How to Begin:

Define Your Data Quality Goals

Overview: Establish clear objectives for what you aim to achieve with your data cleaning efforts. This could include improving data accuracy, reducing duplicates, or enhancing data completeness.

Action: Set measurable goals such as reducing data errors by a certain percentage or achieving a specific level of data completeness.
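Measurable goals are easiest to track when they are written down as explicit targets. The sketch below shows one possible way to do that; the metric names and target values are placeholders I chose for illustration, not recommendations, and `check_goals` is a hypothetical helper.

```python
# Hypothetical goals: the targets are placeholders, not recommendations.
GOALS = {
    "completeness_pct": {"target": 98.0, "higher_is_better": True},
    "duplicate_pct": {"target": 1.0, "higher_is_better": False},
}


def check_goals(measured, goals=GOALS):
    """Return True or False per metric, depending on whether the target is met."""
    results = {}
    for metric, goal in goals.items():
        value = measured[metric]
        if goal["higher_is_better"]:
            results[metric] = value >= goal["target"]
        else:
            results[metric] = value <= goal["target"]
    return results


status = check_goals({"completeness_pct": 95.5, "duplicate_pct": 0.4})
for metric, met in status.items():
    print(f"{metric}: {'met' if met else 'not yet met'}")
```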

Assess Current Data Quality

Overview: Evaluate the current state of your data to identify the extent and types of issues present.

Action: Conduct a data quality assessment using tools and techniques like data profiling, data audits, and quality scoring to understand the gaps and issues in your data.
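Data profiling sounds abstract, but at its simplest it means counting what is missing and what is repeated. Here is a minimal sketch, assuming the data can be loaded as a list of records; the sample customer records and the `profile_records` function are hypothetical, and commercial profiling tools go much further than this.

```python
def profile_records(records, fields):
    """Report row count, percent of filled values per field, and exact duplicate rows."""
    total = len(records)
    completeness = {}
    for field in fields:
        # Treat None and empty strings as missing values.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = round(100 * filled / total, 1) if total else 0.0
    # Rows are duplicates when every profiled field matches exactly.
    unique_rows = {tuple(r.get(f) for f in fields) for r in records}
    return {
        "rows": total,
        "completeness_pct": completeness,
        "duplicate_rows": total - len(unique_rows),
    }


# Hypothetical customer records with one duplicate and two missing emails.
CUSTOMERS = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Grace Hopper", "email": ""},
    {"name": "Alan Turing", "email": None},
]

report = profile_records(CUSTOMERS, ["name", "email"])
print(report)
```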

Prioritize Data Sets for Cleaning

Overview: Not all data needs to be cleaned at once. Prioritize the most critical datasets that will have the highest impact on your AI initiatives.

Action: Focus on datasets that are most relevant to your business goals and AI applications, such as customer data, financial records, or operational logs.

Choose the Right Data Cleaning Tools and Techniques

Overview: Select appropriate tools and methodologies for your data cleaning tasks. These can range from manual processes to automated software solutions.

Action: Evaluate and implement tools for deduplication, normalization, validation, and enrichment of data. Consider both in-house solutions and third-party platforms.
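To show what normalization, validation, and deduplication mean in practice, here is a small sketch applied to contact records. The record layout, the deliberately simple email pattern, and the `clean_records` function are assumptions for illustration; a production tool would apply far more thorough rules.

```python
import re

# A deliberately simple email check for illustration, not a full validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_records(records):
    """Normalize names and emails, reject invalid emails, and drop duplicate emails."""
    seen_emails = set()
    cleaned, rejected = [], []
    for rec in records:
        # Normalization: trim whitespace, lowercase emails, tidy name spacing and case.
        email = (rec.get("email") or "").strip().lower()
        name = " ".join((rec.get("name") or "").split()).title()
        # Validation: set aside records whose email does not look valid.
        if not EMAIL_PATTERN.match(email):
            rejected.append(rec)
            continue
        # Deduplication: keep only the first record for each email.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned, rejected


# Hypothetical raw records: two spellings of one customer and one invalid email.
RAW = [
    {"name": "  ada   lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Bob", "email": "not-an-email"},
]

cleaned, rejected = clean_records(RAW)
print("Cleaned:", cleaned)
print("Rejected:", rejected)
```

Rejected records are returned rather than silently discarded, so someone can review and repair them instead of losing the data.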

Develop a Data Cleaning Plan

Overview: Create a structured plan outlining the steps, resources, and timelines for your data cleaning efforts.

Action: Detail the phases of the cleaning process, assign responsibilities, and set deadlines to ensure systematic progress.

Execute and Monitor the Data Cleaning Process

Overview: Implement the data cleaning plan and continuously monitor the progress and effectiveness of the cleaning activities.

Action: Regularly review progress reports, conduct quality checks, and adjust the plan as needed to address any emerging issues.

Maintain and Improve Data Quality

Overview: Data cleaning is not a one-time task but an ongoing process. Establish practices to maintain and continuously improve data quality.

Action: Implement data governance policies, conduct regular data quality assessments, and foster a culture of data stewardship within the organization.

By following these high-level steps, your organization can effectively begin the process of cleaning data in preparation for custom AI solutions. Clean and well-managed data not only enhances the accuracy and reliability of AI models but also drives better business decisions and operational efficiencies. Start today to unlock the full potential of AI for your enterprise.

Closing

Cleaning data is not a daunting task, but a necessary one. It is where every organization should start, especially if there is any doubt or uncertainty about the quality and consistency of their data. Without clean and properly permissioned data, AI solutions will be unreliable, inaccurate, and potentially harmful.

If you need help or guidance on how to start this journey, Advisor Labs is here for you. We have the expertise and experience to assist you in cleaning your data and creating custom AI solutions that leverage your clean data to deliver value to your enterprise. Send me a direct message today and let us help you unlock the power of AI for your business.

About the Author

Chris Weidemann

Chris has been interested in what we all now refer to as AI for over ten years. In 2013, he published his first research journal article on the topic. He now helps companies implement these progressive systems. Chris' posts try to explain these topics in a way that any business decision maker (technical or nontechnical) can leverage.
