How can we turn the "tacit knowledge" lying dormant within our company into "AI knowledge" and utilize it?

  • Data Utilization
  • Generative AI

As companies press ahead with AI adoption, they run into one major obstacle: deciding what the AI should reference. Even after introducing the latest large language models (LLMs), simply feeding in standard in-house manuals and procedures is not enough to reproduce the "experienced judgment" and "complex exception handling" that the field actually requires.
What truly underpins a company's competitiveness is "tacit knowledge," which is difficult to document. How can this tacit knowledge be extracted and converted into "explicit knowledge" that AI can handle? This column explains the concrete process of turning the unstructured data scattered across a company into an asset and maximizing the potential of AI.

Corporate knowledge exists as tacit knowledge in "unstructured data"

It is said that over 80% of information within a company is unstructured data. Tacit knowledge is not contained in manuals stored in neat folders.

Beautiful documents aren't the only data

The documents organized and shared within a company as "explicit knowledge" are only the tip of the iceberg of all the knowledge an organization possesses. In reality, valuable information is buried in verbal communications in meetings, scribbles on whiteboards, and unorganized notes stored on personal computers.

Successful experiences and painful failures that arise in daily work fade over time unless they are captured on the spot. If this real-world know-how can be made available to AI, organizations can move away from knowledge that depends on specific individuals and foster a culture in which knowledge is shared and used across the whole organization. Converting tacit knowledge into data assets is one of the highest priorities in modern digital transformation.

Identify your internal knowledge "stash"

The first step in making use of tacit knowledge is to extract data from "hidden areas" of information that may seem difficult to utilize at first glance. This is where practical wisdom that is not usually put into manuals lies.

  • Communication history (Teams, Slack, email, etc.): Chat tool logs contain vivid details of how problems were handled and insightful exchanges between experts. This can be said to be knowledge that is close to the actual "solution" and not documented in a manual.
  • Records and reports (meeting minutes, daily reports, completion reports): Meeting recordings and automatically transcribed minutes contain not only conclusions but also the background (context) of how those conclusions were reached. The "lessons learned" in past project completion reports are important knowledge for making the next project a success without repeating the same mistakes.
  • On-site activity logs and technical assets (operation logs, proposal slides): What procedures do experts follow to operate applications, or what logic was incorporated into proposals that won past competitions? By turning these into data, it becomes possible to extract implicit "winning patterns."
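As a concrete illustration, mining a chat log for knowledge candidates can start as simply as filtering the messages a team has explicitly marked as resolutions. The message shape below (a `text` field plus a `resolved` reaction) is an assumed format for illustration only, not any chat tool's actual export schema:

```python
# Hypothetical chat export: a list of messages with reactions.
messages = [
    {"text": "Deploy failed again, same TLS error as last month",
     "reactions": []},
    {"text": "Fixed: rotate the client cert before the 90-day expiry",
     "reactions": ["resolved"]},
    {"text": "Lunch anyone?", "reactions": []},
]

def knowledge_candidates(messages: list[dict]) -> list[str]:
    """Keep only messages the team explicitly marked as a resolution."""
    return [m["text"] for m in messages if "resolved" in m["reactions"]]

candidates = knowledge_candidates(messages)
```

Even this crude filter separates reusable know-how from chatter; a real pipeline would add an LLM-based relevance check on top.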

"Structuring" technology that transforms chaotic information into knowledge

To an AI, raw collected unstructured data is little more than hard-to-read noise. The process of "structuring" transforms this data into a form that AI can process efficiently.

The evolution of multimodal LLMs and AI-OCR

The recent emergence of AI technology, particularly multimodal LLMs, has made it dramatically easier to utilize data other than text. For example, handwritten diagrams left on conference room whiteboards, paper documents, and PDFs containing complex tables were previously areas that people had given up on digitizing. However, by combining the latest AI-OCR technology with multimodal processing, it is now possible to convert these documents into text with high accuracy and incorporate them into data assets while preserving their context.

The key here is not simply to transcribe the text, but to add attribute information (metadata) such as "who produced the output, when, and for what purpose." This will enable AI to more accurately determine the importance of the information and its applicable situations.
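As a sketch of this idea, the record below pairs transcribed text with attribute information. The field names are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field
from datetime import date

# A minimal sketch of attaching metadata to a transcribed document so
# downstream AI can judge its importance and applicable situations.
@dataclass
class KnowledgeRecord:
    text: str                 # OCR / transcription output
    author: str               # who produced the output
    created: date             # when it was produced
    purpose: str              # for what purpose
    tags: list[str] = field(default_factory=list)

record = KnowledgeRecord(
    text="Whiteboard sketch: retry queue between services A and B",
    author="field engineer",
    created=date(2024, 6, 1),
    purpose="incident postmortem",
    tags=["architecture", "retry"],
)
```

Stored alongside the text, fields like `purpose` and `created` let a retrieval layer later filter out stale or irrelevant material before it ever reaches the model.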

▼I want to know more about LLM
Large Language Model (LLM) | Glossary

Redefining information as "knowledge"

Massive amounts of text data that have simply been transcribed are difficult to search and can cause confusion when AI tries to generate answers. In order to transform information into "knowledge," it is essential to organize (structure) it from the following perspectives:

  1. Combining issues, solutions, and results: Reconstructing fragmented information into pairs of business questions and answers. For example, by extracting the "event, cause, and solution" from a trouble report and converting it into a Q&A format, AI can instantly generate advice for the field.
  2. Knowledge summarization and abstraction: Instead of simply referencing the minutes of an hour-long meeting, AI can be used to summarize them into "decisions" and "open issues." This increases search hit rates and helps users find the information they need in the shortest possible time.
  3. Organizing into usable formats: Information is organized into formats suited to the intended use, such as a format that is easy to apply to internal inquiries or a column format that supplements technical manuals.
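Point 1 above can be sketched as a small transformation: a trouble report with labeled event, cause, and solution fields is reshaped into a Q&A pair. In practice, extracting those fields from free text would itself require NLP or an LLM; here they are given as a hypothetical example:

```python
# Hypothetical trouble report with already-extracted fields.
report = {
    "event": "Nightly batch job failed with a timeout",
    "cause": "Upstream API rate limit was lowered without notice",
    "solution": "Added exponential backoff and alerting on HTTP 429",
}

def to_qa(report: dict) -> dict:
    """Reshape an event/cause/solution record into a searchable Q&A pair."""
    return {
        "question": f"What should we do when: {report['event']}?",
        "answer": f"Cause: {report['cause']}. Fix: {report['solution']}.",
    }

qa = to_qa(report)
```

The Q&A shape matters more than the code: a question phrased the way the field would ask it is far easier for retrieval to match than a raw report.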

The importance of "context design" that influences the accuracy of AI responses

Even if tacit knowledge is digitized, performance can change dramatically depending on how it is presented to AI. This requires strategic design based on an understanding of the unique behavior of LLMs.

Context Window Trap and "Lost in the Middle"

The latest LLMs can load dramatically more text at once (a larger context window). However, being able to hold large amounts of information and being able to process it accurately are two different things. Beware the "lost in the middle" phenomenon, in which a model tends to overlook information buried in the middle of a long context. If large amounts of unstructured data are fed to an AI without processing, it cannot maintain context, raising the risk of incorrect answers and hallucination.

The essence of "context design" in utilizing AI is not to increase the amount of information given to the AI, but to "present with the utmost purity only the information that is most necessary for the current question."

Information selection using RAG (Retrieval-Augmented Generation)

The technology that solves this problem is "RAG (Retrieval-Augmented Generation)." RAG is a system that instantly searches through a huge amount of data within a company to find only the most relevant parts in response to a user's question, and passes that information to AI as additional information. When designing RAG, the following points must be considered in order to enable AI to accurately access the information it needs:

  • Narrowing the scope: Target internal sources so that information not publicly available, such as company-specific terminology and product specifications, can be pinpointed.
  • Eliminate noise: Filter out outdated manuals and chatter that is not relevant to the question, improving answer accuracy.
  • Providing evidence: By clearly indicating which document and statement the answer is based on, the reliability of the information is ensured.
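A toy sketch of the retrieval step: score stored knowledge chunks against a question and pass only the best match to the model as context. Real systems use vector embeddings and a vector database; simple word overlap stands in for similarity here:

```python
# Tiny in-memory "knowledge base" of structured chunks.
docs = [
    "Product X firmware update procedure: hold reset for 10 seconds.",
    "Expense report submission deadline is the 5th business day.",
    "Product X error E42 means the sensor cable is loose.",
]

def score(question: str, doc: str) -> int:
    """Crude relevance: count shared lowercase words."""
    q_words = set(question.lower().split())
    return len(q_words & set(doc.lower().split()))

def retrieve(question: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k most relevant chunks for the question."""
    return sorted(docs, key=lambda d: score(question, d), reverse=True)[:top_k]

context = retrieve("what does error e42 mean on product x", docs)
# The retrieved chunk(s), not the whole corpus, go to the LLM as context.
```

The point is the shape of the system: the expense-report chunk never reaches the model, which is exactly the noise elimination described above.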

In this way, optimizing the link between search and generation is the key to transforming tacit knowledge into actionable knowledge.

▼I want to know more about RAG (Retrieval-Augmented Generation)
Retrieval Augmented Generation (RAG) | Glossary

Building an AI workflow that automates "knowledge creation"

Digitizing tacit knowledge is not a one-time task. Knowledge is constantly being updated in daily work, so a system for continuously processing this information is essential.

Data pipeline development and automatic integration

The traditional method of manually updating FAQs cannot keep up with the ever-increasing amount of information. This is why it is important to build a "data pipeline" that automatically connects the source of information to the AI's reference destination.

For example, consider the following automated workflow:

  1. Collection: AI periodically scans for trouble cases that have been resolved in specific Teams channels.
  2. Preprocessing: The AI analyzes the interaction, determines its importance, and then automatically summarizes it into knowledge in Q&A format.
  3. Registration: Approved knowledge is automatically registered in a vector database, so that the AI can use that knowledge to provide answers the next day.
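The three stages above can be sketched as stub functions. The chat source, the approval step, and the vector database are all stand-ins here, not real APIs:

```python
def collect() -> list[dict]:
    """Stage 1 (stub): pretend-scan a channel for resolved trouble threads."""
    return [{"thread": "DB connection pool exhausted -> raised pool size"}]

def preprocess(threads: list[dict]) -> list[dict]:
    """Stage 2 (stub): summarize each thread into Q&A-shaped knowledge."""
    out = []
    for t in threads:
        problem, _, fix = t["thread"].partition(" -> ")
        out.append({"question": problem, "answer": fix})
    return out

knowledge_base: list[dict] = []

def register(items: list[dict]) -> None:
    """Stage 3 (stub): store approved knowledge (stand-in for a vector DB)."""
    knowledge_base.extend(items)

# Run the pipeline end to end.
register(preprocess(collect()))
```

Scheduling this run (for example, nightly) is what turns knowledge capture from a one-off project into the continuous system the section describes.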

By creating a "data integration × AI" system like this, individual knowledge becomes an organizational asset that will not be lost. Preventing information from slipping through the cracks and continuously delivering the latest knowledge to the field is what sustains operational efficiency over time.

Finally

When utilizing knowledge through AI, the most powerful weapon is not the latest model itself, but the company's own unique data that the model references. A company's unique strengths lie in the experience cultivated over many years by veteran employees and the down-to-earth interactions on the ground.

However, no matter how abundant the data, if it is not organized in a way that is easy for AI to refer to, or if the data is not updated and old data is still being used, it will be difficult for AI to achieve the accuracy expected of it.

First, survey the treasure trove of "unstructured data" lying dormant across the company, then organize and structure it. By growing this into a data pipeline that continuously supplies the latest knowledge to AI, the AI can draw on all of the company's knowledge, both explicit and tacit.

Saison Technology Online Consultation

If you would like to hear more about our data utilization platform, we also offer online consultations.

Make an online consultation

The person who wrote the article

Affiliation: Data Integration Consulting Department, Data & AI Evangelist

Shinnosuke Yamamoto

After joining the company, he worked as a data engineer, designing and developing data infrastructure, primarily for major manufacturing clients. He then became involved in business planning for the standardization of data integration and the introduction of generative AI environments. Since April 2023, he has been working as a pre-sales representative, proposing and planning services related to data infrastructure, while also giving lectures at seminars and acting as an evangelist in the "data × generative AI" field. His hobbies are traveling to remote islands and visiting open-air baths.
(Affiliations are as of the time of publication)
