Democratizing data analysis with generative AI
Data analysis has become indispensable in the modern business environment. However, to derive new insights from data and create value from them, it is essential to have the skills to understand and analyze data.
In this column, we will introduce a method that utilizes generative AI to enable anyone, including those on the business side, to extract value from data.
Two groups of people who discover insights
The ultimate goal of data analysis is to improve business value. That improvement can take many forms, such as increasing sales, reducing costs, improving customer and employee satisfaction, and strengthening risk management. So what is needed to use data to improve business value? The answer is insights, meaning new discoveries derived from data. By discovering insights from data, you can take actions that differ from what you have done before.
Today, the people who uncover insights fall into two groups. The first is data analysts. This pattern has continued since the term "big data" became popular in the 2010s and can be called the traditional approach to data analysis. Data analysts with advanced expertise collect, process, and analyze data at the request of management and business departments, then deliver the insights they uncover back to those departments.
The second group is new: the so-called users, meaning the management and business departments that consume insights. As demand for data analysis grows, driven by terms like DX (digital transformation) and data-driven management, the workload on data analysts increases, and they struggle to keep up with the speed users demand. Management and business departments want insights immediately, yet they often have to wait two weeks or even a month for them to arrive. This is where the movement toward "democratizing data" begins. As the word "democratization" suggests, the movement aims to enable anyone, whether in an IT or a non-IT department, to access and analyze data.
Hurdles to data democratization
To achieve data democratization, we first need a data platform that anyone can use easily. This includes a "data lake" that stores various types of data, a "data catalog" that organizes data information, and "BI (Business Intelligence)" tools that display data in graphs and other formats.
However, even if these data platforms are developed and provided to users, there are several hurdles to overcome before non-IT frontline users (such as sales representatives and human resources personnel) can fully utilize them. Not only do they need to learn how to use the tools, but they also need the skills to understand how to analyze and interpret the data. Various no-code tools have emerged recently, and although they are relatively easy to use, there are still psychological and technical hurdles to mastering various tools on your own. Overcoming the hurdles to using these tools and services is a major challenge in democratizing data at the organizational level.
Data analysis with generative AI
One key to removing this hurdle is generative AI. Generative AI can interpret natural language, i.e., human language, and generate content such as text and images. By applying this generative ability to the data analysis process, we believe we can achieve the democratization of data, allowing anyone to gain insights from data using their own words.
So how can we achieve data analysis using generative AI? First, it is important to break down data analysis into individual processes. As an example, we can break down the process as follows:
① Search: Find out where the data you want to see is located
② Extraction: Extract only the data you need to see
③ Prediction: Predict future trends from past data
④ Visualization: Visually understand past and future trends
⑤ Insight: Look to the future and gain insight into future measures
| Process | How generative AI is applied |
|---|---|
| ① Search | Generative AI converts text into vectors and performs vector search |
| ② Extraction | Generate the SQL required for extraction from the database |
| ③ Prediction | Generate and execute code to make predictions based on the data |
| ④ Visualization | Generate graphs as image data based on the data |
| ⑤ Insight | Verbalize what trends and hypotheses can be considered based on the data |
First, in the search process, vector search identifies which data best matches a request written in natural language. People phrase things differently, with subtle differences in expression, nuance, and spelling. Vector search, also known as semantic search, converts text into numerical vectors and looks for items whose vectors are close in meaning, so it can absorb this person-to-person variation. Generative AI offerings typically include an embeddings model that converts text into vector representations.
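As a rough illustration of the search step, the sketch below ranks catalog entries by cosine similarity to a query vector. The dataset names and the three-dimensional vectors are invented for this example; a real embeddings model would return much longer vectors, and the call to that model is left out here.

```python
import math

# Toy 3-dimensional vectors standing in for the output of an embeddings model.
# The dataset names and the numbers are made up for illustration only.
CATALOG = {
    "monthly_sales_by_store": [0.9, 0.1, 0.0],
    "employee_satisfaction_survey": [0.0, 0.2, 0.9],
    "supplier_cost_history": [0.3, 0.9, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, catalog, top_k=1):
    """Rank catalog entries by similarity to the query vector."""
    ranked = sorted(
        catalog.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Assumption: an embeddings model mapped "revenue per shop" to this vector.
query = [0.8, 0.2, 0.1]
best_match = search(query, CATALOG)
print(best_match)
```

The query never uses the words "sales" or "store", yet the sales dataset ranks first because its vector points in nearly the same direction, which is the property that lets vector search tolerate differences in wording.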
The extraction process then generates the SQL required to retrieve data from the database. On a typical data platform, data is stored in a data warehouse, and retrieving it requires writing SQL statements and running them as queries against the warehouse. However, SQL is not easy for non-IT personnel to write, and it becomes harder still when detailed conditions or calculations are involved. Generative AI can write even complex SQL statements from instructions given in natural language.
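The extraction step might look something like the following sketch, which uses an in-memory SQLite database in place of a data warehouse. The `sales` table, the sample rows, and the `generate_sql` function are all hypothetical; `generate_sql` returns a canned statement where a real system would call a generative AI model with the prompt.

```python
import sqlite3

# Hypothetical schema used only for this example.
SCHEMA = "CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)"


def build_prompt(question, schema):
    """Compose the instruction a generative AI model would receive."""
    return (
        "You write SQL for the following schema.\n"
        f"{schema}\n"
        f"Question: {question}\n"
        "Return a single SELECT statement."
    )


def generate_sql(prompt):
    """Hypothetical stand-in for a model call; returns a canned answer."""
    return (
        "SELECT region, SUM(amount) AS total "
        "FROM sales GROUP BY region ORDER BY total DESC"
    )


def run_query(conn, sql):
    """Execute generated SQL, rejecting anything that is not a SELECT."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "2024-01", 120), ("West", "2024-01", 90), ("East", "2024-02", 150)],
)

prompt = build_prompt("Which region sold the most?", SCHEMA)
rows = run_query(conn, generate_sql(prompt))
print(rows)
```

Passing the schema in the prompt gives the model the table and column names it needs, and the simple SELECT-only check is one way to keep generated statements from modifying data. A production setup would likely need stricter safeguards than this prefix check.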
In the prediction and visualization processes, the generative AI writes code and runs it on data. In the prediction process, it calculates estimates that predict what will happen in the future based on past data. In the visualization process, it determines what kind of graphical representation is appropriate based on the data, and can generate images in various graphical representations such as bar graphs, line graphs, and pie charts.
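To make the idea concrete, here is a minimal sketch of the kind of code a generative AI model might write for these two steps. It assumes a simple linear trend and made-up monthly figures; real forecasts would usually need a model suited to the data, and the text bars stand in for the image a plotting library would produce.

```python
def fit_line(values):
    """Least-squares fit of y = slope * x + intercept for x = 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def forecast(values, periods):
    """Extend the fitted line by the requested number of future periods."""
    slope, intercept = fit_line(values)
    start = len(values)
    return [slope * x + intercept for x in range(start, start + periods)]


def text_bar_chart(labels, values, unit=10):
    """Render one text bar per value; a stand-in for a real plotting library."""
    return [
        f"{label:<8}{'#' * round(value / unit)} {value:.0f}"
        for label, value in zip(labels, values)
    ]


history = [100, 110, 120, 130]  # made-up monthly figures
predicted = forecast(history, periods=2)
chart = text_bar_chart(["month 5", "month 6"], predicted)
print("\n".join(chart))
```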
The insight process generates the insights discussed at the start of this column. It verbalizes what trends are present in the extracted data, what hypotheses can be considered, and what next actions should be taken. Traditionally, humans looked at the data, formed hypotheses based on business knowledge and experience, and decided on actions. However, this kind of insight and decision-making requires knowledge and experience, and is not something just anyone can do easily. Having generative AI interpret and verbalize the data helps lower these skill barriers.
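One possible way to prepare this step is to reduce the extracted data to a few plain facts and hand them to the model together with the three questions above. The sketch below only builds that prompt; the metric name and figures are invented, and the model call itself is omitted.

```python
def summarize(series):
    """Describe a labeled numeric series as plain facts a model can cite."""
    labels = list(series)
    values = list(series.values())
    first, last = values[0], values[-1]
    change_pct = (last - first) / first * 100
    return {
        "first": f"{labels[0]}={first}",
        "last": f"{labels[-1]}={last}",
        "change_pct": round(change_pct, 1),
        "peak": max(series, key=series.get),
    }


def build_insight_prompt(metric, series):
    """Combine the facts with the three questions of the insight process."""
    facts = summarize(series)
    return (
        f"Metric: {metric}\n"
        f"Start: {facts['first']}, end: {facts['last']}\n"
        f"Change over the period: {facts['change_pct']}%\n"
        f"Peak period: {facts['peak']}\n"
        "Describe the trend, propose hypotheses that could explain it, "
        "and suggest next actions."
    )


sales = {"2024-01": 200, "2024-02": 260, "2024-03": 230}  # made-up figures
prompt = build_insight_prompt("monthly sales", sales)
print(prompt)
```

Computing the numbers in code and letting the model handle only the wording is a design choice made for this sketch: it keeps the arithmetic checkable while the model contributes the interpretation.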
The data pipeline that powers insights
Can you imagine a world where anyone can use generative AI to analyze data verbally and gain insights? By complementing the various skills involved in data analysis that have previously relied on human knowledge and experience with the generative capabilities of generative AI, it is expected that the hurdles to data utilization will be lowered and the speed of data utilization in the field will be accelerated.
Finally, we will introduce a key element in achieving data analysis using generative AI. This element is the "data pipeline" that connects all systems with generative AI. While it may seem like data analysis can be achieved with just generative AI, in reality, various challenges arise. Various generative AI models and systems (such as vector search mechanisms and databases where data is stored) appear in each process of data analysis. It is necessary to consider how to connect to generative AI models and systems, and how to control the execution order of each process.
In the data analysis using generative AI introduced earlier, the series of processes works only when the generative AI models and each system cooperate properly. The data pipeline acts as an orchestrator: it connects each mechanism, passes instructions and data between processes, and controls the order in which tasks are executed.
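The orchestrator role can be pictured with a small generic sketch. This is not the API of any particular product; the step functions below are placeholders that return fixed or trivially computed values where a real pipeline would call a vector search service, a database, or a generative AI model.

```python
def run_pipeline(steps, context):
    """Run steps in order; each step reads the shared context and extends it."""
    executed = []
    for name, step in steps:
        context = {**context, **step(context)}
        executed.append(name)
    return context, executed


# Placeholder steps: each returns the new keys it contributes to the context.
def search_step(ctx):
    return {"table": "sales"}


def extract_step(ctx):
    # A real step would query ctx["table"]; these rows are made up.
    return {"rows": [100, 110, 120]}


def predict_step(ctx):
    rows = ctx["rows"]
    average_step = (rows[-1] - rows[0]) / (len(rows) - 1)
    return {"forecast": rows[-1] + average_step}


def visualize_step(ctx):
    return {"chart": f"{len(ctx['rows'])} actual points + 1 forecast point"}


def insight_step(ctx):
    return {
        "insight": f"{ctx['table']} is expected to reach "
        f"{ctx['forecast']:.0f} next period"
    }


STEPS = [
    ("search", search_step),
    ("extraction", extract_step),
    ("prediction", predict_step),
    ("visualization", visualize_step),
    ("insight", insight_step),
]

result, executed = run_pipeline(STEPS, {"question": "How will sales develop?"})
print(result["insight"])
```

Because every step receives the output of the steps before it, the ordering in `STEPS` matters: the prediction step fails without the rows from extraction, which is the dependency the pipeline has to manage.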
HULFT Square, an iPaaS service provided by Saison Technology, can fulfill the role of this data pipeline. It can access various internal and external systems through a variety of connectors, connect to various generative AI models, integrate and process data as needed, and orchestrate each process in data analysis.
iPaaS-based data integration platform HULFT Square
HULFT Square is a Japanese iPaaS (cloud-based data integration platform) that supports "data preparation for data utilization" and "data integration that connects business systems." It enables smooth data integration between a wide variety of systems, including various cloud services and on-premise systems.
Conclusion
In this column, we introduced a data analysis method that uses generative AI to promote the democratization of data. By combining generative AI with data pipelines, anyone, whether in an IT or a non-IT department, can extract insights from data and make data-based decisions.
Saison Technology has supported the development and construction of data platforms and data utilization using generative AI for many companies. If you are interested in promoting business decision-making using data in the workplace or using generative AI to quickly uncover issues and insights, please feel free to contact us.
