Building a data-driven platform to create an environment where anyone can test their own ideas and hypotheses
Awareness and efforts regarding data utilization have "evolved" in a short period of time!

SAISON INFORMATION SYSTEMS CO., LTD. (now Saison Technology) launched a company-wide data-driven project with the goal of achieving "democratization of data," meaning that all employees can voluntarily use data to improve their work. They built a "Data-Driven Platform" (hereinafter referred to as DDP), a platform equipped with all the functions needed to "collect, store, search, and utilize data." We spoke with Masaru Sasaki of the IT Support Department at the Corporate Development Center, who was the driving force behind the project, about what was done at each stage, from launch through planning, construction, utilization, and adoption, as well as the points to keep in mind and the results of using DDP.
*Affiliations are as of the time of interview.

We are taking on the challenge of building a DDP, aiming to "democratize data" so that anyone can test their "inspirations" and "hypotheses"!

First, please give us a brief explanation of what the DDP you have built is.

Simply put, it is a system that combines a data warehouse (DWH) with utilization tools, and is a platform equipped with all the functions to "collect, store, search, and utilize data." Internal users can refer to the DWH data that is made public throughout the company in an easy-to-search and extract format, create their own data marts according to their purpose, and utilize the data.

We have defined "four sacred treasures": the tools needed not only to "collect" and "store" data, but also to "search" and "utilize" it. They are a data catalog that lets you find the data you want, a DWH that combines and processes data, a data integration tool that automates the process of extracting data, and a BI tool that analyzes and visualizes it. Users combine these four tools depending on their purpose and literacy.

case_study_12_Fig_01.png

The "four sacred treasures" that make up Saison Information Systems' (now Saison Technology's) DDP are the following four tools:

  • A data catalog that allows users to find the data they need and understand its meaning
  • A data warehouse that organizes and stores data so that it is easy to search and extract
  • A data integration tool that automates data extraction and processing
  • BI tools that analyze and visualize data from various perspectives

So, please explain step by step how you proceeded with the DDP construction project.

We began by clarifying the most important question: "Why is a data-driven approach necessary now?" The answer comes down first to the benefits for management and sales. We call ourselves a "data engineering company" and provide our customers with services that connect data to agile business decision-making. In advancing this business, it is important to deliver value to our customers based on our own experience of actually using data to improve corporate competitiveness, and in that sense this initiative is significant.

At the same time, there are also significant benefits for internal users. Previously, at our company, even if each employee wanted to utilize data to improve the quality and efficiency of their respective work, it was not easy to do so. This was because users had to go through a very cumbersome process: researching whether suitable data existed within the company, explaining their intended use to the data owner and receiving the data, confirming the business meaning of the data, and then developing analysis and automation. With this initiative, users will be able to learn the procedures and know-how for utilizing data, easily refer to data that has been organized in a usable state, and freely use analysis and automation tools to achieve their goals.

At the same time, it also instantly solves a long-standing problem for us in the IT department. Until now, there has been a huge demand within the company for enhanced visualization and analysis using data, but the IT department has been unable to keep up with the demands due to a lack of resources. Also, because the IT department is not an expert in each business, it has had difficulty understanding the analytical axis required for each request. Furthermore, it has been extremely difficult to manage the working databases that have ballooned to over 400, created for each request. DDP can solve all of these issues.

So, first of all, it's important to clearly define the purpose of the project.

That's right. I also think it's important to clarify what we want to achieve. In this regard, we defined it as "democratizing data," meaning a situation where anyone can verify the "inspirations" and "hypotheses" that arise in the course of daily work. To achieve this, we set five goals, including "all employees recognize that all data is contained within this platform" and "increasing the number of data-literate employees so that they can understand the impact and risks of data."

So, what kind of system should we create specifically to achieve these goals? To clarify this, we listed as many conceivable user experiences as we could, and for each one we defined the system functions that DDP should have. Using this list, we prioritized the implementation of each function and began agile development.

case_study_12_Fig_02.png
Saison Information Systems (now Saison Technology) has five goals for data utilization

When launching a project, it is important to consider the criteria for selecting participating members.

That's right. So, we defined the types of stakeholders and those involved in DDP, and considered which of them we should involve. First, we thought it was absolutely necessary to have data owners who understand the relationship between data and business and can assign meaning to the data. Also essential was the participation of data users who already use data in some kind of business and have experienced the challenges and pain points. While a project team consisting only of engineers can create a data infrastructure, such a team tends to focus on "collecting" and "accumulating," making it difficult to achieve the true goal that lies beyond them. The "true requirements" for DDP lie in the minds of people who understand the meaning of data and the challenges of business, so we felt it was essential to have such members involved.

In addition, we decided to involve suitable employees who were expected to be the main users, selecting them with reference to six persona levels categorized by level of data utilization (which we will discuss in more detail later).

I understand that the participation of such members is important, but in reality, it's quite difficult to get everyone's cooperation when they're busy with their daily work.

Yes, it's true that you wouldn't be able to find the time needed to coordinate between departments. So we asked our management to send out a strong message that "we will be moving forward with data-driven initiatives across the entire company, and that the designated members should participate in the project as a mission." This made the entire company aware of the need for data, and we were able to assign the people we wanted as key people who would grasp the requirements for DDP.

All project members then gathered together for a kickoff meeting. They were asked to participate in the project as a four-month task force, and the background issues and goals were shared with them.

I then explained two benefits that the members and business departments would gain from participating in the project. First, with support from IT staff in acquiring skills, members would improve their data literacy and be able to improve their work productivity themselves. Second, by aggregating and sharing the data held by their department, they would be able to use the data to eliminate tasks' dependence on specific individuals. In short, I tried to increase motivation by telling them that their participation in the project would ultimately benefit them.

So the project began, and what was the first thing you did in the next planning and construction phase?

The first step was to collect data. To identify what kind of data existed within the company, we listed all existing systems and interviewed the person in charge of each. Through these interviews, we determined which department was responsible for the data in each system and whether it was worth sharing, and we divided the data into three categories: "usable," "unusable," and "likely usable."

Next, we identified the owners and experts of the target systems. Our investigation revealed that there were approximately 160 tables within the company, so we identified the owners of each table and the experts who were most familiar with the data held in that table. We then confirmed with the owners and experts whether the data was likely to be usable for analysis, automation, and visualization, and selected the data to be stored in the DWH, excluding data that did not appear to be usable.

The key points to consider were that data that is not currently needed but may be used for future data analysis should be stored, while highly confidential data whose disclosure would pose a risk or data that should not be made public due to contracts with customers should not be stored.

case_study_12_img_01.png
Saison Information Systems (now Saison Technology) IT Support Department
Data-Driven Platform Project Leader Masaru Sasaki

So it's not as simple as just collecting all the data within the company and storing it in a DWH.

That's right. You also need to be careful when deciding when and how to collect the target data. There are three main types of data: "master data," which requires information from past points in time and therefore requires all data to be refreshed daily using a replacement method; "transaction data," which generates differential data every day and requires the addition of data from the previous day; and "data with intermediate characteristics," which requires data to be completed in an intermediate database before being replaced or added. You need to clarify the storage timing and extraction requirements according to the characteristics of each type of data and its expected use.
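As a rough illustration of the "replacement" and "addition" methods described above, the following minimal sketch uses Python's built-in sqlite3 module as a stand-in for the DWH. The table names, columns, and sample rows are hypothetical and are not taken from the actual DDP; the third type, which passes through an intermediate database, is not shown.

```python
import sqlite3


def full_refresh(conn, table, rows):
    """Replacement method: discard the current contents and reload everything."""
    # Table names are interpolated directly, so pass only trusted, fixed names.
    with conn:  # commits on success, rolls back on error
        conn.execute(f"DELETE FROM {table}")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def append_delta(conn, table, rows):
    """Addition method: add only the newly generated rows."""
    with conn:
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_master (code TEXT, name TEXT)")
conn.execute("CREATE TABLE orders (order_date TEXT, amount INTEGER)")

# Master data: each daily load replaces the whole table.
full_refresh(conn, "customer_master", [("C001", "Acme"), ("C002", "Globex")])
full_refresh(
    conn,
    "customer_master",
    [("C001", "Acme Corp"), ("C002", "Globex"), ("C003", "Initech")],
)

# Transaction data: each daily load adds the previous day's rows.
append_delta(conn, "orders", [("2022-04-01", 100)])
append_delta(conn, "orders", [("2022-04-02", 250)])

master_count = conn.execute("SELECT COUNT(*) FROM customer_master").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(master_count, order_count)
```

After the second refresh the master table holds only the latest load, while the orders table keeps accumulating rows from each day.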

Once the task of "collecting" the data is complete, the next step is to "store" it.

Yes. First, we clarified the scope of data disclosure. Naturally, some data is highly confidential and needs to be masked. Therefore, we had to carefully consider how much of the table itself, or each item in the table, should be made public.

As a result, in our case, we modeled four patterns of disclosure methods: "Disclose all data to all employees," "Disclose to employees but mask some items except for specific departments," "Disclose all data only to employees belonging to specific departments," and "Disclose all data only to specific employees."

Next is data pre-processing. So-called "dirty data," such as items that different people write in different ways (notation variations) or data that is not usable on its own, cannot be stored in the DWH as is. We checked with the owners to see if there was any such data, and if there was, we dealt with it by using the data integration tool to clean up the data or by changing the input operation in the first place.
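To give a sense of what cleaning up notation variations can involve, here is a minimal sketch in Python. It is an assumption for illustration only: in the DDP this work was done with a no-code data integration tool, and the function name, mapping table, and sample values below are hypothetical.

```python
import unicodedata

# Hypothetical mapping from a case-folded spelling to one canonical spelling.
CANONICAL = {
    "saison information systems": "Saison Information Systems",
}


def normalize_name(raw: str) -> str:
    """Fold common notation variations into a single representation."""
    # NFKC converts full-width letters and spaces to their half-width forms.
    text = unicodedata.normalize("NFKC", raw)
    # Trim the ends and collapse runs of whitespace into one space.
    text = " ".join(text.split())
    # Map known variants to the canonical spelling; leave others unchanged.
    return CANONICAL.get(text.casefold(), text)


print(normalize_name("ＳＡＩＳＯＮ　Information  Systems"))
print(normalize_name("  saison information systems "))
```

Both calls yield the same canonical name, which is the property that matters before the values are used as join keys or aggregation axes in the DWH.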

This is roughly the work required to "collect" and "store" the data. Next, we will move on to the areas that directly affect users: "searching" and "utilizing" the data.

Four months after starting to use DDP, the company saw operational improvements and employee awareness and work styles moved forward toward a data-driven approach

We were able to "collect" and "store" data, but there are many companies that have only managed to do that much.

That's true. However, users cannot use the data simply by saying, "We've input the data! Here you go." This is because they are not given a way to find the data they need on their own. This is one of the reasons why data utilization has not yet become widespread in companies. When users have requests or inspirations such as "I want to do this" or "Maybe it's because...", it is necessary to provide a logical explanation, such as "In that case, you can use this data."

The tool for this purpose is a data catalog. Simply put, it is a "data dictionary" that manages the meta-information associated with data, such as its location, description, number of items, history, and administrator. By turning knowledge of what data exists and what it means for the business into "shared knowledge," it becomes possible to easily find the data you need through a search.
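The idea of a searchable data dictionary can be sketched in a few lines of Python. This is only an illustration under assumed names: the entry fields follow the meta-information listed above, but the class, the sample tables, and the search function are hypothetical and do not represent the catalog product used in the DDP.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    table: str
    location: str
    description: str  # written in the business terms that users actually use
    administrator: str
    columns: list[str] = field(default_factory=list)


def search(catalog, keyword):
    """Return entries whose table name, description, or columns contain the keyword."""
    kw = keyword.casefold()
    return [
        entry
        for entry in catalog
        if kw in entry.table.casefold()
        or kw in entry.description.casefold()
        or any(kw in column.casefold() for column in entry.columns)
    ]


catalog = [
    CatalogEntry(
        table="customer_master",
        location="dwh.public",
        description="Customer company names and billing addresses",
        administrator="Sales Planning",
        columns=["customer_code", "company_name", "address"],
    ),
    CatalogEntry(
        table="orders",
        location="dwh.public",
        description="Order information sheets entered by field departments",
        administrator="Order Management",
        columns=["order_id", "customer_code", "amount"],
    ),
]

for entry in search(catalog, "billing"):
    print(entry.table, "-", entry.description)
```

Because the search runs over descriptions as well as table and column names, the quality of those descriptions largely determines whether a user can find the right table.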

The sophistication of this meta information is an important point that directly relates to user usability. It needs to be written in the terms of the people who actually use the data in their work, rather than being thought up by the system engineers.

Previously, only the IT department could search for and find the necessary data within the company. By deploying the data catalog across the company, employees can now take the initiative in using data.

That's true, but security becomes an issue. Until now, only the IT department had access to the data, so security considerations were limited. However, in the world of DDP, where all employees can access data at any time, security considerations naturally increase. There is a trade-off between convenience and security.

Therefore, our company adopted a method called role-based access control. This is a system in which every employee is assigned at least one role, and the data they can or cannot see is controlled based on that role. Visibility can be controlled for the table as a whole or on a column-by-column basis. For example, if you have an employee master table, you can make the table itself visible to all employees, but restrict specific fields such as salary information and address information so that they are visible only to employees in the HR department.
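The employee master example can be sketched as follows. This is a simplified illustration under stated assumptions, not the access control mechanism actually used in the DDP: the policy structure, role names, sample row, and the choice to mask restricted values with "***" are all hypothetical.

```python
# Hypothetical policy: which roles may see each table, and which roles may
# see each restricted column. Columns not listed are visible to anyone who
# can see the table.
POLICY = {
    "employee_master": {
        "table_roles": {"all_employees"},
        "column_roles": {"salary": {"hr"}, "address": {"hr"}},
    },
}

MASK = "***"


def visible_row(table, row, roles):
    """Return the row as seen by a user holding the given set of roles."""
    policy = POLICY[table]
    if not roles & policy["table_roles"]:
        return None  # the table itself is invisible to this user
    restricted = policy["column_roles"]
    return {
        column: value if column not in restricted or roles & restricted[column] else MASK
        for column, value in row.items()
    }


row = {"employee_id": "E001", "name": "Sato", "salary": 5000000, "address": "Tokyo"}

print(visible_row("employee_master", row, {"all_employees"}))
print(visible_row("employee_master", row, {"all_employees", "hr"}))
```

The same call returns masked salary and address values for a general employee and the full row for a user who also holds the HR role.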

case_study_12_img_02.png

I see. So now employees can use the data in their work while still ensuring security.

Yes. To "utilize" the data, we decided to provide employees with tools on an application basis so that they could improve their work processes themselves. We deployed two tools: a data integration tool that achieves automation and efficiency, and a BI tool that enables analysis and visualization.

For the data integration tool, we built an execution environment within the company, making it available for anyone to use. Because it can be used without coding, users can easily implement the automation and efficiency improvements they want to achieve on their own. The access control we mentioned earlier comes into play when the tool is executed. Even if the process created with the data integration tool is the same, the results obtained will differ depending on the role of the employee executing it.

Meanwhile, BI tools are essential for getting closer to the "ideal state" of this project, as they allow employees to independently verify the "inspirations" and "hypotheses" that arise in their daily work without the involvement of IT staff. The most advanced use of DDP involves using the data integration tool to aggregate and process data from the DWH, loading it into a data mart, and then outputting the formatted data using a BI tool. If you can master this level of use, you will undoubtedly see great benefits.

It would be great if all employees could do this, but it's not that easy.

That's right. So, as I mentioned briefly at the beginning, in the utilization and adoption phase, we classified the expected users of DDP into six levels of personas based on their level of data literacy. Specifically, we classified them into six levels: "Absolute Beginner," "Beginner," "Data User (Beginner)," "Data User (Advanced)," "Data Scientist," and "Data Engineer." We considered the expected operations for each level, thinking, "These people will surely be motivated by these activities." We then defined a knowledge matrix that shows the level of understanding required for using various tools in order to carry out each activity.

This clarified to some extent the content of the training required for each level, but it was not possible to train all employees at once. After considering the shortest route to achieving our "desired state," we decided that the most efficient approach would be to first focus on training the "Data User (Beginner)" and "Data User (Advanced)" levels. We then named the 13 members who were selected as expected users at the start of the project "data utilization skills development members" and made them the first target for training.

case_study_12_Fig_03.png
Six-level persona definitions and required skills for potential DDP users, categorized by proficiency level

Even if the target of education is decided, I think that many companies are still experimenting with "what to teach and how to teach it."

At our company, we first had each employee set a goal for how well they wanted to be able to use data. We then created a curriculum that started with easy tools and gradually increased the level of difficulty. The employee who was most familiar with each tool served as the instructor, and each seminar lasted about 1 to 1.5 hours. The content of the seminar was then filmed and made available as a review video.

What I paid particular attention to was asking the supervisors of the data utilization skills development members, as well as HR, to "treat this training curriculum as official work, and approve it as overtime if members work on it outside regular hours," creating an environment where the members could study without hesitation. I also told the members that they would be assigned a comprehensive exercise at the end, so that they would have a sense of mission.

To prevent anyone from dropping out, we made the progress of all members visible, fostering a sense of competition among them while also providing thorough follow-up support, such as words of encouragement. We also held a "terakoya" (temple school) on Zoom once a week, where experts gathered and members could ask questions freely.

What is the comprehensive exercise assignment?

We set the level of content so that it could only be solved by utilizing all the tools. Members who were unable to complete the tasks due to busy schedules or other reasons were asked to complete them at another time. After completing the three-month training course, even inexperienced members from non-IT backgrounds were able to reach a skill level where they could use the "four sacred treasures" to utilize data. The completion rate for the comprehensive exercise tasks was around 50%, and even those who did not finish got through around 80% of the tasks, which once again reinforced the idea that if you are motivated, you can do it.

In the final workshop, after the instructor gave a model answer, several members shared their thought process when answering the questions, and everyone practiced mob programming. This allowed participants to understand that there are various approaches and take home what they learned.

We have put in place a system to support self-study and skill improvement so that even those who are not selected members can continue to study after the training period has ended. Specifically, we have developed content and communication channels that are useful for efficient learning, provided an execution environment for tools that can be used freely upon application, and opened a portal site that aggregates all information related to the DDP.

case_study_12_Fig_04.png
DDP portal site that covers all the information employees need to utilize data

What else are you working on to solidify and stimulate DDP use?

The first is open collaboration. We have made all communication channels, including Slack, available to all employees, and have established a system to accelerate collaboration, such as sharing best practices and knowledge.

Another initiative we are currently working on is "creating an organization where users praise and value each other." We will quantify the number of "likes" that users give to shared best practices, and award points to highly rated users and those who have contributed to improving data literacy. This point system will be linked to personnel evaluations and lead to employee recognition and incentives. We believe that by continuing these reforms, we will be able to move closer to becoming a "true data engineering company."

What specific results have these various initiatives produced so far?

We only started using DDP in April 2022, so it will be some time before we can see significant benefits, but we have already seen many positive examples. For example, when a field department staff member fills out information such as the billing address on an order information sheet, they can simply enter the customer code to retrieve information from the customer master and automatically display the company name, address, etc., which has prevented input errors and reduced input work.
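The order sheet improvement described above amounts to a lookup against the customer master. A minimal sketch in Python follows; the customer data, field names, and function name are hypothetical, and in the DDP this kind of automation was built with the provided tools rather than hand-written code.

```python
# Hypothetical extract of the customer master, keyed by customer code.
CUSTOMER_MASTER = {
    "C001": {
        "company_name": "Example Trading Co.",
        "address": "1-2-3 Example-cho, Tokyo",
    },
}


def fill_billing_fields(order_sheet: dict) -> dict:
    """Return a copy of the order sheet with billing fields filled from the master."""
    code = order_sheet["customer_code"]
    customer = CUSTOMER_MASTER.get(code)
    if customer is None:
        # An unknown code is reported instead of being filled in by hand.
        raise KeyError(f"unknown customer code: {code}")
    return {**order_sheet, **customer}


filled = fill_billing_fields({"customer_code": "C001", "item": "License"})
print(filled)
```

Because the company name and address come from a single source, the person filling in the sheet only has to get the customer code right.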

While we have achieved such improvements in work quality and efficiency, in some ways an even more important result is the change in the awareness and working style of our employees. One example that illustrates this is that our IT engineers can now access the same data as our sales staff, which has enabled them to understand the significance of the products they develop and the needs of our customers.

Additionally, in the business departments, use is still mainly limited to a select few employees with high data literacy. However, as positive examples begin to emerge, an understanding that "this is what it means to use data to improve business operations" is gradually spreading among employees who do not have high levels of IT or analytical skills, and an organizational culture is being fostered in which people think, "maybe we can streamline operations that were previously done manually."

Seeing this situation, I feel that we are steadily approaching the "ideal state" that we aimed for by building DDP, where anyone can test their "inspirations" and "hypotheses." However, our company has only just reached the starting line, and the real work is yet to come. As a data engineering company, we sincerely hope to provide our customers with the best products and services, including the knowledge we have gained from this DDP initiative, and build the future together.
