- Reduces the labor required to pre-process data into a form that generative AI can easily understand, and improves response accuracy to as high as 90% -
To media representatives:
July 23, 2025
Saison Technology Co., Ltd.
Saison Technology Co., Ltd. (Headquarters: Minato-ku, Tokyo; Representative Director, President and CEO: Makoto Hayama; hereinafter referred to as Saison Technology) will begin offering the "AI Preprocessing Template Series," a set of 10 data integration scripts for "HULFT Square" that pre-process internal data so that generative AI can read it for RAG (Retrieval-Augmented Generation)*1, starting July 23.
These templates extract text from various business data stored in-house, such as spreadsheets, manuals, scanned PDFs, image data, audio data, and HTML; remove unnecessary noise such as tags; convert unstructured data into structured data; and add appropriate relationships to the data. By using generative AI via HULFT Square to pre-process internal data into a form that generative AI can easily reference, response accuracy can be raised to as high as 90%*2 and the labor required for data processing can be reduced by approximately 50-60%*3.
Background
In recent years, corporate use of generative AI has increased, and RAG, a method that combines information from internal data with large language models (LLMs) to improve the search accuracy of generative AI, is becoming more widespread. However, the reality is that even when generative AI references internal data, the accuracy of its answers often falls short of what was initially expected.
To make internal data "AI-ready," engineers must pre-process it with methods suited to each data type, for example by converting it into a structure that generative AI can easily understand and by assigning appropriate meaning to the data. Securing the know-how and man-hours for this processing is a challenge for companies that want to use internal data with generative AI quickly and achieve high response accuracy.
Overview of the "AI Preprocessing Template Series"
The "AI Preprocessing Template Series," which processes data into a form optimized for generative AI, is provided as HULFT Square applications, that is, data integration scripts that run on the data integration platform (iPaaS) "HULFT Square".
By choosing from the 10 templates in the "AI Preprocessing Template Series" according to the type of internal data, you can connect generative AI to the data via "HULFT Square" and leave pre-processing tasks such as text extraction, data conversion, and semantic annotation to the generative AI itself. This makes it possible to pre-process internal data into a form that generative AI can easily understand and read for RAG.
The features, types, and processing content of the "AI Preprocessing Template Series" are outlined below.
- Generative AI pre-processes various internal data into a form that AI can understand more easily, improving response accuracy to as high as 90% *2
- Data pre-processing, which normally requires know-how, is performed by generative AI via HULFT Square, reducing the labor required for processing by approximately 50-60% *3
■ Improvement of response accuracy
| Template Type | Template Name | Summary of effects and data processing | Availability |
|---|---|---|---|
| Conversion to QA format | AI preprocessing: Create QA tables from PDF | Convert unstructured PDF manuals into structured data in QA format and output it in CSV format | July 23, 2025 |
| Conversion to QA format | AI preprocessing: Create QA tables from Excel | Convert Excel spreadsheets into structured data in QA format and output it in CSV format | Scheduled for September 2025 |
| Conversion to QA format | AI preprocessing: Create QA tables from JSON | Convert complex hierarchical JSON data from external systems such as e-commerce sites into structured data in QA format and output it in CSV format | Scheduled for September 2025 |
| Conversion to QA format | AI preprocessing: Create QA tables from XML | Convert XML with complex hierarchical structures and tags into structured data in QA format and output it in CSV format | Scheduled for September 2025 |
| Cleansing *4 | AI preprocessing: Remove HTML tags | Remove tags from HTML documents such as corporate websites and internal portal sites and output the result in Markdown notation | Scheduled for August 2025 |
| Cleansing *4 | AI preprocessing: Remove special characters and symbols | Remove special characters and symbols from HTML documents and output the result in text format | Scheduled for August 2025 |
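As a rough illustration of what conversion to QA format produces, the following Python sketch flattens hierarchical JSON into question/answer rows and writes them as CSV. It is not the HULFT Square template: the templates leave this conversion to generative AI, whereas this sketch substitutes a simple rule-based flattening. The function names, the question wording, and the sample data are all assumptions made for the example.

```python
import csv
import io
import json


def flatten(node, path=()):
    """Yield (path, value) pairs for every leaf value in nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node, start=1):
            yield from flatten(value, path + (f"item {index}",))
    else:
        yield path, node


def json_to_qa_csv(raw_json: str) -> str:
    """Return CSV text with one question/answer row per leaf value.

    Hypothetical rule-based stand-in for the AI-driven conversion.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["question", "answer"])
    for path, value in flatten(json.loads(raw_json)):
        question = "What is the " + " / ".join(path) + "?"
        writer.writerow([question, value])
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample data, not taken from any real system.
    sample = '{"product": {"name": "Widget", "prices": [100, 120]}}'
    print(json_to_qa_csv(sample))
```

Each leaf value becomes one row whose question is built from the path to that value, so nested keys and list positions stay visible in the flat CSV.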
■ Labor-saving data extraction
| Template Type | Template Name | Summary of effects and data processing | Availability |
|---|---|---|---|
| Extracting text from documents | AI preprocessing: Extract text from PDF | Extract text from scanned PDFs | Scheduled for August 2025 |
| Extracting text from images | AI preprocessing: Extract text from images | Extract text from images such as photos of whiteboards and handwritten meeting minutes | Scheduled for August 2025 |
| Extracting text from speech | AI preprocessing: Extract text from speech | Extract text from audio data such as meeting recordings | Scheduled for September 2025 |
■ Labor-saving data storage
| Template Type | Template Name | Summary of effects and data processing | Availability |
|---|---|---|---|
| Embedding | AI preprocessing: Embedding & vector DB storage | Convert input data such as text or structured data into numeric vectors | Scheduled for September 2025 |
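To illustrate the general idea of converting text into numeric vectors and storing them for retrieval, the sketch below uses a toy hashing-based embedding and an in-memory list in place of a vector DB. Both are assumptions chosen so the example runs with the Python standard library only. They do not reflect how the HULFT Square template is implemented, and a real pipeline would use a trained embedding model and an actual vector DB.

```python
import hashlib
import math

DIMENSIONS = 64  # hypothetical vector size chosen for this sketch


def embed(text: str) -> list[float]:
    """Toy embedding: hash each word into a bucket, then L2-normalize."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % DIMENSIONS
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB: keeps (text, vector) pairs."""

    def __init__(self):
        self._rows = []

    def add(self, text: str) -> None:
        self._rows.append((text, embed(text)))

    def search(self, query: str, top_k: int = 1) -> list[str]:
        """Return the stored texts most similar to the query (cosine)."""
        query_vector = embed(query)
        scored = [
            (sum(a * b for a, b in zip(query_vector, vector)), text)
            for text, vector in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add("price list for data integration product")
    store.add("meeting minutes from the whiteboard session")
    print(store.search("product price list"))
```

Because the vectors are normalized to unit length, the dot product in `search` equals cosine similarity, which is a common way to rank stored text against a query in RAG.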
- *1 RAG (Retrieval-Augmented Generation): A technology that combines new external information with large language models (LLMs) to improve the search accuracy of generative AI.
- *2 A PDF price list for our product (HULFT 10; a 12-page document containing a mixture of tables and text) was pre-processed into Q&A format using three generative AI models (Claude, Gemini, and Qwen). The rate of correct answers to questions was 30-50% without pre-processing and improved to 80-90% with pre-processing (according to a survey by Saison Technology).
- *3 In tests of text extraction from PDFs and images, work time was reduced by an average of approximately 60% for PDFs and approximately 50% for images (according to a survey by Saison Technology).
- *4 The "Cleansing" template type processes data only within "HULFT Square," without going through generative AI.
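Because the "Cleansing" templates work without generative AI, rule-based processing of this kind can be sketched in a few lines. The Python example below removes tags from an HTML document using only the standard library. It is a minimal illustration, not the HULFT Square template: the class and function names are hypothetical, and the sketch outputs plain text rather than the Markdown notation the HTML tag removal template produces.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects visible text and drops tags, scripts, and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when it is outside <script> and <style> elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def strip_html(html: str) -> str:
    """Return the visible text of an HTML document with tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    # Invented sample page for the example.
    page = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><h1>Price list</h1><p>HULFT <b>Square</b></p>"
        "<script>var x=1;</script></body></html>"
    )
    print(strip_html(page))
```

Removing markup in this way leaves only the text that carries meaning, which is the noise reduction the cleansing step aims for before data is passed to RAG.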
About Saison Technology
With the mission of "Connect the world’s data and make it useful for everyone," the company globally develops and operates data integration products and IT services that form the foundation for safety and security, as well as systems for a wide variety of industries, including finance and distribution. Leveraging the strengths that have enabled it to adapt quickly to changes in the environment over many years, the company is currently focusing on expanding cutting-edge businesses such as its cloud-based data integration platform (iPaaS), HULFT Square, and on strengthening its efforts to implement technologies that will pave the way for the future.
- Saison Technology: https://www.saison-technology.com/
- HULFT Products page: https://www.saison-technology.com/service/product/
[Trademark-related]
- "HULFT" is a trademark or registered trademark of Saison Technology.
- Other company names, product names, service names, etc. are trademarks or registered trademarks of the respective companies.