HULFT and HULFT Script: Securely Connecting Heterogeneous Databases
background
I want to reliably synchronize Oracle and PostgreSQL data over 1,000 km apart.
This happened in one particular project. The client was a manufacturing company with a technical headquarters in Tokyo and a production plant located far away. The client was aiming to improve the efficiency of its equipment management, and by collecting and analyzing maintenance records from each plant at the technical headquarters in Tokyo, and then providing feedback to each plant, they aimed to improve equipment utilization rates and reduce on-site workloads (eliminating overburden, waste, and inconsistency). After a planning period of about a year, it was decided to revise the equipment management databases on the Tokyo side and the production plant side, and synchronize them to obtain real-time information on equipment management status, and each system vendor was already working on revising their applications and databases.
And then something happens.
Who will be responsible for linking the Tokyo database with the factory database?
The moment the ball crosses the boundary line, it falls. This is a common occurrence. So, we received a consultation. The environment and requirements are roughly summarized below.
- Two-way synchronization between Oracle and PostgreSQL (synchronization interval is 1 minute)
- The physical distance between the databases is more than 1,000 km.
- The production factory and Tokyo Technology Headquarters are connected by an Internet VPN. The bandwidth is 100Mbps (best effort). Because of this, the connection sometimes drops out.
- There is less than a month until the service launch.
- Ensure synchronization is possible and the construction costs are within the remaining budget.
Number 5 was a point that was especially emphasized. (laughs)
Organizing DB synchronization methods
The techniques that come to mind are materialized views, DB links, and DB-specific replication functions.
However, if you recall, this time we are linking two different databases, Oracle and PostgreSQL. If the databases are the same model, synchronization can be achieved using the database's functions, but the moment they become different models, the methods available are limited.
| No | Synchronization Method | Study results |
|---|---|---|
| 1 | Utilizing DB functions (mateview, DB link, etc.) | NG (different model) |
| 2 | Use the Sync Tool | Needs consideration |
| 3 | Output to a file and transfer | Needs consideration |
What method did you use in this case?
Ultimately, "output to a file and transfer" was selected. Initially, they had considered using a synchronization tool, but the network put a strain on bandwidth, and there were cases where synchronization processing failed when the network was interrupted. To fundamentally resolve these issues, they needed to improve network quality. With less than a month until the launch of service, it was difficult to expand the infrastructure in time, and the need to control transfer quality with software was inevitably added to their requirements. This was where HULFT came in.
HULFT outputs data to a file and transfers it, meaning that the software has the functionality to retry in the event of a network outage and transfer data without straining the bandwidth.
Let's build it
Based on the requirements and verification results up to this point, the following integration outline was determined. Simply output data that has a flag set in the DB table to a file, and if the transfer results from HULFT are normal, unset the flag. If a transfer error occurs, the flag will remain as is. The process will be to retry at the next synchronization timing. The dividing point for each system's responsibility is "whether or not the file has arrived."
And once again, something happens.
Something's wrong with the holiday processing?
This is a minor requirement, but there is a connection for updating master data on holidays, but an issue has occurred where the master data is not updated.
The cause was quickly identified: the batch. Holidays were determined within the batch, but holiday processing was not being done in accordance with the factory's operating days.
It is possible to determine the date within a batch, but isn't this one of the areas that is prone to bugs?
Now, there are two options here: modify the batch or introduce a scheduler. When you think about future maintenance issues (such as increasing or decreasing the number of holidays), there is probably nothing better than a scheduler.
Time is tight. Use templates to speed up construction and maximize quality.
There have already been many twists and turns, and there is no time left. So I have a suggestion: "Why not use a template?"
There is a tool called HULFT Script that creates templates for holiday processing and error handling that are tailored to the factory's operating days.
The final system configuration was as follows:
HULFT Script performs the scheduler function and determines holidays, and calls the necessary batches. Data to be synchronized is generated and transferred. This series of events is controlled as a transaction, and if any problems occur, a rollback and error notification are performed. All of this was implemented by calling components from the template and configuring them.
summary
The file is simple and easy to understand
This happened during system testing. Something's wrong, it's different from what I expected. So I tried to isolate the issue, but was there a file there? Or not? It was primitive, but it allowed for easy debugging. The initial isolation point was clear. Especially in cases like this, where there is a large physical distance between the parties involved or where a multi-vendor project is being promoted, I feel that "simple and easy to understand is the key."
By making good use of templates, you can minimize construction time and costs.
We believe that HULFT Script was actually the key to this project. Holiday detection, transaction control, error handling. These are common basic requirements, but trying to implement them from scratch without a framework would be arduous. This series should be used as a template in cases where quality is guaranteed. It's a shortcut to maximizing quality while minimizing development time.


