XX is important for utilizing open data!
Explanations with examples and sample programs
What is open data?
Open data is "publicly available data" that can be freely used by anyone via the internet, etc. It is made public not only by national and local governments, but also by research institutions and various private companies.
Definition of Open Data
The "Open Data Basic Guidelines" established by the Public-Private Data Utilization Promotion Strategic Council in 2017 defines open data as follows:
Open data is defined as public and private data held by the national government, local governments, and businesses that has been made public in a form that satisfies all of the following criteria so that any citizen can easily use it (process, edit, redistribute, etc.) via the Internet, etc.
- Data that may be reused for both commercial and non-commercial purposes
- Data that is machine-readable
- Data that can be used free of charge
In other words, open data is data that can be used by anyone free of charge, for both commercial and non-commercial purposes, and that is made public in a format that can be read by a wide range of common tools and programs, rather than a format that can only be handled by specific software. A similar definition is given in the "OPEN DATA HANDBOOK."
The significance and purpose of making the data public is also stated as follows:
The increased use of public data by a wide range of entities will enable the rapid and efficient provision of diverse services that utilize originality and ingenuity, and the provision and improvement of public services through public-private collaboration. This will enable appropriate responses to environmental changes such as diversifying needs and values and technological innovation, and will contribute to resolving the various challenges facing Japan, such as its severe fiscal situation and the rapid progress of an aging population with a declining birthrate.
It will also encourage venture companies and others to create a variety of new services and businesses, and improve the efficiency of corporate activities, which will lead to economic revitalization throughout Japan.
In this way, national and local governments and private companies alike are publishing useful data that can be used to create new businesses and increase the value of services. It is truly a treasure trove, and companies can draw on it to support their own growth.
Types of open data
What kind of open data is being made public? Below we look at what is published by the national government, by local governments, and by private companies and organizations.
Country/Government
The Digital Agency and other government ministries and agencies publish a variety of statistical data on topics such as the economy, healthcare, education, transportation, disaster prevention, and tourism.
- e-Gov (Digital Agency)
As a "one-stop e-government portal," it lets you complete procedures with government agencies such as the Ministry of Health, Labour and Welfare online, and it also publishes data on the Constitution, laws, and government ordinances.
- e-Stat (Statistics Bureau, Ministry of Internal Affairs and Communications, National Statistics Center)
As a portal site for government statistics, it publishes data from over 700 statistical surveys conducted by various government ministries and agencies.
- EDINET (Financial Services Agency)
Securities reports, tender offer notification forms, and other documents are made public through the electronic disclosure system for disclosure documents based on the Financial Instruments and Exchange Act.
Open data is also published by many other organizations, including the Ministry of Land, Infrastructure, Transport and Tourism, the Ministry of Health, Labour and Welfare, and the Ministry of the Environment.
Local governments
As of June 2023, data has been released by 1,449 municipalities, including all 47 prefectures. Data on disaster prevention, taxes, tourism, the environment, and more is published for each municipality.
- Hokkaido Open Data Portal
- Tokyo Metropolitan Government Open Data Catalog Site
- Kyoto Prefecture Open Data Catalog Site
For a list of all municipalities that have made their data public, please see the "List of Municipalities Initiatives for Open Data" on the Digital Agency website.
»List of local governments that have implemented open data initiatives
Private companies and organizations
Open data is made public not only by national and local governments but also by private companies and organizations. A wide variety of data is available, from data of high public interest, such as weather data and map information, to data that reflects consumer needs, such as e-commerce listings and customer reviews. Here are some representative examples.
- OpenWeatherMap
OpenWeatherMap, operated by the UK company OpenWeather, makes weather data publicly available. Basic data such as the current weather is provided free of charge, but some data, such as 30-day weather forecasts, is available for a fee.
- Rakuten Web Service
Data on various services provided by the Rakuten Group is provided via API. You can obtain product information from Rakuten Ichiba, hotel and inn rankings from Rakuten Travel, and user reviews from Rakuten Books.
What can you do with open data?
Let's think about what we can do with this open data.
Planning new store openings
The aforementioned e-Stat publishes the population and number of households by city, ward, town, and village. This data can be used to evaluate potential locations for new stores. Furthermore, by plotting the number of existing stores against the population of each candidate area in a scatter plot, or by mapping the locations of competing stores, it is possible to narrow down the locations that are more likely to attract customers and be profitable.
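As a rough illustration of this kind of screening, the sketch below ranks candidate areas by the number of existing stores per 10,000 residents. The area names and all figures are hypothetical placeholders; real population values would come from e-Stat and the store counts from your own data.

```python
# Hypothetical figures for illustration only. Real population values would
# come from e-Stat tables and store counts from your own store master data.
population = {"Area A": 120_000, "Area B": 45_000, "Area C": 80_000}
existing_stores = {"Area A": 12, "Area B": 2, "Area C": 10}


def stores_per_10k(population, stores):
    """Return the number of existing stores per 10,000 residents for each area."""
    return {
        area: stores.get(area, 0) / pop * 10_000
        for area, pop in population.items()
        if pop > 0  # skip areas without a usable population figure
    }


def rank_candidates(population, stores):
    """List areas from least to most saturated (fewest stores per resident first)."""
    density = stores_per_10k(population, stores)
    return sorted(density, key=density.get)


ranking = rank_candidates(population, existing_stores)
print(ranking)
```

A low store density is only one signal. In practice it would be weighed together with factors such as competitor locations and household composition.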
Sales causal analysis
By comparing your company's past sales data with open meteorological data such as weather and temperature, you can analyze how sales move with weather conditions and look for likely causes of sales fluctuations. Furthermore, by analyzing the correlation between sales trends for each product in a store and the attributes of the area (hours of sunshine, traffic volume, age group, average income, etc.), you can identify key products and effective shelf allocations for other stores in areas with the same characteristics.
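One simple starting point for this kind of analysis is a correlation coefficient between a weather variable and sales. The sketch below computes Pearson's r using only the standard library. The temperature and sales series are made-up sample values, and a high correlation on its own does not prove that weather causes the change in sales.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up sample values: daily high temperature (deg C) and units sold.
daily_high_temp_c = [22, 25, 28, 31, 33, 35]
cold_drink_sales = [110, 135, 160, 205, 230, 260]

r = pearson(daily_high_temp_c, cold_drink_sales)
print(f"correlation between temperature and sales: {r:.2f}")
```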
Using electricity consumption to solve social issues
Until now, the use of electricity data was restricted by the Electricity Business Act. However, the Electricity Business Act was amended in 2023, and the operation of the electricity data aggregation system began, allowing electricity data to be provided to general companies*. With the recent spread of smart meters, an environment has been created where electricity usage can be monitored in near real time. This data can be used to tell whether delivery recipients are at home, potentially reducing the number of missed deliveries.
* To use electricity data, you must register as a member with the Electric Power Data Management Association (a general incorporated association). To become a member, you must hold ISMS or PMS certification, and other conditions also apply, such as an annual membership fee or usage fees depending on the data you use.
Combining data is important to utilize open data
As these use cases show, open data yields more insight when it is analyzed in combination with other data than when it is viewed alone. In particular, companies should compare it with their own data. Looking at your own situation alongside the objective market environment and statistics that open data provides makes it easier to identify a range of strategies.
Combining these multiple data sets presents several challenges.
Publishers vary
Naturally, to combine multiple data sources, you need to obtain the data from each source. The sources differ not only in their URLs but also in the means used to obtain the data (REST API, JDBC, FTP, etc.). In addition, in the case of in-house data, authentication such as OAuth may be required depending on the system, so you need to be aware of the differences in protocols and authentication methods supported by each source.
Data formats vary
One of the challenges after acquiring the data is the difference in file formats, such as CSV, JSON, and XML. Since it is difficult to combine data with different data structures, a method for integrating the data structures is first required.
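As a minimal sketch of what integrating the data structures can look like, the example below reads the same kind of record from CSV, JSON, and XML and converts each into a common list of dictionaries. The field names (`area`, `population`), the XML element name `record`, and the sample values are all assumptions made for illustration; real sources will each have their own layout.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Parse a JSON array of flat objects; values are converted to strings."""
    return [{k: str(v) for k, v in obj.items()} for obj in json.loads(text)]


def from_xml(text, record_tag="record"):
    """Parse XML in which each <record> element holds one child element per field."""
    root = ET.fromstring(text)
    return [
        {child.tag: (child.text or "") for child in rec}
        for rec in root.iter(record_tag)
    ]


# Made-up sample payloads standing in for three different publishers.
csv_text = "area,population\nArea A,120000\n"
json_text = '[{"area": "Area B", "population": 45000}]'
xml_text = (
    "<records><record><area>Area C</area>"
    "<population>80000</population></record></records>"
)

records = from_csv(csv_text) + from_json(json_text) + from_xml(xml_text)
for rec in records:
    print(rec)
```

Once every source has been converted to the same structure, the later steps (cleansing, joining, aggregation) no longer need to know which format the data arrived in.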
Data is held in different ways
Furthermore, even when two records represent the same thing, they may be written differently, such as "Co., Ltd." and "(Co., Ltd.)," or "1-8-1 Akasaka, Minato-ku" and "1-8-1 Akasaka." There may also be cases where there is no key item on which to match the data.
Therefore, in order to combine and use data, it is necessary to collect data from multiple sources using different methods, and then convert and process the data structure and storage method into a form that is easy to handle.
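The notation differences mentioned above can often be reduced with simple normalization before matching. The sketch below is one possible approach, not a complete name-matching solution: it strips the "Co., Ltd." legal form (with or without parentheses) from company names and treats two addresses as a possible match when one is a less detailed form of the other. The company name "Example Trading" is a hypothetical placeholder.

```python
import re
import unicodedata

# Matches "Co., Ltd." with or without surrounding parentheses (illustrative only;
# other legal forms would need their own patterns).
LEGAL_FORM = re.compile(r"\(?\s*\bco\.,\s*ltd\.?\s*\)?", re.IGNORECASE)


def normalize_company(name):
    """Return a comparison key for a company name."""
    text = unicodedata.normalize("NFKC", name)  # full-width to half-width, etc.
    text = LEGAL_FORM.sub(" ", text)            # drop the legal form
    text = re.sub(r"\s+", " ", text)            # collapse repeated whitespace
    return text.strip().casefold()


def address_tokens(address):
    """Split an address into lowercase tokens such as '1-8-1' or 'akasaka'."""
    text = unicodedata.normalize("NFKC", address).casefold()
    return set(re.findall(r"[\w-]+", text))


def addresses_may_match(a, b):
    """True when one address contains every token of the other."""
    ta, tb = address_tokens(a), address_tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)


print(normalize_company("Example Trading Co., Ltd."))
print(normalize_company("Example Trading (Co., Ltd.)"))
print(addresses_may_match("1-8-1 Akasaka, Minato-ku", "1-8-1 Akasaka"))
```

Subset matching on addresses is deliberately loose, so candidate pairs found this way would normally be confirmed against another field before the records are merged.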
How to combine open data with your own data
So how can we combine data and use it? There are several ways to do this.
Manual collection and processing
This involves manually downloading the data from each publishing website and converting and processing it using Excel or similar software. This is the simplest and most reliable method, but since all steps are done manually, it takes a considerable amount of time and effort. If the data is updated, the process must be repeated in the same way, which creates issues with reproducibility.
How to collect and process data using a program
This is a method of creating processes using programming languages such as Python or Ruby. It overcomes the reproducibility issue that was a problem with manual work, but since programming knowledge is essential to create the program, it is difficult for people who are not engineers, and the time it takes to create the program is a drawback.
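To give a sense of what such a program involves, here is a minimal Python sketch that joins open population data with in-house sales data on a shared key and derives sales per resident. The column names, the `area_code` key, and all values are hypothetical; in a real program the CSV text would be downloaded from the publisher and exported from your own systems.

```python
import csv
import io

# Hypothetical open data (for example, population by area).
OPEN_DATA_CSV = """area_code,population
A01,120000
A02,45000
"""

# Hypothetical in-house data (monthly sales by area).
IN_HOUSE_CSV = """area_code,monthly_sales
A01,3600000
A02,900000
A03,500000
"""


def load_by_key(text, key):
    """Index CSV rows by the value of the given key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def join_sales_with_population(open_csv, in_house_csv):
    """Join in-house sales with open population data and compute sales per resident."""
    population = load_by_key(open_csv, "area_code")
    joined = []
    for row in csv.DictReader(io.StringIO(in_house_csv)):
        match = population.get(row["area_code"])
        if match is None:
            continue  # no open-data row for this area; skipped in this sketch
        joined.append({
            "area_code": row["area_code"],
            "sales_per_capita": int(row["monthly_sales"]) / int(match["population"]),
        })
    return joined


result = join_sales_with_population(OPEN_DATA_CSV, IN_HOUSE_CSV)
for row in result:
    print(row)
```

Because the whole procedure is written down as code, it can be rerun unchanged whenever the source data is updated.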
How to collect and process data using RPA
RPA is a tool that records mouse and keyboard operations on a PC screen and reproduces them using software. This method requires some effort to teach the tool the operations, but once they are recorded, the same operations can be reproduced any number of times, even by non-engineers. On the other hand, because it reproduces screen operations, it may not work properly if the screen layout of the publishing source changes, and flexible control tends to be difficult, for example when the editing method differs depending on the content of the data.
Representative RPA tools
»UiPath
»WinActor
How to collect and process data using a data integration tool
The final method is to use dedicated data integration tools such as ETL or iPaaS. While this method is similar to RPA in that you must first create the process, it does not replicate the operations on a PC screen, so changes to the screen layout will not cause the system to stop working. It also allows for flexible control, such as branching the process depending on the data content. These tools tend to be more expensive than RPA tools, so they may not be cost-effective for applications such as analyzing open data only a few times a year. On the other hand, if you need to view daily analysis results in a BI tool at any time, or if you also want to automate business processes through system integration, then they are worth considering.
Representative data integration tools
»DataSpider Servista
»HULFT Square
Sample of collecting and processing open data
Here we introduce samples of collecting and processing open data using HULFT Square, one of these data integration tools. All of the samples introduced here can be used free of charge if you have HULFT Square.
e-Gov (Digital Agency)
»List of laws and regulations
»Obtaining laws and regulations
e-Stat (Statistics Bureau, Ministry of Internal Affairs and Communications, National Statistics Center)
»Survey on Consumption Trends of Foreign Visitors to Japan
»Retail Price Survey (Structure)
»Travel and Tourism Consumption Trends Survey
»Survey of new farmers
»Physical fitness and athletic ability survey
EDINET (Financial Services Agency)
»Securities reports, etc. (list of reports)
»Securities reports, etc. (main body of the report)
gBizINFO (Ministry of Economy, Trade and Industry)
»Corporate financial information
Corporate Number System (National Tax Agency)
»Basic 3 Corporate Information
Land Comprehensive Information System (Ministry of Land, Infrastructure, Transport and Tourism)
»Real estate transaction price information
Tokyo Metropolitan Government Open Data Catalog Site (Tokyo Metropolitan Government)
»Tokyo Big Sight Event Information
»Crime information by neighborhood
OpenWeatherMap
»Weather forecast information
Rakuten Web Service
»Rakuten Ichiba product search results
Recruit Web Service
»Hot Pepper Gourmet: gourmet search results
Yahoo! Shopping API
»Yahoo! Shopping Highly Rated Trend Rankings
Examples of using open data with a data integration tool
Here are some examples of how open data is being utilized with a data integration tool.
Flood forecasting using river flow and rainfall data (Nagano Prefecture Digital Transformation Promotion Division)
Nagano Prefecture is using open data to forecast river floods. Nagano Prefecture is home to many rivers, including the Shinano River (known as the Chikuma River in Nagano Prefecture), which has the highest flow rate in Japan, as well as the Kiso River, Tenryu River, and Himekawa River. The prefecture also has 77 cities, towns, and villages, the second highest number in the country. Water level and rainfall data held by each city, town, and village is collected and analyzed together with river and road information held by the prefecture and weather information obtained from private companies, in order to forecast floods more than 30 hours in advance and estimate the risk of flooding.
Optimizing delivery costs by utilizing map information such as route distance (Nose Steel, Shiga University, Teikoku Databank)
Nose Steel is researching delivery optimization algorithms in collaboration with Shiga University and Teikoku Databank. They are developing an algorithm that determines the most efficient delivery route by taking into account information such as the weight of the steel and the truck load capacity in addition to the route distance and required time between delivery points obtained from map information. They are also developing a similar algorithm so that other companies can use it.
Summary
We explained that open data is data that can be used by anyone free of charge, for both commercial and non-commercial purposes, and that is made publicly available in a format that is easy to use with tools and programs. We also explained that a wide variety of data is made public by many sources, including national and local governments and private companies.
We also explained that open data can be used in a wider range of ways when combined with in-house data rather than on its own, and that to do so it is necessary to absorb the differences in protocols, file formats, and data representations from one publisher to another.
We also introduced ways to combine open data with your own company's data, and provided examples that use a data integration tool.
Finally, at Saison Technology, we are happy to answer any questions you may have about utilizing open data, including the data integration tools introduced in this article.
We hope this article will be helpful to those who are working to utilize open data.