NoSQL
This glossary explains various keywords that will help you understand the mindset necessary for data utilization and successful DX.
This time, we will explain "NoSQL," a class of databases that has attracted attention because it is built on a different concept from the RDB.
What is NoSQL?
NoSQL refers to database engines designed with a different concept from traditional SQL-based RDBs (relational databases).
When cloud services began to spread around the world, there was a growing need for data processing that was difficult to achieve with traditional RDBs: unprecedentedly large amounts of data, a wide variety of data formats, and extremely high-speed processing. As a result, databases based on a different concept from RDBs began to be developed, and these came to be called NoSQL databases.
The historical background of the birth of NoSQL
The term NoSQL emerged around 2010 and became a hot topic, around the same time that terms like big data and Hadoop became popular.
The term "cloud computing" itself was coined in 2006. Until then, using IT services via a web browser had not been common, but cloud services began to spread rapidly around 2010. This ushered in an era in which unprecedentedly large-scale services, used by people all over the world, appeared and were operated over the Internet. Around the same time, smartphones began to become popular, and the use of IT spread at an unprecedented rate.
In these circumstances, the utilization of "data" also faced new challenges: data volumes far exceeding anything seen before, far more diverse data, and demands for far faster processing. It was becoming increasingly difficult to meet these needs simply by improving the performance of the RDBs that had been the mainstream "databases" until then.
Therefore, to meet the needs of this new era, various efforts were made to rebuild databases on new technologies and thereby develop the data infrastructure of a new age, and these new technologies came to be called "NoSQL." The name refers to "new technologies that are not RDB (SQL)," with "SQL" standing in for conventional RDBs, which are operated using SQL.
NoSQL: The shift to Not-only SQL
In the early days, many of these new technologies really did not use SQL, hence "NoSQL." But as products appeared that could use SQL (at least auxiliarily), it became clear that whether SQL was used was not the real issue, and NoSQL came to be read as an abbreviation of "Not only SQL." Furthermore, products appeared that use SQL as before while venturing into the new areas NoSQL targets; these came to be called "NewSQL."
Volume: NoSQL because the amount of data is too large
A well-known NoSQL product from this perspective is "Hadoop," which was also booming at the time along with the term "big data."
Hadoop is open-source software originally developed based on an academic paper known as the "MapReduce paper," which introduced the technology Google used to build its own cloud data centers.
⇒ MapReduce: Simplified Data Processing on Large Clusters
It was, at the time, a truly groundbreaking technology that envisioned situations where the amount of data was far too large to store on a single machine. It created a distributed database that spread data across a large number of HDDs installed in a large number of PC servers, enabling search, aggregation, and update processing across the whole group of servers.
It was a technology that responded to the technological situation of the day, distributing the I/O load across many PCs, and because of this architecture the use of SQL was not anticipated. Instead, processing was written as "Map processing" (parallel processing on each node) and "Reduce processing" (collecting results from the nodes), so it was literally "not SQL."
Writing processes in MapReduce is very different from traditional database usage and often difficult, and processes that are easy to write in SQL (such as JOINs across tables) can be hard to implement. Later, technologies such as Hive emerged that could translate SQL-like statements into MapReduce, but they often failed to deliver the required performance and were not a sufficient solution.
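As an illustration, the Map/Reduce style of writing can be sketched in a few lines of plain Python. This is a single-process toy, not actual Hadoop code: word counting is the canonical MapReduce exercise.

```python
from collections import defaultdict

# Minimal single-process sketch of the MapReduce model (not real Hadoop code).
# "map" emits (key, value) pairs per record; "shuffle" groups values by key;
# "reduce" aggregates each key's values. On a real cluster, map and reduce
# tasks run in parallel across many nodes.

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Group all values by key (done by the framework between map and reduce).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word.
    return (key, sum(values))

def mapreduce(lines):
    pairs = [p for line in lines for p in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = mapreduce(["big data", "big cluster", "data data"])
# counts == {"big": 2, "data": 3, "cluster": 1}
```

Even this tiny example shows why a JOIN across tables is awkward in this model: everything must be phrased as emitting and aggregating key-value pairs.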
For this reason, despite the huge boom at the time, Hadoop did not become a mainstream database technology afterwards. However, because it can store and process extremely large amounts of data, it is sometimes used today as the technical foundation of data lakes; although the name is rarely heard now, it is still in use everywhere.
As the "Hadoop" example shows, one strand of NoSQL is the pursuit of technology that can process extremely large amounts of data.
Variety: NoSQL is needed because data formats are too diverse
When processing data with an RDB, you first need to define the types of data to be put into the database, and then store the data in that fixed format. This allows data to be processed in a neat and orderly manner, but it also means that data cannot even be stored properly without advance preparation.
In situations where a wide variety of data arrives one after another, only being able to store data in a predetermined format was a problem. To address this, schemaless databases were developed that could store and process data regardless of its structure.
For example, such a database can store structured data such as JSON even when each document has a different structure, and can search, aggregate, and update the stored data.
In a narrow sense, "NoSQL" sometimes refers specifically to this type of database, which is also classified as "document-oriented." MongoDB, which can store and process JSON, is a well-known representative product.
In an RDB, data types are defined before data is written to the database (schema on write); in a document-oriented database, the diverse data structures are interpreted when the data is read (schema on read).
When developing a system's functions and data through trial and error, an RDB forces programming work to stop every time the stored data types change, requiring time-consuming database work. With a document-oriented database, although the stored data types are mixed, programmers can keep experimenting without being interrupted by database changes, which is a real benefit for them.
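As a rough illustration of schema-on-read, here is a minimal Python sketch; the documents and field names are hypothetical. Heterogeneous JSON documents are stored as-is, and their structure is only interpreted at query time.

```python
import json

# Schema-on-read sketch: documents with different shapes are stored as-is,
# and the structure is interpreted only when the data is read.
# The documents and field names below are hypothetical examples.
documents = [
    json.loads('{"name": "Alice", "age": 30}'),
    json.loads('{"name": "Bob", "email": "bob@example.com"}'),
    json.loads('{"name": "Carol", "age": 25, "tags": ["vip"]}'),
]

def find(docs, predicate):
    # A schemaless "query": each document is inspected individually,
    # so a missing field is simply treated as absent, not as an error.
    return [d for d in docs if predicate(d)]

adults = find(documents, lambda d: d.get("age", 0) >= 25)
names = [d["name"] for d in adults]
# names == ["Alice", "Carol"]  (Bob has no "age" field and is skipped)
```

Note that adding a new field (like Carol's `tags`) required no schema change at all; in an RDB the table definition would have to be altered first.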
Nowadays, even traditional RDBs are increasingly incorporating new functions to meet these needs. For example, the widely used PostgreSQL has a JSON type that allows queries to be written against JSON data.
Velocity: NoSQL is used because it must be processed very quickly
New technologies have also been developed because traditional RDBs could not keep up with the required processing speed.
For example, in cloud services used by a large number of users, a large number of login requests will be sent during peak usage periods, which will require a huge amount of database processing, such as matching IDs and passwords.
To meet these needs, databases were created that could process simple data types quickly in memory, without using hard disks or other devices that take time to access. A typical example is a type of database called a key-value store. Key-value stores mainly perform simple processing, such as returning a value when a search key is given, but in return they operate quickly and efficiently in memory.
A typical example is Redis, which is essentially an in-memory database that operates very quickly. Amazon DynamoDB, a well-known cloud service, is also a key-value database.
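To illustrate the key-value idea, here is a minimal in-memory store in Python with an optional expiry time. This is a toy sketch of the concept, not Redis itself: no schema, no joins, just a hash lookup.

```python
import time

# Minimal in-memory key-value store with optional expiry, illustrating the
# simple get/set interface (similar in spirit to Redis) that makes key-value
# stores fast: a single hash lookup per operation.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds=None):
        # Store the value with an optional absolute expiry time.
        expires = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.monotonic() > expires:
            del self._data[key]   # lazily evict expired entries
            return None
        return value

# Typical use: caching a session token keyed by user ID, the kind of
# high-volume lookup described above for login processing.
store = KeyValueStore()
store.set("session:1001", "token-abc", ttl_seconds=60)
token = store.get("session:1001")
# token == "token-abc"
```

The expiry (TTL) mechanism mirrors how such stores are commonly used for sessions and caches: data that is cheap to recompute simply disappears after a while.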
Databases with entirely different concepts (graph databases, vector databases, etc.)
There are databases that have a completely different concept from traditional databases (RDBs).
Graph database
A graph database is a database that specializes in handling data that consists of graph structures, or "vertices" and "edges," as defined in mathematical graph theory. An example of such data is a "railroad route map." Not only do the data structures stored in graph databases differ significantly from RDBs, but the search processing they perform is also significantly different, so they often use languages other than SQL (such as Gremlin, Cypher, or SPARQL).
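As a small illustration of graph-style querying, the Python sketch below finds the shortest route between two stations on a hypothetical route map using breadth-first search, the kind of traversal that graph databases answer natively but that is awkward to express in SQL over an edge table.

```python
from collections import deque

# A graph-structured dataset like a railroad route map: stations are
# vertices, direct connections are edges. The station names are hypothetical.
routes = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_path(graph, start, goal):
    # Breadth-first search: answers "how do I get from start to goal?"
    # by exploring routes in order of increasing length.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(routes, "A", "E")
# path == ["A", "B", "D", "E"]
```

In a graph query language this would be a one-line traversal; in SQL it would require recursive self-joins of unbounded depth, which is exactly why graph databases use dedicated languages such as Gremlin or Cypher.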
Time Series Database
This is a database specialized for handling data that consists of pairs of a time (timestamp) and other values, such as "data continuously transmitted from sensors in IoT" or "bank transaction data." It is designed around the characteristics of such data: searches by time must be efficient, new data arrives constantly, and existing data is rarely rewritten.
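That access pattern can be illustrated with a short Python sketch using hypothetical sensor readings: data is appended in timestamp order and never rewritten, so a time-range search reduces to a binary search over a sorted list rather than a full scan.

```python
import bisect
from datetime import datetime

# Append-only time-series sketch: rows arrive in timestamp order (as sensor
# data usually does), so timestamps stay sorted and a time-range query is a
# binary search. The sensor readings below are hypothetical.
timestamps = []
values = []

def append(ts, value):
    # New data is always newer than existing data; nothing is rewritten.
    timestamps.append(ts)
    values.append(value)

def range_query(start, end):
    # Efficient time-range search: locate the slice [start, end) by bisection.
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return values[lo:hi]

append(datetime(2024, 1, 1, 0, 0), 21.5)
append(datetime(2024, 1, 1, 0, 10), 21.7)
append(datetime(2024, 1, 1, 0, 20), 22.0)

readings = range_query(datetime(2024, 1, 1, 0, 5), datetime(2024, 1, 1, 0, 25))
# readings == [21.7, 22.0]
```

Real time-series databases build on the same insight, adding time-based partitioning, compression, and automatic expiry of old data.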
Object-Oriented Database
In object-oriented languages such as Java, the data handled within a program is modeled in an object-oriented way. However, IT systems generally use relational databases (RDBs), whose approach to data modeling is different, so storing that data requires a conversion process. Object-oriented databases aim to eliminate this problem by being able to "store object-oriented data models as they are."
Vector Database
This type of database is gaining attention due to the generative AI boom, and is also used as a means of implementing RAG: it searches for "data with similar meaning" using vector operations. See the articles below for more details.
⇒ Vector database | Glossary
⇒ Retrieval Augmented Generation (RAG) | Glossary
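The core idea, finding "data with similar meaning" by vector operations, can be illustrated with a minimal cosine-similarity search in Python. The tiny vectors here are hypothetical stand-ins; a real system would produce embeddings with a model and index them in a vector database.

```python
import math

# Minimal vector-similarity search: each document is represented as an
# embedding vector, and "similar meaning" is measured by cosine similarity.
# The 3-dimensional vectors below are hypothetical toy embeddings.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

embeddings = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax":  [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]   # hypothetical embedding of a query about pets
best = max(embeddings, key=lambda k: cosine_similarity(embeddings[k], query))
# best is one of the pet documents, not "doc_tax"
```

Vector databases exist because this brute-force loop does not scale: with millions of high-dimensional vectors, approximate nearest-neighbor indexes are needed to answer the same question quickly.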
Data is inherently diverse
Once you are used to RDBs, you may start to think that RDB-shaped data is the norm and that the data NoSQL deals with is the exception. However, the data in the world is inherently diverse, and not all of it fits naturally into an RDB.
It's not that it doesn't exist, it's just that you haven't noticed it
I used a "railroad route map" as an example of graph-database-like data, but similar data structures exist all over the world. Many types of data, such as "road traffic networks," "communication networks," "relationships between adjacent parts inside a device," and "human relationships (who is friends with whom)," have structures that are ideally handled in a graph DB.
Traditionally, however, techniques have been used to force graph-structured data into RDBs. Because such data is simply assumed to be difficult to handle, people often do not realize that a graph DB, built on a different way of thinking, could solve the problem simply and naturally.
Furthermore, object-oriented languages such as Java are widely used in system development, and when they are used to the full, object-oriented modeling is applied and the data a running program holds is "properly object-oriented modeled data." Because an RDB thinks about data differently, time-consuming conversion processing is required to store that data, and even a carefully crafted class design can have its performance ruined by the conversion into an RDB.
This is a familiar practical pain point when developing systems in Java, but a "database that can store data in an object-oriented manner" might eliminate this kind of problem altogether.
Operational hassle is being eliminated in the cloud era
However, even when the usefulness of such non-RDB paradigms was recognized, actually using them meant taking on the burden of installing and operating a wide variety of databases. It was often easier and more efficient, in terms of IT infrastructure, to run only one type of database (say, PostgreSQL) and have people devise ways to use the RDB even for data better suited to other databases.
Now, however, the situation is changing. An environment is being established in which databases other than RDBs can be used as cloud services, eliminating the "operational hassle." For example, AWS (Amazon Web Services) offers a wide variety of databases, such as those listed below, which can be used without operating them yourself.
Relational database (RDB)
Of course, widely used RDBs are provided.
- Amazon Aurora
- Amazon RDS
- Amazon Redshift
Key-value type (in-memory database)
NoSQL is also available to meet the need for extremely fast processing.
- Amazon DynamoDB
- Amazon ElastiCache
- Amazon MemoryDB
Document-oriented
NoSQL is also available to meet the needs of those with "too diverse data."
- Amazon DocumentDB (MongoDB compatible)
Hadoop and column-oriented databases (suitable for processing large amounts of data)
NoSQL is also available to meet the needs of "too much data."
- Amazon Keyspaces (Apache Cassandra compatible)
- Amazon EMR (Hadoop)
Graph database
Graph databases that can handle data structures consisting of "vertices" and "edges" are also provided, and can be used without having to operate them yourself.
- Amazon Neptune
Time Series Database
A time series database that can be used for IoT and other purposes is also provided.
- Amazon Timestream
Vector Database
AWS also provides a vector database, which is gaining attention for its use in generative AI, particularly in RAG. It also provides an environment where vector search can be added to various existing databases and used together.
- Amazon OpenSearch Service
- Vector search functions have also been added to various existing databases:
  - Available in RDBs: the "pgvector" extension has been added to PostgreSQL on AWS (Amazon Aurora PostgreSQL / Amazon RDS for PostgreSQL)
  - Available for object storage: "Amazon S3 Vectors" allows you to use Amazon S3 as a vector store
  - Available in graph databases: the Amazon Neptune ML extension
  - Available in NoSQL (in-memory databases): the Amazon MemoryDB vector search feature
  - Available in NoSQL (document-oriented): the vector search functionality in Amazon DocumentDB (MongoDB compatible)
The above is the list at the time of writing, and more may be added in the future. Roughly speaking, every category above other than "relational" is a type of NoSQL.
Remaining issues in achieving "diverse data utilization" (solved with iPaaS)
As we have seen, a wide variety of databases exist, and they can now be used without the burden of operating them yourself.
To take advantage of them, it is of course necessary to "know that databases other than RDBs exist" and "be willing to use them," but a foundation that can effectively link and combine a wide variety of databases and data is also needed.
When trying to link JSON data stored in a document-oriented database to an RDB, it is natural that data conversion processing will be required. When linking databases with different concepts, it may be necessary for a person to understand the meaning of the data in each and consider the conversion process.
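As a small illustration of such conversion processing, the Python sketch below flattens variable-shaped JSON documents (hypothetical "order" data) into fixed relational-style columns, filling absent fields with None. The field names are invented for the example.

```python
import json

# Sketch of the conversion needed when linking JSON documents to a
# relational table: nested, variable-shaped documents are flattened into
# fixed columns, with fields missing from a document mapped to None
# (i.e., SQL NULL). Documents and field names are hypothetical.
orders = [
    json.loads('{"id": 1, "customer": {"name": "Alice"}, "total": 120}'),
    json.loads('{"id": 2, "customer": {"name": "Bob", "rank": "gold"}}'),
]

def to_row(doc):
    # A person still has to decide this mapping: which nested fields
    # become which columns, and what a missing field should mean.
    customer = doc.get("customer", {})
    return {
        "id": doc.get("id"),
        "customer_name": customer.get("name"),
        "customer_rank": customer.get("rank"),   # absent in some documents
        "total": doc.get("total"),
    }

rows = [to_row(d) for d in orders]
# rows[0] == {"id": 1, "customer_name": "Alice",
#             "customer_rank": None, "total": 120}
```

Even this toy mapping shows why such linkage needs human judgment: the conversion encodes decisions about meaning, not just format.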
This is where "connecting" technologies such as EAI, ETL, and iPaaS come in handy. Products such as DataSpider and HULFT Square connect various clouds, systems, and data using no-code methods.
Can be used with GUI only
Unlike regular programming, there is no need to write code. By placing and configuring icons on the GUI, you can achieve integrated processing with a wide variety of systems and data.
Being able to develop using a GUI is also an advantage
No-code development using only a GUI may seem like a compromise compared to full-scale programming. However, development using only a GUI allows on-site business personnel to take the initiative in their own work. The same goes for deciding what should be linked with what: the people who know the business best are those on the front lines.
Full-scale processing can be implemented
There are many products that claim to be "developable using only a GUI," but problems like "it's easy to create, but you can only do simple things," "when I tried to execute a full-scale process, it couldn't process and crashed," or "it didn't have the high reliability or stable operation capacity to support business operations, which caused problems" often occur.
Even if a service appears to be easy to use and modern at first glance, if it cannot adequately process even slightly large amounts of data, it is no match for big data.
"DataSpider" and "HULFT Square" are easy to use, but they also allow you to create processes at the same level as full-scale programming. They have the same high processing power as full-scale programming, as they are internally converted to Java and executed, and they have a proven track record of supporting corporate IT for many years.
What is necessary for a "data infrastructure" to successfully utilize data?
To fully support actual business operations, high processing power is required to process large amounts of data. At the same time, flexible and rapid trial and error led by the field is also essential.
Generally, when high performance and advanced processing are required, the tool tends to be difficult to program and use, while when ease of use in the field is required, the tool tends to be easy to use but has low processing power and can only perform simple processing.
No need to operate in-house as it is iPaaS
DataSpider can be operated securely on a system under your own management. With HULFT Square, a cloud service (iPaaS), this "connecting" technology itself can be used as a cloud service without the need for in-house operation, eliminating the hassle of in-house implementation and system operation.
Are you interested in "iPaaS" and "connecting" technologies?
The key to successfully implementing "cloud utilization," "in-house data utilization," and "business automation" is to establish a "data integration platform" that can be developed in-house by your company.
Try out our products that allow you to freely connect various data and systems, from on-premise IT systems to cloud services, and make successful use of IT.
The ultimate "connecting" tool: data integration software "DataSpider" and data integration platform "HULFT Square"
"DataSpider," a data integration tool developed and sold by our company, is a "connecting" tool with a long history of success as the foundation supporting various companies' business systems. "HULFT Square," a data integration platform, is a "connecting" cloud service developed using DataSpider technology.
We offer a free trial version and hold online seminars where you can try out the software for free, so we hope you will give it a try.
