NoSQL
This glossary explains various keywords that will help you understand the mindset necessary for data utilization and successful DX.
This time, we will explain "NoSQL," a class of databases that has attracted attention because it is built on a different concept from the RDB.
What is NoSQL?
NoSQL refers to database engines designed with a different concept from traditional SQL-based RDBs (relational databases).
When cloud services began to spread around the world, there was a growing need for data processing that was difficult to achieve with traditional RDBs: unprecedentedly large amounts of data, a wide variety of data formats, and extremely high-speed processing. As a result, databases based on a different concept from RDBs began to be developed, and these came to be called NoSQL databases.
The historical background of the birth of NoSQL
The term NoSQL emerged around 2010 and became a hot topic, around the same time that terms like big data and Hadoop became popular.
The term "cloud computing" itself was coined in 2006. Until then, using IT services via a web browser had not been common, but cloud services began to spread rapidly around 2010. This ushered in an era in which unprecedentedly large-scale services, used by people all over the world, appeared and were operated over the Internet. Around the same time, smartphones began to become popular, and the use of IT spread at an unprecedented rate.
In these circumstances, the utilization of "data" also faced new challenges: data volumes far exceeding anything seen before, far more diverse data, and demands for far faster processing. It was becoming increasingly difficult to meet these needs simply by improving the performance of the RDBs that had been the mainstream "databases" until then.
Therefore, to meet the needs of this new era, various efforts were made to rebuild databases on new technologies and thereby develop the data infrastructure of a new age, and these new technologies came to be called "NoSQL." The name refers to "new technologies that are not RDB (SQL)," with "SQL" standing in for conventional RDBs, which are operated using SQL.
NoSQL: The shift to Not-only SQL
In the early days, many of these new technologies really did not use SQL, hence "NoSQL." But as products appeared that could use SQL (at least auxiliarily), it became clear that whether SQL was used was not the real issue, and NoSQL came to be read as an abbreviation of "Not only SQL." Furthermore, products appeared that use SQL as before while venturing into the new areas NoSQL targets; these came to be called "NewSQL."
Volume: NoSQL because the amount of data is too large
A well-known NoSQL product from this perspective is "Hadoop," which was also booming at the time along with the term "big data."
Hadoop is open-source software originally developed based on an academic paper known as the "MapReduce paper," which introduced the technology Google used to build its own cloud data centers.
⇒ MapReduce: Simplified Data Processing on Large Clusters
It was, at the time, a truly groundbreaking technology that envisioned situations where the amount of data was far too large to store on a single machine. It created a distributed database that spread data across a large number of HDDs installed in a large number of PC servers, enabling search, aggregation, and update processing across the whole group of servers.
It was a technology that responded to the technological situation of the day, distributing the I/O load across many PCs, and because of this architecture the use of SQL was not anticipated. Instead, processing was written as "Map processing" (parallel processing on each node) and "Reduce processing" (collecting results from the nodes), so it was literally "not SQL."
Writing processes in MapReduce is very different from traditional database usage and often difficult, and processes that are easy to write in SQL (such as JOINs across tables) can be hard to implement. Later, technologies such as Hive emerged that could translate SQL-like statements into MapReduce, but they often failed to deliver the required performance and were not a sufficient solution.
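As an illustration, the Map/Reduce style of writing can be sketched in a few lines of plain Python. This is a single-process toy, not actual Hadoop code: word counting is the canonical MapReduce exercise.

```python
from collections import defaultdict

# Minimal single-process sketch of the MapReduce model (not real Hadoop code).
# "map" emits (key, value) pairs per record; "shuffle" groups values by key;
# "reduce" aggregates each key's values. On a real cluster, map and reduce
# tasks run in parallel across many nodes.

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Group all values by key (done by the framework between map and reduce).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Sum the counts for one word.
    return (key, sum(values))

def mapreduce(lines):
    pairs = [p for line in lines for p in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = mapreduce(["big data", "big cluster", "data data"])
# counts == {"big": 2, "data": 3, "cluster": 1}
```

Even this tiny example shows why a JOIN across tables is awkward in this model: everything must be phrased as emitting and aggregating key-value pairs.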
For this reason, despite the huge boom at the time, Hadoop did not become a mainstream database technology afterwards. However, because it can store and process extremely large amounts of data, it is sometimes used today as the technical foundation of data lakes; although the name is rarely heard now, it is still in use everywhere.
As the "Hadoop" example shows, one strand of NoSQL is the pursuit of technology that can process extremely large amounts of data.
Variety: NoSQL is needed because data formats are too diverse
When processing data with an RDB, you first need to define the types of data to be put into the database, and then store the data in that fixed format. This allows data to be processed in a neat and orderly manner, but it also means that data cannot even be stored properly without advance preparation.
In situations where a wide variety of data arrives one after another, only being able to store data in a predetermined format was a problem. To address this, schemaless databases were developed that could store and process data regardless of its structure.
For example, such a database can store structured data such as JSON even when each document has a different structure, and can search, aggregate, and update the stored data.
In a narrow sense, "NoSQL" sometimes refers specifically to this type of database, which is also classified as "document-oriented." MongoDB, which can store and process JSON, is a well-known representative product.
In an RDB, data types are defined before data is written to the database (schema on write); in a document-oriented database, the diverse data structures are interpreted when the data is read (schema on read).
When developing a system's functions and data through trial and error, an RDB forces programming work to stop every time the stored data types change, requiring time-consuming database work. With a document-oriented database, although the stored data types are mixed, programmers can keep experimenting without being interrupted by database changes, which is a real benefit for them.
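As a rough illustration of schema-on-read, here is a minimal Python sketch; the documents and field names are hypothetical. Heterogeneous JSON documents are stored as-is, and their structure is only interpreted at query time.

```python
import json

# Schema-on-read sketch: documents with different shapes are stored as-is,
# and the structure is interpreted only when the data is read.
# The documents and field names below are hypothetical examples.
documents = [
    json.loads('{"name": "Alice", "age": 30}'),
    json.loads('{"name": "Bob", "email": "bob@example.com"}'),
    json.loads('{"name": "Carol", "age": 25, "tags": ["vip"]}'),
]

def find(docs, predicate):
    # A schemaless "query": each document is inspected individually,
    # so a missing field is simply treated as absent, not as an error.
    return [d for d in docs if predicate(d)]

adults = find(documents, lambda d: d.get("age", 0) >= 25)
names = [d["name"] for d in adults]
# names == ["Alice", "Carol"]  (Bob has no "age" field and is skipped)
```

Note that adding a new field (like Carol's `tags`) required no schema change at all; in an RDB the table definition would have to be altered first.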
Nowadays, even traditional RDBs are increasingly incorporating new functions to meet these needs. For example, the widely used PostgreSQL has a JSON type that allows queries to be written against JSON data.
Velocity: NoSQL is used because it must be processed very quickly
New technologies have also been developed because traditional RDBs could not keep up with the required processing speed.
For example, in cloud services used by a large number of users, a large number of login requests will be sent during peak usage periods, which will require a huge amount of database processing, such as matching IDs and passwords.
To meet these needs, databases were created that could process simple data types quickly in memory, without using hard disks or other devices that take time to access. A typical example is a type of database called a key-value store. Key-value stores mainly perform simple processing, such as returning a value when a search key is given, but in return they operate quickly and efficiently in memory.
A typical example is Redis, which is essentially an in-memory database that operates very quickly. Amazon DynamoDB, a well-known cloud service, is also a key-value database.
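To illustrate the key-value idea, here is a minimal in-memory store in Python with an optional expiry time. This is a toy sketch of the concept, not Redis itself: no schema, no joins, just a hash lookup.

```python
import time

# Minimal in-memory key-value store with optional expiry, illustrating the
# simple get/set interface (similar in spirit to Redis) that makes key-value
# stores fast: a single hash lookup per operation.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds=None):
        # Store the value with an optional absolute expiry time.
        expires = time.monotonic() + ttl_seconds if ttl_seconds else None
        self._data[key] = (value, expires)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and time.monotonic() > expires:
            del self._data[key]   # lazily evict expired entries
            return None
        return value

# Typical use: caching a session token keyed by user ID, the kind of
# high-volume lookup described above for login processing.
store = KeyValueStore()
store.set("session:1001", "token-abc", ttl_seconds=60)
token = store.get("session:1001")
# token == "token-abc"
```

The expiry (TTL) mechanism mirrors how such stores are commonly used for sessions and caches: data that is cheap to recompute simply disappears after a while.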
Databases with entirely different concepts (graph databases, vector databases, etc.)
There are databases that have a completely different concept from traditional databases (RDBs).
Graph database
A graph database is a database that specializes in handling data that consists of graph structures, or "vertices" and "edges," as defined in mathematical graph theory. An example of such data is a "railroad route map." Not only do the data structures stored in graph databases differ significantly from RDBs, but the search processing they perform is also significantly different, so they often use languages other than SQL (such as Gremlin, Cypher, or SPARQL).
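As a small illustration of graph-style querying, the Python sketch below finds the shortest route between two stations on a hypothetical route map using breadth-first search, the kind of traversal that graph databases answer natively but that is awkward to express in SQL over an edge table.

```python
from collections import deque

# A graph-structured dataset like a railroad route map: stations are
# vertices, direct connections are edges. The station names are hypothetical.
routes = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def shortest_path(graph, start, goal):
    # Breadth-first search: answers "how do I get from start to goal?"
    # by exploring routes in order of increasing length.
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph[path[-1]]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = shortest_path(routes, "A", "E")
# path == ["A", "B", "D", "E"]
```

In a graph query language this would be a one-line traversal; in SQL it would require recursive self-joins of unbounded depth, which is exactly why graph databases use dedicated languages such as Gremlin or Cypher.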
Time Series Database
This is a database specialized for handling data that consists of pairs of a time (timestamp) and other values, such as "data continuously transmitted from sensors in IoT" or "bank transaction data." It is designed around the characteristics of such data: searches by time must be efficient, new data arrives constantly, and existing data is rarely rewritten.
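That access pattern can be illustrated with a short Python sketch using hypothetical sensor readings: data is appended in timestamp order and never rewritten, so a time-range search reduces to a binary search over a sorted list rather than a full scan.

```python
import bisect
from datetime import datetime

# Append-only time-series sketch: rows arrive in timestamp order (as sensor
# data usually does), so timestamps stay sorted and a time-range query is a
# binary search. The sensor readings below are hypothetical.
timestamps = []
values = []

def append(ts, value):
    # New data is always newer than existing data; nothing is rewritten.
    timestamps.append(ts)
    values.append(value)

def range_query(start, end):
    # Efficient time-range search: locate the slice [start, end) by bisection.
    lo = bisect.bisect_left(timestamps, start)
    hi = bisect.bisect_left(timestamps, end)
    return values[lo:hi]

append(datetime(2024, 1, 1, 0, 0), 21.5)
append(datetime(2024, 1, 1, 0, 10), 21.7)
append(datetime(2024, 1, 1, 0, 20), 22.0)

readings = range_query(datetime(2024, 1, 1, 0, 5), datetime(2024, 1, 1, 0, 25))
# readings == [21.7, 22.0]
```

Real time-series databases build on the same insight, adding time-based partitioning, compression, and automatic expiry of old data.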
Object-Oriented Database
In object-oriented languages such as Java, the data handled within a program is modeled in an object-oriented way. However, IT systems generally use relational databases (RDBs), whose approach to data modeling is different, so storing that data requires a conversion process. Object-oriented databases aim to eliminate this problem by being able to "store object-oriented data models as they are."
Vector Database
This type of database is gaining attention due to the generative AI boom, and is also used as a means of implementing RAG: it searches for "data with similar meaning" using vector operations. See the articles below for more details.
⇒ Vector database | Glossary
⇒ Retrieval Augmented Generation (RAG) | Glossary
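The core idea, finding "data with similar meaning" by vector operations, can be illustrated with a minimal cosine-similarity search in Python. The tiny vectors here are hypothetical stand-ins; a real system would produce embeddings with a model and index them in a vector database.

```python
import math

# Minimal vector-similarity search: each document is represented as an
# embedding vector, and "similar meaning" is measured by cosine similarity.
# The 3-dimensional vectors below are hypothetical toy embeddings.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

embeddings = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_tax":  [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]   # hypothetical embedding of a query about pets
best = max(embeddings, key=lambda k: cosine_similarity(embeddings[k], query))
# best is one of the pet documents, not "doc_tax"
```

Vector databases exist because this brute-force loop does not scale: with millions of high-dimensional vectors, approximate nearest-neighbor indexes are needed to answer the same question quickly.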
Data is inherently diverse
Once you are used to RDBs, you may start to think that RDB-shaped data is the norm and that the data NoSQL deals with is the exception. However, the data in the world is inherently diverse, and not all of it fits naturally into an RDB.
It's not that it doesn't exist, it's just that you haven't noticed it
I used a "railroad route map" as an example of graph-database-like data, but similar data structures exist all over the world. Many types of data, such as "road traffic networks," "communication networks," "relationships between adjacent parts inside a device," and "human relationships (who is friends with whom)," have structures that are ideally handled in a graph DB.
Traditionally, however, techniques have been used to force graph-structured data into RDBs. Because such data is simply assumed to be difficult to handle, people often do not realize that a graph DB, built on a different way of thinking, could solve the problem simply and naturally.
Furthermore, object-oriented languages such as Java are widely used in system development, and when they are used to the full, object-oriented modeling is applied and the data a running program holds is "properly object-oriented modeled data." Because an RDB thinks about data differently, time-consuming conversion processing is required to store that data, and even a carefully crafted class design can have its performance ruined by the conversion into an RDB.
This is a familiar practical pain point when developing systems in Java, but a "database that can store data in an object-oriented manner" might eliminate this kind of problem altogether.
Operational hassle is being eliminated in the cloud era
However, even when the usefulness of such non-RDB paradigms was recognized, actually using them meant taking on the burden of installing and operating a wide variety of databases. It was often easier and more efficient, in terms of IT infrastructure, to run only one type of database (say, PostgreSQL) and have people devise ways to use the RDB even for data better suited to other databases.
Now, however, the situation is changing. An environment is being established in which databases other than RDBs can be used as cloud services, eliminating the "operational hassle." For example, AWS (Amazon Web Services) offers a wide variety of databases, such as those listed below, which can be used without operating them yourself.
Relational database (RDB)
Of course, widely used RDBs are provided.
- Amazon Aurora
- Amazon RDS
- Amazon Redshift
Key-value type (in-memory database)
NoSQL is also available to meet the need for extremely fast processing.
- Amazon DynamoDB
- Amazon ElastiCache
- Amazon MemoryDB
Document-oriented
NoSQL is also available to meet the needs of those with "too diverse data."
- Amazon DocumentDB (MongoDB compatible)
Hadoop and column-oriented databases (suitable for processing large amounts of data)
NoSQL is also available to meet the needs of "too much data."
- Amazon Keyspaces (Apache Cassandra compatible)
- Amazon EMR (Hadoop)
Graph database
Graph databases that can handle data structures consisting of "vertices" and "edges" are also provided, and can be used without having to operate them yourself.
- Amazon Neptune
Time Series Database
A time series database that can be used for IoT and other purposes is also provided.
- Amazon Timestream
Vector Database
AWS also provides a vector database, which is gaining attention for its use in generative AI, particularly in RAG. It also provides an environment where vector search can be added to various existing databases and used together.
- Amazon OpenSearch Service
- Vector search functions have also been added to various existing databases:
  - Available in RDBs: the "pgvector" extension has been added to PostgreSQL on AWS (Amazon Aurora PostgreSQL / Amazon RDS for PostgreSQL)
  - Available for object storage: "Amazon S3 Vectors" allows you to use Amazon S3 as a vector store
  - Available in graph databases: the Amazon Neptune ML extension
  - Available in NoSQL (in-memory databases): the Amazon MemoryDB vector search feature
  - Available in NoSQL (document-oriented): the vector search functionality in Amazon DocumentDB (MongoDB compatible)
The above is the list at the time of writing, and more may be added in the future. Roughly speaking, every category above other than "relational" is a type of NoSQL.
Remaining issues in achieving "diverse data utilization" (solved with iPaaS)
As we have seen, a wide variety of databases exist, and they can now be used without the burden of operating them yourself.
To take advantage of them, it is of course necessary to "know that databases other than RDBs exist" and "be willing to use them," but a foundation that can effectively link and combine a wide variety of databases and data is also needed.
When trying to link JSON data stored in a document-oriented database to an RDB, it is natural that data conversion processing will be required. When linking databases with different concepts, it may be necessary for a person to understand the meaning of the data in each and consider the conversion process.
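As a small illustration of such conversion processing, the Python sketch below flattens variable-shaped JSON documents (hypothetical "order" data) into fixed relational-style columns, filling absent fields with None. The field names are invented for the example.

```python
import json

# Sketch of the conversion needed when linking JSON documents to a
# relational table: nested, variable-shaped documents are flattened into
# fixed columns, with fields missing from a document mapped to None
# (i.e., SQL NULL). Documents and field names are hypothetical.
orders = [
    json.loads('{"id": 1, "customer": {"name": "Alice"}, "total": 120}'),
    json.loads('{"id": 2, "customer": {"name": "Bob", "rank": "gold"}}'),
]

def to_row(doc):
    # A person still has to decide this mapping: which nested fields
    # become which columns, and what a missing field should mean.
    customer = doc.get("customer", {})
    return {
        "id": doc.get("id"),
        "customer_name": customer.get("name"),
        "customer_rank": customer.get("rank"),   # absent in some documents
        "total": doc.get("total"),
    }

rows = [to_row(d) for d in orders]
# rows[0] == {"id": 1, "customer_name": "Alice",
#             "customer_rank": None, "total": 120}
```

Even this toy mapping shows why such linkage needs human judgment: the conversion encodes decisions about meaning, not just format.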
This is where "connecting" technologies such as EAI, ETL, and iPaaS come in handy. Products such as DataSpider and HULFT Square connect various clouds, systems, and data using no-code methods.
Can be used with GUI only
Unlike regular programming, there is no need to write code. By placing and configuring icons on the GUI, you can achieve integrated processing with a wide variety of systems and data.
Being able to develop using a GUI is also an advantage
No-code development using only a GUI may seem like a compromise compared to full-scale programming. However, development using only a GUI allows on-site business personnel to take the initiative in their own work. The same goes for deciding what should be linked with what: the people who know the business best are those on the front lines.
Full-scale processing can be implemented
There are many products that claim to be "developable using only a GUI," but problems like "it's easy to create, but you can only do simple things," "when I tried to execute a full-scale process, it couldn't process and crashed," or "it didn't have the high reliability or stable operation capacity to support business operations, which caused problems" often occur.
Even if a service appears to be easy to use and modern at first glance, if it cannot adequately process even slightly large amounts of data, it is no match for big data.
"DataSpider" and "HULFT Square" are easy to use, but they also allow you to create processes at the same level as full-scale programming. They have the same high processing power as full-scale programming, as they are internally converted to Java and executed, and they have a proven track record of supporting corporate IT for many years.
What is necessary for a "data infrastructure" to successfully utilize data?
To fully support actual business operations, high processing power is required to process large amounts of data. At the same time, flexible and rapid trial and error led by the field is also essential.
Generally, when high performance and advanced processing are required, the tool tends to be difficult to program and use, while when ease of use in the field is required, the tool tends to be easy to use but has low processing power and can only perform simple processing.
No need to operate in-house as it is iPaaS
DataSpider can be operated securely on a system under your own management. With HULFT Square, a cloud service (iPaaS), this "connecting" technology itself can be used as a cloud service without the need for in-house operation, eliminating the hassle of in-house implementation and system operation.
Are you interested in "iPaaS" and "connecting" technologies?
The key to successfully implementing "cloud utilization," "in-house data utilization," and "business automation" is to establish a "data integration platform" that can be developed in-house by your company.
Try out our products that allow you to freely connect various data and systems, from on-premise IT systems to cloud services, and make successful use of IT.
The ultimate "connecting" tool: data integration software "DataSpider" and data integration platform "HULFT Square"
"DataSpider," a data integration tool developed and sold by our company, is a "connecting" tool with a long history of success as the foundation supporting various companies' business systems. "HULFT Square," a data integration platform, is a "connecting" cloud service developed using DataSpider technology.
We offer a free trial version and hold online seminars where you can try out the software for free, so we hope you will give it a try.
