Large Language Model (LLM)
This glossary explains various keywords that will help you understand the mindset necessary for data utilization and successful DX.
This time, let's take a look at "large-scale language models," which are expected to have a major impact on the use of data and AI in business in the future.
What is a Large Language Model (LLM)?
A large language model (LLM) is a natural language model trained on a very large amount of text data. The term usually refers to a neural network trained on a large corpus using deep learning technology.
Natural language processing traditionally required language models built specifically for each application. A large language model, by contrast, is versatile: by adapting the trained model to a given purpose (through fine-tuning, transfer learning, and so on), it can handle a wide variety of applied natural language processing tasks, such as question answering, text classification, document summarization, and sentiment analysis. It is also known as the technology underlying ChatGPT, which is currently a hot topic.
What is a Language Model?
Large language models are often referred to by the abbreviation LLM, which, as the name suggests, means a language model of very large scale. So what exactly is a "language model"? The idea can be hard to pin down, so let's build a rough picture using the examples below.
A very simple example of a "language model"
A "language model" models the characteristics of a target natural language. For example, one can be created by preparing English sentences as training data, counting how often each letter of the alphabet occurs, and compiling the results into a frequency table. In ordinary English text, "e" appears most frequently, letters such as "t," "a," "o," "n," and "i" also appear often, and letters such as "j," "k," "q," "x," and "z" appear rarely.
Such a table does summarize a property of English, but you might wonder what use it could possibly be. In fact, it has practical uses. The distribution of letters differs between languages (such as French and German), so the table can be used to automatically determine whether a piece of text data is English or not.
It can also break a classic type of cipher: the substitution cipher, which replaces each letter of the alphabet with another letter. To a human, a substitution cipher looks like nothing but a string of mysterious words, but if you know the original document was written in English, you can start deciphering it by guessing that the letter appearing most frequently in the ciphertext is probably "e."
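The letter-frequency idea above can be sketched in a few lines of Python. The sample sentence is made up for illustration; a real frequency table would be built from a large corpus.

```python
from collections import Counter

def letter_frequencies(text):
    """Return each letter's relative frequency in the text, most common first."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: n / total for ch, n in counts.most_common()}

# Illustrative sample only; as expected for English, "e" dominates.
sample = "The letter e is everywhere in everyday English sentences."
freqs = letter_frequencies(sample)
```

Comparing a table like `freqs` against known per-language distributions is exactly the language-identification and cipher-breaking trick described above.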
Word frequency model
There are more complex models than alphabetical ones, for example, counting the frequency of words in a document.
This, too, may seem to have no practical use at first glance. However, the bias in which words appear in a document can help determine who wrote it. For example, in research into who Shakespeare really was, one can compare the word-frequency distribution of Shakespeare's works with the writings of other candidate authors and check whether they were likely written by the same person.
With the advent of computers, more advanced processing became possible. Similar language models can be built from "bigrams," which tally how often one word is followed by another (the probability of two words appearing in succession), and "trigrams," which look at the probability of three words appearing in sequence.
With such a model, if a user typing a sentence in a word processor enters a word that has a very low probability of following the previous word, the software can flag it as a likely typo and display a warning to assist with input.
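A bigram model of this kind can be sketched as follows. The tiny training text is invented for illustration; a real model would be trained on far more data.

```python
from collections import defaultdict, Counter

def train_bigram_model(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for w1, w2 in zip(words, words[1:]):
        following[w1][w2] += 1
    return following

def most_likely_next(model, word):
    """Return the word most often seen after `word` in training, or None."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

model = train_bigram_model("the cat sat on the mat and the cat slept")
```

Here `most_likely_next(model, "the")` returns "cat", because "cat" followed "the" more often than "mat" did in the training text; an unlikely pair is exactly the signal the typo-warning feature above relies on.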
What kind of language model is the currently popular "large-scale language model"?
To reiterate, a "language model" is a model that models the characteristics of natural language in some form. It is created by preparing original text data and using it to perform some kind of analysis or learning. Models can be very simple, such as a list of alphabet occurrence rates or the probability of two words appearing consecutively, or they can be very complex models that use deep learning.
GPT(Generative Pre-trained Transformer)
The currently popular "ChatGPT" is also realized with a large language model: it was built on "GPT (Generative Pre-trained Transformer)," a large language model developed by OpenAI.
An example often used to explain GPT is the question, "What word comes after the phrase 'The highest mountain in Japan is'?" GPT is trained on a large amount of text data to predict exactly this. In this example, the next words are most likely "Mount Fuji." A continuation like "not Mount Takao" could appear but is unlikely, and "I want to eat pudding" is almost certain not to appear. GPT is a language model trained on this task, and it learns to judge which continuations are likely and which are not.
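The intuition can be sketched with a hand-made probability table. The numbers below are invented for illustration; they are not GPT's actual outputs.

```python
import random

# Hypothetical next-phrase probabilities after "The highest mountain in
# Japan is", illustrating the ranking a model like GPT learns.
next_word_probs = {
    "Mount Fuji": 0.92,
    "Mount Takao": 0.05,
    "pudding": 0.001,
}

def pick_next_word(probs):
    """Sample a continuation in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

prediction = max(next_word_probs, key=next_word_probs.get)
```

A real model produces such a distribution over its entire vocabulary at every step, then either picks the top candidate or samples from the distribution, as `pick_next_word` does.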
But don't you feel like you just can't accept it?
This "Mount Fuji explanation" is well known, so many people have heard it, but many may also find it hard to accept. Is that all it learns? Can ChatGPT really hold a human-like conversation with just that level of learning?
First, recall the example we saw earlier where something unexpected was achieved simply by counting the occurrence rates of alphabet letters (something that even elementary school students can do if they take the time). Even something that may seem like "just that" at first glance can often be used skillfully to achieve surprising results.
On the other hand, that feeling is also appropriate. If you just want to predict the next word to appear, you can train it using the "bigrams" and "trigrams" explained earlier, and in fact, such language models have been around for a long time. But has conversational AI like ChatGPT existed for a long time? No, it hasn't. In other words, there is "more to it" than the current boom in large-scale language models.
"Large scale" and "deep learning"
It is believed that large language models achieve such amazing things because they are "large-scale" and use "deep learning" (neural networks). Their large scale in particular has attracted attention, which is why they are specifically called "large language models (LLMs)" rather than simply "language models."
We know that the larger the scale, the greater the capabilities (we don't know why).
Large language models can sometimes demonstrate unexpected capabilities. After the ChatGPT service launched, users made a series of "new discoveries" ("so it can do this, too"): the model can do things it was not directly taught, a phenomenon known as emergence. For example, it can perform simple calculations without being taught arithmetic, or translate without being trained on translation.
We do not fully understand how large language models work internally or how they come to demonstrate these capabilities, and we cannot predict what they will become able to do next as training progresses. What we do know is that as the amount of training text increases, new capabilities that were not previously apparent suddenly emerge.
Because this "large scale" (the amount of data, the amount of computation, and the number of parameters in the trained model) is now known to be closely related to the emergence of new capabilities, there is currently a race to develop ever larger language models. Empirical laws have also been discovered (scaling laws, which follow a power law) showing that already-acquired capabilities, such as the accuracy of predicting the next word, improve predictably as scale increases.
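A scaling law of this form can be sketched as a simple power law in the number of parameters. The constants below are placeholders for illustration, not actual measured values from any published study.

```python
def loss_from_params(n_params, n_c=8.8e13, alpha=0.076):
    """Illustrative power-law scaling: predicted loss falls as the
    parameter count grows. n_c and alpha are placeholder constants."""
    return (n_c / n_params) ** alpha

# A larger model is predicted to reach a lower (better) loss.
loss_small = loss_from_params(1e8)   # 100-million-parameter model
loss_large = loss_from_params(1e11)  # 100-billion-parameter model
```

The practical point is the shape of the curve: loss improves smoothly and predictably with scale, which is what makes the investment race described above calculable.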
This characteristic overturned conventional wisdom. In conventional machine learning, there is an optimal model size (number of parameters) for the complexity of the task to be solved; making the model too large invites a phenomenon known as overfitting, which degrades performance on unseen data, so scaling up indiscriminately was considered bad practice.
However, simply preparing large amounts of text data, performing large amounts of calculations, and training larger language models has resulted in improved performance and even the emergence of new capabilities. This means that we are now in a situation where we can get a return on investment simply by pursuing the scale of language models. As a result, there is fierce competition worldwide to pursue the scale of language models.
The possibility that neural networks are acquiring capabilities autonomously
In 2012, when deep learning was just starting to become popular, a neural network trained on 10 million unlabeled images acquired the ability to recognize cats, despite no explicit human instruction about the concept of "cat," which caused quite a stir ("Google's cat"). Nothing like it had been seen in machine learning applications up to that point.
If a neural network can autonomously learn the concept of a cat simply from a large amount of image data, it may well acquire abilities autonomously from large amounts of text data too. If a neural network with a very large number of parameters (learning capacity) is given a huge amount of text and trained on the task of predicting the next word, it is not surprising that it could autonomously learn the various concepts and logic contained in that text.
In addition, there is still much we do not know about how humans acquire language, or why human language ability is so much higher than that of other animals. Not only are the capabilities of large language models poorly understood, but so is human language acquisition itself. We do not yet know whether these models acquire language the way humans do and will eventually reach human-level ability, or whether they represent a different form of intelligence altogether.
Google's large-scale language models are developed through "different learning"
PaLM and BERT, large language models developed by Google, were trained on tasks different from GPT's. GPT was trained to predict the next word (token), which means it processes a document in only one direction, from front to back.
Instead, the model is trained to solve "fill-in-the-blank problems" rather than "predict the next word," so it can learn from both directions, without depending on the order of the data. For example, given the sentence "Today, I ate fried rice and *," the model is trained to predict what * corresponds to (for example, "gyoza").
The training task is different from GPT's, yet the capabilities of the resulting large language model are similar: it behaves in a similarly versatile and highly capable manner. This suggests that what really matters is the ability acquired autonomously from text data through training.
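The fill-in-the-blank task can be sketched with a toy scorer that uses both the left and right context of the blank. The tiny corpus is invented for illustration; a real model learns from billions of words with a neural network rather than simple counting.

```python
from collections import Counter

# Tiny illustrative corpus standing in for massive training text.
corpus = [
    "today i ate fried rice and gyoza",
    "yesterday i ate fried rice and gyoza",
    "today i ate bread and soup",
]

def predict_masked(sentence_with_blank, corpus):
    """Fill the '*' using the words on both sides of the blank."""
    words = sentence_with_blank.split()
    i = words.index("*")
    left = words[i - 1] if i > 0 else None
    right = words[i + 1] if i + 1 < len(words) else None
    candidates = Counter()
    for line in corpus:
        toks = line.split()
        for j, tok in enumerate(toks):
            ok_left = left is None or (j > 0 and toks[j - 1] == left)
            ok_right = right is None or (j + 1 < len(toks) and toks[j + 1] == right)
            if ok_left and ok_right:
                candidates[tok] += 1
    return candidates.most_common(1)[0][0] if candidates else None

guess = predict_masked("today i ate fried rice and *", corpus)
```

Because the scorer can look at context on either side of the blank, it is not tied to front-to-back order, which is the key difference from next-word prediction described above.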
How to deal with large-scale language models
In any case, I hope you now have at least some sense of why these are called "large language models" and why they are such a hot topic. They did not turn out exactly as designed, and they do not behave exactly as expected, so no one knows for certain how they will develop. How, then, can we get involved with large language models going forward?
Developing our own large-scale language models
One way to get involved is to prepare a large amount of training data, train on it, and create your own engine. Recently, progress has also been made on multimodal models that can handle not just text but a combination of images and text.
Each application requires securing a large amount of high-quality training data (it is predicted that the world's supply of high-quality data will soon be exhausted). Training a model on that much data requires hardware capable of enormous amounts of computation, and there is currently a worldwide scramble for high-performance GPUs.
Adapting existing large-scale language models to your needs
This means using an existing large language model to create the language model you need. You prepare training data tailored to your needs and apply transfer learning or fine-tuning to the trained model (neural network) to adapt it to your purpose.
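Fine-tuning can be sketched conceptually: start from pretrained weights and adjust only a small, task-specific part while the base stays frozen. The tiny "model" below is purely illustrative (a dict of named scalar weights, not a real neural network).

```python
# Illustrative "pretrained model": named parameters with scalar weights.
pretrained = {"base.layer1": 0.5, "base.layer2": -0.3, "head.classifier": 0.0}

def fine_tune(model, gradients, lr=0.1, trainable_prefix="head."):
    """Apply one gradient step only to parameters whose name starts with
    trainable_prefix; all other (base) parameters remain frozen."""
    updated = dict(model)
    for name, grad in gradients.items():
        if name.startswith(trainable_prefix):
            updated[name] = updated[name] - lr * grad
    return updated

# Only the task-specific head moves; the pretrained base is untouched.
tuned = fine_tune(pretrained, {"head.classifier": 1.0, "base.layer1": 1.0})
```

The design point is that the expensive, general-purpose knowledge in the base is reused as-is, and only a comparatively cheap task-specific adjustment is trained.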
Access and harness the processing power of large language models
The inference capabilities of large language models can be used in practical applications by invoking them externally and obtaining the processing results. You can incorporate a model into an application, call it externally as needed to obtain results, or provide it as a shared service within your company.
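Invoking a hosted model typically means sending an HTTP request to the provider's API. The endpoint URL and payload shape below are hypothetical placeholders; consult your provider's API reference for the real URL, fields, and authentication.

```python
import json
import urllib.request

# Hypothetical endpoint for illustration only.
API_URL = "https://api.example.com/v1/chat"

def build_request(prompt, api_key):
    """Build (but do not send) an HTTP request asking an external LLM
    service to process a prompt and return the result."""
    payload = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        API_URL,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_request("Summarize this document.", api_key="YOUR_KEY")
# Sending would be: urllib.request.urlopen(req)  (requires a real endpoint)
```

Wrapping the call behind a small function like this is also how a model can be exposed as an internal shared service, as mentioned above.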
Incorporating and utilizing the conversational capabilities of large-scale language models
ChatGPT is a prime example, but large-scale language models have the potential to realize unprecedented conversational communication with humans. By embedding them in an application or integrating them externally, conversational UIs can be incorporated, providing users with an unprecedented user experience.
"iPaaS" that can be used in conjunction with external systems and data
Given the current fierce global competition, it will be rare to create a completely original large-scale language model (LLM) from scratch. In many situations, I think our future approach will be to somehow utilize existing large-scale language models.
If we were to perform additional training ourselves, we would need to prepare the data from somewhere, formatting it as needed. If we were to use a trained model for practical purposes, we would need to use an external application in conjunction with a large-scale language model.
In other words, the use of large-scale language models depends more on how they are combined with external data and systems than on the models themselves. It's also likely that much can be understood through trial and error while actually using them. An environment that allows for flexible and rapid collaboration with external parties is what's needed in the age of large-scale language models and generative AI.
In such a situation, one way to make the use of large-scale language models more efficient is to use a method that allows large-scale language models to be freely linked to various clouds, systems, and data without having to write and develop source code yourself. For example, this can be achieved by using "connecting" technologies such as "DataSpider" and "HULFT Square," also known as "EAI," "ETL," and "iPaaS."
Can be used with GUI only
Unlike regular programming, there is no need to write code. By placing and configuring icons on the GUI, you can achieve integrated processing with a wide variety of systems and data.
Being able to develop using a GUI is also an advantage
No-code development using only a GUI may seem like a simple compromise compared to full-scale programming, but if development can be done using only a GUI, it will become possible for on-site personnel to proactively utilize AI themselves.
The people who understand the business best are the people on the front lines. They can rapidly create the necessary things, such as how to utilize data and AI, which is an advantage over a situation where they have to explain things to engineers and ask them for help every time something needs to be done.
Full-scale processing can be implemented
There are many products that claim to allow development using only a GUI, but some people may have a negative impression of such products as being too simple.
It is true that problems like these tend to occur: "it's easy to build, but it can only do simple things," "when I tried to run a full-scale process, it couldn't cope and crashed," or "it lacked the reliability and stable operating capacity to support business operations, which caused problems."
"DataSpider" and "HULFT Square" are easy to use, but also allow you to create processes at the same level as full-scale programming. They have the same high processing power as full-scale programming, as they are internally converted to Java and executed, and have a long history of supporting corporate IT. They combine the benefits of "GUI only" with the proven track record and full-scale capabilities for professional use.
What is necessary for a "data infrastructure" to successfully utilize data?
Of course, the ability to connect to a wide variety of data sources is necessary, and high processing power to process large amounts of data is also required to fully support actual business operations. At the same time, flexible and rapid trial and error led by the field is also essential.
Generally speaking, tools with high performance and advanced processing capabilities tend to require programming and be difficult to use, while tools that are easy to use in the field tend to have low processing power and handle only simple processing.
It is also desirable for such a tool to offer advanced access to a wide variety of data sources: not only the latest IT such as the cloud, but also legacy IT such as mainframes and non-modern data sources such as Excel files used in the field.
There are many methods that meet just one of these conditions, but to successfully utilize data, all of them must be met. However, there are not many methods for achieving data integration that are both usable in the field and have the high performance and reliability of a professional tool.
As an iPaaS, there is no need to operate it in-house
DataSpider can be operated securely on a system under your own management. With HULFT Square, a cloud service (iPaaS), this "connecting" technology itself can be used as a cloud service without the need for in-house operation, eliminating the hassle of in-house implementation and system operation.
Related keywords (for further understanding)
- EAI
- A concept of "connecting" systems through data integration, and a means of freely connecting various data and systems. It has been used as a way to make effective use of IT since long before the cloud era.
- ETL
- In the recent push toward data utilization, the majority of the work is not the data analysis itself but the collection and preprocessing of data scattered across environments from on-premises to cloud. ETL is a means of carrying out such processing efficiently.
- iPaaS
- A cloud service that "connects" various clouds with external systems and data simply by operating on a GUI is called iPaaS.
- Generative AI
- Generative AI is a system that uses machine learning to learn the characteristics and structure of data from given training data, and can generate data with similar characteristics and structure based on instructions such as keywords.
If you are interested in our "Connecting" initiative,
Please try out our products, which solve IT system and business problems through the concept of "connecting."
The ultimate tool for connecting data: DataSpider, data integration software
"DataSpider," a data integration tool developed and sold by our company, is a "connecting" tool with a long history of success.
Unlike regular programming, development can be done using only the GUI (no code), without writing code. This means that it can be used by business personnel who have a good understanding of the business and can grasp the specific issues surrounding their company's silo structure.
There are many tools that allow simple integration, but this one is easy to use even for non-programmers, since development is done entirely through the GUI, while also offering high development productivity and full-scale performance suitable as a foundation for business (professional use). It can smoothly solve the problem of "connecting" disparate systems and data, which so often hinders the successful use of IT.
We offer a free trial version and hold online seminars where you can try out the software for free, so we hope you will give it a try.
