In-Context Learning (ICL)
This glossary explains keywords that will help you build the mindset needed for data utilization and successful DX.
This time, let's look at how learning works in machine learning, a technology currently attracting much attention in the use of IT.
What is In-Context Learning (ICL)?
In-context learning (ICL) is the ability of generative AI based on large-scale language models (LLMs), such as ChatGPT, to learn temporarily during use.
In the past, changing the behavior of a trained machine learning model required additional training such as fine-tuning. In contrast, with this technology you can change the model's behavior as if it had been retrained, simply by providing additional training data or requesting a change in behavior via prompts when using the model.
To utilize machine learning, you need a "trained model" (but how do you prepare one?)
"AI" is a hot topic in society these days, but in most cases "AI" actually refers to machine learning. Machine learning is a means of generating value from data, and its use is an important theme in any data utilization effort.
- Please also see here for more information on machine learning.
⇒Machine Learning | Glossary
I want to use machine learning, but preparing a trained model is difficult.
In order to utilize machine learning in business, it is necessary to prepare a "trained model" that can perform the tasks required for "your purpose (business)."
For example, if you want to automatically determine from image data whether the object in the image is an apple or an orange, you need a trained model that can do this.
Image data (input) ⇒ Trained model ⇒ "It's an apple" or "It's an orange" (output)
To create a trained model in-house, you need to prepare data and train the model on it. Just preparing the data can be a difficult task, and the training itself requires highly skilled engineers and a large amount of computation.
- Prepare "input data + teacher data" as training data (prepare image data and manually label them as "this is an orange" or "this is an apple")
- The data is used to train the model, resulting in a "trained model."
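The two steps above can be sketched with a toy classifier. This is a minimal pure-Python illustration of the pipeline (labeled training data in, trained model out), not a realistic image classifier; the nearest-centroid method and the made-up (redness, size) features are assumptions chosen for brevity.

```python
# Toy illustration of the supervised-learning pipeline described above:
# labeled examples ("input data + teacher data") are used to fit a model,
# and the resulting trained model maps new inputs to labels.

def train_nearest_centroid(samples):
    """samples: list of (features, label). Returns a trained 'model' (class centroids)."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(model, features):
    """Classify by the nearest class centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

# "Input data + teacher data": hypothetical (redness, size) measurements
training_data = [
    ([0.9, 0.8], "apple"), ([0.8, 0.9], "apple"),
    ([0.3, 0.4], "orange"), ([0.2, 0.5], "orange"),
]
model = train_nearest_centroid(training_data)   # the "trained model"
print(predict(model, [0.85, 0.75]))             # → apple
print(predict(model, [0.25, 0.45]))             # → orange
```

A real image classifier would of course use pixel data and a neural network, but the workflow (labeled data → training → trained model → prediction) is the same.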
Fine tuning often doesn't work
Therefore, instead of creating a trained model from scratch, some companies try to reuse "existing pre-trained models." This method involves providing additional data to train the model further, adjusting and modifying its capabilities to suit the company's needs (a process known as "fine-tuning"), thereby creating a model that is useful for the company's business.
- There is already a trained model that can distinguish between "apples" and "oranges" in images.
- Additionally, prepare "input data + teacher data" as training data (e.g., prepare images labeled "this is a green apple").
- Using this data, perform additional training (fine-tuning) to create a model with the added ability to distinguish green apples.
Fine-tuning sounds great in principle, but in this example the behavioral changes may not go as intended: the model may fail to acquire the ability to distinguish green apples despite the additional training, or it may learn about green apples but lose its ability to distinguish oranges.
In any case, the hurdles are high when it comes to securing specialized personnel, preparing data, and having the system "learn," and it is not an easy undertaking.
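For illustration only, the fine-tuning workflow described above (start from an existing trained model and adjust it with a small amount of additional labeled data) can be sketched with a toy prototype-based model. Real fine-tuning updates neural-network weights by gradient descent; this sketch, with made-up (redness, greenness) features and class prototypes, only mirrors the workflow.

```python
# Toy sketch of "fine-tuning": start from an existing trained model
# (class prototypes for apple/orange) and add a new class (green apple)
# from a small amount of additional labeled data. All numbers are
# hypothetical (features = redness, greenness).

pretrained = {                     # the "existing trained model"
    "apple":  [0.9, 0.2],
    "orange": [0.4, 0.3],
}

def fine_tune(model, new_samples):
    """Return a copy of the model extended with prototypes fit to new labeled data."""
    tuned = {label: list(proto) for label, proto in model.items()}
    sums, counts = {}, {}
    for features, label in new_samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    for label, acc in sums.items():
        tuned[label] = [v / counts[label] for v in acc]
    return tuned

def predict(model, features):
    """Classify by the nearest class prototype."""
    return min(model, key=lambda l: sum((a - b) ** 2 for a, b in zip(features, model[l])))

# A few extra labeled examples for the new "green apple" class
tuned = fine_tune(pretrained, [([0.2, 0.9], "green apple"),
                               ([0.3, 0.8], "green apple")])
print(predict(tuned, [0.25, 0.85]))  # the new class is now recognized
print(predict(tuned, [0.9, 0.2]))    # an existing class still works here
```

In this toy model the old classes are untouched by design; in real neural-network fine-tuning, the updated weights are shared across all abilities, which is exactly why existing capabilities can degrade in the way described above.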
The even more difficult situation of "generative AI (large-scale language models)"
Generative AI has been a hot topic recently, as exemplified by the huge boom in ChatGPT, and attempts are being made to use it in business.
In practice, we hear about generative AI being used for tasks such as document summarization and idea generation on general topics. However, beyond assisting with such tasks, it seems that most companies are not yet fully utilizing generative AI to directly support their own specific business activities.
To fully utilize generative AI in your business, the generative AI (large-scale language model) needs a thorough understanding of your company's specific circumstances. However, a model such as ChatGPT is trained only on "general data" such as data available on the Internet, which means it can only respond to questions that can be answered with general knowledge.
For example, if you want the generative AI to answer questions about your company's internal travel expense settlement rules (it needs to know the internal company rules), or to come up with ideas to improve a new product in the West Japan division that isn't getting very good reviews (it needs to know your company's business situation), then you'll need to somehow prepare a "generative AI that has knowledge about your company."
Building our own large-scale language models (LLMs) is too difficult
Generative AI is made possible by "large-scale language models." If your company decides to develop its own large-scale language model and uses data about your company during the learning process, you will be able to use something like ChatGPT, which provides answers that take your company's circumstances into consideration.
However, large-scale language models are created by preparing an enormous amount of data and performing an unimaginable amount of calculations. For most companies, it is unrealistic to undertake such a laborious and time-consuming effort on their own.
*However, this is the common understanding at the time of writing. The situation may change in the future, and we may enter an era in which it is common even for ordinary companies to develop their own large-scale language models.
Fine-tuning doesn't work for building large-scale language models (LLMs) for your company
If developing from scratch is difficult, it will be necessary to make good use of "existing large-scale language models." For example, one approach would be to develop a large-scale language model that incorporates knowledge about your company by adding additional training information about your company to a large-scale language model that already exists.
Using this method (fine-tuning) to develop a generative AI that can provide accurate answers about your company is a more realistic and feasible approach than building one from scratch.
However, when actually trying to do this, fine-tuning large-scale language models often fails to add the desired knowledge, leaving this approach as something that is "not ineffective, but somehow doesn't work as expected."
*However, this is also true at the time of writing. The situation may change dramatically in the future, and it may become commonplace to use large-scale language models that are fine-tuned specifically for your company.
In any case, training a large-scale language model is not an easy task.
Whether you create one from scratch or fine-tune an existing one, the hurdles are clearly higher than simply using a general-purpose large-scale language model such as ChatGPT (as a cloud service, etc.), and it is hard to say this is something any company can easily undertake.
Technology to change the "behavior" of existing large-scale language models
The "third option" that was discovered in such situations is In-Context Learning (ICL).
You can change the behavior by adding additional learning through prompts.
Normally, if you want to change the behavior of machine learning, you have to change the trained model itself. However, with large-scale language models, it is possible to change the behavior as if the model had been trained with additional data by devising prompts when using the model, without changing the trained model.
A representative example of in-context learning is "few-shot prompting," a well-known prompt engineering technique.
For example, if you ask ChatGPT what category of product HULFT is, at the time of writing it will answer EAI or EDI (it confidently gives a wrong answer, a state known as "hallucination").
What category of product is HULFT?
HULFT is a category of products that provide data integration and data transfer solutions. Specifically, it is positioned as middleware for efficient data collection, conversion, transfer, and synchronization between different systems. It is often used to realize different systems within an enterprise or data integration between companies, and is suitable for applications such as Data Integration (integration), EAI (Enterprise Application Integration), and EDI (Electronic Data Interchange).
This means the model lacks the knowledge needed to use it in the way we envision for our products and business. However, it is not realistic for our company to independently build a large-scale language model with knowledge of HULFT, and adding knowledge through fine-tuning often does not work well. So what should we do?
In normal machine learning applications, it is difficult to change the behavior of a trained model without modifying it, but with a large-scale language model, you can provide additional knowledge via prompts without additional training. Here is an example following the input above.
Below is a list of the software product names and product categories. Please remember them and answer the questions.
- HULFT : MFT
- DataSpider : EAI
- HULFT Square : iPaaS
Ok, I'll remember these software products and their categories.
- HULFT : MFT (Managed File Transfer)
- DataSpider : EAI (Enterprise Application Integration)
- HULFT Square : iPaaS (Integration Platform as a Service)
What category of product is HULFT?
HULFT is a product in the MFT (Managed File Transfer) category.
When additional knowledge was provided via the prompt, ChatGPT was able to answer correctly: "HULFT is an MFT."
Moreover, just from an example using the abbreviation "MFT," it correctly interpreted "HULFT : MFT (Managed File Transfer)," even though other terms are also abbreviated as MFT. This shows that the model is not simply parroting the patterns it was told, but has a more advanced ability.
This behavior may seem natural in human conversation, but keep in mind that this is machine learning. Even though no changes were made to the large-scale language model (trained model) itself, when additional training data was provided within the prompt, the response was changed based on that data.
In other words, even though no additional training was performed, the behavior was changed as if additional training had been performed on a large-scale language model.
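The few-shot exchange above can be reproduced programmatically. The sketch below assumes an OpenAI-style chat API in which each request carries the conversation as a list of role/content messages; the client call in the comment is hypothetical, and only the prompt construction is shown.

```python
# Supplying knowledge in-context: the model weights are never changed;
# the extra knowledge lives only in the prompt sent with each request.

FACTS = [
    ("HULFT", "MFT"),
    ("DataSpider", "EAI"),
    ("HULFT Square", "iPaaS"),
]

def build_messages(question, facts=FACTS):
    """Prepend the product/category facts to the user's question."""
    context = "\n".join(f"- {name} : {category}" for name, category in facts)
    return [
        {"role": "system",
         "content": "Below is a list of software product names and product "
                    "categories. Please remember them and answer questions.\n"
                    + context},
        {"role": "user", "content": question},
    ]

messages = build_messages("What category of product is HULFT?")
# These messages would then be sent with every request, e.g. (hypothetical client):
# response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(messages[0]["content"])
```

Everything the model "learned" here is visible in plain text in the request; nothing about the underlying trained model has changed.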
How in-context learning works
The large language model itself is not rewritten through prompts, yet it behaves as if it has been retrained. What is going on?
ChatGPT chats (converses), so it remembers the flow of the conversation. For example, if you say, "Nice to meet you, ChatGPT. My name is Watanabe," ChatGPT will respond, "Nice to meet you, Watanabe-san," and continue the conversation remembering your name. If you then say, "It was cloudless and hot today, wasn't it?", ChatGPT will remember that today's weather was sunny and that it was hot, and generate responses on the understanding that you are currently discussing the weather.
In other words, ChatGPT understands and remembers the "flow of conversation (context)" and generates responses in addition to the knowledge stored in the large-scale language model itself. This "ability to understand and retain the context of the conversation" allows it to change its behavior as if it had learned from additional data.
- Regular machine learning behavior changes:
It is necessary to make changes to the "trained model" itself.
- In-Context Learning (ICL):
Even if the "trained model (large-scale language model)" itself remains unchanged, its behavior can be changed as if it had undergone additional training, using only the "state held at runtime" (the dialogue context).
Technically, the behavior of the "Transformer," the deep learning architecture underlying large-scale language models, changes as if additional training had been performed, based on the state it retains at runtime.
Because of this mechanism, what you teach via in-context learning cannot be retained forever*. No changes are made to the large-scale language model itself; the information is retained only for the duration of that conversation (context), and once the conversation ends, it is forgotten. In this example, every time you start a new conversation you will need to teach it again via a prompt that "HULFT is MFT."
*This is the case at the time of writing. In the future, new technologies may emerge that allow what is learned through in-context learning to be permanently incorporated into the large-scale language model itself.
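The "forgetting" described above follows directly from where the taught knowledge lives: in the message history that the client resends on every turn, not in the model weights. Here is a minimal sketch, assuming a chat-style client that accumulates history per session (the `ChatSession` class and its behavior are illustrative assumptions, not a real library API):

```python
# Sketch of why in-context knowledge is not permanent: the "memory" is just
# the message history resent with every request. A new conversation starts
# with an empty history, so the taught facts are gone.

class ChatSession:
    """Holds the running conversation context; nothing here touches model weights."""
    def __init__(self):
        self.history = []          # the conversation context

    def send(self, user_message):
        self.history.append({"role": "user", "content": user_message})
        # A real client would now call the LLM with the FULL history, e.g.
        # (hypothetical): client.chat.completions.create(messages=self.history)
        # Here we only track what context the model would see.
        return list(self.history)

session = ChatSession()
session.send("HULFT is an MFT product. Please remember that.")
context = session.send("What category of product is HULFT?")
print(any("MFT" in m["content"] for m in context))   # the taught fact is in context

fresh = ChatSession()                                 # the conversation has ended
context = fresh.send("What category of product is HULFT?")
print(any("MFT" in m["content"] for m in context))   # the taught fact is gone
```

This is also why each new conversation in the HULFT example must begin by restating "HULFT is MFT": the fact was never written into the model, only into one session's context.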
Emergent capabilities in large-scale language models
Another interesting point is that in-context learning is not something anyone deliberately engineered into these models; it is a surprising, after-the-fact discovery.
A large-scale language model is simply a trained model created by feeding it a huge amount of text data, and the GPT that powers ChatGPT has only been trained to solve the prediction problem of "What word will appear next in this text?"
However, with large-scale language models, as the amount of data used in training increases and the number of model parameters increases (i.e., as the model becomes larger), a strange phenomenon occurs in which the model somehow "emergently acquires advanced abilities that have not been directly taught."
Examples include models acquiring the ability to translate despite never being explicitly trained on translation, or acquiring the ability to do arithmetic (albeit with mistakes) despite never being taught how.
Similarly, "In-Context Learning (ICL)" is one of the capabilities that emerged as large-scale language models grew in size; its existence was only discovered after the models were built and put to use. It is still unclear why new capabilities emerge, or what capabilities may emerge in the future.
Related keywords (for further understanding)
Machine learning related keywords
- Machine Learning
- AutoML
- Fine Tuning
- Transfer Learning
- In-Context Learning
- RAG(Retrieval-Augmented Generation)
- Vectorization
- Vector Database
Keywords related to Generative AI/ChatGPT
- Generative AI
- Large Language Model (LLM)
- ChatGPT
- Prompt Engineering
Keywords related to data integration and system integration
- EAI
- A concept of "connecting" systems through data integration; a means of freely linking various data and systems. It has been used since long before the cloud era as a way to make effective use of IT.
- ETL
- In recent efforts to actively utilize data, the majority of the work is not the data analysis itself, but the collection and preprocessing of data scattered across various places, from on-premises systems to the cloud. ETL tools support this collection and preprocessing work.
- MFT(Managed File Transfer)
- An integration platform that performs file-based transfer processing with the high level of "safety, security, and reliability" needed to support corporate activities. Beyond simply transferring files, it ensures that transfers are reliably completed and performed securely, and keeps proper transfer logs so that file transfers can be checked and managed.
- iPaaS
- A cloud service that "connects" various clouds, external systems, and data simply through GUI operations is called an iPaaS.
Are you interested in "iPaaS" and "connecting" technologies?
Try out our products that allow you to freely connect various data and systems, from on-premise IT systems to cloud services, and make successful use of IT.
The ultimate "connecting" tool: data integration software "DataSpider" and data integration platform "HULFT Square"
"DataSpider," a data integration tool developed and sold by our company, is a "connecting" tool with a long history of success. "HULFT Square," a data integration platform, is a "connecting" cloud service built on DataSpider technology.
Another feature is that development is done entirely through a GUI (no code), without writing code as in regular programming, so business staff who understand their company's operations well can take the initiative in using it.
Try out DataSpider/ HULFT Square 's "connecting" technology:
There are many simple integration tools on the market, but this tool can be used with just a GUI, is easy enough for non-programmers to use, and offers both high development productivity and the full-fledged performance needed to serve as a business foundation (professional use).
It can smoothly solve the problem of "connecting disparate systems and data" that hinders successful IT utilization. We offer free trial versions and regularly hold hands-on sessions where you can try it out for free, so we hope you will give it a try.
Why not try a PoC to see if "HULFT Square" can transform your business?
Why not try verifying how "connecting" can be utilized in your business, the feasibility of solving problems using data integration, and the benefits that can be obtained?
- We want to automate data integration with SaaS, but want to confirm its feasibility first.
- We want to move forward with data utilization, but have issues with system integration.
- We want to consider a data integration platform to achieve DX.