Generated Knowledge Prompting

This glossary explains keywords that will help you understand the mindset needed for data utilization and successful digital transformation (DX).
This time, we will introduce "Generated Knowledge Prompting," a technique for utilizing conversational AI using large-scale language models such as ChatGPT.

What is Generated Knowledge Prompting?

Generated Knowledge Prompting is a technique that uses knowledge generated by a large-scale language model such as ChatGPT to improve its own output. ChatGPT often produces more appropriate output when given suitable auxiliary information, and this technique has ChatGPT generate that auxiliary information itself.
For example, if a question asked on its own gets an incorrect answer, this technique has ChatGPT first output knowledge related to the question and then answer it, improving the result.

Wrong answers that look like a lack of knowledge (when the model actually knows)

ChatGPT has become much smarter recently, so blatantly incorrect answers are far less common, but in its early days hallucinations, where the model confidently gave clearly incorrect answers, were frequent.

An image of early hallucination (created by the author in a style reminiscent of the time):

Please tell me about Kenrokuen Garden.

Kenrokuen is a Japanese garden located in Kanazawa City, Ishikawa Prefecture, Japan, and was created by Oda Nobunaga, the lord of the Kaga domain during the Edo period. It is now open to the public as part of Nagoya Castle Park, and is visited by many tourists as one of Kanazawa's most famous tourist spots.
It is a popular spot for cherry blossom viewing in the spring, barbecues in the summer, mandarin orange picking in the fall, and snow-covered mountain climbing and ice hockey in the winter, making it a popular leisure spot for all four seasons among Kanazawa residents.

Although answers that are this obviously wrong have decreased significantly in recent years, this type of behavior still exists. It remains important to check that no false or illogical information has been mixed into the answers.

The example above is charming and even fun in a way, but there were also times when the model made blatant mistakes on questions with a clear correct answer. For example, like this:

Which prefecture is larger, Saitama or Kanagawa?

Saitama Prefecture.

Kanagawa Prefecture has the larger area, so this is a mistake, and this kind of thing happened often. Moreover, it is not a case of "it can't be helped, the model simply doesn't know": when asked individually, it actually knows the answers.

Please tell me about the area of Saitama Prefecture.

Saitama Prefecture has an area of approximately 3,797 square kilometers.

Please tell me the area of Kanagawa Prefecture.

Kanagawa Prefecture has an area of approximately 4,165 square kilometers.

In other words, the model is not necessarily wrong because it doesn't know the answer in the first place. In fact, after having it give the answers above and then asking again, it would sometimes answer correctly.

Based on the above, which prefecture is larger, Saitama or Kanagawa?

Kanagawa Prefecture.

Generated Knowledge Prompting (the concept)

Why does ChatGPT get wrong something it clearly "knows"? This behavior may seem unreasonable, but a large-scale language model like ChatGPT is simply predicting "which word is likely to come next" rather than reasoning logically, so this kind of strange behavior is bound to occur.

Generated Knowledge Prompting therefore tries to "have ChatGPT answer based on what it already knows" by separating two phases: first "having ChatGPT generate knowledge" (drawing out the knowledge it considers necessary for the answer), and then "having ChatGPT output the desired answer based on that knowledge." In the example above,

  • Input: Which is larger, Saitama Prefecture or Kanagawa Prefecture?
  • Output: an incorrect answer

This means the model makes mistakes because it fails to use knowledge it actually has, but you can counter this with steps such as the following.

  • Generate knowledge
    • Before answering the question, first generate the "facts and considerations you want the AI to refer to"
    • For example:
      Please tell me the area of Saitama Prefecture → Answered correctly
      Please tell me the area of Kanagawa Prefecture → Answered correctly
  • Get the answer you want
    • Add the generated knowledge to the prompt, or continue the conversation that generated it (within that conversational context), to produce the desired answer
    • For example:
      Based on the above, which prefecture is larger, Saitama or Kanagawa? → Answered correctly
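The two-phase flow above can be sketched in a few lines of Python. Here `ask_llm` is a hypothetical stand-in for whatever chat-completion API you use; it is stubbed with canned answers so the structure of the technique itself is visible.

```python
# Stubbed "LLM": canned answers for the knowledge-generation phase.
# In practice, ask_llm would be a real API call.
CANNED = {
    "Please tell me the area of Saitama Prefecture.":
        "Saitama Prefecture has an area of approximately 3,797 square kilometers.",
    "Please tell me the area of Kanagawa Prefecture.":
        "Kanagawa Prefecture has an area of approximately 4,165 square kilometers.",
}

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real chat-completion call."""
    return CANNED.get(prompt, "(model output)")

def answer_with_generated_knowledge(knowledge_questions, final_question):
    # Phase 1: have the model generate the auxiliary knowledge.
    knowledge = [ask_llm(q) for q in knowledge_questions]
    # Phase 2: prepend that knowledge to the question we actually care about.
    prompt = "\n".join(knowledge) + "\n\nBased on the above, " + final_question
    return prompt  # in practice: return ask_llm(prompt)

prompt = answer_with_generated_knowledge(
    ["Please tell me the area of Saitama Prefecture.",
     "Please tell me the area of Kanagawa Prefecture."],
    "which prefecture is larger, Saitama or Kanagawa?",
)
print(prompt)
```

The point is only the separation of phases: the final prompt carries the model's own generated knowledge, so the answer can be grounded in facts it already "knows."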

Questions that use knowledge indirectly tend to go wrong, but if you ask directly for the necessary knowledge like this, the model will often answer correctly. Also, by having it answer in stages, it becomes easier to see what basis it is using to respond, so when you don't get the answer you expected, it is easier to see what went wrong.

Generated Knowledge Prompting (paper version)

Before we consider how it can be used on a daily basis, let us first introduce the steps outlined in the original paper (please feel free to skip this section if you find it too difficult).

Generated Knowledge Prompting for Commonsense Reasoning (Jiacheng Liu et al.)

  • Knowledge generation
    • Using few-shot prompting, a large-scale language model generates a large number of "targeted types of knowledge" related to the question.
  • Answer generation
    • The original question is asked repeatedly with prompts that each include one of the generated knowledge statements (or none), and the answer obtained most often is taken as the correct answer.

In the original paper, this technique is proposed as a way to increase the rate of correct answers on questions that have a correct answer involving "numerical knowledge," "social common sense," or "scientific common sense," compared with simply asking the question directly.

In the "knowledge generation" phase, the original question is used to generate multiple "targeted types of knowledge." Rather than eliciting knowledge manually, the method uses few-shot prompting: you enter the question you want to ask in the "{question}" field below to generate knowledge. The original paper gives the following example of eliciting "numerical knowledge":

Generate some numerical facts about objects. Examples:

Input: penguins have <mask> wings.
Knowledge: Birds have two wings. Penguin is a kind of bird.

Input: a parallelogram has <mask> sides.
Knowledge: A rectangular is a parallelogram. A square is a parallelogram.

Input: there are <mask> feet in a yard.
Knowledge: A yard is three feet.

Input: water can exist in <mask> states.
Knowledge: There states for matter are solid, liquid, and gas.

Input: a typical human being has <mask> limbs.
Knowledge: Human has two arms and two legs.

Input: {question}
(The lines above are demonstrations; the actual source phrase for knowledge generation is entered here.)
Knowledge:

This can be thought of as few-shot prompting that guides the AI to expand on the given question in the direction you want, generating knowledge. Generative AIs such as ChatGPT operate probabilistically, so if you give them the same prompt repeatedly, they will produce different completions (generate different knowledge) each time. In other words, simply by running the prompt multiple times, you can generate a large amount of "knowledge."

Now the next question is how to use the "knowledge" generated in this way.

Although it is a pain to do manually, we create many prompts, with and without each of the many pieces of knowledge generated from the original question, and have the model answer them repeatedly. The answer returned most often is taken as the final answer. For example, if you ask which prefecture is larger, Saitama or Kanagawa, and "Kanagawa" comes back 42 times out of 50, then Kanagawa is taken as the answer.
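The "answer returned most often" step is just a majority vote, which can be sketched as follows. The 42-out-of-50 split here is simulated to mirror the example above; in a real run, each answer would come from one LLM call whose prompt includes a different generated knowledge statement (or none).

```python
import random
from collections import Counter

def vote_on_answers(answers):
    """Return the answer that appears most often (majority vote)."""
    return Counter(answers).most_common(1)[0][0]

# Simulated answers: "Kanagawa" 42 times out of 50, as in the example.
answers = ["Kanagawa"] * 42 + ["Saitama"] * 8
random.shuffle(answers)
print(vote_on_answers(answers))  # → Kanagawa
```

Because hallucinations are somewhat random while the knowledge-grounded answer is stable, aggregating many runs tends to wash the wrong answers out.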

Another example from the paper, this time generating knowledge involving "social common sense," is the following:

Generate some knowledge about the input. Examples:

Input: Greece is larger than mexico.
Knowledge: Greece is approximately 131,957 sq km, while Mexico is approximately 1,964,375 sq km, making Mexico 1,389% larger than Greece.

Input: Glasses always fog up.
Knowledge: Condensation occurs on eyeglass lenses when water vapor from your sweat, breath, and ambient humidity lands on a cold surface, cools, and then changes into tiny drops of liquid, forming a film that you see as fog. Your lenses will be relatively cool compared to your breath, especially when the outside air is cold.

Input: A fish is capable of thinking.
Knowledge: Fish are more intelligent than they appear. In many areas, such as memory, their cognitive powers match or exceed those of 'higher' vertebrates including non-human primates. Fish's long-term memories help them keep track of complex social relationships.

Input: A common effect of smoking lots of cigarettes in one's lifetime is a higher than normal chance of getting lung cancer.
Knowledge: Those who consistently averaged less than one cigarette per day over their lifetime had nine times the risk of dying from lung cancer than never smokers. Among people who smoked between one and 10 cigarettes per day, the risk of dying from lung cancer was nearly 12 times higher than that of never smokers.

Input: A rock is the same size as a pebble.
Knowledge: A pebble is a clast of rock with a particle size of 4 to 64 millimetres based on the Udden-Wentworth scale of sedimentology. Pebbles are generally considered larger than granules (2 to 4 millimetres diameter) and smaller than cobbles (64 to 256 millimetres diameter).

Input: {question}
Knowledge:

Generated Knowledge Prompting (how we actually use it)

How can this way of thinking be useful when using ChatGPT and other tools on a daily basis?

Simply knowing that there is a technique that first "generates relevant knowledge" and then "asks the question" should be useful. And when you feel you can't get a good answer, remembering that the model may not be failing because it "doesn't know" but because it "knows the answer but hasn't been able to draw on it" gives you more measures to try when things don't go well.

  • Techniques for creating prompts include:
    • A technique to first generate knowledge and then ask what you want to know
  • When you don't get the results you expected, here are some things to consider:
    • In addition to cases where the model "cannot answer because it doesn't know in the first place, or the question is beyond its ability," there are also cases where it "has the knowledge but fails to use it effectively to answer the question."

The original paper uses a fairly elaborate procedure, but as an everyday prompting technique it can be seen as generating the necessary knowledge in advance, by asking about relevant facts, figures, and points to consider before asking the question you actually want answered.

This will likely help us come up with more appropriate questions, as we will be able to ask related questions to deepen our understanding and then create questions that will get us the answers we really want to know.

Additionally, you could develop a "knowledge generation template" that uses few-shot prompting, as in the original paper, tailored to your field and application. Each time you ask a question, knowledge is generated in a targeted direction, and based on that, an excellent answer is generated. This "prompt template" could be an asset for your company in utilizing generative AI.

In addition to generating facts and related information, such as the area of Kanagawa Prefecture, you could also have the model consider in advance what it should take into account when answering. For example, if you are asking it to come up with a catchphrase, you could first have it generate knowledge with questions such as "What qualities make a good catchphrase?", and then have it propose catchphrase ideas based on that.

Similarities with RAG (Retrieval-Augmented Generation)

This technique also resembles the recently popular "RAG." While RAG retrieves knowledge from outside and combines it with the original question, this technique draws the necessary knowledge "from within the large-scale language model" and combines it with the original question. The idea of producing an answer by adding knowledge to the prompt is the same.

It shares an advantage with RAG: it is relatively clear which knowledge the model was, and was not, given when producing its answer. With this prompting technique, if you separate the large-scale language model used for knowledge generation from the one that answers the final question, you end up with a setup very similar to RAG.

For example, suppose you are asking the model to come up with a catchphrase. You can first have it set out its prior knowledge and the qualities a good catchphrase should have, and save those as notes. Then you end the conversation (erasing what ChatGPT remembers) and start a new one. Selecting only the items you want to use from the notes, you ask:

Please refer to the following information and the characteristics that a good catchphrase should meet to generate a catchphrase.

If you have the model work this way, it produces output using only the selected items. You can also experiment to see how the output changes depending on what is and is not referenced.
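The note-selection step can be sketched as follows. The note texts and the `build_prompt` helper are purely illustrative assumptions, not output from any model; the point is that the new conversation sees only the notes you chose to carry over.

```python
# Knowledge notes saved from an earlier conversation (illustrative examples).
notes = {
    "short":   "A good catchphrase is short enough to remember at a glance.",
    "benefit": "A good catchphrase states a concrete benefit to the reader.",
    "rhythm":  "A good catchphrase has rhythm when read aloud.",
}

def build_prompt(selected_keys, task):
    # Keep only the notes chosen to carry into the fresh conversation.
    selected = [notes[k] for k in selected_keys]
    return ("Please refer to the following information and the characteristics "
            "that a good catchphrase should meet to generate a catchphrase.\n\n"
            + "\n".join(f"- {s}" for s in selected)
            + f"\n\nTask: {task}")

print(build_prompt(["short", "benefit"],
                   "a catchphrase for a data-utilization glossary"))
```

Swapping which keys you pass to `build_prompt` is exactly the "change what is referenced and see how the output changes" experiment described above.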

Furthermore, if the generated knowledge proves to be generally useful, it will become an asset for your company and can be utilized in future use of ChatGPT.
