Earthquakes when the aurora appears, beer sales when diapers sell: the world is full of surprising "unexpected correlations"

This is Watanabe from the marketing department.
In this column I write casually about various topics related to data, IT, and so on.

A topic about how "unexpected associations" can be found in data

This time, I would like to talk about how data analysis can sometimes reveal "unexpected correlations."

Here is the news item that prompted this topic:
⇒Do large solar flares trigger earthquakes? - Kyoto University proposes electrostatic coupling model between the ionosphere and the Earth's crust | TECH+

It is the kind of story that makes you ask, "What is going on?": the "huge explosions on the surface of the sun" (solar flares) that produce auroras may be related to "earthquakes on Earth."

Many readers may find this hard to follow, so below I explain what it means (to the best of my understanding).

Why does the sun shine?

First, let me tell you something that may change your image of what the sun is. When you think of the sun, you may have a gentle, idyllic image of it, like "it's bright outside because of the sun" or "it's nice and warm," but in reality, incredible things are happening inside the sun that are beyond imagination.

The Sun is different from planets like Earth; it is a "giant mass of hydrogen gas." There is no ground (rock) like on Earth. When we hear the word "gas" in our everyday sense, we may imagine it as something light and easily carried away by just the wind, but the Sun is made up of "an enormous amount of gas," and the "massive weight of the gas" exerts tremendous pressure at the center of the Sun.

The sun is made of hydrogen gas and shines, but not through the kind of burning we see in daily life (a chemical reaction with oxygen). If the sun produced its current brightness through combustion, it would burn out in a few thousand years. That is shorter than the history of mankind, which makes no sense, and for a time it was a mystery how the sun shines.

At the center of the sun, the weight of the sun's own massive gas exerts unimaginable pressure on the hydrogen atoms, crushing them and causing a nuclear fusion reaction that turns four hydrogen nuclei into one helium nucleus. The energy released is what makes the sun glow. The sun shines and illuminates the Earth because of this extreme world, where the hydrogen atoms themselves scream under the strain and are transformed into another substance.

What are solar flares (and what causes them)?

The topic this time is about the connection between something called a "solar flare" and "earthquakes." I'm sure you know what an earthquake is without any explanation, but what is a "solar flare"?

A solar flare is an explosion that occurs on the surface of the sun.

As explained, the interior of the sun is an "awesome" world. Nuclear fusion is occurring in conditions that would crush even the atoms themselves, and because nuclear fusion is constant, the temperatures are incredibly high. Furthermore, "enormous electric currents" flow inside the sun, and these strong electric currents also generate "strong magnetic fields" inside the sun. With the intense heat, electric currents, and magnetic fields, tremendous energy swirls inside the sun.

There are dark dots on the surface of the sun called "sunspots." Because they are fairly easy to see from Earth, the existence of sunspots has been known since the Middle Ages. People have been troubled by the question of why such ugly things exist on the sun, which is supposed to be a perfect object in the heavens, and have continued to observe the sunspots as they move and increase and decrease in number on the sun's surface, hoping to read something from them.

We now know that sunspots form in places where strong magnetic field lines extend beyond the surface of the sun. Sunspots are observed to move from east to west over time, and this is because the sun rotates. The gas on the sun rotates, and the magnetic field lines have been observed to move along with it.

However, although the Sun rotates, it is not made of rock (solid material) like the Earth, so the whole star does not turn together as a single body (the way the Earth turns once every 24 hours).

The Sun's rotation speed differs near the equator and near the poles, and the rotation speed of the gas also differs depending on the depth, creating a complex situation. Furthermore, because the center is extremely hot, thermal convection occurs from the center of the Sun to the surface, causing the gas inside the Sun to behave in a complex manner. As a result, something extraordinary is happening: "high-energy magnetic field lines are being stretched and twisted in complex ways inside the Sun due to the behavior of the gas."

Magnetic field lines cannot continue to twist indefinitely, so when the "twisting of magnetic field lines" exceeds its limit, a destructive "reconnection of the magnetic field lines" occurs, which is like unwinding the twist. This releases a large amount of energy from the magnetic field lines, causing a "huge explosion on the surface of the Sun." This is called a "solar flare."

Solar flares hitting the Earth

The inside of the sun is an extremely intense environment, and a solar flare occurs when the extremely strong magnetic field lines can no longer withstand the twisting, so a flare releases a great deal of energy. Powerful electromagnetic waves such as X-rays are emitted, and "plasma" (matter so hot that it has gone beyond the gaseous state) and "charged particles" (electrically charged fragments of atoms torn apart in the intense conditions) are also ejected from the sun into space.

And these "dangerous high-energy things" can be blown toward the Earth.

Solar flares range from small to very large and are classified as "A," "B," "C," "M," or "X" according to the intensity of the X-rays emitted, with each class ten times stronger than the one before. They are classified because large flares can cause damage on Earth, and class X denotes a dangerous flare that "may cause damage."
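The tenfold steps between classes can be sketched in a few lines of Python. This is only an illustration: the flux thresholds below follow the commonly cited GOES convention (peak soft X-ray flux in W/m^2) and are an assumption added here, not something stated in the news item, and `flare_class` is a hypothetical helper name.

```python
# GOES-style flare classes by peak soft X-ray flux in W/m^2 (assumed thresholds,
# not taken from the column). Each class starts 10x higher than the previous one.
CLASS_LOWER_BOUNDS = [("X", 1e-4), ("M", 1e-5), ("C", 1e-6), ("B", 1e-7)]


def flare_class(peak_flux_w_m2: float) -> str:
    """Return the class letter for a peak X-ray flux (hypothetical helper)."""
    for letter, lower_bound in CLASS_LOWER_BOUNDS:
        if peak_flux_w_m2 >= lower_bound:
            return letter
    return "A"  # anything weaker than the B threshold


if __name__ == "__main__":
    for flux in (3e-8, 2e-6, 5e-5, 5e-4):
        print(f"{flux:.0e} W/m^2 -> class {flare_class(flux)}")
```

Because the scale is logarithmic, a flare that just reaches class X is a hundred times stronger in X-rays than one that just reaches class C.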

Specifically, solar flares can destroy artificial satellites. They can disrupt communications, cause geomagnetic storms that make it difficult to determine direction, and even affect the power grid, shutting down power plants and causing large-scale blackouts. For a modern society that relies on electricity and IT, solar flares are dangerous space disasters that can paralyze and destroy the social infrastructure that supports it.

Solar flares are harmful to living organisms, but the Earth's magnetic field prevents them from hitting the Earth's surface directly. If Earth were a planet without magnetic fields, there might not be any living organisms. However, not everything is prevented.

High-energy particles from the sun easily stream into the skies above the North and South Poles, where magnetic field lines emerge from the Earth. These high-energy particles can react with the Earth's atmosphere and emit light. This is the "aurora" that shines in the night sky. The aurora, which is now a popular tourist attraction, can be traced all the way back to "magnetic field lines twisted inside the sun."

To put it the way Chico would, "The aurora shines because the magnetic field lines are twisted by the sun!" With the middle of the explanation left out, though, it is hard to believe.

On rare occasions, auroras can be seen from Hokkaido or Tokyo (low-latitude auroras), but this occurs when an "ultra-powerful solar flare" occurs, sending high-energy particles into areas other than the Antarctic and Arctic. In other words, low-latitude auroras are a phenomenon that occurs when "something extraordinary happens on the sun."

Large-scale solar flares can cause damage, so the idea that "the beautiful aurora can be seen in Japan" actually refers to a space disaster-like situation that could cause major damage to social infrastructure. Solar flares are also harmful to living things. In the past, people in medieval Europe and ancient China often considered auroras to be "ominous," thinking they were "bad omens" such as "war," "disaster," "divine wrath," "epidemics," and "signs of major political events," but this perception may have been quite accurate from a scientific perspective.

The mystery: "Is there a connection between large-scale flares and earthquakes?"

The theme of this article is whether the occurrence of these "large-scale solar flares" somehow triggers earthquakes on Earth.

Intuitively, it is hard to imagine a connection, so you may wonder whether it can be true, but the timing of certain events has led people to suspect that there may be some kind of link. For example, recently, even just in Japan,

  • Early morning of January 1, 2024: A large X-class solar flare occurs
  • January 1, 2024, evening: Noto Peninsula earthquake with maximum seismic intensity of 7 occurs

The timing of these events seemed too close to be a coincidence, and

  • December 8, 2025, afternoon: X-class solar flare occurs
  • Late night on December 8, 2025: An earthquake occurs off the east coast of Aomori Prefecture, with a maximum seismic intensity of 6+.

Here too, a major earthquake occurred "some time after the large-scale flare occurred."

Earthquakes are physical phenomena that occur on the Earth's rocky surface, where something extremely heavy moves on a large scale. In contrast, solar flares are high-energy, but the material they release is light.

It is difficult to imagine how a solar flare, which is high in energy but not very massive, could cause such a massive phenomenon (earthquake). Moreover, it seems that there is a time lag before the effects are felt.

The paper's theory on how large solar flares trigger earthquakes

The paper considers and proposes a mechanism for why the two phenomena are related.

Earthquakes occur at faults and plate boundaries, where rocks rub against each other and break apart. These places are called "fracture zones," and the paper suggests that the connection with solar flares can be explained by considering fracture zones as giant natural capacitors (think of them as "something that stores electricity").

(The following is a bit complicated, so if you don't understand, feel free to skip it.)

  • A massive explosion on the sun's surface releases electrically charged particles
  • When they reach Earth, most of the particles are physically blocked by the Earth's magnetic field, but some react with the atmosphere and cause auroras and geomagnetic storms on Earth.
  • At that time, the particles carry strong electrical energy, causing what can be described as "electricity accumulating in the upper atmosphere."

Then, think of the Earth electrically as follows:

  • It is believed that electricity accumulates "high in the atmosphere" to produce auroras.
  • It is believed that electricity also accumulates "between the earth's surface and the upper atmosphere"
  • Electricity also accumulates in the "underground fracture zone" (a natural capacitor)

Seen that way, the span from underground up to the sky behaves like "three capacitors connected in series." In an electronic circuit like this, even if no matter or charge moves across the capacitors, if "electricity accumulates in the top capacitor due to a solar flare," that voltage will "affect the bottom capacitor underground."

  • Electricity accumulated in the upper atmosphere passes through the capacitor between the Earth's surface and the atmosphere, causing electrical effects in places underground where electricity tends to accumulate.
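To get a feel for the "three capacitors in series" picture, here is a minimal sketch of the textbook relation for ideal capacitors in series: the same charge sits on each capacitor, so a voltage applied across the whole chain is divided in inverse proportion to each capacitance. This is not the paper's actual model, and the capacitance and voltage values are arbitrary numbers chosen only for illustration; `series_voltages` is a hypothetical helper name.

```python
def series_voltages(total_voltage, capacitances):
    """Split a voltage across ideal capacitors connected in series.

    Each capacitor carries the same charge Q, so V_i = Q / C_i and the
    voltages divide in proportion to 1 / C_i.
    """
    inverse = [1.0 / c for c in capacitances]
    inverse_sum = sum(inverse)
    return [total_voltage * x / inverse_sum for x in inverse]


if __name__ == "__main__":
    # Arbitrary illustrative values (not from the paper), listed top to bottom:
    # upper atmosphere, atmosphere-to-ground, underground fracture zone.
    layers = [1.0, 1.0, 2.0]
    for name, v in zip(("upper", "middle", "underground"),
                       series_voltages(100.0, layers)):
        print(f"{name}: {v:.1f} V")
```

The point of the sketch is only that a change applied at the top of the chain shows up as a voltage on the bottom element too, without anything physically travelling from the sky into the ground.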

Based on this model, the paper calculates that if a powerful solar flare of around X class occurs, the electric charge that accumulates high above the Earth could exert a force on the fracture zone strong enough to cause crustal destruction. It therefore suggests that this could be the mechanism by which earthquakes are triggered.

However, this is merely a case of "proposing such a model," so it is necessary to verify whether this idea is correct in light of other phenomena and observational data.

"The aurora shines because the magnetic field lines are twisted by the sun!" is already a complicated enough chain to leave you wondering, and now we add "because there are three capacitors." It is as convoluted as the proverb "when the wind blows, the barrel maker profits," and yet "it might actually be that way." Reality is an amazing thing, and many things are connected to each other in unexpected ways.

A famous story about the surprising connection between diapers and beer

There are more cases in the world where unexpected things are related to each other than you might think.

The term "data mining," which refers to the process of discovering new facts by analyzing data, became widely known in the 1990s. This was an early example of the now-popular "data utilization efforts" that involve the large-scale use of data using computers.

At the time, one example that became widely known as an "amazing example" of a discovery made through data mining was the analysis of sales data collected in an American supermarket in 1992, which revealed the surprising fact that "many people buy diapers and beer at the same time."

This too leaves you asking, "Why? I don't get it." When the reason was looked into, it turned out that a husband sent on an errand by his wife to "buy diapers" will often "also buy a beer for himself" (this was in America at the time, however, and the same may not hold in Japan).

Once you know that these items often sell together, you can aim for impulse purchases by casually placing beer near the diapers, even without knowing why it works. If you also understand the mechanism, "husbands buy them when they run errands," you can aim to increase sales even further by placing other items that such shoppers are likely to pick up on impulse, not just beer.
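As a rough illustration of how "sold at the same time" can be measured, the following sketch computes three standard market-basket figures: support, confidence, and lift. The baskets are made-up sample data, not the 1992 supermarket data, and the function names are hypothetical.

```python
# Made-up sample baskets for illustration only.
BASKETS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "chips"},
    {"milk", "bread", "wipes"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in baskets if wanted <= basket) / len(baskets)


def confidence(a, b, baskets):
    """Of the baskets containing `a`, the fraction that also contain `b`."""
    return support({a, b}, baskets) / support({a}, baskets)


def lift(a, b, baskets):
    """How much more often `a` and `b` co-occur than if they were independent."""
    return support({a, b}, baskets) / (support({a}, baskets) * support({b}, baskets))


if __name__ == "__main__":
    print("support   :", support({"diapers", "beer"}, BASKETS))
    print("confidence:", confidence("diapers", "beer", BASKETS))
    print("lift      :", lift("diapers", "beer", BASKETS))
```

A lift above 1 means the two items appear together more often than chance alone would suggest. It says nothing about why, which is exactly where the "husband on an errand" explanation comes in.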

If I were to suggest placing beer near the diaper section, I'm sure I'd be asked, "What are you talking about?" But that is the tendency the data showed. Of course, this is not the only case; there are surely many other surprising relationships like it. The world is so complex!

However, it is difficult to "find unexpected correlations"

Some stories make you think, "Who would have guessed that A and B were related?", and they really do give you the feeling of "discovering a new fact from data." "Diapers and beer" and "solar flares and earthquakes" are perfect examples.

These kinds of initiatives seem great, and you might think, "Let's try using data in this way in our own company!" However, when you actually try it, you often find yourself faced with unexpected "data realities."

There is no data

It's a common occurrence when it comes to data utilization: management takes the lead in launching a company-wide initiative to utilize data, and plans to do various things, but when they try to get started, they realize that "there is no data in the company to begin with," and the plan quickly comes to a halt.

A while ago, the term "data scientist" was all the rage, but those who studied and mastered data analysis techniques and thought they were ready to become data scientists often found themselves faced with a difficult reality: there was no data to analyze.

The data is not in a usable state

This is close to the situation where there is no data. Strictly speaking, data does exist, but it is scattered around the company in Excel spreadsheets kept by each workplace, nobody has the full picture of what is where, and the formats and contents all differ.

In such cases, before analyzing the data, it can be necessary to understand where the data is located, and before using the data, to confirm "what kind of data is this and what its history is," and then to perform "pre-processing to prepare the data in a format that can be used for analysis," which can be very time-consuming.
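A small sketch of what such pre-processing can look like: the same kind of value arrives in different notations from different spreadsheets and has to be brought into one form before any analysis. The date formats, currency notations, and function names below are assumptions chosen for illustration, not a description of any particular company's data.

```python
from datetime import date, datetime

# Date notations assumed to appear in different spreadsheets (illustrative).
DATE_FORMATS = ["%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y"]


def parse_date(text: str) -> date:
    """Try each known notation in turn; fail loudly on anything unknown."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def parse_amount(text: str) -> int:
    """Strip thousands separators and currency marks, then convert to int."""
    cleaned = text.replace(",", "").replace("¥", "").replace("yen", "").strip()
    return int(cleaned)


if __name__ == "__main__":
    rows = [("2024/01/05", "¥1,200"), ("05.01.2024", "1200 yen")]
    for raw_date, raw_amount in rows:
        print(parse_date(raw_date), parse_amount(raw_amount))
```

Raising an error on an unknown format, rather than guessing, keeps silently misread values out of the analysis. In practice the list of notations tends to grow as more spreadsheets are discovered.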

People often introduce BI tools in the hope of improving their data analysis environment, but the process of preprocessing and inputting the necessary data into the BI tools is so difficult that they often give up and don't finish.

"Obvious things" that must be excluded in large quantities

Even once you have prepared (or rediscovered) the data and gone through the painstaking pre-processing, new discoveries like "beer and diapers" do not necessarily follow one after another. In reality, when you run such an analysis, "discoveries" like the following are output one after another.

  • People who buy rice balls often also buy green tea.
  • People who live near a train station are more likely to commute by train
  • Many people use umbrellas on rainy days.

Anyone can understand this without having to go to the trouble of analyzing data, but analysis tools output a lot of information like this.

In other words, the true greatness of "Diapers and Beer" may not be what it discovered, but rather that it excluded the commonplace things that appear in large numbers.
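One simple way to picture this "excluding the commonplace" step: count which pairs of items appear together often, then remove the pairs that people in the field already consider obvious. The baskets, the threshold, and the list of obvious pairs are all made up for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_count):
    """Count item pairs bought together and keep those seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def drop_obvious(pairs, obvious):
    """Remove pairs that are already common knowledge."""
    known = {tuple(sorted(p)) for p in obvious}
    return {pair: n for pair, n in pairs.items() if pair not in known}


if __name__ == "__main__":
    baskets = [
        {"rice ball", "green tea"},
        {"rice ball", "green tea", "beer"},
        {"diapers", "beer"},
        {"diapers", "beer", "wipes"},
        {"rice ball", "green tea"},
    ]
    found = frequent_pairs(baskets, min_count=2)
    print(drop_obvious(found, obvious=[("rice ball", "green tea")]))
```

The code is the easy part. The list of "obvious" pairs has to come from people who know the business, and with real data it can be very long.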

This, too, is something you only understand once you have tried it, so it is also a reminder that some things in this world can only be understood by those who have actually done them.

The problem of data being too different when fields are too different

"Large Flares and Earthquakes" allows us to consider another difficulty: it crosses the academic boundaries of space astronomy and geophysics.

Different fields involve different people. Not only is there little communication between fields that are far apart, but the terminology and ways of thinking about data can be completely different. It can be difficult to know which concepts correspond to which, whether data that looks similar can really be considered the same data, and how to convert data if there are differences.

This can happen even within a company when departments and their fields differ significantly. For example, it is common for the sales department, manufacturing department, and accounting department to define "this month's sales" differently, so that the actual figures differ. If you are not careful, you could analyze the data without noticing such differences and end up with strange results.
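The "this month's sales" problem can be reproduced with three orders. In the sketch below, the same records give three different monthly totals depending on which date is treated as the moment of sale. The records, the field names, and the idea that each department uses one particular date are assumptions made for illustration.

```python
from datetime import date

# Made-up order records; each has three dates that could define "the sale".
ORDERS = [
    {"amount": 100, "ordered": date(2024, 1, 30), "shipped": date(2024, 2, 2), "paid": date(2024, 2, 28)},
    {"amount": 250, "ordered": date(2024, 2, 10), "shipped": date(2024, 2, 12), "paid": date(2024, 3, 15)},
    {"amount": 400, "ordered": date(2024, 2, 27), "shipped": date(2024, 3, 1), "paid": date(2024, 3, 31)},
]


def monthly_total(orders, date_field, year, month):
    """Sum the amounts whose chosen date falls in the given month."""
    return sum(
        order["amount"]
        for order in orders
        if order[date_field].year == year and order[date_field].month == month
    )


if __name__ == "__main__":
    # Hypothetical mapping: sales counts orders, manufacturing counts
    # shipments, accounting counts payments received.
    for field in ("ordered", "shipped", "paid"):
        print(field, monthly_total(ORDERS, field, 2024, 2))
```

None of the three totals is wrong. They answer different questions, so the definition needs to be agreed on before the numbers are combined or compared.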

Often, neither the "relationships in the data" nor the "work of finding those relationships" can be predicted in advance

This time, I talked about how "real-world data has truly unexpected correlations." And I also talked about how, when we try to learn from such things and "find unexpected correlations," "even there, unexpected things happen and we have trouble." The world is really tough.

There are various approaches to data utilization. For example, if the goal is to "summarize this month's business situation numerically, graph it with a BI tool, and visualize it as a management dashboard," it is clear what data is needed and the tasks to be performed are relatively clear, so there are few unexpected issues.

Even so, there may be times when management says, "We need this data too, so please graph it and make it visible as soon as possible," and you have to rush to respond using a no-code product, but it doesn't feel like you have to do anything completely unexpected. However, it's also true that such data utilization rarely leads to surprising new findings.

Discovering surprising facts about data that your competitors would never know would naturally be a great advantage for your company. However, combining data that you hadn't anticipated in advance and finding unexpected relationships can be a difficult endeavor, and the effort of "discovering" itself can be difficult and prone to unexpected occurrences. What is needed to successfully utilize data when venturing into such a difficult field?

We need an environment that can flexibly "connect" a wide variety of data as needed, that lets people process data quickly, efficiently, and collaboratively without coding while also allowing full-scale data pre-processing to be built when required, and that has the processing power to handle large amounts of data quickly. We also need an environment that can "connect" data to various destinations as needed, from machine learning and generative AI to various analytical engines.

Our "DataSpider" and "HULFT Square" have been well received for many years as data integration tools that prove useful in exactly this kind of "reality of data."

We are sometimes asked, "Do you really need a dedicated tool just to connect data?", "Isn't it enough if you can just move data from right to left?" (there are plenty of "simple tools" like that out there), or "Do you really need the ability to customize processing or high processing performance?"

However, our products are equipped with the functions and capabilities not only to solve simple, predictable problems, but also to grapple fully with the gritty reality of data when it surfaces in actual projects. If you want to succeed at the kind of data analysis that leaves other companies exhausted (such as discovering the relationship between diapers and beer), we hope you will consider using our products.

The person who wrote the article

Affiliation: Marketing Department, Digital Marketing Division

Ryo Watanabe

・2017: Transferred from Appresso Co., Ltd.
After majoring in information engineering (artificial intelligence lab) at university, I struggled in the development department of a startup.
・Small and medium-sized enterprise management consultant (as of 2024)
・Image: I took over the "Fukusuke" name that was previously used by our company.
(Affiliations are as of the time of publication)
