Claude Shannon, often called the father of information theory, introduced the concept of information content: the amount of information a piece of data can convey.
For example, consider the letter "A", which in ASCII is represented as 01000001. By itself, this letter conveys limited information. Dictionary.com lists two potential meanings:
Used when referring to someone or something for the first time in a text or conversation.
"A man came out of the room."
Used to indicate membership of a class of people or things.
"He is a lawyer."
Now, let's add more data in the form of two additional letters: "C" and "T". This is akin to what companies do when they start merging datasets. Adding "C" and "T" leaves us with six possible combinations:
CAT = 01000011 01000001 01010100
CTA = 01000011 01010100 01000001
ACT = 01000001 01000011 01010100
ATC = 01000001 01010100 01000011
TAC = 01010100 01000001 01000011
TCA = 01010100 01000011 01000001
Among these, we have four nonsense combinations and two meaningful words, "CAT" and "ACT." The information content of the letter "A" has increased because we added context (even though it came with noise). Similarly, "C" and "T", meaningless in isolation, now carry information content. When an IT or Data Department says they need MORE data to make the system work, it means they are still looking for the missing high-information-content data, their "C" and "T."
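A minimal sketch of that separation of signal from noise, with a toy two-word vocabulary standing in for "meaningful" (a real system would use a dictionary or a model):

from itertools import permutations

letters = ("C", "A", "T")                 # the merged "dataset"
vocabulary = {"CAT", "ACT"}               # stand-in for known-meaningful words

candidates = {"".join(p) for p in permutations(letters)}   # all 6 orderings
signal = candidates & vocabulary          # {"ACT", "CAT"}
noise = candidates - vocabulary           # the 4 nonsense orderings

print("signal:", sorted(signal))
print("noise: ", sorted(noise))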
In addition, people often overlook Shannon's corollary: increasing data also increases uncertainty. The four nonsense words represent noise. The price we paid for unlocking the hidden information in the three letters was the introduction of this noise. But once the meaningful information is identified, the extraneous data can be discarded. We only need to save CAT = 01000011 01000001 01010100.
Big data (low information content) becomes small data (high information content). The noise, however, grows combinatorially as data is added, which ultimately limits how much data can be ingested and processed before the system can extract the desired information content.
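One way to see the scale of that limit: if we assume each added letter is distinct, the number of candidate orderings grows factorially while the set of real words stays small, so almost all of the added volume is noise.

import math

# Candidate orderings of n distinct letters: n! grows far faster than
# the handful of real words hidden among them.
for n in (3, 5, 8, 10, 12):
    print(f"{n:2d} letters -> {math.factorial(n):>12,} candidate orderings")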
This is why adding more data to an AI system does not always improve accuracy. In an optimized system, where the AI ingests only high-information-content data, each new piece of data should increase accuracy. The focus should be on adding data with high information content. If you have "A", you want to add "C" and "T", not "Z" and "F". And even when you add those high-information-content letters, you want to prune the noise you created. Adding low-information-content data makes the system more expensive and less efficient.
Before you start pruning your data to save on IT costs, we need to discuss complicated versus complex data.
Complicated data doesn’t change. In our example, the word "CAT" always refers to a feline.
Complex data, however, evolves. At Time = 1, "CAT" may refer to a pet, but at Time = 2, "ZAF" might take on that meaning, and "CAT" no longer applies.
AI and LLMs (large language models) excel at extracting hidden information content from complicated data, where relationships are stable. However, they struggle with complex data, where relationships change. Humans constantly change (we are complex data) in response to a constantly changing world (more complex data). Using only past data about human behavior will not capture the impacts of that change; it will not capture complex data. If you expand your time horizon and look only for longer-term trends (see the Copernicus Principle), you can treat complex data like complicated data. But this works only for long-term trends backed by correspondingly long-term data about past behavior.
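A toy sketch of the difference; the time-indexed dictionary and the token "ZAF" are illustrative only, echoing the example above:

# Complicated data: the mapping is stable, so one lookup is enough.
complicated = {"CAT": "feline"}

# Complex data: the mapping itself changes, so the lookup must be
# indexed by when the observation was made.
complex_by_time = {
    1: {"CAT": "pet"},
    2: {"ZAF": "pet"},   # at Time = 2, "CAT" no longer carries that meaning
}

def meaning_at(time, token):
    return complex_by_time.get(time, {}).get(token, "unknown")

print(complicated["CAT"])    # feline, at any time
print(meaning_at(1, "CAT"))  # pet
print(meaning_at(2, "CAT"))  # unknown
print(meaning_at(2, "ZAF"))  # pet

A model that only ever sees the Time = 1 snapshot will keep answering "pet" for "CAT" at Time = 2, which is exactly the failure mode described above.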
So my advice: focus on the problem set and your data, not the “AI.” Did the addition of AI improve the desired result? If so, what data did it use and what data can you dispense with? Does adding more of the useful data improve the desired result? If you have high information content data, you’ll be fine even with the most basic AI.
If you want to operate with micro-data and achieve results, you need data about the future. Instead of using the past to predict the future, you need to harness the complexity of human behavior to capture predicted beliefs about the future. But that is Belief Data, which most of you ignore.
-J