Information theory
Information is the mathematical resolution of uncertainty, where "bits" measure the surprise of an outcome.
In information theory, "information" isn't about meaning or context; it’s about the reduction of uncertainty. Before you flip a coin, you are in a state of uncertainty. Once the coin lands, that uncertainty is resolved, and you have gained information. Specifically, a fair coin flip represents exactly one "bit" of information—the answer to a single yes/no question.
The more unlikely an event is, the more information it provides when it actually happens. This is called "surprisal." If you are told it’s raining in the Sahara Desert, you’ve gained a lot of information because the event was highly improbable. If you’re told it’s raining in a rainforest, you’ve gained very little. Information theory quantifies this "surprise" using logarithms, providing a bridge between raw probability and digital data.
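A minimal sketch of that logarithmic measure in Python; the probabilities for the two rain examples are invented purely for illustration:

```python
import math

def surprisal_bits(p: float) -> float:
    """Information gained, in bits, when an event of probability p actually occurs."""
    return -math.log2(p)

# A fair coin flip resolves exactly one bit of uncertainty.
print(surprisal_bits(0.5))    # 1.0

# Invented probabilities: rare desert rain vs. everyday rainforest rain.
print(surprisal_bits(0.001))  # ~10 bits -- highly surprising, lots of information
print(surprisal_bits(0.9))    # ~0.15 bits -- hardly any news at all
```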
Claude Shannon turned communication into a rigorous science, a contribution historians have called "even more profound" than the transistor.
While the discipline has roots in 1920s telegraphy work by Harry Nyquist and Ralph Hartley, it was established almost single-handedly by Claude Shannon in 1948. His landmark paper, "A Mathematical Theory of Communication," transformed the "art" of sending messages into a rigorous branch of mathematics and engineering. Historians have argued that while the transistor enabled the hardware of the digital age, Shannon provided the "soul," the logical framework for everything that followed.
Before Shannon, engineers assumed the only way to send more information was to boost power or reduce noise. Shannon proved that every communication channel has a maximum "capacity," set by its bandwidth and noise level. As long as you transmit below that limit, mathematical codes can drive the error rate as close to zero as you like, however weak the signal or loud the static.
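That capacity limit has a famous closed form for a bandwidth-limited channel with Gaussian noise, the Shannon–Hartley formula C = B · log2(1 + S/N). A minimal sketch in Python, with purely illustrative bandwidth and signal-to-noise figures:

```python
import math

def channel_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley capacity of a Gaussian-noise channel, in bits per second."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative numbers: a 1 MHz channel at 20 dB signal-to-noise ratio (SNR = 100).
print(channel_capacity_bps(1e6, 100))  # about 6.66 million bits per second

# Heavier noise lowers the ceiling but does not close the channel: at 0 dB (SNR = 1)
# the same channel still supports about 1 million reliable bits per second.
print(channel_capacity_bps(1e6, 1))    # 1.0 million bits per second
```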
Entropy measures the "lack of information," serving as the ultimate limit for data compression.
The core building block of this theory is "Information Entropy." Borrowing the term from thermodynamics, entropy describes the average amount of uncertainty in a source of data. If a message is highly predictable—like a book written using only the letter "E"—its entropy is near zero because you already know what’s coming. If the message is a sequence of random numbers, its entropy is at its maximum.
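A rough sketch of that contrast in Python, using a toy "book of E's" and a stream of random bytes as stand-ins for a perfectly predictable and a maximally uncertain source:

```python
import math
import random
from collections import Counter

def entropy_bits_per_symbol(data) -> float:
    """Average surprise of the empirical symbol distribution: H = sum p * log2(1/p)."""
    counts = Counter(data)
    total = len(data)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

book_of_e = "E" * 10_000                                            # perfectly predictable
random_bytes = bytes(random.randrange(256) for _ in range(10_000))  # maximally uncertain

print(entropy_bits_per_symbol(book_of_e))     # 0.0 bits per symbol
print(entropy_bits_per_symbol(random_bytes))  # close to 8 bits per symbol
```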
This concept dictates how small we can "zip" a file. Compression works by identifying and removing redundancy (the predictable, low-entropy parts) and keeping only the essential, unpredictable information. If you try to compress a file below its entropy limit, you inevitably lose data. This makes entropy the "speed limit" for how efficiently we can store and move knowledge.
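One way to see that limit in practice is to hand a real lossless compressor data with no redundancy at all; a rough sketch using Python's standard zlib and os modules:

```python
import os
import zlib

# Random bytes sit at the entropy ceiling (~8 bits per byte): there is no
# redundancy to remove, so even zlib's best effort comes out slightly larger.
random_data = os.urandom(100_000)
print(len(zlib.compress(random_data, 9)))          # a little over 100,000 bytes

# Repetitive text is almost all redundancy, so its true entropy is tiny and
# the compressor shrinks it to a small fraction of its original size.
repetitive = b"the rain in spain stays mainly in the plain " * 2_000
print(len(repetitive), len(zlib.compress(repetitive, 9)))
```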
Coding theory allows us to talk to deep-space probes and browse the web through noisy airwaves.
The practical branch of information theory is coding, which comes in two main types: source coding (compression) and channel coding (error correction). Source coding strips out the "extra" bits to save space, while channel coding deliberately adds carefully chosen extra bits back in. These redundant bits act like a safety net, letting a computer detect, and even repair, data that gets corrupted by solar flares or a bad Wi-Fi signal.
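A toy illustration of channel coding, using a simple three-fold repetition code rather than the far more efficient codes deployed in practice; the message and the simulated bit flip below are made up:

```python
def encode(bits):
    """Channel coding: repeat every data bit three times as a crude safety net."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(bits):
    """A majority vote over each group of three repeats corrects any single flip."""
    return [1 if sum(bits[i:i + 3]) >= 2 else 0 for i in range(0, len(bits), 3)]

message = [1, 0, 1, 1, 0]
sent = encode(message)

# Simulate a noisy channel: one transmitted bit gets corrupted in flight.
received = sent[:]
received[4] ^= 1

print(decode(received) == message)  # True -- the error was detected and repaired
```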
This is why we can see crisp photos from the Voyager probes, billions of miles away. The signals reaching Earth are incredibly weak and buried in cosmic noise, but because of Shannon’s theorems, error-correcting codes let us reconstruct the original images essentially error-free. It is the same technology that allows your mobile phone to work in a crowded room full of competing signals.
Information theory has evolved into a "universal language" for physics, biology, and the study of black holes.
Though it began in electrical engineering, information theory is now a cornerstone of modern science. In biology, it is used to understand how the genetic code functions and evolves. In linguistics, it measures the complexity of human languages. In the realm of artificial intelligence, it provides the metrics used to train neural networks and improve machine learning models.
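One concrete example of that role in machine learning is the cross-entropy loss, which scores a model’s predicted probabilities against the true label; a minimal sketch with invented predictions:

```python
import math

def cross_entropy_bits(true_class: int, predicted_probs) -> float:
    """Bits of surprise the model suffers when the true class is revealed."""
    return -math.log2(predicted_probs[true_class])

# A confident, correct prediction costs little; a confident, wrong one costs a lot.
print(cross_entropy_bits(0, [0.9, 0.05, 0.05]))  # ~0.15 bits
print(cross_entropy_bits(2, [0.9, 0.05, 0.05]))  # ~4.32 bits
```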
Perhaps most surprisingly, the theory has migrated into fundamental physics. Physicists now treat information as a physical property, similar to mass or energy. The study of "Black Hole Information" and quantum computing relies heavily on Shannon’s original math, suggesting that the universe itself might be best understood as a giant processor of information.
The entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.
Scratches on the readable surface of a CD-R. Music and data CDs are written with error-correcting codes, so they can still be read even when lightly scratched.