Semantic Information Pursuit
In 1948, Shannon published a famous paper that laid the foundations of information theory and led to a revolution in communication technologies. Critical to Shannon’s ideas was the notion that a signal can be represented in terms of “bits,” and that the information content of the signal can be measured by the minimum expected number of bits. However, while such a notion of information is well suited for tasks such as signal compression and reconstruction, it is not directly applicable to audio-visual scene interpretation tasks, because bits do not depend on the “semantic content” of the signal, such as words in a document or objects in an image. In this talk, I will present a new measure of semantic information content called “semantic entropy,” which is defined as the minimum expected number of semantic queries about the data whose answers are sufficient for solving a given task (e.g., classification). I will also present an information-theoretic framework called “information pursuit” for deciding which queries to ask and in which order. The framework requires a probabilistic generative model relating data and queries to the task. Experiments on handwritten digit classification show, for example, that the translated MNIST dataset is harder to classify than the original MNIST dataset.
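
To make the idea of choosing queries one at a time concrete, here is a minimal sketch in Python. It rests on assumptions that are not stated in the abstract: the selection rule is taken to be greedy (at each step, ask the query whose answer has the largest mutual information with the label given the answers so far), the stopping rule is "stop once only one label remains possible," and the generative model is a tiny hypothetical table of (probability, label, query answers) rows. The names `toy_model`, `information_pursuit`, and `expected_queries` are illustrative only.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def label_posterior(rows):
    """Posterior over labels given the rows still consistent with the history."""
    total = sum(p for p, _, _ in rows)
    post = defaultdict(float)
    for p, label, _ in rows:
        post[label] += p / total
    return post


def mutual_information(rows, q):
    """I(answer to query q; label | history), where `rows` encodes the history."""
    total = sum(p for p, _, _ in rows)
    h_label = entropy(label_posterior(rows))
    h_label_given_answer = 0.0
    for a in {answers[q] for _, _, answers in rows}:
        subset = [r for r in rows if r[2][q] == a]
        weight = sum(p for p, _, _ in subset) / total
        h_label_given_answer += weight * entropy(label_posterior(subset))
    return h_label - h_label_given_answer


def information_pursuit(model, true_answers):
    """Greedily ask queries until a single label remains (assumed stopping rule).

    `true_answers` must be an answer tuple that has nonzero probability
    under `model`; otherwise no row stays consistent with the history.
    """
    n_queries = len(true_answers)
    rows = [r for r in model if r[0] > 0]
    asked = []
    while len(label_posterior(rows)) > 1 and len(asked) < n_queries:
        remaining = [q for q in range(n_queries) if q not in asked]
        q = max(remaining, key=lambda q: mutual_information(rows, q))
        asked.append(q)
        # Condition on the observed answer by discarding inconsistent rows.
        rows = [r for r in rows if r[2][q] == true_answers[q]]
    return asked, dict(label_posterior(rows))


def expected_queries(model):
    """Expected number of queries the greedy strategy asks under the model."""
    return sum(p * len(information_pursuit(model, answers)[0])
               for p, _, answers in model)


# Hypothetical generative model: (probability, label, answers to 3 binary queries).
toy_model = [
    (0.25, "cat", (1, 0, 0)),
    (0.25, "cat", (1, 1, 0)),
    (0.25, "dog", (0, 1, 1)),
    (0.25, "bird", (0, 0, 1)),
]

if __name__ == "__main__":
    print(information_pursuit(toy_model, (1, 0, 0)))
    print(information_pursuit(toy_model, (0, 1, 1)))
    print(expected_queries(toy_model))
```

The expected query count of a greedy strategy like this one is only an upper bound on the minimum expected number of queries in the definition of semantic entropy; the sketch does not search over all possible query orderings.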