Saturday, 5 October 2024

Strolling through informatics #2 – Representing data

by Enrico Nardelli

(Italian version here)

In informatics we often speak, improperly, of information processing. From a formal standpoint, however, the term information denotes data whose acquisition by a receiver reduces the receiver's uncertainty about some phenomenon. A typical example is the percentage of votes obtained by a party in an election: it is always data, but it constitutes information only if the recipient is not already aware of it. A datum carries a single bit of information, that is, it has unit informational value, if it resolves a binary uncertainty in the receiver (that is, uncertainty about whether something is true or false). In this regard, I should mention that in informatics the term "bit" is a contraction of the English expression binary digit and indicates the representation of a binary value (for example, true or false) using the two digits 0 and 1.

Since the perspective of informatics is not that of a receiver whose uncertainty is reduced, but that of an automaton that processes data mechanically, it is appropriate always to use the term "data" instead of "information" in an informatics context and, if we want maximum rigor, to speak of "representations," since data must necessarily be encoded in some way before the automaton can process them.

A datum, although abstract, exists independently of any material representation that makes it concrete; the representation is something we choose. This is clearly seen with numbers, concepts that have no physical reality in themselves. The number "five" (that is, the number which in decimal encoding we write as "5"), for example, is the concept of how many objects there are in any set made of five objects, that is, the abstraction of what is common to all sets of five objects. The number "five" therefore exists in the world of abstractions, and I can choose to represent it concretely through bits (101), with the alphabet (five), or with other encodings. An alternative term for representation is in fact "encoding," and we can consider the two synonymous.
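As a small illustration of one abstract number having several concrete representations, here is a minimal Python sketch. The function name `representations_of` and the particular encodings chosen (decimal digits, bits, tally marks) are assumptions made for this example, not something prescribed by the text.

```python
def representations_of(n: int) -> dict[str, str]:
    """Return a few concrete encodings of the same abstract non-negative number.

    The set of encodings is an arbitrary, illustrative choice.
    """
    return {
        "decimal": str(n),         # base-ten digits, e.g. "5"
        "binary": format(n, "b"),  # bits, e.g. "101"
        "tally": "|" * n,          # one stroke per counted object
    }


five = representations_of(5)
print(five)

# Decoding each representation leads back to the same abstract number.
assert int(five["decimal"], 10) == 5
assert int(five["binary"], 2) == 5
assert len(five["tally"]) == 5
```

The three strings look nothing alike, yet each one decodes to the same number: the representation is a choice, the datum is not.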

Let us also observe that, while the meaning of numbers can be formalized without too much difficulty, for most words this objective is extremely elusive because it involves interpretation by the receiver. This is true within a single language, but even more so between different languages, an aspect that constitutes one of the major difficulties of translation. Consider, for example, that the Italian "casa" corresponds in English to both house and home. The string "casa" is a representation, that is, data expressed in a material form, which for human beings is a symbol of (that is, a sign that refers to) a "meaning" that is uniquely determined only within a certain linguistic community. For example, the string "camera" means "room" in Italian but "photographic camera" in English. When we speak of informatics in relation to computers we use the term "representations" and not "symbols" because, even though from the human point of view those representations are indeed symbols of something meaningful, for the automaton they have no meaning, nor does the processing performed on them.

Representations can be classified into two broad categories: analog and digital. The first was, until a few decades ago, the most widely used in human history. A typical example is the position of the hands on a dial, or the length of a shadow, to indicate the time of day. The second, in which the same time of day is represented through digits, now characterizes contemporary society, which for this very reason is called the "digital society." In analog representation there is a proportion, an analogy: the further the hand has moved from its initial position, or the longer the shadow is, the more time has passed. In digital representation there is a series of arbitrarily chosen signs, the digits, to which we assign a value. Let us observe, however, that the representation of quantities through digits has been used by humanity for millennia: the Babylonians used it for calculating astronomical orbits and the Egyptians for calculating land areas. Every population represented quantities with its own set of signs. The "digital" is therefore not a modern phenomenon.
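The contrast between the two kinds of representation can be sketched with the clock example. This is only an illustration under stated assumptions: the function names `hour_hand_angle` and `digital_time` are invented here, and the analog side is modeled as the angle of an hour hand on a 12-hour dial.

```python
def hour_hand_angle(hours: int, minutes: int) -> float:
    """Analog: angle of the hour hand in degrees, clockwise from 12.

    The angle grows in proportion to the time elapsed: 30 degrees per hour,
    half a degree per minute.
    """
    return (hours % 12) * 30.0 + minutes * 0.5


def digital_time(hours: int, minutes: int) -> str:
    """Digital: the same time written with conventional signs (digits)."""
    return f"{hours:02d}:{minutes:02d}"


print(hour_hand_angle(3, 0))   # 90.0
print(hour_hand_angle(6, 30))  # 195.0
print(digital_time(6, 30))     # 06:30
```

In the analog case a later time always means a larger angle; in the digital case the string "06:30" has no such proportion to the time it denotes, and means something only because we agree on the value of each digit.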

The mechanical and automatic processing of representations, too, can be carried out in analog or digital mode. The first mechanical arithmetic calculators performed additions and subtractions through movements of rods or wheels, that is, by analog manipulation of analog data. Modern computers, instead, digitally process digital representations, that is, representations consisting of digits. In this case the sum of two values, for example, is not the total length of two rods (each of which represents a value) placed end to end, but the result of an addition carried out on the digits that represent the two numbers. The ten digits of our representation system (called, precisely, the "decimal system") are replaced within computers by a binary system (that is, one that uses only the two values "zero" and "one") purely for technological convenience. For a modern computer based on electricity, having only two values to represent is a tremendous simplification, which leads to smaller and faster computing devices. Computers therefore adopt binary encoding for the representation of values.
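To make the idea of digital processing concrete, the following sketch adds two numbers by manipulating only their binary digits, the way one adds on paper with carries. It is a minimal illustration, assuming non-negative values written as strings of "0" and "1"; the function name `add_binary` is hypothetical, and real hardware does this with circuits rather than strings.

```python
def add_binary(a: str, b: str) -> str:
    """Add two non-negative numbers given as strings of binary digits."""
    # Pad the shorter operand with leading zeros so the digits line up.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    digits = []
    carry = 0
    # Work from the rightmost (least significant) digit to the leftmost.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        total = int(bit_a) + int(bit_b) + carry
        digits.append(str(total % 2))  # digit written in this position
        carry = total // 2             # digit carried to the next position
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


print(add_binary("101", "11"))  # 1000, that is, five plus three equals eight
```

The procedure never looks at the quantities themselves, only at the signs that represent them, which is exactly what distinguishes it from laying two rods end to end.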

Let us complete this section by observing that alphabetic characters can also be represented through binary encoding, by associating a binary representation with each letter in turn. This is what was done with the famous ASCII code (a widely adopted standard for the binary representation of alphabetic characters), which encodes the letter 'A' with the binary representation '01000001', corresponding to the decimal number '65', the letter 'B' with '01000010', corresponding to '66', and so on. Proceeding in a similar way, methods can be defined for constructing binary representations of images, sounds and videos. A more in-depth treatment of data representation using the binary system can be found in the educational guide of the "Programma il Futuro" project, available at https://programmailfuturo.it/come/cittadinanza-digitale/come-funzionano-i-computer/dati-e-sistema-binario
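The ASCII correspondence mentioned above can be checked with a few lines of Python. This is a sketch for illustration: the helper names `to_ascii_bits` and `from_ascii_bits` are assumptions, and each character is shown as eight binary digits, as in the text.

```python
def to_ascii_bits(text: str) -> list[str]:
    """Encode ASCII text as a list of 8-digit binary strings, one per character."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


def from_ascii_bits(bits: list[str]) -> str:
    """Decode a list of binary strings back into ASCII text."""
    return bytes(int(b, 2) for b in bits).decode("ascii")


print(ord("A"), to_ascii_bits("A"))  # 65 ['01000001']
print(ord("B"), to_ascii_bits("B"))  # 66 ['01000010']
print(from_ascii_bits(["01000001", "01000010"]))  # AB
```

Encoding and then decoding returns the original text, which shows that the binary strings are simply another representation of the same characters.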

[[The posts in this series are based on the author's book (in Italian) La rivoluzione informatica: conoscenza, consapevolezza e potere nella società digitale (= The Informatics Revolution: Knowledge, Awareness and Power in the Digital Society), to which readers are referred for further reading.]]

--
The original version (in Italian) was published by "Osservatorio sullo Stato digitale" (= Observatory on the Digital State) of IRPA - Istituto di Ricerche sulla Pubblica Amministrazione (= Research Institute on Public Administration) on 2 October 2024.
