Link&Think: Strolling through informatics #12 – Synchronization among multiple agents

(versione italiana qua)

As described in the introductory post on distributed computing, a second significant problem is that of synchronization, where the various executors don't need to coordinate their work, which each one performs autonomously, but must avoid interfering with each other when using resources that everyone needs.

We'll discuss this through the example of an office where two employees, Rossi and Bianchi, must each check their own files without having to coordinate their management. For simplicity, let's assume their work consists only of checking, followed by either rejecting or approving the file under examination. File approval happens by sealing the decision with a special stamp made of two parts located in two different places. If the two employees don't implement some form of synchronization, they might find themselves in a situation where one has taken one part of the stamp, the other has taken the other part, and neither can approve their file. We'll exclude particular strategies that people might implement in cases like these, for example going to find the other person responsible and agreeing that one gives up their part to the other so they can stamp their file and then receive back the complete stamp to approve their own.

This is the simplest possible example of synchronization. It can be solved by having both agents execute the same algorithm, called Peterson's algorithm, which I won't report entirely here due to its formal complexity, referring to my book La rivoluzione informatica for its detailed explanation and providing an intuitive description below.

The algorithm assumes there are three "global variables" which, in the context of an office like the one we're considering, we can think of as three bulletin boards that all employees can read. Each bulletin board is identified by a name: Rossi_wants_to_stamp, Bianchi_wants_to_stamp, Now_it's_the_turn_of. At the beginning of the algorithm, FALSE is written on each of the first two bulletin boards, while what's written on the third is irrelevant.

The algorithm continuously repeats the sequence of actions described below.

The employee who needs to approve their file declares they want to stamp, that is, writes TRUE on the bulletin board that begins with their name, and writes the other employee's name on the bulletin board that expresses whose turn it is.

After this, they check if it's simultaneously true that the other employee wants to stamp and that it's their turn, and if so (let's call this eventuality a "red light") they wait for this situation to become false.

When this happens (i.e., there's no longer a red light) then (1) they go get both parts of the stamp, (2) approve the file, (3) release both parts of the stamp. These three actions are technically defined as the "critical section," because they're the ones that must be executed by only one automaton at a time, to avoid causing the deadlock situation described at the beginning.

After executing the critical section, the employee writes FALSE on the bulletin board that begins with their name and continues checking their files.

Having finished describing the algorithm, let's show that indeed the critical section can be executed by only one automaton at a time and that only one automaton at a time can be in the waiting state.

Let's assume Bianchi is executing their critical section and that Rossi is waiting. Then Rossi will be able to enter their critical section only when Bianchi exits theirs and writes FALSE on the Bianchi_wants_to_stamp bulletin board (thus making false the condition that was keeping Rossi waiting, or removing the red light). When this happens, Bianchi is checking files and Rossi enters the critical section: after a while Rossi will also exit their critical section and move on to checking files.

Let's now imagine that, while Bianchi continues checking files, Rossi declares they want to stamp. If Bianchi continues always checking files then Rossi doesn't have to wait (because FALSE is written on Bianchi_wants_to_stamp) and enters their critical section. If, during this phase when Rossi is in their critical section, Bianchi also declares they want to stamp, then they'll find the red light (because TRUE is written on Rossi_wants_to_stamp and moreover it's written on the turn bulletin board that it's Rossi's turn) and will remain waiting. We're therefore in the previously described situation, with roles reversed between Rossi and Bianchi.

Let's see instead what happens if, right after Rossi has declared they want to stamp, Bianchi also writes TRUE on the Bianchi_wants_to_stamp bulletin board. Then it can't happen that both go into the waiting state because, even if they try to write simultaneously on the bulletin board that expresses whose turn it is, they can only write one after the other and in the end either Bianchi or Rossi will be written on the bulletin board. In the first case, it's Rossi who waits and Bianchi enters the critical section; in the second case, the opposite happens. If instead the attempt to write on the turn bulletin board, after both have declared their intention to stamp, is not simultaneous but happens first (for example) by Rossi, then Rossi will enter the waiting state, but only momentarily because when Bianchi then comes to write on the turn bulletin board, they'll make false the condition that was blocking Rossi in the waiting state, allowing the latter to enter their critical section while they (Bianchi) wait.

In the next post we'll examine the third important problem of distributed computing, that of consensus.

[[The posts in this series are based on the Author's book (in Italian) La rivoluzione informatica: conoscenza, consapevolezza e potere nella società digitale, (= The Informatics Revolution: Knowledge, Awareness and Power in the Digital Society) to which readers are referred for further reading]].

--
The original version (in italian) has been published by "Osservatorio sullo Stato digitale" (= Observatory on Digital State) of IRPA - Istituto di Ricerche sulla Pubblica Amministrazione (= Research Institute on Public Administration) on 11 December 2024.

Link&Think

Pagine

sabato 14 dicembre 2024

Strolling through informatics #12 – Synchronization among multiple agents

Nessun commento:

Posta un commento