Cloud Notes of Technical Issues in Distributed System

  1. Time Synchronization
  2. Coordination and agreement
  3. Transactions and concurrency control

Time synchronization

Accurate timing is important in a distributed system, because timestamps are used to decide the order of events.

Computers each have their own physical clocks

Because of hardware differences between servers, their clocks drift apart over time, so the physical clocks of different servers differ to some extent. As a direct result, event A may actually occur after event B and still carry a smaller timestamp than B. If state synchronisation is involved, B's data will then overwrite A's data, which we do not want.

  • Physical clocks are electronic devices that count the oscillations occurring in a crystal at a definite frequency.

  • Operating System reads the hardware clock value.

  • Not perfect

    • Clock skew: the instantaneous difference between the readings of any two clocks
    • Clock drift: different crystal-based clocks count time at different rates
      • Temperature matters
      • Drift rate: The change in the offset between the clock and a nominal perfect reference clock per unit of time

External synchronization

Synchronize a group of clocks with an authoritative external source of time.
For example, UTC: Coordinated Universal Time
Network Time Protocol (NTP)

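Process time: t + T(round)/2, where t is the time value returned by the time server and T(round) is the round-trip time the process measured for its request.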
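The estimate t + T(round)/2 can be sketched in a few lines. This is a minimal illustration, assuming the request and the reply take about the same time on the network; the function name and the millisecond values are made up for the example.

```python
def estimate_time_ms(server_time_ms, sent_ms, received_ms):
    """Return t + T(round)/2.

    server_time_ms: t, the time value carried in the server's reply
    sent_ms / received_ms: local clock readings when the request was sent
    and when the reply arrived (hypothetical parameter names)
    """
    t_round = received_ms - sent_ms       # measured round-trip time
    return server_time_ms + t_round / 2   # assume the reply took half of it


# Made-up numbers: the reply carries t = 100000 ms and the round trip took 20 ms.
estimate = estimate_time_ms(server_time_ms=100_000, sent_ms=10_000, received_ms=10_020)
print(estimate)
```

The shorter the round trip, the smaller the possible error of the estimate.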

Internal synchronization

Synchronize the clocks within a group of computers with each other. A coordinator computer is chosen to be the master and the other computers are slaves. The master periodically polls the slaves, and the slaves send back their clock values.

  • Berkeley Algorithm
  • Cristian’s Method
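The polling step of the Berkeley Algorithm can be sketched as follows. This is a simplified illustration: the master averages all clock values, its own included, and computes the adjustment each machine should apply. It leaves out the round-trip compensation and the rejection of faulty readings that a fuller version would include, and the function name and clock values are made up.

```python
def berkeley_adjustments(master_clock, slave_clocks):
    """Return the adjustment for each clock, master first.

    Clock values are plain numbers (for example minutes since midnight).
    """
    clocks = [master_clock] + list(slave_clocks)
    average = sum(clocks) / len(clocks)
    # Each machine receives the amount to add to its own clock.
    return [average - clock for clock in clocks]


# Master reads 3:00 (180 min), the slaves read 3:25 (205) and 2:50 (170).
adjustments = berkeley_adjustments(180, [205, 170])
print(adjustments)
```

Sending an adjustment rather than an absolute time means the network delay of the reply does not add a new error.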

Distributed Mutual Exclusion

  1. Safety - at most one process can execute in the critical section at a time
  2. Liveness - requests to enter and exit the critical section eventually succeed, which implies freedom from deadlock and starvation
  3. Ordering - if one request to enter happened before another, entry to the critical section is granted in that order.

Evaluated by:

  1. Consumed bandwidth
    • Entering the critical section requires two messages (a request message and a grant message)
    • Exiting the critical section requires one message (a release message)
  2. Client delay
    • The round-trip delay of the request and grant messages
  3. Throughput (synchronization delay)
    • The time for a release message to reach the server plus a grant message to reach the next process.
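The message counts above match a central-server style of mutual exclusion. Below is a small in-memory sketch, assuming a single lock server that queues waiting processes in FIFO order; the class and method names are hypothetical and no real network is involved, the counter only tallies the messages that would be sent.

```python
from collections import deque


class CentralLockServer:
    """Toy lock server that counts request, grant and release messages."""

    def __init__(self):
        self.holder = None      # process currently in the critical section
        self.waiting = deque()  # queued requests, oldest first
        self.messages = 0

    def request(self, pid):
        self.messages += 1              # request message
        if self.holder is None:
            self.holder = pid
            self.messages += 1          # grant message
            return True
        self.waiting.append(pid)        # grant is delayed until a release
        return False

    def release(self, pid):
        if pid != self.holder:
            raise ValueError("only the holder can release")
        self.messages += 1              # release message
        self.holder = self.waiting.popleft() if self.waiting else None
        if self.holder is not None:
            self.messages += 1          # grant message to the next process
        return self.holder


server = CentralLockServer()
server.request("p1")   # granted immediately
server.request("p2")   # queued behind p1
server.release("p1")   # p2 now enters
server.release("p2")
print(server.messages)
```

Each process costs two messages to enter and one to exit, and the hand-over from one process to the next takes one release plus one grant.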

Coordination and agreement

Transactions and concurrency control

Motivation of Synchronization

  1. Recoverability, to handle process crashes
  2. Multiple clients access the same object concurrently
  3. Atomic operations

Atomic Transactions (atomic means indivisible)

  1. All or nothing
    • a transaction either completes successfully
    • or has no effect at all
  2. Isolation
    • Each transaction must be performed without interference from other transactions
    • The intermediate effects of a transaction must not be observable by other transactions

Concurrency Control

  1. Lost update
    • Two transactions read the same old value and each uses it to calculate a new value, so one update overwrites the other
  2. Inconsistent retrievals
    • A transaction observes values that are involved in an ongoing updating transaction
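The lost update problem is easy to reproduce by writing out one bad interleaving by hand. The sketch below is a deterministic illustration with made-up amounts, not a real concurrent program: transactions T and U both read the balance before either writes it back.

```python
def interleaved(initial=100):
    balance = initial
    t_read = balance        # T reads the balance
    u_read = balance        # U reads the same old value
    balance = t_read + 10   # T writes its result
    balance = u_read + 20   # U overwrites it, T's update is lost
    return balance


def serial(initial=100):
    balance = initial
    balance = balance + 10  # T runs to completion
    balance = balance + 20  # then U runs
    return balance


print(interleaved(), serial())
```

The interleaved run ends 10 short of the serial run, because U calculated its new value from a value that T had already replaced.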

Rules of Serial Equivalence

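Two transactions are serially equivalent if all pairs of their conflicting operations are executed in the same order at every object they both access. Two operations conflict when they come from different transactions, act on the same object, and at least one of them is a write.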
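The rule can be checked mechanically for a recorded schedule. The sketch below assumes exactly two transactions and a schedule given as a list of (transaction, operation, object) tuples, where the operation is "r" or "w"; this representation is an assumption made for the example. With more than two transactions a cycle check over the ordering would be needed instead.

```python
def is_serially_equivalent(schedule):
    """Check that all conflicting pairs are ordered the same way."""
    orders = set()
    for i, (t1, op1, obj1) in enumerate(schedule):
        for t2, op2, obj2 in schedule[i + 1:]:
            # Conflict: different transactions, same object, at least one write.
            if t1 != t2 and obj1 == obj2 and "w" in (op1, op2):
                orders.add((t1, t2))  # t1's operation ran before t2's
    # Equivalent only if no pair is ordered both ways.
    return not any((b, a) in orders for a, b in orders)


good = [("T", "w", "a"), ("U", "w", "a"), ("T", "w", "b"), ("U", "w", "b")]
bad = [("T", "w", "a"), ("U", "w", "a"), ("U", "w", "b"), ("T", "w", "b")]
print(is_serially_equivalent(good), is_serially_equivalent(bad))
```

In the second schedule T comes first at object a but U comes first at object b, so no serial order of T and U gives the same result.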

FIFO?

Locking

  • Exclusive lock - Pessimistic Lock
    Only one transaction can access the object at a time.
    It assumes that concurrency conflicts will occur, and blocks any operation that may violate data integrity.

Java's synchronized keyword is an implementation of pessimistic locking: every time a thread wants to modify data it first obtains a lock, which ensures that only one thread can manipulate the data at any one time while the others are blocked.
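The same idea can be sketched in Python, where threading.Lock plays a role similar to a synchronized block. This is an illustration only; the Counter class and the thread counts are made up for the example.

```python
import threading


class Counter:
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Take the lock before touching the shared value; other threads block here.
        with self._lock:
            self.value += 1


def worker(counter, n):
    for _ in range(n):
        counter.increment()


def run(n_threads=4, n_increments=1000):
    counter = Counter()
    threads = [
        threading.Thread(target=worker, args=(counter, n_increments))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())
```

Because every increment happens while holding the lock, no update is lost and the final value equals the total number of increments.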

  • Optimistic Lock
    Based on a timestamp or version number.
    When the update is committed, the timestamp of the data currently in the database is compared with the timestamp read before the update. If they are the same the commit goes ahead, otherwise there is a version conflict.

  • Two-phase locking

  • Deadlock

    • Detection:
      • Find cycles in the wait-for graph
      • Select a transaction for abortion to break the cycle
    • Timeout
  • Read/Write Locks

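    • A read lock is acquired before a read operation is performed
    • A write lock is acquired before a write operation is performed
    • A write lock is more exclusive: it conflicts with both read and write locks, while read locks can be shared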
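The deadlock detection step listed above, finding cycles in the wait-for graph, can be sketched with a depth-first search. The graph is assumed to be a dictionary that maps each transaction to the set of transactions it is waiting for; that representation and the rule for picking a victim are assumptions made for this example.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None."""
    visited = set()

    def dfs(node, path):
        if node in path:                  # back on the current path: a cycle
            return path[path.index(node):]
        if node in visited:               # already explored, no cycle through it
            return None
        visited.add(node)
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in wait_for:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return None


graph = {"T": {"U"}, "U": {"V"}, "V": {"T"}}
cycle = find_cycle(graph)
victim = cycle[-1]   # arbitrary choice of which transaction to abort
# Aborting the victim removes it from the graph, which breaks the cycle.
remaining = {t: w - {victim} for t, w in graph.items() if t != victim}
print(cycle, victim, find_cycle(remaining))
```

A real system would choose the victim by some policy, for example the age of the transaction; here the last member of the cycle is taken only to keep the sketch short.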

Optimistic concurrency control

The transaction proceeds without locks and is checked for conflicting operations before it commits.
If a conflict is found, the transaction is aborted and the client may restart it.
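A version number is one simple way to make the check before commit. The sketch below is an in-memory illustration with hypothetical names (Record, VersionConflict, commit). The check and the write are not atomic here, which a real store would have to guarantee.

```python
class VersionConflict(Exception):
    pass


class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0


def read(record):
    # Remember the version together with the value.
    return record.value, record.version


def commit(record, new_value, read_version):
    # Validate: nobody else may have committed since we read.
    if record.version != read_version:
        raise VersionConflict("record changed since it was read")
    record.value = new_value
    record.version += 1


account = Record(100)
t_value, t_version = read(account)
u_value, u_version = read(account)
commit(account, t_value + 10, t_version)       # T commits first
try:
    commit(account, u_value + 20, u_version)   # U fails validation
except VersionConflict:
    u_value, u_version = read(account)         # the client restarts U
    commit(account, u_value + 20, u_version)
print(account.value, account.version)
```

The aborted transaction loses some work, but no update is lost: the restarted U sees the result of T.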

Timestamp ordering

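Each transaction is given a timestamp, and each object records the timestamps of its most recent read and write.
Comparing the timestamp of a request with those recorded on the object determines whether the operation can be done immediately, must be delayed, or is rejected.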
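The comparison can be sketched with the basic timestamp ordering rules. This simplified version only accepts or rejects an operation; it does not model the case where an operation is delayed, and the class name and timestamps are made up for the example.

```python
class TimestampedObject:
    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0    # timestamp of the most recent read
        self.write_ts = 0   # timestamp of the most recent write

    def read(self, ts):
        # Too late: a younger transaction has already written the object.
        if ts < self.write_ts:
            return False, None
        self.read_ts = max(self.read_ts, ts)
        return True, self.value

    def write(self, ts, value):
        # Too late: a younger transaction has already read or written it.
        if ts < self.read_ts or ts < self.write_ts:
            return False
        self.value = value
        self.write_ts = ts
        return True


obj = TimestampedObject(0)
print(obj.write(2, "x"))   # accepted
print(obj.read(1))         # rejected, timestamp 1 is older than the write
print(obj.read(3))         # accepted
print(obj.write(2, "y"))   # rejected, a transaction with timestamp 3 has read
```

A rejected transaction is aborted and can be restarted with a new, larger timestamp.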

Clusters

Benefits of computer clusters include

  1. Scalable performance
  2. High availability
  3. Fault tolerance
  4. Modular growth
  5. Use of commodity components

Attributes of Computer Clusters

  • Scalability
  • Packaging
    • Compact packaging: closely packaged in racks
    • Slack packaging: Located in different locations
  • Control
    • Centralized
    • Decentralized
  • Homogeneity
    • Homogeneous cluster: nodes from the same platform
    • Heterogeneous cluster: nodes from different platforms

Architecture

  • The OS should be designed to be multiuser, multitasking and multithreaded
  • Nodes are interconnected by fast commodity networks
  • Cluster middleware glues together all node platforms in user space

Design principles of Clusters

  • Single-System image (SSI)
    The same client will see the same view of the service no matter which machine in the cluster it connects to.
  • Reliability
    • operate without a breakdown
  • Availability
    • percentage of time available to the user
  • Serviceability
    • maintenance/repair/upgrades etc.

Operate-Repair cycle

  • Mean time to failure (MTTF)
    • average time the system operates before it fails
  • Mean time to repair (MTTR)
    • average time to fix (restore) the system after a failure

Types of Failures

  1. Unplanned failures vs. planned shutdowns
  2. Transient failures vs. permanent failures
    • a transient failure can be fixed by a reboot
  3. Partial failures vs. total failures
    • a partial failure affects only part of the system, so the cluster is still usable

Fault-Tolerant

  • Hot standby
    Only the primary nodes are actively doing useful work.
    Standby nodes are powered on and running some monitoring programs.
  • Active-takeover
    All servers are primary and doing useful work.
    When a node fails, users may experience some delay or may lose some data.
  • Failover
    When a component fails, it allows the remaining system to take over the services

Failure Cost Analysis

  • MTTF, MTTR
  • Availability (%)
  • Downtime per year (hours)
  • Yearly failure cost
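The quantities in this list are connected by simple arithmetic. The sketch below uses the standard formula availability = MTTF / (MTTF + MTTR) and assumes a year of 365 days; the cost per hour of downtime and the sample figures are made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours, mttr_hours):
    """Fraction of the operate-repair cycle in which the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail):
    return (1 - avail) * HOURS_PER_YEAR


def yearly_failure_cost(avail, cost_per_hour):
    # cost_per_hour is a hypothetical business figure, not from the notes.
    return downtime_hours_per_year(avail) * cost_per_hour


a = availability(mttf_hours=990, mttr_hours=10)
print(f"availability: {a:.2%}")
print(f"downtime per year: {downtime_hours_per_year(a):.1f} hours")
print(f"yearly failure cost: {yearly_failure_cost(a, cost_per_hour=1000):.0f}")
```

Availability improves either by raising the MTTF or by lowering the MTTR, and for a cluster the second is often the easier of the two.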

https://blog.kwunlam.com/Cloud-Notes-of-Technical-Issues-in-Distributed-System/

Author

Elliot

Posted on

2021-03-20

Updated on

2023-05-07

Licensed under