Cloud Notes: Technical Issues in Distributed Systems
- Time Synchronization
- Coordination and agreement
- Transactions and concurrency control
Time synchronization
Timing is important: events must be timestamped accurately so they can be ordered correctly.
Computers each have their own physical clocks
Because of hardware differences between servers, their clocks drift at different rates, so after a while the physical clocks of different servers disagree to some extent. As a direct result, event A may actually occur later than event B yet carry a smaller timestamp than B. If state is synchronized by timestamp, B’s data will overwrite A’s data, which is not what we want.
A physical clock is an electronic device that counts oscillations occurring in a crystal at a definite frequency.
The operating system reads the hardware clock value.
Physical clocks are not perfect:
- Clock skew: the instantaneous difference between the readings of any two clocks
- Clock drift: different crystal-based clocks count time at different rates
- Temperature affects the oscillation frequency
- Drift rate: The change in the offset between the clock and a nominal perfect reference clock per unit of time
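As a rough illustration of these definitions, the sketch below models two clocks with constant drift rates and computes the skew between them after one day. The function names and the drift values (+10 ppm and -5 ppm) are hypothetical, chosen only to make the arithmetic visible.

```python
def clock_reading(real_seconds: float, drift_rate: float, offset: float = 0.0) -> float:
    """Reading of a clock that gains `drift_rate` seconds per real second."""
    return offset + real_seconds * (1.0 + drift_rate)


def skew(reading_a: float, reading_b: float) -> float:
    """Instantaneous difference between the readings of two clocks."""
    return reading_a - reading_b


# Assumed drift rates: clock A runs 10 ppm fast, clock B runs 5 ppm slow.
ONE_DAY = 24 * 60 * 60
reading_a = clock_reading(ONE_DAY, 10e-6)
reading_b = clock_reading(ONE_DAY, -5e-6)
day_skew = skew(reading_a, reading_b)  # roughly 1.3 seconds after one day
```

Even small drift rates add up, which is why clocks need to be resynchronized periodically.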
External synchronization
Synchronize a group of clocks with an authoritative external source of time
For example, UTC: Coordinated Universal Time
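Network Time Protocol (NTP)
A process that receives the server time t sets its clock to t + T_round/2, where T_round is the measured round-trip time of the request (the estimate used in Cristian’s method).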
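The t + T_round/2 estimate can be sketched as follows. This is a minimal illustration, not an NTP client: `cristian_estimate`, `request_server_time`, and `local_clock` are hypothetical names, and the server is simulated by a plain function so the example runs without a network.

```python
import time
from typing import Callable, Tuple


def cristian_estimate(
    request_server_time: Callable[[], float],
    local_clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, float]:
    """Return (estimated current server time, measured round-trip time)."""
    sent_at = local_clock()
    t = request_server_time()           # clock value carried in the server's reply
    t_round = local_clock() - sent_at   # round trip measured on the local clock
    # Assume the reply spent about half of the round trip on the wire.
    return t + t_round / 2, t_round


# Usage with a fake server and a fake local clock (all values are made up).
ticks = iter([100.0, 100.4])
estimate, t_round = cristian_estimate(lambda: 5000.0, lambda: next(ticks))
```

The estimate is only as good as the assumption that the request and the reply take about the same time.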
Internal synchronization
Synchronize the clocks within a group of computers with one another. A coordinator computer is chosen as the master; the other computers are slaves. The master periodically polls the slaves, and the slaves send back their clock values.
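- Berkeley Algorithm (the master and slave polling scheme described above)
- Cristian’s Method (a client queries a time server and compensates for the round trip; usually classed with external synchronization when the server tracks UTC)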
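The master's part of the polling scheme can be sketched as below. In the Berkeley algorithm the master averages the collected clock values (its own included) and sends each machine the amount by which to adjust; the averaging step is not spelled out in these notes, and this sketch also leaves out round-trip compensation and the rejection of outlying readings. The function name and the sample readings are hypothetical.

```python
from typing import Dict


def berkeley_adjustments(master_time: float, slave_times: Dict[str, float]) -> Dict[str, float]:
    """Return the adjustment each machine should apply to its own clock."""
    # Assumes no slave is itself named "master".
    readings = {"master": master_time, **slave_times}
    average = sum(readings.values()) / len(readings)
    # Sending adjustments (not absolute times) avoids extra error from message delay.
    return {name: average - reading for name, reading in readings.items()}


# Readings in minutes past midnight: 3:00, 3:25 and 2:50 (made-up values).
adjustments = berkeley_adjustments(180.0, {"s1": 205.0, "s2": 170.0})
# The average is 185.0 (3:05), so the master advances 5 minutes,
# s1 goes back 20 minutes, and s2 advances 15 minutes.
```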
Distributed Mutual Exclusion
- Safety - at most one process can execute in the critical section at a time
- Liveness - requests to enter and exit the critical section eventually succeed (freedom from deadlock and starvation)
- Ordering - if one request to enter happened before another, entry to the critical section is granted in that order
Evaluated by (the figures below are for a central lock server):
- Consumed bandwidth
- entering the critical section requires two messages (a request message and a grant message)
- exiting the critical section requires one message (a release message)
- Client delay
- Round-trip delay
- Throughput (synchronization delay)
- The time for a release message to reach the server and a grant message to reach the next process.
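The message counts above can be checked with a small in-memory simulation of a central lock server. This is a sketch under simplifying assumptions: there is no real network, `LockServer` and its methods are hypothetical names, and a "message" is just a counter increment.

```python
from collections import deque
from typing import Deque, Optional


class LockServer:
    """Central server that grants entry to the critical section to one process at a time."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: Deque[str] = deque()
        self.messages = 0  # request, grant, and release messages seen so far

    def request(self, process: str) -> bool:
        """Handle a request message; return True if the grant is sent right away."""
        self.messages += 1              # request message
        if self.holder is None:
            self.holder = process
            self.messages += 1          # grant message
            return True
        self.waiting.append(process)    # queued in arrival order
        return False

    def release(self, process: str) -> Optional[str]:
        """Handle a release message; return the next holder, if any."""
        if process != self.holder:
            raise ValueError(f"{process} does not hold the lock")
        self.messages += 1              # release message
        self.holder = self.waiting.popleft() if self.waiting else None
        if self.holder is not None:
            self.messages += 1          # grant message to the next process
        return self.holder


server = LockServer()
server.request("p1")   # granted immediately
server.request("p2")   # queued behind p1
server.release("p1")   # one release plus one grant: the synchronization delay
```

Safety holds because there is a single holder; the arrival-order queue gives ordering as seen by the server, which is not necessarily happened-before order.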
Coordination and agreement
Transactions and concurrency control
Motivation of Synchronization
- Recoverable to handle process crash
- Multiple clients access the same object concurrently
- Atomic operation
Atomic Transactions (atomic means indivisible)
- All or nothing
- either completes successfully
- or has no effect at all
- Isolation
- Each transaction must be performed without interference from other transactions
- No transaction observes another transaction's intermediate effects
Concurrency Control
- Lost update
- Two transactions read the same old value and each uses it to calculate a new value, so one update overwrites the other
- Inconsistent retrievals
- A transaction observes values that are involved in an ongoing updating transaction
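The lost update problem can be reproduced by writing the interleaving out by hand. The account, the amounts, and the transaction names T and U are made up for illustration.

```python
account = {"balance": 100}

# Interleaved execution: both transactions read before either writes.
t_read = account["balance"]           # T reads 100
u_read = account["balance"]           # U reads 100 (the same old value)
account["balance"] = t_read + 10      # T writes 110
account["balance"] = u_read + 20      # U writes 120, overwriting T's update
interleaved_result = account["balance"]

# Serial execution: U starts only after T has finished.
account = {"balance": 100}
account["balance"] = account["balance"] + 10   # T
account["balance"] = account["balance"] + 20   # U
serial_result = account["balance"]
```

The interleaved run ends at 120 and the serial run at 130: T's deposit of 10 has been lost.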
Rules of Serial Equivalence
For two transactions to be serially equivalent, all pairs of conflicting operations of the two transactions must be executed in the same order at every object they both access.
FIFO?
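One way to read the rule is as a check over a schedule: collect every pair of conflicting operations and see whether they are all ordered the same way. The sketch below does this for two transactions only; the tuple layout and function names are assumptions made for the example.

```python
from typing import List, Tuple

Op = Tuple[str, str, str]  # (transaction, "read" or "write", object)


def conflicts(a: Op, b: Op) -> bool:
    """Operations conflict if different transactions touch the same object
    and at least one of the two is a write."""
    return a[0] != b[0] and a[2] == b[2] and "write" in (a[1], b[1])


def serially_equivalent(schedule: List[Op]) -> bool:
    """True if every conflicting pair in the schedule is ordered the same way."""
    orders = set()
    for i, first in enumerate(schedule):
        for second in schedule[i + 1:]:
            if conflicts(first, second):
                orders.add((first[0], second[0]))
    # Seeing both (T, U) and (U, T) means conflicting pairs ran in different orders.
    return not any((u, t) in orders for (t, u) in orders)


serial = [("T", "read", "x"), ("T", "write", "x"),
          ("U", "read", "x"), ("U", "write", "x")]
lost_update = [("T", "read", "x"), ("U", "read", "x"),
               ("T", "write", "x"), ("U", "write", "x")]
```

Here `serially_equivalent(serial)` is true, while the lost update interleaving fails the check because T's read precedes U's write but U's read precedes T's write.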
Locking
- Exclusive lock (pessimistic lock)
Only one transaction can access the object at a time
Assuming that concurrency conflicts will occur, block any operations that may violate data integrity.
Java synchronized is an implementation of pessimistic locking, where every time a thread wants to modify data it first obtains a lock, ensuring that only one thread can manipulate the data at any one time, while the others are blocked.
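The same pattern can be sketched in Python, with `threading.Lock` playing the role that `synchronized` plays in Java. The `Counter` class and `run_workers` helper are hypothetical names for this example.

```python
import threading


class Counter:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        # Pessimistic: take the lock before touching the data, every time.
        with self._lock:
            self.value += 1


def run_workers(workers: int = 4, per_worker: int = 10_000) -> int:
    """Run several threads that all increment one shared counter."""
    counter = Counter()

    def work() -> None:
        for _ in range(per_worker):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Because every increment happens while the lock is held, no update is lost; the cost is that the other threads block while they wait.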
- Optimistic lock
Timestamp/version
When the update is committed, compare the current timestamp of the data in the database with the timestamp you read before the update. If they are the same, the update is OK; otherwise there is a version conflict.
Two-Phase Locking
Deadlock
- Detection:
- Find cycles in the wait-for graph
- Select a transaction to abort in order to break the cycle
- Timeout: give each lock a time limit, after which it may be broken and its transaction aborted
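Finding a cycle in the wait-for graph is a depth-first search. The sketch below assumes the graph is given as a dictionary from each transaction to the set of transactions it is waiting for; `find_cycle` is a hypothetical helper name, and how to pick the victim (for example the youngest transaction in the cycle) is left open.

```python
from typing import Dict, List, Optional, Set


def find_cycle(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of transactions, or None."""
    path: List[str] = []     # transactions on the current search path
    done: Set[str] = set()   # transactions known not to lead to a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in path:
            return path[path.index(node):]   # back on the path: a cycle
        if node in done:
            return None
        path.append(node)
        for waited_on in sorted(wait_for.get(node, ())):
            cycle = visit(waited_on)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(wait_for):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# T waits for U, U waits for V, V waits for T: a deadlock.
deadlocked = find_cycle({"T": {"U"}, "U": {"V"}, "V": {"T"}})
no_deadlock = find_cycle({"T": {"U"}, "U": set()})
```

Aborting any one transaction in the returned cycle removes its edges and breaks the deadlock.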
Read/Write Locks
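- acquire a read lock before performing a read operation
- acquire a write lock before performing a write operation
- a write lock is more exclusive: it conflicts with both read and write locks, while read locks can be shared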
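The compatibility rules for read and write locks fit in a few lines. `can_grant` is a hypothetical helper that decides whether a lock request on an object can be granted, given the kinds of lock other transactions already hold on it.

```python
from typing import Set


def can_grant(requested: str, held_by_others: Set[str]) -> bool:
    """Decide whether a "read" or "write" lock request can be granted now."""
    if requested == "read":
        # Read locks are shared: only an existing write lock blocks them.
        return "write" not in held_by_others
    if requested == "write":
        # Write locks are exclusive: any lock held by another transaction blocks them.
        return not held_by_others
    raise ValueError(f"unknown lock type: {requested}")
```

For example, `can_grant("read", {"read"})` is true, while `can_grant("write", {"read"})` is false and the requester has to wait.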
Optimistic concurrency control
Check for conflicting operations before commit
If a conflict is found, abort the transaction; the client may restart it
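A minimal way to sketch the check-before-commit step is with a version number per object, as in the timestamp/version scheme for optimistic locks. `VersionedStore` and `VersionConflict` are hypothetical names; a real store would also have to make the check and the write a single atomic step.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when the object changed after the transaction read it."""


class VersionedStore:
    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, int]] = {}   # key -> (value, version)

    def read(self, key: str) -> Tuple[Any, int]:
        """Return (value, version); a missing key has version 0."""
        return self._data.get(key, (None, 0))

    def commit(self, key: str, new_value: Any, expected_version: int) -> None:
        """Write only if nobody else committed since `expected_version` was read."""
        _, current_version = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(f"{key}: expected v{expected_version}, found v{current_version}")
        self._data[key] = (new_value, current_version + 1)


store = VersionedStore()
_, version = store.read("x")       # both clients read version 0
store.commit("x", "first", version)
try:
    store.commit("x", "second", version)   # stale version: validation fails
    second_committed = True
except VersionConflict:
    second_committed = False               # the client may re-read and restart
```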
Timestamp ordering
Record the timestamp of the most recent read and the most recent write of each object
Compare the transaction's timestamp with these to determine whether an operation can be done immediately, must be delayed, or must be rejected.
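The comparison can be sketched with the basic timestamp ordering rules. This is a simplified illustration: it only accepts or rejects operations and does not model the delayed case, which needs tentative versions. The class and exception names are hypothetical.

```python
from typing import Any


class Rejected(Exception):
    """The operation arrived too late and its transaction must abort."""


class TimestampedObject:
    def __init__(self, value: Any = None) -> None:
        self.value = value
        self.read_ts = 0    # largest timestamp that has read this object
        self.write_ts = 0   # timestamp of the most recent write

    def read(self, ts: int) -> Any:
        if ts < self.write_ts:
            raise Rejected("a later transaction has already written this object")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts: int, value: Any) -> None:
        if ts < self.read_ts or ts < self.write_ts:
            raise Rejected("a later transaction has already read or written this object")
        self.write_ts = ts
        self.value = value


obj = TimestampedObject()
obj.write(2, "a")          # accepted: nothing later has touched the object
seen = obj.read(3)         # accepted: transaction 3 sees the write by 2
try:
    obj.write(2, "b")      # rejected: transaction 3 has already read
    late_write_ok = True
except Rejected:
    late_write_ok = False
```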
Clusters
Benefits of computer clusters include
- Scalable performance
- High availability
- Fault tolerance
- Modular growth
- Use of commodity components
Attributes of Computer Clusters
- Scalability
- Packaging
- Compact packaging: nodes closely packaged in racks
- Slack packaging: nodes located in different places
- Control
- Centralized
- Decentralized
- Homogeneity
- Homogeneous cluster: nodes from the same platform
- Heterogeneous cluster: nodes from different platforms
Architecture
- The OS should be multiuser, multitasking, and multithreaded
- Nodes are interconnected by fast commodity networks
- Cluster middleware glues together all node platforms at the user space
Design principles of Clusters
- Single-System image (SSI)
The same client will see the same view of the service no matter which machine in the cluster it connects to.
- Reliability
- operate without a breakdown
- Availability
- percentage of time available to the user
- Serviceability
- maintenance/repair/upgrades etc.
Operate-Repair cycle
- Mean time to failure (MTTF)
- average time the system operates before it fails
- Mean time to repair (MTTR)
- average time to fix (restore) the system after a failure
Type of Failures
- Unplanned failures vs. planned shutdowns
- Transient failures vs. permanent failures
- a transient failure can be fixed by a reboot
- Partial failures vs. total failures
- a partial failure affects only part of the system; the cluster is still usable
Fault Tolerance
- Hot standby
Only primary nodes are actively doing the useful work
Standby nodes are powered on and running some monitoring programs
- Active-takeover
All servers are primary and doing useful work.
Users may experience some delays or may lose some data
- Failover
When a component fails, the remaining system takes over its services
Failure Cost Analysis
- MTTF, MTTR
- Availability (%) = MTTF / (MTTF + MTTR)
- The downtime per year (hours)
- The yearly failure cost
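The four quantities chain together as in the sketch below. Availability is MTTF / (MTTF + MTTR); the function names, the example MTTF and MTTR, and the cost per hour of downtime are assumptions chosen only to show the arithmetic.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is available."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of downtime in one year."""
    return (1.0 - avail) * HOURS_PER_YEAR


def yearly_failure_cost(downtime_hours: float, cost_per_hour: float) -> float:
    """Downtime multiplied by the (assumed) cost of one hour of downtime."""
    return downtime_hours * cost_per_hour


# Made-up figures: fails every 990 hours on average, takes 10 hours to repair,
# and each hour of downtime costs 1,000.
avail = availability(990.0, 10.0)             # 0.99, i.e. 99%
downtime = downtime_hours_per_year(avail)     # about 87.6 hours
cost = yearly_failure_cost(downtime, 1000.0)  # about 87,600 per year
```

Shortening MTTR improves availability just as lengthening MTTF does, which is the point of the failover configurations.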