Distributed and Cloud Computing Notes 1
Reasons for Distributed Systems
- Functional Separation
- Different Capabilities and purposes
- Inherent Distribution
- Information
- People
- Power imbalance and load variation
- Reliability
- Economies
Consequences of Distributed Systems
- Concurrency - Each computer is autonomous
- Carry out tasks independently
- Tasks coordinate their actions by exchanging messages
- System capacity can be increased by adding more resources
- No global clock
- Independent Failures
Motivation of Distributed Systems
- To share resources and information
Trends in Distributed Systems
- The emergence of pervasive networking technology
- The emergence of mobile and ubiquitous computing
- The increasing demand for multimedia services
- The view of distributed systems as a utility
Maintenance of intranet
- No risk if there is no connection to the Internet
- Firewalls are used to limit services from/to an intranet
- Limit FTP/Remote Desktop etc.
Mobile computing: Performing computing tasks while the user is on the move, away from his/her usual environment
Eight forms of transparency
- Access transparency
- Location transparency
- Concurrency transparency
- Replication transparency
- Failure transparency
- Mobility transparency
- Performance transparency
- Scaling transparency
List of Challenges
- Heterogeneity
- Security
- Confidentiality
- Protection against disclosure of information to unauthorized individuals
- Integrity
- Protection against alteration or corruption
- Availability
- Protection against interference targeting access to the resources
- DDoS
- Authenticity or Non-repudiation
- Proof of sending/receiving information
- Digital signature
- Confidentiality
Failure
Availability = MTTF / (MTTF + MTTR)
- Mean time to failure (MTTF)
- The average time of normal operation before the system fails
- Mean time to repair (MTTR)
- The average time it takes to repair the system and restore it to working condition
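As a worked example of the availability formula above, here is a minimal Python sketch. The function name `availability` and the figures (999 hours between failures, 1 hour to repair) are assumptions chosen for illustration, not values from the notes.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is operational: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: 999 hours of normal operation between failures,
# 1 hour to repair and restore the system.
a = availability(999, 1)
print(f"availability = {a:.3%}")  # prints "availability = 99.900%"
```

Note that the formula rewards both directions of improvement: a longer MTTF (fewer failures) and a shorter MTTR (faster repair) each raise availability.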
Single point of failure
A single hardware or software component whose failure causes the whole system to crash. The key approach to enhancing availability is to turn as many failures as possible into partial failures by removing single points of failure.
Checkpointing
- The process of periodically saving the state of an executing program to stable storage, from which the system can recover after a failure.
- Each saved program state is called a checkpoint.
- Checkpointing can be realized by the operating system at kernel level, by a third-party library, or by the application itself.
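The application-level variant can be sketched roughly as follows. This is a minimal illustration, not a production design: the function names (`save_checkpoint`, `load_checkpoint`, `run`), the `crash_at` parameter used to simulate a failure, and the toy workload (summing step numbers) are all assumptions made for the example. A local file stands in for stable storage.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint(state: dict, path: str) -> None:
    """Write the state to a temporary file, then atomically rename it,
    so a crash during the write never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to the storage device
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last saved state, or a copy of `default` if none exists."""
    if not os.path.exists(path):
        return dict(default)
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path: str, total_steps: int, checkpoint_every: int,
        crash_at: Optional[int] = None) -> int:
    """Toy computation: sum the step numbers, checkpointing periodically.
    `crash_at` is a hypothetical hook that simulates a failure."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state["total"]
```

If `run` crashes at step 7 with a checkpoint every 5 steps, a second call resumes from step 5 rather than step 0: only the work done since the last checkpoint is repeated. The checkpoint interval is therefore a trade-off between the overhead of saving state and the amount of work lost on failure.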
Jobs
- Serial Jobs: Run on a single node
- Parallel jobs: use multiple nodes
- Interactive jobs: require fast turnaround time, and their input/output is directed to a terminal
- Batch jobs: need more resources and don’t need immediate responses. Scheduled jobs.
Job Management System
- A user server: lets users submit jobs.
- A job scheduler: performs job scheduling
- A resource manager: allocates and monitors resources. Enforces scheduling policies, and collects accounting information.
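To make the division of labour among the three components concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the class names, the strict first-in-first-out policy, and the idea of counting resources in whole nodes. Real job management systems use far richer policies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Job:
    name: str
    nodes_needed: int  # 1 for a serial job, more for a parallel job


class ResourceManager:
    """Allocates and monitors nodes, and keeps accounting records."""

    def __init__(self, total_nodes: int) -> None:
        self.free_nodes = total_nodes
        self.accounting: List[Tuple[str, int]] = []  # (job name, nodes used)

    def try_allocate(self, job: Job) -> bool:
        if job.nodes_needed > self.free_nodes:
            return False
        self.free_nodes -= job.nodes_needed
        self.accounting.append((job.name, job.nodes_needed))
        return True

    def release(self, job: Job) -> None:
        self.free_nodes += job.nodes_needed


class JobScheduler:
    """Decides which queued job starts next (here: strict FIFO)."""

    def __init__(self, resources: ResourceManager) -> None:
        self.queue: Deque[Job] = deque()
        self.resources = resources

    def dispatch(self) -> List[Job]:
        started = []
        # Strict FIFO: stop at the first job that does not fit.
        while self.queue and self.resources.try_allocate(self.queue[0]):
            started.append(self.queue.popleft())
        return started


class UserServer:
    """Entry point through which users submit jobs."""

    def __init__(self, scheduler: JobScheduler) -> None:
        self.scheduler = scheduler

    def submit(self, job: Job) -> None:
        self.scheduler.queue.append(job)
```

With 4 nodes and jobs submitted in the order 1-node, 4-node, 1-node, the first `dispatch()` starts only the first job: the 4-node job does not fit, and strict FIFO keeps the small job behind it waiting too. Letting the small job run ahead would raise utilization, at the cost of a more complex policy.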
Security Mechanisms
- Encryption (AES, RSA)
- Authentication (Password, Public key)
- Authorization (access control)
- Concurrency
- Fair scheduling
- Preserve dependencies
- Avoid deadlocks
- Object locking, data consistency, semaphores
- Fault tolerance (No failure despite faults)
- Fault detection
- Checksums
- Heartbeat
- Fault masking
- Retransmission of corrupted messages
- Redundancy
- Fault toleration
- Exception handling
- Timeouts
- Fault recovery
- Rollback mechanisms
- Fault detection
- Scalability
- Openness
- Distribution transparency <= Do not let other touch
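Two of the fault-tolerance techniques listed above, detecting a fault with a checksum and masking it by retransmitting the corrupted message, can be sketched together in Python. The frame layout (a 4-byte CRC-32 prefix), the function names, and the `channel` callable that stands in for an unreliable network are all assumptions made for this illustration.

```python
import zlib
from typing import Callable, Optional, Tuple


def make_frame(payload: bytes) -> bytes:
    """Prefix the payload with its 4-byte CRC-32 checksum."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def check_frame(frame: bytes) -> Optional[bytes]:
    """Fault detection: return the payload if the checksum matches, else None."""
    if len(frame) < 4:
        return None
    expected = int.from_bytes(frame[:4], "big")
    payload = frame[4:]
    return payload if zlib.crc32(payload) == expected else None


def send_with_retries(payload: bytes,
                      channel: Callable[[bytes], bytes],
                      max_attempts: int = 3) -> Tuple[bytes, int]:
    """Fault masking: retransmit until a frame arrives uncorrupted.

    `channel` is a hypothetical callable that takes the frame as sent
    and returns the frame as received (possibly corrupted).
    Returns the delivered payload and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        received = check_frame(channel(make_frame(payload)))
        if received is not None:
            return received, attempt
    raise IOError(f"message still corrupted after {max_attempts} attempts")
```

The caller sees a successful delivery even when an early transmission was corrupted, which is what masking means here. When every attempt fails, the fault can no longer be hidden, so the function raises an exception and leaves the decision to the caller.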