Distributed and Cloud Computing Notes 1

Reasons for Distributed Systems

  • Functional Separation
    • Different capabilities and purposes
  • Inherent Distribution
    • Information
    • People
  • Power imbalance and load variation
  • Reliability
  • Economies

Consequences of Distributed Systems

  • Concurrency - Each computer is autonomous
    • Carry out tasks independently
    • Tasks coordinate their actions by exchanging messages
    • System capacity can be increased by adding more resources
  • No global clock
  • Independent Failures

Motivation of Distributed Systems

  • To share resources and information
  • The emergence of pervasive networking technology
  • The emergence of mobile and ubiquitous computing
  • The increasing demand for multimedia services
  • The view of distributed systems as a utility

Maintenance of intranet

  • No risk if there is no connection to the Internet
  • Firewalls are used to limit services from/to an intranet
    • Limit FTP/Remote Desktop etc.

Mobile computing: Performing computing tasks while the user is on the move, away from his/her usual environment

Eight forms of transparency

  • Access transparency
  • Location transparency
  • Concurrency transparency
  • Replication transparency
  • Failure transparency
  • Mobility transparency
  • Performance transparency
  • Scaling transparency

List of Challenges

  • Heterogeneity
  • Security
    • Confidentiality
      • Protection against disclosure of information to unauthorized individuals
    • Integrity
      • Protection against alteration or corruption
    • Availability
      • Protection against interference targeting access to the resources
      • DDoS
    • Authenticity or Non-repudiation
      • Proof of sending / receiving information
      • Digital signature

Failure

Availability = MTTF / (MTTF + MTTR)

  • Mean time to failure (MTTF)
    • The average time of normal operation before the system fails
  • Mean time to repair (MTTR)
    • The average time it takes to repair the system and restore it to working condition
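
As a worked example of the availability formula above, here is a minimal sketch. The function name and the figures (999 hours between failures, 1 hour to repair) are assumptions chosen for illustration, not values from the course.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is operational: MTTF / (MTTF + MTTR)."""
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical figures: 999 hours of normal operation between failures,
# 1 hour to repair on average.
a = availability(999.0, 1.0)

# Expected downtime over a 365-day year at that availability.
downtime_hours_per_year = (1.0 - a) * 365 * 24
```

With these assumed figures the availability is 0.999, which corresponds to roughly 8.76 hours of downtime per year. Shortening MTTR raises availability just as lengthening MTTF does.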

Single point of failure

A single point of failure is a hardware or software component whose failure crashes the whole system. The key approach to enhancing availability is to remove single points of failure, so that as many failures as possible are partial failures.

Checkpointing

  • The process of periodically saving the state of an executing program to stable storage, from which the system can recover after a failure.
    • Each saved program state is called a checkpoint.
    • Checkpointing can be implemented by the operating system at kernel level, by a third-party library, or by the application itself.
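
The application-level variant can be sketched as follows. This is only an illustration under stated assumptions: the function names, the pickle file format, and the simulated crash are all hypothetical, and a local file stands in for stable storage.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before renaming
    # The rename is atomic on POSIX, so readers see the old or the new
    # checkpoint, never a half-written one.
    os.replace(tmp_path, path)


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last saved state, or the default if no checkpoint exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path: str, total_steps: int, checkpoint_every: int,
        crash_at: Optional[int] = None) -> int:
    """Sum 0..total_steps-1, saving a checkpoint every few steps."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If `run` fails partway, calling it again with the same path resumes from the last checkpoint instead of from step 0. Work done after the last checkpoint is repeated, which is the usual trade-off between checkpoint frequency and recovery cost.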

Jobs

  • Serial jobs: run on a single node
  • Parallel jobs: use multiple nodes
  • Interactive jobs: require fast turnaround time, and their input/output is directed to a terminal
  • Batch jobs: need more resources and don’t need immediate responses. Scheduled jobs.

Job Management System

  • A user server: lets users submit jobs.
  • A job scheduler: performs job scheduling.
  • A resource manager: allocates and monitors resources, enforces scheduling policies, and collects accounting information.
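
How the three parts fit together can be sketched as below. Everything here is a simplifying assumption for illustration: the class names, the first-come, first-served policy, and counting resources only in whole nodes. Real job management systems are far richer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int  # 1 for a serial job, more for a parallel job


class ResourceManager:
    """Allocates nodes and tracks which jobs currently hold them."""

    def __init__(self, total_nodes):
        self.free_nodes = total_nodes
        self.running = {}  # job name -> number of nodes held

    def allocate(self, job):
        if job.nodes > self.free_nodes:
            return False
        self.free_nodes -= job.nodes
        self.running[job.name] = job.nodes
        return True

    def release(self, name):
        self.free_nodes += self.running.pop(name)


class Scheduler:
    """First-come, first-served: start queued jobs in order while they fit."""

    def __init__(self, resources):
        self.resources = resources
        self.queue = deque()

    def submit(self, job):
        # Plays the role of the user server: accept a job into the queue.
        self.queue.append(job)

    def dispatch(self):
        started = []
        # Stop at the first job that does not fit (no backfilling).
        while self.queue and self.resources.allocate(self.queue[0]):
            started.append(self.queue.popleft().name)
        return started
```

For example, with 4 nodes and jobs A (2 nodes), B (4 nodes), and C (1 node) submitted in that order, the first `dispatch()` starts only A, because B does not fit and C must wait behind it under this policy.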

Security Mechanisms

  • Encryption (AES, RSA)
  • Authentication (password, public key)
  • Authorization (access control)
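
Password authentication and access control can be sketched with the Python standard library alone. The iteration count, the user names, and the permission table below are hypothetical values for illustration, not recommendations. Encryption is left out because AES and RSA are not available in the standard library.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor only


def hash_password(password, salt=None):
    """Return (salt, derived_key); the plain password is never stored."""
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, key


def verify_password(password, salt, expected_key):
    """Authentication: re-derive the key and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


# Authorization: a hypothetical access control list mapping users to actions.
ACL = {"alice": {"read", "write"}, "bob": {"read"}}


def is_authorized(user, action):
    """Return True only if the user has been granted the action."""
    return action in ACL.get(user, set())
```

Authentication answers "who is this?", authorization answers "what may they do?". The two checks are kept separate here so that either can be replaced, for example by public-key authentication, without touching the other.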

  • Concurrency
    • Fair scheduling
    • Preserve dependencies
    • Avoid deadlocks
    • Object locking, data consistency, semaphores
  • Fault tolerance (No failure despite faults)
    • Fault detection
      • Checksums
      • Heartbeat
    • Fault masking
      • Retransmission of corrupted messages
      • Redundancy
    • Fault toleration
      • Exception handling
      • Timeouts
    • Fault recovery
      • Rollback mechanisms
  • Scalability
  • Openness
  • Distribution transparency (keep the distribution hidden from others)
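
Two of the fault tolerance items above, checksums for detection and retransmission for masking, can be sketched together. The framing format, the function names, and the deliberately faulty channel are assumptions made for this illustration.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if the checksum mismatches."""
    payload, checksum = message[:-4], message[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch")
    return payload


def send_with_retry(payload: bytes, channel, max_attempts: int = 3) -> bytes:
    """Resend the framed payload until an uncorrupted copy arrives."""
    for _ in range(max_attempts):
        try:
            return unframe(channel(frame(payload)))
        except ValueError:
            continue  # fault detected: retransmit
    raise RuntimeError("message could not be delivered intact")


def make_flaky_channel(corrupt_first_n: int):
    """Hypothetical channel that flips one bit in the first n messages."""
    remaining = [corrupt_first_n]

    def channel(message: bytes) -> bytes:
        if remaining[0] > 0:
            remaining[0] -= 1
            return bytes([message[0] ^ 0x01]) + message[1:]
        return message

    return channel
```

Calling `send_with_retry(b"hello", make_flaky_channel(2))` succeeds on the third attempt, so the caller never sees the two corrupted copies. If the channel corrupts every attempt, the fault can no longer be masked and is reported as an exception.
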
Author

Elliot

Posted on

2021-01-29

Updated on

2023-05-07
