Reading notes: Distributed systems for fun and profit

Over the past couple of days I read an interesting series of articles, Distributed systems for fun and profit. These are my brief reading notes.


  • Lay out some key concepts
  • Summarize some of the finer ideas behind distributed systems


  • Information travels at the speed of light
  • Independent things fail independently

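In other words, the core of distributed programming is dealing with distance and with having more than one thing. After reading the series, you should come away with an intuitive sense of how distance, time, and consistency models affect one another.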

### Distributed systems at a high level

Distributed programming is the art of solving the same problem that you can solve on a single computer using multiple computers.


  • Storage
  • Computation

Cluster size

Performance gap: adding machines does not increase performance linearly.


Scalability is the ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.


  • Size scalability
  • Geographic scalability
  • Administrative scalability

A scalable system is one that continues to meet the needs of its users as scale increases. There are two particularly relevant aspects - performance and availability - which can be measured in various ways.


Performance (and latency)

Performance is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.


  • Short response time/low latency for a given piece of work
  • High throughput (rate of processing work)
  • Low utilization of computing resource(s)

Latency: The state of being latent; delay, a period between the initiation of something and the occurrence.
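To make the latency and throughput measures above more concrete, here is a minimal Python sketch that times a piece of work on a single machine. The function name `measure` and the sample workload are hypothetical, chosen only for illustration. A real distributed system would also have to account for network delay and would usually look at latency percentiles rather than only the mean.

```python
import time
from typing import Callable, Tuple


def measure(work: Callable[[], object], repetitions: int) -> Tuple[float, float]:
    """Return (mean latency in seconds, throughput in calls per second)."""
    if repetitions <= 0:
        raise ValueError("repetitions must be positive")

    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        work()  # the piece of work being measured
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    mean_latency = sum(latencies) / repetitions
    # Guard against a zero elapsed time on very coarse clocks.
    throughput = repetitions / elapsed if elapsed > 0 else float("inf")
    return mean_latency, throughput


# Hypothetical workload: summing a small range of integers.
mean_latency, throughput = measure(lambda: sum(range(10_000)), repetitions=100)
print(f"mean latency: {mean_latency * 1e6:.1f} microseconds")
print(f"throughput:   {throughput:.0f} calls/second")
```

Latency is measured per call while throughput is measured over the whole run, so the two can move independently, for example when work is batched or run in parallel.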

Availability (and fault tolerance)

Availability is the proportion of time a system is in a functioning condition. If a user cannot access the system, it is said to be unavailable.

Availability = uptime / (uptime + downtime)
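As a small illustration of the formula above, the following Python sketch computes availability from observed uptime and downtime. The function name and the sample numbers (a 30-day window with 43 minutes of outage) are assumptions made up for this example, not figures from the series.

```python
def availability(uptime: float, downtime: float) -> float:
    """Availability = uptime / (uptime + downtime), both in the same unit."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime / total


# Hypothetical observation window: 30 days with 43 minutes of outage.
minutes_in_window = 30 * 24 * 60
outage_minutes = 43
ratio = availability(minutes_in_window - outage_minutes, outage_minutes)
print(f"availability: {ratio:.4%}")  # roughly "three nines"
```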

| Availability % | How much downtime is allowed per year? |
|---|---|
| 90% (“one nine”) | More than a month |
| 99% (“two nines”) | Less than 4 days |
| 99.9% (“three nines”) | Less than 9 hours |
| 99.99% (“four nines”) | Less than an hour |
| 99.999% (“five nines”) | ~ 5 minutes |
| 99.9999% (“six nines”) | ~ 31 seconds |
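The downtime figures above follow directly from the availability percentage. Here is a minimal Python sketch of that arithmetic, assuming a 365-day year. The names `SECONDS_PER_YEAR` and `allowed_downtime_seconds` are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: a non-leap year


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Return the yearly downtime budget, in seconds, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


for percent in (90.0, 99.0, 99.9, 99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(percent)
    print(f"{percent}%: {seconds / 3600:.3f} hours ({seconds:.1f} seconds)")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why 90% allows about 36.5 days per year while 99.9999% allows only about 31.5 seconds.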