1. Software Architecture
- Software architecture is the structure or structures of the system, which comprise software elements, the externally visible properties of those elements, and the relationships among them (the definition from the book Software Architecture in Practice).
- Box-and-line drawings alone are not architecture, but a starting point.
- Architecture includes the behaviour of components.
1.1 Role of Architecture
- An architecture is one of the first artefacts that represents decisions on how requirements are to be achieved. As the manifestation of early design decisions, the architecture represents those design decisions that are hardest to change and hence deserve the most careful consideration.
- An architecture is the key artefact in achieving successful product line engineering: the disciplined development of a family of similar systems with less effort, expense, and risk than developing each system independently.
- An architecture is usually the first design artefact to be examined when someone starts working on a system.
- Software architecture provides a frame of reference for maintenance and modification decisions.
1.2 Why Is Software Architecture Important?
- Software architecture provides a vehicle for communication.
  - It is a frame of reference in which competing interests may be identified and negotiated, e.g.:
    - Negotiating requirements with users
    - Keeping the customer informed of progress, cost, etc.
    - Implementing management decisions and allocations
- Software architecture manifests the earliest set of design decisions.
  - It constrains the implementation and the developers:
    - The implementation must conform to the architecture.
    - Resource allocation decisions constrain the implementations of individual components.
- Software architecture dictates the organisational structure for development and maintenance efforts, e.g.:
  - Division into teams
  - Units for budgeting and planning
  - Basis of the Work Breakdown Structure
  - Organisation of documentation
  - Organisation of configuration management (CM) libraries
  - Basis of integration
  - Basis of test plans and testing
  - Basis of maintenance
- Architecture facilitates or hinders the achievement of quality attributes, e.g., modifiability, security, usability.
  - Architecture influences qualities but may not guarantee them, as a number of other factors are involved.
- An architecture invokes discussion about potential change (about 80% of the effort spent on a system is post-deployment effort).
  - The architecture categorises changes into three types:
    - Local: modification of a single component
    - Non-local: modification of several components
    - Architectural: modification of the system's basic structure and its communication and coordination mechanisms
- Architecture is a transferable and reusable abstraction: a one-to-many mapping (one architecture, many systems).
  - Architecture is the basis for product commonality; a whole product line shares a single architecture.
  - Systems can be developed by integrating independently developed components via the architecture (Component-Based Software Engineering, CBSE).
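1.3 Software Architecture Process
- Architecturally significant requirements (ASRs) are elicited from the stakeholders.
- Analysis turns them into prioritized quality attribute scenarios, plus requirements and constraints.
- These inputs are combined with patterns and tactics to synthesize the architecture design.
- From the design, sketches of candidate views (determined by the chosen patterns) are produced and then documented.
- Views are chosen and combined, and the documentation is evaluated further; this evaluation involves the stakeholders and uses the prioritized quality attribute scenarios and the documentation as references.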
1.3.1 Mobile Phone System Architecture
1.3.2 Washing Machine Architecture
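1.4 Discussion
- What is the difference between science and engineering?
  - Science studies the world as it already exists.
  - Engineering studies how humans create new things (things that exist only because people made them).
- What is the difference between 'software' and 'hardware'?
  - Software is invisible: software is virtual, whereas hardware is physical.
  - Software is made to be modified and changed (evolution is an intrinsic property of software).
- What is the difference between architecture and design?
  - All architecture is software design, but not all software design is architecture.
  - Architecture is one part of the design process.
  - Other views:
    - Architecture is higher-level design, intended to accommodate change.
    - Architecture is a set of design decisions.
- What is the difference between architecture and structure?
  - Architecture defines the interfaces of components, how components communicate and depend on one another, and the responsibilities of each component.
  - Architecture provides a higher-level, abstract view of the design that hides design complexity and implementation details and puts more emphasis on non-functional requirements.
  - [Standard answer] Architecture includes structural information, since structure is static and logical and concerns how the system is composed. Beyond structure, however, architecture adds the relationships and interfaces between components and defines some dynamic behaviour (e.g., which other components a given component may interact with).
- Why use abstraction in architecture?
  - It gives a higher-level view that focuses on the structure itself rather than its implementation.
  - It reduces system complexity during architecture design by masking and hiding details.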
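2. Requirements
Requirements often contain conflicts between what developers and users want, and these conflicts need to be reconciled into requirements both sides can work with.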
2.1 Functional Requirements
- Functional requirements state what the system must do and address how the system provides value to the stakeholders.
- Functional requirements describe the behaviour of the system.
- Functionality is the ability of the system to do the work for which it was intended, e.g., enabling students to enrol online.
- Functionality may be achieved through the use of any number of possible structures.
- Functionality is largely independent of structure, because it could exist as a single monolithic system without any internal structure.
2.2 Quality Requirements
- Quality requirements are desirable characteristics of the overall system (aka quality attributes) that the system should provide on top of its functional requirements.
- Quality requirements are qualifications of the functional requirements or of the overall product.
- If quality attributes are important, the software architecture constrains the allocation (mapping) of functionality onto various structures.
2.2.1 Non-functional Requirements
Non-functional requirements or architectural requirements are alternative terms used for quality attributes.
It is not possible to get the functionality right and then try to accommodate non-functional requirements (NO retro-fitting of quality). Non-functional requirements must be taken into account during every design decision, rather than retrofitting the system after the functionality is complete.
There are two broad categories of non-functional requirements:
- Observable (external) during execution: How well does the system satisfy its behavioural requirements? e.g., performance, security, availability, usability.
- Not observable (internal) during execution: How easily can the system be maintained, integrated, or tested? e.g., modifiability, portability, reusability, testability.
Constraints define the boundary; the architecture then seeks the best solution within that boundary.
2.2.2 Quality Attributes
- Quality isn't something that can be added to a software-intensive system after development finishes.
- Quality concerns need to be addressed during ALL phases of software development.
- Business goals determine the qualities that a system must possess.
- Quality attributes are over and above the system's functionality, which is the basic statement of the system's capabilities, services, and behaviours.
- Functionality usually takes the front seat in the development plan.
- However, systems are usually redesigned because they lack the desired level of quality, i.e., they are difficult to maintain, port, or scale.
- Software architecture constrains the achievement of various quality attributes, e.g., performance, security, usability.
- That is why software architecture is considered the most appropriate level at which to address quality issues.
- No quality attribute is entirely dependent on design, nor is it entirely dependent on implementation or deployment.
2.2.3 Specifying Quality Attributes
- A precise definition of a quality attribute is necessary to evaluate it at the architecture level.
- Quality attribute scenarios are used to define the desired quality attributes.
- Scenarios are simple descriptions with a certain structure. The two main classes of scenarios are:
  - General scenarios: system-independent scenarios that guide the specification of quality attribute requirements.
  - Concrete scenarios: system-specific scenarios that guide the specification of quality attribute requirements for a particular system. They are instances of general scenarios.
These scenarios correspond to the "+1" (use cases/scenarios) in the 4+1 view model.
2.2.4 General Scenarios
- General scenarios provide a framework for generating a large number of generic, system-independent, quality-attribute-specific scenarios.
- Each scenario is potentially, but not necessarily, relevant to the system we are concerned with.
- To make a general scenario useful for a particular system, we must make it system specific.
- Making a general scenario system specific means translating it into concrete terms for the particular system.
2.2.5 Modeling Quality Attribute Scenarios (important)
- Stimulus: a condition that needs to be considered when it arrives at a system. It may be an input, a message, etc., that changes the current state.
- Source of stimulus: an entity (human, system, or any actuator) that generates the stimulus.
- Response: the activity undertaken after the arrival of the stimulus.
- Response measure: the response to the stimulus should be measurable in some fashion so that the requirement can be tested, e.g., how long the system takes to respond.
- Environment: the system's condition when the stimulus occurs, e.g., overloaded, running.
- Artifact: the whole system or the portion of the system to which the requirement applies; it may be a particular software artefact.

Once these six elements are defined, one architectural scenario is pinned down and can be used to drive the architecture design.
The stimulus and response occur within an environment: the system running normally, the system overloaded, the system under attack, a network failure, and so on.
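To make the six-part structure concrete, the sketch below records a scenario as a Python dataclass and checks that every part is filled in. The class name `QualityAttributeScenario`, its field names, and the example values (loosely based on a heartbeat-style availability scenario) are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class QualityAttributeScenario:
    """The six parts of a quality attribute scenario (hypothetical field names)."""
    source: str            # who or what generates the stimulus
    stimulus: str          # the condition arriving at the system
    environment: str       # system state when the stimulus occurs
    artifact: str          # part of the system the requirement applies to
    response: str          # activity undertaken after the stimulus arrives
    response_measure: str  # how the response is measured, so it can be tested

    def is_complete(self) -> bool:
        # A scenario only pins down a requirement when all six parts are non-empty.
        return all(getattr(self, f.name).strip() for f in fields(self))

    def describe(self) -> str:
        return (f"When {self.stimulus} arrives from {self.source} at {self.artifact} "
                f"during {self.environment}, the system should {self.response} "
                f"({self.response_measure}).")


# Illustrative values only.
scenario = QualityAttributeScenario(
    source="heartbeat monitor",
    stimulus="a missed heartbeat",
    environment="normal operation",
    artifact="the server process",
    response="notify the operator and continue to operate",
    response_measure="no downtime",
)
print(scenario.is_complete())  # True
print(scenario.describe())
```

Keeping the response measure as a mandatory field reflects the point above: without it the requirement cannot be tested.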
2.2.6 Tactics (the smallest, atomic-level design decisions)
- A style or pattern applies tactics to provide its promised benefit.
- A tactic is a design decision, e.g., redundancy, that influences the control of a quality attribute response.
- A collection of tactics is called an architectural strategy.
- A system design consists of a collection of design decisions: some of them help control the quality attribute response; others ensure the achievement of system functionality.
- Like patterns, tactics may be composed of other tactics; e.g., redundancy may be composed of data redundancy and computation redundancy, and the designer chooses one or the other depending on the requirements. Tactics can therefore be used as a hierarchy of tactics.
That is, tactics can be organised hierarchically, from general tactics down to more specific refinements of them.
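2.2.7 Quality Design Decisions
- Architecture is a collection of design decisions.
- Seven categories of design decisions (they may overlap):
  - Allocation of responsibilities: how large responsibilities are divided up and assigned.
  - Coordination model: how the parts communicate and interact.
  - Data model: data formats and storage approaches (caching, etc.).
  - Management of resources: CPU, network, memory, time (in time-sensitive scenarios), and other resources.
  - Mapping among architecture elements: how architecture elements are mapped onto the software implementation.
  - Binding time decisions:
    - The point in time by which a variation in the system must be fixed: before that point the system can still vary; after it, it cannot.
    - For example, the installation environment must be chosen before a certain point; binding can happen when a technology is added, at compile time, at initialisation time, or at runtime, with runtime binding offering the most flexibility.
    - In practice we want binding to happen as late as possible, but later binding comes at a corresponding cost.
  - Choice of technology: once the decisions above are made, the choice of technology stack is relatively constrained, because the solution space has already been narrowed.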
2.2.8 Quality Attributes
2.3 Constraints
- A constraint is a design decision with ZERO degrees of freedom.
- Constraints are pre-specified design decisions that have already been made.
- Constraints are satisfied by accepting the design decision and reconciling it with other affected design decisions.
3. Quality Attributes & Tactics
3.1 Availability
Availability is a key requirement for most IT applications.
- It is measured by the proportion of the required time during which the application is usable, e.g.:
  - 100% available during business hours
  - No more than 2 hours of scheduled downtime per week
  - 24x7x52 (100% availability)
- It is related to an application's reliability:
  - Unreliable applications suffer poor availability.
  - Availability and reliability differ:
    - Availability means the system can be used, but does not guarantee that it behaves correctly.
    - Reliability means the system can be used stably and correctly.
The period of loss of availability is determined by:
- Time to detect the failure
- Time to correct the failure
- Time to restart the application
Example (in time order): failure occurs → failure detected → failure corrected → application restarted; the intervals between these events together make up the not-available (N/A) time.
Ways to improve availability:
- Minimise the N/A time:
  - Shorten the time from failure to detection as much as possible.
  - Shorten the time from correction to restart as much as possible.
- Maximise the available time.
Strategies for high availability:
- Eliminate single points of failure
- Replication and failover
- Automatic detection and restart
Recoverability (e.g., of a database):
- The capability to re-establish performance levels and recover affected data after an application or system failure.
Availability can be calculated as the probability that the system will provide the specified services within required bounds over a specified time interval, based on:
- MTBF (mean time between failures)
- MTTR (mean time to repair)
Scheduled downtimes may not be considered when calculating availability.
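A common way to combine the two figures is the steady-state formula availability = MTBF / (MTBF + MTTR). The Python sketch below applies it and converts the result into expected downtime per year; the function names and the example figures (a failure every 1,000 hours, 1 hour to repair) are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours(avail: float, period_hours: float = 365 * 24) -> float:
    """Unavailable time implied by an availability figure over a period (default: one year)."""
    return (1.0 - avail) * period_hours


# Hypothetical figures: one failure every 1,000 hours, 1 hour to repair.
a = availability(1000, 1)
print(f"{a:.5f}")                           # 0.99900
print(f"{expected_downtime_hours(a):.2f}")  # 8.75 (hours per year)
```

The formula makes the earlier advice explicit: availability improves either by raising MTBF (fewer failures) or by cutting MTTR (faster detection, correction, and restart).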
3.1.1 Outage, Failure, Fault, Error
- Availability is about minimizing service outage time by mitigating faults.
- A failure's cause is called a fault.
- A failure occurs when a system cannot deliver a service that is expected of it.
- A failure is an observable characteristic of a system's state.
- A fault in any part of a system has the potential to cause a failure; a system can be repaired or recovered from a failure.
- Intermediate states between the occurrence of a fault and a failure are called errors.
- Distinguishing the terms:
  - Outage: a period when the system is unavailable; scheduled downtime is one kind of outage.
  - Failure: the system fails to deliver its service and is unavailable.
  - Fault: the cause of a failure; a fault does not necessarily lead to a failure immediately.
  - Error: the intermediate state between the occurrence of a fault and a failure.
3.1.2 Service-Level Agreement
Amazon EC2's SLA:
"AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit."
Note that 99.95% allows only half the downtime that 99.9% allows (0.05% versus 0.1% of the time).
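The downtime budget implied by an uptime percentage is simple arithmetic. The sketch below is a simplified calculation assuming a 365-day year; the actual SLA defines precisely how uptime is measured, which this does not model.

```python
HOURS_PER_YEAR = 365 * 24  # simplification: ignores leap years


def downtime_budget_minutes(uptime_percent: float) -> float:
    """Maximum downtime per year allowed by an annual uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR * 60


for target in (99.9, 99.95):
    print(f"{target}% -> {downtime_budget_minutes(target):.0f} min/year")
# 99.9% -> 526 min/year
# 99.95% -> 263 min/year
```

Each extra "half nine" halves the budget: roughly 8.8 hours a year at 99.9% shrinks to about 4.4 hours at 99.95%.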
3.1.3 Planning for Failure
- Hazard analysis: classify failures by severity:
  - Catastrophic/Hazardous
  - Major/Minor
  - No effect
- Fault tree analysis:
  - Analyse failures hierarchically.
- Failure Mode, Effects, and Criticality Analysis (FMECA):
  - FMECA relies on the history of failure of similar systems in the past.
  - E.g., $5 \times 10^{-5} = 1 \times 10^{-3} \times 5\%$
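3.1.4 Availability General Scenario
- Source: internal or external; it may be people, infrastructure, hardware, etc. Any of these can cause an availability problem by issuing a stimulus.
- Stimulus: a failure, incorrect timing, or an incorrect response (outside the expected bounds).
- Artifact: processes, communication channels, etc.
- Environment: the various states the system may be in, e.g., normal or faulty.
- Response: the possible reactions after a fault occurs; recovery corresponds to the correction time.
- Response measure: expressed in terms of time and availability (how long the system is down, how much of it remains usable).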
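3.1.5 Availability Sample Scenario
- The source receives no heartbeat and concludes that a failure has occurred.
- The stimulus is delivered to the running process.
- The process notifies operations (people and servers, who check whether it can still run).
- Finally, a response is sent.
- The whole scenario takes place in a normally operating environment.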
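3.1.6 Availability Tactics
- The availability tactics tree shows which means can be used to achieve availability (important).
- Each branch of the tree represents a point in time at which we intervene, with the goal of extending the available time as much as possible.
- Different ways of detecting whether a service is available:
  - Actively sending a heartbeat (Heartbeat):
    - Costs one message per check.
    - The heartbeat can carry other traffic at the same time (periodic status updates).
    - Detection is automated and more regular.
    - Communication is one-way, which is considered safer.
  - Being probed: Ping/Echo or Monitor:
    - Costs two messages per check.
    - More flexible and self-directed: the checker probes according to its own needs.
    - Two-way confirmation.
  - Timestamp:
    - A sequence of received messages should be ordered in time.
  - Sanity checking: check information against common-sense expectations; if it contradicts them, a problem has probably occurred.
  - Self-test: a component checks whether it has a problem itself.
- Preparation and Repair:
  - Active Redundancy: all redundant parts are working. While no problem is detected, only the primary's output is used and the secondary's output is discarded; switchover causes little or no downtime.
  - Passive Redundancy: the primary synchronises its state to the secondary; if the primary fails, the secondary takes over quickly from the last synchronised state. Switchover may involve some downtime. Passive redundancy is the more commonly chosen approach.
  - Spare: a standby that can stand in for several different components.
  - Rollback: roll back to resolve inconsistencies.
  - Retry
  - Ignore Faulty Behavior
  - Degradation: degrade the service, e.g., Windows Safe Mode, so that problems that have already occurred do not interfere with repairing the system.
  - Reconfiguration
- Reintroduction:
  - Shadow
  - State Resynchronization
  - Escalating Restart
  - Non-Stop Forwarding
- Denial of Service (DoS) attack: a flood of bogus requests exhausts resources so that normal service cannot be provided.
- Note that these tactics may involve more than one quality attribute.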
3.1.6.1 Fault Detection
- Ping/Echo
  - One component issues a ping and expects an echo from another component within a predefined time.
  - Ping/Echo can be used within a group of components responsible for one task.
- Heartbeat (dead man timer)
  - One component emits a heartbeat message (which can also carry data) periodically, and another component listens for it.
  - If the heartbeat fails, the originating component is assumed to have failed, and a fault correction component is notified.
- Exception
  - One method for recognising faults is to encounter an exception.
  - The exception handler typically executes in the same process that introduced the exception.
- The ping and heartbeat tactics operate among distinct processes, whereas the exception tactic operates within a single process.
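The heartbeat tactic can be sketched as a listener that records when each component last reported and flags any component whose heartbeat is overdue. This is a minimal single-process sketch, assuming that heartbeats arrive as method calls; in a real system they would be network messages between processes. The class `HeartbeatMonitor`, its `timeout`, and the `on_failure` callback (standing in for notifying a fault correction component) are hypothetical names. The clock is injectable so the example runs deterministically.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Listens for periodic heartbeats and flags components whose heartbeat stops."""

    def __init__(self, timeout: float, on_failure: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout        # silence longer than this means "presumed failed"
        self.on_failure = on_failure  # stands in for notifying fault correction
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, component: str) -> None:
        # Called whenever a heartbeat message from `component` arrives.
        self.last_seen[component] = self.clock()

    def check(self) -> List[str]:
        # Run periodically: report every component whose heartbeat is overdue.
        now = self.clock()
        failed = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for c in failed:
            del self.last_seen[c]     # report each failure only once
            self.on_failure(c)
        return failed


# Usage with a fake clock so the timing is deterministic.
fake_now = [0.0]
alerts: List[str] = []
monitor = HeartbeatMonitor(timeout=3.0, on_failure=alerts.append,
                           clock=lambda: fake_now[0])
monitor.heartbeat("db")
monitor.heartbeat("web")
fake_now[0] = 2.0
monitor.heartbeat("web")   # "web" keeps beating, "db" goes silent
fake_now[0] = 4.0
print(monitor.check())     # ['db']
```

Ping/Echo differs only in who initiates: the monitor would send a ping and start the timeout itself, at the cost of a second message per check.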
3.1.6.2 Fault Recovery
- Voting
  - Processes running on redundant processors each take equivalent input and compute a simple value that is sent to a voter.
  - If the voter detects deviant behaviour from a single process, it fails that process.
- Active redundancy
  - All redundant components respond to events in parallel, so they are all in the same state.
  - The response from only one component is used, and the rest are discarded.
  - When a failure occurs, downtime is usually negligible because the backup is current; the only switching time is the recovery time.
- Passive redundancy
  - One component (the primary) responds to events and informs the other components (the secondaries) of the state updates they must make.
  - When a failure occurs, the system must first ensure that the backup state is sufficiently recent before resuming services.
- Spare
  - A standby spare computing platform is configured to replace many different failed components.
- Shadow operation
  - A previously failed component may be run in "shadow mode" for a short time to make sure that it mimics the behaviour of the working components before it is restored to service.
- State resynchronisation
  - The passive and active redundancy tactics require the component being restored to have its state upgraded before its return to service.
- Checkpoint/Rollback
  - A checkpoint is a record of a consistent state, created either periodically or in response to specific events.
- Removal from service
  - This tactic removes a component of the system from operation so that it can undergo activities that prevent anticipated failures.
- Transaction
  - A transaction bundles several sequential steps such that the entire bundle can be undone at once.
- Process monitor
  - Once a fault in a process has been detected, a monitoring process can delete the non-performing process and create a new instance of it, initialised to some appropriate state as in the spare tactic.
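The voting tactic reduces to a majority decision over the values produced by redundant processes. The sketch below is a minimal majority voter, assuming each replica's output is a single hashable value; the function name `vote` and the replica names are hypothetical. It returns the agreed value together with the deviant replicas, which the system would then fail.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def vote(outputs: Dict[str, Hashable]) -> Tuple[Hashable, List[str]]:
    """Majority voter over replicas that computed a value from equivalent input.

    Returns the agreed value and the replicas whose output deviated from it.
    Raises RuntimeError if no strict majority exists (the fault cannot be masked).
    """
    if not outputs:
        raise ValueError("no replica outputs to vote on")
    value, count = Counter(outputs.values()).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: cannot mask the fault")
    deviant = [name for name, out in outputs.items() if out != value]
    return value, deviant


# Three redundant processes; replica "p3" deviates and would be failed.
result, failed = vote({"p1": 42, "p2": 42, "p3": 41})
print(result, failed)  # 42 ['p3']
```

With three replicas the voter can mask one faulty process; with only two it can detect a disagreement but cannot tell which replica is wrong, hence the exception.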
3.1.7 Checklist for Availability Design & Analysis
3.2 Interoperability
- Interoperability is about the degree to which two or more systems can usefully exchange meaningful information via interfaces in a particular context. This involves:
  - The ability to exchange data (syntactic interoperability)
  - The ability to correctly interpret the data (semantic interoperability)
- Interoperability needs to identify with whom, with what, and under what circumstances (the context) systems interoperate.
- Interface (interaction), illustrated by a chain of relayed messages:
  - "Charlene said that Kim told her that Trevor heard that Heather wants to come to your party."
- Two important aspects of interoperability:
  - Discovery: the consumer of a service must discover the location, identity, and interface of the service.
  - Handling of the response: the service either
    - reports back to the requester with the response,
    - sends its response on to another system, or
    - broadcasts its response to any interested parties.
3.2.1 Interoperability General Scenario
3.2.2 Interoperability Sample Scenario
3.2.3 Interoperability Tactics
- Locate
  - Discover service: locate a service by searching a known directory service.
    - There may be multiple levels of indirection.
- Manage interfaces
  - Orchestrate: use a control mechanism to coordinate, manage, and sequence the invocation of particular services.
  - Tailor interface: add capabilities to, or remove them from, an interface.
- Orchestrate in practice: a single request may involve several services, which must be invoked in a particular order to handle the request.
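The two tactics can be combined in a small sketch: consumers locate services by name through a directory, and an orchestrator invokes them in a fixed order, passing each result on to the next. Everything here is hypothetical and in-process: `DirectoryService`, `orchestrate`, and the three order-handling services are illustrative stand-ins for remote services and a real service registry.

```python
from typing import Callable, Dict, List

Service = Callable[[dict], dict]


class DirectoryService:
    """A toy directory: consumers look services up by name at run time."""

    def __init__(self) -> None:
        self._registry: Dict[str, Service] = {}

    def register(self, name: str, service: Service) -> None:
        self._registry[name] = service

    def locate(self, name: str) -> Service:
        try:
            return self._registry[name]
        except KeyError:
            raise LookupError(f"service {name!r} not found") from None


def orchestrate(directory: DirectoryService, steps: List[str], request: dict) -> dict:
    """Invoke the named services in order, feeding each one's output to the next."""
    payload = dict(request)
    for name in steps:
        payload = directory.locate(name)(payload)
    return payload


# Hypothetical services involved in handling one order request.
directory = DirectoryService()
directory.register("validate", lambda r: {**r, "valid": r["qty"] > 0})
directory.register("price", lambda r: {**r, "total": r["qty"] * 10})
directory.register("bill", lambda r: {**r, "billed": r["valid"]})

print(orchestrate(directory, ["validate", "price", "bill"], {"qty": 3}))
# {'qty': 3, 'valid': True, 'total': 30, 'billed': True}
```

Because the orchestrator only knows service names, a service can be replaced by registering a different implementation under the same name, without changing the orchestration logic.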
3.2.4 Checklist for Interoperability Design & Analysis
3.3 Modifiability
- Modifiability deals with change and the cost in time or money of making a change, including the extent to which this modifiability affects other functions or quality attributes.
- There is a cost of preparing for change as well as a cost of making a change.
- Four questions to plan for modifiability:
  - What can change?
  - What is the likelihood of the change?
  - When is the change made, and who makes it?
  - What is the cost of the change?
- If fewer changes than expected come in, then an expensive modification mechanism may not be warranted.
- For $N$ expected changes, compare $N \times C_{\text{without}} \le C_{\text{install}} + N \times C_{\text{with}}$, where $C_{\text{without}}$ is the cost of making a change without the mechanism, $C_{\text{install}}$ is the cost of installing the mechanism, and $C_{\text{with}}$ is the cost of making a change using the mechanism. If this inequality holds, installing the mechanism does not pay off.
- The cost saved this way can be invested in improving modifiability where it matters.
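The cost comparison above can be turned into a break-even calculation: the mechanism pays off once $N > C_{\text{install}} / (C_{\text{without}} - C_{\text{with}})$. The sketch below assumes costs in a single unit (e.g., person-days); the function names and the example figures are hypothetical.

```python
def mechanism_pays_off(n_changes: int, cost_without: float,
                       cost_install: float, cost_with: float) -> bool:
    """True when installing a modifiability mechanism is cheaper overall.

    The mechanism is not worth it while
        N * cost_without <= cost_install + N * cost_with
    """
    return n_changes * cost_without > cost_install + n_changes * cost_with


def break_even_changes(cost_without: float, cost_install: float, cost_with: float) -> float:
    """Number of changes beyond which the mechanism pays off."""
    if cost_with >= cost_without:
        raise ValueError("the mechanism never pays off if it does not make changes cheaper")
    return cost_install / (cost_without - cost_with)


# Hypothetical costs in person-days: 5 per change by hand,
# 20 to build the mechanism, 1 per change once it exists.
print(break_even_changes(5, 20, 1))     # 5.0
print(mechanism_pays_off(4, 5, 20, 1))  # False
print(mechanism_pays_off(6, 5, 20, 1))  # True
```

The answer hinges on $N$, which is why the four planning questions (especially the likelihood of change) come before choosing a mechanism.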
3.3.1 Modifiability General Scenario
3.3.2 Modifiability Sample Scenario
3.3.3 Modifiability Tactics
- Reduce size of a module:
  - Split module: if the module being modified includes a great deal of capability, the modification costs will likely be high (keep modules as small as practical).
- Increase cohesion:
  - Increase semantic coherence: if responsibilities A and B in a module do not serve the same purpose, they should be placed in different modules, either by creating a new module or by moving a responsibility to an existing module.
- Reduce coupling:
  - Encapsulate: encapsulation introduces an explicit interface to a module and reduces the probability that a change to one module propagates to other modules.
  - Use an intermediary: an intermediary breaks a dependency, since components communicate through the intermediate component instead of directly with each other.
  - Refactor: when two modules are affected by the same change, refactor them (this is refactoring at the architectural level, which differs from code refactoring).
- Defer binding: bind the values of some parameters at a different phase in the life cycle than the one in which they are initially defined.
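Defer binding can be illustrated by choosing an implementation when configuration is read rather than hard-coding it in the client. In this minimal sketch, assuming a configuration dictionary supplied at start-up, clients depend only on the `Storage` interface (encapsulation), and `make_storage` binds the concrete class late. All class and key names are hypothetical.

```python
from typing import Callable, Dict, Protocol


class Storage(Protocol):
    """The explicit interface clients depend on."""
    def save(self, key: str, value: str) -> None: ...
    def load(self, key: str) -> str: ...


class MemoryStorage:
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class UpperCaseStorage(MemoryStorage):
    """A second, hypothetical implementation, to show that the binding can vary."""

    def save(self, key: str, value: str) -> None:
        super().save(key, value.upper())


# The set of available implementations is fixed when the code is written...
IMPLEMENTATIONS: Dict[str, Callable[[], Storage]] = {
    "memory": MemoryStorage,
    "upper": UpperCaseStorage,
}


def make_storage(config: Dict[str, str]) -> Storage:
    # ...but which one is used is bound only when the configuration is read.
    return IMPLEMENTATIONS[config.get("storage", "memory")]()


store = make_storage({"storage": "upper"})
store.save("greeting", "hello")
print(store.load("greeting"))  # HELLO
```

Switching implementations now requires a configuration change rather than a code change, which is the trade-off described under binding time decisions: more flexibility, paid for with an extra layer of indirection.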
3.3.4 Checklist for Modifiability Design and Analysis
3.4 Performance
- Performance is about time and the software system's ability to meet timing requirements (how much work it can do per unit of time).
- All systems have performance requirements, even if they are not explicitly expressed.
- Two basic contributors to the response time:
  - Processing time (when the system is working to respond)
  - Blocked time (when the system is unable to respond)
3.4.1 Performance General Scenario
3.4.2 Performance Sample Scenario
3.4.3 Tactics for Performance
- On the demand side:
  - Manage sampling rate (reduce the sampling frequency).
  - Limit event response: when discrete events arrive at the system too rapidly to be processed, the events must be queued until they can be processed.
  - Prioritize events if not all events are equally important.
  - Reduce overhead: intermediaries increase the resources consumed in processing an event stream, so removing them reduces overhead.
- On the resource side:
  - Increase resources (faster processors, additional memory, faster networks, ...).
  - Introduce concurrency if requests can be processed in parallel.
  - Maintain multiple copies of computations: use a load balancer to assign new work to one of the available duplicate servers.
  - Maintain multiple copies of data:
    - Caching
    - Data replication
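The demand-side tactics "limit event response" and "prioritize events" can be combined in a bounded priority queue. The sketch below uses Python's `heapq`, with lower numbers meaning more important; dropping the least important event when the queue is full is only one possible policy and is an assumption of this example, as are the class name and event names.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class BoundedPriorityQueue:
    """Queue events that arrive faster than they can be processed.

    Lower `priority` numbers are more important. When the queue is full,
    the least important (and, among equals, newest) event is dropped.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order within a priority
        self.dropped: List[str] = []

    def offer(self, priority: int, event: str) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), event))
        if len(self._heap) > self.capacity:
            worst = max(self._heap)     # O(n), acceptable for a sketch
            self._heap.remove(worst)
            heapq.heapify(self._heap)   # restore the heap invariant after removal
            self.dropped.append(worst[2])

    def take(self) -> Optional[str]:
        return heapq.heappop(self._heap)[2] if self._heap else None


q = BoundedPriorityQueue(capacity=2)
q.offer(2, "log rotation")
q.offer(0, "fire alarm")
q.offer(1, "user click")              # queue full: "log rotation" is dropped
print(q.take(), q.take(), q.dropped)  # fire alarm user click ['log rotation']
```

Bounding the queue keeps latency predictable under overload, at the price of losing low-priority events; whether that is acceptable is a requirement-level decision.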
3.4.4 Checklist for Performance Design and Analysis
3.5 Security
- Security measures the system's ability to protect data and information from unauthorized access while still providing access to people and systems that are authorized.
- Three characteristics of security (CIA):
  - Confidentiality: data and services are protected from unauthorized access.
  - Integrity: data and services are not subject to unauthorized manipulation.
  - Availability: the system will be available for legitimate use.
3.5.1 Security General Scenario
3.5.2 Security Sample Scenario
3.5.3 Security Tactics
- Detect intrusion by comparing network traffic or service request patterns within a system to a set of signatures or known patterns.
- Detect service denial.
- Verify message integrity using checksums or hash values.
- Identify actors: identify the source of any external input to the system.
- Authenticate actors: ensure that actors are who or what they purport to be.
- Authorize actors: ensure that only actors with the rights to access and modify data or services can do so.
- Limit access to computing resources.
- Limit exposure by minimizing the attack surface of the system.
- Encrypt data.
- Revoke access to sensitive resources when an attack is underway.
- Authenticate verifies identity; authorize grants access rights.
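For "verify message integrity", a plain hash detects accidental corruption, while a keyed hash (HMAC) also detects deliberate manipulation by anyone who lacks the key. The sketch below uses Python's standard `hmac` and `hashlib` modules; the shared key and the message are hypothetical, and real keys would come from a key store rather than source code.

```python
import hashlib
import hmac

SECRET_KEY = b"example-shared-secret"  # hypothetical; never hard-code real keys


def sign(message: bytes, key: bytes = SECRET_KEY) -> str:
    """Compute an HMAC-SHA256 tag that travels with the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, tag: str, key: bytes = SECRET_KEY) -> bool:
    """Recompute the tag and compare in constant time to detect manipulation."""
    return hmac.compare_digest(sign(message, key), tag)


msg = b"transfer 100 to alice"
tag = sign(msg)
print(verify(msg, tag))                       # True
print(verify(b"transfer 900 to alice", tag))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many characters of a forged tag were correct.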
3.5.4 Checklist for Security Design and Analysis
3.6 Testability
- Testability refers to the ease with which software can be made to demonstrate its faults through (typically execution-based) testing.
- For a system to be properly testable, it must be possible to control each component's inputs and then observe its outputs.
3.6.1 Testability General Scenario
3.6.2 Testability Sample Scenario
3.6.3 Testability Tactics
- Control and observe system state: maintain some sort of state information, allow testers to assign a value to that state information, and/or make that information accessible to testers on demand.
  - Specialized interfaces allow you to control or capture values for a component.
  - Record/playback: record the state that caused a fault so that the fault can be re-created.
  - Sandboxing isolates an instance of the system from the real world, enabling experimentation without worrying about having to undo its consequences.
- Limit complexity: complex software is harder to test, because its operating state space is very large and it is more difficult to re-create an exact state in a large state space.
  - Limit structural complexity: avoid, reduce, or resolve dependencies between components; isolate and encapsulate dependencies on the external environment.
    - Limit the number of classes from which a class is derived.
    - Limit the depth of the inheritance tree and the number of children of a class.
    - Limit polymorphism and dynamic calls.
  - Limit nondeterminism, i.e., limit behavioural complexity.
    - Nondeterministic systems are harder to test.
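Several of these tactics meet in one common technique: inject an external dependency (here, the clock) instead of hard-wiring it. The tester can then control the input and observe the state, and the nondeterminism of real time disappears from the test. In this sketch, `SessionManager`, its `snapshot` test interface, and `FakeClock` are hypothetical names.

```python
import time
from typing import Callable, Dict


class SessionManager:
    """A component whose dependency on the real clock is injected, not hard-wired."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # isolated dependency on the external environment
        self._started: Dict[str, float] = {}

    def start(self, user: str) -> None:
        self._started[user] = self.clock()

    def is_active(self, user: str) -> bool:
        started = self._started.get(user)
        return started is not None and self.clock() - started < self.ttl

    def snapshot(self) -> Dict[str, float]:
        # Specialized test interface: expose internal state for observation.
        return dict(self._started)


class FakeClock:
    """Test double that lets the tester set the current time."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
sessions = SessionManager(ttl_seconds=60, clock=clock)
sessions.start("alice")
clock.now = 59
print(sessions.is_active("alice"))  # True
clock.now = 61
print(sessions.is_active("alice"))  # False
```

In production the default `time.time` is used unchanged; only tests substitute the fake clock.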
3.6.4 Checklist for Testability Design and Analysis
3.7 Usability
- Usability is concerned with how easy it is for the user to accomplish a desired task and the kind of user support the system provides.
- Usability comprises the following aspects:
  - Learning system features
  - Using a system efficiently
  - Minimizing the impact of errors
  - Adapting the system to the user's needs
  - Increasing confidence and satisfaction
3.7.1 Usability General Scenario
3.7.2 Usability Sample Scenario
3.7.3 Usability Tactics
- Support user initiative: support the user in either correcting errors or being more efficient.
  - Cancel
  - Undo: the system must maintain a sufficient amount of system state so that an earlier state may be restored.
  - Pause/resume when a user has initiated a long-running operation.
  - Aggregate: group lower-level objects into a single group so that an operation may be applied to the whole group.
- Support system initiative: identify the models the system uses to predict either its own behaviour or the user's intention.
  - Maintain task model: determine context so the system can have some idea of what the user is attempting and provide assistance.
  - Maintain user model: represent the user's knowledge of the system; such a model can be learned from the user's behaviour.
  - Maintain system model: determine expected system behaviour so that appropriate feedback can be given to the user.
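The undo tactic requires the system to keep enough earlier state to restore it. The sketch below keeps a bounded history stack of previous values; the class name `UndoableValue` and the `max_history` limit are hypothetical, and storing full snapshots (rather than reversible commands) is just one way to implement it.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class UndoableValue(Generic[T]):
    """Keeps enough earlier state that previous values can be restored."""

    def __init__(self, initial: T, max_history: int = 100) -> None:
        self.value = initial
        self.max_history = max_history
        self._history: List[T] = []

    def set(self, new_value: T) -> None:
        self._history.append(self.value)
        if len(self._history) > self.max_history:
            self._history.pop(0)  # bound the memory spent on history
        self.value = new_value

    def undo(self) -> bool:
        """Restore the previous value; return False if there is nothing to undo."""
        if not self._history:
            return False
        self.value = self._history.pop()
        return True


doc = UndoableValue("")
doc.set("Hello")
doc.set("Hello, world")
doc.undo()
print(repr(doc.value))  # 'Hello'
doc.undo()
print(repr(doc.value))  # ''
print(doc.undo())       # False
```

`max_history` makes "a sufficient amount of system state" an explicit trade-off between how far users can go back and how much memory the history consumes.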