The evolution of systems that handle all types of media within the enterprise is inevitable. Along with the benefits of unified communications systems, however, there will be increasing concern about reliability. This is not the simple issue of whether or not IP-based systems can be made as reliable as their predecessors. Systems can generally be made as reliable as you need them to be – for a price. However, convergence puts more eggs in one basket, and IP systems create a tendency towards centralization. In response to these concerns, more attention is already being paid to the subject of resiliency, a term that has reached the status of bafflegab.
Marketing documents and articles are full of references to reliability, availability, resiliency, robustness, survivability and fault tolerance. Many of them do not adequately define or distinguish such terms. In some cases, they do not provide correct definitions. With the growing importance of these issues to IT and telecom managers, a little more rigour should be expected.
What is Resiliency?
Let’s briefly review the basics, without the mathematics. Reliability is a measure of the percentage uptime, considering the downtime due only to faults. Availability is a measure of the percentage uptime, considering the downtime due to faults and other causes such as planned maintenance. For two different systems, it is possible for one system to be more reliable but less available than the other. This would arise if, for example, the more reliable system required many changes or upgrades that caused significant downtime. Reliability and availability are the two most important measures in this context, and that will not change as a result of convergence. So what is all the fuss about those other terms? Behind the marketing noise lies a genuine issue, namely how systems and networks achieve high reliability and availability.
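For readers who do want a little arithmetic, the distinction can be illustrated with a minimal sketch. The two systems and their downtime figures are invented purely for illustration, and the function names are my own; the calculation simply applies the definitions above over one year of operation.

```python
HOURS_PER_YEAR = 8760.0


def uptime_percent(total_hours, downtime_hours):
    """Percentage of the period during which the system was up."""
    return 100.0 * (total_hours - downtime_hours) / total_hours


def reliability(system):
    """Uptime percentage counting only downtime caused by faults."""
    return uptime_percent(HOURS_PER_YEAR, system["fault_hours"])


def availability(system):
    """Uptime percentage counting faults plus planned maintenance."""
    downtime = system["fault_hours"] + system["planned_hours"]
    return uptime_percent(HOURS_PER_YEAR, downtime)


# Hypothetical annual downtime figures (assumptions, not measured data).
system_a = {"fault_hours": 2.0, "planned_hours": 20.0}
system_b = {"fault_hours": 5.0, "planned_hours": 4.0}

for name, system in (("A", system_a), ("B", system_b)):
    print(f"System {name}: reliability {reliability(system):.3f}%, "
          f"availability {availability(system):.3f}%")
```

With these assumed figures, System A has fewer fault hours and so is the more reliable of the two, yet its heavier maintenance schedule makes it the less available.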
I will use resiliency as an example because it is frequently coupled with reliability in articles and vendors’ documents, almost as if the authors did not know which word to choose. Resiliency, in the context of this article, is the characteristic of being able to adapt under stress or faults in order to avoid failure. One type of resiliency aims to maintain full service and performance; another aims to fail gracefully by reducing services or performance. For example, a network under stress could block certain nodes or reduce the bandwidth available to all nodes. Resiliency, therefore, is an attribute that contributes to achieving the required reliability but is not an independent measure of it. It has gained more attention recently because IP networks have an inherent resiliency that their forerunners did not, which counters the claim that circuit switches are much more reliable than their IP counterparts. As a result, the battleground for product supremacy has shifted from reliability to resiliency.
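The two graceful-failure options mentioned above, blocking certain nodes or reducing bandwidth for all nodes, can be sketched in a few lines. This is only an illustration under simple assumptions: the node names, bandwidth demands, priority order and reduced capacity are all hypothetical, and a real network would apply far more elaborate policies.

```python
def degrade_by_blocking(demands, capacity, priority):
    """Serve nodes in priority order at full demand; block any that no longer fit."""
    allocation = {}
    remaining = capacity
    for node in priority:
        if demands[node] <= remaining:
            allocation[node] = demands[node]
            remaining -= demands[node]
        else:
            allocation[node] = 0.0  # node is blocked entirely
    return allocation


def degrade_by_scaling(demands, capacity):
    """Keep every node connected but shrink all allocations by the same factor."""
    total = sum(demands.values())
    if total <= capacity:
        return dict(demands)  # no stress: everyone gets full demand
    factor = capacity / total
    return {node: demand * factor for node, demand in demands.items()}


# Hypothetical demands in Mbit/s, and the capacity left after a fault.
demands = {"node_a": 40.0, "node_b": 80.0, "node_c": 30.0}
reduced_capacity = 100.0

blocked = degrade_by_blocking(demands, reduced_capacity,
                              priority=["node_a", "node_b", "node_c"])
scaled = degrade_by_scaling(demands, reduced_capacity)
print("Blocking:", blocked)
print("Scaling: ", scaled)
```

Under the first policy some nodes keep full performance while others lose service; under the second every node keeps service at reduced performance. Either way the network has adapted rather than failed outright, which is the sense of resiliency used here.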
The topic of reliability is becoming more cluttered with terminology, just when managers need more understanding. Clarifying the terminology will be the easy part, but it is an important prerequisite for dealing with the real issues. Managers should pay more attention to reliability as it becomes more important with the evolution of unified communications within the enterprise. They will need to ask old and new questions: Can I afford to risk putting all of my eggs in one basket? What constitutes the various types of failure? What are the trade-offs between costs, failures, reduced services and reduced performance? These concerns are much more worthy of a manager’s attention than struggling with terminology.
Ron Scott is principal of Scott & Associates. He can be reached at [email protected]