Iftekhar Hussain
1 Limited Scope
Previous reviews named a number of positive points, so I will not repeat those.
Surprisingly, they miss the main issue I have with this book: it is limited in scope, delivering less than the title or cover text suggests. It covers the specific issue of non-stop forwarding and the routing protocol support for it.
Other important building blocks of highly available networks such as MPLS Fast-Reroute are only sketched in the last chapter.
A reasonable question to ask is: given the protocol complexity, the fact that software failure is the main cause of outages, and the "blackhole" period while the control plane restarts, which is more reliable: a network structure with graceful restart, or one that deploys techniques like FRR or BFD to immediately reroute around a failed node?
The book unfortunately does not provide much material to answer this.
I would also have liked to see more implementation guidance. Adjusting protocol parameters such as timers is an important part of improving operations. A book claiming to be a comprehensive guide to deploying high availability should discuss those, but this one does not.
2 Excellent reference for Cisco high availability
Reviewer: Rik Guyler, Senior Network Engineer
Cisco Certifications Held: CCNA, CCDA, CCNP, CCDP
Iftekhar Hussain's Fault-Tolerant IP and MPLS Networks is worthy of consideration by anybody who works in a Service Provider environment. This reference, while not a configuration guide by any means, provides insight into some of the design considerations of a Service Provider network.
The things I like about this book:
I found this book reasonably easy to read. The information was given in a clear and concise manner without a lot of fluff. With only 316 pages (including index), there was very little written that was not important to the topic. After having read dozens of technical books, this was a refreshing change of pace from the 1000+ page tomes that contain no more real content than this book does.
The manner in which the book was written and published was very tidy and neat. It's been rather typical in my experience to find technical books riddled with errors but this book was not. In fact, I cannot recall seeing one error or maybe more accurately, I cannot recall catching one. Either way, this book was cleanly published.
I will not go into too much detail about the content as the name of the book speaks for itself. I will say that the content is right on track for what it claims and that it provides a realistic look at various fault-tolerant mechanisms and practices designed for the Service Provider network. Not having a foundation in MPLS, I definitely learned where MPLS fits into the big picture with regard to high availability networks. Once again, this book is not a low-level technical reference that will guide you through configuring your Cisco routers. It offers strictly a Network Architect's view of the technologies mentioned.
The things I do not like about this book:
I cannot honestly say that there is much I do not like about this book. I chose this book thinking that there would be more low level technical information such as configuration guidelines for the various technologies so I was somewhat disappointed that there was nothing like this. However, after reading this book, it clearly is a design and concepts book so I would have liked it to have said "Design" only rather than "Design and deploy" on the cover. A picky complaint for sure but valid nonetheless.
It is also worthwhile to mention (not a complaint) that while this book includes routing protocols such as BGP, OSPF and ISIS, it does not get into the functionality of these protocols except to discuss how they fit into a high availability design. The same can be said about MPLS. If you want detailed information on BGP, OSPF, ISIS or MPLS then you will need to look to other reference materials.
I highly recommend this book for everybody who wishes to learn more about some of the high availability features offered to Service Providers by Cisco.
3 Stirring the Protocol Soup
This book describes a great many of the common scenarios involved in achieving resiliency in IP/MPLS core networks. With a heavy migration to converged, multiservice IP networks underway, network availability and reliability are extremely critical. Today's IP infrastructure continues to serve the needs of legacy applications along with new service offerings such as VoIP and other premium IP services that are rated based on performance (such as latency, jitter, and carrier-class availability). To deliver value-added IP services that are backed by Service Level Agreements (SLAs), the service provider's network must attain a very high level of service availability.
Triple play applications have carrier-class requirements that include rigorous uptime requirements more commonly associated with diverse transport networks that may be SONET/SDH supporting ATM or frame relay. Expected of both legacy and emerging applications, these features include nonstop forwarding (NSF), graceful restart (GR), in-service software upgrade (ISSU), bidirectional forwarding detection (BFD), and traffic engineering (TE) with fast reroute (FRR). Failover times for redirected traffic are expected to be on the order of 50 ms for applications such as broadcast video, and the latency and jitter requirements of voice have only recently been associated with the capabilities of a routed infrastructure such as IP/MPLS. This book illustrates how to integrate these demanding applications into service provider core networks.
MPLS is a technology that requires a clean separation of the forwarding and control planes; although earlier routing platforms did not have this separation, modern routers do (interestingly, so does the book!). The control plane can be thought of as the routing engine; it maintains peer relationships, runs routing protocols, builds the routing table, and creates the forwarding table, which it then exports to the forwarding plane to send the packets to their output interfaces. After an initial chapter that explains how to design for IP/MPLS uptime, Part I deals with the forwarding planes for both IP and MPLS. The main issue in terms of resiliency here is NSF, which is very important in carrier environments. Software is a common cause of network element failure and it is a huge boon to carriers when a routing platform can forward packets even while the control plane is in trouble. Failure is not an option in the service provider community, because any false step can severely damage a service provider's hard-earned reputation for delivering quality services.
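The separation described above can be illustrated with a minimal Python sketch. This is a toy model and not anything taken from the book: the class names, the prefix, and the interface name `eth1` are all hypothetical, and the lookup is a plain exact match rather than a real longest-prefix match. It only shows why a forwarding plane that keeps its own copy of the table can go on forwarding while the control plane restarts.

```python
from dataclasses import dataclass, field


@dataclass
class ForwardingPlane:
    """Holds its own copy of the forwarding table (hypothetical model)."""
    fib: dict = field(default_factory=dict)

    def install(self, table):
        # Copy the table so later control-plane changes do not touch the FIB.
        self.fib = dict(table)

    def forward(self, prefix):
        # Exact-match lookup; returns the output interface or None.
        return self.fib.get(prefix)


@dataclass
class ControlPlane:
    """Builds the routing table and exports it to the forwarding plane."""
    rib: dict = field(default_factory=dict)

    def learn(self, prefix, interface):
        self.rib[prefix] = interface

    def export_to(self, forwarding_plane):
        forwarding_plane.install(self.rib)

    def restart(self):
        # A restart loses protocol state; routes must be relearned.
        self.rib.clear()


control = ControlPlane()
forwarding = ForwardingPlane()
control.learn("10.0.0.0/8", "eth1")
control.export_to(forwarding)

control.restart()
# The forwarding plane still answers from the preserved table.
during_restart = forwarding.forward("10.0.0.0/8")
print(during_restart)
```

In this model the preserved table is stale until the control plane relearns its routes and exports them again, which is the window that graceful restart procedures are meant to manage.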
Part II covers the IP/MPLS control plane, where IP routing protocols and MPLS signaling protocols are run. The interior routing protocols covered are ISIS and OSPF; these are the most common IGPs for service providers. For these, and in fact for all of the control plane protocols, the issue is how to minimize the effect of a restart on the router's operation. Things get tricky here because Cisco routers may have very different characteristics in terms of being capable of NSF or GR, which is the ability of a router to quickly reestablish adjacencies after restarting. For instance, if one router is incapable of either NSF or GR, then you need to consider what type of high availability you will have based on the resiliency features of adjacent routers. Similar issues exist for BGP, and there are additional considerations based on speaker behavior and the settings of the BGP attributes.
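The dependency on neighboring routers can be sketched in a few lines of Python. This is a deliberately simplified all-or-nothing model under my own assumptions, not the book's method: the function name and its boolean inputs are hypothetical, and real behavior is decided per protocol and per adjacency.

```python
def restart_is_graceful(restarting_router_nsf, neighbors_support_gr):
    """Return True only if the restarting router preserves its forwarding
    state (NSF) and every neighbor is able to assist with graceful restart."""
    return restarting_router_nsf and all(neighbors_support_gr)


# Restarting router is NSF-capable and both neighbors support GR.
print(restart_is_graceful(True, [True, True]))
# One neighbor lacks GR support, so that adjacency would be torn down.
print(restart_is_graceful(True, [True, False]))
# No NSF on the restarting router: there is no forwarding state to preserve.
print(restart_is_graceful(False, [True, True]))
```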
On the MPLS control plane, the signaling protocols are BGP, LDP, and RSVP. Of course, both MPLS and the signaling protocol will be running on the control plane, so the issues get a little more complex as Cisco routers may vary as to whether both control plane protocols will support GR. These considerations have to be combined with the question of whether adjoining routers support NSF. There are similar issues for both LDP and RSVP-TE, and the book details ways to mitigate the effects of restarts on all of these protocols, as well as how to handle variances in the support of graceful restart or NSF on the routers in the topology.
Some key MPLS services are detailed in Part III, along with suggestions on how to ensure their high availability. Layer 2 and Layer 3 VPNs are discussed, both in point-to-point and point-to-multipoint modes. MPLS provides TE mechanisms to find backup paths for these critical services. For some real-time services, such as voice or video running over these VPNs, fast reroute with failover on the order of 50 ms may be required. Core routing vendors have other mechanisms to speed up link failure detection and failover; one example is BFD. This section details these mechanisms as well as other ways to effect restoration of layers 1-3 based on MPLS signaling.
This book covers most of the many scenarios that you need to consider in the service provider environment for high availability. Part of the reason why there are so many situations to consider is that it is fairly new for higher-layer protocols such as IP/MPLS to perform the instant failover that has traditionally been thought of as the purview of telco-quality Layer 1 services such as SONET/SDH. For working its way through the nuances of this protocol soup and the varied support there may be for failure characteristics from one router to the next, I would give this book five stars.
4 Comprehensively Covers a Major Point of Failure
This book should be of great interest to network engineers, system engineers, and technical managers who want to understand the fault-tolerant aspects of packet-switched networks and how they can effectively design and deploy high-availability IP/MPLS networks.
The cost of low availability is well known. Amazon announced that on one day just before Christmas they had averaged approximately 30 orders per second. At that rate, a minute of downtime on their system would have resulted in the loss (hopefully just a delay) of about 1,800 orders.
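The downtime estimate is simple arithmetic: orders affected equals the order rate multiplied by the length of the outage. A short Python check, where the function name is my own and the inputs are the 30 orders per second and one minute quoted above:

```python
def orders_affected(orders_per_second, downtime_seconds):
    """Orders lost or delayed during an outage at a steady order rate."""
    return orders_per_second * downtime_seconds


one_minute = orders_affected(30, 60)
print(one_minute)
```

This assumes the order rate stays constant for the whole outage, which a daily average can only approximate.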
This book is concerned not with the basic processing elements of a system, but with the edge and core routers that connect between the external communications system and the processing elements. It describes the theory, the accepted standards, and the practical aspects of designing a system that will continue to function despite failure of hardware or software components in the system.