Architecting Highly Available CompactPCI Systems
      
2003/04/04
High Availability is an overused term in today's marketplace. Vendors have applied it to architectures as simple as redundant power supplies and as complicated as fully redundant systems, which raises the question: what is High Availability? It is easiest to think of High Availability as an increase in the availability of a system or, equivalently, a decrease in its downtime. Many of today's telecommunications systems require 5NINES availability, or 99.999% uptime. The downtime allowed in these systems is about 5.26 minutes per year (525,600 minutes/year x (1 - 0.99999)). That budget covers scheduled maintenance as well as any downtime that results from the failure of any part of the system. Designing systems capable of 5NINES availability generally requires that every function in the system be redundant, so that there is no single point of failure. The road to High Availability usually starts with redundant power supplies, redundant fan trays, and mirrored hard drives. Each of these redundant components decreases the probability that a single component failure will cause a system failure, so the system becomes more highly available. As you might expect, adding redundancy to power supplies, fans, and hard drives is relatively straightforward. Providing redundant compute elements in a system is a more complicated challenge.
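The arithmetic behind the 5NINES budget, and the effect of adding a redundant component, can be sketched in a few lines of Python. This is only an illustrative model: the function names are invented for this example, the 0.999 unit availability is a hypothetical figure, and the redundancy formula assumes that failures are independent, which real hardware only approximates.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, the figure used in the text


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability given as a fraction (0.0 to 1.0)."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def redundant_availability(unit_availability: float, copies: int) -> float:
    """Availability of `copies` identical units when any single unit can carry the load.

    Assumes independent failures: the group is down only when every copy is down.
    """
    return 1.0 - (1.0 - unit_availability) ** copies


if __name__ == "__main__":
    # 5NINES budget in minutes per year
    print(round(downtime_minutes_per_year(0.99999), 2))
    # Hypothetical pair of power supplies, each 99.9% available
    print(redundant_availability(0.999, 2))
```

Under these assumptions, pairing two 99.9% units moves that function from three nines to roughly six, which is why duplicating simple components such as power supplies is usually the first step.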
        
        Application of CompactPCI to High Availability Applications
Developers have been applying PICMG 2.0 CompactPCI Specification compliant 
        systems to a variety of High Availability applications over the years. 
        As the market requirements for High Availability have increased, CompactPCI 
        systems have had to evolve to meet the new challenges. The original CompactPCI 
        systems were simple bus-based architectures. Figure 1 shows a typical 
        first-generation CompactPCI architecture.
      
        
PICMG 2.0 CompactPCI compliant systems are composed of one or more CompactPCI bus segments. Each segment can contain up to eight board slots: one System Slot and up to seven Peripheral Slots. The PCI bus is the primary communication path between the slots in each bus segment. In this architecture the PCI bus and the System Slot are single points of failure. A misbehaving board in a Peripheral Slot can bring down the entire PCI bus segment, preventing communication between any of the slots. This single point of failure was a significant obstacle to the adoption of CompactPCI in High Availability applications.

Early architects of CompactPCI High Availability systems had to overcome this limitation. The typical solution was to add a second CompactPCI bus segment and duplicate the functionality in both bus segments. Figure 2 shows an example of a dual CompactPCI bus-based architecture.
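To see why the single points of failure in a single-segment design matter, it can help to model the segment as a series chain in which every element must work. The sketch below is a simplified illustration: the function name and the availability figures are hypothetical, and failures are assumed to be independent.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every element must be working.

    Assumes independent failures. The product can never exceed the
    availability of the weakest element in the chain.
    """
    return prod(availabilities)


if __name__ == "__main__":
    # Hypothetical figures for illustration only, not measured values.
    system_slot = 0.9999
    pci_bus = 0.9999
    print(series_availability([system_slot, pci_bus]))
```

In this model, two elements that each reach four nines combine to slightly less than four nines, so a segment that depends on both the System Slot and the PCI bus cannot be more available than either one of them.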

        
In Figure 2, dual bus segments and dual System Slots provide redundancy for the single points of failure that exist in standard CompactPCI architectures. In the dual-segment architecture, each System Slot can control either of the two PCI bus segments, so the failure of either System Slot can be compensated for by the other. This architecture also covers a fault in a PCI bus: if a fault occurs in PCI Bus 1, PCI Bus 2 is available to handle the task. The engineering challenges of this kind of architecture are considerable. The System Slot provides clocks, arbitration, and interrupt servicing for a bus segment. A System Slot failover therefore requires that the clock drivers, request/grant arbitration, and interrupt controllers all transfer to the newly active System Slot. Detecting that a bus has failed and then bringing up the redundant System Slot without affecting total system availability is difficult.

In 1999, PICMG formed a subcommittee to standardize an implementation of Redundant System Slots. The PICMG 2.13 Redundant System Slot specification was abandoned three years later. PICMG 2.13 is the only subcommittee that was disbanded without completing a specification, largely because of the complexity of the problem and the proprietary solutions that already exist. Redundant System Slots in CompactPCI can clearly be used to increase system availability, but at a cost and a level of complexity that are prohibitive. Vendors that provide this type of architecture are selling proprietary solutions, not open architectures.
        
        Adding IP Data Transport to CompactPCI
In September 2001, PICMG approved the PICMG 2.16 Packet Switched Backplane specification. This specification defines 10/100/1000 Mbit/s Ethernet interconnects between Peripheral Slots and Fabric Slots in a CompactPCI segment. The Fabric Slots are redundant. PICMG 2.16 compliant systems have been deployed in a variety of applications, and the ubiquity of Ethernet together with the need for IP data transport has led to high levels of adoption among system providers. Figure 3 shows a typical PICMG 2.0 and 2.16 architecture.

In PICMG 2.16 compliant systems the IP data transport can be used as the primary communications channel within the system. This communications path has redundant links to redundant Fabric Slots. The PICMG 2.16 specification allows an architect to avoid using the CompactPCI bus altogether, and provides a way of increasing system availability without increasing the cost of the system. The PICMG 2.16 data transport is inherently redundant: neither a single link nor a single Fabric Slot is a single point of failure. The Ethernet fabric is also a convenient way to handle the packet-based data transport found in next-generation applications.
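How a board might use its redundant links can be sketched as a simple selection rule: stay on the current link while it is healthy, and otherwise switch to the other Fabric Slot. This is a minimal sketch and not something defined by PICMG 2.16. The `FabricLink` class, the `healthy` flag, and the link names are hypothetical, and a real system would derive link health from the Ethernet driver or a heartbeat protocol.

```python
from dataclasses import dataclass


@dataclass
class FabricLink:
    """One Ethernet link from a Peripheral Slot to a Fabric Slot (hypothetical model)."""
    name: str
    healthy: bool = True


def select_active_link(links, current=None):
    """Keep the current link while it is healthy; otherwise pick the first healthy link.

    Returns None when every link is down.
    """
    if current is not None and current.healthy:
        return current
    for link in links:
        if link.healthy:
            return link
    return None


if __name__ == "__main__":
    link_a = FabricLink("fabric-a")
    link_b = FabricLink("fabric-b")
    active = select_active_link([link_a, link_b])
    link_a.healthy = False  # simulate loss of the link to Fabric Slot A
    active = select_active_link([link_a, link_b], current=active)
    print(active.name)  # fabric-b
```

Keeping the current link while it remains healthy avoids needless switching when a failed link recovers. Whether to fail back automatically is a design choice that this sketch leaves open.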
        
The next step in the evolution of highly available CompactPCI systems is the removal of the System Slot. As applications take advantage of the IP interconnects in today's systems, the PCI bus is becoming an unused expense. PICMG is working on a specification called CompactTCA. The CompactTCA specification is expected to combine the system management capabilities defined in AdvancedTCA (PICMG 3.0), the form factor defined in PICMG 2.0, and the data transport defined in PICMG 2.16. This architecture will not contain a PCI bus. A system of this kind will be able to support 24 Peripheral Slots and two Fabric Slots. Eliminating the PCI bus will reduce the cost of the boards used in CompactPCI systems, remove the complexity of providing redundant System Slots, and increase the total slot count. Figure 4 shows an example of a possible CompactTCA system.

Summary
The PICMG 2.16 Packet Switched Backplane is a viable way to improve the availability of systems built today. Eliminating the single points of failure found in first-generation CompactPCI systems and adding redundant data transports provide the building blocks necessary to achieve 5NINES availability. System designers should beware of vendors providing products based on proprietary Redundant System Slot architectures. These closed-architecture systems will not benefit from the CompactPCI ecosystem that exists today. CompactPCI systems using PICMG 2.16 Packet Switched Backplanes provide the combination of point-to-point data transports and redundancy necessary to achieve 5NINES availability, as well as a migration path to future technologies.
        
      
Contributed by ADLINK Technology; edited by CTI Forum