CHAPTER 1

Introduction to InterDomain Networks
This chapter contains an overview of IDN and information about Domain IP addresses, dynamic reconfiguration (DR), memory error handling, network-wide arbstops, and system commands and daemons.
Note - For information and procedures on how to configure IDNs, refer to the Sun Enterprise 10000 Domain Configuration Guide in the Solaris 8 6/00 on Sun Hardware Answerbook Collection.
The InterDomain Network (IDN) feature supports high-speed networking between dynamic system domains (or simply, domains) within a single Sun Enterprise 10000 platform. The IDN driver is a Data Link Provider Interface (DLPI) driver that allows domains to communicate with each other using standard networking interfaces, such as Transmission Control Protocol/Internet Protocol (TCP/IP). Unlike a conventional network, an IDN requires no cabling or special hardware.
IDNs take advantage of the Sun Enterprise 10000 hardware features that enable any set of resident domains to communicate among themselves over the system centerplane using shared memory. A shared memory region (SMR) is used as a conduit for network packets. The SMR is maintained in one domain in the IDN and is used by all other domains in that IDN.
There may be multiple, independent IDNs within a single Sun Enterprise 10000 platform. Each network can comprise multiple logical network interfaces or channels, with each channel representing a separate IP subnet. Configure the number of networks, and the domains that make up a particular network, based on the performance considerations of your applications. For example, consider which domains require high-speed connectivity and also have sufficient processing power to effectively take advantage of the InterDomain Networks feature.
IDNs can be used for many purposes. For example, IDNs can be used for the following reasons:
To link domains to an IDN or to create an IDN, use the domain_link(1M) command. The order in which you specify the domain names is not significant. For instructions on how to use the domain_link(1M) command, see "To Use the domain_link(1M) Command With Inactive Domains."
Whenever an argument to domain_link(1M) specifies a domain that is already part of an IDN, all other domains in that IDN are also linked by the domain_link(1M) command.
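The linking rule above can be pictured as a set merge. The following Python sketch is only a model of the membership bookkeeping, not the SSP implementation; the function name link and the representation of each IDN as a set of domain names are assumptions made for illustration.

```python
def link(idns, names):
    """Return the IDN list after linking `names` (hypothetical model).

    idns  -- list of disjoint sets, one set of domain names per IDN
    names -- domain names passed to the link request, in any order
    """
    merged = set(names)
    untouched = []
    for members in idns:
        if members & merged:
            # One named domain already belongs to this IDN, so every
            # member of that IDN joins the resulting network.
            merged |= members
        else:
            untouched.append(members)
    return untouched + [merged]
```

For example, linking domain_a and domain_c when each already belongs to a different IDN yields one network that contains the members of both.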
Note that when you link domains together in an IDN, each domain can communicate directly with the other domains in the network by using the shared memory region (SMR). There is no priority given to the domains based on the order in which they were added to an IDN.
Only one domain in an IDN is denoted as the master domain. The master domain maintains the SMR, which is used as a conduit for network traffic. For example, if domain_a is the master domain, domain_b and domain_c communicate with each other using the SMR maintained on domain_a.
When you create a new IDN out of two domains that do not belong to an existing IDN, the master domain is automatically chosen by the system. After this decision is made, the master domain cannot be changed unless you unlink the master domain or unless the master domain hangs and the network is automatically reconfigured to use an alternate master. An exception to this rule occurs when two existing IDNs are merged by using a single domain_link(1M) command. In this case, the system determines which domain from among the two current master domains will become the master domain for the new IDN.
The system chooses the master domain by determining which domain has the greatest processing power and the widest memory bandwidth, which is a function of how many system boards with memory are contained within a domain. The domain with the greatest overall capacity is used as the master domain because it has the responsibility of servicing IDN buffer requests on behalf of other domains.
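The selection can be sketched as picking the maximum under a capacity ranking. This guide does not state how processing power and memory bandwidth are weighted, so the ranking below (boards with memory first, then CPU count) is purely an assumption, as are the Domain class and the choose_master function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    name: str
    cpus: int            # stand-in for processing power (assumption)
    memory_boards: int   # number of system boards with memory


def choose_master(domains):
    """Pick the domain with the greatest overall capacity.

    The ordering key is illustrative only; the real weighting used by
    the system is not documented in this chapter.
    """
    return max(domains, key=lambda d: (d.memory_boards, d.cpus))
```

Any ranking that increases with both quantities would fit the description equally well; the tuple key is simply the shortest way to express one.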
To unlink a domain from an IDN, use the domain_unlink(1M) command, which accepts one or more domains as a parameter. When you unlink a domain, the system broadcasts a message to the remaining domains in the IDN to inform them that they should no longer attempt to communicate with the outgoing domain. Other domains in the network continue to communicate with each other without interruption, both during and after the unlink operation.
Although there is no particular order in which you must deconfigure an IDN link and its associated network interface(s), Sun suggests that you deconfigure the network interface by using the ifconfig(1M) command before you unlink the domain to prevent users from unnecessarily using the disconnected link.
By default, the system will not perform an unlink operation on an active domain if any domain within the same IDN is in an unknown (AWOL) state, such as halted or hung. The state of the domain is detected and reported when you perform the unlink operation.
You can use one of two force options, -f or -F, to bypass the check for domains in an unknown state and to force the unlink operation to proceed. With the soft force option, -f, the domain_unlink(1M) command first attempts to unlink all of the specified domains in the standard manner. If a time-out condition occurs because an AWOL domain is present within the IDN, the command then proceeds as if the -F option had been specified and forces the domain to be unlinked.
With the hard force option, -F, the domain_unlink(1M) command disconnects the specified domain from all of the other domains in the IDN and does so without synchronizing the disconnections. Use this option only when the specified domain is completely nonresponsive (that is, not responding to login requests) or when it must be isolated from the IDN as part of AWOL recovery.
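The default behavior and the two force options can be summarized as a small decision function. This Python sketch only restates the rules above; the function plan_unlink, its parameters, and the returned labels are hypothetical and do not correspond to any SSP interface.

```python
def plan_unlink(awol_present, force=None, timed_out=False):
    """Return how an unlink request would proceed (illustrative model).

    awol_present -- True if any domain in the IDN is in an unknown state
    force        -- None, "-f" (soft force) or "-F" (hard force)
    timed_out    -- True if the standard unlink attempt timed out
    """
    if force == "-F":
        # Hard force: disconnect without synchronizing.
        return "hard"
    if force == "-f":
        # Soft force: standard attempt first, hard unlink on time-out.
        return "hard" if timed_out else "standard"
    if awol_present:
        # Default: refuse while any member is AWOL.
        return "refused"
    return "standard"
```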
You can dismantle an entire IDN in a single operation, which isolates each domain that is a member of the IDN. Execute the domain_unlink(1M) command with at least n-1 names of the domains in the IDN, where n is the total number of domains within the IDN.
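The n-1 rule follows from the fact that a single remaining domain has no peer left to talk to. A minimal Python model, assuming an IDN is represented as a set of domain names and using a hypothetical unlink function:

```python
def unlink(members, names):
    """Return the IDN membership after unlinking `names` (model only).

    A network needs at least two members, so when only one domain
    would remain, that domain is isolated as well and the IDN is gone.
    """
    remaining = set(members) - set(names)
    return remaining if len(remaining) > 1 else set()
```

With three members, naming any two of them is enough to dismantle the network, while naming one leaves a two-domain IDN in place.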
The IDN subsystem, in conjunction with support from the SSP, can automatically link and unlink domains. Automatic linking occurs at boot time if the domain has been configured as part of the IDN. Automatic unlinking occurs when one or more IDN members detect and report that another IDN member is not responding to IDN requests. If the master domain is nonresponsive, a new master domain will be elected from the available domains after the master is unlinked. Although the domain is automatically unlinked, the domain_status(1M) command still reports the domain as being linked.
DR operations work on individual domains within an IDN. The IDN traffic to and from the target domain is paused for only a brief period of time while DR operations are executed on the domain.
When you attach a board to a domain that is part of an IDN, the following sequence of actions occurs:
1. You perform the Init Attach operation.
2. You select the Complete Attach operation, at which point DR unlinks the domain to which the board is being attached from the IDN. DR saves the IDN configuration information internally so that DR can relink the domain after the Complete Attach operation.
3. DR then performs the Complete Attach operation.
4. After the Complete Attach operation completes successfully, DR relinks the domain to the IDN.
During a Detach operation, the following sequence of actions occurs:
1. You perform the Drain operation.
2. After the Drain operation has completed and you have selected the Complete Detach operation, DR unlinks the domain from the IDN. DR saves the IDN configuration information internally so that it can automatically relink the domain after the Complete Detach operation.
3. DR then performs the Complete Detach.
4. After the Complete Detach operation, DR relinks the domain to the IDN.
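Both sequences share the same pattern: save the IDN configuration, unlink, run the DR step, then relink. The Python sketch below models that pattern only; complete_dr, the set-based membership, and the log strings are assumptions for illustration. This chapter does not describe what happens if the DR step fails, so the sketch simply lets an exception propagute and performs no relink in that case.

```python
def complete_dr(idn, domain, operation):
    """Run one DR Complete Attach or Complete Detach step (model only).

    idn       -- set of domain names in the IDN (modified in place)
    domain    -- the domain on which the DR operation runs
    operation -- callable standing in for the DR work itself
    """
    saved = set(idn)                  # DR saves the IDN configuration
    log = []
    idn.discard(domain)               # DR unlinks the target domain
    log.append(f"unlink {domain}")
    operation()                       # Complete Attach or Complete Detach
    log.append("complete")
    idn.clear()
    idn.update(saved)                 # DR relinks from the saved state
    log.append(f"relink {domain}")
    return log
```

The time between the unlink and relink entries is the window in which IDN traffic to the domain is paused.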
Note - The DR Complete Attach and Complete Detach operations must finish in a timely manner to prevent TCP/IP connections across the IDN from timing out. Typically, the timeout value is two minutes.
An arbitration stop, or arbstop, of the domain causes the domain to freeze and all hardware-level transactions to cease. When an arbstop occurs within a domain that is part of an IDN, subsequent arbstops occur in all of the other domains in that same IDN.
Note - Domains that are not members of an arbstopped IDN are not affected by the arbstop.
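The scope of an arbstop can be expressed as a lookup over IDN membership. In this Python sketch, arbstop_scope and the list-of-sets representation are hypothetical; the function only restates the rule that an arbstop spreads to every domain in the same IDN and to no others.

```python
def arbstop_scope(idns, domain):
    """Return the set of domains that stop when `domain` arbstops.

    idns -- list of disjoint sets, one set of domain names per IDN
    """
    for members in idns:
        if domain in members:
            # Every member of the same IDN is arbstopped as well.
            return set(members)
    # A domain outside any IDN affects only itself.
    return {domain}
```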
Normally, this is not a problem because arbstops rarely occur. However, if another domain in that IDN is in an unknown state and possibly attempting to communicate with the domain being unlinked, the unlink command can cause arbstops to occur, especially when it is used with the force option.
If a domain or cluster arbstop occurs, the current BBSRAM and arbstop information is dumped to the following files:
To understand the conditions under which arbstops can occur, consider the hardware architecture that allows system boards to communicate with each other. In the following illustration, three domains exist, none of which are members of an IDN.
The interconnect (that is, the backplane of the system) contains shared memory domain registers, and each board contains shared memory mask registers. These registers work to support interboard communication within a domain, and they facilitate interdomain communication in the IDN environment. The shared memory domain registers on the interconnect allow a message to be forwarded to a particular destination board only if the registers are programmed to allow the originating board to send a message to the specified destination board. Correspondingly, the shared memory mask registers on a destination board allow an incoming message to be accepted only if the registers are programmed to allow that destination board to receive a message from the originating board.
In FIGURE 1-3, the shared memory domain registers on the interconnect for the three domains have been grouped to allow forwarding of messages between the domains. In addition, the boards in each domain now have shared memory mask registers programmed to accept messages from the other domains.
In addition to certain hardware failures, an arbstop can occur if any board attempts to send a message to another board when either the shared memory domain registers or the shared memory mask registers do not allow communication between the two boards. During the linking or unlinking process, the domain_link(1M) and domain_unlink(1M) commands reprogram these registers to enable or disable cross-domain transactions.
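The two-sided register check can be modeled as two independent permission lookups. The Python sketch below is a simplification: the real registers are hardware, the names deliver, domain_regs, and mask_regs are hypothetical, and every blocked message is treated as an arbstop even though the text says only that an arbstop can occur.

```python
def deliver(src, dst, domain_regs, mask_regs):
    """Model one board-to-board message (illustrative only).

    domain_regs -- maps an originating board to the set of boards the
                   interconnect will forward its messages to
    mask_regs   -- maps a destination board to the set of boards it
                   will accept messages from
    """
    forwarded = dst in domain_regs.get(src, set())
    accepted = src in mask_regs.get(dst, set())
    return "delivered" if forwarded and accepted else "arbstop"
```

A forced unlink that reprograms only one side corresponds to the case in which forwarded and accepted disagree, which is exactly the situation in which a later message is rejected.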
If one domain is in an unknown state (for example, partially hung) and you attempt to unlink another domain in the IDN, the domain_unlink(1M) command fails unless you use one of the force options, -f or -F. The command must communicate with the hung domain to ensure that all interdomain transactions have ceased before it reprograms the registers that are related to the shared memory, and it does not reprogram the shared memory domain registers if the domain is not responding. If you force the unlink to proceed, the shared memory domain registers are reprogrammed; however, the shared memory mask registers on the hung domain are not reprogrammed, so the IDN software on that domain is not aware that the disconnect has taken place.
After all of the IDN-specific registers have been reprogrammed, if the hung domain attempts to communicate with another domain, the hung domain and any other domains in that IDN can arbstop. Therefore, use the force option only if you are certain that the hung domain will not attempt any further communication with the other domains in the same network. To reduce the potential for such an arbstop, either reboot the hung domain or unlink it from the IDN before you unlink any other domain.
Caution - Do not use the force option on a known active domain unless it is absolutely necessary.
The InterDomain Networks feature affects the behavior of several SSP commands. This section contains an explanation of the behavior of the commands that are affected by IDNs.
The following table contains a list of the SSP commands affected by IDNs.
| Command | Description |
| bringup(1M) | You must unlink any domain that is in an unknown state (AWOL) before you use the bringup(1M) command to reboot any other domain in the IDN. Note that if multiple domains within the same IDN are hung, you must unlink all of the hung domains simultaneously. In addition, you cannot unlink any nonresponsive domain when other nonresponsive domains are present in the same IDN. Finally, the bringup(1M), domain_link(1M), and domain_unlink(1M) commands cannot run concurrently. |
| domain_remove(1M) | You cannot remove a domain that is currently a member of an IDN. The domain must first be unlinked; then, you can remove it. |
| dr(1M) | DR commands and IDN commands cannot run concurrently. See Dynamic Reconfiguration and IDNs for more information about the dr(1M) command. |
| edd(1M) | When edd(1M) produces a dump file as a result of an arbstop or recordstop in a domain that is part of an IDN, the dump file is based on the entire set of boards comprising all of the domains that are members of the IDN. The edd(1M) daemon is also required to enable automatic linking of domains and AWOL recovery. |
| hostint(1M) | By default, you cannot issue a hostint(1M) operation to a domain that is a member of an IDN. Although you can override this restriction by using the force option, you should unlink the domain first. |
| hpost(1M) | When hpost -Wc is used on a domain that is part of an IDN, it clears recordstops on all of the boards within all of the domains in that IDN. |
| power(1M) | You cannot power off a system board within a domain that is a member of an IDN unless you use the force option with the power(1M) command (refer to the power(1M) man page for more information on the use of the force option). The domain must first be unlinked from the IDN. Then, you can use the power(1M) command to power off the board. |
| sigbcmd(1M) | By default, you cannot issue a sigbcmd obp or panic operation to a domain that is currently a member of an IDN. Although you can override this restriction by using the force option, you should unlink the domain first. |
Copyright © 2002, Sun Microsystems, Inc. All rights reserved.