Tuesday, June 21, 2016

Decoding Junos Commands

Decoding Show Chassis Fabric Destinations


The output of the 'show chassis fabric destinations' command, and some syslog messages, may contain strings of digits like the ones shown below. It can be tedious to read through this output and work out which FPC is misbehaving, so I thought I'd write a quick post to help me remember.

For this post, please refer to the graphic below, which contains a snippet of what you'd see in "show log chassisd" (apologies for the small graphic):








In the graphic I've labeled the pertinent parts. On a 20-slot MX2020 chassis there are 20 groups (or columns) of 4 digits. Each group represents a slot, and the leftmost group is slot 0, i.e. FPC0. Each digit within a group represents one PFE and its connectivity/error status to the backplane; in the case of MPC6E cards, there are 4 PFEs per card.

As I mentioned, each digit represents what the local PFE thinks of its connectivity to remote PFEs. In this case, we can see that there may be an issue with FPC10, as code "6" indicates an error. The complete list of codes is below (adapted from the Juniper KB):

Fabric destinations state:
   0: non-existent
   2: enabled
   3: disabled
   6: dest-err and disabled
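
To make the layout and the codes above concrete, here is a minimal Python sketch that decodes one row of digit groups. It assumes the row has already been stripped down to just the whitespace-separated 4-digit groups (one group per slot, leftmost = FPC0); the real command output has additional labels that would need to be removed first. The helper names and the sample row are hypothetical and made up for illustration, not taken from a real router.

```python
# Hypothetical helpers: decode one row of fabric-destination digit groups.
# Assumption: `row` contains only whitespace-separated digit groups,
# one group per slot (leftmost = FPC0), one digit per PFE.

FABRIC_DEST_STATES = {
    "0": "non-existent",
    "2": "enabled",
    "3": "disabled",
    "6": "dest-err and disabled",
}

def decode_destinations(row: str) -> dict[int, list[str]]:
    """Map each FPC slot number to the decoded state of each of its PFEs."""
    return {
        slot: [FABRIC_DEST_STATES.get(digit, f"unknown ({digit})") for digit in group]
        for slot, group in enumerate(row.split())
    }

def slots_with_errors(row: str) -> list[int]:
    """Return the slots where at least one PFE digit is the error code '6'."""
    return [slot for slot, group in enumerate(row.split()) if "6" in group]

# Made-up sample row: FPC10 in error, FPC19 unpopulated, all other slots enabled.
groups = ["2222"] * 20
groups[10] = "6666"
groups[19] = "0000"
sample_row = " ".join(groups)

print(slots_with_errors(sample_row))        # [10]
print(decode_destinations(sample_row)[19])  # ['non-existent', 'non-existent', 'non-existent', 'non-existent']
```

Scanning for "6" is enough to find the misbehaving slot; the full decode is mainly useful when you also want to spot "3" (disabled, e.g. during a reboot) or "0" (unpopulated) slots.
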


You may see code "3" in the command output, particularly when an FPC is rebooting, either due to manual intervention or an MX2020 fabric healing event. Code "0" appears when the slot is not populated or the system otherwise doesn't recognize the FPC.

Lastly, code "6" can appear when fabric grant timeouts are observed and a remote PFE cannot talk to the local PFE, or when the fabric manager has disabled the PFE<>S[CF]B link due to the former issue.


Decoding Syslog Messages


Often you will see the message below logged in 'show log chassisd' when there are issues with an FPC. In this case, 'show chassis fabric destinations' and 'show log chassisd' both point to FPC10 as having an issue.

This message is repeated in the syslog for every populated slot, where X is the slot number.

Jun  2 11:34:44  re0-routerA fpcX XMCHIP(3): FO: Request timeout error - Number of timeouts 4, RC select 13, Stream 168
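
Because the message repeats once per populated slot, it can help to pull the reporting slot and the Stream value out of each line and tally the streams. Below is a minimal Python sketch, assuming the message text matches the example above exactly (with a real slot number in place of X). The regex, the field names (`slot`, `chip`, `timeouts`, `rc`, `stream`) and the sample lines are my own, made up for illustration; other Junos releases may word the message differently.

```python
import re
from collections import Counter
from typing import Optional

# Assumed pattern for the XMCHIP request-timeout message shown above.
# Group names are hypothetical labels, not Junos terminology.
TIMEOUT_RE = re.compile(
    r"fpc(?P<slot>\d+) XMCHIP\((?P<chip>\d+)\): FO: Request timeout error - "
    r"Number of timeouts (?P<timeouts>\d+), RC select (?P<rc>\d+), Stream (?P<stream>\d+)"
)

def parse_timeout(line: str) -> Optional[dict]:
    """Return the numeric fields of one timeout message, or None if it doesn't match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}

def streams_reported(lines: list[str]) -> Counter:
    """Count how many timeout messages point at each stream number."""
    return Counter(p["stream"] for p in map(parse_timeout, lines) if p is not None)

# Made-up sample: two slots reporting timeouts toward the same stream.
sample_lines = [
    "Jun  2 11:34:44  re0-routerA fpc0 XMCHIP(3): FO: Request timeout error - Number of timeouts 4, RC select 13, Stream 168",
    "Jun  2 11:34:44  re0-routerA fpc3 XMCHIP(3): FO: Request timeout error - Number of timeouts 4, RC select 13, Stream 168",
    "Jun  2 11:34:45  re0-routerA some unrelated message",
]
print(streams_reported(sample_lines))  # Counter({168: 2})
```

If every populated slot reports the same stream, that stream most likely identifies the single destination FPC everyone is having trouble reaching.
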
The more interesting part of this message is the Stream N field, where N is a number from 0 to 254. You can determine which FPC has the problem using the following formula:

(N - offset) / #pfes = fpc_slot
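In our case:

N = 168
#pfes = 4, as MPC6E "Scuba" cards have 4 PFEs per card
offset = 128

(168 - 128) / 4 = 40 / 4 = 10, which points at FPC10, the same slot flagged by 'show chassis fabric destinations'.
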
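
The formula is easy to wrap in a small helper. This is a minimal Python sketch of the calculation above, with the MPC6E defaults (4 PFEs, high-priority offset 128) as parameters. The function name is hypothetical, and the use of integer division for stream numbers that don't divide evenly is my own assumption (that each slot's streams occupy #pfes consecutive numbers); the post's only worked example, 168, divides evenly.

```python
# Sketch of the formula: fpc_slot = (N - offset) / #pfes
HIGH_PRI_OFFSET = 128  # offset for high-priority streams, as described in this post
MPC6E_PFES = 4         # MPC6E cards have 4 PFEs per card

def stream_to_fpc(stream: int, offset: int = HIGH_PRI_OFFSET, pfes: int = MPC6E_PFES) -> int:
    """Return the FPC slot that a fabric stream number points at (hypothetical helper)."""
    if stream < offset:
        raise ValueError(f"stream {stream} is below the offset {offset}")
    # Assumption (not confirmed by the post): a slot's streams are `pfes`
    # consecutive numbers, so integer division drops the per-PFE remainder.
    return (stream - offset) // pfes

print(stream_to_fpc(168))  # 10
```

For cards with a different PFE count, pass `pfes` explicitly rather than relying on the MPC6E default.
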
AIUI, the offset exists because PFE-to-PFE flows are split into high-priority and low-priority streams, and the offset for the high-priority streams is 128 (if anyone has any additional information on that, please comment!)
