Tuesday, June 21, 2016

Decoding Junos Commands

Decoding Show Chassis Fabric Destinations


The output of the 'show chassis fabric destinations' command and some syslogs may contain what looks like the below. It can be annoying to look through these outputs and figure out which FPC is misbehaving, so I thought I'd write a quick post to help me remember.

For this post, please refer to the graphic below, which contains a snippet of what you'd see in "show log chassisd" (apologies for the small graphic):

[Graphic: annotated snippet of "show log chassisd" output showing the fabric destinations state]

In the graphic I've labeled the pertinent parts. You can see that for a 20-slot MX2020 chassis there are 20 groups (or columns) of 4 digits. Each grouping represents a slot--the leftmost grouping is slot 0 == fpc0. Each digit within a group represents a PFE and its connectivity/error status to the backplane; in the case of MPC6E cards, there are 4 PFEs per card.

As I mentioned, each digit represents what the local PFE thinks of its connectivity to remote PFEs. In this case, we see that there may be an issue with FPC10, as code "6" indicates an error. The complete list of codes is below (reproduced from the Junos KB):

Fabric destinations state:
   0: non-existent
   2: enabled
   3: disabled
   6: dest-err and disabled


You may see code "3" appear in command output, particularly when the FPC is rebooting due to either manual intervention or an MX2020 fabric healing event. A code of "0" appears when the slot is not populated or the system otherwise doesn't recognize the FPC.

Lastly, code "6" can appear when fabric grant timeouts are observed and a remote PFE cannot talk to the local PFE, or when the fabric manager has disabled the PFE<>SFB link due to the former issue.
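
If you find yourself decoding these rows often, here's a minimal Python sketch of the mapping described above. It assumes you've copied the 4-digit groups out of the log as a single whitespace-separated string (the exact log layout will vary), and it simply flags any PFE that isn't reporting code 2:

# Minimal sketch: decode a row of fabric destination state digits.
# Assumes the 4-digit groups have been copied out of 'show log chassisd'
# as one whitespace-separated string; the real log layout may differ.

STATE_CODES = {
    "0": "non-existent",
    "2": "enabled",
    "3": "disabled",
    "6": "dest-err and disabled",
}

def decode_fabric_destinations(row):
    """Print the per-PFE state for any FPC slot that isn't fully enabled."""
    groups = row.split()                    # leftmost group is slot 0 (fpc0)
    for slot, group in enumerate(groups):
        for pfe, code in enumerate(group):  # one digit per PFE (4 on MPC6E)
            if code != "2":                 # flag anything not 'enabled'
                state = STATE_CODES.get(code, "unknown")
                print(f"fpc{slot} pfe{pfe}: {code} ({state})")

# Example: 20 slots, all enabled except FPC10, whose PFEs report code 6.
decode_fabric_destinations("2222 " * 10 + "6666 " + "2222 " * 9)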


Decoding syslog messages


Oftentimes you will see the below message logged in 'show log chassisd' when there are issues with the FPC. In this case, both 'show chassis fabric destinations' and 'show log chassisd' point to the same FPC10 as having an issue.

You will see this repeated in the syslog for every populated slot, where X is the slot number.

Jun  2 11:34:44  re0-routerA fpcX XMCHIP(3): FO: Request timeout error - Number of timeouts 4, RC select 13, Stream 168
The more interesting part of this message is the 'Stream N' field, where N is some number from 0 to 254. You determine which FPC has the problem using the following formula:

(N - offset) / #pfes = fpc_slot
In our case:

N = 168
#pfes = 4, as MPC6E "Scuba" cards have 4 PFEs per card
offset = 128
fpc_slot = (168 - 128) / 4 = 10, i.e. FPC10--the same FPC flagged in 'show chassis fabric destinations'
AIUI, the origin of the 'offset' here is that the PFE-to-PFE flows are broken down into high-priority and low-priority streams, and the offset for the high-priority streams is 128 (if anyone has additional information on that, please comment!).
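
For quick reference, here's the same arithmetic as a small Python sketch. The 4-PFE count and the 128 offset are assumptions that hold for MPC6E cards; other line cards will differ:

# Sketch of the stream-to-slot arithmetic described above.
# Assumes an MPC6E-style card (4 PFEs per FPC) and a high-priority
# stream offset of 128; adjust both values for other hardware.

def stream_to_fpc(stream, offset=128, pfes_per_fpc=4):
    """Map an XMCHIP 'Stream N' value to the FPC slot it points at."""
    return (stream - offset) // pfes_per_fpc

print(stream_to_fpc(168))   # -> 10, i.e. FPC10, matching the fabric output above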

Broken EIGRP in VPLS Context

Introduction

Commonly, Enterprise customers turn to carriers to provide WAN connectivity between remote offices. Today, most carriers offer emulated Layer 2 services on top of their MPLS cloud--a common carrier implementation being Virtual Private LAN Service (VPLS). While the details of the carrier implementation are outside the scope of this post, the interested reader may refer to the RFCs listed at the end of this post for additional information.

As it relates to emulated Layer 2 WAN services, Enterprise customers often treat the WAN as an extension of the LAN. While this behavior is encouraged by the fact that you have one Layer 2 broadcast domain, it is folly. In particular, many customers run IGPs (EIGRP, OSPF, RIP, etc.) over this emulated Layer 2 service (I've even seen customers run STP!!). In the majority of cases, there are very few problems and the IGPs behave as expected. However, in some instances, due to the service implementation of the provider or the software/hardware configuration of the customer, things go awry and IGPs behave in unexpected ways.

Background

In this post, I'll talk about an EIGRP issue I've seen in my environment, how it relates to the L2VPN service implementation, and the workaround I used to restore service.

For the purpose of this post, remote sites have two EIGRP-speaking routers--RO-XA and RO-XB. Likewise, datacenters have DC-XA and DC-XB. All the A-routers connect to VPLS provider A, and all the B-routers connect to VPLS provider B. All routers are in a single EIGRP domain.

Problem Description and Symptoms

While turning up one of our Regional Offices, we noticed that the EIGRP adjacencies with multiple Regional Offices were flapping.
RO-1B#show ip eigrp neighbors gi0/1
EIGRP-IPv4 Neighbors for AS(10)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
4   10.0.248.69             Gi0/1                    11 00:00:22    1  5000  1  0
6   10.0.248.72             Gi0/1                    12 00:00:53    1  5000  1  0
5   10.0.248.67             Gi0/1                    10 00:00:59    1  5000  1  0
3   10.0.248.76             Gi0/1                    13 14:05:32   30   180  0  212
2   10.0.248.65             Gi0/1                    11 14:05:32   20   144  0  646
RO-1B#

Looking at the logs confirms how often the adjacency flaps. Since we will focus on DC-1B (10.0.248.69), below we see the flaps occurring regularly:

RO-1B#show clock
14:03:50.406 EDT Thu Oct 23 2014
RO-1B#show log | in Oct.*23 14.*.248.69
Oct 23 14:00:51.022: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is down: retry limit exceeded
Oct 23 14:00:51.826: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is up: new adjacency
Oct 23 14:02:11.338: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is down: retry limit exceeded
Oct 23 14:02:13.538: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is up: new adjacency
Oct 23 14:03:33.050: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is down: retry limit exceeded
Oct 23 14:03:34.866: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is up: new adjacency

We noted that although EIGRP adjacencies were flapping, there were no issues with unicast traffic or other broadcast-type traffic. Indeed, ARP packets flowed freely between RO-1B and DC-1B. ICMP tests (all sizes) came back with no loss and latency within spec.

Buried in all this, we found another curiosity, which is a focus of this post: in the presence of the flapping EIGRP adjacencies, some adjacencies were only half-formed:

  • RO-1B would have stable adjacency with DC-2B
  • RO-1B would have flapping adjacency with other Regional Offices
  • Other Regional Offices wouldn't even see or start a neighbor adjacency with RO-1B

High-Level Troubleshooting Efforts:

So we had flapping adjacencies, one-way adjacencies, suboptimal routing, and apparent disregard for the EIGRP split-horizon/poison-reverse rule. Our main concern was trying to understand the adjacency flaps--the other things were taken as side effects but are interesting enough that they warrant discussion.

First, let's clear up the preliminaries. We did the following, not necessarily in order:
  • Removed any policy-map configuration on the RO-1B interface
  • Replaced RO-1B with a different model/platform
  • Replaced the physical wiring to the carrier's CPE
  • Performed extended pings with sizes from 100 bytes to 1250 bytes (in case it was a fragmentation issue)
  • Verified we can ping 224.0.0.10, the EIGRP multicast address
We had the carrier check the following for the devices in their network on the path from RO-1B to their PE (the so-called Access network in carrier parlance):
  • Verify fiber light levels from their CPE to the PE were at as-built levels and not fluctuating
  • Perform an RFC2544 test for frame loss
  • Verify the CPE hand-off port had no errors/drops/oversubscription
  • Verify traffic was not being dropped due to BUM filtering. BUM filters, or broadcast/unknown unicast/multicast filters, are a standard way to limit the amount of flooding that occurs over an emulated Layer 2 service
  • Verify service configuration on their equipment
A special note about BUM filtering: too much broadcast/unknown unicast/multicast traffic between customer sites can easily cause oversubscription/resource issues in the carrier network, and this condition is all too easily created by customer misconfiguration on a bridge-like service. This is because the carrier PE must replicate (if it uses ingress replication) each BUM packet and send it to the other PEs in the L2VPN--which comes at a cost. The interested reader may refer to RFC5501 (Section 4) and RFC4761 (Section 4) for a detailed treatment of this problem in a carrier environment.
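
To get a feel for why carriers are sensitive to this, here's a rough, illustrative Python sketch of the ingress-replication cost; the traffic rate and PE count below are made-up numbers, not measurements from this incident:

# Back-of-the-envelope sketch of why carriers rate-limit BUM traffic.
# With ingress replication, the ingress PE sends one copy of every BUM
# frame to each other PE in the VPLS instance, so the load grows
# linearly with the number of sites. Numbers are purely illustrative.

def ingress_replication_load(bum_rate_mbps, num_pes):
    """Total load (Mbps) the ingress PE generates for one site's BUM traffic."""
    return bum_rate_mbps * (num_pes - 1)

# e.g. 5 Mbps of broadcast/multicast from one site, 40 PEs in the VPLS:
print(ingress_replication_load(5, 40))   # -> 195 Mbps of replicated traffic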


Finding a workaround:

As mentioned previously, there were various canaries in the coal mine that served as key clues to finding a workaround:


  • We could ping the EIGRP multicast address (224.0.0.10) and did receive replies from some of the EIGRP speakers. For each non-replying router, we could still reach the router via unicast ping
  • While EIGRP sessions were flapping, unicast traffic worked just fine

The workaround was to use EIGRP unicast mode (example here)--instead of EIGRP speakers using multicast to discover neighbors, we explicitly configure the neighbors. All routers then send EIGRP packets directly to the configured neighbors.

As outlined in Cisco's FAQs, this feature isn't without its caveats. Namely, once you configure unicast neighbors on an interface, all multicast EIGRP packet processing on that interface stops. It's all or nothing.


Conclusion:

While I've seen this issue in an EIGRP context, there's no good reason why other multicast-based IGPs or protocols wouldn't be impacted. This includes protocols that only use multicast for a subset of their messages/functionality; their behaviors under a carrier multicast outage will simply differ. In our case, we had suboptimal routing (using Remote sites with a lower bandwidth profile as transit for site-to-site flows), equipment failures/issues due to reconvergence, and even sites being isolated from the network.

Essentially, the problem was with the carrier's ability to deal with multicast/broadcast traffic. As we've learned from RFC7117, there are different approaches to the BUM problem in a VPLS carrier environment. Some implementations use ingress replication exclusively (Junos PDF) and some use a formal multicast distribution tree within the MPLS cloud (Cisco http).

While I'll never really know the specifics of the problem within the carrier's network, I'm told it was a router software bug.

Cheers!


References:

Normative References

All Virtual Private LAN Service (VPLS) RFCs (listing); in particular RFC4761, RFC5501, and RFC7117

Informative References

RFC3031: Multiprotocol Label Switching Architecture (http)
RFC4364: BGP/MPLS IP Virtual Private Networks (VPNs) (http)
RFC3107: Carrying Label Information in BGP-4 (http)
[EIGRP-Protocol] "Enhanced Interior Gateway Routing Protocol" Technology Whitepaper (http)