Tuesday, June 21, 2016

Decoding Junos Commands

Decoding Show Chassis Fabric Destinations


The output of the 'show chassis fabric destinations' command and some syslog messages may contain what looks like the below. It can be tedious to sift through these outputs and figure out which FPC is misbehaving, so I thought I'd write a quick post to help me remember.

For this post, please refer to the graphic below, which contains a snip of what you'd see in "show log chassisd" (apologies for the small graphic):


In the graphic I've labeled the pertinent parts. You can see that for a 20-slot MX2020 chassis there are 20 four-digit groups (or columns). Each group represents a slot--the leftmost group is slot 0 == fpc0. Each digit within a group represents a PFE and its connectivity/error status to the backplane; in the case of MPC6E cards, there are 4 PFEs per card.

As I mentioned, each digit represents what the local PFE thinks of its connectivity to remote PFEs. In this case, we see that there may be an issue with FPC10, as code "6" indicates an error. The complete list of codes is below (reproduced from a Junos KB):

Fabric destinations state:
   0: non-existent
   2: enabled
   3: disabled
   6: dest-err and disabled


You may see code "3" appear in command output, particularly when an FPC is rebooting, whether due to manual intervention or an MX2020 fabric-healing event. A code of "0" appears when the slot is not populated or the system otherwise doesn't recognize the FPC.

Lastly, code "6" can appear when fabric grant timeouts are observed and a remote PFE cannot talk to the local PFE, or when the fabric manager has disabled the PFE-to-fabric-board link due to the former issue.
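To make the decoding concrete, here's a rough Python sketch (my own illustration, not any Juniper tool) that maps each four-digit group to its slot and flags PFEs reporting the error code:

```python
# State codes from the Junos KB excerpt above
STATE = {
    "0": "non-existent",
    "2": "enabled",
    "3": "disabled",
    "6": "dest-err and disabled",
}

def find_bad_pfes(groups):
    """groups: list of 4-character strings, leftmost group = slot 0 (fpc0).

    Returns (slot, pfe, state) for every PFE reporting code 6.
    """
    bad = []
    for slot, group in enumerate(groups):
        for pfe, code in enumerate(group):
            if code == "6":
                bad.append((slot, pfe, STATE[code]))
    return bad

# Example: a healthy 20-slot chassis, except fpc10 whose PFE 2 shows code 6
groups = ["2222"] * 20
groups[10] = "2262"
print(find_bad_pfes(groups))  # → [(10, 2, 'dest-err and disabled')]
```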


Decoding syslog messages


Oftentimes you will see the below message logged in 'show log chassisd' when there are issues with an FPC. In this case, 'show chassis fabric destinations' and 'show log chassisd' both point to the same fpc10 as having an issue.

You see this repeated in the syslog for every populated slot, where X is the slot number.

Jun  2 11:34:44  re0-routerA fpcX XMCHIP(3): FO: Request timeout error - Number of timeouts 4, RC select 13, Stream 168
The more interesting part of this message is Stream N, where N is some number from 0 to 254. The way you determine which FPC has the problem is with the following formula:

(N - offset) / #pfes = fpc_slot
In our case:

N = 168
#pfes = 4, as MPC6E "Scuba" cards have 4 PFEs per card
offset = 128
Plugging in: (168 - 128) / 4 = 10, so the problem FPC is fpc10--matching what 'show chassis fabric destinations' told us. AIUI, the origin of the 'offset' is that the PFE-to-PFE flows are broken down into high-priority and low-priority streams; the offset for the high-priority stream is 128 (if anyone has additional information on that, please comment!).
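The formula can be sketched in a couple of lines (the function name is mine; the offset of 128 and 4 PFEs per card are the values discussed in this post and will differ for other cards/streams):

```python
def stream_to_fpc(stream, offset=128, pfes_per_fpc=4):
    """Map a fabric stream number to an FPC slot: (N - offset) / #pfes."""
    return (stream - offset) // pfes_per_fpc

print(stream_to_fpc(168))  # → 10, i.e. fpc10
```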

Broken EIGRP in VPLS Context

Introduction

Commonly, Enterprise customers turn to carriers to provide WAN connectivity between remote offices. Today, most carriers offer emulated Layer 2 services on top of their MPLS cloud--a common carrier implementation being Virtual Private LAN Service ( VPLS ). While the details of the carrier implementation are outside the scope of this post, the interested reader may refer to the RFCs listed at the end of this post for additional information.

As it relates to emulated Layer 2 WAN services, often Enterprise customers treat the WAN as an extension to the LAN. While this behavior is encouraged by the fact that you have one Layer 2 broadcast domain, it is folly. Particularly, many customers run IGPs ( EIGRP, OSPF, RIP, etc.) over this emulated Layer 2 service (I've even seen customers run STP!!). In the majority of cases, there are very few problems and the IGPs behave accordingly. However, in some instances, due to the service implementation of the provider or the software/hardware configuration of the customer, things go awry and IGPs behave in unexpected ways.

Background

In this post, I'll talk about an EIGRP issue I've seen in my environment, how it relates to L2VPN service implementation, and the workaround I used to restore service.

For the purpose of this post, remote sites have two EIGRP speaking routers--RO-XA and RO-XB. Likewise datacenters have DC-XA and DC-XB. All the A-routers connect to VPLS provider A, likewise all B-routers connect to VPLS provider B. All routers are in single EIGRP domain.

Problem Description and Symptoms

While turning up one of our Regional Offices, we noticed that EIGRP adjacencies with multiple Regional Offices were flapping.
RO-1B#show ip eigrp neighbors gi0/1
EIGRP-IPv4 Neighbors for AS(10)
H   Address                 Interface              Hold Uptime   SRTT   RTO  Q  Seq
                                                   (sec)         (ms)       Cnt Num
4   10.0.248.69             Gi0/1                    11 00:00:22    1  5000  1  0
6   10.0.248.72             Gi0/1                    12 00:00:53    1  5000  1  0
5   10.0.248.67             Gi0/1                    10 00:00:59    1  5000  1  0
3   10.0.248.76             Gi0/1                    13 14:05:32   30   180  0  212
2   10.0.248.65             Gi0/1                    11 14:05:32   20   144  0  646
RO-1B#

Looking in the logs confirms how often the adjacency flaps. Since we will focus on DC-1B ( 10.0.248.69), below we see the flaps occurring regularly:

RO-1B#show clock
14:03:50.406 EDT Thu Oct 23 2014
RO-1B#show log | in Oct.*23 14.*.248.69
Oct 23 14:00:51.022: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is down: retry limit exceeded
Oct 23 14:00:51.826: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is up: new adjacency
Oct 23 14:02:11.338: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is down: retry limit exceeded
Oct 23 14:02:13.538: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is up: new adjacency
Oct 23 14:03:33.050: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is down: retry limit exceeded
Oct 23 14:03:34.866: %DUAL-5-NBRCHANGE: EIGRP-IPv4 10: Neighbor 10.0.248.69 (GigabitEthernet0/1) is up: new adjacency
We noted that although EIGRP adjacencies were flapping, there were no issues with unicast traffic or other broadcast-type traffic. Indeed, between RO-1B and DC-1B, ARP packets flowed freely, and ICMP tests ( all sizes) came back with no loss and latency within spec.

Buried in all this, we found another curiosity, which is the focus of this post. We found that alongside the flapping EIGRP adjacencies, some adjacencies were half-formed:

  • RO-1B would have stable adjacency with DC-2B
  • RO-1B would have flapping adjacency with other Regional Offices
  • Other Regional Offices wouldn't even see or start a neighbor adjacency with RO-1B.

High-Level Troubleshooting Efforts:

So we had flapping adjacencies, one-way adjacencies, suboptimal routing, and apparent disregard for the EIGRP split-horizon/poison-reverse rule. Our main concern was trying to understand the adjacency flaps--the other things were taken as side effects but are interesting enough that they warrant discussion.

First, let's clear up the preliminaries. We did the following, not necessarily in order:
  • Removed any policy-map configuration on the RO-1B interface
  • Replaced RO-1B with a different model/platform
  • Replaced physical wiring to the carrier's CPE
  • Performed extended pings with sizes from 100 Bytes to 1250 Bytes (could be fragmentation issue)
  • Verified we can ping 224.0.0.10, the EIGRP multicast address
We had the carrier check the following for the devices in their network on the path from RO-1B to their PE ( the so-called Access network in carrier parlance):
  • Verify fiber light levels from their CPE to PE were at as-built levels and not fluctuating
  • Perform an RFC2544 test for frame loss
  • Verify CPE hand off port had no errors/drops/oversubscription
  • Verify traffic was not being dropped due to BUM filtering. BUM filters, or broadcast/unknown-unicast/multicast filters, are a standard way to limit the amount of flooding that occurs over an emulated Layer 2 service
  • Verify service configuration on their equipment
A special note about BUM filtering. Too much broadcast/unknown-unicast/multicast traffic between customer sites can easily cause oversubscription/resource issues in the carrier network. This condition is all too easily created by a customer misconfiguration over a bridge-like service. This is because the carrier PE must replicate ( if it uses ingress replication ) each BUM packet and send a copy to every other PE in the L2VPN--which comes at a cost. The interested reader may refer to RFC5501 ( Section 4 ) and RFC4761 ( Section 4 ) for a detailed treatment of this problem in a carrier environment.
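To make the replication cost concrete, here's a back-of-the-envelope sketch (the function name and numbers are purely illustrative, not from any vendor tool):

```python
# With ingress replication, the ingress PE makes one copy of each BUM
# frame for every other PE in the VPLS instance.
def ingress_replication_copies(num_pes, bum_pps):
    """Copies per second the ingress PE must generate."""
    return (num_pes - 1) * bum_pps

# e.g. a 20-PE VPLS with one site leaking 1000 broadcast packets/sec:
print(ingress_replication_copies(20, 1000))  # → 19000 copies/sec
```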


Finding a work around:

As mentioned previously, there were various canaries in the coal mine, and they were key clues to finding a workaround:


  • We could ping the EIGRP multicast address (224.0.0.10) and did receive replies from some of the EIGRP speakers. For each non-replying router, we could still reach the router via unicast ping
  • While EIGRP sessions were flapping, unicast traffic worked just fine

The workaround was to use EIGRP unicast mode ( example here)--instead of EIGRP speakers using multicast to discover neighbors, we explicitly configure the neighbors. All routers then send EIGRP packets directly to their configured neighbors.

As outlined in Cisco's FAQs, this feature isn't without its caveats. Namely, once you configure unicast neighbors on an interface, all multicast packet processing on that interface stops. It's all or nothing.
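For reference, a minimal IOS-style sketch of the workaround, using the AS number and neighbor addresses from the outputs above (your addresses and interfaces will differ):

```
router eigrp 10
 ! Caveat: once any neighbor statement exists for an interface,
 ! EIGRP stops processing multicast hellos on that interface entirely,
 ! so every neighbor on that segment must be configured statically.
 neighbor 10.0.248.69 GigabitEthernet0/1
 neighbor 10.0.248.76 GigabitEthernet0/1
```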


Conclusion:

While I've seen this issue in an EIGRP context, there's no good reason why other multicast-based IGPs or protocols wouldn't be impacted. This includes protocols that use multicast for only a subset of messages/functionality; their behavior under a carrier multicast outage will differ. In our case, we had suboptimal routing ( using Remote sites with a lower bandwidth profile as transit for site-to-site flows), equipment failures/issues due to reconvergence, and even sites being isolated from the network.

Essentially, the problem was with the carrier's ability to deal with multicast/broadcast traffic. As we've learned from RFC7117, there are different approaches to the BUM problem in a VPLS carrier environment. Some implementations use ingress replication exclusively ( Junos PDF) and some use a formal multicast distribution tree within the MPLS cloud ( Cisco http).

While I'll never really know the specifics of the problem within the carrier's network, I'm told it was a router software bug.

Cheers!


References:

Normative References

All Virtual Private LAN Services RFCs ( listing ); in particular RFC4761, RFC5501 and RFC7117

Informative References

RFC3031: Multiprotocol Label Switching Architecture ( http )
RFC4364: BGP/MPLS IP Virtual Private Networks (VPNs) ( http )
RFC3107: Carrying Label Information in BGP-4 ( http )
[EIGRP-Protocol] "Enhanced Interior Gateway Routing Protocol" Technology Whitepaper ( http )

Monday, November 24, 2014

SDN, OpenFlow, and Google

SDN for Everyone!

Let's face it, everyone is talking about SDN-- Software Defined Networking. Vendors all up and down the infrastructure stack are evangelizing SDN. As a future-focused kind of engineer, I recently had a chance to watch what Google is doing ( has done ) with OpenFlow in their environment ( video ). From a conceptual perspective, SDN isn't new--we've used SDN in various incarnations since the dawn of networking. For example, leveraging SNMP to send a trap to an NMS and having that NMS server send an SNMP set to bring down a link or back up a configuration. Another example would be Remotely Triggered Black Hole routing ( RFC5635 ), whereby a controlling component signals all other nodes to discard a source or destination IP address ( in this case the triggering component could be an IPS/IDS or an application that detects unwanted behavior). Still another would be RSVP ( RFC2205 ), which was a solid first attempt at making the network more application-aware.

Leading from in Front

What was so impressive about the Google talk was not the technology but the leadership and focus on Engineering. In the end, the business needs to run applications that people use to bring in money--it's not particularly interested in how the network provides such a service. Introducing OpenFlow, as Urs explained, was not a small-risk proposition. Certainly any business would find it better to stick with a system that works now than to go with a system that may work later ( or otherwise has characteristics that may be useful in the future). That's where technical leadership comes in. All too often the best technical solution is cast aside for the status quo or the "tried and true." Google's OpenFlow rollout is a prime example of a technical organization understanding the risks, deciding the path, executing, and, hopefully, reaping the rewards.

You Can't Improve What You Can't Measure

Before Google rolled out a single change, they took the time and invested in making sure they could properly test the change. What that meant for Google was building a simulation environment--it paid big dividends in that they could prototype their controller in an environment that closely matched the operational parameters and characteristics of their network. Without this, the development-deployment feedback loop is broken as there's no way to precisely measure the impact of your change. In this case, the change is in shifting how path selection is done. Instead of a report driven, trending, analysis loop--path selection is near real-time, based on the real-time constraints of the network and needs of the application. Google envisioned the tool needed to test the product, not just the tool itself.

Friday, August 1, 2014

Hello World

Hello World!

Welcome to DeepDive, a blog focused on diving deep into all manner of topics related to Network and System Engineering. I have a background in Computer Science and work professionally as a Network Engineer. If I'm not deploying new equipment, researching various features, or haunting Cisco's Learning Network forums, you can find me researching various aspects of Network Engineering.

Why blog, Joe? I do it for the soapbox

I think it's useful to share and develop ideas in a collaborative way. I find that I learn things much more thoroughly when I'm forced to explain them to someone else or answer someone's question. In this blog, nothing is remedial; I will try to make no assumptions and keep each post as complete as I can. I'm looking forward to feedback--correct me if I'm wrong, tell me my idea is flawed. I learn, you learn, we all learn.

I'm still relatively new to the field and have a lot to learn about the discipline of Network Engineering. For me, Network Engineering is not just about knowing how to design, implement, and troubleshoot network-related hardware/software. It's not just about plugging in commands or sticking to tried-and-true designs/methodologies. To me, Network Engineering is a philosophy--a lifestyle. It's a way of thinking and doing. Too many Network Engineers are only superficially interested in what they do. Many do not care what happens under the hood or how things actually work. Many focus on certification, achieving only a superficial knowledge of the curriculum. Still many, due to personal/professional obligations, do not have the time to invest in going deep and looking critically at applications, protocols, architectures, and designs. I hope to address those shortcomings in myself and others with this blog.

Enough of the soapbox. What's the blag about anyways!?

So you got a blog, who cares? What's this about, Joe?

Simply put: Messing with stuff. It is about verification and validation of the things I'm told by vendors, things I tell myself about my work, and things others tell me about Networking. I will post/review code and packet dumps, datasheets, interesting articles and just about anything else under the sun related to network applications, protocols, design, and architecture. If it talks on a network, I will try to cover it in detail.

Okay, this is too long, I'm leaving now...

While this blog is aimed at technical folks, I don't assume everyone knows everything all the time--and I won't assume I'll remember these things going forward. As a result, I'll try to make my posts as complete, verbose, and engaging as possible. My posts will be long and drawn-out. If you're looking for a quick answer--flee from this place.

Hoping you enjoy!