The Setup
Cisco's DevNet XRd sandbox provides a pre-built SR-MPLS topology running containerized IOS-XR (XRd) routers. The sandbox gives you a working starting point — I focused on understanding the existing configuration in depth, running verification scenarios, and automating state collection against it using NETCONF.
The topology is 7 XRd routers plus a PCE and a virtual route reflector, running IS-IS as the IGP with SR-MPLS extensions. What makes it interesting is that it's not just basic SR — it includes Flex-Algo (delay-based routing), a live PCE doing dynamic path computation via PCEP, and on-demand SR-TE policies triggered by BGP color communities.
Four parallel paths exist between xrd-1 (ingress PE) and xrd-2 (egress PE) through the core. The topology uses anycast SIDs to provide ECMP across the parallel paths — label 17001 resolves to either xrd-3 or xrd-5, and label 17101 resolves to either xrd-4 or xrd-6.
Flex-Algo: Constraint-Based Topology Separation
The most interesting feature in this topology is Flex-Algo. Standard IS-IS
metrics are administrative costs — they don't reflect actual fiber latency.
Flex-Algo 128 is defined with metric-type delay, which means it
runs a separate SPF using link delay as its metric rather than IS-IS cost.
In a production environment with Performance Measurement running, those delay
values would be dynamically measured and flooded as IS-IS delay TLVs —
the SPF would then route around degraded fiber automatically.
Two algorithms are defined. FA-128 is configured for delay-metric routing, constraining paths to links tagged as "blue" (intended low-latency), excluding "red" links. FA-129 does the opposite — uses standard IGP cost, prefers red links, and avoids blue links, providing intentional path diversity for traffic that should take a different physical route.
router isis 1
 affinity-map red bit-position 128
 affinity-map blue bit-position 129
 !
 flex-algo 128
  metric-type delay            ! Use measured delay, not IS-IS cost
  advertise-definition
  affinity exclude-any red     ! Never use high-latency links
  affinity include-all blue    ! Only use verified low-latency links
 !
 flex-algo 129
  metric-type igp
  advertise-definition
  affinity exclude-any blue    ! Avoid low-lat links (path diversity)
  affinity include-any red
Each router advertises three prefix-SIDs on its Loopback0 — one per algorithm.
The label value is always SRGB_base + index. With SRGB base 16000,
xrd-1's three loopback SIDs are: 16101 (base algo 0), 16201 (FA-128 delay),
16301 (FA-129 affinity). The same three-tier structure applies to the anycast
SIDs on Loopback1 — the shared loopback between xrd-3/xrd-5 and xrd-4/xrd-6.
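For the node SIDs, xrd-1's Loopback0 side of this would look roughly like the following — a sketch reconstructed from the label values above (the indexes follow from SRGB base 16000; the exact sandbox config may differ):

```
router isis 1
 interface Loopback0
  address-family ipv4 unicast
   prefix-sid index 101                 ! algo 0 → 16000 + 101 = 16101
   prefix-sid algorithm 128 index 201   ! FA-128 → 16000 + 201 = 16201
   prefix-sid algorithm 129 index 301   ! FA-129 → 16000 + 301 = 16301
```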
This is the key design insight: one Loopback1 per anycast pair, three SIDs
on it, each anchoring a different forwarding topology:
interface Loopback1
 ipv4 address 101.103.105.255 255.255.255.255      ! shared anycast IP
!
prefix-sid index 1001 n-flag-clear                 ! base algo       → label 17001
prefix-sid algorithm 128 index 1002 n-flag-clear   ! FA-128 delay    → label 17002
prefix-sid algorithm 129 index 1003 n-flag-clear   ! FA-129 affinity → label 17003
The n-flag-clear is what makes these anycast — it clears the
Node flag in the IS-IS prefix-SID advertisement, telling other routers this
SID is shared by multiple nodes rather than being unique to one. Without it,
two routers advertising the same SID index would be a conflict. With it, the
SPF resolves the label to whichever node is topologically closest, providing
automatic ECMP and resilience.
In a production low-latency network (HFT, mobile xhaul), Performance Measurement would be running on every link — TWAMP-Light probes measuring actual one-way delay, with IOS-XR flooding updated delay sub-TLVs into IS-IS whenever a link's measured latency changes. FA-128 would then automatically route around degraded fiber without any operator intervention.
In this sandbox, PM is not configured (the PM process is not running), so no
delay TLVs are being flooded — confirmed by show isis database verbose |
include delay returning nothing. FA-128 is correctly defined and its
per-algorithm SIDs (17002/17102) are in the label table, but every link
effectively has delay = 0, so the FA-128 SPF produces the same result as the
base SPF. The configuration and SID structure are production-accurate; only
the measurement input is missing.
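For reference, enabling PM on a link in IOS-XR looks roughly like this — a hedged sketch, with the interface name and timer values as placeholders rather than anything taken from the sandbox:

```
performance-measurement
 interface GigabitEthernet0/0/0/0
  delay-measurement                ! probe one-way delay on this link
 !
 delay-profile interfaces
  advertisement
   periodic
    interval 30                    ! placeholder timer values
    threshold 10                   ! re-advertise on 10% delay change
```

With something like this in place, measured delay would be flooded as IS-IS delay sub-TLVs, and the FA-128 SPF would start consuming real values instead of zeros.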
PCE/PCEP: Dynamic Path Computation
xrd-7 is the PCE. It receives the full IS-IS topology via BGP-LS (the
distribute link-state command in IS-IS exports the topology
database into BGP, which xrd-7 receives). When xrd-1 needs a path computed,
it sends a PCReq to xrd-7 via PCEP, and xrd-7 responds with a label stack.
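The BGP-LS plumbing behind this is small. A sketch of the relevant pieces — the AS number is a placeholder, since I haven't reproduced the sandbox's actual BGP config here:

```
router isis 1
 distribute link-state             ! export IS-IS LSDB into BGP-LS
!
router bgp 65000                   ! placeholder ASN
 address-family link-state link-state
 !
 neighbor 100.100.100.107          ! xrd-7, the PCE
  remote-as 65000
  update-source Loopback0
  address-family link-state link-state
```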
The on-demand color mechanism is what triggers this. When xrd-2 advertises
a VPN route into BGP VPNv4 with color community 100, xrd-1 sees the color,
matches it against its on-demand color 100 config, and
automatically requests a path from the PCE. No static SR-TE policy config
required on the head-end.
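The head-end side of that is just a few lines of ODN config — a sketch matching the behavior described above rather than a verbatim copy from the sandbox:

```
segment-routing
 traffic-eng
  on-demand color 100              ! matches color community 100
   dynamic
    pcep                           ! delegate computation to the PCE
    !
    metric
     type igp
```

Any BGP route arriving with color 100 then instantiates a PCE-computed policy automatically.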
RP/0/RP0/CPU0:xrd-1# show segment-routing traffic-eng pcc ipv4 peer

PCC's peer database:
Peer address: 100.100.100.107
  Precedence: 255, (best PCE)
  State: up
  Capabilities: Stateful, Update, Segment-Routing, Instantiation
Once the PCEP session is up and BGP converges, the PCE-computed policy appears with the full label stack:
Color: 100, End-point: 100.100.100.102
Status: Admin: up Operational: up
Candidate-paths:
Preference: 100 (BGP ODN) (active)
Dynamic (pce 100.100.100.107) (valid)
Metric Type: IGP, Path Accumulated Metric: 3
SID[0]: 17001 [Prefix-SID, 101.103.105.255] ← anycast xrd-3 or xrd-5
SID[1]: 17101 [Prefix-SID, 101.104.106.255] ← anycast xrd-4 or xrd-6
SID[2]: 16102 [Prefix-SID, 100.100.100.102] ← xrd-2 (destination PE)
Binding SID: 24006
The PCE chose anycast SIDs for the middle hops — label 17001 resolves to whichever of xrd-3/xrd-5 is closer, providing automatic ECMP. The core doesn't need to know the specific path; it just executes the top label instruction at each hop.
Notice the metric type: IGP (1), Accumulated Metric 3. The
on-demand color 100 policy on xrd-1 explicitly requests
metric type igp from the PCE, so the PCE runs a standard
IGP-cost SPF and returns base-algorithm anycast SIDs — 17001 and 17101.
This is algorithm 0, the base topology.
If the policy instead requested metric type latency, the PCE
would run the FA-128 delay-metric SPF and return the FA-128 anycast SIDs —
17002 and 17102 — steering traffic onto the delay-constrained topology.
The PCE picks which SID tier to use based entirely on what metric the
head-end requests. This is the elegance of the three-tier anycast design:
one label stack change at the head-end shifts the entire traffic flow to a
different forwarding topology without touching any P router config.
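Concretely, the latency variant would just be a second ODN color on the head-end (the color value 200 here is hypothetical, not from the sandbox):

```
segment-routing
 traffic-eng
  on-demand color 200              ! hypothetical color for latency-sensitive traffic
   dynamic
    pcep
    !
    metric
     type latency                  ! PCE runs the delay-metric SPF
```

Routes colored 200 at the egress PE would then ride the FA-128 delay topology, while color 100 traffic stays on the base topology.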
The DevNet sandbox uses Docker 24.0.5, which has a known bug where containers can't attach to a macvlan network simultaneously with multiple bridge networks. I worked around this by starting each container with only the mgmt network, then attaching data plane networks one at a time post-start. If you're trying to replicate this and hitting "Container cannot be connected to network endpoints" errors — that's why.
TI-LFA: Sub-Second Failover
TI-LFA pre-computes backup paths for every protected prefix and installs them in the FIB before any failure occurs. When a link goes down, the FIB swap is instantaneous — no reconvergence wait, no RSVP signaling, no path computation under pressure.
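Enabling it is a couple of lines per protected interface under IS-IS — a sketch, with the interface shown only as an example:

```
router isis 1
 interface GigabitEthernet0/0/0/0
  address-family ipv4 unicast
   fast-reroute per-prefix         ! pre-compute per-prefix backups
   fast-reroute per-prefix ti-lfa  ! use SR labels for the repair path
```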
Before the failure test, label 17001 shows two equal-cost ECMP paths — one via xrd-3, one via xrd-5 — and each is pre-programmed as the other's backup:
17001 Pop Gi0/0/0/0 100.101.103.103 ← xrd-3 (IDX:0, BKUP-IDX:1)
Pop Gi0/0/0/1 100.101.105.105 ← xrd-5 (IDX:1, BKUP-IDX:0)
Updated: 19:35:17
Shutting Gi0/0/0/0 (the link to xrd-3) and immediately checking:
17001 Pop Gi0/0/0/1 100.101.105.105 ← xrd-5 only
Updated: 19:52:47.575
! commit was at 19:52:47 — sub-second switchover
! TI-LFA backup was already programmed — zero reconvergence wait
At the same time, the PCE detects the topology change via BGP-LS and starts reoptimizing the SR-TE policy — but it does this while traffic continues flowing on the existing LSP:
Status: (active) (reoptimizing)
LSPs:
  LSP[0]: LSP-ID 2 (active)        ← still carrying traffic
  LSP[1]: LSP-ID 3 (reoptimized)   ← PCE computed new path, installing
Binding SID: 24006                 ← unchanged throughout entire event
After restoring the interface, IS-IS reconverged and ECMP was restored in about 5 seconds. The Binding SID (24006) never changed through any of this — any system steering traffic by pushing that label was completely unaffected.
Key Takeaways
- The anycast SID design is elegant. Using 17001 for the {xrd-3, xrd-5} pair means the PCE doesn't need to pin to a specific node — it gets ECMP and resilience for free from the anycast topology.
- Flex-Algo and SR-TE solve different problems. Flex-Algo gives you a constraint-based forwarding plane that every router participates in automatically. SR-TE gives you explicit end-to-end path control with PCE optimization. In production low-latency networks you'd use both — FA for the default latency-optimized paths, SR-TE for specific traffic classes needing precise path control.
- TI-LFA coverage is 100% in this topology. The anycast design means there's always an alternate path available. Classic LFA would only protect ~50-80% of destinations depending on topology.
- The PCE reoptimization is hitless. LSP-ID 2 kept forwarding while LSP-ID 3 was being installed. The transition between them was invisible to any traffic flowing through the tunnel.
- IOS-XR SRGB is explicit: global-block 16000 18000 must be configured. Arista cEOS auto-assigns base 900000. This is the first thing to check when comparing label values between platforms.
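For reference, the IOS-XR side is just:

```
segment-routing
 global-block 16000 18000          ! SRGB: label = 16000 + SID index
```

With that base set, every prefix-SID in this post maps predictably — index 101 → 16101, index 1001 → 17001, and so on.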
What's Next
The next post will cover the NETCONF automation side — specifically, how I found the correct YANG paths for SR-TE policies, IS-IS SR labels, and BGP neighbor state on IOS-XR 25.3.1. The short version: the model names in the capabilities advertisement don't always map cleanly to what you'd expect, and a few of them return null until you find the right container path.
I'm also planning to deliberately break some config — shut the PCE, change Flex-Algo affinities mid-convergence, modify the SRGB — to see how the network behaves at the edges. That's where the interesting learning happens.