
Commit 6f524f2

[Sonic-DASH] Dash Tunnel and FNIC changes (#1911)
* Dash tunnel and FNIC changes
* Updated to rev 2.4 for:
  - Dash Tunnel behavior
  - PA validation updates
  - Switch attributes
1 parent b0573a6 commit 6f524f2

1 file changed

+184
-20
lines changed

doc/dash/dash-sonic-hld.md

+184-20
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# SONiC-DASH HLD
22
## High Level Design Document
3-
### Rev 2.2
3+
### Rev 2.4
44

55
# Table of Contents
66

@@ -51,7 +51,9 @@
5151
| 2.0 | 04/08/2024 | Prince Sunny | Schema updates for PL, PL-NSG, metering |
5252
| 2.1 | 08/22/2024 | Mukesh M Velayudhan | Add local Region ID field in appliance |
5353
| 2.2 | 08/28/2024 | Lawrence Lee | Route table `routing_type` restrictions, delete op behavior |
54-
| 2.3 | 11/7/2024 | Kumaresh Perumal | Update DASH_PA_VALIDATION_TABLE |
54+
| 2.3 | 11/07/2024 | Kumaresh Perumal | Update DASH_PA_VALIDATION_TABLE |
55+
| 2.4 | 02/05/2025 | Prince Sunny | Update DASH_TUNNEL, FNIC, minor clarifications |
56+
5557

5658
# About this Manual
5759
This document provides more detailed design of DASH APIs, DASH orchestration agent, Config and APP DB Schemas and other SONiC buildimage changes required to bring up SONiC image on an appliance card. General DASH HLD can be found at [dash_hld](https://github.com/sonic-net/DASH/tree/main/documentation/general/dash-high-level-design.md).
@@ -69,6 +71,7 @@ This document provides more detailed design of DASH APIs, DASH orchestration age
6971
| vPORT | VM's NIC. Eni, Vnic, VPort are used interchangeably |
7072
| ST | Service Tunnel |
7173
| PL | Private Link |
74+
| FNIC | Floating NIC |
7275

7376
# 1 Requirements Overview
7477

@@ -89,6 +92,7 @@ At a high level the following should be supported:
8992
- Telemetry and Monitoring
9093
- Private Link
9194
- Private Link NSG
95+
- Express Route GW Bypass
9296

9397
Phase 2
9498
- Service Tunnel
@@ -127,6 +131,11 @@ Following are the minimal scaling requirements
127131
| Total active connections | 32M (Bidirectional) |
128132
| Metering Buckets per ENI | 4000 |
129133
| CPS | 3M |
134+
| Max PA validation entries | 4k |
135+
| Max TUNNEL entries | 4k |
136+
| Max TUNNEL members per group | 128 |
137+
| Max trusted VNIs per ENI | 16 |
138+
| Max trusted VNIs | 1k Per Card |
130139

131140
\* Number of VNET is a software limit as VNET by itself does not take hardware resources. This shall be limited to number of VNI hardware can support
132141

@@ -185,6 +194,7 @@ DASH Sonic implementation is targeted for appliance scenarios and must handles m
185194
13. During a bulk operation, if any part/subset of API fails, implementation shall return *error* for the entire API. Sonic implementation shall validate the entire API as pre-checks before applying and return accordingly.
186195
14. Implementation must have flexible memory allocation for ENI and not reserve max scale during initial create (e.g 100k routes). This is to allow oversubscription.
187196
15. Implementation must not have silent failures for APIs. E.g accepting an API from controller, returning success and failing in the backend. This is orthogonal to the idempotency of APIs described above for ADD and Delete operations. Intent is to ensure SDN controller and Sonic implementation is in-sync
197+
16. An ENI can be modeled as either an FNIC or a regular VM. The mode can be set at create time only.
188198

189199
## 1.7 ACL requirements
190200

@@ -307,13 +317,13 @@ Reference Yang model for DASH Vnet is [here](https://github.com/sonic-net/sonic-
307317

308318
## 3.1 Config DB
309319

310-
### 3.1.1 DEVICE Metadata Table
320+
### 3.1.1 DEVICE Metadata Table for SmartSwitch DPU
311321

312322
```
313323
"DEVICE_METADATA": {
314324
"localhost": {
315-
"subtype": "Appliance",
316-
"type": "SonicHost",
325+
"type": "SmartSwitchDPU",
326+
"subtype": "SmartSwitch",
317327
"switch_type": "dpu",
318328
"sub_role": "None"
319329
}
@@ -368,6 +378,8 @@ DASH_ENI_TABLE:{{eni}}
368378
"v4_meter_policy_id": {{string}} (OPTIONAL)
369379
"v6_meter_policy_id": {{string}} (OPTIONAL)
370380
"disable_fast_path_icmp_flow_redirection": {{bool}} (OPTIONAL)
381+
"mode": {{floating_nic_mode/vm_mode}} (OPTIONAL)
382+
"trusted_vni": {{vni list}} (OPTIONAL)
371383
```
372384
```
373385
key = DASH_ENI_TABLE:eni ; ENI MAC as key
@@ -379,9 +391,11 @@ admin_state = Enabled after all configurations are applied.
379391
vnet = Vnet that ENI belongs to
380392
pl_sip_encoding = Privatelink encoding for IPv6 SIP transformation; Format `field_value/full_mask` where both `field_value` and `full_mask` must be given as IPv6 addresses. See "3.6.3.2 PL IPv6 Address Transformation" for details.
381393
pl_underlay_sip = Underlay SIP (ST GW VIP) to be used for all private link transformation for this ENI
382-
v4_meter_policy_id = IPv4 meter policy ID
383-
v6_meter_policy_id = IPv6 meter policy ID
384-
disable_fast_path_icmp_flow_redirection = Disable handling fast path ICMP flow redirection packets
394+
v4_meter_policy_id = IPv4 meter policy ID
395+
v6_meter_policy_id = IPv6 meter policy ID
396+
disable_fast_path_icmp_flow_redirection = Disable handling fast path ICMP flow redirection packets
397+
mode = floating nic mode or vm mode. Default is 'vm_mode'
398+
trusted_vni = list of trusted VNIs for this ENI, single value or "-" for range both inclusive. MSEE VNIs can be added here temporarily.
385399
```
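
The `trusted_vni` format described above (single values, or `-` for a range with both ends inclusive) can be expanded as in the sketch below. The helper name `parse_vni_list` and the comma separator between list items are assumptions made for illustration; the schema itself only states single values and `-` ranges.

```python
def parse_vni_list(value: str) -> list[int]:
    """Expand a VNI list string such as "100,200-203" into a sorted list.

    Hypothetical helper: the comma separator is an assumption, the "-"
    range syntax (both ends inclusive) comes from the schema text.
    """
    vnis: set[int] = set()
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            low_s, high_s = item.split("-", 1)
            low, high = int(low_s), int(high_s)
            if low > high:
                raise ValueError(f"invalid VNI range: {item}")
            vnis.update(range(low, high + 1))  # both ends inclusive
        else:
            vnis.add(int(item))
    for vni in vnis:
        # A VXLAN VNI is a 24-bit field.
        if not 0 <= vni < (1 << 24):
            raise ValueError(f"VNI out of range: {vni}")
    return sorted(vnis)


if __name__ == "__main__":
    print(parse_vni_list("1000"))
    print(parse_vni_list("100,200-203"))
```

A caller could then compare the length of the expanded list against the per-ENI or per-card trusted VNI limits from the scaling table, depending on how those limits are counted.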
386400

387401
### 3.2.4 TAG
@@ -474,7 +488,7 @@ encap_type = encap type depends on the action_type - {vxlan, nvgre
474488
vni = vni value to be used as the key for encapsulation. Applicable if encap_type is specified.
475489
```
476490

477-
### 3.2.7 ROUTING APPLIANCE
491+
### 3.2.7 ROUTING APPLIANCE (DEPRECATED, Use DASH_TUNNEL)
478492

479493
```
480494
DASH_ROUTING_APPLIANCE_TABLE:{{appliance_id}}:
@@ -499,6 +513,8 @@ DASH_APPLIANCE_TABLE:{{appliance_id}}
499513
"sip": {{ip_address}}
500514
"vm_vni": {{vni}}
501515
"local_region_id": {{region_id}}
516+
"outbound_direction_lookup": {{dst_mac/src_mac}} (OPTIONAL)
517+
"trusted_vnis": {{vni list}} (OPTIONAL)
502518
```
503519

504520
```
@@ -507,6 +523,8 @@ key = DASH_APPLIANCE_TABLE:id ; attributes specific for the
507523
sip = source ip address, to be used in encap
508524
vm_vni = VM VNI that is used for setting direction. Also used for inbound encap to VM
509525
local_region_id = Region where this appliance is located
526+
outbound_direction_lookup= dst_mac or src_mac; Default is src_mac. Set this attribute to dst_mac to override the default.
527+
trusted_vnis = list of global trusted VNIs, single value or "-" for range both inclusive.
510528
```
511529

512530
### 3.2.9 ROUTE LPM TABLE - OUTBOUND
@@ -542,6 +560,7 @@ DASH_ROUTE_TABLE:{{group_id}}:{{prefix}}
542560
"metering_policy_en": {{bool}} (OPTIONAL) (OBSOLETED)
543561
"metering_class_or": {{uint32}} (OPTIONAL)
544562
"metering_class_and": {{uint32}} (OPTIONAL)
563+
"tunnel": {{string}} (OPTIONAL)
545564
```
546565

547566
```
@@ -550,7 +569,7 @@ key = DASH_ROUTE_TABLE:group_id:prefix ; Route route table
550569
action_type = routing_type ; reference to routing type (DEPRECATED)
551570
routing_type = routing_type ; replacement for the deprecated `action_type` field. Must be one of {vnet, vnet_direct, direct, servicetunnel, drop}.
552571
vnet = vnet name ; destination vnet name if routing_type is {vnet, vnet_direct}, a vnet other than eni's vnet means vnet peering
553-
appliance = appliance id ; appliance id if routing_type is {appliance}
572+
appliance = appliance id ; appliance id if routing_type is {appliance} (DEPRECATED, Use tunnel attribute)
554573
overlay_ip = ip_address ; overlay_ip to lookup if routing_type is {vnet_direct}, use dst ip from packet if not specified
555574
overlay_sip_prefix = ip_prefix ; overlay ipv6 src ip if routing_type is {servicetunnel}, transform last 32 bits from packet (src ip)
556575
overlay_dip_prefix = ip_prefix ; overlay ipv6 dst ip if routing_type is {servicetunnel}, transform last 32 bits from packet (dst ip)
@@ -559,12 +578,13 @@ underlay_dip = ip_address ; underlay ipv4 dst ip to o
559578
metering_policy_en = bool ; Metering policy lookup enable (optional), default = false (OBSOLETED). If aggregated or/and bits is 0, metering policy is applied
560579
metering_class_or = uint32 ; Metering class-id 'or' bits
561580
metering_class_and = uint32 ; Metering class-id 'and' bits
581+
tunnel = string ; Nexthop tunnel for ECMP or single nexthop, applicable if routing_type is {direct}
562582
```
563583

564584
### 3.2.10 ROUTE RULE TABLE - INBOUND
565585

566586
```
567-
DASH_ROUTE_RULE_TABLE:{{eni}}:{{vni}}:{{prefix}}
587+
DASH_ROUTE_RULE_TABLE:{{eni}}:{{vni}}:{{prefix/tag}}
568588
"action_type": {{routing_type}}
569589
"priority": {{priority}}
570590
"protocol": {{protocol_value}} (OPTIONAL)
@@ -576,7 +596,7 @@ DASH_ROUTE_RULE_TABLE:{{eni}}:{{vni}}:{{prefix}}
576596
```
577597

578598
```
579-
key = DASH_ROUTE_RULE_TABLE:eni:vni:prefix ; ENI Inbound route table with VNI and optional SRC PA prefix
599+
key = DASH_ROUTE_RULE_TABLE:eni:vni:prefix ; ENI Inbound route table with VNI and optional SRC PA prefix or prefix tag defined by DASH_PREFIX_TAG_TABLE
580600
; field = value
581601
action_type = routing_type ; reference to routing type, action can be decap or drop
582602
priority = INT32 value ; priority of the rule, lower the value, higher the priority
@@ -672,14 +692,10 @@ DASH_PA_VALIDATION_TABLE:{{vni}}
672692
```
673693
key = DASH_PA_VALIDATION_TABLE:vni; ENI and VNI as key;
674694
; field = value
675-
addresses = list of addresses used for validating underlay source ip of incoming packets.
695+
prefixes = list of prefixes used for validating underlay source ip of incoming packets.
676696
```
677697

678-
DASH_PA_VALIDATION_TABLE is used only for PL outbound direction. PA address can be either IPV4 or IPV6.
679-
680-
Total PAs per MSEE would be 64 and if there are 64 MSEEs per region(based on 400G DPU), there would be 4K PA_VALIDATION entries.
681-
682-
For more scale numbers, please refer to the [doc](https://github.com/sonic-net/DASH/blob/main/documentation/express-route-service/express-route-gateway-bypass.md)
698+
DASH_PA_VALIDATION_TABLE is used only for additional PA validation, such as fastpath or other explicit PA validation cases. A PA prefix can be either IPv4 or IPv6.
683699

684700
### 3.2.14 DASH tunnel table
685701

@@ -695,11 +711,18 @@ DASH_TUNNEL_TABLE:{{tunnel_name}}
695711
key = DASH_TUNNEL_TABLE:tunnel_name; tunnel name used for referencing in mapping table
696712
; field = value
697713
endpoints = list of addresses for ecmp tunnel
698-
encap_type = vxlan or nvgre
699-
vni = vni value for encap
714+
encap_type = vxlan or nvgre, create only attribute
715+
vni = vni value for encap, create only attribute
700716
metering_class_or = uint32
701717
```
702718

719+
DASH_TUNNEL_TABLE shall have one or more endpoints. Encap type, VNI are create only attributes. A change on encap would require deleting and creating new tunnel objects.
720+
A single endpoint is treated as a single nexthop, and multiple comma-separated endpoints shall be treated as an ECMP nexthop. Return packets from the tunnel are expected to use the same encap type.
721+
722+
For a single endpoint, implementation shall simply create a sai_dash_tunnel object with ```SAI_DASH_TUNNEL_ATTR_DIP=endpoint IP``` and ```SAI_DASH_TUNNEL_ATTR_MAX_MEMBER_SIZE=1```
723+
724+
For ECMP, implementation shall create ```sai_dash_tunnel_member``` and ```sai_dash_tunnel_next_hop``` objects with the appropriate ```SAI_DASH_TUNNEL_ATTR_MAX_MEMBER_SIZE```. Since MAX_MEMBER_SIZE is set at creation, adding a new member is expected to require creating a new DASH_TUNNEL object. However, implementation shall support removing members.
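
The single-endpoint versus ECMP decision and the create-only rules above can be sketched as follows. This is an illustrative model only, not orchagent code: `plan_tunnel`, `needs_recreate`, and `TunnelPlan` are hypothetical names, and using the endpoint count as the member size is an assumption (the text only says "appropriate"). The 128-member cap is taken from the scaling table.

```python
import ipaddress
from dataclasses import dataclass

MAX_TUNNEL_MEMBERS = 128          # "Max TUNNEL members per group"
CREATE_ONLY_FIELDS = ("encap_type", "vni")


@dataclass(frozen=True)
class TunnelPlan:
    kind: str                     # "single" or "ecmp"
    endpoints: tuple[str, ...]
    max_member_size: int          # assumption: equals the endpoint count


def plan_tunnel(entry: dict) -> TunnelPlan:
    """Classify a DASH_TUNNEL_TABLE entry as single nexthop or ECMP."""
    endpoints = tuple(
        str(ipaddress.ip_address(e.strip()))
        for e in entry.get("endpoints", "").split(",")
        if e.strip()
    )
    if not endpoints:
        raise ValueError("a tunnel needs one or more endpoints")
    if len(endpoints) > MAX_TUNNEL_MEMBERS:
        raise ValueError("too many tunnel members")
    kind = "single" if len(endpoints) == 1 else "ecmp"
    return TunnelPlan(kind, endpoints, len(endpoints))


def needs_recreate(old: dict, new: dict) -> bool:
    """True when an update cannot be applied to the existing tunnel."""
    if any(old.get(f) != new.get(f) for f in CREATE_ONLY_FIELDS):
        return True               # encap_type and vni are create-only
    old_eps = set(plan_tunnel(old).endpoints)
    new_eps = set(plan_tunnel(new).endpoints)
    # Removing members is supported; adding members needs a new object.
    return not new_eps <= old_eps


if __name__ == "__main__":
    tunnel = {"endpoints": "100.8.1.2,10.79.14.7",
              "encap_type": "vxlan", "vni": "1000"}
    print(plan_tunnel(tunnel))
    print(needs_recreate(tunnel, {**tunnel, "endpoints": "100.8.1.2"}))
```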
725+
703726
### 3.2.15 DASH orchagent (Overlay)
704727

705728
| APP_DB Table | Key | Field | SAI Attributes/*objects* | Comment |
@@ -988,6 +1011,8 @@ SONiC for DASH shall have a lite swss initialization without the heavy-lift of e
9881011
| | SAI_SWITCH_ATTR_TYPE |
9891012
| | SAI_SWITCH_ATTR_VXLAN_DEFAULT_PORT |
9901013
| | SAI_SWITCH_ATTR_VXLAN_DEFAULT_ROUTER_MAC |
1014+
| | SAI_SWITCH_TUNNEL_ATTR_VXLAN_UDP_SPORT |
1015+
| | SAI_SWITCH_TUNNEL_ATTR_VXLAN_UDP_SPORT_MASK |
9911016

9921017
### 3.3.5 Underlay Routing
9931018
DASH Appliance shall establish BGP session with the connected Peer and advertise the prefixes (VIP PA). In turn, the Peer (e.g., network device or SmartSwitches) shall advertise a default route to the appliance. With two Peers connected, the appliance shall have a route with gateways towards both Peers and performs ECMP routing. Orchagent installs the route, resolves the neighbor (GW) MAC, and programs the underlay route/nexthop and neighbor.
@@ -1608,3 +1633,142 @@ The same principle applies to `overlay_dip_prefix` and the final overlay destina
16081633
final_overlay_dip = (orig_packet_dip & ~overlay_dip_prefix.mask)
16091634
| overlay_dip_prefix.addr
16101635
```
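
The `final_overlay_dip` formula above can be evaluated with Python's standard `ipaddress` module, as in the sketch below. The function name `transform_overlay_dip` is hypothetical, and treating the original IPv4 address as the low 32 bits of a 128-bit value is an assumption made for the illustration.

```python
import ipaddress

FULL_MASK = (1 << 128) - 1


def transform_overlay_dip(orig_dip: str, overlay_dip_prefix: str) -> str:
    """Apply (orig_dip & ~mask) | addr for a prefix given as "addr/mask".

    Both the address and the mask are written as IPv6 addresses, as in
    the DASH schema examples.
    """
    addr_s, mask_s = overlay_dip_prefix.split("/")
    addr = int(ipaddress.IPv6Address(addr_s))
    mask = int(ipaddress.IPv6Address(mask_s))
    # Assumption: an IPv4 address occupies the low 32 bits.
    dip = int(ipaddress.ip_address(orig_dip))
    result = (dip & ~mask & FULL_MASK) | addr
    return str(ipaddress.IPv6Address(result))


if __name__ == "__main__":
    print(transform_overlay_dip(
        "10.0.2.4",
        "2603:10e1:100:2::3401:203/ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff",
    ))
```

With a full mask, no bits of the original destination survive, so the result is the prefix address itself (`2603:10e1:100:2::3401:203` for the values shown).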
1636+
1637+
### 3.6.4 ER GW Bypass - Private Link
1638+
1639+
```
1640+
[
1641+
{
1642+
"DASH_APPLIANCE_TABLE:dpu_guid_22": {
1643+
"sip":"10.250.20.19",
1644+
"vm_vni": "20",
1645+
"local_region_id": "2",
1646+
"outbound_direction_lookup": "dst_mac",
1647+
"trusted_vnis": "100"
1648+
},
1649+
"OP": "SET"
1650+
},
1651+
{
1652+
"DASH_ROUTING_TYPE_TABLE:privatelink": [
1653+
{
1654+
"name": "action1",
1655+
"action_type": "4to6"
1656+
},
1657+
{
1658+
"name": "action2",
1659+
"action_type": "staticencap",
1660+
"encap_type": "nvgre",
1661+
"vni":"100"
1662+
} ],
1663+
"OP": "SET"
1664+
},
1665+
{
1666+
"DASH_ENI_TABLE:F4939FEFC47E": {
1667+
"eni_id": "497f23d7-f0ac-4c99-a98f-59b470e8c7bd",
1668+
"mac_address": "F4-93-9F-EF-C4-7E",
1669+
"underlay_ip": "25.1.1.1",
1670+
"admin_state": "enabled",
1671+
"vnet": "Vnet1",
1672+
"pl_sip_encoding": "::cb3a:16e5:ff71:0:0/::ffff:ffff:ffff:0:0",
1673+
"mode": "floating_nic_mode",
1674+
"trusted_vni": "1000"
1675+
},
1676+
"OP": "SET"
1677+
},
1678+
{
1679+
"DASH_ENI_ROUTE_TABLE:F4939FEFC47E": {
1680+
"group_id":"group_id_4"
1681+
},
1682+
"OP": "SET"
1683+
},
1684+
{
1685+
"DASH_ROUTE_GROUP_TABLE:group_id_4": {
1686+
"guid":"group_id_4-test",
1687+
"version":"1"
1688+
},
1689+
"OP": "SET"
1690+
},
1691+
{
1692+
"DASH_ROUTE_TABLE:group_id_4:10.0.2.4/32": {
1693+
"routing_type":"vnet",
1694+
"vnet":"Vnet1",
1695+
"metering_class_or":"0x60",
1696+
"metering_class_and":"0x77"
1697+
},
1698+
"OP": "SET"
1699+
},
1700+
{
1701+
"DASH_VNET_MAPPING_TABLE:Vnet1:10.0.2.4": {
1702+
"routing_type":"privatelink",
1703+
"mac_address":"F9-22-83-99-22-A2",
1704+
"underlay_ip":"50.1.2.3",
1705+
"overlay_sip_prefix":"fd41:108:20:abc:abc::0/ffff:ffff:ffff:ffff:ffff:ffff::",
1706+
"overlay_dip_prefix":"2603:10e1:100:2::3401:203/ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff",
1707+
"metering_class_or":"0x06"
1708+
},
1709+
"OP": "SET"
1710+
},
1711+
{
1712+
"DASH_ROUTE_TABLE:group_id_4:10.0.0.4/32": {
1713+
"routing_type":"direct",
1714+
"tunnel":"exgw_tunnel_1"
1715+
},
1716+
"OP": "SET"
1717+
},
1718+
{
1719+
"DASH_TUNNEL_TABLE:exgw_tunnel_1": {
1720+
"endpoints":"100.8.1.2,10.79.14.7",
1721+
"encap_type":"vxlan",
1722+
"vni":1000
1723+
},
1724+
"OP": "SET"
1725+
},
1726+
{
1727+
"DASH_ROUTE_RULE_TABLE:F4939FEFC47E:1000:10.79.14.7/32": {
1728+
"action_type":"decap",
1729+
"priority":"1",
1730+
"region":"5"
1731+
},
1732+
"OP": "SET"
1733+
},
1734+
{
1735+
"DASH_ROUTE_RULE_TABLE:F4939FEFC47E:1000:us_region_tag": {
1736+
"action_type":"decap",
1737+
"priority":"2"
1738+
},
1739+
"OP": "SET"
1740+
},
1741+
{
1742+
"DASH_PREFIX_TAG_TABLE:us_region_tag": {
1743+
"ip_version":"ipv4",
1744+
"prefix_list":"10.20.1.59/32,10.0.1.0/24"
1745+
},
1746+
"OP": "SET"
1747+
}
1748+
]
1749+
```
1750+
1751+
For the example configuration above, the following is a brief explanation of lookup behavior in the floating nic inbound/outbound direction:
1752+
1753+
Note: The details of flow creation, flow match, etc. are intentionally omitted. The steps below are for reference and do not capture all details.
1754+
1755+
1. Packet destined to DST_CA:10.0.2.4 from (SRC_CA:10.0.0.4, SRC_PA:10.79.14.7, VNI:1000):
1756+
1. Floating nic mode enabled for ENI
1757+
2. Lookup inbound route rule and hits for entry 10.79.14.7
1758+
3. The action in this case is 'decap'
1759+
4. After decap, the outbound pipeline is taken (VNI 1000 is marked as trusted VNI)
1760+
5. LPM lookup hits for entry 10.0.2.4/32
1761+
6. The action in this case is "vnet"
1762+
7. Next lookup is in the mapping table and mapping table action here is "privatelink"
1763+
8. First Action for "privatelink" is 4to6 transposition
1764+
9. As per **3.6.3.2**, the final overlay SIP is `fd41:108:20:cb3a:16e5:ff71:a00:204`
1765+
10. Similarly, the final overlay DIP is `2603:10e1:100:2::3401:203`
1766+
11. Second Action is Static NVGRE encap with GRE key '100'.
1767+
12. Underlay DIP shall be 50.1.2.3 (from mapping). Since 'pl_underlay_sip' is not provided in the ENI, Underlay SIP shall be 10.250.20.19 (from APPLIANCE)
1768+
1769+
2. Return Packet destined to DST_CA:10.0.0.4 from SRC_CA:10.0.2.4:
1770+
1. This packet shall be transformed IPv6 packet from PL endpoint
1771+
2. Outer SRC_PA:50.1.2.3, Outer DST_PA:10.250.20.19
1772+
3. Reverse transpositions applied (v6->v4)
1773+
4. Transformed packet is ECMP tunneled to one of the ER GW endpoint IPs as configured in DASH_TUNNEL_TABLE
1774+
5. Underlay SRC_PA:10.250.20.19, Underlay DST_PA:100.8.1.2, Outer VNI:1000
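
The inbound route rule step of the walk-through (matching on VNI and source PA, with a lower priority value winning) can be sketched as below. This is a simplified model under stated assumptions: `lookup_inbound_rule` and `expand_selector` are hypothetical helpers, only the key fields and `priority` are considered (protocol and other optional fields are ignored), and the last key component is resolved as a prefix tag first and as a CIDR prefix otherwise.

```python
import ipaddress


def expand_selector(selector: str, prefix_tags: dict) -> list:
    """Resolve the last key component: a prefix tag name or a CIDR prefix."""
    if selector in prefix_tags:
        prefixes = prefix_tags[selector]["prefix_list"].split(",")
    else:
        prefixes = [selector]
    return [ipaddress.ip_network(p.strip()) for p in prefixes]


def lookup_inbound_rule(rules, prefix_tags, eni, vni, src_pa):
    """Return (priority, key, action_type) of the best match, or None."""
    src = ipaddress.ip_address(src_pa)
    best = None
    for key, fields in rules.items():
        # Key layout: eni:vni:prefix-or-tag (the prefix may contain ":").
        rule_eni, rule_vni, selector = key.split(":", 2)
        if rule_eni != eni or int(rule_vni) != vni:
            continue
        if not any(src in net for net in expand_selector(selector, prefix_tags)):
            continue
        priority = int(fields["priority"])
        if best is None or priority < best[0]:  # lower value wins
            best = (priority, key, fields["action_type"])
    return best


# Entries copied from the example configuration (table name prefix dropped).
RULES = {
    "F4939FEFC47E:1000:10.79.14.7/32": {"action_type": "decap", "priority": "1"},
    "F4939FEFC47E:1000:us_region_tag": {"action_type": "decap", "priority": "2"},
}
TAGS = {"us_region_tag": {"ip_version": "ipv4",
                          "prefix_list": "10.20.1.59/32,10.0.1.0/24"}}

if __name__ == "__main__":
    print(lookup_inbound_rule(RULES, TAGS, "F4939FEFC47E", 1000, "10.79.14.7"))
    print(lookup_inbound_rule(RULES, TAGS, "F4939FEFC47E", 1000, "10.0.1.77"))
```

In this model, a packet from SRC_PA 10.79.14.7 on VNI 1000 hits the `/32` rule with priority 1, a source inside the tagged prefixes hits the tag rule with priority 2, and any other source PA returns no match.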
