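Building networks with Neutron touches a broad range of topics, and virtual networking can be implemented in very flexible ways, which often leaves newcomers confused. This article walks through one scenario, the outbound path to the public network for a VM without a floating IP, and briefly introduces the components and concepts involved along the way. Hopefully it helps readers who are interested in virtual networking.
There are many ways to implement virtual networking inside the company, so why single out this scenario? Mainly because its path is relatively long, it covers most of the relevant concepts, and it is not tied to any hardware vendor, so it applies generally.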
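The article has three parts, and readers can pick whichever they need:
1. Topology and traffic path.
2. Review of networking concepts.
3. Packet captures on each node.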
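Part 1: Topology and traffic path
Topology:
Traffic path in this scenario:
The traffic follows the U-shaped path in the figure. The actual implementation differs somewhat from the community version.
Part 2: Review of networking concepts
Readers who are already familiar with VLAN/VXLAN, DVR, OSPF, ARP, OVS, namespace, and bridge will probably find this an easy read. If these terms are unfamiliar, don't worry: this part reviews the basics, and Part 3 reinforces them with step-by-step packet captures.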
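OSI seven-layer model
This article focuses on two layers: L2 (data link layer) and L3 (network layer).
VLAN frame format
VLAN frame format based on 802.1Q
- VLAN IDs range from 1 to 4094
- LAN: a LAN is one broadcast domain; every member of a LAN receives a broadcast sent by any one member
- VLAN stands for Virtual LAN. A VLAN-capable switch can belong to several LANs at the same time.
- An access port can belong to only one VLAN
- A trunk port can belong to multiple VLANs and can receive and send frames for multiple VLANs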
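The 802.1Q tag layout described above can be illustrated with a short Python sketch. It builds an Ethernet header with one VLAN tag and parses the VLAN ID back out. The MAC addresses and VLAN 103 are taken from the eth3 capture in Part 3; the helper names are made up for this illustration.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_tagged_header(dst: str, src: str, vlan_id: int, pcp: int, ethertype: int) -> bytes:
    """Build an 18-byte Ethernet header carrying one 802.1Q tag."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID is a 12-bit field")
    # TCI = PCP (3 bits) | DEI (1 bit, left at 0) | VID (12 bits)
    tci = (pcp << 13) | vlan_id
    return mac_to_bytes(dst) + mac_to_bytes(src) + struct.pack("!HHH", TPID_8021Q, tci, ethertype)


def parse_vlan(frame: bytes):
    """Return (vlan_id, pcp, inner_ethertype), or None for an untagged frame."""
    (tpid,) = struct.unpack("!H", frame[12:14])
    if tpid != TPID_8021Q:
        return None
    tci, inner = struct.unpack("!HH", frame[14:18])
    return tci & 0x0FFF, tci >> 13, inner


tagged = build_tagged_header("5c:c9:99:60:d2:71", "fa:17:3e:f7:cd:bc", vlan_id=103, pcp=0, ethertype=0x0800)
untagged = mac_to_bytes("5c:c9:99:60:d2:71") + mac_to_bytes("fa:17:3e:f7:cd:bc") + struct.pack("!H", 0x0800)
print(len(tagged), parse_vlan(tagged), parse_vlan(untagged))
```

The tag adds 4 bytes to the header, which is why the tagged frames captured on eth3 in Part 3 are 146 bytes long while the same packets are 142 bytes elsewhere.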
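VXLAN packet format
VXLAN is a tunneling mode that encapsulates Ethernet frames in UDP at the transport layer (OVS uses UDP port 4789 by default).
- VXLAN builds tunnels between VTEPs and carries the encapsulated Layer 2 data across a Layer 3 network (the external network)
- The outermost IP/UDP headers carry the packet across the underlay network.
- In the middle is the VXLAN header. After a VTEP receives a packet, it strips the outer IP/UDP headers and uses this header for the VXLAN logic, mainly delivering the packet to the right virtual machine according to the VNI.
- The innermost part is the original frame, which is what the virtual machine sees.
- Encapsulation adds 50 bytes of overhead.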
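As a rough illustration of where the 50 bytes come from, the following Python sketch packs and parses the 8-byte VXLAN header and adds up the outer headers. It assumes an untagged outer Ethernet header and an IPv4 header without options; VNI 103 is the value seen in the Part 3 captures, and the function names are made up for this example.

```python
import struct

OUTER_ETHERNET = 14  # outer MAC header without an 802.1Q tag
OUTER_IPV4 = 20      # IPv4 header without options
OUTER_UDP = 8
VXLAN_HEADER = 8
VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag, shown by tcpdump as flags [I] (0x08)


def build_vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit field")
    # Word 1: flags (8 bits) + 24 reserved bits; word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vxlan_header(header: bytes):
    """Return (flags, vni) from the first 8 bytes of a VXLAN payload."""
    word1, word2 = struct.unpack("!II", header[:8])
    return word1 >> 24, word2 >> 8


overhead = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER
header = build_vxlan_header(103)
print(overhead, header.hex(), parse_vxlan_header(header))
```

The same arithmetic shows up in Part 3: the 142-byte inner frame is carried in a 192-byte outer frame, and the VM sees an MTU of 1450 on a 1500-byte underlay.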
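ARP (Address Resolution Protocol)
ARP resolves an IP address to a MAC address.
- Static ARP
- Gratuitous ARP
  - IP address conflict detection
  - Announcing a new MAC address
- Dynamic ARP
  - Dynamic ARP resolves addresses in two steps: a broadcast ARP request and a unicast ARP reply
  - Bridges and NICs learn entries automatically and manage their lifetime
- Proxy ARP
- Common commands:
ip neigh / arp -n
ovs-appctl fdb/show br-int
brctl showmacs <bridge_name>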
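The difference between an ordinary ARP request and a gratuitous ARP is easiest to see in the packet fields. The sketch below builds the 28-byte ARP payload for IPv4 over Ethernet using the VM's address from Part 3. It only constructs the bytes and does not send anything; the helper names are illustrative.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # htype, ptype, hlen, plen, oper, sha, spa, tha, tpa
ARP_REQUEST, ARP_REPLY = 1, 2
ZERO_MAC = "00:00:00:00:00:00"


def mac_to_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def build_arp(oper: int, sender_mac: str, sender_ip: str, target_mac: str, target_ip: str) -> bytes:
    """Build an ARP payload (hardware type 1 = Ethernet, protocol 0x0800 = IPv4)."""
    return struct.pack(
        ARP_FORMAT,
        1, 0x0800, 6, 4, oper,
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_to_bytes(target_mac), socket.inet_aton(target_ip),
    )


def is_gratuitous(packet: bytes) -> bool:
    """A gratuitous ARP asks about (or announces) the sender's own IP."""
    fields = struct.unpack(ARP_FORMAT, packet)
    return fields[6] == fields[8]  # sender IP == target IP


vm_mac, vm_ip = "fa:16:3e:b2:ef:af", "192.168.1.7"
request = build_arp(ARP_REQUEST, vm_mac, vm_ip, ZERO_MAC, "192.168.1.1")
announce = build_arp(ARP_REQUEST, vm_mac, vm_ip, ZERO_MAC, vm_ip)
print(len(request), is_gratuitous(request), is_gratuitous(announce))
```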
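Policy-based routing (PBR)
Forwards packets according to user-defined policies.
Tips:
- Three priorities are taken by default: 0 (local table), 32766 (main table), and 32767 (default table)
- The smaller the value, the higher the priority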
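To make the priority rule concrete, here is a simplified Python model of how rules are walked: in ascending priority, and the first table that holds a matching route wins. The rule list mirrors the `ip rule list` output from the qrouter namespace in Part 3. The contents of the local, main, and default tables are assumptions, since the article only shows table 3232235777.

```python
import ipaddress

# (priority, source selector, table). Lower priority values are evaluated first.
# The captured rule reads "from 192.168.1.1/24"; it is written here as the network 192.168.1.0/24.
RULES = [
    (32766, "0.0.0.0/0", "main"),
    (3232235777, "192.168.1.0/24", "3232235777"),
    (0, "0.0.0.0/0", "local"),
    (32767, "0.0.0.0/0", "default"),
]

# Assumed table contents, for illustration only.
TABLES = {
    "local": [],
    "main": [("192.168.1.0/24", "direct")],
    "default": [],
    "3232235777": [("0.0.0.0/0", "192.168.1.6")],
}


def lookup(src: str, dst: str):
    """Return (table, next hop) for the first rule whose table has a matching route."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for _priority, selector, table in sorted(RULES):
        if src_ip not in ipaddress.ip_network(selector):
            continue
        matches = [
            (ipaddress.ip_network(prefix), via)
            for prefix, via in TABLES[table]
            if dst_ip in ipaddress.ip_network(prefix)
        ]
        if matches:
            # Within one table the longest prefix wins.
            _prefix, via = max(matches, key=lambda m: m[0].prefixlen)
            return table, via
    return None


print(lookup("192.168.1.7", "8.8.8.8"))
print(lookup("192.168.1.7", "192.168.1.10"))
```

Under these assumptions, traffic to 8.8.8.8 falls through main and default (neither has a matching route) and is only resolved by the last rule, which sends it to 192.168.1.6.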
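Network namespace
A mechanism for isolation: resources in different namespaces are invisible to each other.
Each namespace has its own network stack (interfaces, routing and forwarding tables, iptables).
A device (Linux device) can be in only one namespace at a time.
Devices in different namespaces can be connected with a veth pair.
Common namespaces:
fip-xxx
qrouter-xxx
snat-xxx
qdhcp-xxx
Commands:
ip netns # list namespaces
ip netns exec ns1 ip addr # run a command inside namespace ns1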
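Basic Neutron concepts
Network:
An isolated L2 domain; it can be virtual, logical, or switched.
Subnet:
An isolated L3 domain, that is, an IP address range. Each machine in it has one IP, and hosts in the same subnet can reach each other at L3.
Port:
A virtual, logical, or switched port on a network.
All of these entities are virtual, have an automatically generated unique ID, support CRUD operations, and have their state tracked in the database.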
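Linux Bridge
OVS (Open vSwitch)
Open vSwitch is virtual switch software.
A virtual switch has two main jobs:
- Carrying traffic between virtual machines
- Connecting virtual machines to the outside network
OVS bridges:
- br-int: the integration bridge, which implements the main internal network functions.
- br-ex: the external bridge, which usually handles communication with external networks.
- br-tun: the tunnel bridge.
OVS flow tables:
Match:
- The larger the priority value, the higher the priority
- Match on port number
- Match on source/destination MAC
- Match on protocol
Actions:
- NORMAL (ordinary L2 switching)
- resubmit
- output to a port
- drop
- learn
- Modify the MAC; push/strip VLAN or tunnel headers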
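The matching rule above (among all flows that match, the one with the highest priority wins) can be modeled in a few lines of Python. This is a toy model of one table, not the OVS implementation, and the three flows are hypothetical examples rather than flows taken from the environment in this article.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    priority: int
    match: dict   # field name -> required value; an empty dict matches everything
    actions: str


def select_flow(flows, packet):
    """Return the matching flow with the highest priority, or None if nothing matches."""
    candidates = [
        flow for flow in flows
        if all(packet.get(name) == value for name, value in flow.match.items())
    ]
    return max(candidates, key=lambda flow: flow.priority, default=None)


# Hypothetical flows for illustration.
FLOWS = [
    Flow(0, {}, "NORMAL"),
    Flow(10, {"in_port": 41}, "resubmit(,1)"),
    Flow(20, {"in_port": 41, "dl_dst": "ff:ff:ff:ff:ff:ff"}, "drop"),
]

unicast = {"in_port": 41, "dl_dst": "fa:16:3e:06:d8:7b"}
broadcast = {"in_port": 41, "dl_dst": "ff:ff:ff:ff:ff:ff"}
other_port = {"in_port": 1, "dl_dst": "fa:16:3e:b2:ef:af"}
for packet in (unicast, broadcast, other_port):
    print(select_flow(FLOWS, packet).actions)
```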
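iptables
Tables provide specific functions. Built-in tables: nat, filter, mangle.
Chains are the paths that packets travel along.
- User-defined chains have no default policy
- Once a terminating action (such as ACCEPT, DROP, or REJECT) has run, the remaining rules in the same chain are no longer evaluated
A policy is a series of rules chained together; in essence, a rule describes how to handle incoming IP packets.
- Match (which conditions apply):
  - interface / destination address / source address / protocol / state / ...
- Action (what to do):
  - ACCEPT
  - DROP
  - REJECT
  - RETURN: continue with the rule after the calling rule in the parent chain
  - DNAT: --to-destination
  - SNAT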
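The chain semantics listed above (terminating actions stop evaluation, RETURN resumes in the calling chain, only built-in chains have a policy) can be sketched as a small Python interpreter. It is a simplified model for the filter table only; the chain named icmp-filter and its rules are invented for the example.

```python
TERMINATING = {"ACCEPT", "DROP", "REJECT"}


def _run(chains, chain, packet):
    """Evaluate one chain; return a verdict, or None to resume in the caller."""
    for match, target in chains[chain]:
        if not match(packet):
            continue
        if target in TERMINATING:
            return target  # later rules in this chain are not evaluated
        if target == "RETURN":
            return None    # go back to the rule after the jump in the parent chain
        verdict = _run(chains, target, packet)  # jump into a user-defined chain
        if verdict is not None:
            return verdict
    return None  # end of chain reached without a verdict


def traverse(chains, policies, builtin_chain, packet):
    """Start at a built-in chain; its policy applies when no rule decides."""
    verdict = _run(chains, builtin_chain, packet)
    return verdict if verdict is not None else policies[builtin_chain]


def is_icmp(packet):
    return packet["proto"] == "icmp"


def from_tenant(packet):
    return packet["src"].startswith("192.168.1.")


def always(packet):
    return True


chains = {
    "INPUT": [(is_icmp, "icmp-filter"), (from_tenant, "ACCEPT")],
    "icmp-filter": [(from_tenant, "RETURN"), (always, "DROP")],  # hypothetical user chain
}
policies = {"INPUT": "DROP"}

print(traverse(chains, policies, "INPUT", {"proto": "icmp", "src": "192.168.1.7"}))
print(traverse(chains, policies, "INPUT", {"proto": "icmp", "src": "8.8.8.8"}))
print(traverse(chains, policies, "INPUT", {"proto": "tcp", "src": "8.8.8.8"}))
```

The first packet takes the RETURN path and is accepted by the second INPUT rule, the second is dropped inside the user-defined chain, and the third matches nothing and falls back to the INPUT policy.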
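DVR (Distributed Virtual Router)
Goal: reduce the load on the network nodes
Core approach:
- Use flow rules to resolve the MAC conflict between multiple router instances
- Let local requests find the local router (ARP)
- Keep the router interface MAC from being exposed directly to the external network by intercepting it with flow rules
- Request a unique MAC address from the neutron server and use flow rules to rewrite the MAC on inbound and outbound traffic
Notes:
- The qrouter interfaces distributed across the compute nodes all have the same MAC address
- OVS flows must be updated to support DVR
- When a VM starts and its port becomes active, an RPC notification tells the neutron agent to update the rules
- North-south SNAT traffic still has to go through the network node
- North-south traffic with a floating IP leaves directly from the compute node
- The company's DVR differs somewhat from the community version
- The Neutron L2 agent maintains the iptables chains and rules
- The L3 agent provides routing between subnets and manages the iptables rules that go with it
neutron DVR deployment settings:
- Deploy both the L2 and L3 agents on network nodes and compute nodes
- The L3 agent mode is dvr on compute nodes and dvr_snat on network nodes
- l2_population = True
- router_distributed = True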
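OSPF
Open Shortest Path First. OSPF routers advertise the state of their network interfaces to one another to build a link-state database and generate a shortest-path tree; each OSPF router then uses these shortest paths to build its routing table.
Purpose:
- Advertise the floating IPs of virtual machines
- Advertise the SNAT IPs used for VPC public network access
neutron-l3-agent starts the fip-xxx namespace, which runs OSPF with the access switch.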
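The shortest-path-tree step that OSPF performs on its link-state database is Dijkstra's algorithm. The sketch below runs it on a small made-up topology (routers A to E with arbitrary link costs, not the network in this article) and returns, for each destination, the total cost and the first hop, which is roughly what ends up in a routing table.

```python
import heapq


def shortest_path_tree(links, root):
    """Dijkstra over a dict {node: {neighbor: cost}}; returns {node: (cost, first_hop)}."""
    best = {}
    heap = [(0, root, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a lower or equal cost
        best[node] = (cost, first_hop)
        for neighbor, link_cost in links.get(node, {}).items():
            if neighbor not in best:
                # Neighbors of the root become their own first hop.
                heapq.heappush(heap, (cost + link_cost, neighbor, first_hop or neighbor))
    return best


# Hypothetical link-state database: symmetric links with arbitrary costs.
LINKS = {
    "A": {"B": 10, "C": 1, "E": 4},
    "B": {"A": 10, "C": 2, "D": 5},
    "C": {"A": 1, "B": 2},
    "D": {"B": 5},
    "E": {"A": 4},
}

tree = shortest_path_tree(LINKS, "A")
for node in sorted(tree):
    print(node, tree[node])
```

Router B is reached through C at cost 3 rather than over the direct link of cost 10, which is the kind of choice the shortest-path tree encodes.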
Common commands
tcpdump # capture packets
ovs-appctl fdb/show br-int # show the MAC table learned by an OVS bridge
ip rule list # list the policy routing rules
ip route list table table_name # show the routes in one policy routing table
ovs-ofctl dump-ports-desc br-int # show the bridge ports
ovs-ofctl dump-flows br-int # show the flow table
brctl showmacs <bridge> # show the MACs learned by a Linux bridge
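Part 3: Packet captures on each node
With the basic concepts from Part 2 in hand, we now capture packets in the order of the traffic path to reinforce them; refer to the figure in Part 1 while reading.
The captures are taken in the following order:
- Capture inside the virtual machine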
[root@Server-be9f76b6 ~]# tcpdump -i eth0 icmp -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:23:29.858571 fa:16:3e:b2:ef:af > fa:16:3e:06:d8:7b, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 53781, length 108
17:23:29.892335 fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 53781, length 76
[root@Server-be9f76b6 ~]# ip r l
169.254.169.254 via 192.168.1.1 dev eth0 proto static
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.7
default via 192.168.1.1 dev eth0 proto static
[root@Server-be9f76b6 ~]# ip r g 8.8.8.8
8.8.8.8 via 192.168.1.1 dev eth0 src 192.168.1.7
cache mtu 1450 hoplimit 64
[root@Server-be9f76b6 ~]# traceroute -n 8.8.8.8
1 192.168.1.1 0.178 ms 0.209 ms 0.135 ms
2 192.168.1.6 0.355 ms 0.352 ms 0.340 ms
3 169.254.96.33 0.558 ms 0.546 ms 0.531 ms
4 10.206.221.193 1.459 ms 2.400 ms 1.967 ms
5 10.206.223.56 1.458 ms 10.206.223.60 1.407 ms 10.206.223.62 1.880 ms
6 10.206.223.62 2.370 ms 10.206.223.60 2.279 ms 1.749 ms
(略)
[root@Server-be9f76b6 ~]# ip neigh
192.168.1.6 dev eth0 lladdr fa:16:3e:d1:d8:49 STALE
192.168.1.2 dev eth0 lladdr fa:16:3e:93:a3:f2 STALE
192.168.1.1 dev eth0 lladdr fa:16:3e:06:d8:7b REACHABLE
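Notes:
The route lookup for 8.8.8.8 goes to the default gateway
Outbound: the destination MAC belongs to 192.168.1.1
Return packet: the source MAC belongs to 192.168.1.6
Frame format: no VLAN, no VXLAN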
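The reason the outbound frame carries the gateway's MAC can be reproduced with a longest-prefix-match lookup over the VM's routing table followed by a neighbor lookup. The Python sketch below uses the `ip r l` and `ip neigh` output shown above as its data; the function names are illustrative.

```python
import ipaddress

# Routing table of the VM (from "ip r l"); None means directly connected.
VM_ROUTES = [
    ("169.254.169.254/32", "192.168.1.1"),
    ("192.168.1.0/24", None),
    ("0.0.0.0/0", "192.168.1.1"),
]

# Neighbor table of the VM (from "ip neigh").
NEIGHBORS = {
    "192.168.1.6": "fa:16:3e:d1:d8:49",
    "192.168.1.2": "fa:16:3e:93:a3:f2",
    "192.168.1.1": "fa:16:3e:06:d8:7b",
}


def next_hop(dst: str) -> str:
    """Pick the most specific matching route; on-link destinations are their own next hop."""
    dst_ip = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(prefix), via)
        for prefix, via in VM_ROUTES
        if dst_ip in ipaddress.ip_network(prefix)
    ]
    _prefix, via = max(candidates, key=lambda c: c[0].prefixlen)
    return via or dst


def destination_mac(dst: str):
    """MAC written into the Ethernet header: the MAC of the next hop, not of dst itself."""
    return NEIGHBORS.get(next_hop(dst))


print(next_hop("8.8.8.8"), destination_mac("8.8.8.8"))
print(next_hop("192.168.1.6"), destination_mac("192.168.1.6"))
```

The result for 8.8.8.8 is the MAC of 192.168.1.1, which matches the destination MAC in the echo request captured on eth0.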
- Capture on the tap device on the compute node
[root@w07 ~]# tcpdump -i tap2ab77d0f-99 -nnee
listening on tap2ab77d0f-99, link-type EN10MB (Ethernet), capture size 262144 bytes
17:16:11.866127 fa:16:3e:b2:ef:af > fa:16:3e:06:d8:7b, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 53368, length 108
17:16:11.899884 fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 53368, length 76
[root@w07 ~]# brctl show
bridge name bridge id STP enabled interfaces
qbr134a0c76-2f 8000.3286bc9479d4 no qvb134a0c76-2f
tap134a0c76-2f
qbr2ab77d0f-99 8000.966e9ba8b1fc no qvb2ab77d0f-99
tap2ab77d0f-99
qbr5a70c226-ca 8000.a602f52928ba no qvb5a70c226-ca
tap5a70c226-ca
qbr8e2a2b2f-ea 8000.ea797eb5b738 no qvb8e2a2b2f-ea
tap8e2a2b2f-ea
[root@w07 ~]# brctl showmacs qbr2ab77d0f-99
port no mac addr is local? ageing timer
1 46:4f:72:0f:95:bf no 23.98
1 96:6e:9b:a8:b1:fc yes 0.00
1 96:6e:9b:a8:b1:fc yes 0.00
2 fa:16:3e:b2:ef:af no 0.06
1 fa:16:3e:d1:d8:49 no 0.06
2 fe:16:3e:b2:ef:af yes 0.00
2 fe:16:3e:b2:ef:af yes 0.00
Notes:
The length is still 108
Frame format: no VLAN, no VXLAN
Bridge ports
qvb2ab77d0f-99 (1)
tap2ab77d0f-99 (2)
MACs learned on the bridge
- Packets on qvo-xxx/qvb-xxx (similar to the tap capture, omitted)
- br-int
[root@w07 ~]# ovs-vsctl show
...
Bridge br-int
fail_mode: secure
...
Port "qvo2ab77d0f-99"
tag: 10
Interface "qvo2ab77d0f-99"
Port "qr-178fc8e2-fd"
tag: 10
Interface "qr-178fc8e2-fd"
type: internal
...
[root@w07 ~]# ovs-appctl fdb/show br-int
port VLAN MAC Age
45 10 fa:16:3e:95:02:b6 4
40 10 fa:16:3e:06:d8:7b 0
1 10 fa:16:3e:d1:d8:49 0
23 1 fa:16:3e:4d:50:f1 0
47 1 fa:16:3e:78:e7:8a 0
41 10 fa:16:3e:b2:ef:af 0
24 1 fa:16:3e:0c:69:32 0
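Notes:
br-int dump-ports-desc
1(patch-tun)
40(qr-178fc8e2-fd)
The flow table eventually matches NORMAL
tag 10 means the port's VLAN ID is 10; it does not mean the frame carries VLAN ID 10
The frame is still a plain frame, without a VLAN tag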
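The NORMAL action behaves like a learning switch, which is how the fdb table above gets populated. The following Python sketch is a simplified model: it learns (VLAN, source MAC) to port mappings and floods when the destination is unknown. It ignores VLAN membership when flooding, which the real switch does not, and it only uses the three ports relevant to this path.

```python
class LearningSwitch:
    """Toy model of L2 learning keyed by (vlan, mac)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.fdb = {}  # (vlan, mac) -> port

    def forward(self, in_port, vlan, src, dst):
        """Learn the source, then return the list of output ports."""
        self.fdb[(vlan, src)] = in_port
        out_port = self.fdb.get((vlan, dst))
        if out_port is None:
            return sorted(self.ports - {in_port})  # unknown destination: flood
        return [] if out_port == in_port else [out_port]


VM_MAC = "fa:16:3e:b2:ef:af"  # behind port 41 (qvo2ab77d0f-99)
QR_MAC = "fa:16:3e:06:d8:7b"  # port 40 (qr-178fc8e2-fd)
SG_MAC = "fa:16:3e:d1:d8:49"  # reached through port 1 (patch-tun)

br_int = LearningSwitch(ports=[1, 40, 41])
step1 = br_int.forward(41, 10, VM_MAC, QR_MAC)  # VM -> gateway
step2 = br_int.forward(40, 10, QR_MAC, SG_MAC)  # router -> SNAT port
step3 = br_int.forward(1, 10, SG_MAC, VM_MAC)   # reply coming back from the tunnel
print(step1, step2, step3)
print(br_int.fdb)
```

After these three frames the model's table holds the same three VLAN 10 entries that `ovs-appctl fdb/show br-int` reports for ports 41, 40, and 1.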
- Capture in the qrouter-xxx namespace
[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 tcpdump -i qr-178fc8e2-fd -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on qr-178fc8e2-fd, link-type EN10MB (Ethernet), capture size 262144 bytes
17:28:46.130180 fa:16:3e:b2:ef:af > fa:16:3e:06:d8:7b, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54121, length 108
17:28:46.130209 fa:16:3e:06:d8:7b > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54121, length 108
[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 ip rule list
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
3232235777: from 192.168.1.1/24 lookup 3232235777
[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 ip r s t 3232235777
default via 192.168.1.6 dev qr-178fc8e2-fd
[root@w07 ~]# ip netns exec qrouter-c4d4b760-41b9-45e1-a607-d054da99c479 ip neigh
192.168.1.7 dev qr-178fc8e2-fd lladdr fa:16:3e:b2:ef:af PERMANENT
192.168.1.13 dev qr-178fc8e2-fd lladdr fa:16:3e:b6:fb:42 PERMANENT
192.168.1.6 dev qr-178fc8e2-fd lladdr fa:16:3e:d1:d8:49 PERMANENT
192.168.1.10 dev qr-178fc8e2-fd lladdr fa:16:3e:95:02:b6 PERMANENT
192.168.1.3 dev qr-178fc8e2-fd lladdr fa:16:3e:a9:38:b7 PERMANENT
192.168.1.2 dev qr-178fc8e2-fd lladdr fa:16:3e:93:a3:f2 PERMANENT
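Notes:
The length is still 108
Frame format: no VLAN, no VXLAN
Static neighbor entries (l2 population)
Path:
Enter through qr-xxx
Policy routing selects 192.168.1.6 as the next hop, then its MAC is looked up
Leave through qr-xxx toward 1(patch-tun)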
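One detail worth noticing in the `ip rule list` output: the rule priority and table number 3232235777 is exactly the subnet gateway address 192.168.1.1 read as a 32-bit integer. The Python lines below verify the arithmetic. That the agent always derives the number this way in this deployment is an assumption based on this single capture; the function name is made up.

```python
import ipaddress


def table_index_from_gateway(gateway_cidr: str) -> int:
    """Interpret the IPv4 address part of 'a.b.c.d/len' as an unsigned 32-bit integer."""
    return int(ipaddress.ip_interface(gateway_cidr).ip)


index = table_index_from_gateway("192.168.1.1/24")
print(index, ipaddress.ip_address(index))
```

Reading the number back as an address is a quick way to tell which subnet a given policy routing table belongs to.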
- br-tun bridge and flow table on the compute node
[root@w07 ~]# ovs-vsctl show
76001df8-48a5-4185-8de4-a035fc4b2d72
Bridge br-tun
fail_mode: secure
Port "vxlan-0ace6b9c"
Interface "vxlan-0ace6b9c"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="10.206.107.238", out_key=flow, remote_ip="10.206.107.156"}
...
[root@w07 ~]# netstat -nlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 0.0.0.0:111 0.0.0.0:* 12153/rpcbind
udp 0 0 127.0.0.1:323 0.0.0.0:* 1047/chronyd
udp 0 0 0.0.0.0:874 0.0.0.0:* 12153/rpcbind
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
[root@w07 ~]# ovs-appctl fdb/show br-tun
port VLAN MAC Age
Other common commands:
ovs-appctl dpif/dump-flows br-tun
ovs-appctl dpif/show
Notes:
br-tun dump-ports-desc
1(patch-int)
5(vxlan-0ace6b9c)
Flow tables: 1(mod_dl_src)>2>20(output 5)
- Capture on eth2 on the compute node
[root@w07 ~]# tcpdump -i eth2 udp -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 262144 bytes
17:35:05.742902 b4:96:91:5a:31:78 > 5c:c9:99:60:e0:3c, ethertype IPv4 (0x0800), length 192: 10.206.107.238.58650 > 10.206.107.156.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3f:13:77:c5 > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54500, length 108
17:35:05.776442 5c:c9:99:60:e0:3c > b4:96:91:5a:31:78, ethertype IPv4 (0x0800), length 160: 10.206.107.156.56414 > 10.206.107.238.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 54500, length 76
[root@w07 ~]# ip neigh | grep "5c:c9:99:60:e0:3c"
10.206.107.193 dev eth2 lladdr 5c:c9:99:60:e0:3c REACHABLE
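Notes:
The packet is in VXLAN format
eth:ethertype:ip:udp:vxlan:eth:ethertype:ip:icmp:data
The outer source is the local eth2 (10.206.107.238); the outer destination is eth2 on the network node (10.206.107.156), UDP port 4789; the outer destination MAC belongs to the underlay gateway 10.206.107.193
50 bytes of overhead are added (192 - 142)
The original packet is carried inside
- Capture on eth2 on the network node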
[root@w02 ~]# tcpdump -i eth2 udp and host 10.206.107.238 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth2, link-type EN10MB (Ethernet), capture size 262144 bytes
17:37:16.969152 5c:c9:99:db:69:c9 > b4:96:91:5a:32:54, ethertype IPv4 (0x0800), length 192: 10.206.107.238.58650 > 10.206.107.156.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3f:13:77:c5 > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 54631, length 108
17:37:17.002678 b4:96:91:5a:32:54 > 5c:c9:99:db:69:c9, ethertype IPv4 (0x0800), length 160: 10.206.107.156.56414 > 10.206.107.238.4789: VXLAN, flags [I] (0x08), vni 103
fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 54631, length 76
17:37:17.970994 5c:c9:99:db:69:c9 > b4:96:91:5a:32:54, ethertype IPv4 (0x0800), length 192: 10.206.107.238.58650 > 10.206.107.156.4789: VXLAN, flags [I] (0x08), vni 103
[root@w02 ~]# ip neigh | grep -E "5c:c9:99:db:69:c9"
10.206.107.129 dev eth2 lladdr 5c:c9:99:db:69:c9 REACHABLE
[root@w02 ~]# netstat -nlup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
udp 0 0 0.0.0.0:45492 0.0.0.0:* 1630/haproxy
udp 0 0 0.0.0.0:111 0.0.0.0:* 787644/rpcbind
udp 0 0 127.0.0.1:323 0.0.0.0:* 794/chronyd
udp 0 0 0.0.0.0:871 0.0.0.0:* 787644/rpcbind
udp 0 0 0.0.0.0:4789 0.0.0.0:* -
udp 0 0 0.0.0.0:8472 0.0.0.0:* -
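Notes:
The packet is in VXLAN format
eth:ethertype:ip:udp:vxlan:eth:ethertype:ip:icmp:data
The inner frame is unchanged (source and destination MAC and IP)
UDP port 4789 is open locally in kernel space
- br-tun bridge and flow table on the network node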
[root@w02 ~]# ovs-vsctl show
c67632e5-75ed-4f73-b4ab-cf32f95a8770
...
Bridge br-tun
fail_mode: secure
Port br-tun
Interface br-tun
type: internal
Port "vxlan-0ace6bee"
Interface "vxlan-0ace6bee"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="10.206.107.156", out_key=flow, remote_ip="10.206.107.238"}
...
Notes:
br-tun ports
4(vxlan-0ace6bee)
1(patch-int)
Flow tables matched on br-tun:
0->4->9(dl_src)->patch-int
- br-int bridge and flow table on the network node
[root@w02 ~]# ovs-appctl fdb/show br-int
port VLAN MAC Age
1 20 fa:16:3e:95:02:b6 28
1 20 fa:16:3e:b2:ef:af 7
74 20 fa:16:3e:d1:d8:49 0
Notes:
br-int ports
1(patch-tun):
74(sg-499291dc-d8):
Flow tables matched on br-int:
0(dl_src)>1(dl_vlan,dl_dst)>sg-xxx
- Capture in the snat-xxx namespace
[root@w02 ~]# ip netns exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: rfp-c4d4b760-4@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether 36:66:d5:bf:5e:e0 brd ff:ff:ff:ff:ff:ff
inet 169.254.96.32/31 scope global rfp-c4d4b760-4
valid_lft forever preferred_lft forever
inet 112.65.210.200/32 brd 112.65.210.200 scope global rfp-c4d4b760-4
valid_lft forever preferred_lft forever
125: sg-499291dc-d8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
link/ether fa:16:3e:d1:d8:49 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.6/24 brd 192.168.1.255 scope global sg-499291dc-d8
valid_lft forever preferred_lft forever
[root@w02 ~]# ip netns exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 ip r l
default via 169.254.96.33 dev rfp-c4d4b760-4
169.254.96.32/31 dev rfp-c4d4b760-4 proto kernel scope link src 169.254.96.32
192.168.1.0/24 dev sg-499291dc-d8 proto kernel scope link src 192.168.1.6
[root@w02 ~]# ip netns exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 ip neigh
192.168.1.7 dev sg-499291dc-d8 lladdr fa:16:3e:b2:ef:af REACHABLE
169.254.96.33 dev rfp-c4d4b760-4 lladdr 82:99:8f:5a:6b:ec DELAY
192.168.1.10 dev sg-499291dc-d8 lladdr fa:16:3e:95:02:b6 STALE
[root@w02 ~]# ip netns exec snat-47c9415f-f30a-4a7c-820d-b7322a064f20 iptables -t nat -S
...
-A neutron-l3-agent-POSTROUTING ! -i rfp-47c9415f-f ! -o rfp-47c9415f-f -m conntrack ! --ctstate DNAT -j ACCEPT
-A neutron-l3-agent-snat -o rfp-47c9415f-f -j SNAT --to-source 112.65.210.208
-A neutron-l3-agent-snat -m mark ! --mark 0x2/0xffff -m conntrack --ctstate DNAT -j SNAT --to-source 112.65.210.208
...
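Notes:
The default route points to 169.254.96.33 (on fpr-xxx in fip-xxx)
rfp-xxx and fpr-xxx are the two ends of a veth pair that connects the two namespaces
Firewall rules:
SNAT: when the outgoing interface is rfp-xxx, the packet's source address is rewritten to 112.65.210.200 (the rule listing above comes from a different router's namespace, so it shows 112.65.210.208)
In other words, by the time the packet reaches the rfp-xxx interface, its source IP has already been rewritten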
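The effect of the SNAT rule, and the reason replies find their way back to the VM, can be modeled as a small session table. This Python sketch is a simplification of connection tracking: it keys ICMP sessions on (remote IP, ICMP id) and never rewrites the id, whereas real connection tracking may do so on collisions. The class and method names are invented for the example.

```python
class SnatTable:
    """Toy source NAT for ICMP echo traffic behind one public IP."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.sessions = {}  # (remote_ip, icmp_id) -> original private source IP

    def outbound(self, src: str, dst: str, icmp_id: int):
        """Rewrite the source of an outgoing packet and remember the session."""
        self.sessions[(dst, icmp_id)] = src
        return self.public_ip, dst, icmp_id

    def inbound(self, src: str, dst: str, icmp_id: int):
        """Restore the original destination of a reply, or return None if unknown."""
        original = self.sessions.get((src, icmp_id))
        if dst != self.public_ip or original is None:
            return None
        return src, original, icmp_id


nat = SnatTable("112.65.210.200")
request = nat.outbound("192.168.1.7", "8.8.8.8", 60691)
reply = nat.inbound("8.8.8.8", "112.65.210.200", 60691)
stray = nat.inbound("1.1.1.1", "112.65.210.200", 60691)
print(request, reply, stray)
```

The two printed tuples correspond to the two captures that follow: 192.168.1.7 is still visible on sg-xxx, while rfp-xxx only ever sees 112.65.210.200.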
[root@w02 ~]# ip netns exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 tcpdump -i sg-499291dc-d8 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on sg-499291dc-d8, link-type EN10MB (Ethernet), capture size 262144 bytes
17:45:23.761053 fa:16:3e:06:d8:7b > fa:16:3e:d1:d8:49, ethertype IPv4 (0x0800), length 142: 192.168.1.7 > 8.8.8.8: ICMP echo request, id 60691, seq 55117, length 108
17:45:23.809919 fa:16:3e:d1:d8:49 > fa:16:3e:b2:ef:af, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 192.168.1.7: ICMP echo reply, id 60691, seq 55117, length 76
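Traffic on sg-xxx:
Plain packet, length 108, no VLAN, no VXLAN
Source MAC: fa:16:3e:06:d8:7b (the MAC shared by the qr-xxx interfaces; it was not emitted by qr-xxx directly but restored by the br-tun rules). The destination MAC is the sg-xxx interface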
[root@w02 ~]# ip netns exec snat-c4d4b760-41b9-45e1-a607-d054da99c479 tcpdump -i rfp-c4d4b760-4 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on rfp-c4d4b760-4, link-type EN10MB (Ethernet), capture size 262144 bytes
17:46:13.847083 36:66:d5:bf:5e:e0 > 82:99:8f:5a:6b:ec, ethertype IPv4 (0x0800), length 142: 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55167, length 108
17:46:13.880503 82:99:8f:5a:6b:ec > 36:66:d5:bf:5e:e0, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55167, length 76
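Traffic on rfp-xxx:
Plain packet, length 108, no VLAN, no VXLAN
Outbound: the source MAC is rfp-xxx and the destination MAC is the default gateway's MAC (note that the source address has already been through SNAT)
- fip-xxx namespace on the network node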
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
...
4: fpr-c4d4b760-4@fpr-3bbebb1a-5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
link/ether 82:99:8f:5a:6b:ec brd ff:ff:ff:ff:ff:ff
inet 169.254.96.33/31 scope global fpr-c4d4b760-4
valid_lft forever preferred_lft forever
124: fip-vif.103: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
link/ether fa:17:3e:f7:cd:bc brd ff:ff:ff:ff:ff:ff
inet 10.206.221.195/26 brd 10.206.221.255 scope global fip-vif.103
valid_lft forever preferred_lft forever
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:2601 0.0.0.0:* LISTEN 2063367/zebra
tcp 0 0 127.0.0.1:2604
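Notes:
Interfaces in fip-xxx:
fpr-xxx (one end of the veth pair)
fip-vif.xxx (the interface that forms the OSPF adjacency with the switch)
The default route points to the physical switch
Two processes are listening: zebra and ospfd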
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 tcpdump -i fpr-c4d4b760-4 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fpr-c4d4b760-4, link-type EN10MB (Ethernet), capture size 262144 bytes
17:48:54.127102 36:66:d5:bf:5e:e0 > 82:99:8f:5a:6b:ec, ethertype IPv4 (0x0800), length 142: 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55327, length 108
17:48:54.160524 82:99:8f:5a:6b:ec > 36:66:d5:bf:5e:e0, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55327, length 76
Capture on fpr-xxx inside fip-xxx:
The destination MAC is on fpr-xxx
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 tcpdump -i fip-vif.103 -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fip-vif.103, link-type EN10MB (Ethernet), capture size 262144 bytes
17:49:40.208078 fa:17:3e:f7:cd:bc > 5c:c9:99:60:d2:71, ethertype IPv4 (0x0800), length 142: 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55373, length 108
17:49:40.241501 5c:c9:99:60:d2:71 > fa:17:3e:f7:cd:bc, ethertype IPv4 (0x0800), length 110: 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55373, length 76
Capture on fip-vif.xxx inside fip-xxx:
The destination MAC is the default gateway (the physical gateway)
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 ip r l
default via 10.206.221.193 dev fip-vif.103
10.206.221.192/26 dev fip-vif.103 proto kernel scope link src 10.206.221.195
112.65.210.200 via 169.254.96.32 dev fpr-c4d4b760-4
112.65.210.204 via 169.254.113.28 dev fpr-3bbebb1a-5
169.254.96.32/31 dev fpr-c4d4b760-4 proto kernel scope link src 169.254.96.33
169.254.113.28/31 dev fpr-3bbebb1a-5 proto kernel scope link src 169.254.113.29
[root@w02 ~]# ip netns exec fip-4617ac50-7b34-4b05-811d-b2afc741d446 telnet 0 ospfd
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
Hello, this is Quagga (version 1.0.0.0).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
User Access Verification
Password:
localhost> show ip ospf route
============ OSPF network routing table ============
N 10.206.221.192/26 [10] area: 10.206.221.193
directly attached to fip-vif.103
N 112.65.210.200/32 [10] area: 10.206.221.193
directly attached to lo
N 112.65.210.204/32 [10] area: 10.206.221.193
directly attached to lo
============ OSPF router routing table =============
R 10.206.221.193 [10] area: 10.206.221.193, ABR, ASBR
via 10.206.221.193, fip-vif.103
============ OSPF external routing table ===========
localhost> show ip ospf nei
localhost> show ip ospf neighbor
Neighbor ID Pri State Dead Time Address Interface RXmtL RqstL DBsmL
10.206.221.193 10 Full/DR 34.730s 10.206.221.193 fip-vif.103:10.206.221.195 0 0 0
10.206.221.196 1 Full/DROther 31.489s 10.206.221.196 fip-vif.103:10.206.221.195 0 0 0
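Notes:
Routes are maintained by OSPF:
ospfd forms a neighbor adjacency with the physical switch
Configuration:
Both sides use the same area, stub, mtu, and vlan settings
The fip-xxx namespace is maintained by the L3 agent
- br-ex bridge on the network node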
[root@w02 ~]# ovs-vsctl show
c67632e5-75ed-4f73-b4ab-cf32f95a8770
...
Bridge br-ex
Port "fip-vif.103"
tag: 103
Interface "fip-vif.103"
type: internal
Port br-ex
Interface br-ex
type: internal
Port "eth3"
Interface "eth3"
ovs_version: "2.5.5"
[root@w02 ~]# ovs-appctl fdb/show br-ex
port VLAN MAC Age
1 0 5c:c9:99:60:d2:6e 171
1 103 fa:17:3e:a7:81:25 8
5 103 fa:17:3e:f7:cd:bc 0
1 103 5c:c9:99:60:d2:71 0
...
Notes:
Ports:
1(eth3)
5(fip-vif.103)
Flow table: NORMAL
eth3 allows VLAN 103
- eth3 port on the network node
[root@w02 ~]# tcpdump -i eth3 icmp -nnee
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth3, link-type EN10MB (Ethernet), capture size 262144 bytes
17:54:36.698733 fa:17:3e:f7:cd:bc > 5c:c9:99:60:d2:71, ethertype 802.1Q (0x8100), length 146: vlan 103, p 0, ethertype IPv4, 112.65.210.200 > 8.8.8.8: ICMP echo request, id 60691, seq 55669, length 108
17:54:36.732114 5c:c9:99:60:d2:71 > fa:17:3e:f7:cd:bc, ethertype 802.1Q (0x8100), length 114: vlan 103, p 0, ethertype IPv4, 8.8.8.8 > 112.65.210.200: ICMP echo reply, id 60691, seq 55669, length 76
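Notes:
Traffic: tagged with VLAN 103 (when leaving br-ex)
The traffic is sent to the physical switch through eth3
The traffic has now made it from the virtual machine -> host -> network node -> physical switch -> Internet.
Conclusion
The traffic path in this article differs somewhat from the community version. If you are interested in networking, it is worth drawing the traffic path yourself for your own environment, combining the concepts used here; once you have done that, other scenarios become much easier to reason about.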