Quick reference:
client: 172.16.12.233
server: 172.16.12.219
NIC: a single eth0 on each host
Test procedure: from the client, send a 1000-byte SQL statement to the server; then send an SQL statement larger than 1500 bytes.
Testing an SQL statement larger than 1480 bytes:
Capture while sending a very long SQL statement (over 1480 bytes):
tcpdump -i eth0 -s 0 -w s2_s.cap port 3006
Every packet in this exchange carries the DF (Don't Fragment) flag. In short, the packet must not be fragmented; if fragmentation would be required (a router decides the packet is too large), the router can only drop it and report a failure, so the client has to segment the data itself before sending.
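A quick way to confirm this in the saved capture (a sketch; s2_s.cap is the file written above, and the DF bit is bit 0x40 of byte 6 in the IP header):
tcpdump -nn -r s2_s.cap 'ip[6] & 0x40 != 0' | wc -l
tcpdump -nn -v -r s2_s.cap | grep -c 'flags \[DF\]'
The first command counts packets with the DF bit set using a raw header filter; the second relies on the verbose IP header line that tcpdump prints.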
The DF flag is typically set on IP packets carrying TCP segments. This is because a TCP connection can dynamically change its segment size to match the path MTU, and better overall performance is achieved when the TCP segments are each carried in one IP packet. So TCP packets have the DF flag set, which should cause an ICMP Fragmentation Needed packet to be returned if an intermediate router has to discard a packet because it’s too large. The sending TCP will then reduce its estimate of the connection’s Path MTU (Maximum Transmission Unit) and re-send in smaller segments. If DF wasn’t set, the sending TCP would never know that it was sending segments that are too large. This process is called PMTU-D (“Path MTU Discovery”). If the ICMP Fragmentation Needed packets aren’t getting through, then you’re dealing with a broken network. Ideally the first step would be to identify the misconfigured device and have it corrected; however, if that doesn’t work out then you add a configuration knob to your application that tells it to set the
TCP_MAXSEG socket option with setsockopt(). (A typical example of a misconfigured device is a router or firewall that’s been configured by an inexperienced network administrator to drop all ICMP, not realising that Fragmentation Needed packets are required by TCP PMTU-D.)
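For reference, the same behaviour can be reproduced from the shell (a sketch, using the server address from this test; the iptables MSS clamp is a swapped-in, network-level alternative to the per-application TCP_MAXSEG knob and would normally be applied on the gateway in the broken path):
ping -M do -s 1472 172.16.12.219
tracepath 172.16.12.219
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
ping -M do sets DF, and -s 1472 plus 8 bytes of ICMP header and 20 bytes of IP header gives exactly 1500 bytes, so a smaller path MTU should produce a "Frag needed" error; tracepath reports the discovered path MTU hop by hop; the iptables rule rewrites the MSS in forwarded SYN packets down to the path MTU.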
Why does wireshark show sent and received packets larger than 1500 bytes when the MTU is 1500?

It turns out wireshark captures above the NIC: the NIC automatically splits outgoing packets and coalesces incoming ones according to the TSO and GRO settings. Both concepts are introduced below.
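To double-check the interface MTU on both hosts first (a simple sanity check):
ip link show eth0 | grep -o 'mtu [0-9]*'
This should print "mtu 1500" here, so any captured packet above that size can only come from offloading, not from the wire.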

Check the settings:
ethtool -k eth0 | grep -E 'generic-segmentation-offload|tcp-segmentation-offload'
tcp-segmentation-offload: off
generic-segmentation-offload: on
# ethtool -K eth0 tso on
# ethtool -K eth0 tso off
To reduce CPU load and increase outbound bandwidth, TSO provides larger buffers for the packets TCP sends, and the NIC is then responsible for splitting each buffered large packet into multiple packets smaller than the MTU. tcpdump and wireshark capture above the NIC, which is why we may observe packets larger than the MTU.
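To see the effect directly (a sketch reusing the port from this test; in the pcap filter, ip[2:2] is the IP total-length field, so the expression matches IP packets longer than the 1500-byte MTU):
ethtool -K eth0 tso off
ethtool -K eth0 gso off
tcpdump -i eth0 -nn 'port 3006 and ip[2:2] > 1500'
Rerun the long SQL statement while the capture is active: with TSO/GSO off the filter should match nothing, because segmentation now happens in the kernel above the capture point; with them on, the oversized packets reappear. Restore the original settings afterwards (here GSO was on and TSO off, per the output above).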
Check the settings:
ethtool -k eth0 | grep -E 'generic-receive-offload|large-receive-offload'
generic-receive-offload: off
large-receive-offload: off [fixed]
# ethtool -K eth0 gro off
# ethtool -K eth0 gro on
The core idea of LRO: on the receive path, aggregate multiple incoming packets into one large packet and then hand it to the network protocol stack. The LRO implementation has some flaws, however, and the fix for these problems is the newer GRO (Generic Receive Offload).
First, GRO's merge conditions are stricter and more flexible, and it was designed from the start to support all transport protocols. New drivers should therefore use the GRO interface rather than LRO, and the kernel may remove LRO entirely once all existing drivers have migrated to GRO.
David S. Miller, the maintainer of the Linux networking subsystem, has made it clear that today's NIC drivers need to do two things: use the NAPI interface for interrupt mitigation and simple mutual exclusion, and use GRO's NAPI interface to pass packets up to the network protocol stack.
Each NAPI instance keeps a list of GRO'd packets, gro_list, used to accumulate received packets; the GRO layer uses it to hand the aggregated packets up to the protocol layers, and every protocol layer that supports GRO must implement the gro_receive and gro_complete methods.
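The receive side can be checked the same way as the send side above (a sketch; ip[2:2] is again the IP total-length field):
ethtool -K eth0 gro on
tcpdump -i eth0 -nn 'port 3006 and ip[2:2] > 1500'
ethtool -K eth0 gro off
Run this on the server while the client resends the long SQL statement: with GRO on, the aggregated inbound packets exceed the MTU and match the filter; with GRO off, every captured inbound packet should be at most MTU-sized.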
References:
https://www.ibm.com/developerworks/cn/linux/l-cn-network-pt/index.html
http://wsfdl.com/%E8%B8%A9%E5%9D%91%E6%9D%82%E8%AE%B0/2016/07/12/tcp_package_large_then_MTU.html
https://liqiang.io/post/tcp-segmentation-offload-introduction-and-operation-2f0b8949