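A New Choice for LastPass “Refugees”: KeePass

With the latest restrictions LastPass has placed on free accounts, a large group of free-tier users, myself included, now face losing access on either mobile or PC. So I finally made up my mind to look for a LastPass replacement, preferably open source, that lets me manage and sync the data myself. There are not many such options, and in the end I chose KeePass.

The features LastPass used to offer free users were mainly:

  • Password sync across devices
  • Password autofill in the browser, with automatic prompts to save new logins
  • Password autofill on mobile, with automatic prompts to save new logins

After some searching, I found two mature password managers that are both open source and allow self-managed data: KeePass and Bitwarden.

Considering Bitwarden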


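First, credit where it is due: as a commercially operated product, Bitwarden offers official software that is far more integrated and feature-rich than KeePass. Apart from its official desktop client, everything else in the KeePass ecosystem (sync, mobile clients, browser extensions) is provided by third-party plugins and software.

Bitwarden is also fairly open: the client source code is available, and there is an option to self-host the server. But I also found some problems:

  • Self-hosting the server means starting 10+ Docker containers, including a standalone Microsoft SQL Server, so the minimum requirements are fairly high: more than 2GB of memory.
    • They presumably ship the same server stack they run themselves; for personal hosting, a single SQLite database would actually be enough.
  • A self-hosted server environment still depends on central servers provided by Bitwarden.
  • Some paid features still require purchasing a paid plan even when self-hosting, and are unlocked only after verification.

So in the end I did not go with Bitwarden. For users who don’t like tinkering, simply buying a Bitwarden paid plan may be the better deal; after all, $10/year is very competitive compared with other products.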

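Choosing KeePass

I finally decided to migrate to KeePass. After some comparison, I settled on the following combination of software:

  • Desktop client: the official KeePass
    • Full-featured: import/export in various formats, autofill, multiple languages, and more
    • A plugin mechanism adds support for more sync methods, third-party integrations, and so on
    • Written in .NET; runs on macOS and Linux through Mono
  • Browser: the Kee extension
    • Open source and full-featured; supports Chrome (and Chromium-based browsers such as Microsoft Edge, Opera, and Chinese-made browsers) and Firefox
    • Since there have been cases of Chrome Web Store extensions being sold and then loaded with malicious features, Chrome users are advised to download a specific version of the extension and load it unpacked, which avoids automatic updates
  • Android: KeePass2Android (KP2A)
    • Open source and full-featured
    • For users who don’t need built-in sync, there is a separate Offline version (which does not request the network permission)
  • Sync: there are many options; a common one is syncing through a cloud drive
    • Use the client of Dropbox, Google Drive, Microsoft OneDrive, Baidu Netdisk, etc. to map the database folder to a local directory, then save the database there directly from the desktop client. No plugin is needed.
    • The Android client itself also supports syncing directly with some cloud drives, uploading and downloading the database file.

Migrating from LastPass to KeePass is also simple. Taking the Chrome extension as an example: after logging in to the extension, choose “Advanced > Export > LastPass CSV File” in its menu to export your passwords as a CSV file. Then create a new database in the KeePass client, open “File > Import…”, and pick LastPass CSV from the list to import the site names, usernames, passwords, and URLs.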

DNS as Configuration / Code with DNSControl

Managing DNS for a domain name traditionally involves visiting the control panel of your authoritative DNS provider to create, modify or delete records there. But I recently discovered a project, DNSControl by Stack Overflow, which lets you manage DNS records by editing JavaScript configuration files, similar to the way Kubernetes and Ansible work.

A simple illustration of how DNSControl works.

Why did I switch?

In my experience, the main advantages of DNSControl, or rather the workflow it promotes, are the following:

  • Support for different authoritative DNS providers: There is no longer any need to visit the control panels of different providers. The configuration is provider-agnostic and can be applied to one or several DNS providers, which lets administrators migrate between providers easily or use servers from different providers simultaneously.
  • Specify the state instead of actions: This is analogous to managing infrastructure with Ansible rather than by hand. Only the final state is specified in the configuration file, and the software takes care of adding or modifying records and deleting unnecessary ones.
  • Use scripting to simplify record descriptions: A basic version of JavaScript can be used to describe the DNS records, which reduces repetition and makes modifications less error-prone. For example, variables (or constants) and functions can be used to generate similar DNS records in batches.

I will briefly introduce my new workflow for migrating and managing DNS below, in order to show you how it can be done.

Migrating existing zones

The first step in switching to the new workflow is to export the existing DNS zones from the current providers and migrate them into the configuration file.

If, like me, you have dozens of records in the old DNS control panel and don’t want to copy and paste everything by hand, DNSControl has a “get-zones” sub-command for exactly this situation. You can read the official documentation about migration; the steps I used are:

  1. In order to read from the current provider, credentials must be generated and provided in the creds.json file. The method varies by provider and is described on each provider’s documentation page. For example, Cloudflare only requires an API token with sufficient permissions to access and modify zone records.
  2. With creds.json filled out and saved to the current directory, the following command can be executed to export current records of a specific zone:
    dnscontrol get-zones --format=js --out=dnsconfig.js <creds-name> <PROVIDER-IDENTIFIER> your-domain.tld

    1. The software is written in Go, so static binaries are provided on the GitHub releases page.
    2. creds-name is the key used in creds.json, and PROVIDER-IDENTIFIER can be found in the “Identifier” column in the provider table.
  3. Now dnsconfig.js should contain all your existing records, and you can optimize the script using JavaScript variables and functions. Note that DNSControl uses a simple JavaScript interpreter, so please stick to the simplest features of the language. (You will find out what not to use in the testing steps below.)
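As a rough sketch of what such an optimized script might look like, the snippet below writes a small dnsconfig.js that uses one constant and one helper function. The domain example.com, the address 192.0.2.10, the names SERVER_IP and webRecords, and the creds.json key "cloudflare" are all placeholders assumed for illustration, not values from my setup; the two-argument NewRegistrar / NewDnsProvider calls follow the same creds-name plus provider-identifier style as the get-zones command above.

```shell
# Write a small dnsconfig.js into the current directory.
# Every name and address below is a placeholder, not a real setting.
cat > dnsconfig.js << 'EOF'
// "cloudflare" must match a key in creds.json (assumed name).
var REG_NONE = NewRegistrar("none", "NONE");
var DNS_CF = NewDnsProvider("cloudflare", "CLOUDFLAREAPI");

// A constant shared by several records.
var SERVER_IP = "192.0.2.10";

// Hypothetical helper: the usual records for one web host.
function webRecords(ip) {
  return [
    A("@", ip),
    CNAME("www", "example.com.")
  ];
}

D("example.com", REG_NONE, DnsProvider(DNS_CF),
  webRecords(SERVER_IP),
  A("mail", SERVER_IP)
);
EOF
```

Changing SERVER_IP in one place then updates every record that uses it, which is the main point of scripting the zone.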

Updating DNS records

In order to create or update DNS records for a domain, first edit dnsconfig.js by modifying the arguments or variables (if created in the previous part) that belong to the domain in question. Then, to make sure that the JavaScript syntax is correct and all the changes are indeed desired, use the preview sub-command to compare the changes against the records currently live at the provider. Finally, when everything checks out, use dnscontrol push to apply the changes.

To further automate the workflow, I personally use a Git repository to version-control my dnsconfig.js configuration, and Jenkins to perform the steps above. My creds.json is kept private in Jenkins’ “Credentials” area, and mounted into the pipeline environment during execution. In this way, I can commit and push my DNS configuration to the Git server, and Jenkins will automatically check and apply the changes.
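The check-then-apply sequence can be sketched as a small shell function that a CI job might call. This is only an illustration under assumptions: the function name apply_dns and the DNSCONTROL override variable are made up for this sketch, and it presumes dnsconfig.js and creds.json are in the working directory. The check, preview and push sub-commands are the DNSControl ones.

```shell
#!/bin/sh
# Sketch of a CI step: validate, preview, then push.
# DNSCONTROL can be overridden (e.g. DNSCONTROL=echo) for a dry run.
DNSCONTROL="${DNSCONTROL:-dnscontrol}"

apply_dns() {
  # Stop at the first failing step so a broken config is never pushed.
  "$DNSCONTROL" check   || return 1   # validate dnsconfig.js without contacting providers
  "$DNSCONTROL" preview || return 1   # show the diff against the live records
  "$DNSCONTROL" push                  # apply the changes
}
```

A pipeline would call apply_dns after checking out the repository and placing creds.json in the working directory.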

Supported providers

As of the time of writing this article, the following DNS providers are supported by DNSControl:

  • ActiveDirectory_PS
  • AXFRDDNS
  • Azure DNS
  • BIND
  • Cloudflare
  • ClouDNS
  • deSEC
  • DigitalOcean
  • DNSimple
  • Gandi_v5
  • Google Cloud DNS
  • Hurricane Electric DNS
  • Hetzner DNS Console
  • HEXONET
  • INWX
  • Linode
  • Microsoft DNS Server (Windows Server)
  • Name.com
  • Namecheap Provider
  • Netcup
  • NS1
  • Oracle Cloud
  • Ovh
  • PowerDNS
  • Route 53
  • SoftLayer DNS
  • Vultr

In addition, the following registrars are supported, which allow users to modify the domains’ NS records to point to the providers above:

  • CSC Global
  • DNSimple
  • DNS-over-HTTPS
  • Gandi_v5
  • HEXONET
  • Internet.bs
  • INWX
  • Name.com
  • Namecheap Provider
  • OpenSRS
  • Ovh
  • Route 53

And even if your current provider is not covered, you can add your own integration and possibly contribute it upstream.

Setting up your own IPv6 Tunnel

I have some servers that don’t come with native IPv6 connectivity, which means that in order to use the next generation protocol, they need to be tunneled by other IPv6-capable nodes over IPv4.

In the past I have exclusively gone for the Tunnel Broker service provided by Hurricane Electric. I loved their service not only because it is free and easy to set up, but also for the reasonably good quality of their tunnel, since HE is a well-known transit provider. But recently, one of my servers which I use as an Internet exit has been suffering when it tries to make connections to IPv6-enabled websites. The symptom is simple: I can ping6 some addresses but not others, and the failures are becoming more frequent. So, I decided to set up a private tunnel endpoint using one of my own IPv6-enabled servers.

Prerequisites

I want to mimic the Tunnel Broker service as much as possible, because it is known to work. The current service provides tunnel users with the following:

  • “Server IPv4 Address”: The remote IPv4 tunnel endpoint, like 66.220.*.*.
  • “Client IPv6 Address”: An IPv6 address representing the host connecting to the tunnel, like 2001:470:c:*::2.
  • “Server IPv6 Address”: An IPv6 address representing the tunnel server, also used as the IPv6 gateway of the client host, like 2001:470:c:*::1.
  • “Routed IPv6 Prefixes”: /64 or /48 subnets given to the tunnel operator to provide IPv6 connectivity to other internal networks through the tunnel.

I made one of my IPv6-connected servers the designated tunnel server. To serve as such, it needs to provide the following:

  • A public IPv4 address: I will use this as the “Server IPv4 Address”, which means my client host will connect to this endpoint over IPv4.
  • Three routable IPv6 addresses: The specs actually say I have 10, and I believe they would route the whole /64 to me if I set it up right. But for this particular use case, 3 is enough: *::1 is the address of the tunnel server itself, while *::2 and *::3 serve as the Server and Client IPv6 Addresses respectively.

Since I don’t have any other subnets to make routable, I don’t need to provide another routable IPv6 prefix.

Connecting tunnel client and server

We first need to make the client and server hosts reachable from each other using their IPv6 addresses. The protocol used by Tunnel Broker, and thus my new tunnel, is Simple Internet Transition (SIT). It is supported natively by the Linux kernel and is quite easy to set up. In fact, the Tunnel Broker service provides users with sample client configurations for their preferred network management tools. Here is an example using iproute2:

modprobe ipv6
ip tunnel add sit-ipv6 mode sit remote [SERVER-IPV4] local [CLIENT-IPV4] ttl 255
ip link set sit-ipv6 up
ip addr add [CLIENT-IPV6]/127 dev sit-ipv6
ip route add ::/0 dev sit-ipv6
ip -f inet6 addr

For my configuration, the client IPv6 address is *::3, and the prefix length is set to /127 so that the subnet includes both ends’ addresses. To persist the configuration, use the method provided by your operating system. Here is an example client configuration using Netplan (used at least by Ubuntu 18.04):

network:
  version: 2
  tunnels:
    sit-ipv6:
      mode: sit
      remote: [SERVER-IPV4]
      local: [CLIENT-IPV4]
      addresses:
        - "[CLIENT-IPV6]/127"
      gateway6: "[SERVER-IPV6]"

The thing about SIT tunnels is that they are symmetrical, so in order to set up the server end, one needs to make the following changes:

  • Switch the server and client IPv4 addresses, so that the one after “local” is the IPv4 address of the configured machine.
  • Replace CLIENT-IPV6 with SERVER-IPV6 as the interface’s IPv6 endpoint.
  • Remove the route / gateway definition, since the server already has an external IPv6 gateway.

By now, both the tunnel server and client hosts should be able to reach each other with their brand new IPv6 addresses. This can be verified by running ping6 [SERVER-IPV6] on the client side, and vice versa.
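The three changes listed above can be sketched as a helper that prints the server-side commands instead of running them. The function name server_tunnel_commands is hypothetical, the interface name sit-ipv6 simply mirrors the client example, and the addresses are documentation-range placeholders rather than real endpoints.

```shell
# Print the server-side commands for a SIT tunnel (dry run: nothing is executed).
# Arguments: server IPv4, client IPv4, server IPv6.
server_tunnel_commands() {
  server4="$1"; client4="$2"; server6="$3"
  # "local" is this machine (the server); "remote" is the client.
  echo "ip tunnel add sit-ipv6 mode sit remote $client4 local $server4 ttl 255"
  echo "ip link set sit-ipv6 up"
  # Server address on the shared /127; no default route is added here.
  echo "ip addr add $server6/127 dev sit-ipv6"
}

# Example with placeholder addresses:
server_tunnel_commands 198.51.100.1 203.0.113.7 2001:db8::2
```

After reviewing the printed lines, they can be run as root on the server, for example by piping the output to a root shell.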

Forwarding Tunneled Traffic

In order for the tunneled host to actually reach the global Internet, the tunnel server has to route IPv6 traffic from and to the host.

Forwarding Outgoing Traffic

Since [SERVER-IPV6] is configured to be the IPv6 gateway on the client host, all its traffic with a remote IPv6 destination address will be sent over the tunnel to the server side. By default, a server will not take the role of routing that traffic – it will only receive traffic destined to itself. To make it also forward traffic to the next hop, we need to enable packet forwarding in the kernel parameters. This can be done by running the following as root:

echo 1 > /proc/sys/net/ipv6/conf/[SERVER-TUNNEL-INTERFACE]/forwarding

This can be persisted across reboots by appending net.ipv6.conf.[SERVER-TUNNEL-INTERFACE].forwarding=1 to /etc/sysctl.conf. Note that if you run a firewall such as ip6tables, you may need to add forwarding rules for the tunnel, or change the default forwarding policy to ACCEPT.
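As a minimal sketch of the two points above, the helper below prints the sysctl line and a pair of permissive ip6tables rules for a given tunnel interface, without applying anything. The function name forwarding_settings and the interface name sit-ipv6 are assumptions; a real firewall policy would likely be stricter. On some kernels the global net.ipv6.conf.all.forwarding switch may also need to be 1 before packets are forwarded, so treat that as something to verify on your own system.

```shell
# Print (do not run) the settings that let the server forward tunnel traffic.
# $1 is the server's tunnel interface name.
forwarding_settings() {
  tun="$1"
  # Line to append to /etc/sysctl.conf:
  echo "net.ipv6.conf.$tun.forwarding=1"
  # Permissive ip6tables rules: allow forwarding to and from the tunnel.
  echo "ip6tables -A FORWARD -i $tun -j ACCEPT"
  echo "ip6tables -A FORWARD -o $tun -j ACCEPT"
}

forwarding_settings sit-ipv6
```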

Accepting Incoming Traffic

When incoming traffic reaches the tunnel server’s gateway with the client host’s address as its destination, the gateway sends a “Neighbor Solicitation Message” to check that the address is reachable. But the client host’s IPv6 address is not present on any interface of the server host, so the server will not reply to that message, and the incoming traffic is dropped.

In order for the tunnel server to respond to the solicitation message with a “Neighbor Advertisement Message”, we need to configure an NDP proxy for the server’s external interface. The first step is to enable NDP proxy in the Linux kernel:

echo 1 > /proc/sys/net/ipv6/conf/[SERVER-EXTERNAL-INTERFACE]/proxy_ndp

This parameter can be persisted in the same way as shown in the last section. Then we have to explicitly enable NDP proxy for the client IPv6 address. Using iproute2 this can be done as:

ip -6 neigh add proxy [CLIENT-IPV6] dev [SERVER-EXTERNAL-INTERFACE]

This line means that when the external router wants to reach the client IPv6 address on the interface, the server will respond with its own link-layer address. Then, when the traffic destined for the client host arrives, the server will forward it to the tunnel interface, since we configured a /127 subnet above to include IPv6 addresses of both ends. This can be shown by observing the routing table from running ip -6 route on the server.

The command also needs to be persisted, so that client hosts will not lose connectivity after the server reboots. How to persist it depends on the network management tool used by the server. For ifupdown, the command can be written in /etc/network/interfaces; if the server is using Netplan, the command should probably go in a script under /etc/networkd-dispatcher/routable.d, since Netplan doesn’t come with native hook support.
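For the Netplan case, a hook script could look roughly like the sketch below. It assumes that networkd-dispatcher exports the IFACE environment variable for the interface that changed state; the file name 50-ndp-proxy, the interface eth0 and the address 2001:db8::3 are placeholders, and the IP override exists only so the script can be tried as a dry run.

```shell
#!/bin/sh
# Sketch of /etc/networkd-dispatcher/routable.d/50-ndp-proxy (hypothetical name).
# Assumes networkd-dispatcher sets IFACE to the interface that became routable.
EXTERNAL_IF="${EXTERNAL_IF:-eth0}"          # assumed external interface
CLIENT_IPV6="${CLIENT_IPV6:-2001:db8::3}"   # placeholder client address
IP="${IP:-ip}"                              # set IP=echo for a dry run

add_ndp_proxy() {
  # Only act when the event is for the external interface.
  [ "$IFACE" = "$EXTERNAL_IF" ] || return 0
  "$IP" -6 neigh add proxy "$CLIENT_IPV6" dev "$EXTERNAL_IF"
}

add_ndp_proxy
```

The script needs to be executable for the dispatcher to run it.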

Summary

I would like to revisit the route an outgoing packet will go through. Let’s say a process on the client host wants to access 2001:4860:4860::8888:

  • According to the routing table on the client end, the traffic should be forwarded to the gateway, SERVER-IPV6.
  • Then it will notice that the SERVER-IPV6 address belongs to the /127 subnet on sit-ipv6 interface.
  • When the packet is forwarded to the sit-ipv6 tunnel interface, it will be encapsulated with an IPv4 header, and sent to the SERVER-IPV4 address. This could be across the IPv4 Internet, or a private connection if there is one.
  • The encapsulated packet will be received by the sit-ipv6 interface on the server’s end, and unpacked to its original IPv6 form.
  • Since the IPv6 destination is an external one, and we have enabled forwarding on the server, it will be routed to the external gateway according to the routing table.

When the remote server replies, the packet goes the exact opposite way back to the client host.

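List of Western Digital drive models using SMR and PMR, by series

A while ago, the news that Western Digital’s Red series uses SMR (shingled magnetic recording) caused quite a stir. The Red series is marketed for NAS storage; although it is not “high-end” (it spins at the same low speed as the Blue and Green drives), it still costs a good deal more than the Blue drives, so WD took heavy criticism.

Image source: Synology

Until then, Western Digital did not disclose the recording technology inside each product. I checked WD’s website today, and it now states exactly which models in each series use SMR and which use PMR (perpendicular magnetic recording; WD calls it CMR, conventional magnetic recording). Earlier reports said that the 2 to 6 TB Red drives use SMR, but from what I can see that applies only to some models, and the Red drive I bought from JD.com is among the CMR models. So don’t rush to return your drive: check its model number against the lists below to see whether it is affected, and then decide.

The model information below is compiled from multiple sources and is for reference only; the official website is authoritative.

Models using SMR: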

  • WD Red™ 3.5”: WD20EFAX (2TB), WD30EFAX (3TB), WD40EFAX (4TB), WD60EFAX (6TB)
  • WD Blue™ 3.5”: WD20EZAZ (2TB), WD60EZAZ (6TB)
  • WD Blue™ 2.5”: WD10SPZX (1TB), WD20SPZX (2TB)
  • WD Black™ 2.5”: WD10SPSX (1TB)

Models using PMR (CMR):

  • WD Red™ (2.5” / 3.5”): WD10JFCX (1TB), WD10EFRX (1TB), WD20EFRX (2TB), WD30EFRX (3TB), WD40EFRX (4TB), WD60EFRX (6TB), WD80EFAX (8TB), WD100EFAX (10TB), WD101EFAX (10TB), WD120EFAX (12TB), WD140EFAX (14TB)
  • WD Red Pro (3.5”): WD2002FFSX (2TB), WD4002FFWX (4TB), WD4003FFBX (4TB), WD6002FFWX (6TB), WD6003FFBX (6TB), WD8003FFBX (8TB), WD102KFBX (10TB), WD121KFBX (12TB), WD141KFGX (14TB)
  • WD Black™ 3.5”: WD5003AZEX (500GB), WD1003FZEX (1TB), WD2003FZEX (2TB), WD4005FZBX (4TB), WD6003FZBX (6TB)
  • WD Black™ 2.5”: WD2500LPLX (250GB), WD3200LPLX (320GB), WD5000LPLX (500GB)
  • WD Blue™ 3.5”: WD5000AZLX (500GB), WD5000AZRZ (500GB), WD10EZRZ (1TB), WD10EZEX (1TB), WD20EZRZ (2TB), WD30EZRZ (3TB), WD40EZRZ (4TB), WD60EZRZ (6TB)
  • WD Blue™ 2.5”: WD3200LPCX (320GB), WD5000LPVX (500GB), WD5000LPCX (500GB), WD5000LQVX (EA) (500GB)
  • WD Purple (3.5”): WD10EJRX (1TB), WD20EJRX (2TB), WD30EJRX (3TB), WD40EJRX (4TB), WD60EJRX (6TB), WD80EJRX (8TB), WD81PURZ (8TB), WD82PURZ (8TB), WD100EJRX (10TB), WD101EJRX (10TB), WD121EJRX (12TB), WD140EJRX (14TB)
  • WD Gold (3.5”): WD1005VBYZ (1TB), WD2005VBYZ (2TB), WD4003VRYZ (4TB), WD6003VRYZ (6TB), WD8004VRYZ (8TB), WD102VRYZ (10TB), WD121VRYZ (12TB), WD141VRYZ (14TB)
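Checking a drive against the SMR list can be sketched as a small shell function. The function name is_smr_model is made up for this example, the patterns come only from the SMR list above, and the full model strings in the loop (with their suffixes) are illustrative inputs shaped like what a tool such as smartctl might report, not verified part numbers.

```shell
# Return success (0) if a drive model string contains one of the SMR
# model numbers from the list above, and failure (1) otherwise.
is_smr_model() {
  case "$1" in
    *WD20EFAX*|*WD30EFAX*|*WD40EFAX*|*WD60EFAX*) return 0 ;;  # Red 3.5"
    *WD20EZAZ*|*WD60EZAZ*)                       return 0 ;;  # Blue 3.5"
    *WD10SPZX*|*WD20SPZX*)                       return 0 ;;  # Blue 2.5"
    *WD10SPSX*)                                  return 0 ;;  # Black 2.5"
    *)                                           return 1 ;;
  esac
}

# Illustrative model strings; the suffixes are assumptions.
for m in "WDC WD40EFAX-68JH4N0" "WDC WD40EFRX-68N32N0" "WDC WD80EFAX-68KNBN0"; do
  if is_smr_model "$m"; then
    echo "$m: SMR"
  else
    echo "$m: not in the SMR list"
  fi
done
```

Note that the 8TB and larger EFAX models are deliberately not matched, since the lists above place them under CMR.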

Sources:

  1. On WD Red NAS Drives – Western Digital Blog
  2. Western Digital Online

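Attached is the text of the official Western Digital blog post (machine-translated):

April 22, 2020

The past week has been eventful, to say the least. As a team, it was important that we listened carefully and understood your feedback about our WD Red NAS drives, specifically how we communicate which recording technologies are used. Your concerns were heard loud and clear. Below is the list of our client internal hard drives available through the channel.

Click here to see the SKUs of client internal hard drives that use SMR technology.

We are committed to providing as much information as we can to help you make informed purchasing decisions. Thank you for letting us know how we can do better. We will update our marketing materials and provide more information about SMR technology, including benchmarks and ideal use cases.

Again, we know you entrust your data to our products, and we do not take that lightly. If you have purchased a drive and are experiencing performance or other technical issues, please call our customer care. We will have options for you. We are here to help.

Stay tuned for more.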


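April 20, 2020

Recently, there has been discussion about the recording technology used in some of our WD Red hard disk drives (HDDs). We regret any misunderstanding and would like to take a few minutes to discuss the drives and provide some additional information.

WD Red HDDs are ideal for home and small business NAS systems. They are well suited to sharing and backing up files in systems with one to eight drive bays, at a workload rate of 180 TB per year. We have rigorously tested this type of use, and it has been validated by the major NAS vendors.

We typically specify the designed-for use cases and performance parameters, and do not necessarily talk about what is under the hood. One of those innovative technologies is Shingled Magnetic Recording (SMR).

SMR is a tested and proven technology that enables us to keep up with the growing volume of data used by individuals and businesses, and we are continuously innovating to advance it. SMR technology is implemented in different ways: drive-managed SMR (DMSMR), which is handled on the device itself, as in our lower-capacity (2TB – 6TB) WD Red HDDs, and host-managed SMR, which is used in high-capacity data center applications. Each implementation serves a different use case, ranging from personal computing to some of the largest data centers in the world.

DMSMR is designed to manage intelligent data placement within the drive rather than relying on the host, which enables seamless integration for end users. The data intensity of typical small business and home NAS workloads is bursty, which leaves DMSMR drives enough idle time to perform background data management tasks as needed and keep delivering an optimal performance experience for users.

For years, WD Red HDDs have reliably powered home and small business NAS systems around the world and have been consistently endorsed by major NAS manufacturers. Having built that reputation, we understand that at times our drives may be used in systems whose workloads far exceed their intended use. In addition, some of you have recently shared that in certain data-intensive, continuous read/write use cases, the performance of NAS systems powered by WD Red HDDs is not what you expected.

If you are encountering performance that is not what you expected, please consider our products designed for intensive workloads. These may include WD Red Pro or WD Gold drives, or Ultrastar drives. Our customer care team is ready to help and can also determine which product might be best for you.

We know you entrust your data to our products, and we do not take that lightly. If you have purchased a WD Red drive and are experiencing performance or other technical issues, please call our customer care team. We will have options for you. We are here to help.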

Kubernetes + Flannel: UDP packets dropped for wrong checksum – Workaround

Update on July 22

Stable versions have been released in all supported branches (v1.16.13, v1.17.9 and v1.18.6) that include the fix needed. According to the change log:

Fixes a problem with 63-second or 1-second connection delays with some VXLAN-based network plugins which was first widely noticed in 1.16 (though some users saw it earlier than that, possibly only with specific network plugins). If you were previously using ethtool to disable checksum offload on your primary network interface, you should now be able to stop doing that. [ref]

Update on June 17

After 3 months, the problem has been located and solved on the Kubernetes side. Long story short, they determined that neither the Linux kernel nor Flannel required any change; instead, a mark added by kube-proxy caused the kernel to double-NAT the packet, which was in turn sent with a wrong checksum. A pull request has been merged to master and will soon be backported to the release branches.

To see the whole story, check out the following links:


Recently I noticed that some DNS queries in my Kubernetes cluster timed out, causing apps to crash. I looked into the issue, reported it to the kernel network team and applied a workaround.

Symptom

My Kubernetes cluster is built with the Flannel overlay network using the vxlan backend. The idea is that each node (machine) gets a private IP subnet to further allocate to pods. When a cross-node packet is to be sent, it is sent to the vxlan virtual interface, encapsulated in a UDP packet (regardless of the original protocol) and routed to the other node, where it is received by another vxlan interface and extracted.*

Kubernetes clusters provide the Service resource. One of the many types of Service lets you use a single virtual IP to represent multiple pods, sometimes across nodes. This is implemented by the kube-proxy component, which utilizes the IPVS feature of the Linux kernel.

Now, when I make a DNS query on the host (it should be the same from inside containers, but with more hops) using dig against CoreDNS’ service IP, it always times out. It works fine if I query one of the backend pod’s IP instead.

Diagnosis

I used tcpdump to capture the packet, and noticed that the encapsulated UDP packet had a bad UDP checksum.

06:22:23.699846 IP (tos 0x0, ttl 64, id 7598, offset 0, flags [none], proto UDP (17), length 133)
    192.3.59.220.25362 > 147.135.114.20.8472: [bad udp cksum 0xd2ae -> 0x245b!] OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 63, id 33703, offset 0, flags [none], proto UDP (17), length 83)
    172.19.192.0.13169 > 172.19.195.166.53: [udp sum ok] 41922+ [1au] A? www.google.com. ar: . OPT UDPsize=4096 (55)

Further testing on the receiving end showed that the packet was transferred but dropped on the target node. That made it certain that the checksum was what caused the DNS query to time out with no response.

A little more Googling showed that this could be caused by “checksum offloading”. That means if the kernel wants to send a packet out on a physical Ethernet card, it can leave the checksum calculation to the card hardware. In this case, if you capture the packet from the kernel, it will show a wrong checksum, since it has yet to be calculated; but the same packet captured on the receiving end will have a different and correct checksum, calculated by the sender’s network card hardware.

Workaround

I used ethtool to disable TX (outgoing) checksum offloading on flannel.1 (the vxlan virtual interface), and the query worked again. So my guess was that the kernel driver miscalculated or forgot to calculate the checksum with offloading turned on; with offloading off, kernel code calculates the correct checksum before the packet is handed to the actual outgoing network card.

To temporarily turn off checksum offloading:

sudo ethtool -K flannel.1 tx-checksum-ip-generic off
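To confirm that the setting took effect, the feature state can be read back from ethtool -k. The helper below is a minimal sketch: the function name tx_offload_state is made up, and the sample line fed to it only imitates the shape of ethtool’s feature listing rather than being a real capture.

```shell
# Read `ethtool -k <iface>` output on stdin and print the state ("on" or
# "off") of the generic TX checksum offload feature.
tx_offload_state() {
  awk '$1 == "tx-checksum-ip-generic:" { print $2 }'
}

# Typical use on a node (requires ethtool and the flannel.1 interface):
#   ethtool -k flannel.1 | tx_offload_state
# Dry run with a line shaped like ethtool's output (sample, not a capture):
printf '\ttx-checksum-ip-generic: off\n' | tx_offload_state
```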

I used a systemd service to do this automatically after the interface appears. Note that the interface is created by Flannel after kubelet starts, so you can’t simply run the command at boot time, e.g. in /etc/rc.local.

You can use the following code to create the service /etc/systemd/system/xiaodu-flannel-tx-off.service, then enable and start it. (The service file can be downloaded using this link.)

sudo tee /etc/systemd/system/xiaodu-flannel-tx-off.service > /dev/null << EOF
[Unit]
Description=Turn off checksum offload on flannel.1
After=sys-devices-virtual-net-flannel.1.device

[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off
EOF
sudo systemctl enable xiaodu-flannel-tx-off
sudo systemctl start xiaodu-flannel-tx-off

For systemd >= 245, a TransmitChecksumOffload parameter was added to *.link files. You can read the docs and try it out yourself, or just use the service above.

* If you want to learn more about how Flannel and Kubernetes networking work under the hood, I strongly suggest that you read this blog post, which gives a step-by-step demonstration of how a packet is sent from one pod to another.