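Ubuntu fails to boot with a kernel panic after a kernel upgrade corrupts initrd.img

I have been tinkering with my server these past few days. Yesterday I ran a system upgrade on an Ubuntu 20.04 virtual machine in my Kubernetes cluster, and after the reboot it would no longer boot, which earned me 20+ hours of site-wide downtime. I finally solved the problem today, so here are my notes.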

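Symptoms

  • After a kernel update, the Ubuntu 20.04 LTS system cannot boot.
  • Seen from the host, the virtual machine shows low CPU and memory usage and does not respond to an ACPI reboot; only a hard reset works.
  • Watching the boot log over VNC / Terminal shows the following messages: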
    • /init: conf/conf.d/zz-resume-auto: line 1: syntax error: unexpected "("
    • Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000200
  • After booting from a LiveCD, running lsinitramfs /boot/initrd.img shows the offending conf/conf.d/zz-resume-auto file, similar to the figure below.
Output of `lsinitramfs` with `conf/conf.d/zz-resume-auto` line
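
The check in the last bullet can be scripted. The following is only a minimal sketch: it assumes the file listing comes from lsinitramfs, and the helper name has_bad_resume_conf is made up for illustration.

```shell
# Reads an initramfs file listing on stdin (for example the output of
# `lsinitramfs /boot/initrd.img`) and succeeds if the
# conf/conf.d/zz-resume-auto entry described above is present.
# NOTE: has_bad_resume_conf is a hypothetical helper name, not a standard tool.
has_bad_resume_conf() {
    grep -Eq '(^|/)conf/conf\.d/zz-resume-auto$'
}

# Usage:
#   lsinitramfs /boot/initrd.img | has_bad_resume_conf && echo "zz-resume-auto present"
```

The function only inspects file names; it does not look at the contents of the file.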

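Cause

The direct cause is that the Ubuntu system update upgraded the kernel, and a bad file was pulled in when initrd.img was generated. The root cause is still hard to pin down; similar bugs have been open on Launchpad for years without a fix.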

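Fix

Entering a recovery environment

Because the system cannot boot, the first step is to get into a recovery environment. The simplest way would be to boot the previous kernel version and repair the system from there. However, my virtual machine uses a cloud-init image, and none of the methods I found online could make the Grub menu show up. The steps below describe how to get a recovery environment by booting a Live CD and mounting the original system.

  1. Download the Ubuntu Live Server ISO and boot from it. This is straightforward in my virtual machine setup; for other VPS providers or physical machines, look up how to boot from an ISO, or use a different live recovery environment.
  2. Once the installer appears, do not continue. Press Ctrl+Alt+F2 to switch to a TTY command line.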

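Regenerating initrd.img in the recovery environment

Once in the recovery environment, follow the steps below to mount the original system and generate a new initrd.img file.

  1. Mount the original system's root filesystem, then chroot into it:
    sudo mount /dev/sda1 /mnt # replace /dev/sda1 with the original system's root partition; lsblk can help you find it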
    sudo mount --bind /dev /mnt/dev
    sudo mount --bind /proc /mnt/proc
    sudo mount --bind /sys /mnt/sys
    sudo chroot /mnt
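  2. Regenerate a correct initrd.img:
    update-initramfs -cu -k 5.4.0-136-generic
    (Replace the last argument with the full version string of the newest kernel; ls /lib/modules shows the available versions.)
  3. Inspect the newly generated initrd.img and make sure the zz-resume-auto file described above is gone:
    lsinitramfs /boot/initrd.img | grep conf
    (The output should list the normal files from the figure above, without zz-resume-auto.)
  4. Press Ctrl-D to leave the chroot, then run sudo umount /mnt/dev /mnt/proc /mnt/sys /mnt to unmount everything.
  5. Reboot and boot from the hard disk to see whether the problem is fixed.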
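
Step 2 asks for the full version string of the newest kernel. As a rough sketch, it could be picked out of /lib/modules automatically. This assumes GNU sort (for its -V version ordering), and latest_kernel_version is a made-up helper name.

```shell
# Prints the highest kernel version found under a modules directory
# (default: /lib/modules), using GNU sort's version ordering.
# NOTE: latest_kernel_version is a hypothetical helper name.
latest_kernel_version() (
    dir=${1:-/lib/modules}
    ls -1 "$dir" | sort -V | tail -n 1
)

# Usage inside the chroot:
#   update-initramfs -cu -k "$(latest_kernel_version)"
```

Version ordering matters here because a plain alphabetical sort would place 5.4.0-99 after 5.4.0-136.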

Reference: https://forum.level1techs.com/t/solved-kernel-panic/120146

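Tip: Alt+F7 shortcut not working on Windows?

I recently noticed that on my Windows machine, Alt+F7, the Find Usages shortcut in the JetBrains family of IDEs (IDEA, PyCharm, PhpStorm and so on), had stopped responding. For anyone on IDEA's default keymap, Find Usages gets used quite often.

I originally planned to track it down with a hotkey-conflict detection tool, but a quick Google search showed that many people have run into the same problem.

The culprit is NVIDIA GeForce Experience, the software bundled with the NVIDIA graphics driver. It has an “in-game overlay” feature whose main shortcut is Alt+Z, but it also registers a whole series of other shortcuts: Alt+F1, Alt+F2, Alt+F3, Alt+F5, Alt+F6, Alt+F7, Alt+F8, Alt+F9, Alt+F10, Alt+F11. (Food for thought: why is Alt+F4 not on the list?)

The fix is to launch GeForce Experience from the Start menu, click the gear icon in the top-right corner to open the settings, and simply disable the “in-game overlay” feature. If you do need the overlay, you can instead press Alt+Z to open it and change or remove the conflicting shortcuts under “Keyboard shortcuts” in its settings.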

ngx_pagespeed module for Ubuntu’s stock nginx

I recently decided to build ngx_pagespeed for Ubuntu’s stock nginx, now that nginx supports dynamic modules (as Apache has for a long time) as of version 1.9.11. Being “dynamic” means that third-party modules no longer have to be built into the main nginx binary; they can instead be built separately and loaded at runtime, as long as the module and the main nginx binary are compatible, meaning their configuration parameters are similar enough.

(For a Chinese version, see the other post: Building the ngx_pagespeed module separately for the nginx in Ubuntu’s official repository)

I have not seen anyone else build third-party modules separately. The official nginx.org repository provides several modules as packages, but I believe they are built together with the binary they ship with. A probable reason is that even though separately built modules can be loaded, they are not guaranteed to work smoothly. I saw that nginx 1.11.5 tries to improve the compatibility of dynamic modules, but whether that works remains to be tested.

I used the latest Debian (Ubuntu) packaging tools and formats to build the package, and Launchpad PPA to host the apt repository.

You can use the apt commands shown on the page to enable the PPA and install the packages. The following is an example:

sudo add-apt-repository ppa:du9l/ngx-pagespeed
sudo apt-get update

To use this module, first install the right flavored package, then add the line:
load_module modules/ngx_pagespeed.so;
… to your /etc/nginx/nginx.conf file before reloading nginx.
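
Before reloading, it can help to confirm that the line really is in the configuration file. This is a minimal sketch only; has_pagespeed_module is a made-up helper name, and the pattern assumes the line is written exactly as shown above.

```shell
# Succeeds if the given nginx configuration file contains an uncommented
# load_module line for modules/ngx_pagespeed.so.
# NOTE: has_pagespeed_module is a hypothetical helper name.
has_pagespeed_module() {
    grep -Eq '^[[:space:]]*load_module[[:space:]]+modules/ngx_pagespeed\.so;' "$1"
}

# Usage:
#   has_pagespeed_module /etc/nginx/nginx.conf || echo "load_module line missing"
#   sudo nginx -t    # let nginx validate the configuration before a reload
```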

Please remember that this is an experimental feature, and you should NOT use it in production. More information can be found in the PPA description below.

PPA Description

This repository contains “ngx_pagespeed” dynamic module for Ubuntu’s stock nginx packages, including all flavors available in the official repository.

* WARNING: Building dynamic modules alone is EXPERIMENTAL. It is NOT guaranteed to work by the nginx authors. Even though the module can be loaded and has been tested on my own server, I still don’t recommend using it in PRODUCTION environments. USE AT YOUR OWN RISK. *

The package names are “ngx-pagespeed-nginx-FLAVOR” where FLAVOR is core / light / full / extras, which should match your nginx flavor. Check with:
$ dpkg -l | grep nginx
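
The mapping from installed flavor to package name can be sketched in a few lines of shell. This assumes the usual dpkg -l column layout (status, package name, version, ...), and pagespeed_package_name is a made-up helper name.

```shell
# Reads `dpkg -l` output on stdin and prints the matching
# ngx-pagespeed-nginx-FLAVOR package name for each installed nginx flavor.
# NOTE: pagespeed_package_name is a hypothetical helper name.
pagespeed_package_name() {
    awk '$1 == "ii" {
        name = $2
        sub(/:.*$/, "", name)        # drop an architecture suffix such as ":amd64"
        if (name ~ /^nginx-(core|light|full|extras)$/) {
            sub(/^nginx-/, "", name) # keep only the flavor
            print "ngx-pagespeed-nginx-" name
        }
    }'
}

# Usage:
#   dpkg -l | pagespeed_package_name
```

Packages such as nginx-common are ignored because they are not one of the four flavors.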

The versions follow ngx_pagespeed’s latest stable versions and Ubuntu’s REL-updates (e.g. xenial-updates) nginx versions. For example, the first version available in this PPA is built with ngx_pagespeed 1.11.33.4 and xenial-updates’ nginx 1.10.0-0ubuntu0.16.04.4 versions.

To use this module, first install the right flavored package, and then add the line:
load_module modules/ngx_pagespeed.so;
… to your /etc/nginx/nginx.conf file before reloading nginx.

* NOTE: This package is linked against the pre-built “psol” binaries provided by Google, so only i386 and amd64 systems are supported for now. In the future I will update the package to build psol itself. *

 

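Building the ngx_pagespeed module separately for the nginx in Ubuntu’s official repository

I recently tried building the ngx_pagespeed module for the nginx in Ubuntu’s repository. Since version 1.9.11, nginx has supported dynamic modules: third-party modules no longer need to be compiled into the main nginx binary and can be loaded dynamically instead, provided that nginx and the module were built with essentially the same parameters.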

English version: ngx_pagespeed module for Ubuntu’s stock nginx

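There does not yet seem to be any precedent online for shipping dynamic modules on their own. Only the official nginx.org repository provides a few modules, and those are presumably built together with the main binary. The reason may be that a module built this way can be loaded but is not guaranteed to run stably. As far as I can tell, nginx has been working on module compatibility since 1.11.5, but whether that works well is hard to say.

While building the module package, I used the latest Debian (Ubuntu) build and packaging tools and formats, and used a Launchpad PPA to provide the apt repository.

You can use the commands shown on the PPA page above to enable the repository and install the packages. The usual commands are: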

sudo add-apt-repository ppa:du9l/ngx-pagespeed
sudo apt-get update

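Before using the module, install the package that matches your nginx flavor, then add the following line to the /etc/nginx/nginx.conf configuration file:
load_module modules/ngx_pagespeed.so;
Finally, reload the nginx configuration.

Please note that building modules separately is an experimental feature, so do not use this module in production. See the PPA description for more information.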

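(The following is a backup of an earlier post.)

Thoughts on packaging software for Debian/Ubuntu

I have recently been trying to package a piece of software as a .deb for Ubuntu, so I read through the Ubuntu and Debian packaging documentation, and it feels absurdly complicated. Both are full-featured package managers, yet Arch Linux's pacman needs only a PKGBUILD plus a few optional extra files (such as an install script), while the deb ecosystem needs a big pile of helper scripts to “simplify” packaging. The learning curve is really steep. After all this struggle, I also came to appreciate how many people's quiet effort sits behind a simple apt-get.

Here is a rough record of the whole process of packaging a piece of software and publishing it to a Launchpad PPA. Some steps may differ depending on the software. The main reference is the Debian maintainer's guide (available in Chinese).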

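  1. Create a directory and download the source code. It is usually recommended to create a dedicated directory (for example ~/package), download the source *.tar.gz file, and extract it into its own source directory (~/package/hello-2.10).
  2. Initialize the debian directory. Inside the source directory, use the dh_make tool to create the basic structure of the debian directory (~/package/hello-2.10/debian), for example dh_make -f ../hello-2.10.tar.gz
  3. Optional: make any changes the source needs for Debian/Ubuntu, and use the quilt tool to manage those patches in the debian/patches directory.
  4. Edit the control files in the debian directory as needed. The three most important ones are:
    1. control – package metadata, including dependencies and the description.
    2. rules – the concrete steps for configuring and building the package. It uses GNU Make syntax, and you can modify specific targets as needed. This is the deepest rabbit hole: each target calls a number of dh_xxx scripts by default, and you can in turn override the default behavior of those scripts by defining targets named override_dh_*, and so on.
    3. changelog – the package's version history; the build tools use it to determine which version to build.
  5. Run the build command in the source directory. Many commands can be used here, such as dpkg-buildpackage, debuild and pbuilder. Roughly speaking, each is a higher-level wrapper around a lower-level one.
  6. After a successful build, sign the .dsc and .changes files with your own PGP key. Tools like debuild can do this for you, or you can run debsign manually.
  7. Upload the package to the repository. Launchpad PPA, for example, requires uploading with the dput tool, and accepts source packages only; it builds and publishes the binary deb packages on its own servers.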
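
To illustrate how the build tools get the version from the changelog (step 4), here is a small sketch that reads it from the first line of a debian/changelog file. It assumes the standard first-line layout "package (version) distribution; urgency=...". The helper name changelog_version is made up; on a real system, dpkg-parsechangelog -S Version does this properly.

```shell
# Prints the version found in the first line of a debian/changelog file,
# e.g. "hello (2.10-1) unstable; urgency=medium" gives "2.10-1".
# NOTE: changelog_version is a hypothetical helper name for illustration.
changelog_version() {
    sed -n '1s/^[^ ]* (\([^)]*\)).*$/\1/p' "$1"
}

# Usage, from the source directory:
#   changelog_version debian/changelog
```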

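Those are the steps in a nutshell, but the tools used in each step are all quite complex. It took me two days of study before I managed to build the package I needed, and I plan to publish it after some testing. It is serious black magic, so stay tuned.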

Compiling kernel modules for Atheros AR5B22 (AR9462) on Jetson TK1

I recently got an Atheros AR5B22 chip for my Jetson TK1 board, in order to add WiFi and Bluetooth support. The system provided by NVIDIA (Linux4Tegra 21.4) doesn’t have the Atheros drivers built in, so I had to compile them to make use of the device.

AR5B22 installed on Jetson TK1

This is what the chip looks like when installed on TK1. AR5B22 is the Mini PCIe reference design for AR9462, which features both 2.4GHz and 5GHz WiFi and Bluetooth 4.0, according to WikiDevi.

Since it belongs to 9xxx series, Linux kernel has the well-supported driver ath9k for it. Unlike other WiFi-Bluetooth-combo chips from Atheros, this one doesn’t specify which Bluetooth chip it uses (judging by BT 4.0 support, it should be AR3012), but nevertheless you still need ath3k driver and firmware for Bluetooth support. This has bugged me for quite a while, but I figured it out anyway (with hints from this Ubuntu bug report).

If you are familiar with how to compile Linux kernel modules for Jetson TK1, the above is all you need to continue. The rest of this article is a set of detailed steps for those who are not.

Note: The following steps compile directly on the TK1 and include some hack-y steps for installing the modules. Also, I am NOT responsible for bricking your device.

  1. First make sure you have the latest Linux4Tegra (L4T) 21.4 installed on your Jetson TK1, which features basic bluetooth support. You can use Jetpack to flash it.
  2. The following steps are all carried out with a shell on TK1. It could be either over SSH (ssh ubuntu@IP), or GNOME Terminal (Ctrl-Alt-T) from GUI if you have a monitor plugged in.
  3. Install the firmware (for ath3k) and dependency (for kernel config menu) packages on your TK1.
    sudo apt-get install linux-firmware libncurses5-dev
  4. Download and extract L4T kernel sources into your home directory.
    mkdir ~/kernel && cd ~/kernel
    wget -O kernel_src.tbz2 https://s.du9l.com/Iz4HK  # For L4T 21.4
    tar xf kernel_src.tbz2 && cd kernel
  5. Copy the existing kernel config as a starting point.
    zcat /proc/config.gz > .config
  6. Enter kernel config menu, and change the following settings.
    make menuconfig
    • From “General setup” set “Local version” to “-gdacac96” (check with uname -a), otherwise your compiled module will report “Unknown symbol in module” and “ath9k: version magic … should be …” errors when you insert them.
    • Use “Exit” to go back to the top, then from “Device Drivers – Network device support – Wireless LAN”, press M on “Atheros Wireless Cards” to compile it as a module; then enter it, press M on “Atheros 802.11n wireless cards support”, and press Y on “Atheros bluetooth coexistence support” and “Atheros ath9k PCI/PCIe bus support”.
    • Again, “Exit” to the top, then from “Networking support – Bluetooth subsystem support (should already be M in 21.4 kernel) – Bluetooth device drivers”, press M on “Atheros firmware download driver”.
    • Use “Save” to save your work (default “.config” name is fine), and “Exit” until you are back to the shell.
  7. Use the following command to start the compilation. It usually needs ~5 minutes to finish.
    make -j4 modules
  8. Here comes the hack-y part: Officially you need sudo make modules_install to install the modules, but I just want to install the newly compiled ones into a separate folder, so I will use the following commands instead:
    sudo mkdir -p /lib/modules/`uname -r`/kernel/custom  # `uname -r` becomes "3.10.40-gdacac96" in this case
    find . -name 'ath*.ko' | xargs -I{} sudo cp {} /lib/modules/`uname -r`/kernel/custom/
    sudo depmod -a
  9. In order to use WiFi and Bluetooth together, you need to enable “Bluetooth coexistence” in ath9k module.
    echo "options ath9k btcoex_enable=1" | sudo tee /etc/modprobe.d/ath9k_btcoex.conf
  10. Finally, insert both modules into the kernel.
    sudo modprobe -a ath9k ath3k
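
The “Local version” value from step 6 is the part of the running kernel's release string that follows the base version. As a minimal sketch (kernel_local_version is a made-up helper name), it can be derived like this:

```shell
# Prints the local-version suffix of a kernel release string, i.e. everything
# from the first "-" onward. Defaults to the running kernel (`uname -r`).
# NOTE: kernel_local_version is a hypothetical helper name.
kernel_local_version() (
    release=${1:-$(uname -r)}
    base=${release%%-*}                  # e.g. "3.10.40"
    printf '%s\n' "${release#"$base"}"   # e.g. "-gdacac96"
)

# Usage on the TK1:
#   kernel_local_version
```

If the release string has no suffix at all, the function prints an empty line.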

You should now have both WiFi and Bluetooth working. You can check with the following commands:

iwconfig
hciconfig
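
If either command shows no device, a first thing to check is whether both modules were actually loaded. The sketch below reads text in the /proc/modules format; modules_loaded is a made-up helper name.

```shell
# Reads /proc/modules-style text on stdin and succeeds only if every module
# named on the command line appears in it.
# NOTE: modules_loaded is a hypothetical helper name.
modules_loaded() (
    listing=$(cat)
    for mod in "$@"; do
        # Each /proc/modules line starts with the module name and a space.
        printf '%s\n' "$listing" | grep -q "^$mod " || exit 1
    done
)

# Usage:
#   modules_loaded ath9k ath3k < /proc/modules && echo "both modules loaded"
```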

Just to be clear, I used the above steps with the following hardware, but I suppose you can use the same drivers for any Atheros WiFi AR9xxx series and BT AR3xxx series chip (combo or separate), as long as the 3.10 kernel and ath9k & ath3k modules support them.

$ lspci |grep Atheros
# 01:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)
$ lsusb |grep Atheros
# Bus 001 Device 003: ID 0cf3:3004 Atheros Communications, Inc.