为 Ubuntu 官方源的 nginx 单独编译 ngx_pagespeed 模块

我最近尝试为 Ubuntu 源里的 nginx 编译 ngx_pagespeed 模块。nginx 从 1.9.11 开始支持动态模块,即第三方模块不再需要编译进 nginx 主程序,而是可以动态加载,条件是编译 nginx 和模块的参数基本相同。

English version: ngx_pagespeed module for Ubuntu’s stock nginx

目前网上好像还没有提供动态模块的先例,只有 nginx.org 官方源中有提供几个模块,但应该也是跟主程序一起编译的。原因可能是这样编译的模块虽然可以加载,但不一定可以稳定运行。据我观察 nginx 在 1.11.5 版本之后有增强模块兼容性的举动,具体是否好用还不好说。

在编译模块软件包的过程中,我使用的都是 Debian (Ubuntu) 的最新编译、打包工具和格式,同时使用 Launchpad PPA 来提供 apt 源。

可以使用上面 PPA 源中的命令来启用源、安装软件包。常用的命令应该是:

sudo add-apt-repository ppa:du9l/ngx-pagespeed
sudo apt-get update

使用模块前,先安装对应于 nginx 版本,然后在 /etc/nginx/nginx.conf 配置文件中添加下面一行:
load_module modules/ngx_pagespeed.so;
最后重载(reload)nginx 配置即可。

请注意,单独编译模块是实验性的特性,请不要在生产环境中使用该模块。更多信息请参见的 PPA 的介绍。

(以下是之前文章的备份)

打包 Debian/Ubuntu 软件的感想

最近正在尝试为 Ubuntu 打包一个 .deb 软件,然后看了一遍 Ubuntu 和 Debian 的打包文档,感觉简直是复杂到爆了。同样是全功能的软件包管理器,Arch Linux 的 pacman 只需要一个 PKGBUILD 加一些可选的额外文件(例如安装脚本)就能搞定,deb 系统需要一大堆脚本来辅助“简化”打包工作,学习曲线实在太陡了。经过这一番折腾,也让我明白了平时一个简单的 apt-get 背后有多少人默默的努力……

大概记录一下为一个软件打包、发布到 Launchpad PPA 的全过程。根据具体软件可能有些步骤不同,主要参考文档是 Debian 维护人员手册(有中文)。

  1. 创建目录并下载源代码。通常建议创建一个专门的目录(例如 ~/package),下载源代码 *.tar.gz 文件,并解压缩成单独的源代码目录(~/package/hello-2.10)。
  2. 初始化 debian 目录。在刚才的源代码目录中,使用 dh_make 工具创建 debian 目录的基础结构(~/package/hello-2.10/debian),例如 dh_make -f ../hello-2.10.tar.gz
  3. 可选:将源代码针对 Debian/Ubuntu 进行必要的修改,使用 quilt 工具将修改补丁管理在 debian/patches 目录中。
  4. 按照需要修改 debian 目录中的控制文件,其中最重要的三个是:
    1. control – 管理软件包信息,包括依赖包和介绍等。
    2. rules – 管理软件包配置(configure)和构建(build)的具体步骤。使用的是 GNU Make 的格式,可以根据需要修改特定的 target。这里是水最深的,每个 target 默认调用一些 dh_xxx 的脚本,又可以通过指定 override_dh_* 名称的 target 来覆盖这些脚本的默认操作等等。
    3. changelog – 软件包版本更新信息,构建工具通过它来确定要构建的版本号。
  5. 在源代码目录中执行构造命令。这里有很多种命令可以用,例如 dpkg-buildpackage、debuild、pbuilder 等。大概都是一个高层包装(wrap)另一个底层的关系。
  6. 编译成功后,需要用自己的 PGP 密钥签名 .dsc 和 .changes 文件。debuild 之类的工具可以代劳,也可以手动用 debsign 工具。
  7. 把软件包上传到源。例如 Launchpad PPA 就要求使用 dput 工具上传,而且只传源代码包,它们用自己的服务器编译并发布 deb 二进制包。

简单来说就是这些步骤,其实每一步用到的工具都很复杂,我研究了两天才顺利打好我需要的软件包,准备测试之后再发出来。绝对黑科技,敬请期待。

LeetCode 456: “132” Pattern

这个题是昨晚上回寝室前赶出来的,因为看到是Medium而且感觉不难(实际也不难)。

原题:https://leetcode.com/problems/132-pattern

题目意思是,判断一个数组是否包含子序列ai、aj、ak(i<j<k),满足ai<ak<aj

好像跟一般领奖台是反着的,但是可以直观理解题意。
好像跟一般领奖台是反着的,但是可以直观理解题意。

主算法

思路是遍历每一个可能的k值,然后找到它之前第一个(也就是距离最近的)比它大的数作为aj,然后再判断a0~aj-1之间的最小值是否小于ak

之所以要找到“第一个”比它大的数,理由也很明显:要尽量扩大找ai的范围,否则可能会有false negative的情况。

思路很简单,关键是如何做到O(n)。因为遍历每一个k的过程就是O(n)的,所以要求后面两个步骤都是O(1),也就是要预处理。预处理最小值很简单,开一个数组记录一下即可,空间复杂度O(n)。预处理“之前第一个比ak大的数(的位置)”就比较麻烦了,放在子算法里说,空间复杂度也是O(n)。

因此最终算法分三步:预处理a0~aj的最小值,预处理ak之前第一个大的数,遍历每一个可能的k。

子算法

要预处理每个数之前第一个比它大的数,用到了之前一个stack的奇技淫巧。方法是维护一个pair<数, 位置>的stack,其中数是递减的;对于数组从头到尾遍历,每个数先将栈顶比它小的数pop出去,如果pop完了栈空了,说明前面没有比它大的数,此时预处理结果为-1;否则预处理结果为栈顶数的位置。处理完之后将当前pair<数, 位置>也push进栈。

原理:因为要找到之前第一个比它大的数,如果栈顶的数比当前数还小,那对于它们之后的数来说,当前数更可能是比后面数大的结果,所以将栈顶pop出去。因为每个数最多进出栈一次,时间复杂度O(n)。

C++代码

class Solution {
public:
    bool find132pattern(vector<int>& nums) {
        if (nums.empty()) return false;
        int n = nums.size();
        vector<int> min1(n);
        min1[0] = nums[0];
        for (int i = 1; i < n; ++i) {
            min1[i] = min(min1[i-1], nums[i]);
        }
        typedef pair<int, int> pii;
        stack<pii> s;
        vector<int> max1;
        for (int i = 0; i < n; ++i) {
            int I = nums[i];
            while (!s.empty() && s.top().first <= I) s.pop();
            if (s.empty()) {
                max1.push_back(-1);
            } else {
                max1.push_back(s.top().second);
            }
            s.push(make_pair(I, i));
        }
        for (int i = n-1; i >= 0; --i) {
            int I = nums[i];
            int j = max1[i];
            if (j == -1 || j == 0) continue;
            if (min1[j-1] < I) return true;
        }
        return false;
    }
};

因为写的很赶,好多边界都没care,其实应该还有能优化的地方(因为运行时间才到beat 55%的位置……)

Magically “defeating” different CDN implementations

In this article I will show some different CDN implementations along with cases where each of them fails to bring the best performance. I am not a researcher in this area, so some of the points are based on my personal experiences.

1. Why people use CDN

Usually when people visit a website, their browsers first query the IP address from their ISP’s recursive DNS server, which in turn query the domain’s authoritative DNS. Then they will be connected to whatever IP address returned by the DNS which is usually distant (geographically far and has high latency) from them.

That is why people use CDN to solve this problem. By putting edge caching servers in different areas of the world (or in the target country) close to end-users, they can speed up the load time of their websites and improve user experience.

2. Traditional Geo-DNS setup

The most common and simple solution is to use a “Geo-DNS” service to point the same domain to different edge servers with different IP addresses (this is important). In this case, when people query the IP address of a domain, the authoritative Geo-DNS can point them to the nearest edge server, based on the users’ IP, their ISPs’ recursive DNS servers’ IP, or users’ IP provided by recursive DNS via EDNS Client Subnet (EDNS0) if supported.

This works fine in an ideal network setup, but it can easily fall apart. Here are some cases that I have come across:

(a) Unicast DNS

Sometimes Geo-DNS providers don’t use Anycast, instead provide Unicast IP addresses for different regions. The recursive DNS has no way of telling which one is closer to them, so it queries a random one, which can result in slow DNS resolution at first visit.

Example: DNSPod and CloudXNS (Popular Unicasted Geo-DNS providers in China)

DNSPod servers use ChinaNet, China Unicom and China Mobile unicasted IP addresses.
DNSPod servers use ChinaNet, China Unicom and China Mobile unicasted IP addresses.

(b) Bad GeoIP database

Some Geo-DNS providers don’t update their GeoIP database frequent enough, or just don’t have enough data.

Example:

(1) Amazon AWS CloudFront and Akamai don’t have servers in China for obvious reasons, but Chinese visitors are not consistently directed to nearest (Hong Kong, South Korea, Japan) locations. Sometimes a query from China can get a response of European locations, which results in ~500 ms latency.

Akamai directs ChinaNet users to Frankfurt, Germany when there are obviously better choices.
Akamai directs ChinaNet users to Frankfurt, Germany when there are obviously better choices.

(2) Some Geo-DNS providers in China, most notably Aliyun DNS. When both “Domestic” and “Global” records are set, they may direct Chinese users to “Global” servers.

(c) DNS different from network exit

Sometimes people may use recursive DNS servers in the network different from their actual network exit.

Example:

(1) In my university, we have mixed network exits, one in CERNET (AS4538) and one in TieTong (AS9394). Our recursive DNS has a CERNET address, so most Geo-DNS providers gives CERNET or (if the website doesn’t have CERNET servers) other networks’ addresses, for instance ChinaNet (AS4134). But our network exit is configured to use TieTong by default, so for most websites we are visiting ChinaNet servers with a TieTong network, even if they also have TieTong servers.

A more extreme case is that, in some networks they have different routing policies for TCP and UDP (which is a violation of OSI model), so when you do DNS query in UDP you have network A’s address, and when you actually connect to TCP port 80 you have network B. Magical? But true.

(2) Sometimes recursive DNS providers and/or Geo-DNS providers don’t support EDNS0. As long as either end doesn’t support it, it will not work. For instance, if user of open recursive DNS service “114DNS” (Anycasted across several Chinese networks) has a network that is not present in 114DNS’ Anycast network, and the authoritative Geo-DNS doesn’t support EDNS0, it will return the IP in the same network of 114DNS’ node, but different from the network of the user.

3. TCP Anycast setup

Some modern CDN providers use TCP Anycast technique, which means they provide a single IP address for their edge servers in multiple locations, and visitors are directed to the nearest location, decided by how they broadcast their routing tables to other networks.

Such providers include CloudFlare and MaxCDN, which use a single Anycasted IP for their edge servers across the planet. Verizon EdgeCast use a slightly different method where they provide several Anycasted IPs, each represent a geographical zone (Asia-Pacific, North America, South America, Europe).

A unified Anycasted IP solves many of the problems mentioned above, and it’s becoming harder and harder to defeat them. But here they comes:

(1) Magical routing policy (again)

Current Chinese IPv6 implementation has only one international exit (AS23911), which has two exit points: a default one in Los Angeles by HE.net (AS6939) and a premium one in Hong Kong (HKIX). When I resolve EdgeCast’s IPv6 address, I get one in 2606:2800:147::/48 network, which is Anycasted in Asia. But when I trace route to this address, the packet goes from China to Los Angeles and back to Asia, resulting in ~400 ms latency. Even if people use an Anycasted recursive DNS (like Google’s), since it has servers in Hong Kong, the result is the same. By querying the domain at OpenDNS (which doesn’t have Asian server) I get the IP in 2606:2800:11f::/48 network, which is Anycasted in North America, and the latency is only ~200 ms (same as the network exit’s).

Tracing route from AS23911 to EdgeCast's IPv6 edge servers in different continents.
Tracing route from AS23911 to EdgeCast’s IPv6 edge servers in different continents.

This only happens with EdgeCast’s “Continent-based Anycast” network. CloudFlare is not affected. But it has another kind of problem.

(2) Artificial routing deterioration

CloudFlare has edge servers everywhere, including Hong Kong, Taipei, Japan, South Korea, etc. which are all very close to Chinese users. But the major Chinese ISPs’ international exit routing policy directs CloudFlare traffic to Los Angeles (ChinaNet) and San Jose (China Unicom), where they are directed to the nearest edge servers in <3 hops. They did the same thing for Softlayer’s Hong Kong locations, for some magical reasons: maybe price, maybe [censored] ;). The latency from both ISP to CloudFlare’s US west locations are 200~300 ms, where with TieTong (which use Hong Kong as international exit) the value is <100 ms.

ChinaNet and Unicom users get 200~300 ms latency while China Mobile (TieTong) users get 80.
ChinaNet and Unicom users get 200~300 ms latency while China Mobile (TieTong) users get 80.

This is obviously not CloudFlare’s fault, because they cannot control the routing policy from another AS to themselves (unless they pay the other system to do so). If your ISP is doing this, switch to another ISP; If the whole country is doing this, maybe switch to another country?

Summary

To sum it up, when you and your customers’ networks don’t have any of the quirks above, a simple Anycasted Geo-DNS solution works fine – you don’t even need a commercial CDN service. But the real networks are hard, and so far a global TCP Anycast solution is the best we can do.

理解 React Native 的 FlexBox 排版模型

这两天尝试用 React Native 写了一个 APP,总体感觉是个好东西。目前没有 macOS 环境,没法测试 iOS 上的表现,但是安卓上是不错的。

在设计界面的过程中,如何正确排版、正确使用 FlexBox 模型就成了一个关键问题。其实这里的 FlexBox 跟 CSS3 中新加的那个 FlexBox (Flexible Box) 基本上是一样的,所以通过 MDN 上的这篇文章可以了解 FlexBox 的基础知识,例如主轴和侧轴、每个轴的元素排列等。另外,SegmentFault 上有一篇关于 React Native 布局的文章也不错,介绍了 FlexBox 排列元素时,遇到多个 flex 元素或有固定大小的(不 flex 的)元素,会怎样安排他们的大小。

假定读者阅读了上面两篇文章,明白了基础知识。接下来我结合 APP 中的实例来讲一下使用 FlexBox 过程中遇到的问题、解决方法和对 FlexBox 模型的理解。

考虑如图所示的 UI,上面是一个固定高度的标题栏,其中包含左右各一个固定大小的按钮,中间的文字填满剩余空间;下面是很长的内容,可以上下滚动,滚动时标题栏一直在最上面不动。

首先整个界面的顶层 <View> 标签,其 style 应该包括:{flex: 1, flexDirection: 'column',} ,这样内部元素就会上下排布。标题栏我额外定义了一个 Header 类(class Header extends Component {} ),因此 <View> 的第一个子元素就是 <Header style={...}>(标题栏子元素)</Header> 。这里要注意的是,给 Header 定义 style 本身没有任何效果,应该在 Header 类的 render() 方法中,将其实际顶级元素的 style 附加上 this.props.style,即 <View style={[header_styles.basic, this.props.style]}>...</View>

然后看 <Header> 对应的 <View> 的 style。由于 <Header> 本身是固定高度的,因此可以肯定它不是 {flex: 1} 的。但是我们又想使用 FlexBox 中的排版功能,对标题栏中的元素横向排版,所以在顶层 <View> 中再嵌套一层 <View>,它的 style 应该是这样的:

container: {
  flex: 1,
  flexDirection: 'row', // 子元素横向分布
  justifyContent: '', // 这个无所谓,因为子元素要占满整行
  alignItems: 'center', // 竖向居中线排列
},

接下来看内层 <View> 的子元素。首先是左上角、右上角的图标,这里我用的是 react-native-vector-icons,就是两个 <Icon> 元素,它们都是固定大小的(其实就是一个特殊字符),因此我们需要中间的标题 <Text style={{flex: 1}}> 以填满剩余空间。这里就看出来,{flex: 1} 的意思就是这个元素的长宽可以任意放大(或缩小),以适应排版的需要。

这样 <Header> 部分就搞定了,接下来看正文的滚动部分,这里用的是 React Native 的 <ScrollView> 元素。通过文档中看出,ScrollView 是由一个外层短元素包含一个内层长容器,外层短元素可以是固定高度(height)的或 flex 填满剩余高度的,但内层容器应该是内容的实际高度,而不应该是 flex 的。这里文档写的很迷茫(“把 flex 从视图栈向下传递”,并没有看明白哪里是栈、哪里是下……),我是踩了很多坑才搞明白这是什么意思。主要代码如下:

<ScrollView style={{flex: 1}}
    contentContainerStyle={{...}}>
  <View>...很长的内容...</View>
</ScrollView>

这个 <ScrollView> 在顶层 flex <View> 中,将其设为 {flex: 1} 即可填满除 <Header> 以外的剩余屏幕高度。但是 contentContainerStyle 控制的内层样式绝不应该有 {flex: 1},否则会被缩小到与外层元素一样高,裁剪掉多余的内容,因此无法实现滚动。这里与 <Header> 中类似,内部长内容可能还有更复杂的排版(比如进一步上下/左右分割),所以可以在 <View> 子元素的 style 上下功夫,比如设为 {flex: 1, flexDirection: 'row',} 即可将内部元素再左右排列。

下面提供一些我的 APP 中实现上述布局的代码(省略了逻辑和内容,只保留排版相关的代码,最终效果类似于 Material Design),供读者参考:

class Header extends Component {
  render() {
    return <View style={header_styles.header}>
      <View style={header_styles.container}>
        {this.props.children}
      </View>
    </View>;
  }
}

const header_styles = StyleSheet.create({
  header: {
    backgroundColor: "#2196F3",
    height: 60,
    paddingLeft: 20,
    paddingRight: 20,
    shadowRadius: 2,
    shadowOffset: {width:0, height:2},
    shadowOpacity: 0.7,
    shadowColor: 'black',
    elevation: 2,
  },
  container: {
    flex: 1,
    flexDirection: 'row',
    flexWrap: 'nowrap',
    justifyContent: 'flex-start',
    alignItems: 'center',
  },
});


class Single extends Component {
  render() {
    return <View style={single_styles.container}>
      <Header>
        <TouchableOpacity>
          <Icon
            name="arrow-left"
            size={20}
            color="#E3F2FD"
          />
        </TouchableOpacity>
        <Text style={single_styles.title} numberOfLines={2}>
          ...
        </Text>
        (注意:我的标题栏中只用了左侧图标,没有右侧图标。)
      </Header>
      <ScrollView style={single_styles.scroll}
        contentContainerStyle={single_styles.scroll_container}>
        <View>
          ...
        </View>
        <View style={single_styles.list}>
          <SingleLeft />
          <SingleRight />
        </View>
      </ScrollView>
    </View>;
  }
}

const single_styles = StyleSheet.create({
  container: {
    flex: 1,
  },
  scroll_container: {
  },
  list: {
    flexDirection: 'row',
    alignItems: 'flex-start',
  },
  scroll: {
    flex: 1,
  },
  title: {
    fontSize: 20,
    color: '#E3F2FD',
    marginLeft: 10,
    flex: 1,
    flexWrap: 'wrap',
  },
});

 

第二次自己做字幕,这次是脱口秀

Wow,离第一次做美剧字幕已经过去接近五年了,简直是时间飞逝……

Update: 后来又做了几集不同的视频字幕,也把链接放在这里:

  • Last Week Tonight S03E13 – AcFun, BiliBili(需登陆)
  • 柯南秀:明星填问卷 2016.06.16 – BiliBili
  • 鸡毛秀街头采访:川普粉丝有多死心塌地? – BiliBili
  • 赛金花深夜秀:近距离观察 – 川普接受党内提名 – BiliBili

更多视频可以参见我的 AB 站个人主页:

我已经加入阿尔法小分队字幕组,主要做 Last Week Tonight 的字幕。小分队主页:http://space.bilibili.com/60058


前两天又找到一个空闲时间,于是做了一集美国脱口秀的字幕,剧集是 Last Week Tonight with John Oliver(上周今夜秀)的 S03E13。这次只做了中文字幕,因为实在懒得把英文都打出来。把30分钟脱口秀的中文字幕输入到编辑器里,花了大概10个小时,之后做轴和压制大概各1小时。

这次没有传别的地方,只传了AB站。下面是B站的Flash播放器外链,也可以点击这里去B站看。(外链请进入全文查看) Continue reading “第二次自己做字幕,这次是脱口秀”