How the “Scan API” added in xiaodu-jsdelivr 1.3 works

I released the WordPress plugin “xiaodu-jsdelivr” months ago, which can be used to scan and replace references to static resources with jsDelivr CDN links. Details on how the plugin works can be found in the previous blog post. Yesterday, I released a new version 1.3, which contains a new feature called “Scan API”.

API Manager: https://xiaodu-jsdelivr-api.du9l.com/

What it is and why it is needed

“Scan API” is a hosted service, which can provide plugin users with pre-calculated scan results. Previous versions of the plugin used a more direct approach: Calculate local file hash, fetch CDN file and match their hashes. This obviously works, but it does slowly, because downloading remote files is a time-consuming job, which is the reason that initial scans after installing the plugin are always slow. Usually it took dozens of 30-second scanning sessions to complete an initial scan with the base WordPress, several plugins and themes. When you think about the fact that each website of all users has to go through the same process (there’s no import-export feature yet), it makes even less sense.

This is where the hosted API service becomes helpful. By pre-fetching and storing the hashes of WordPress and official plugins and themes (with all versions), and serving them to a client plugin when needed, the repetitive fetching and calculating can be avoided. That means the scanning process can be greatly accelerated, as long as the resources scanned are present in the API storage.

Flow chart of old and new processes

Current state and future development

The ultimate goal for the service is to provide scanning support for base WordPress versions, plugins and themes. As of now, both the service and plugin only implemented the first part, which is to provide hashes for all published (on GitHub) versions of WordPress. That means the client only uploads WordPress version, not plugin or theme versions; and the service only scans WordPress repository.

The remaining parts will be incrementally added in the future, which I have to think carefully. There are over 100,000 themes and plugins (combined) in the official SVN repositories, and it’s unrealistic to scan and store them all. So for the first future development goal may be to support the most popular of each category.

Also, as of right now the service is completely free. In the near future I don’t have a viable plan to charge users for the service, because with payments come payment gateway integrations and support requests. But I cannot guarantee that it will stay free forever – it may go paid or it may go away.

Technical details

Plugin users can stop reading here, because the following part is about technical details on how the service is built.

The API service is essentially a Python website built with Flask framework, with these main components:

  1. Authentication: This is provided by Auth0 (free plan), chosen for being relatively easy to use and the vast amount of login channels supported. It provides templates for a variety of languages, frameworks and applications, which can be downloaded and modified to fit the basic authentication needs. In my case (Python + Flask), Authlib is used in the template to provide the OAuth 2 functionality.
  2. Web UI: Written with React + React Router. It is not strictly necessary to use React, or even to build a single-page application, but I chose this path to get my skills up to date. For example, create-react-app is quite easy to use for creating a TypeScript-based project, while in the old days one had to configure TypeScript and Babel themselves. Also, React hooks is an interesting recent addition.
  3. API: There are two parts, one is the actual “Scan API” for the client plugin to query for stored data. The other is the backend API for the Web UI to manage API keys. MongoDB is used as permanent store for both scanning and user API key data.
  4. Scanner: Workers that regularly download and calculate remote hashes. A task queue (Celery) is used to manage scanning tasks, and the downloading part is handled by GitPython (later maybe PySVN will also be used).

The whole project is deployed in my private Kubernetes cluster, using Jenkins to build the frontend and backend and push the built images to an in-cluster docker registry. In the process of building the service, I am constantly amazed by how web development has evolved nowadays, with a lot of great tools and libraries available.

xiaodu-jsdelivr: WordPress Plugin To Scan and Serve Static Files From jsDelivr CDN

I created a plugin called “xiaodu-jsdelivr”, which automatically scan and replace references to WordPress static files, including plugins and themes that can be found in the official repository, with their canonical jsDelivr CDN addresses.

The plugin has been uploaded to the official WordPress plugin category. You can search for “xiaodu-jsdelivr” in the plugin installer, or download the ZIP archive here to install.

How it works

I downloaded and tested some of the existing WordPress plugins with jsDelivr replacing feature, and noticed that most of them employ a passive approach, which is to wait for visitors to request some static files, and then search the file hash against jsDelivr’s lookup API.

My plugin instead uses a more proactive approach. It starts by scanning static files directly from the WordPress installation directory, instead of waiting for visitors’ requests. To do this, I used the official WP-Cron scheduler to perform scans in the background at fixed intervals.

Then it will calculate local file hashes and compare them with known URL patterns (both WordPress itself and its official plugins and themes are supported), and matched URLs will be recorded and later used to replace local file references when they are requested. This is more reliable than just using the lookup API, because it can match against files that have never been visited by anyone, thus cannot be discovered simply by looking up hashes.

Here is a screenshot of what wp-content references can transform into with the plugin enabled:

Demo of plugin

Note the four different kinds of successful scan results shown in the screenshot above: Base WordPress files (line #32 – #35), an official plugin (#36), an official theme (#38 and #42) and custom files that the fallback hash lookup successfully found (#37).

Why jsDelivr

There are a handful of popular and reliable static file CDNs available. I think other plugin developers and I probably chose jsDelivr for the same reasons:

  • It has native WordPress support, namely https://cdn.jsdelivr.net/wp/plugins/ and https://cdn.jsdelivr.net/wp/themes/ that point to official plugin and theme SVN repositories. Other static CDNs mostly just load from GitHub and/or NPM.
  • It is more reliable in Mainland China, thanks to the QUANTIL (ChinaNetCenter / WangSu) nodes it uses for the Chinese visitors. Their overall visitor performance is better than their alternatives. The alternatives usually perform pretty bad in PRC.

Future development

The latest stable version will always be published to the WordPress plugin category. The plugin is GPLv2 (or later) licensed, and the first version (1.0) comes with full functionality and all the features mentioned above.

Future development will be carried out in the public GitHub repository.

DNS as Configuration / Code with DNSControl

Managing DNS for a domain name traditionally involves visiting the control panel of your DNS authoritative server providers to create, modify or delete the related records there. But I recently discovered a new project, DNS Control by Stack Overflow, which allows one to manage DNS records by modifying JavaScript configuration files, similar to the ways Kubernetes and Ansible work in.

A simple illustration of how DNSControl works.

Why did I switch?

In my experience, the main advantages of DNSControl, or rather the workflow it promotes, are the following:

  • Support for different authoritative DNS providers: It is no longer needed to visit the control panels of different providers. The configuration is provider-agnostic, and can be applied to different or even multiple DNS providers, which allows administrators to easily migrate between providers or mix and use servers from different providers simultaneously.
  • Specify the state instead of actions: This is analogous of managing infrastructure using Ansible vs manually. Only the final state is specified in the configuration file, and the software takes care of adding or modifying records and deleting unnecessary ones.
  • Use script to simplify records description: A basic version of JavaScript can be used in describing the DNS records, which can reduce repetition and ease the complexity of modifications. For example, variables (or constants) and functions can be used to generate similar DNS records in batch.

I will briefly introduce my new workflow for migrating and managing DNS below, in order to show you how it can be done.

Migrating existent zones

The first step of switching to the new workflow, is to export and migrate the existent DNS zones from the current providers into the configuration file.

If you are like me who have dozens of records in the old DNS control panel, and you simply don’t want to copy-paste everything by hand, DNSControl has a “get-zones” sub-command that can be used in this situation. You can read the official documentation about migration, and the steps I used are:

  1. In order to read from the current provider, credentials must be generated and provided in the creds.json file. The methods vary by provider, which can be found in their respective pages. For example, CloudFlare only requires an API token with sufficient permissions to access and modify zone records.
  2. With creds.json filled out and saved to the current directory, the following command can be executed to export current records of a specific zone:
    dnscontrol get-zones --format=js --out=dnsconfig.js <creds-name> <PROVIDER-IDENTIFIER> your-domain.tld

    1. The software is written in Go, so they provide static binaries in GitHub release page.
    2. creds-name is the key used in creds.json, and PROVIDER-IDENTIFIER can be found in the “Identifier” column in the provider table.
  3. Now dnsconfig.js should contain all your existent records, and you can optimize the script using JavaScript variables and functions. Note that they use a simple JavaScript interpreter, so please only use the simplest features of the language. (You will know what not to use in the testing steps below.)

Updating DNS records

In order to create or update DNS records for a domain, one should first edit dnsconfig.js by modifying the arguments or variables (if created in the previous part) that belongs to the domain in question. Then, in order to make sure that the JavaScript syntax is correct and all the changes are indeed desired, use the preview sub-command to compare the changes to the existent records online. Finally, when everything checks out, use dnscontrol push to apply the changes.

To further automate the workflow, I personally use a Git repository to version-control my dnsconfig.js configuration, and Jenkins to perform the steps above. My creds.json is kept private in Jenkins’ “Credentials” area, and mounted into the pipeline environment during execution. In this way, I can commit and push my DNS configuration to the Git server, and Jenkins will automatically check and apply the changes.

Supported providers

As of the time of writing this article, the following DNS providers are supported by DNSControl:

  • ActiveDirectory_PS
  • AXFRDDNS
  • Azure DNS
  • BIND
  • Cloudflare
  • ClouDNS
  • deSEC
  • DigitalOcean
  • DNSimple
  • Gandi_v5
  • Google Cloud DNS
  • Hurricane Electric DNS
  • Hetzner DNS Console
  • HEXONET
  • INWX
  • Linode
  • Microsoft DNS Server (Windows Server)
  • Name.com
  • Namecheap Provider
  • Netcup
  • NS1
  • Oracle Cloud
  • Ovh
  • PowerDNS
  • Route 53
  • SoftLayer DNS
  • Vultr

In addition, the following registrars are supported, which allow users to modify the domains’ NS records to point to the providers above:

  • CSC Global
  • DNSimple
  • DNS-over-HTTPS
  • Gandi_v5
  • HEXONET
  • Internet.bs
  • INWX
  • Name.com
  • Namecheap Provider
  • OpenSRS
  • Ovh
  • Route 53

And even if your current provider is not covered, you can easily add your own integration and possibly contribute to the upstream.

Solution: After switching to SyntaxHighlighter Evolved, all the codes are scrambled

tl;dr

If you switched from another syntax highlighting plugin to SyntaxHighlighter Evolved, and all your codes are scrambled, try running the following code as a single-file plugin.

function xiaodu_syntaxhighlighter_fix() {
	return 2;
}
add_filter('syntaxhighlighter_pre_getcodeformat', 'xiaodu_syntaxhighlighter_fix');

The easiest way is to go to Plugins – Plugin Editor, and paste the code at the bottom of any enabled plugin, maybe Hello Dolly.

Long version

After almost three years I finally started working on this blog again.

One of the first things I noticed is that the syntax highlighting plugin I used, Crayon Syntax Highlighter, is dead. Well, to its credit, the server-rendered markups it generated were… fine when I started this blog, but nowadays they just look sickening to me, especially when compared to the neat client rendering solutions.

So I went to the store and downloaded the most popular choice, SyntaxHighlighter Evolved, which uses the JavaScript library SyntaxHighlighter to perform client-side highlighting. After installing it and converting all my old code tags to their markup format, I found the highlighting to be working, but all the C++ and HTML looked screwed up.

Scrambled code
Scrambled code

As you can see, all the “<“, “>” and “&” in the code are now showing up as their HTML entities – “&lt;”, “&rt;” and “&amp;”. That is not cool, so I looked into the problem.

Looking under the hood

First thing we need to know is how the code is stored in the post. When I click on the “Text” tab in the post editor (yep, the old one… I haven’t adapted to the new blocks yet,) I found that the characters are displaying correctly.

Code in post editor
Code in post editor

Then I looked further into MySQL, and the code is stored encoded, which is fine – it can be stored in the database however it fits, as long as the final output is correct… which it isn’t.

Code in MySQL
Code in MySQL

Pinpointing the plugin code

Now that I know that the plugin did an extra encoding, I looked for “htmlspecialchars” in the plugin’s GitHub repository, and found this piece of code:

	// This function determines what version of SyntaxHighlighter was used when the post was written
	// This is because the code was stored differently for different versions of SyntaxHighlighter
	function get_code_format( $post ) {
		if ( false !== $this->codeformat )
			return $this->codeformat;
		if ( empty($post) )
			$post = new stdClass();
		if ( null !== $version = apply_filters( 'syntaxhighlighter_pre_getcodeformat', null, $post ) )
			return $version;
		$version = ( empty($post->ID) || get_post_meta( $post->ID, '_syntaxhighlighter_encoded', true ) || get_post_meta( $post->ID, 'syntaxhighlighter_encoded', true ) ) ? 2 : 1;
		return apply_filters( 'syntaxhighlighter_getcodeformat', $version, $post );
	}
	// Adds a post meta saying that HTML entities are encoded (for backwards compatibility)
	function mark_as_encoded( $post_ID, $post ) {
		if ( false == $this->encoded || 'revision' == $post->post_type )
			return;
		delete_post_meta( $post_ID, 'syntaxhighlighter_encoded' ); // Previously used
		add_post_meta( $post_ID, '_syntaxhighlighter_encoded', true, true );
	}

Apparently years ago they changed how codes are stored in the post. Now if you write and save a new post with their plugin installed, they will save the code already encoded, and insert a post meta “_syntaxhighlighter_encoded = True” to mark the post as the “new (encoded) format”.

But if you are like me who used other plugins when initially posting the code and later switched to Evolved, you are in bad luck, as they consider your post by default the “old format,” and will encode the code again in the final output.

Solution

The apparent solutions it to make the plugin think that all my posts are in the new format. I could add the same metadata to each of the posts, but luckily there is a easier way: Use the filter “syntaxhighlighter_pre_getcodeformat” (line 8 in the code above) they provided to override the result.

So I used the plugin code at the beginning to hook it. The hook function simply returns 2, which means all my posts, with or without the metadata, will be considered the already-encoded new format, so they will not be doubly encoded.

It’s been years, is that all you have to say?

OK, fair enough. So this blog may look the same, but the tech underneath it is constantly changing.

For example, all my appliances are now hosted on my own bare-metal (as opposed to cloud-vendored like GKE) globally-distributed Kubernetes cluster. Also, I have been hiding behind CloudFlare for years to avoid the haters, but now they are mostly gone (or grown up 🙄,) so I have been thinking of new ways to distribute my content.

All of these new stuff are exciting and worth sharing, and I will write about them soon™.

emlog 更新至 5.1.2 出现的问题 @emlog博客

今天 emlog 博客系统发布了 5.1.2 版本,在更新过程中遇到的问题来总结一下。

1. 从 5.0.1 升级时,执行“up5.0.1to5.1.1.php”升级程序,提示“错误操作:您必须完成升级步骤里的第一步才再进行本操作,详见安装说明”。

解决方法:因为我直接用了 5.1.1 和 5.1.2 的补丁,没有在更新到 5.1.1 之后就升级数据库。将升级程序 (up5.0.1to5.1.1.php) 的 105 行注释掉或删除即可。(请确保是从 5.0.1 升级的,不然要先执行之前的升级程序。)

2. 旧的“代码高亮”插件与新版自带功能冲突,导致“写文章”页面出错,无法载入编辑器。

解决方法:在后台管理页面的“插件”中删除“代码高亮”插件即可。新版 KindEditor 已经带了 prettyprint 代码高亮功能。

3. [W3C Validator 规范错误] (默认主题) 底部 <script>prettyPrint();</script> 缺少 type 属性。

解决方法:修改为 <script type=”text/javascript”> 即可。(位于 content/templates/default/footer.php)

4. [W3C Validator 规范错误] (默认主题) 右侧边栏的 <li> 标签位置错误。

解决方法:在 content/templates/default/module.php 的 79 行后面,插入新行内容 </li> 。应该是作者的小失误吧。

以上所有修改过的文件,可以点击这里下载

由于 emlog 当前没有真正的开源(版本库是托管于 bitbucket 的私有库),<更正>emlog 官微告诉我现在已经开源了,源码在GitHub,不过貌似没有之前版本的提交记录了</更正>,我在本地维护了一个修改版的 emlog 程序,包括一些默认模板和代码的修改,修改后的页面都符合 W3C 标准(可以测试本博客首页和大部分文章,有些文章内容中有错误除外)。如果有兴趣的话,可以联系我索取该版本或直接 pull 我的版本库。