How the “Scan API” added in xiaodu-jsdelivr 1.3 works

I released the WordPress plugin “xiaodu-jsdelivr” months ago. It scans a site’s static resources and replaces references to them with jsDelivr CDN links. Details on how the plugin works can be found in the previous blog post. Yesterday I released version 1.3, which adds a new feature called “Scan API”.

API Manager: https://xiaodu-jsdelivr-api.du9l.com/

What it is and why it is needed

“Scan API” is a hosted service that provides plugin users with pre-calculated scan results. Previous versions of the plugin used a more direct approach: calculate the hash of a local file, fetch the corresponding CDN file, and compare the two hashes. This works, but slowly, because downloading remote files is time-consuming; that is why the initial scan after installing the plugin is always slow. It usually took dozens of 30-second scanning sessions to complete an initial scan of base WordPress plus several plugins and themes. Considering that every user’s website has to go through the same process (there’s no import-export feature yet), it makes even less sense.
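
To make the cost of the direct approach concrete, here is a minimal sketch in Python. It is for illustration only and is not the plugin’s actual code. The SHA-256 hash, the function names and the 30-second timeout are all assumptions. The point is that every single file needs one full remote download before it can be compared.

```python
import hashlib
from typing import Callable, Optional
from urllib.request import urlopen


def sha256_hex(data: bytes) -> str:
    """Hex digest of the content (SHA-256 is an assumed choice of hash)."""
    return hashlib.sha256(data).hexdigest()


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download the whole remote file; this is the slow step."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def direct_scan(
    local_content: bytes,
    cdn_url: str,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Optional[str]:
    """Return cdn_url if the CDN copy is identical to the local file, else None."""
    try:
        remote_content = fetch(cdn_url)
    except OSError:
        # Network and HTTP errors from urllib are OSError subclasses;
        # treat them as "no match" so the scan can move on.
        return None
    if sha256_hex(local_content) == sha256_hex(remote_content):
        return cdn_url
    return None
```

The `fetch` parameter exists only so the comparison logic can be tried without network access, by passing in a stub that returns fixed bytes.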

This is where the hosted API service becomes helpful. By pre-fetching and storing the hashes of WordPress and official plugins and themes (with all versions), and serving them to a client plugin when needed, the repetitive fetching and calculating can be avoided. That means the scanning process can be greatly accelerated, as long as the resources scanned are present in the API storage.
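
The client side of this idea can be sketched roughly as follows, again in Python for illustration rather than as the plugin’s real code. The shape of the hash table (relative path mapped to a hex digest), the SHA-256 hash, the function name and the example CDN base URL are assumptions. What matters is that only local hashing remains: no remote file is downloaded during the scan.

```python
import hashlib
from typing import Dict

# Assumed shape of the data served for one WordPress version:
# relative file path -> hex digest of the file on the CDN.
HashTable = Dict[str, str]


def match_with_precomputed(
    local_files: Dict[str, bytes],
    remote_hashes: HashTable,
    cdn_base: str,
) -> Dict[str, str]:
    """Map each local path whose hash matches the stored one to its CDN URL."""
    replacements: Dict[str, str] = {}
    for path, content in local_files.items():
        expected = remote_hashes.get(path)
        if expected is None:
            # Not in the API storage; such a file would need another strategy.
            continue
        if hashlib.sha256(content).hexdigest() == expected:
            replacements[path] = f"{cdn_base}/{path}"
    return replacements
```

For example, with `cdn_base` set to something like `https://cdn.jsdelivr.net/gh/WordPress/WordPress@5.4` (an illustrative value), an unmodified core file maps to its CDN URL, while a locally modified file is left alone because its hash differs.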

Flow chart of old and new processes

Current state and future development

The ultimate goal of the service is to provide scanning support for base WordPress versions, plugins and themes. As of now, both the service and the plugin have implemented only the first part: hashes for all versions of WordPress published on GitHub. That means the client uploads only its WordPress version, not plugin or theme versions, and the service scans only the WordPress repository.

The remaining parts will be added incrementally in the future, and I have to think them through carefully. There are over 100,000 themes and plugins (combined) in the official SVN repositories, and it’s unrealistic to scan and store them all. So the first future development goal may be to support the most popular ones in each category.

Also, as of right now the service is completely free. In the near future I don’t have a viable plan to charge users for the service, because with payments come payment gateway integrations and support requests. But I cannot guarantee that it will stay free forever – it may go paid or it may go away.

Technical details

Plugin users can stop reading here, because the following part is about technical details on how the service is built.

The API service is essentially a Python website built with the Flask framework, with these main components:

  1. Authentication: This is provided by Auth0 (free plan), chosen because it is relatively easy to use and supports a large number of login channels. It provides templates for a variety of languages, frameworks and applications, which can be downloaded and modified to fit basic authentication needs. In my case (Python + Flask), the template uses Authlib to provide the OAuth 2 functionality.
  2. Web UI: Written with React + React Router. It is not strictly necessary to use React, or even to build a single-page application, but I chose this path to bring my skills up to date. For example, create-react-app makes it quite easy to create a TypeScript-based project, while in the old days one had to configure TypeScript and Babel oneself. Also, React hooks are an interesting recent addition.
  3. API: There are two parts. One is the actual “Scan API”, which the client plugin queries for stored data. The other is the backend API that the Web UI uses to manage API keys. MongoDB is used as the permanent store for both scanning data and user API key data.
  4. Scanner: Workers that regularly download remote files and calculate their hashes. A task queue (Celery) is used to manage scanning tasks, and the downloading part is handled by GitPython (PySVN may also be used later).

The whole project is deployed in my private Kubernetes cluster, using Jenkins to build the frontend and backend and push the built images to an in-cluster Docker registry. While building the service, I was constantly amazed by how far web development has evolved, with so many great tools and libraries available.

13 thoughts on “How the “Scan API” added in xiaodu-jsdelivr 1.3 works”

      1. 1794 seconds ago – Failed; Code: 429; Error: (HEAD) api key used too frequently

        With the API it really is much faster. Without it, the scan still hadn’t finished after two days.

        1. Each API key is limited to one fetch per 24 hours. If you have multiple websites, you can generate several keys and use one for each. Could that be the cause?

  1. 142 seconds ago – Failed; Code: 429; Error: (HEAD) api key used too frequently
    I have 16 plugins installed and the scan has never finished. The API call also runs once a day and frequently fails.

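    1. The API currently has no plugin-related data. Normally, only unofficial plugins fail to match, and there is a mechanism that automatically skips failures, so I can’t figure out why the scan would never finish. I suggest turning on “Randomized scan order” in the settings, which can improve the success rate for scanning other resources.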

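      As for the 429 from the API: if the key is not shared across multiple websites, another possibility is that the connection to the API server is too slow, so the result is never received but the server still counts it as one use. I can add an API timeout setting to help avoid this.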

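  2. Many thanks to the author for this plugin. When I log in to the “API Keys” page in Chrome and create a key, it is not displayed, but it does display when I open the page on my phone.
    Also, even when I enter the correct key, I get a message that the key is being used too frequently. I don’t know whether these problems are isolated cases or whether they can be fixed.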

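    1. If the Key and Secret are not shown after a key is created successfully, that sounds like a bug. You can, however, use Reset secret to reset and display a new Secret. It does change on every reset, similar to “reset password” logic: the previous Secret is not shown, and a new one is generated and displayed instead.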

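      The usual cause of a 429 is that WordPress timed out before receiving the result after calling the API, while the API server still recorded one use, so the next call fails. Try the latest plugin version and increase the API timeout (Scan API – Options – API timeout).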
