Why hg.mozilla.org is Slow - 彷徨えるフジワラ

This entry is a Japanese translation of Gregory Szorc's "Why hg.mozilla.org is Slow". Thank you for giving me permission to translate the blog entry, Gregory!

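Drawing on his experience operating large repositories in the Mozilla project, Gregory Szorc has contributed many fixes to Mercurial, including performance improvements.

With Szorc's permission, this entry presents a translation of "Why hg.mozilla.org is Slow" from his blog.

It is somewhat redundant, but the original text is given alongside each paragraph so that the correspondence with the translation is easy to follow.

The "pushlog" that appears repeatedly in the text is a Mercurial extension used in the Mozilla project's infrastructure. As a fix that landed in Mercurial itself also mentions, it is reportedly a mechanism that additionally records in SQLite what each hg push applied to the repository history.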

Mozilla's pushlog extension (an extension that opens a SQLite database and tries to tie its transaction semantics to Mercurial's transaction)

hg.mozilla.org が遅いのはなぜか (Why hg.mozilla.org is Slow)

At Mozilla, I often hear statements like "Mercurial is slow." That's a very general statement. Depending on the context, it can mean one or more of several things:

Mozilla プロジェクトにいると、"Mercurial が遅い" といった声が度々聞かれます。しかし、"遅い" というのは、非常に大雑把な表現です。文脈によって、以下のいずれか、あるいは複数の組み合わせを意味します。

My Mercurial workflow is not very efficient

hg commands I execute are slow to run

hg commands I execute appear to stall

The Mercurial server I'm interfacing with is slow

Mercurial を使った (ユーザ毎の) 作業手順が非効率
hg コマンドの実行自体が遅い (※ 起動速度 ?)
hg コマンドの実行が止まっているように見える (※ 実行速度 ?)
連携先の Mercurial リポジトリサーバ (からの応答) が遅い

I want to spend time talking about a specific problem: why hg.mozilla.org (the server) is slow.

以下、長くなりますが、"リポジトリサーバ hg.mozilla.org が遅いのはなぜか" について、説明しようと思います。

遅くないものは何か (What Isn't Slow)

If you are talking to hg.mozilla.org over HTTP or HTTPS (https://hg.mozilla.org/), there should not currently be any server performance issues. Our Mercurial HTTP servers are pretty beefy and are able to absorb a lot of load.

HTTP/HTTPS 経由での hg.mozilla.org アクセス (https://hg.mozilla.org/) に関しては、現時点では性能に関する問題はありません。Mozilla における HTTP 経由での Mercurial リポジトリへのアクセスは、大量の負荷であっても十分さばき切れます。

If https://hg.mozilla.org/ is slow, chances are:

もしも https://hg.mozilla.org/ からの応答が遅いとすれば、以下の様な理由が考えられます:

You are doing something like cloning a 1+ GB repository.

You are asking the server to do something really expensive (like generate JSON for 100,000 changesets via the pushlog query interface).

You don't have a high bandwidth connection to the server.

There is a network event.

1GB 超のリポジトリを複製しようとしている
サーバへのリクエストがそもそも負荷の掛かる処理である (例: 100,000 リビジョン分の JSON データの要求)
サーバへの接続経路のネットワーク帯域幅が広くない
ネットワーク障害が発生している

これまでに発生したネットワーク障害 (Previous Network Events)

There have historically been network capacity issues in the datacenter where hg.mozilla.org is hosted (SCL3).

hg.mozilla.org をホスティングしているデータセンタ (SCL3) では、歴史的にも、ネットワーク帯域幅が問題でした。

During Mozlandia, excessive traffic to ftp.mozilla.org essentially saturated the SCL3 network. During this time, requests to hg.mozilla.org were timing out: Mercurial traffic just couldn't traverse the network. Fortunately, events like this are quite rare.

Mozlandia の期間中 (※ 訳注: Mozlandia は Mozilla 主催のイベント？) は、ftp.mozilla.org への多大なアクセスによって、SCL3 のネットワークが飽和状態になってしまいました。この状況での hg.mozilla.org へのアクセスは、タイムアウトしてしまいます。そもそも Mercurial リポジトリへの通信ができなかったのです。幸いな事に、このような障害の発生は極めて稀です。

Up until recently, Firefox release automation was effectively overwhelming the network by doing some clownshoesy things.

Firefox のリリース自動化機構は、不適切な設定等によって、最近までネットワークに大変な負荷を掛けていました。

For example, gaia-central was being cloned all the time. We had a ~1.6 GB repository being cloned over a thousand times per day. We were transferring close to 2 TB of gaia-central data out of Mercurial servers per day.

例えば、gaia-central リポジトリが毎回 clone されるといった問題がありました。Mozilla の持つ 1.6GB 程度のリポジトリが、一日当たり千回以上 clone されていたのです。gaia-central の Mercurial サーバから、一日当たり 2TB 近くのデータを転送していたことになります。
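As a rough check on these figures, the arithmetic can be sketched in Python. The repository size and the "over a thousand" clones per day come from the text; the 1,200 count is an assumed illustrative value, not a measurement.

```python
# Back-of-the-envelope estimate of daily clone traffic.
# The 1.6 GB size comes from the text; the clone counts are illustrative.

def daily_clone_traffic_tb(repo_size_gb: float, clones_per_day: int) -> float:
    """Return the data transferred per day in terabytes (1 TB = 1000 GB)."""
    return repo_size_gb * clones_per_day / 1000


if __name__ == "__main__":
    # "Over a thousand" clones per day: 1,200 is an assumed example value.
    for clones in (1000, 1200):
        tb = daily_clone_traffic_tb(1.6, clones)
        print(f"{clones} clones/day x 1.6 GB = {tb:.2f} TB/day")
```

Under these assumptions, roughly 1,200 full clones per day is enough to approach the 2 TB per day mentioned above.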

We also found issues with pushlogs sending 100+ MB responses.

pushlog が100MB 超の応答を送信する、という問題もありました。

And the build/tools repo was getting cloned for every job. Ditto for mozharness.

build/tools のリポジトリが、ビルドのジョブ毎に毎回 clone される問題もありました。mozharness も同様です。

In all, we identified a few terabytes of excessive Mercurial traffic that didn't need to exist. This excessive traffic was saturating the SCL3 network and slowing down not only Mercurial traffic, but other traffic in SCL3 as well.

全体では、本来は必要ない筈の Mercurial サーバへの通信が、数TB規模で発生していることが判明しました。このような過剰な通信によって、SCL3 のネットワークが飽和してしまい、Mercurial の通信だけでなく、SCL3 内の他の通信も速度が低下していたのです。
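The post does not describe how the automation jobs were changed. One common way to avoid a full clone per job is to keep a local clone on each machine and only pull new changesets into it. The sketch below only builds the command line and runs nothing. The helper name `hg_sync_command` and the example URL are assumptions made for illustration; `hg clone`, `hg pull`, and the global `--repository` option are standard Mercurial.

```python
import os
import tempfile


def hg_sync_command(repo_url: str, dest: str) -> list[str]:
    """Return the hg command a job could run to bring `dest` up to date.

    Hypothetical helper: if `dest` already holds a Mercurial repository
    (it has a .hg directory), pull only the missing changesets; otherwise
    do the expensive full clone once.
    """
    if os.path.isdir(os.path.join(dest, ".hg")):
        # Incremental update: transfers only changesets missing locally.
        return ["hg", "--repository", dest, "pull", repo_url]
    # First run on this machine: a full clone is unavoidable.
    return ["hg", "clone", repo_url, dest]


if __name__ == "__main__":
    url = "https://hg.mozilla.org/build/tools"  # assumed example URL
    with tempfile.TemporaryDirectory() as workdir:
        dest = os.path.join(workdir, "tools")
        print(hg_sync_command(url, dest))       # no local clone yet
        os.makedirs(os.path.join(dest, ".hg"))  # simulate an existing clone
        print(hg_sync_command(url, dest))       # local clone present
```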

Fortunately, people from Release Engineering were quick to respond to and fix the problems once they were identified. The problem is now firmly in control. Although, given the scale of Firefox's release automation, any new system that comes online that talks to version control is susceptible to causing server outages. I've already raised this concern when reviewing some TaskCluster code. The thundering herd of automation will be an ongoing concern. But I have plans to further mitigate risk in 2015. Stay tuned.

幸い、リリースエンジニア陣が、問題が特定されるやいなや、素早い対応をしてくれました。現在、この手の問題は抑止されています。とは言え、Firefox のリリース自動化の規模を考えると、履歴管理と連携する新規システムが今後導入される際に、サーバ停止の原因となる可能性は十分あります。この懸念については、TaskCluster のコードをレビューした際に、既に指摘しています。自動化処理による一斉アクセス (thundering herd) に関しては、今後も配慮が必要です。しかし 2015 年には、このようなリスクを更に緩和する対策を実施する予定ですので、ご期待ください。

Looking back at our historical data, it appears that we hit these network saturation limits a few times before we reached a tipping point in early November 2014. Unfortunately, we didn't realize this because up until recently, we didn't have a good source of data coming from the servers. We lacked the tooling to analyze what we had. We lacked the experience to know what to look for. Outages are effective flashlights. We learned a lot and know what we need to do with the data moving forward.

過去のデータを振り返ると、このようなネットワークの飽和は、2014 年 11 月初旬に限界点へ達する以前にも、数回発生していたものと思われます。サーバから得られる適切なデータが最近まで無かったため、不幸にもこのような状況を把握できませんでした。手元のデータを分析するための道具立てや、何に着目すべきかを見極める経験も不足していました。サーバの機能停止は、問題点を照らし出します。私達は多くの事を学び、今後このデータをどう扱うべきかも把握しています。

利用可能なネットワーク帯域 (Available Network Bandwidth)

One person pinged me on IRC with the comment "Git is cloning much faster than Mercurial." I asked for timings and the Mercurial clone wall time for Firefox was much higher than I expected.

あるユーザから、「Git の clone は、Mercurial よりもずっと速い」と IRC で言及されたことがあります。所要時間を尋ねてみたところ、Firefox リポジトリの Mercurial による clone の所要時間が、私の想定を大きく上回っていました。

The reason was network bandwidth. This person was performing a Git clone between 2 hosts in EC2 but was performing the Mercurial clone between hg.mozilla.org and a host in EC2. In other words, they were partially comparing the performance of a 1 Gbps network against a link over the public internet! When they did a fair comparison by removing the network connection as a variable, the clone times rebounded to what I expected.

このケースでの原因は、ネットワークの帯域幅でした。当該ユーザは、Git の clone を EC2 上の2ホスト間で実施する一方で、Mercurial の clone は hg.mozilla.org とEC2 上のホストの間で実施していたのです。言い換えれば、「1Gbps のローカルネットワーク内接続」と「公衆ネット経由の接続」の間で、ネットワーク性能の不公平な比較を行っていたわけです。ネットワーク接続条件を統一した上での比較では、期待通りの性能が計測できました。
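To see how much the link alone can matter, here is a minimal sketch of the ideal transfer time for the same data over two links. The 1 Gbps figure comes from the text. The 1 GB payload and the 50 Mbps public-internet rate are assumptions chosen only for illustration, and the model ignores latency, protocol overhead, and server-side work.

```python
def transfer_seconds(size_gb: float, link_mbps: float) -> float:
    """Ideal transfer time: payload size in bits divided by link rate in bits/s."""
    size_bits = size_gb * 8 * 1000**3        # decimal gigabytes to bits
    return size_bits / (link_mbps * 1000**2)  # megabits/s to bits/s


if __name__ == "__main__":
    payload_gb = 1.0  # assumed payload; the text only says "1+ GB"
    links = (("1 Gbps network", 1000.0), ("assumed 50 Mbps internet link", 50.0))
    for label, mbps in links:
        print(f"{label}: {transfer_seconds(payload_gb, mbps):.0f} s")
```

With these assumed numbers the same clone takes twenty times longer over the slower link, regardless of which version control tool is used.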

The single-homed nature of hg.mozilla.org in a single datacenter in northern California is not only bad for disaster recovery reasons, it also means that machines far away from SCL3 or connecting to SCL3 over a slow network aren't getting optimal performance.

北カリフォルニアの単一のデータセンタにおいて、hg.mozilla.org が single-home 運用されていることは、災害時対応における問題以外にも、「SCL3 から遠い」ことや、「遅い回線経由でしか SCL3 に接続できない」ことで、適切な性能が得られない問題の原因にもなります。

In 2015, expect us to build out a geo-distributed hg.mozilla.org so that connections are hitting a server that is closer and thus faster. This will probably be targeted at Firefox release automation in AWS first. We want those machines to have a fast connection to the server and we want their traffic isolated from the servers developers use so that hiccups in automation don't impact the ability for humans to access and interface with source code.

地理的に分散された hg.mozilla.org を立ち上げることで、近隣のサーバとの間で高速な通信を可能にすることが、2015年に予定されています。まずは、AWS における Firefox のリリース自動化への適用が目標になるでしょう。リリース自動化用の AWS ホストがサーバと高速な回線で接続され、開発者が利用するサーバのトラフィックと分離されることで、履歴管理サーバにアクセスする開発者が、自動化処理で発生した問題の影響を受けないようになることが期待されます。

SSH マスタサーバ上の NFS アクセス (NFS on SSH Master Server)

If you connect to http://hg.mozilla.org/ or https://hg.mozilla.org/, you are hitting a pool of servers behind a load balancer. These servers have repository data stored on local disk, where I/O is fast. In reality, most I/O is serviced by the page cache, so local disks don't come into play.

http://hg.mozilla.org/ や https://hg.mozilla.org/ にアクセスした場合、ロードバランサの向こう側にある複数のサーバのいずれかに接続します。これらのサーバは、高速な I/O アクセスの可能なローカルディスク上に、リポジトリデータを保持しています。実際には、殆どの I/O アクセスにおいて、ページキャッシュがヒットするので、ローカルディスク自体へのアクセスは、殆ど発生しません。

If you connect to ssh://hg.mozilla.org/, you are hitting a single, master server. Its repository data is hosted on an NFS mount. I/O on the NFS mount is horribly slow. Any I/O intensive operation performed on the master is much, much slower than it should be. Such is the nature of NFS.

ssh://hg.mozilla.org/ にアクセスした場合、単一のマスタサーバに接続します。このサーバでのリポジトリデータへのアクセスは、NFS 経由となります。NFS 経由での I/O は、非常に低速です。マスタサーバ上での I/O 主体となる処理は、ものすごく性能が低下します。これは NFS 要因によるものです。

We'll be exploring ways to mitigate this performance issue in 2015. But it isn't the biggest source of performance pain, so don't expect anything immediately.

2015 年には、NFS 要因による性能劣化を緩和するための作業を行う予定です。しかし、NFS による性能劣化は、性能問題の最大要因ではないため、性能問題全体の早急な解決は期待しないでください。

push 時の複製同期 (Synchronous Replication During Pushes)

When you hg push to hg.mozilla.org, the changes are first made on the SSH/NFS master server. They are subsequently mirrored out to the HTTP read-only slaves.

hg.mozilla.org への hg push 成果は、まずは SSH/NFS マスタサーバ上に記録されます。その後、読み出し専用の HTTP サーバ群に反映されます。

As is currently implemented, the mirroring process is performed synchronously during the push operation. The server waits for the mirrors to complete (to a reasonable state) before it tells the client the push has completed.

現状の実装では、HTTP サーバ群へのミラーリング処理は hg push 操作と同期して実施されます。hg push におけるクライアントに対する「push 完了」の通知は、ミラーリング処理が (妥当な段階まで) 完了してから実施されます。

Depending on the repository, the size of the push, and server and network load, mirroring commonly adds 1 to 7 seconds to push times. This is time when a developer is sitting at a terminal, waiting for hg push to complete. The time for Try pushes can be larger: 10 to 20 seconds is not uncommon (but fortunately not the norm).

連携先のリポジトリや、hg push による履歴反映量、サーバやネットワークの負荷状況などに応じて、ミラーリング処理に要する 1 〜 7 秒が、hg push の所要時間に加算されます。開発者はこの間、端末の前に座って、hg push の完了を待たなければなりません。Try リポジトリへの hg push は更に深刻で、10 〜 20 秒というケースも珍しくありません (幸い、常時そこまで低速なわけでもありません)。

The current mirroring mechanism is overly simple and prone to many failures and sub-optimal behavior. I plan to work on fixing mirroring in 2015. When I'm done, there should be no user-visible mirroring delay.

現行のミラーリング機構は、過度に単純で、様々な要因で失敗しやすく、最適化もされていません。2015年には、ミラーリングの性能問題解消のための作業を行う予定です。問題が解消した暁には、hg push の実施における、ミラーリング由来の応答遅延は無くなる筈です。
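The effect of synchronous mirroring on what the developer sees can be sketched with a toy latency model. This is not the server's actual implementation. The function name, the example timings, and the assumption that mirrors are updated in parallel are all hypothetical.

```python
def push_latency_s(master_write_s: float, mirror_times_s: list[float],
                   synchronous: bool) -> float:
    """Seconds the client waits for `hg push` to return in this toy model."""
    if synchronous and mirror_times_s:
        # Assumption: mirrors are updated in parallel, so the slowest
        # mirror determines how long the client has to wait.
        return master_write_s + max(mirror_times_s)
    # Asynchronous mirroring: the client waits only for the master write.
    return master_write_s


if __name__ == "__main__":
    master = 2.0               # made-up time to apply the push on the master
    mirrors = [1.0, 3.5, 7.0]  # made-up per-mirror replication times
    print("synchronous :", push_latency_s(master, mirrors, True), "s")
    print("asynchronous:", push_latency_s(master, mirrors, False), "s")
```

In this model, moving replication off the push path removes the mirror term from the client's wait entirely, which matches the stated goal of no user-visible mirroring delay.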

pushlog 複製の非効率性 (Pushlog Replication Inefficiency)

Up until yesterday (when we deployed a rewritten pushlog extension), the replication of pushlog data from master to server was very inefficient. Instead of transferring a delta of pushes since last pull, we were literally copying the underlying SQLite file across the network!

昨日、書き直した pushlog エクステンションが配備されるまでは、マスタからサーバへの pushlog データの複製は、非常に非効率的でした。直近の pull 以降の push の差分を転送する代わりに、pushlog 記録に使用している SQLite ファイルを、ネットワーク経由で丸々複製していたのです。

Try's pushlog is ~30 MB. mozilla-central and mozilla-inbound are in the same ballpark. 30 MB x 10 slaves is a lot of data to transfer. These operations were capable of periodically saturating the network, slowing everyone down.

Try リポジトリの pushlog は約 30MB です。mozilla-central や mozilla-inbound も同様の規模です。10 台のスレーブホストそれぞれに対する 30MB の転送は、相当なデータ量です。この転送処理は、定期的にネットワークを飽和させ、全ての通信を遅くしかねないものでした。

The rewritten pushlog extension performs a delta transfer automatically as part of hg pull. Pushlog synchronization now completes in milliseconds while commonly only consuming a few kilobytes of network traffic.

新たな pushlog エクステンションは、hg pull 処理の一部として自動的に差分転送を行います。現在の pushlog 同期は、概ね数 KB 程度のネットワーク帯域しか消費せず、数 ms 以内に完了します。
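The difference between copying the whole database and sending a delta can be illustrated with a small SQLite sketch. This is not the pushlog extension's code: the table layout and the function names are hypothetical, and the real extension performs its transfer as part of hg pull rather than through direct database access. With the numbers above, copying a ~30 MB file to each of 10 mirrors moves about 300 MB per synchronization, whereas a delta carries only the new rows.

```python
import sqlite3

# Hypothetical schema: the real pushlog database layout is not given in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pushes (
    id   INTEGER PRIMARY KEY,  -- monotonically increasing push id
    user TEXT NOT NULL,
    node TEXT NOT NULL         -- changeset hash at the head of the push
)
"""


def open_log() -> sqlite3.Connection:
    """Create an empty in-memory push log."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def record_push(conn: sqlite3.Connection, user: str, node: str) -> None:
    """Append one push to the log."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO pushes (user, node) VALUES (?, ?)", (user, node))


def last_push_id(conn: sqlite3.Connection) -> int:
    """Return the highest push id stored, or 0 for an empty log."""
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM pushes").fetchone()[0]


def sync_delta(master: sqlite3.Connection, mirror: sqlite3.Connection) -> int:
    """Copy only the pushes the mirror has not seen; return how many were sent."""
    delta = master.execute(
        "SELECT id, user, node FROM pushes WHERE id > ? ORDER BY id",
        (last_push_id(mirror),),
    ).fetchall()
    with mirror:
        mirror.executemany(
            "INSERT INTO pushes (id, user, node) VALUES (?, ?, ?)", delta
        )
    return len(delta)


if __name__ == "__main__":
    master, mirror = open_log(), open_log()
    for i in range(3):
        record_push(master, "alice", f"node{i}")
    print("rows sent on first sync:", sync_delta(master, mirror))
    record_push(master, "bob", "node3")
    print("rows sent on second sync:", sync_delta(master, mirror))
```

The mirror only needs to tell the master the last push id it holds, so the amount of data transferred scales with the number of new pushes instead of the size of the whole history.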

Early indications reveal that deploying this change yesterday decreased the push times to repositories with long push history by 1-3s.

pushlog エクステンション更新後の初期計測によれば、push 履歴の長いリポジトリへの hg push に要する時間は、1〜3 秒程度短縮されたことがわかっています。

Try

Pretty much any interaction with the Try repository is guaranteed to have poor performance. The Try repository is doing things that distributed version control systems weren't designed to do. This includes Git.

Try リポジトリとの連携は、あらゆる操作で低い性能しかでません。Try リポジトリの振る舞いは、Git を含む分散履歴管理ツールの方向性とは、馴染まないものであるためです。

If you are using Try, all bets are off. Performance will be problematic until we roll out the headless try repository.

Try リポジトリと連携する場合、話は全て白紙に戻ります。ヘッドレスな Try リポジトリをリリースするまでは、性能問題は残ることでしょう。

That being said, we've made changes recently to make Try perform better. The median time for pushing to Try has decreased significantly in the past few weeks. The first dip in mid-November was due to upgrading the server from Mercurial 2.5 to Mercurial 3.1 and from converting Try to use generaldelta encoding. The dip this week has been from merging all heads and from deploying the aforementioned pushlog changes. Pushing to Try is now significantly faster than 3 months ago.

とはいえ、Try リポジトリの性能を改善するための変更も、最近実施しています。Try リポジトリへの push に要する時間の中央値は、ここ数週間で大きく減少しています。11月中旬の最初の改善は、サーバ側で使用する Mercurial を 2.5 から 3.1 に更新し、Try リポジトリの履歴の記録方式を generaldelta 形式に変換したことによるものです。今週の改善は、全てのヘッドのマージと、先述した pushlog エクステンションの配備によるものです。Try リポジトリへの hg push は、三ヶ月前と比較して大幅に速くなっています。

結論 (Conclusion)

Many of the reasons for hg.mozilla.org slowness are known. More often than not, they are due to clownshoes or inefficiencies on Mozilla's part rather than fundamental issues with Mercurial.

hg.mozilla.org の応答性能低下は、その原因の多くが既知のものです。大抵の場合は、Mozilla 側の不手際や非効率性によるもので、Mercurial の根本的な問題ではありません。

We have made significant progress at making hg.mozilla.org faster. But we are not done. We are continuing to invest in fixing the sub-optimal parts and making hg.mozilla.org faster yet. I'm confident that within a few months, nobody will be able to say that the servers are a source of pain like they have been for years.

hg.mozilla.org を高速化することに、これまで大きな成果を上げてきましたが、高速化はまだ完了していません。今後も、不十分な点の改良と、hg.mozilla.org の高速化に、投資を続けて行きます。私自身は、サーバの性能に関する数年に渡る問題が、数ヶ月以内には解消されているであろうことを確信しています。

Furthermore, Mercurial is investing in features to make the wire protocol faster, more efficient, and more powerful. When deployed, these should make pushes faster on any server. They will also enable workflow enhancements, such as Facebook's experimental extension to perform rebases as part of push (eliminating push races and having to manually rebase when you lose the push race).

それに加えて、Mercurial の通信プロトコルを高速化、効率化、高機能化する作業が実施されています。これらが利用されるようになることで、どのサーバに対しても hg push の性能が改善されることでしょう。通信プロトコルの強化によって、hg push の一部として rebase を実施する Facebook の実験的エクステンション (hg push の競合や、競合に「敗れた」際の rebase の手動実施が不要になります) のような、ワークフローの拡張も可能になります。