1. 原文

Speeding up spread elements

本文大量提到了array literals这个概念,请参考MDN的解释:MDN » Grammar and types » Literals » Array literals

2. 摘要翻译

During his three-months internship on the V8 team, Hai Dang worked on improving the performance of [...array], [...string], [...set], [...map.keys()], and [...map.values()] (when the spread elements are at the start of the array literal). He even made Array.from(iterable) much faster as well. This article explains some of the gory details of his changes, which are included in V8 starting with v7.2.

在为期三个月的V8团队内部实习中,Hai Dang在努力提升[...array], [...string], [...set], [...map.keys()], 以及 [...map.values()]的性能(当扩展元素在数组字面量(array literal)的起始)。他甚至让Array.from(iterable)也更快了。这篇文章解释了部分他改动的细节,这部分改动被包含在了V8 v7.2开始的版本中。

Spread elements

Spread elements are components of array literals that have the form ...iterable. They were introduced in ES2015 as a way to create arrays from iterable objects. For example, the array literal [1, ...arr, 4, ...b] creates an array whose first element is 1 followed by the elements of the array arr, then 4, and finally the elements of the array b:

扩展元素是拥有...iterable形式的数组字面量(array literals)组件。它们是从ES2015被引入的特性,作为一种从可迭代对象创建数组的方法。举例来说,数组字面量[1, ...arr, 4, ...b]创建了一个数组,这个数组的第一个元素是1,然后跟着是数组arr的元素,接着是4,最后是数组b的元素:

const a = [2, 3];
const b = [5, 6, 7];
const result = [1, ...a, 4, ...b];
// → [1, 2, 3, 4, 5, 6, 7]

As another example, any string can be spread to create an array of its characters (Unicode code points):

另一个例子,任何字符串可以用来扩展创建一个含有其字符为内容的数组(Unicode code points):

const str = 'こんにちは';
const result = [...str];
// → ['こ', 'ん', 'に', 'ち', 'は']

Similarly, any set can be spread to create an array of its elements, sorted by insertion order:

类似的,任何set可以被用来扩展创建含有其元素的数组,按插入顺序进行排序:

const s = new Set();
s.add('V8');
s.add('TurboFan');
const result = [...s];
// → ['V8', 'TurboFan']

In general, the spread elements syntax ...x in an array literal assumes that x provides an iterator (accessible through x[Symbol.iterator]()). This iterator is then used to obtain the elements to be inserted into the resulting array.

通常来说,扩展运算语法...x会假设x提供了一个迭代器(通过x[Symbol.iterator]()来访问)。这个迭代器被用来获取元素用以插入到新创建的结果数组中。

The simple use case of spreading an array arr into a new array, without adding any further elements before or behind, [...arr], is considered a concise, idiomatic way to shallow-clone arr in ES2015. Unfortunately, in V8, the performance of this idiom lagged far behind its ES5 counterpart. The goal of Hai’s internship was to change that!

最简单的扩展一个数组arr到一个新数组,并不在其前后添加任何元素,[...arr]这样的操作,被认为是在ES2015中最简洁、通顺的对arr进行浅拷贝(shallow-clone)的方法。不幸的是,在V8中,这种写法的性能要远不及ES5中功能类似的其他写法。Hai实习的目标就是改变整个现状!

Why is (or were!) spread elements slow?

There are many ways to shallow-clone an array arr. For instance, you can use arr.slice(), or arr.concat(), or [...arr]. Or, you can write your own clone function that employs a standard for-loop:

有很多方法可以浅拷贝一个数组arr。例如,你可以使用arr.slice()arr.concat(),或[...arr]。甚至你可以基于标准的for循环来编写你自己的clone函数:

function clone(arr) {
  // Pre-allocate the correct number of elements, to avoid
  // having to grow the array.
  const result = new Array(arr.length);
  for (let i = 0; i < arr.length; i++) {
    result[i] = arr[i];
  }
  return result;
}

Ideally, all these options would have similar performance characteristics. Unfortunately, if you pick [...arr] in V8, it is (or was) likely to be slower than clone! The reason is that V8 essentially transpiles [...arr] into an iteration like the following:

理想的情况下,所有这些选项都会有接近的性能。不幸的是,如果你在V8中使用[...arr],它很有可能会比clone要慢!这是因为本质上来说V8在处理[...arr]的时候是进行了一次如下的迭代:

function(arr) {
  const result = [];
  const iterator = arr[Symbol.iterator]();
  const next = iterator.next;
  for ( ; ; ) {
    const iteratorResult = next.call(iterator);
    if (iteratorResult.done) break;
    result.push(iteratorResult.value);
  }
  return result;
}

This code is generally slower than clone for a few reasons:

这样的代码通常会比clone要慢,是因为几个原因:

  1. It needs to create the iterator at the beginning by loading and evaluating the Symbol.iterator property.
  2. It needs to create and query the iteratorResult object at every step.
  3. It grows the result array at every step of the iteration by calling push, thus repeatedly reallocating the backing store.
  • 它需要在一开始通过读取和评估Symbol.iterator属性,来创建一个iterator
  • 它需要在每一步创建并查询iteratorResult对象
  • 它在迭代的每一步中都需要使用push方法来增长result的长度,这导致了内存的再分配

The reason for using such an implementation is that, as mentioned earlier, spreading can be done not only on arrays but, in fact, on arbitrary iterable objects, and must follow the iteration protocol. Nevertheless, V8 should be smart enough to recognize if the object being spread is an array such that it can perform the elements extraction at a lower level and thereby:

会使用这样的实现是因为,这我们之前提到过,扩展运算这个行为可以在很多目标上实施,不仅仅只是数组,事实上,是任何可迭代的对象,当然这些对象必须遵循迭代协议(the iteration protocol)。尽管如此,V8应该足够聪明,识别扩展对象是不是一个数组,以便在低层级执行元素抽取操作,从而:

  1. avoid the creation of the iterator object,
  2. avoid the creation of the iterator result objects, and
  3. avoid continuously growing and thus reallocating the result array (we know the number of elements in advance).
  • 避免创建迭代对象
  • 避免创建迭代结果对象
  • 避免持续增长结果数组,避免内存的再分配(我们在一开始就知道目标长度了)

We implemented this simple idea using CSA for fast arrays, i.e. arrays with one of the six most common elements kinds. The optimization applies for the common real-world scenario where the spread occurs at the start of the array literal, e.g. [...foo]. As shown in the graph below, this new fast path yields roughly a 3× performance improvement for spreading an array of length 100,000, making it about 25% faster than the hand-written clone loop.

对于数组(fast arrays)来说,我们使用CSA实现了这个简单的做法,举例来说,含有最常见的elements kinds的数组。优化是针对真实场景进行的,扩展操作最初是从数组字面量开始的,举例来说,[...foo]。如下图所示,新的快速修改,对一个长度为100,000的数组产生了大约3x的性能提升,使得它比手写的clone循环实现要快25%左右。

Performance improvement of spreading a fast array

Note: While not shown here, the fast path also applies when the spread elements are followed by other components (e.g. [...arr, 1, 2, 3]), but not when they are preceded by others (e.g. [1, 2, 3, ...arr]).

请注意: 虽然并没有在例子中体现出来,这个快速修改对目标扩展元素之后追加其他组件也有效(比如:[...arr, 1, 2, 3]),但追加在其他组件之后则不可以(比如:[1, 2, 3, ...arr])。

Tread carefully down that fast path

That’s clearly an impressive speedup, but we must be very careful about when it is correct to take this fast path: JavaScript allows the programmer to modify the iteration behavior of objects (even arrays) in various ways. Because spread elements are specified to use the iteration protocol, we need to ensure that such modifications are respected. We do so by avoiding the fast path completely whenever the original iteration machinery has been mutated. For example, this includes situations like the following.

这无疑是令人印象深刻的优化提速,但我们必须非常仔细,保证这个快速修改必须是正确的:JavaScript允许程序员通过一系列途径改变对象(也包含array)的迭代行为。因为扩展元素被spec指定需要使用迭代协议,我们需要保证前述的一系列改动也一样能够正确工作。我们会检查迭代装置,在经过改动的情况下完全禁止这个快速修正的应用。举例来说,有下面几种情况。

Own Symbol.iterator property

Normally, an array arr does not have its own Symbol.iterator property, so when looking up that symbol, it will be found on the array’s prototype. In the example below, the prototype is bypassed by defining the Symbol.iterator property directly on arr itself. After this modification, looking up Symbol.iterator on arr results in an empty iterator, and thus the spread of arr yields no elements and the array literal evaluates to an empty array.

通常情况下,一个数组arr不会有自建的Symbol.iterator属性,因此当查找这个symbol,会追溯到数组的prototype上。在下面的例子中,原型上的实现会被略过,因为arr自己已经直接实现了一个Symbol.iterator属性。经过这个修改,当访问arr上的Symbol.iterator属性时候,会得到一个空的迭代器,导致arr的扩散操作得不到任何元素,并且最终的结果数组是一个空数组。

const arr = [1, 2, 3];
arr[Symbol.iterator] = function() {
  return { next: function() { return { done: true }; } };
};
const result = [...arr];
// → []

Modified %ArrayIteratorPrototype%

The next method can also be modified directly on %ArrayIteratorPrototype%, the prototype of array iterators (which affects all arrays).

下一个例子,程序员可以直接修改%ArrayIteratorPrototype%,数组迭代器的原型(这会影响所有的数组)。

Object.getPrototypeOf([][Symbol.iterator]()).next = function() {
  return { done: true };
}
const arr = [1, 2, 3];
const result = [...arr];
// → []

Dealing with holey arrays

Extra care is also needed when copying arrays with holes, i.e., arrays like ['a', , 'c'] that are missing some elements. Spreading such an array, by virtue of adhering to the iteration protocol, does not preserve the holes but instead fills them with the values found in the array’s prototype at the corresponding indices. By default there are no elements in an array’s prototype, which means that any holes are filled with undefined. For example, [...['a', , 'c']] evaluates to a new array ['a', undefined, 'c'].

拷贝带空洞的数组需要额外的注意,举例来说,像这样的数组,['a', , 'c']当中缺失了部分元素。扩展这样的数组,根据迭代协议,并不会保留这些空洞,而是将从数组原型链上找到的对应位置的值填入到这些空洞中。默认上,数组原型链上是没有元素的,这导致了这些空洞被undefined填上。举例来说,[...['a', , 'c']]最终会是['a', undefined, 'c']

Our fast path is smart enough to handle holes in this default situation. Instead of blindly copying the input array’s backing store, it watches out for holes and takes care of converting them to undefined values. The graph below contains measurements for an input array of length 100,000 containing only (tagged) 600 integers — the rest are holes. It shows that spreading such a holey array is now over 4× faster than using the clone function. (They used to be roughly on par, but this is not shown in the graph).

我们的快速修正足够聪明,可以处理这些默认情况下的数组空洞。它会很小心地避免将这些空洞转成undefined值,而不是根据默认行为直接将源数组的值拷贝过去。下图中的数组长度为100,000,但仅包含了600个整型(有标签) - 剩下的全部都是空洞。图表显示了,扩展这样的一个带洞数组现在要比clone函数快上超过4x。(它们之前的表现是基本相同的,这在当前的图表上没有得到体现)。

Note that although slice is included in this graph, the comparison with it is unfair because slice has a different semantics for holey arrays: it preserves all the holes, so it has much less work to do.

请注意,虽然slice出现在了这张图表上,但这样的比较是不公平的,因为slice在处理空洞数组上有不同的工作机制:它保留了所有的空洞,因此对它来说,要做的工作会少很多。

Performance improvement of spreading a holey array of integers (HOLEY_SMI_ELEMENTS)

The filling of holes with undefined that our fast path has to perform is not as simple as it sounds: it may require converting the whole array to a different elements kind. The next graph measures such a situation. The setup is the same as above, except that this time the 600 array elements are unboxed doubles and the array has the HOLEY_DOUBLE_ELEMENTS elements kind. Since this elements kind cannot hold tagged values such as undefined, spreading involves a costly elements kind transition, which is why the score for [...a] is much lower than in the previous graph. Nevertheless, it is still much faster than clone(a).

对于在空洞中填入undefined这件事情,并不如听起来的那么简单:可能会需要将整个数组转成一种不同的elements kind。下一张图表会展示这样的情况。初始设定和上面一致,除了这次600个数组元素是未装箱的双精度浮点型(unboxed doubles),此外这个数组拥有HOLEY_DOUBLE_ELEMENTS elements kind。因为这个数组的elements kind可以包含有标签的值,例如undefined,扩展运算就必须包含代价很大的elements kind转化,这导致了[...a]的分数比上一张图表中下降了那么多。尽管如此,它仍旧比clone(a)要快很多。

Performance improvement of spreading a holey array of doubles (HOLEY_DOUBLE_ELEMENTS)

Spreading strings, sets, and maps

The idea of skipping the iterator object and avoiding growing the result array equally applies to spreading other standard data types. Indeed, we implemented similar fast paths for primitive strings, for sets, and for maps, each time taking care to bypass them in the presence of modified iteration behavior.

我们也有想法,跳过迭代对象,并避免其他数据类型的扩展运算导致的数组增长(这句话简直了)。确实,我们对原始字符串、sets、maps,也实现了类似的快速修正,并小心跳过迭代行为被改变的情况。

Concerning sets, the fast path supports not only spreading a set directly ([...set]), but also spreading its keys iterator ([...set.keys()]) and its values iterator ([...set.values()]). In our micro-benchmarks, these operations are now about 18× faster than before.

对sets来说,快速修正支持不仅仅直接扩展一个set([...set]),也支持扩展它的键([...set.keys()])以及它的值([...set.values()])。在我们的微型benchmark中,这些操作现在比之前要快大约18x。

The fast path for maps is similar but does not support spreading a map directly ([...map]), because we consider this an uncommon operation. For the same reason, neither fast path supports the entries() iterator. In our micro-benchmarks, these operations are now about 14× faster than before.

对maps来说,情况比较类似,但不支持直接扩展一个map([...map]),因为我们认为这样的操作比较罕见。同样的理由,快速修正也不支持entries()迭代。在我们的微型benchmark中,这些操作比之前要快大约x14。

For spreading strings ([...string]), we measured a roughly 5× improvement, as shown in the graph below by the purple and green lines. Note that this is even faster than a TurboFan-optimized for-of-loop (TurboFan understands string iteration and can generate optimized code for it), represented by the blue and pink lines. The reason for having two plots in each case is that the micro-benchmarks operate on two different string representations (one-byte strings and two-byte strings).

对字符串来说([...string]),我们获得了大约5x的性能提升,请查看下图中紫色和绿色的线。请注意,这已经比经过TurboFan优化的for-of循环还要快了(TurboFan能理解字符串迭代,并为其创建经过优化的代码),请查看蓝色和粉色的线。为什么每个情况都会有两条线,是因为微型benchmark是在两种不同的字符串上进行测试的(one-byte strings and two-byte strings)。

Performance improvement of spreading a string

Performance improvement of spreading a set with 100,000 integers (magenta, about 18×), shown here in comparison with a for-of loop (red)

Improving Array.from performance

Fortunately, our fast paths for spread elements can be reused for Array.from in the case where Array.from is called with an iterable object and without a mapping function, for example, Array.from([1, 2, 3]). The reuse is possible because in this case, the behavior of Array.from is exactly the same as that of spreading. It results in an enormous performance improvement, shown below for an array with 100 doubles.

幸运的是,我们对扩展元素实现的快速修正也可以在Array.from上得到重用,只需要Array.from是在一个可迭代对象上工作,并且不含绑定函数,举例来说,Array.from([1, 2, 3])。可以重用的原因是因为在这种用例下,Array.from的行为和扩展运算是一致的。这带来了巨大的性能提升,请看下图,例子是一个含100个双精度浮点数的数组。

Performance improvement of Array.from(array) where array contains 100 doubles

Conclusion

V8 v7.2 / Chrome 72 greatly improves the performance of spread elements when they occur at the front of the array literal, for example [...x] or [...x, 1, 2]. The improvement applies to spreading arrays, primitive strings, sets, maps keys, maps values, and — by extension — to Array.from(x).

V8 v7.2 / Chrome 72大大提升了出现在数组字面量起始的扩展元素性能,举例来说[...x][...x, 1, 2]。这项提升对扩展数组、原始字符串、sets、maps键、maps值,以及 - 作为扩展 - Array.from(x)都有效。

EOF