DataViews are one of the two possible ways to do low-level memory access in JavaScript, the other being TypedArrays. Up until now, DataViews were much less optimized in V8 than TypedArrays, resulting in lower performance on tasks such as graphics-intensive workloads or encoding/decoding binary data. The reasons for this are mostly historical choices: for example, asm.js chose TypedArrays rather than DataViews, and so engines were incentivized to focus on the performance of TypedArrays.
Because of the performance penalty, JavaScript developers such as the Google Maps team decided to avoid DataViews and rely on TypedArrays instead, at the cost of increased code complexity. This article explains how we brought DataView performance in V8 6.9 up to match, and even exceed, equivalent TypedArray code, effectively making DataView usable for performance-critical real-world applications.
Since ES2015, JavaScript has supported reading and writing data in raw binary buffers called ArrayBuffers. ArrayBuffers cannot be accessed directly; rather, programs must use a so-called array buffer view object, which can be either a DataView or a TypedArray.
TypedArrays allow programs to access the buffer as an array of uniformly typed values, such as an Int16Array or a Float32Array.
On the other hand, DataViews allow for more fine-grained data access. They let the programmer choose the type of values read from and written to the buffer by providing specialized getters and setters for each number type, making them useful for serializing data structures.
Moreover, DataViews also allow the choice of the endianness of the data storage, which can be useful when receiving data from external sources such as the network, a file, or a GPU.
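For illustration, the same 32-bit value can be written and then read back in either byte order through the standard DataView API (the `littleEndian` flag defaults to `false`, i.e. big-endian):

```javascript
// Write one 32-bit value, then read it back in both byte orders.
const buf = new ArrayBuffer(4);
const dv = new DataView(buf);

dv.setUint32(0, 0x12345678, /* littleEndian */ true);

console.log(dv.getUint32(0, true).toString(16));  // '12345678'
console.log(dv.getUint32(0, false).toString(16)); // '78563412'
```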
An efficient DataView implementation has been a feature request for a long time (see this bug report from over 5 years ago), and we are happy to announce that DataView performance is now on par!
Legacy runtime implementation
Until recently, the DataView methods were implemented as built-in C++ runtime functions in V8. This is very costly, because each call requires an expensive transition from JavaScript to C++ (and back).
In order to investigate the actual performance cost of this implementation, we set up a performance benchmark that compares the native DataView getter implementation with a JavaScript wrapper simulating DataView behavior. This wrapper uses a Uint8Array to read data byte by byte from the underlying buffer, and then computes the return value from those bytes. Here is, for example, the function for reading little-endian 32-bit unsigned integer values:
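A minimal sketch of such a wrapper — the `LittleEndianWrapper` name is illustrative, not the exact benchmark code:

```javascript
// Simulates DataView.prototype.getUint32 with a little-endian read,
// assembling the result from four individual bytes of a Uint8Array.
function LittleEndianWrapper(buffer) {
  this.uint8View_ = new Uint8Array(buffer);
}

LittleEndianWrapper.prototype.getUint32 = function(byteOffset) {
  // `>>> 0` forces the combined value back into unsigned 32-bit range,
  // since `<< 24` can produce a negative signed 32-bit intermediate.
  return (this.uint8View_[byteOffset] |
          (this.uint8View_[byteOffset + 1] << 8) |
          (this.uint8View_[byteOffset + 2] << 16) |
          (this.uint8View_[byteOffset + 3] << 24)) >>> 0;
};
```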
TypedArrays are already heavily optimized in V8, so they represent the performance goal that we wanted to match.
Original DataView performance
Our benchmark shows that native DataView getter performance was as much as 4 times slower than the Uint8Array based wrapper, for both big-endian and little-endian reads.
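A simplified harness in this spirit — function names, buffer size, and the use of `console.time` are illustrative, and absolute timings depend on the V8 version:

```javascript
// Time many little-endian uint32 reads through a native DataView
// against an equivalent Uint8Array-based wrapper over the same buffer.
function readAllWithDataView(buffer) {
  const view = new DataView(buffer);
  let sum = 0;
  for (let offset = 0; offset + 4 <= buffer.byteLength; offset += 4) {
    sum += view.getUint32(offset, /* littleEndian */ true);
  }
  return sum;
}

function readAllWithWrapper(buffer) {
  const bytes = new Uint8Array(buffer);
  let sum = 0;
  for (let offset = 0; offset + 4 <= buffer.byteLength; offset += 4) {
    sum += ((bytes[offset] |
             (bytes[offset + 1] << 8) |
             (bytes[offset + 2] << 16) |
             (bytes[offset + 3] << 24)) >>> 0);
  }
  return sum;
}

const benchBuffer = new ArrayBuffer(1 << 20); // 1 MiB of test data
new Uint8Array(benchBuffer).fill(0xab);

console.time('DataView');
readAllWithDataView(benchBuffer);
console.timeEnd('DataView');

console.time('wrapper');
readAllWithWrapper(benchBuffer);
console.timeEnd('wrapper');
```

Both functions compute the same checksum, so the comparison only measures how the reads are performed, not what they produce.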
Improving baseline performance
The first step in our optimization effort was to move the DataView implementation from the C++ runtime to CodeStubAssembler (CSA). CSA is a portable assembly language that allows us to write code directly in TurboFan’s machine-level intermediate representation (IR), and we use it to implement optimized parts of V8’s JavaScript standard library. Rewriting the code in CSA bypasses the call into C++ completely, and also generates efficient machine code by leveraging TurboFan’s backend.
However, writing CSA code by hand is cumbersome. Control flow in CSA is expressed much like in assembly, using explicit labels and gotos, which makes the code harder to read and understand at a glance.
To make it easier for developers to contribute to V8’s optimized JavaScript standard library, and to improve readability and maintainability, we started designing a new language called V8 Torque, which compiles down to CSA. The goal of Torque is to abstract away the low-level details that make CSA code hard to write and maintain, while keeping the same performance profile.
Rewriting the DataView code was an excellent opportunity to start using Torque for new code, and helped provide the Torque developers with a lot of feedback about the language. This is what the DataView’s getUint32() method looks like, written in Torque:
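A sketch of what such a Torque macro looks like — identifiers and exact signatures are approximate, and the actual V8 source may differ:

```torque
macro LoadDataViewUint32(buffer: JSArrayBuffer, offset: intptr,
                         requested_little_endian: bool): Number {
  let data_pointer: RawPtr = buffer.backing_store;

  // Load the four bytes individually, then combine them in the
  // requested byte order.
  let b0: uint32 = LoadUint8(data_pointer, offset);
  let b1: uint32 = LoadUint8(data_pointer, offset + 1);
  let b2: uint32 = LoadUint8(data_pointer, offset + 2);
  let b3: uint32 = LoadUint8(data_pointer, offset + 3);
  let result: uint32;

  if (requested_little_endian) {
    result = (b3 << 24) | (b2 << 16) | (b1 << 8) | b0;
  } else {
    result = (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
  }

  return convert<Number>(result);
}
```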
Moving the DataView methods to Torque already showed a 3× improvement in performance, but did not quite match Uint8Array based wrapper performance yet.
Torque DataView performance
Optimizing for TurboFan
When JavaScript code becomes “hot”, we compile it with the TurboFan optimizing compiler to generate highly optimized machine code that runs much more efficiently than interpreted bytecode.
TurboFan works by translating the incoming JavaScript code into an internal graph representation (more precisely, a “sea of nodes”). It starts with high-level nodes that match the JavaScript operations and semantics, and gradually refines them into lower and lower level nodes, until machine code is finally generated.
In particular, a function call, such as calling one of the DataView methods, is internally represented as a JSCall node, which eventually boils down to an actual function call in the generated machine code.
However, TurboFan allows us to check whether the JSCall node is actually a call to a known function, for example one of the builtin functions, and inline this node in the IR. This means that the complicated JSCall gets replaced at compile-time by a subgraph that represents the function. This allows TurboFan to optimize the inside of the function in subsequent passes as part of a broader context, instead of on its own, and most importantly to get rid of the costly function call.
Initial TurboFan DataView performance
Implementing TurboFan inlining finally allowed us to match, and even exceed, the performance of our Uint8Array wrapper, and be 8 times as fast as the former C++ implementation.
Further TurboFan optimizations
Looking at the machine code generated by TurboFan after inlining the DataView methods, there was still room for some improvement. The first implementation of those methods tried to follow the standard pretty closely, and threw errors when the spec indicates so (for example, when trying to read or write out of the bounds of the underlying ArrayBuffer).
However, the code that we write in TurboFan is meant to be optimized to be as fast as possible for the common, hot cases — it doesn’t need to support every possible edge case. By removing all the intricate handling of those errors, and just deoptimizing back to the baseline Torque implementation when we need to throw, we were able to reduce the size of the generated code by around 35%, generating a quite noticeable speedup, as well as considerably simpler TurboFan code.
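The error case the optimized code no longer handles inline is the spec-mandated bounds check, which is easy to observe from JavaScript: an out-of-bounds read throws a RangeError.

```javascript
// A 4-byte buffer: reading a uint32 at offset 2 would need bytes 2..5,
// so per the spec the access must throw a RangeError.
const smallView = new DataView(new ArrayBuffer(4));

smallView.getUint32(0, true); // in bounds: fine

try {
  smallView.getUint32(2, true);
} catch (e) {
  console.log(e instanceof RangeError); // true
}
```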
Following up on this idea of being as specialized as possible in TurboFan, we also removed support for indices or offsets that are too large (outside of Smi range) inside the TurboFan-optimized code. This allowed us to get rid of handling of the float64 arithmetic that is needed for offsets that do not fit into a 32-bit value, and to avoid storing large integers on the heap.
Compared to the initial TurboFan implementation, this more than doubled the DataView benchmark score. DataViews are now up to 3 times as fast as the Uint8Array wrapper, and around 16 times as fast as our original DataView implementation!
Final TurboFan DataView performance
We’ve evaluated the performance impact of the new implementation on some real-world examples, on top of our own benchmark.
We compared the overall performance of DataViews against TypedArrays. We found that our new DataView implementation provides almost the same performance as TypedArrays when accessing data aligned in the native endianness (little-endian on Intel processors), bridging much of the performance gap and making DataViews a practical choice in V8.
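On a little-endian host (assumed here; e.g. x86), explicit little-endian DataView reads return the same values as a TypedArray over the same buffer, since TypedArrays use the platform byte order — so the two APIs are interchangeable for this access pattern, and now also close in performance:

```javascript
// Fill a buffer through a Float32Array, then read it back through a
// DataView with explicit little-endian reads. On a little-endian machine
// (assumed here) the values match.
const sharedBuffer = new ArrayBuffer(8);
const floats = new Float32Array(sharedBuffer);
const dataView = new DataView(sharedBuffer);

floats[0] = 1.5;
floats[1] = -2.25;

console.log(dataView.getFloat32(0, true)); // 1.5
console.log(dataView.getFloat32(4, true)); // -2.25
```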
DataView vs. TypedArray peak performance