Trash talk: the Orinoco garbage collector
当前这篇技术博客的目的是针对过去几年的V8垃圾回收机制的改进做一个总结性回顾,整体来说偏理论、偏概念性,技术的深度并不是很足。因此如果想要详细深入了解V8的GC细节的读者,需要打开所有当前博文中链接的历史博文,一篇篇仔细过一下。或者可以阅读我之前整理的Node系列性能博文:Node.JS Profile。
Over the past years the V8 garbage collector (GC) has changed a lot. The Orinoco project has taken a sequential, stop-the-world garbage collector and transformed it into a mostly parallel and concurrent collector with incremental fallback.
在过去的几年里V8的垃圾回收器(GC)改变了很多。Orinoco项目将一个线性的、stop-the-world的垃圾回收器转变成了一个大部分情况下并行以及并发运作的垃圾回收器。
Any garbage collector has a few essential tasks that it has to do periodically:
任何垃圾回收器都有一些周期性必须要完成的任务:
These tasks can be performed in sequence or can be arbitrarily interleaved. A straight-forward approach is to pause JavaScript execution and perform each of these tasks in sequence on the main thread. This can cause jank and latency issues on the main thread, which we’ve talked about in previous blog posts, as well as reduced program throughput.
这些任务可以被序列执行,也可以被强制交叉执行。最简单直白的方法就是在主线程上序列处理这些任务,并在运行的时候暂停JavaScript的执行。这会造成应用程序闪断,以及主线程的延迟问题,我们已经在之前的博客中讨论过这些问题了,并且这也会降低应用程序的吞吐量(throughput)。
The major GC collects garbage from the entire heap.
Major GC负责从整个堆回收垃圾。
Figuring out which objects can be collected is an essential part of garbage collection. Garbage collectors do this by using reachability as a proxy for ‘liveness’. This means that any object currently reachable within the runtime must be kept, and any unreachable objects may be collected.
找出哪些对象可以被回收是垃圾回收过程中的一个必须步骤。垃圾回收器根据可达性
来作为”存活”的指标。这意味着任何在运行时中可以被访问抵达的对象都必须被保留,而任何无法抵达的对象则可以被回收。
Marking is the process by which reachable objects are found. The GC starts at a set of known objects pointers, called the root set. This includes the execution stack and the global object. It then follows each pointer to a JavaScript object, and marks that object as reachable. The GC follows every pointer in that object, and continues this process recursively, until every object that is reachable in the runtime has been found and marked.
标记(Marking)是找出可达对象的流程。垃圾回收器从一系列已知的对象指针开始查找,它们被称为根节点。这包含了执行栈(execution stack)以及全局对象(global object)。接下来只需要跟着指针走,将一个个可达的对象标记起来即可。垃圾回收器会追踪对象内的所有指针,然后递归继续这个流程,直到所有运行时内的可达对象都被找到并被标记起来。
Sweeping is a process where gaps in memory left by dead objects are added to a data structure called a free-list. Once marking has completed, the GC finds contiguous gaps left by unreachable objects and adds them to the appropriate free-list. Free-lists are separated by the size of the memory chunk for quick lookup. In the future when we want to allocate memory, we just look at the free-list and find an appropriately sized chunk of memory.
清除(Sweeping)是一个流程,将死亡对象所遗留的内存空间地址添加到一个被称为free-list数据结构中。一旦当标记流程结束,垃圾回收器寻找连续的不可达对象占用的内存空间,并将它们加到适当的free-list中。Free-list根据内存块的大小分类,以便引擎可以快速定位利用。之后,当有内存分配需求的时候,只需要查找对应尺寸的free-list即可。
The major GC also chooses to evacuate/compact some pages, based on a fragmentation heuristic. You can think of compaction sort of like hard-disk defragmentation on an old PC. We copy surviving objects into other pages that are not currently being compacted (using the free-list for that page). This way, we can make use of the small and scattered gaps within the memory left behind by dead objects.
Major GC也会根据内存碎片的程度,选择性地执行某些内存页的撤出(evacuate)/压缩(compact)。你可以将压缩行为类比老式PC上的磁盘碎片整理行为。引擎会将存活的对象从一个内存页拷贝到另一个当前并未在进行压缩的内存页上(根据那个内存页上的free-list指引)。用这种方法,我们可以利用那些死亡对象留下来的小块的、分散的内存空间。
One potential weakness of a garbage collector which copies surviving objects is that when we allocate a lot of long-living objects, we pay a high cost to copy these objects. This is why we choose to compact only some highly fragmented pages, and just perform sweeping on others, which does not copy surviving objects.
内存回收器执行这样的内存对象拷贝行为有一个潜在弱点,就是当存在大量长期存活对象(long-living)的情况时,拷贝行为的代价会非常大。这也是为什么我们选择仅仅针对那些碎片化程度非常剧烈的内存页进行压缩操作,而针对那些不那么严重的,就对它们进行清除(sweeping)操作,清除操作则不会对存活对象进行拷贝。
The heap in V8 is split into different regions called generations. There is a young generation (split further into ‘nursery’ and ‘intermediate’ sub-generations), and an old generation. Objects are first allocated into the nursery. If they survive the next GC, they remain in the young generation but are considered ‘intermediate’. If they survive yet another GC, they are moved into the old generation.
V8的堆内存会划分成多个不同的区域,被称为代。其中一个被称为新生代
(其中还进一步分为”nursery”和”intermediate”两种不同的子代),另一个被称为老生代
。对象一开始被分配在新生代的nursery里。如果他们在下一次GC中存活下来了,就被认为是”intermediate”。如果它们在更下一次的GC中存活下来了,它们就会被移到老生代中。
In garbage collection there is an important term: “The Generational Hypothesis”. This basically states that most objects die young. In other words, most objects are allocated and then almost immediately become unreachable, from the perspective of the GC. This holds not only for V8 or JavaScript, but for most dynamic languages.
垃圾回收器遵循着一条原则:“代际假设”。基本上来说,假设认为大部分的对象都会在很短的时间内失效。换句话来说,从垃圾回收器的角度来看,大部分对象都是被分配出来,然后基本上马上就不可达了。这条原则不仅仅只针对V8或JavaScript有效,而是对几乎所有的动态语言都有效。
V8’s generational heap layout is designed to exploit this fact about object lifetimes. The GC is a compacting/moving GC, which means that it copies objects which survive garbage collection. This seems counterintuitive: copying objects is expensive at GC time. But we know that only a very small percentage of objects actually survive a garbage collection, according to the generational hypothesis. By moving only the objects which survive, every other allocation becomes ‘implicit’ garbage. This means that we only pay a cost (for copying) proportional to the number of surviving objects, not the number of allocations.
V8的代际堆内存布局则是根据这个对象生命周期现状来进行设计。这样设计的垃圾回收器是一种压缩(compacting)/移动(moving)式GC,这意味着这种垃圾回收器会将存活的对象进行拷贝。这有点违反直觉:拷贝对象在垃圾回收时间里是非常昂贵的。但我们知道,根据代际假设,仅仅非常一小部分的对象可以存活过垃圾回收。仅将存活的对象移动,任何其他的分配对象则成为”隐式”垃圾。这意味着我们只需要支付相对于死亡对象数量来说仅只占了一小部分的存活对象的拷贝代价,而不是所有分配出来的对象的数量。
There are two garbage collectors in V8. The Major GC (Mark-Compact) collects garbage from the whole heap. The Minor GC (Scavenger)
collects garbage in the young generation. The major GC is effective at collecting garbage from the whole heap, but the generational hypothesis tells us that newly allocated objects are very likely to need garbage collection.
V8有两种垃圾回收器。Major GC (Mark-Compact)从整个堆内存回收垃圾。Minor GC (Scavenger)
仅只从新生代回收垃圾。Major GC的性能在从整个堆内存回收垃圾上已经足够高效,但代际假设告诉我们新分配的内存对象非常值得拥有一个专门的内存回收器。
In the Scavenger, which only collects within the young generation, surviving objects are always evacuated to a new page. V8 uses a ‘semi-space’ design for the young generation. This means that half of the total space is always empty, to allow for this evacuation step. During a scavenge, this initially-empty area is called ‘To-Space’. The area we copy from is called ‘From-Space’. In the worst case, every object could survive the scavenge and we would need to copy every object.
Scavenger仅处理新生代的内存,存活对象总是会被直接拷贝到一个新的内存页。V8使用一个被称为”semi-space”的设计来处理新生代内存。这种设计将一半的内存空间置空,在进行拷贝的时候将存活对象直接拷贝到这一半的内存中。在scavenge操作中,这个初始置空的内存空间被称为”To-Space”。我们将对象拷贝出来的被称为”From-Space”。在最糟糕的情况下,每个对象都在scavenge中存活下来了,我们就必须拷贝所有的对象。
For scavenging, we have an additional set of roots which are the old-to-new references. These are pointers in old-space that refer to objects in the young generation. Rather than tracing the entire heap graph for every scavenge, we use write barriers to maintain a list of old-to-new references. When combined with the stack and globals, we know every reference into the young generation, without the need to trace through the entire old generation.
对scavenging来说,我们还有一个额外的根节点集合,它们是老到新引用(old-to-new references)。这些是存在于老生代并指向新生代对象的指针。为了不在每次scavenge中追踪整个堆内存,我们使用了write barriers来维护一份老到新引用列表。结合栈内存以及全局对象,我们就了解了所有指向新生代的指针了,而不需要过一遍整个老生代。
The evacuation step moves all surviving objects to a contiguous chunk of memory (within a page). This has the advantage of completing removing fragmentation - gaps left by dead objects. We then switch around the two spaces i.e. To-Space becomes From-Space and vice-versa. Once GC is completed, new allocations happen at the next free address in the From-Space.
Evacuation步骤会将所有存活对象拷贝到一个连续的内存块中(在一个内存页中)。这样做的优点是完全避免了内存碎片 - 死亡对象遗留下来的内存空隙。然后就可以翻转这两个内存空间(指semi-space设计分隔开来的新生代两块内存空间),举例来说To-Space变成From-Space,反之亦然。当GC完成的时候,新的内存分配行为会在新的From-Space上发生。
We quickly run out of space in the young generation with this strategy alone. Objects that survive a second GC are evacuated into the old generation, rather than To-Space.
当使用这个策略之后,我们很快就会在新生代耗尽内存。存活过两次GC的对象将会被清理到老生代,而不是To-Space。
The final step of scavenging is to update the pointers that reference the original objects, which have been moved. Every copied object leaves a forwarding-address which is used to update the original pointer to point to the new location.
Scavenging的最后一步是更新指向已经被移走的对象的指针。每一个被拷贝的对象都会留下一个forwarding-address,用来将老的指针更新成新的。
In scavenging we actually do these three steps — marking, evacuating, and pointer-updating — all interleaved, rather than in distinct phases.
在scavenging中,其实我们做了三个步骤 - 标记、清理、更新指针 - 所有都可以交错执行,而不是按确定的步骤执行。
Most of these algorithms and optimizations are common in garbage collection literature and can be found in many garbage collected languages. But state-of-the-art garbage collection has come a long way. One important metric for measuring the time spent in garbage collection is the amount of time that the main thread spends paused while GC is performed. For traditional ‘stop-the-world’ garbage collectors, this time can really add up, and this time spent doing GC directly detracts from the user experience in the form of janky pages and poor rendering and latency.
大部分这些算法和优化在垃圾回收的技术历史中都非常常见,并可以在很多垃圾回收语言中得到印证。但最先进的(state-of-the-art)垃圾回收还有一段很长的路要走。衡量垃圾回收耗费时长的一项很重要的指标就是,当GC执行时主线程暂停的时长。对传统的”stop-the-world”垃圾回收器来说,这段时间相当致命。在GC执行时,停顿的页面、糟糕的渲染以及应用延迟等问题,会摧毁用户体验。
Orinoco is the codename of the GC project to make use of the latest and greatest parallel, incremental and concurrent techniques for garbage collection, in order to free the main thread. There are some terms here that have a specific meaning in the GC context, and it’s worth defining them in detail.
Orinoco是一个项目的代号,这个项目使用了最新以及最棒的并行、增量、并发垃圾回收技术,争取将主线程从垃圾回收这件事情上解放出来。在GC这个上下文中,上述的一些名词都有特殊的含义,并值得详细解释下。
Parallel is where the main thread and helper threads do a roughly equal amount of work at the same time. This is still a ‘stop-the-world’ approach, but the total pause time is now divided by the number of threads participating (plus some overhead for synchronization). This is the easiest of the three techniques. The JavaScript heap is paused as there is no JavaScript running, so each helper thread just needs to make sure it synchronizes access to any objects that another helper might also want to access.
并行(Parallel)指的是主线程以及辅助线程在相同的时间内处理接近量的GC工作。这仍旧是一个”stop-the-world”解决方案,但暂停的总时长被协同进行GC的辅助线程分了出去(需要加上一些同步所需的额外开销)。这是三种技术中最早得到实装的。JavaScript堆内存在GC中是暂停的,因此没有任何JavaScript代码在运行。每个辅助线程只需要确保同步访问对象的状态,以保证自己访问的对象不会被其他辅助线程处理。
Incremental is where the main thread does a small amount of work intermittently. We don’t do an entire GC in an incremental pause, just a small slice of the total work required for the GC. This is more difficult, because JavaScript executes between each incremental work segment, meaning that the state of the heap has changed, which might invalidate previous work that was done incrementally. As you can see from the diagram, this does not reduce the amount of time spent on the main thread (in fact, it usually increases it slightly), it just spreads it out over time. This is still a good technique for solving one of our original problems: main thread latency. By allowing JavaScript to run intermittently, but also continue garbage collection tasks, the application can still respond to user input and make progress on animation.
增量(Incremental)指的是主线程间歇性地处理一小部分GC工作。引擎不会在一次增量GC暂停中处理整个GC流程工作,而仅只会处理GC需要做的工作中的一小部分。这非常困难,因为JavaScript在每次增量GC段之间仍旧在运行,这意味着堆内存的状态一直在改变,会导致之前做的增量GC工作无效。从图上可见,这不会减少主线程的GC工作时长(事实上,通常这更会稍稍增加),而是将工作散布到更长的时间段内。这对解决我们一开始的问题:主线程延迟问题来说仍旧是一个好用的技术。在保持主线程间歇性运作的同时,仍旧持续处理垃圾回收任务,应用程序得以继续响应用户的输入和处理动画等。
Concurrent is when the main thread executes JavaScript constantly, and helper threads do GC work totally in the background. This is the most difficult of the three techniques: anything on the JavaScript heap can change at any time, invalidating work we have done previously. On top of that, there are now read/write races to worry about as helper threads and the main thread simultaneously read or modify the same objects. The advantage here is that the main thread is totally free to execute JavaScript — although there is minor overhead due to some synchronization with helper threads.
并发(Concurrent)是指当主线程在不停歇地运行JavaScript的同时,辅助线程在后台处理垃圾回收工作。这是三个技术中最难实现的一个:JavaScript堆内存可能在任何时间点发生任何改动,使得垃圾回收器之前做的工作失效。此外,还有读写竞争的问题需要考虑,因为辅助线程和主线程可能同时在读取或修改同一个对象。优势则是主线程完全从垃圾回收工作中解放出来,得以全身心处理JavaScript执行工作 - 虽然和辅助线程之间同步状态会有小小的额外开销。
Today, V8 uses parallel scavenging to distribute work across helper threads during the young generation GC. Each thread receives a number of pointers, which it follows, eagerly evacuating any live objects into To-Space. The scavenging tasks have to synchronize via atomic read/write/compare-and-swap operations when trying to evacuate an object; another scavenging task may have found the same object via a different path and also try to move it. Whichever helper moved the object successfully then goes back and updates the pointer. It leaves a forwarding pointer so that other workers which reach the object can update other pointers as they find them. For fast synchronization-free allocation of surviving objects, the scavenging tasks use thread-local allocation buffers.
如今,V8使用并行scavenging将新生代的垃圾回收工作分发给辅助线程执行。每个辅助线程会收到一些指针,需要它们处理,快速地将存活对象拷贝到To-Space。当清理一个而对象的时候,scavenging任务必须通过原子性的 读/写/比对清理(compare-and-swap)操作来同步状态;其他的scavenging任务可能会通过不同的路径定位到同一个对象,并也想处理它。无论哪个辅助线程成功完成了对这个对象的操作都会返回并更新对应的指针。一个forwarding指针会被保留下来,其他辅助线程可以根据它们找到这个对象的指针进行对应的更新。对无需同步状态的存活对象进行的内存分配,scavenging任务使用线程本地的内存buffer来处理。
Major GC in V8 starts with concurrent marking. As the heap approaches a dynamically computed limit, concurrent marking tasks are started. The helpers are each given a number of pointers to follow, and they mark each object they find as they follow all references from discovered objects. Concurrent marking happens entirely in the background while JavaScript is executing on the main thread. Write barriers are used to keep track of new references between objects that JavaScript creates while the helpers are marking concurrently.
V8中的Major GC开始使用并发标记。当堆内存达到了一个动态运算上限(dynamically computed limit),并发标记任务就开始了。每个辅助线程都会获得一些指针来处理,然后它们会顺着已发现对象的指针将找到的每个对象进行标记。并发标记完全发生在后台辅助线程上,而主线程仍旧会继续处理程序的运行。Write barriers用来追踪那些在辅助线程并发标记对象时主线程创建出来的新对象之间的引用。
When the concurrent marking is finished, or we reach the dynamic allocation limit, the main thread performs a quick marking finalization step. The main thread pause begins during this phase. This represents the total pause time of the major GC. The main thread scans the roots once again, to ensure that all live objects are marked, and then along with a number of helpers, starts parallel compaction and pointer updating. Not all pages in old-space are eligible for compaction — those that aren’t will be swept using the free-lists mentioned earlier. The main thread starts concurrent sweeping tasks during the pause. These run concurrently to the parallel compaction tasks and to the main thread itself — they can continue even when JavaScript is running on the main thread.
当并发标记完成,或当引擎达到了动态分配上限,主线程会执行一个快速标记完成步骤(a quick marking finalization step)。主线程会在这一步中暂停。这代表了major GC所有的暂停时长。主线程会再次扫描根节点,来保证所有的存活对象都被标记了,然后协同一系列的辅助线程,开始并行清理和指针更新工作。不是所有的老生代内存页都适合压缩(compaction) - 那些不适合压缩的会使用free-list来进行清理,我们之前提到过这点。主线程在暂停中开始并发清理任务。这些工作会并发发生在并行压缩任务上以及发生在主线程自身上 - 即便主线程上的JavaScript正在运行这些工作也可以继续。
Users of JavaScript don’t have direct access to the garbage collector; it is totally implementation-defined. V8 does however provide a mechanism for the embedder to trigger garbage collection, even if the JavaScript program itself can’t. The GC can post ‘Idle Tasks’ which are optional work that would eventually be triggered anyway. Embedders like Chrome might have some notion of free or idle time. For example in Chrome, at 60 frames per second, the browser has approximately 16.6 ms to render each frame of an animation. If the animation work is completed early, Chrome can choose to run some of these idle tasks that the GC has created in the spare time before the next frame.
JavaScript的用户不可以直接操作垃圾回收器;它是完全实现定义的(implementation-defined)。V8确实提供了一个机制让嵌入者触发垃圾回收行为,即便JavaScript程序自身并不可以。垃圾回收器可以发布可选的工作”Idle Tasks”,并最终被触发。V8的嵌入者,比如说Chrome可能对GC触发或闲置有时机规划。比如说Chrome,以每秒60帧状态运作,浏览器有大约16.6毫秒来渲染一个动画的每一帧。如果动画工作提前结束了,Chrome可以选择在下一帧渲染之前来运行一部分GC创建出来的闲置工作(idle tasks)。
For more details, refer to our in-depth publication on idle-time GC.
如需要了解更多细节,请查看our in-depth publication on idle-time GC。
The garbage collector in V8 has come a long way since its inception. Adding parallel, incremental and concurrent techniques to the existing GC was a multi-year effort, but has paid off, moving a lot of work to background tasks. It has drastically improved pause times, latency, and page load, making animation, scrolling, and user interaction much smoother. The parallel Scavenger has reduced the main thread young generation garbage collection total time by about 20%–50%, depending on the workload. Idle-time GC can reduce Gmail’s JavaScript heap memory by 45% when it is idle. Concurrent marking and sweeping has reduced pause times in heavy WebGL games by up to 50%.
V8的垃圾回收器从起始到现在已经走了很长一段路。添加并行、增量、并发技术到现存的垃圾回收器中是一件持续多年的艰苦工作,但得到了回报,许多垃圾回收工作已经移交到了后台线程。这大大优化了 暂停时长、延迟、页面加载 等性能指标,并使得诸如 动画、滚动以及用户界面 更加流畅。Parallel Scavenger将主线程新生代的垃圾回收总时长减少了20%-50%,根据不同的工作负载。Idle-time GC可以在闲置状态减少Gmail的JavaScript堆内存量45%。Concurrent marking and sweeping已经减少了重度WebGL游戏的垃圾回收暂停时长50%。
But the work here is not finished. Reducing garbage collection pause times is still important for giving users the best experience on the web, and we are looking into even more advanced techniques. On top of that, Blink (the renderer in Chrome) also has a garbage collector (called Oilpan), and we are doing work to improve cooperation between the two collectors and to port some of the new techniques from Orinoco to Oilpan.
但工作仍旧未完成。减少垃圾回收暂停时长对给予用户最佳WEB体验至关重要,我们已经开始涉足更先进的技术。除此之外,Blink(Chrome浏览器的渲染引擎)也有一个自己的垃圾回收器(代号Olipan),我们正在努力提升两个垃圾回收器之间的协作,并将部分新技术也应用到Olipan上。
Most developers don’t need to think about the GC when developing JavaScript programs, but understanding some of the internals can help you to think about memory usage and helpful programming patterns. For example, with the generational structure of the V8 heap, short-lived objects are actually very cheap from the garbage collector’s perspective, as we only pay for objects that survive the collection. These sorts of patterns work well for many garbage-collected languages, not just JavaScript.
大部分开发者不需在研发JavaScript应用的时候考虑垃圾回收问题,但理解里面的原理机制能帮助程序员思考内存使用以及编程范式问题。举例来说,因为V8引擎堆内存有分代设计,短期存活的对象对垃圾回收器来说非常廉价,因为只需要处理存活的对象。像这样的原则对很多垃圾回收语言都通用,不仅仅只是JavaScript。
EOF