Dynamic Language VMs: Inside Ruby - Lourens Naudé
Friday, 4 December 2009
Background
• Freelance Ruby/C/Systems Developer
• http://github.com/methodmissing
• Contractor at Trade2Win Ltd.
• Realtime Forex / Autotrading Platform
Process
• Front-end (parsing)
• Semantics
• Back-end (runtime)
Roadmap
• Source to Nodes and the AST
• VM : Symbol table, caches, opcode dispatch and optimizations
• Object Model : Objects, methods and variables
• Garbage Collection
• Contexts : Threads, GVL, Fibers
Source to AST
Lexical Analysis
• Converts source code to a token stream
• Token identification (keyword_class, keyword_module etc.)
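As a rough illustration of the token stream, the sketch below uses Ripper, the lexer / parser binding in CRuby's standard library. Ripper reports event names such as :on_kw instead of the parser's internal token names (keyword_class etc.), so the mapping to the bullets above is only approximate. The helper name token_summary is made up for this example.

```ruby
require 'ripper'

# Hypothetical helper: lex a snippet and keep only [event, text] pairs.
# Each raw Ripper.lex entry starts with [[line, column], event, text].
def token_summary(source)
  Ripper.lex(source)
        .reject { |token| token[1] == :on_sp }   # drop whitespace tokens
        .map { |token| [token[1], token[2]] }
end

token_summary("class Foo; end").each do |event, text|
  puts "#{event} #{text.inspect}"
end
```

Keywords such as class and end surface as :on_kw events, while Foo is reported as a constant token.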
Grammar
• Describes program syntax structure
• Semantics of a program are defined by its syntax
• Production rules : name and use case
• Object#block_arg(&block)
• Object#opt_block_arg(arg1, arg2, &block)
Abstract Syntax Tree
VM Architecture
• Reuse of some 1.8.x series architecture : parsing, AST nodes, Object, GC etc.
• Introduces a code generation phase to convert the AST to instruction sequences for better optimization hooks and faster runtime
• No speedup for inherited MRI features such as string processing etc.
AST Annotations
• Represents grammar
• Sometimes referred to as an annotated AST
• Annotations / attributes attach semantics to nodes
• Literals, values, statements, callsite info ( file and line number )
• Can be augmented with semantic analysis
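To make the idea of an annotated tree concrete, here is a small sketch using Ripper.sexp from the standard library. Note the assumption: Ripper's S-expressions are a convenient stand-in and not the NODE structures MRI uses internally. The helper name annotated_tree is invented for this example.

```ruby
require 'ripper'
require 'pp'

# Hypothetical helper: parse source into nested S-expressions.
# Leaf nodes such as [:@ident, "a", [1, 0]] carry a [line, column] annotation.
def annotated_tree(source)
  Ripper.sexp(source)
end

pp annotated_tree("a = 1")
```

The position pairs on the leaves play the role of the callsite info (line number) mentioned above.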
AST Transformation
• Removes AST noise
• Refactors to features that map closer to machine instructions
• Usually yields more AST nodes, but reduces overall complexity
Intermediate Tree Nodes
• Minimal subset required for code generation
• Expressions and assignments
• Method calls, arguments and return values
• Conditional jumps - if/else, iterators
• Unconditional jumps - exceptions, retry, catch/throw
Code Generation
• Converts AST to code segments - a linear instruction set
• Selection : Which tree sections to rewrite ?
• AST Node -> instruction ordering
• Narrow tree scope considers only small subsets of the AST to reduce the inherent complexity of code generation
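The output of this phase can be inspected on CRuby itself. The sketch below assumes a CRuby (YARV) interpreter, since RubyVM::InstructionSequence is not available on other implementations; the exact listing differs between Ruby versions, so none is reproduced here. The helper name instruction_listing is made up for this example.

```ruby
# Hypothetical helper: compile source and return the linear instruction
# listing that YARV generated for it (CRuby only).
def instruction_listing(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

puts instruction_listing("a = 1; b = a + 2")
```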
Codegen Workflow
• Preprocessing : AST node refactorings ( YARV doesn’t do this )
• Codegen : Nodes to instruction sequences
• Postprocessing : Generated instruction sequences replaced with optimal ones - compiled instruction sequences and peephole optimization
• Pre and Postprocessing phases may benefit from multiple passes
VM Internals
Symbol (Hash) Table
• Access to int/char indexed values in almost constant time with a hash table
• Lookup of methods, ivars, global vars, encodings, VM instructions etc.
• Table defaults to 11 bins and a max of 5 entries per bin. The bin count can increase.
• Lookup inside a bin is sequential, so access slows down once density exceeds 5 entries per bin
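The bin and density behaviour can be sketched in a few lines. This is a toy model and not MRI's st.c: the class name BinTable, the growth rule (double plus one) and the use of Object#hash are all assumptions made for illustration; only the 11 initial bins and the density of 5 come from the slide.

```ruby
# Toy chained hash table (hypothetical, not the real st.c).
class BinTable
  INITIAL_BINS = 11   # default bin count from the slide
  MAX_DENSITY  = 5    # average entries per bin before the table grows

  attr_reader :bins

  def initialize
    @bins = Array.new(INITIAL_BINS) { [] }
    @size = 0
  end

  def []=(key, value)
    entry = bin_for(key).find { |k, _| k.eql?(key) }   # sequential scan
    return entry[1] = value if entry
    grow if @size + 1 > MAX_DENSITY * @bins.size
    bin_for(key) << [key, value]
    @size += 1
    value
  end

  def [](key)
    entry = bin_for(key).find { |k, _| k.eql?(key) }
    entry && entry[1]
  end

  private

  def bin_for(key)
    @bins[key.hash % @bins.size]
  end

  # Assumed growth rule: roughly double the bins and redistribute entries.
  def grow
    entries = @bins.flatten(1)
    @bins = Array.new(@bins.size * 2 + 1) { [] }
    entries.each { |k, v| bin_for(k) << [k, v] }
  end
end

table = BinTable.new
100.times { |i| table[:"sym#{i}"] = i }
puts table.bins.size
```

Growing the bin count keeps the sequential scan inside each bin short, which is what keeps lookups close to constant time.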
Symbols - VMNS :-)
• An entity with both a String and Number representation
• It does NOT contain a String or Number, simply points to a hash table entry
• Developer identifies by name, VM identifies by its numeric representation
• Immutable (4 bytes per Symbol) for performance benefits
• DNS analogy : developers prefer named entities, runtime prefers numerical representations
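The identity property is easy to observe from Ruby itself. A minimal sketch, using only core methods; the method names below are invented for the example:

```ruby
# The same Symbol literal always refers to one and the same object...
def same_symbol_object?
  :price.equal?(:price)
end

# ...while two Strings with equal content are distinct objects.
def same_string_object?
  String.new("price").equal?(String.new("price"))
end

# Conversion in both directions goes through the symbol table.
def round_trip?
  "price".to_sym.equal?(:price) && :price.to_s == "price"
end

puts same_symbol_object?
puts same_string_object?
puts round_trip?
```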
VM Opcodes
• Stateless functions that operate on a Stack Machine
• 79 instructions as of Dec 4, 2009
• Notation : instruction / opcode / operands
Instruction Categories
• variable : get or set local variable
• put : push an object onto the stack
• stack : pop from stack, empty the stack
• setting : is a given variable defined ?
• class/module : define a class / module
• method/iterator : invoking methods, calling blocks
• exception :
• jump : control flow branching
• optimization : redefines +, <<, * etc. in some cases
Pure Stack Machine
• 2 instruction types
• Move / copy value(s) between top of stack and elsewhere
• Operate on the top stack element(s)
• SP : top of stack pointer
• BP : beginning of stack pointer
Stack Machine
• Put 3 strings on the stack, “a”, “b” and “c”
• Fetch the top 3 stack elements and create an Array from them
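The two steps above can be replayed on a toy stack machine. The opcode names putstring and newarray are borrowed from YARV for flavour, but the interpreter itself (run_program) is a hypothetical sketch and not how the VM is implemented.

```ruby
# Toy stack machine: each instruction is [opcode, operand].
def run_program(program)
  stack = []
  program.each do |opcode, operand|
    case opcode
    when :putstring then stack.push(operand.dup)        # move a value onto the stack
    when :newarray  then stack.push(stack.pop(operand)) # operate on the top N elements
    else raise ArgumentError, "unknown opcode #{opcode}"
    end
  end
  stack
end

PROGRAM = [
  [:putstring, "a"],
  [:putstring, "b"],
  [:putstring, "c"],
  [:newarray, 3]
].freeze

p run_program(PROGRAM)
```

After the last instruction the stack holds a single element: the Array built from the three strings.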
Instruction Sequence
• A flat instruction sequence structure is much faster than traversing tree nodes, but instruction dispatch from this pipeline can be a bottleneck
• Ability to optimize simple instructions is very important
• Native code / language extensions are usually only a small subset of the hot path
• Native DB socket layer VS multi-model ORM in Ruby
• Direct Threaded Dispatch : fastest way to the next VM instruction
• Switch Dispatch : slower, but more portable
Switch Dispatch
• Most portable, but much slower due to excessive CPU branch mispredictions
• Executes more native instructions per opcode dispatch
• Average 50% slower than Threaded Dispatch
Direct Threaded Dispatch
• Represents an instruction by the address of the routine that implements it
• Jumps context to the address of the current instruction and bumps the PC
• Requires first class labels and some GCC help, hence a portability concern
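The difference between the two dispatch styles can be approximated in Ruby, with a caveat: Ruby has no computed goto, so lambdas stand in for the routine addresses that a C implementation would jump to. All names here (HANDLERS, run_switch, thread_program, run_threaded) are hypothetical.

```ruby
HANDLERS = {
  push: ->(stack, operand) { stack.push(operand) },
  add:  ->(stack, _operand) { stack.push(stack.pop + stack.pop) }
}.freeze

# Switch dispatch: every step decodes the opcode through one central branch.
def run_switch(program)
  stack = []
  program.each do |opcode, operand|
    case opcode
    when :push then stack.push(operand)
    when :add  then stack.push(stack.pop + stack.pop)
    end
  end
  stack.last
end

# Threaded style: opcodes are replaced once, up front, by the routine that
# implements them, so the run loop never decodes an opcode again.
def thread_program(program)
  program.map { |opcode, operand| [HANDLERS.fetch(opcode), operand] }
end

def run_threaded(threaded)
  stack = []
  threaded.each { |handler, operand| handler.call(stack, operand) }
  stack.last
end

program = [[:push, 2], [:push, 3], [:add, nil]]
puts run_switch(program)
puts run_threaded(thread_program(program))
```

Both loops compute the same result; the threaded variant simply moves the opcode lookup out of the hot loop, which is the point of the technique.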
VM Versioning
• Each VM instance has a state counter used to scope caches to the current VM state
• Lazy cache invalidation: bumping the version value avoids any cache expiry overhead
• Expired on : const definition, constant removal, method definition, method removal and method cache changes (covered later)
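Lazy invalidation through a state counter can be sketched as follows. The class VersionedCache is a hypothetical model, not the VM's data structure: entries remember the version they were stored under, and bumping the counter makes all of them stale without touching any of them.

```ruby
class VersionedCache
  def initialize
    @version = 0
    @entries = {}
  end

  # Invalidate everything in O(1): no entry is visited or deleted.
  def bump!
    @version += 1
  end

  def fetch(key)
    version, value = @entries[key]
    return value if version == @version   # fresh entry: cache hit
    value = yield                          # stale or missing: recompute
    @entries[key] = [@version, value]
    value
  end
end

cache = VersionedCache.new
cache.fetch(:answer) { 42 }   # miss: computes and stores 42
cache.fetch(:answer) { 0 }    # hit: the block is not called
cache.bump!                   # e.g. a method was redefined
cache.fetch(:answer) { 7 }    # stale entry ignored, recomputed
```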
Common Optimizations*
• Constant folding
• Constant propagation
• Dead code elimination
• Subexpression elimination
• Method in-lining
Static Analysis Notes
• Examining source code without execution
• Dynamic analysis : Runtime introspection
• Cannot assume much beyond literals in Ruby ...
• Constants can be redefined
• Open classes imply methods can be redefined at any time
• Object#method_missing
• Methods don't have an explicit return type
Constant Folding
• Compile time constant expression evaluation
• Strength reductions : replace operations with cheaper ones
• Null sequences : operations that can be removed
• Very hard to pull off due to the dynamic nature of the Ruby spec
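The three ideas above can be shown on a tiny expression tree. This is a sketch under a strong assumption: every variable is an Integer and no operator has been redefined, which is exactly what a Ruby compiler cannot take for granted. The tree format ([operator, left, right]) and the function fold are invented for the example.

```ruby
# Fold an expression tree of the form [operator, left, right].
# Leaves are Integers (literals) or Symbols (variables).
def fold(node)
  return node unless node.is_a?(Array)
  op = node[0]
  left = fold(node[1])
  right = fold(node[2])
  if left.is_a?(Integer) && right.is_a?(Integer)
    left.public_send(op, right)   # constant folding: evaluate at compile time
  elsif op == :* && right == 2
    [:<<, left, 1]                # strength reduction: shift instead of multiply
  elsif op == :+ && right == 0
    left                          # null sequence: x + 0 is just x
  else
    [op, left, right]
  end
end

p fold([:+, [:*, :x, 2], [:*, 3, 4]])
p fold([:+, :x, [:-, 5, 5]])
```

If Integer#* or Integer#+ were redefined at runtime, every one of these rewrites would silently change program behaviour.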
Code Elimination
• Remove code segments without data flow
• Works very well with static analysis, but tricky to pull off in Ruby
Subexpression elimination
• Expression reuse by extracting to a temporary variable
Constant Propagation
• Replace a literal variable reference with its value
In-lining
• Replaces a method call with its body to reduce function call overhead
• Very efficient in iterator contexts
• Opportunity for further optimization
• Not a silver bullet - excessive in-lining can overload the instruction cache
• Some cases change semantics
Cloning
• Copies a method to replace a common call pattern
• Identified with static analysis, thus of limited use to Ruby
Peephole Optimization
• Replace generated instruction sequences with more efficient ones
• Benefit is directly proportional to the quality of the code generated
• Removes useless flow control
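A single peephole pass can be sketched over a made-up instruction list. The instruction format ([opcode, operand]), the PURE_PUSHES set and the two rewrite rules are assumptions chosen for illustration; they are not the rules YARV applies.

```ruby
# Pushes with no side effects, so a following pop makes them redundant.
PURE_PUSHES = %i[putnil putobject putstring dup].freeze

def peephole(insns)
  out = []
  insns.each_with_index do |insn, i|
    following = insns[i + 1]
    # Useless flow control: a jump to the label that immediately follows.
    next if insn[0] == :jump && following == [:label, insn[1]]
    # A side-effect-free push directly followed by a pop cancels out.
    if insn[0] == :pop && out.last && PURE_PUSHES.include?(out.last[0])
      out.pop
      next
    end
    out << insn
  end
  out
end

before = [
  [:putstring, "x"], [:pop],
  [:putobject, 1],
  [:jump, :L1], [:label, :L1],
  [:leave]
]
p peephole(before)
```

Each rule only looks at a window of two adjacent instructions, which is where the technique gets its name.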
Object Model
Object Requirements
• Identity : unique identifier to represent the object at runtime
• Stateful : ability to maintain state
• Methods : exposes methods to change / query object state
Base Object Structure
• VALUE is a pointer type that represents the addresses of language structures
• Pointer cast dereferences VALUE to an object structure
• RBASIC(obj)->flags; /* ((struct RBasic *)obj)->flags */
• Flags : frozen, marked, tainted etc.
Classes / modules
• Symbol tables for methods, class and instance variables
• Class / module distinction through flags
• RCLASS(a_str)->ptr.super #=> Object
• RCLASS(a_fixnum)->ptr.super #=> Integer
Immediates
• Small enough to fit in a VALUE
• No Runtime casting overheads
• nil = 4
• true = 2
• false = 0
• Symbols
• Fixnums <= 30 bits
• Float, Bignum are complex objects, hence poor FP benchmarks
• RFLOAT(float_obj)->float_value #=> a double
Object Layout
• Assuming a 32bit architecture ...
• sizeof(VALUE) is 4 bytes
• Objects are even - multiples of 4
• Symbols are even - multiples of 8
• Integers are odd
• Immediates < 4
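The tagging scheme can be modelled with plain integer arithmetic. This sketch assumes the special constant values listed on the Immediates slide (false = 0, true = 2, nil = 4) and the usual Fixnum encoding of shifting left by one and setting the low bit; the names SPECIALS, fixnum_to_value and classify are hypothetical, and Symbols are deliberately not modelled.

```ruby
# Special constants as listed on the slide (32bit MRI of that era).
SPECIALS = { 0 => false, 2 => true, 4 => nil }.freeze

# Tag a small integer: shift left by one and set the low bit, so it is odd.
def fixnum_to_value(n)
  (n << 1) | 1
end

# Classify a raw VALUE word without dereferencing anything.
def classify(value)
  return [:fixnum, value >> 1] if value.odd?
  return [:special, SPECIALS[value]] if SPECIALS.key?(value)
  return [:heap_object, value] if (value % 4).zero?
  [:other_immediate, value]   # other tagged immediates, not modelled here
end

p classify(fixnum_to_value(21))
p classify(4)
p classify(0x1000)
```

Because heap objects are aligned to multiples of 4, an odd word can never be a valid pointer, which is what makes the Fixnum check a single bit test.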
Mutable Objects
• Mutable Strings and Arrays require the ability to shrink / grow capacity
• Allocates slightly more memory than is required to represent object data in order to avoid malloc, realloc and memmove operations in common cases
• Capacity for short strings and small arrays : “str” and %w(s t r)
Shared Objects
• Literal declarations of Arrays and Strings are shared amongst instances
• Avoids duplicates with this “copy-on-write” (COW) scheme
• An attempt to modify creates a copy of the object, and modifies the copy
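The copy-on-write scheme can be mimicked with a small wrapper. The class CowString is a hypothetical model: inside the VM the sharing happens at the C level and is invisible to Ruby code, so this only illustrates the principle.

```ruby
class CowString
  def initialize(buffer)
    @buffer = buffer
    @shared = true            # the buffer may be referenced by other instances
  end

  def to_s
    @buffer
  end

  def <<(extra)
    if @shared
      @buffer = @buffer.dup   # first write: take a private copy
      @shared = false
    end
    @buffer << extra
    self
  end
end

LITERAL = "str".freeze
a = CowString.new(LITERAL)
b = CowString.new(LITERAL)
b << "ing"
puts a.to_s
puts b.to_s
```

Only the instance that was written to pays for a copy; readers keep pointing at the shared literal.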
Object Method Dispatch
• Loose typing and open classes mean that method calls could never be reduced to a single CALL instruction
• Method dispatch in OO languages requires methods to be searched for, on the object itself, superclasses etc.
Call VS Send
• object.__send__(:method)
• We don’t call functions / routines, rather send a command or query message to an object
• Ruby methods always return a value, thus RPC style messaging
• Method cache is like a router
• Method redefinition clears the method cache / router
• “Routing” overhead for subsequent method calls
Cache - before include
Cache - after include
• ALL methods on ALL classes invoked since VM startup are expired
• DON’T extend / include in a request / response cycle
• Rails busts the method cache multiple times on boot
Method cache - Warm
• Average 95% hit rate
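The router analogy can be sketched with a cache in front of an ancestor chain walk. The class MethodCache is a hypothetical model: it uses Module#ancestors and Module#instance_methods for the slow path, whereas the VM searches its own method tables, and flush! stands in for the global expiry described above.

```ruby
class MethodCache
  attr_reader :hits, :misses

  def initialize
    @entries = {}
    @hits = 0
    @misses = 0
  end

  # Resolve which class or module in the ancestor chain owns the method.
  def lookup(klass, name)
    key = [klass, name]
    if @entries.key?(key)
      @hits += 1
    else
      @misses += 1
      # Slow path: walk the ancestor chain until a module defines the method.
      @entries[key] = klass.ancestors.find do |mod|
        mod.instance_methods(false).include?(name)
      end
    end
    @entries[key]
  end

  # Mimics the global flush triggered by include / extend / redefinition.
  def flush!
    @entries.clear
  end
end

cache = MethodCache.new
3.times { cache.lookup(Array, :each_slice) }
puts "#{cache.hits} hits, #{cache.misses} misses"
```

After a flush every call site pays the slow path again, which is why busting the cache inside a request / response cycle is expensive.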
Instance Variables
• Optimization : the first 3 ivars are embedded on the object, i.e. no symbol table lookups required
• Index table per class VS a symbol table per object on MRI 1.8
• Index table is shared by all instances of the same class
• Saves on the memory footprint of a table per instance
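The shared index table idea can be modelled in a few lines. ClassLayout and SlimObject are hypothetical names; the sketch only shows why one name-to-index table per class plus a plain value array per instance is cheaper than a full table per instance.

```ruby
# One per class: maps an ivar name to a slot index.
class ClassLayout
  attr_reader :index_table

  def initialize
    @index_table = {}
  end

  def index_for(name)
    @index_table[name] ||= @index_table.size
  end
end

# One per instance: just the values, addressed by index.
class SlimObject
  def initialize(layout)
    @layout = layout   # shared by every instance of the "class"
    @slots = []        # per-instance values only
  end

  def ivar_set(name, value)
    @slots[@layout.index_for(name)] = value
  end

  def ivar_get(name)
    index = @layout.index_table[name]
    index && @slots[index]
  end
end

layout = ClassLayout.new
a = SlimObject.new(layout)
b = SlimObject.new(layout)
a.ivar_set(:@x, 1)
b.ivar_set(:@y, 2)
p layout.index_table
```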
Garbage Collection
Process Memory Layout
• Code segment : executable code, read only area
• Stack segment : stack storage, addressed with stack pointers
• Heap : stretch of memory available for program / developer use
malloc / free layout
• Free chunks == the free list
• Linear search overhead to find free chunks
A better layout
• Free chunks indexed by size intervals
Garbage Collection
• Objects allocated explicitly on the heap
• Automatically reclaim memory chunks not accessible from the root set
• Root set : C stack, global vars, global constants (accessible without pointer scanning)
• Unreachable hooks : variable assignment (nil), method return etc.
• Stop the World : halts execution to reclaim memory, very disruptive when in the hot path
• Incremental : some collection actions occur for each allocation, smoother and suitable for realtime requirements
GC Algorithms
• Most scripting languages implement either of the following
• Mark and Sweep : identifies reachable chunks and assumes the remainder is garbage (concerned with garbage)
• Stop and Copy : 2 heap spaces, copies reachable chunks to the new active heap area (concerned with live chunks)
GC Issues
• Memory fragmentation
• Dangling pointers
• Memory leaks from incomplete recycling (circular garbage and conservative GC)
• Bursty allocation
• Knowledge of pointer and chunk layouts required
Ruby heap layout
• Multiple heaps, referenced through the heap list
• Heaps are freed when empty, IF all slots are tagged free
• Ballpark : Rails allocates 4 to 6 heaps on startup
Per heap slots layout
• Each slot references a single object
• 10 000 slots per Ruby heap
• Threshold of 4096 free slots per heap
• Free list points to the next free slot
Heaps and slots layout
Pointer Layout
• Pointer layout of both the program data area and heap is self describing
• RVALUE union can accommodate any Ruby object
• Layout of Ruby frames, global variable structures etc. is well defined
• 20 bytes (32bit arch) of Ruby heap space are required to represent a slot
Ruby Heap VS OS Heap
• Slot points to the actual object data, on the OS / system heap
• A 20 byte (32bit arch) slot can reference e.g. a 2MB chunk on the system heap
CRuby: Mark and Sweep
• Conservative : cannot determine with certainty whether a given value is a pointer, so it assumes the value is in use
• Two phase implementation
• Mark phase : marks all objects reachable from the current program context
• Sweep phase : iterates through the object space, frees all objects not marked and unmarks the marked ones
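The two phases can be sketched on a toy object graph. Slot, mark and collect are hypothetical names, and unlike CRuby's collector this model is precise rather than conservative because it knows exactly which fields are references.

```ruby
# A toy heap slot: a name, outgoing references and a mark bit.
Slot = Struct.new(:name, :refs, :marked)

# Mark phase: recursively flag everything reachable from a slot.
def mark(slot)
  return if slot.marked                    # already visited, also stops cycles
  slot.marked = true
  slot.refs.each { |child| mark(child) }
end

# Sweep phase: unmarked slots are garbage, survivors get unmarked again.
def collect(heap, roots)
  roots.each { |root| mark(root) }
  live, garbage = heap.partition(&:marked)
  live.each { |slot| slot.marked = false }
  [live, garbage]
end

b = Slot.new("b", [], false)
a = Slot.new("a", [b], false)
c = Slot.new("c", [], false)
d = Slot.new("d", [c], false)
c.refs << d                                # c and d form an unreachable cycle

live, garbage = collect([a, b, c, d], [a])
puts "live: #{live.map(&:name).join(', ')}"
puts "garbage: #{garbage.map(&:name).join(', ')}"
```

The unreachable cycle is reclaimed, which a pure reference counting scheme would miss.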
Pros and Cons
• Pauses program execution
• Work is proportional to the heap size
• Prone to memory fragmentation (no compaction)
• Recursive
• Every 8MB allocated triggers GC
• 8m malloc calls also trigger GC
• Frees all* memory that can be freed
Source representation
Objectspace
Objectspace - marked
Objectspace after sweep
Generational GC
• Vast majority of objects are short lived ( 80% + )
• Expensive to continuously account for long lived objects
• Partition objects by age and collect short lived ones more frequently OR
• Restrict GC to the most recently modified slots
• Perform a full GC only when the younger generation fails to meet current memory requirements
Context Switches
Threading
• First CRuby to support native OS Threads
• Ruby thread == pthread
• Scheduling, synchronization and creation are delegated to syscalls, which implies a user / kernel space context switch
• Can use multiple CPU cores - NOT at the same time though
• No parallel execution - Global VM Lock (GVL)
• ... although MacRuby doesn’t have a GVL
Global VM Lock (GVL)
• Thread that owns the GVL is allowed to execute
• Blocking operations should release the GVL to not block the process
• Also released during scheduling
• Allows for easy C extensions - authors don’t have to concern themselves with synchronization
• The Kernel’s better suited for load balancing multiple processes than most developers can squeeze from a single process
• Constraintless Threading is a weapon of mass destruction
• The effect on the performance of existing apps that rely on user space threads from MRI 1.8 may be significant
• Unix pipes are often the best scheduler ...
Releasing the GVL
• Internal API exposed to release the GVL
• Blocking function : slow system call / computation
• Unblock function : called on Thread interrupt
• Dangerous territory - look for alternatives first
• Cannot access Ruby VALUEs in blocking functions
• No exception handling
Blocking VM Operations
• IO : potentially blocking reads / writes
• DNS resolution / connects : often have a lot more handshake overhead
• Expensive Bignum computations blocked 1.8 interpreters
• File locking
• Process#waitpid
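The effect of releasing the GVL around a blocking operation can be observed from plain Ruby. This sketch uses Kernel#sleep as a stand-in for blocking IO and assumes an interpreter with native threads; the helper name timed_sleepers is invented, and the timings are indicative only.

```ruby
require 'benchmark'

# Start `count` threads that each block for `seconds`, then wait for all.
# Returns the wall clock time the whole batch took.
def timed_sleepers(count, seconds)
  Benchmark.realtime do
    Array.new(count) { Thread.new { sleep seconds } }.each(&:join)
  end
end

elapsed = timed_sleepers(4, 0.2)
puts format("4 threads x 0.2s of blocking took %.2fs", elapsed)
```

Because a sleeping thread gives up the GVL, the four waits overlap and the total stays far below the 0.8 seconds that running them one after another would take. CPU bound Ruby code would not overlap in the same way.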
Fibers
• Coroutines for lightweight concurrency (4k stack size)
• Very fast user space context switches
• Cooperative scheduling required - also not concurrent
• Common use cases being generators or blocking IO e.g. Neverblock
• Fiber.yield pauses the activation record, which keeps context across multiple calls
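The generator use case can be sketched with the core Fiber API. The method name fibonacci_fiber is made up for the example; Fiber.new, Fiber.yield and Fiber#resume are the real calls.

```ruby
# Returns a Fiber that produces the next Fibonacci number on every resume.
def fibonacci_fiber
  Fiber.new do
    a, b = 0, 1
    loop do
      Fiber.yield a        # pause here; local state survives until resumed
      a, b = b, a + b
    end
  end
end

generator = fibonacci_fiber
first_six = Array.new(6) { generator.resume }
p first_six
```

The locals a and b live in the paused activation record, so no state has to be stored outside the fiber between calls.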
The Road Ahead
• MVM : Multiple Virtual Machines
• Shared process space, cannot share state
• Distribute VMs across multiple cores
• Message passing / channel API for inter VM communication
• Many Ruby deployments are not thread safe - MVM is better suited for this use case
• Thread safe framework does not guarantee a thread safe application ...
Questions ?
Thanks for Listening !
@methodmissing
http://github.com/methodmissing
http://www.methodmissing.com