
Vilani Programming Languages

Eh, most of the time, particularly in modern languages, the stack sets up the call frame, but once invoked, the code uses static offsets into it with very little actual active stack manipulation going on.
I think we might be arguing the same thing. There is a world of difference between the way a language like C uses the stack and the way a stack-based ISA like FORTH's uses it. In a true stack machine, the stack is implicit in the opcode. All instructions implicitly read from the top of the stack and push the result back onto the stack. You don't need to specify anything. This also means the instructions themselves contain very little information about data flows that can be used for compile-time or run-time optimisation.
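To make the "implicit operands" point concrete, here's a minimal sketch (in Python, opcode names made up for illustration) of a true stack machine. Notice that no instruction names its inputs or outputs, so the data flow only becomes visible when you actually execute or symbolically trace the program:

```python
# A minimal stack-machine interpreter: every instruction implicitly pops
# its operands from the top of the stack and pushes its result back.
# The opcodes carry no operand information at all.

def run(program):
    stack = []
    for op, *arg in program:
        if op == "push":            # push a literal value
            stack.append(arg[0])
        elif op == "add":           # implicit: pop two, push sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":           # implicit: pop two, push product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack

# (2 + 3) * 4 — no instruction says where its inputs come from
result = run([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)])
print(result)  # [20]
```

The opacity is the point: to know that `add` here consumed the 2 and the 3, you have to simulate the stack, which is exactly the extra work a superscalar dispatcher would have to do.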

On a C-style stack, the stack frames are just chunks of memory. The CPU still has to read data into registers in order to work on it. Compilers can optimise register allocation to minimise memory traffic, and the decode stages of the CPU can do a relatively efficient graph analysis based on register usage to identify instructions that are independent of each other's results, and use this to dispatch a single instruction stream across multiple execution units. This allows more than one instruction to be executed concurrently where there are no dependencies between instructions.

A stack machine like FORTH or Java has everything going on and off the stack, and requires you to interpret what the code is doing in order to work out what might be able to execute in parallel, whereas with a register-based machine you can do the dependency analysis solely in terms of dependencies between input and output registers on the instructions. Instructions with no dependencies on each other's output registers can be executed in parallel. This type of graph analysis can be done in hardware quickly enough to run in the instruction pipeline of a modern CPU chip.

In practice, JIT systems for Java convert the (stack-based) Java bytecode to a virtual register machine when compiling to native code, essentially constructing a data-flow trace called a trace tree and then allocating the data items identified through this to virtual registers. Once the individual data items are identified and allocated to virtual registers, the same sort of optimisations can be done as with a natively register-based architecture. However, unlike a native register-based architecture, this type of analysis is too complex to be done on the fly by the decode functions on a CPU chip, so the translation has to be done in software, necessitating the JIT compilation process.
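The core of that stack-to-register translation can be sketched in a few lines: walk the stack code once with a *symbolic* stack holding virtual register names instead of values, emitting three-address code as you go. (This is a simplified illustration of the general technique; the opcode names are made up, not real JVM bytecode, and real JITs layer trace recording and much more on top.)

```python
# Abstract interpretation of stack code: the stack holds virtual register
# names rather than values, and each stack operation emits an equivalent
# three-address instruction. Afterwards the data flow is explicit in the
# register operands, so register-style optimisation can be applied.

def to_registers(bytecode):
    stack, code, fresh = [], [], iter(range(10**6))
    for op, *arg in bytecode:
        if op == "push":
            r = f"v{next(fresh)}"
            code.append(f"{r} = const {arg[0]}")
            stack.append(r)
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            r = f"v{next(fresh)}"
            code.append(f"{r} = {op} {a}, {b}")
            stack.append(r)
    return code

# The same (2 + 3) * 4 program, as stack code:
for line in to_registers([("push", 2), ("push", 3), ("add",),
                          ("push", 4), ("mul",)]):
    print(line)
# v0 = const 2
# v1 = const 3
# v2 = add v0, v1
# v3 = const 4
# v4 = mul v2, v3
```

Once it's in this form, the dependency between `v2 = add v0, v1` and `v4 = mul v2, v3` is visible in the operands alone, which is why the JIT output can then be scheduled like native register code.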

[...] so here we have, effectively, a register array used as a stack.
It's not really a valid analogy - the zero page has its own addressing modes which are slightly faster than the rest of the address space, but it's still memory, addressable via an offset in an index register, and you still have to load into the accumulator to do anything with it.

I did have a discussion of SPARC register windows and caching the top of the stack, plus the differences between C-style and FORTH style stack usage as they affect the ability to optimise code, but the system seems to have lost the posting and I can't be arsed re-typing it - on a level that you can't begin to comprehend.

So, here's a blurb from Sun's docs about how register windows work on a SPARC chip. http://icps.u-strasbg.fr/people/loechner/public_html/enseignement/SPARC/sparcstack.html

It's about as close as you will get to what you describe but it's still C-style stack usage and not a true stack machine.

Here is a blurb about trace trees and their role in JIT compilation architecture.

https://wiki.aalto.fi/download/attachments/40010375/jit_tt_presentation.pdf

TL;DR. You can't optimise stack machines for superscalar operation without doing static code analysis that's too expensive to do at runtime (and essentially converts them to a register VM anyway), but they are still interesting for low-cost/low-power operations and you can buy (for example) ARM chips with built-in hardware JVMs. For performance-sensitive operations, just-in-time compilation gives you better performance.
 