I've been going down a bit of a rabbit hole trying to figure out the best way to design a bytecode instruction class.
To be clear, an instruction is something that holds an opcode plus a varying number of operand values, depending on the instruction type.
After looking at a few open source projects I've noticed two primary types of design:
The common polymorphic approach:
```cpp
struct Operands
{
    virtual ~Operands() = default; // needed: instances are deleted through a base pointer
    virtual size_t GetNOperands() = 0;
    // U32 is the largest operand type (there are no S32-or-wider operands)
    virtual U32 GetOperand(size_t index) = 0;
    virtual void SetOperand(size_t index, U32 value) = 0;
};

struct NoOperands       : public Operands { /* impl Operands interface */ };
struct PushInstrOperands : public Operands { /* impl Operands interface */ U8 pushVal1; U8 pushVal2; };
struct RetInstrOperands  : public Operands { /* impl Operands interface */ U16 retValue; };

struct Instr
{
    U8 OpCode;
    Operands* Ops;
    // possibly include a constructor that ensures the operand type matches OpCode
};
std::vector<Instr> code =
{
    Instr{NOP,  new NoOperands{}},
    Instr{PUSH, new PushInstrOperands{43, 43}},
    Instr{RET,  new RetInstrOperands{0}},
};
```
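To make the sketch concrete, here is one way the `/* impl Operands interface */` bodies could be filled in for the PUSH case (repeating the interface for completeness; the constructor and the index handling are my own choices, not from any particular project):

```cpp
#include <cstddef>
#include <cstdint>

using U8  = std::uint8_t;
using U32 = std::uint32_t;

struct Operands
{
    virtual ~Operands() = default; // deleted through a base pointer
    virtual std::size_t GetNOperands() = 0;
    virtual U32 GetOperand(std::size_t index) = 0;
    virtual void SetOperand(std::size_t index, U32 value) = 0;
};

struct PushInstrOperands : public Operands
{
    U8 pushVal1;
    U8 pushVal2;

    PushInstrOperands(U8 a, U8 b) : pushVal1(a), pushVal2(b) {}

    std::size_t GetNOperands() override { return 2; }
    U32 GetOperand(std::size_t i) override { return i == 0 ? pushVal1 : pushVal2; }
    void SetOperand(std::size_t i, U32 v) override
    {
        (i == 0 ? pushVal1 : pushVal2) = static_cast<U8>(v);
    }
};
```

Every call through an `Operands*` here goes through the vtable, which is the cost being weighed against the second approach below.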
The union with encoded format strings approach:
```cpp
enum class OperandType : char
{
    U8  = 'B',
    U16 = 'S',
    U32 = 'I',
};

// one format string per opcode, e.g. "BB" = two U8 operands
// (assumes NOP, PUSH, RET are opcodes 0, 1, 2)
static const char* InstrFormats[_N_OPCODES] = {
    /*NOP*/  "",
    /*PUSH*/ "BB",
    /*RET*/  "S",
};

struct Instr
{
    U8 OpCode;
    union
    {
        U8 StackOperands[ sizeof(void*) + 3 ]; // +3 reuses the padding left after the 1-byte OpCode
        void* HeapOperands;
    };

    size_t GetNOperands()
    {
        return strlen( InstrFormats[OpCode] );
    }

    // As an optimization, any operand set that fits in sizeof(void*) + 3 bytes
    // is stored inline in StackOperands instead of behind HeapOperands
    U32 GetOperand(size_t index) { /*...*/ }
    void SetOperand(size_t index, U32 value) { /*...*/ }

    // plus a templated constructor that asserts the arguments match
    // InstrFormats[OpCode] in count and width
};
std::vector<Instr> code =
{
    Instr{NOP},
    Instr{PUSH, 24, 73},
    Instr{RET, 0},
};
```
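For reference, the `GetOperand` body elided as `/*...*/` above might decode the format string roughly like this. `DecodeOperand` is a hypothetical free-standing helper of my own, and I'm assuming operands are packed back-to-back with no padding:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using U8  = std::uint8_t;
using U16 = std::uint16_t;
using U32 = std::uint32_t;

// Reads operand `index` out of a packed operand buffer, using a format
// string like "BB" or "S" to find each operand's width and offset.
U32 DecodeOperand(const char* fmt, const U8* bytes, std::size_t index)
{
    // sum the widths of the operands before `index` to find its offset
    std::size_t offset = 0;
    for (std::size_t i = 0; i < index; ++i)
        offset += (fmt[i] == 'B') ? 1 : (fmt[i] == 'S') ? 2 : 4;

    switch (fmt[index]) {
        case 'B': return bytes[offset];
        case 'S': { U16 v; std::memcpy(&v, bytes + offset, 2); return v; }
        default:  { U32 v; std::memcpy(&v, bytes + offset, 4); return v; }
    }
}
```

Note the offset computation is linear in `index`; with at most a handful of operands per instruction that loop is trivial, but it is the price of not storing per-operand offsets.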
Initially I believed the second approach was better because it avoids a virtual table lookup on every API call (GetNOperands, GetOperand, SetOperand, etc.). But then I realized it still has to follow the C-string pointer to check the format. Then again, the format strings are short and will be packed tightly together in the read-only data section, so it's likely that many (all?) of them land in a single cache line, whereas the virtual functions are longer (they contain actual sequences of code) and probably spread further apart in the code section, right?
But then I've also heard that many CPUs have optimizations for virtual function calls (indirect branch prediction, as I understand it), though I'm not sure exactly what that involves.
Also, the union approach stores small operand sets inline in the struct rather than behind a pointer, which is better, right? Although in order to access them you have to dereference the format string, which seems equivalent to dereferencing the base pointer in the polymorphic version (which pulls the derived type's members into cache anyway), so I guess it's really the same thing. Unless, of course, the user just wants to read the operand bytes directly and doesn't need to check their format first.
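The inline-vs-heap decision mentioned in the second sketch can itself be driven by the format string. This is a hypothetical helper of my own, assuming the `'B'`/`'S'`/`'I'` codes and the `sizeof(void*) + 3` inline budget from above:

```cpp
#include <cstddef>

// Total byte size of an instruction's operands, computed from its
// format string ('B' = 1 byte, 'S' = 2 bytes, 'I' = 4 bytes).
std::size_t OperandBytes(const char* fmt)
{
    std::size_t n = 0;
    for (; *fmt; ++fmt)
        switch (*fmt) {
            case 'B': n += 1; break;
            case 'S': n += 2; break;
            case 'I': n += 4; break;
        }
    return n;
}

// True when the operands fit in the inline StackOperands buffer.
bool FitsInline(const char* fmt)
{
    return OperandBytes(fmt) <= sizeof(void*) + 3;
}
```

Since the format table is fixed at compile time, this could even be precomputed per opcode so the hot path never walks the string at all.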