Modern CPUs employ all kinds of clever techniques to improve
instruction-level parallelism (ILP). I was wondering if it
makes sense to try to employ similar techniques in the
virtual machines used to execute byte code produced by language
compilers.
By that I mean what if virtual machines were to examine byte code
streams to detect when it would be safe to execute multiple
byte codes concurrently? Then, based on its findings, the virtual
machine would execute as many byte codes concurrently as is safe.
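To make the idea concrete, here is a rough sketch in Python of the kind
of examination I have in mind. The byte code format is entirely made up
(a three-address form where each instruction names the registers it
reads and writes); the point is only that two instructions are safe to
run concurrently when neither writes a register the other touches:

# Hypothetical three-address byte code: (opcode, dest, src1, src2).
# Two instructions are independent when neither writes a register
# the other reads or writes (no RAW, WAR, or WAW hazard).

def reads(insn):
    _op, _dest, src1, src2 = insn
    return {src1, src2}

def writes(insn):
    _op, dest, _src1, _src2 = insn
    return {dest}

def independent(a, b):
    return (writes(a).isdisjoint(reads(b) | writes(b)) and
            writes(b).isdisjoint(reads(a)))

# Greedily group consecutive, mutually independent instructions into
# "packets" that could in principle be issued together.
def packetize(code):
    packets, current = [], []
    for insn in code:
        if all(independent(insn, other) for other in current):
            current.append(insn)
        else:
            packets.append(current)
            current = [insn]
    if current:
        packets.append(current)
    return packets

program = [
    ("add", "r1", "r2", "r3"),   # r1 = r2 + r3
    ("mul", "r4", "r5", "r6"),   # independent of the add
    ("sub", "r7", "r1", "r4"),   # depends on both of the above
]
print(packetize(program))        # [[add, mul], [sub]]

In effect this is the scheduling a VLIW compiler does ahead of time; the
question is whether a VM can recover enough of it cheaply at run time.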
I have no idea if the overhead of the byte code examination would
exceed any advantage of the concurrent execution, although it's
important to point out that this examination would only have to
be done once, and the results could somehow be stored along with
the byte code. Of course, if the byte code changes the examination
would have to be done again.
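For instance (again just a sketch, with made-up names), the stored
results could be keyed on a hash of the byte code itself, so that any
change to the code automatically misses the cache and triggers
re-examination:

import hashlib

# Hypothetical cache of examination results, keyed by a digest of
# the byte code so any change to the code invalidates the entry.
_analysis_cache = {}

def examine(code):
    # Stand-in for the possibly expensive dependence examination;
    # a real VM would record which instructions may issue together.
    return {"instructions": len(code)}

def cached_examination(code):
    key = hashlib.sha256(repr(code).encode()).hexdigest()
    if key not in _analysis_cache:
        _analysis_cache[key] = examine(code)   # pay the cost only once
    return _analysis_cache[key]

program = [("add", "r1", "r2", "r3"), ("mul", "r4", "r5", "r6")]
print(cached_examination(program))   # computed on the first call
print(cached_examination(program))   # served from the cache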
I'm also worried that internal virtual machine locking requirements
might make this idea infeasible. For example, in a virtual machine with
a global interpreter lock, would it be possible for there to be any concurrent execution?
This idea, if it works, would be a great way to take advantage of
multiple cores without having to rewrite any user code. The big
question is whether it would work.
I've heard/read several times that byte-code micro-optimizations are not
worth the trouble.
Here is a paper from 2015 on a related subject
("Branch prediction and the performance of interpreters -- Don't trust >folklore"):
https://ieeexplore.ieee.org/document/7054191
(you may find the corresponding research report if you can't access the
full text from that site). It shows how far processors have gone in what
was once left to the program designer.
Alain Ketterlin <alain@universite-de-strasbourg.fr> writes:
> I've heard/read several times that byte-code micro-optimizations are
> not worth the trouble.

Apart from the paper you cite, which is discussed below, what else?

> [...] https://ieeexplore.ieee.org/document/7054191
On that I can only say: Not all research papers are trustworthy.
Catchy titles may be a warning signal.
I did my own measurements on a Haswell (the same CPU they used in the
paper) and published them in
<2015Sep7.142507@mips.complang.tuwien.ac.at> (<http://al.howardknight.net/?ID=158702747000> for those of you who
don't know what to do with Message-IDs).
|Why are the results here different from those in the paper?
|1) Different Interpreter 2) different benchmarks.
On Saturday, October 22, 2022 at 11:51:31 AM UTC-7, nob...@gmail.com wrote:
> Modern CPUs employ all kinds of clever techniques to improve
> instruction-level parallelism (ILP). I was wondering if it
> makes sense to try to employ similar techniques in the
> virtual machines used to execute byte code produced by language
> compilers.
Seems to me it is not parallelizing byte codes that is a dumb idea, but
byte codes themselves.
This was known when Alpha replaced VAX: work on making faster VAX
systems was stuck with the byte-oriented instruction stream, which was
impossible to pipeline.
So it seems that the real answer is to devise a word-oriented, or in
other words RISC, virtual machine. (Actual RISC hardware might not be
a good choice.)
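For illustration, here is a sketch of what I mean, with an entirely
made-up encoding (not any existing VM's): every instruction is one
fixed-width 32-bit word, so decoding is a couple of shifts and masks,
and the position of the next instruction is always known -- exactly the
property that made RISC hardware easy to pipeline:

# Hypothetical fixed-width 32-bit instruction word:
#   bits 31-24: opcode    bits 23-16: dest register
#   bits 15-8:  source 1  bits 7-0:   source 2

def encode(opcode, dest, src1, src2):
    return (opcode << 24) | (dest << 16) | (src1 << 8) | src2

def decode(word):
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF,
            (word >> 8) & 0xFF, word & 0xFF)

ADD = 0x01
word = encode(ADD, 1, 2, 3)      # r1 = r2 + r3
print(hex(word), decode(word))   # 0x1010203 (1, 1, 2, 3)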
> Alain Ketterlin <alain@universite-de-strasbourg.fr> writes:
>> I've heard/read several times that byte-code micro-optimizations are
>> not worth the trouble.

This is not directly related to the paper I mention later. I was talking
about optimizing bytecode vs. compiler optimizations. I know of no
interpreter doing elaborate static byte-code optimization.

> https://ieeexplore.ieee.org/document/7054191

I'm glad it works for you.