powdr-OpenVM: end-to-end autoprecompiles

We recently presented an approach to automatically synthesize precompiles for zkVMs. We were able to see the potential of such autoprecompiles in those early experiments in terms of total committed cells. Now, we are excited to share end-to-end proof generation performance numbers. The experiments presented here show that manually written precompiles for zkVMs might become artifacts of the past sooner than we think.

update June 5th: After more optimizations were integrated, the benchmark using only autoprecompiles is now more performant than the manual Keccak precompile! See the updated results table below.

powdr 🤝 OpenVM

powdr labs and Axiom have been collaborating to take advantage of the flexibility provided by both powdr and OpenVM, stacking them in order to achieve better performance than the individual parts. We use OpenVM's base RISC-V zkVM implementation, enhanced with powdr middleware providing autoprecompiles and various compile-time chip optimizations based on static analysis and constraint solving.

Keccak + autoprecompiles

In this experiment we report on the end-to-end proof generation time for 25000 Keccak iterations in a Rust guest in 3 scenarios:

Software. Uses tinykeccak for Keccak computation, no acceleration.
Manual precompile. Uses the manually written Keccak precompile used in OpenVM.
Autoprecompile. Uses powdr's autoprecompile generation and optimization to accelerate the software version.

The proofs are computed from continuations segments up to one single STARK that compresses the recursion tree. The experiment was run on CPU using a machine with Intel Xeon Gold 5412U and 256GB DDR5 RAM.

Results

The table below shows the performance and comparison results.

	Segments	Proof time (s)	Times faster than software	Times slower than manual precompile
Software	49	1,380.65	1.00	12.22
Manual precompile	4	112.95	12.22	1.00
Autoprecompiles - May 21st	5	253.28	5.45	2.24
Autoprecompiles - June 5th	3	106.5	12.96	0.94

As we can see from the number of segments, both scenarios with precompiles drastically reduce the number of cycles compared to the software case, as expected. This translates to better proof generation time for app proofs, which compute one STARK per segment.

Precompiles are usually complex chips with many columns and constraints. The number of columns, especially, can have a negative effect in the recursion layer, since it affects STARK verification. Considering the full end-to-end proof generation time we can see that the manual precompile achieves 12.22x improvement versus the software version. The fourth row of the table is the main object of this experiment. The automatically synthesized autoprecompiles achieve a total improvement of 12.96x over the software version, and provide a 6% speedup compared to the manual Keccak precompile! This is a groundbreaking result that shows the full strength of autoprecompiles.

Note that these results were updated on June 5th, where the initially published version (third row of the table) reported autoprecompiles as slower than the manual precompile.

We expect these results to apply to other benchmarks as well, making manually written zkVM precompiles redundant in the presence of safer (FV? 👀), faster and more productive autoprecompiles.

Note that this approach applies not only to precompiles that already exist, but to any costly guest program code that benefits from circuit acceleration.

Upcoming

powdr is fully committed to enhancing existing zkVMs' performance and security using static analysis and other compiler-based techniques. If you are interested in learning more or collaborating reach out to us on Matrix, Github, or Twitter.

Future work includes:

Benchmarking: It is now clear that autoprecompiles have similar or better efficiency than manually written precompiles, without the need for developer intervention. We now intend to perform more benchmarking in the presence of manual precompiles, for example Ethereum L1 verification.
Moar Optimizations: We are constantly working on more optimizations that propagate to any autoprecompile, including:
- Stack accesses
- Heap accesses
- Chip stacking
- Optimal column usage/inlining
Formal Verification: Synthesizing custom circuits for each guest is unfeasible from an engineer's perspective, both from the development time and security risk introduced by hundreds or thousands of precompiles. Automatically synthesizing them also implies that formally verifying their soundness scales to all users. Because of the uniform shape that autoprecompiles have, re-using or building new tooling to analyze and formally verify them should be much simpler than doing the same for manual precompiles.