Use the riscv unified db #279

Draft
lu-zero wants to merge 11 commits into bergercookie:master from lu-zero:riscv-unified-db
Conversation

Contributor

@lu-zero lu-zero commented Feb 1, 2026

So far this is more of an RFC; let me know if you like the idea and we can refine it.

lu-zero and others added 11 commits February 1, 2026 10:31
This commit refactors the monolithic parser.rs file into a modular
architecture with separate files for each architecture.

## Changes
- Created riscv_parser.rs for RISC-V specific parsing
- Created x86_parser.rs for x86/x86_64 specific parsing
- Created arm_parser.rs for ARM/ARM64 specific parsing
- Updated main parser.rs to use the new modular architecture
- Maintained backward compatibility with existing parsing functions

The refactoring improves code organization, maintainability, and
sets the foundation for better architecture-specific features while
maintaining full backward compatibility.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit imports the RISC-V unified database YAML files and associated
license information from the riscv-unified-db project.

## Changes
- Added riscv-unified-db/ directory with comprehensive RISC-V data
- Included sample YAML files covering key RISC-V instructions (addi, lw, amoadd)
- Added riscv_full_consolidated.json for consolidated format
- Added LICENSE.txt with BSD-3-Clause-Clear license
- Added README.md pointing to canonical riscv-unified-db repository

The imported data provides comprehensive RISC-V instruction specifications
including operand encoding, architecture details, and instruction formats.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit adds RISC-V unified database support to leverage
riscv-unified-db for improved RISC-V instruction and register data.

## Changes
- Added riscv_unified.rs module with conversion layer for riscv-unified-db
- New unified database parsing functions with proper error handling
- Comprehensive test coverage for both legacy and unified formats
- Foundation for improved RISC-V support using structured data

The conversion layer maps riscv-unified-db JSON format to asm-lsp's
internal Instruction and Register types, providing a more robust
and maintainable foundation for RISC-V support.
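
To illustrate what such a conversion layer might look like, here is a minimal sketch. Both struct definitions are hypothetical stand-ins: the field names (`name`, `long_name`, `assembly`) are assumptions for illustration and are not taken from this PR's `riscv_unified.rs` or from asm-lsp's real `Instruction` type.

```rust
/// Hypothetical stand-in for one record parsed from the unified database.
/// The field set is an assumption, not the schema used in this PR.
struct UnifiedInstruction {
    name: String,
    long_name: String,
    /// Comma-separated operand list, e.g. "xd, xs1, imm".
    assembly: String,
}

/// Hypothetical stand-in for the editor-facing instruction type.
#[derive(Debug, PartialEq)]
struct Instruction {
    name: String,
    summary: String,
    operands: Vec<String>,
}

impl From<UnifiedInstruction> for Instruction {
    fn from(src: UnifiedInstruction) -> Self {
        // Split the operand string on commas and drop empty pieces.
        let operands = src
            .assembly
            .split(',')
            .map(|op| op.trim().to_string())
            .filter(|op| !op.is_empty())
            .collect();
        Instruction {
            // Normalize the mnemonic to lowercase for lookups.
            name: src.name.to_lowercase(),
            summary: src.long_name,
            operands,
        }
    }
}
```

Implementing `From` keeps the mapping in one place, so callers can write `Instruction::from(record)` or `record.into()` without knowing the source schema.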

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit adds YAML support to the RISC-V unified database conversion
and updates the build system to use riscv-unified-db format.

## Changes
- Added serde-saphyr dependency (v0.0.16) for YAML parsing support
- Updated riscv_unified.rs to support both JSON and YAML formats
- Modified build system to try unified database format first, then fall back to legacy RST
- Added comprehensive tests for YAML parsing
- Updated asm_docs_parsing tool with --unified-db flag

The build system now supports the riscv-unified-db YAML format while
maintaining backward compatibility with the existing RST format.

Note: serde-saphyr is used instead of serde_yaml because serde-saphyr is actively maintained.
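
The "unified database first, legacy RST second" behaviour described above could be sketched roughly as follows. The `DocSource` enum, the `pick_source` function, and both path parameters are hypothetical names chosen for illustration; the actual xtask code in this PR may be structured differently.

```rust
use std::path::{Path, PathBuf};

/// Which documentation source the build step ended up choosing.
#[derive(Debug, PartialEq)]
enum DocSource {
    UnifiedDb(PathBuf),
    LegacyRst(PathBuf),
}

/// Prefer the unified database file when it exists; otherwise fall back to
/// the legacy RST location. Returns `None` when neither is available.
/// Both paths are hypothetical parameters, not asm-lsp's real layout.
fn pick_source(unified_db: &Path, legacy_rst: &Path) -> Option<DocSource> {
    if unified_db.is_file() {
        Some(DocSource::UnifiedDb(unified_db.to_path_buf()))
    } else if legacy_rst.exists() {
        Some(DocSource::LegacyRst(legacy_rst.to_path_buf()))
    } else {
        None
    }
}
```

Returning an enum rather than a bare path lets the caller pick the matching parser without re-inspecting the filesystem.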

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit completes the integration of the actual riscv-unified-db schema
and ensures full backward compatibility with existing test data.

## Changes
- Updated UnifiedRiscvInstruction struct to match actual riscv-unified-db schema
- Added support for both new YAML format and legacy JSON format
- Updated conversion functions to handle both schema versions
- Added comprehensive support for all riscv-unified-db fields
- Maintained full backward compatibility with existing test data
- Added proper serde renaming for reserved keywords (match -> match_pattern)

The system now supports:
1. Real riscv-unified-db YAML format with full schema compliance
2. Legacy JSON format for backward compatibility
3. Automatic detection and parsing of both formats
4. Comprehensive operand and encoding information extraction
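
One plausible way to implement the automatic format detection in item 3 is shown below. This is a sketch under assumptions: `InputFormat` and `detect_format` are hypothetical names, and the heuristic (extension first, then the first non-whitespace character) is not necessarily what the PR does.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum InputFormat {
    Json,
    Yaml,
}

/// Guess the format from the file extension; if that is inconclusive,
/// look at the first non-whitespace character of the contents.
fn detect_format(path: &Path, contents: &str) -> InputFormat {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(str::to_ascii_lowercase);
    match ext.as_deref() {
        Some("json") => return InputFormat::Json,
        Some("yaml") | Some("yml") => return InputFormat::Yaml,
        _ => {}
    }
    // JSON documents start with an object or an array; treat the rest as YAML.
    match contents.trim_start().chars().next() {
        Some('{') | Some('[') => InputFormat::Json,
        _ => InputFormat::Yaml,
    }
}
```

Because YAML flow syntax can also begin with `{` or `[`, the content check is only a heuristic; the extension check takes priority for that reason.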

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit adds comprehensive YAML processing capabilities and updates the
build system to use riscv-unified-db format as the primary source.

## Changes
- Created process_riscv_yaml tool for batch processing YAML files
- Tool converts individual riscv-unified-db YAML files to consolidated JSON
- Updated build system to use riscv_consolidated.json as primary format
- Maintained graceful fallback to legacy RST format
- Added comprehensive batch processing with progress reporting
- Supports both instruction and register processing

## Features
- Processes 1,286+ RISC-V instruction YAML files
- Automatic format detection (JSON/YAML)
- Error handling and reporting
- Progress tracking during batch processing
- Fallback mechanism for robustness

The build system can now leverage the comprehensive riscv-unified-db data
while maintaining full backward compatibility with existing workflows.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit updates the build system to use the new riscv-unified-db directory
structure and removes references to the old file locations.

## Changes
- Updated xtask build system to use docs_store/riscv-unified-db/riscv_full_consolidated.json
- Removed references to old riscv_consolidated.json path
- Maintained backward compatibility with fallback to legacy RST format
- Updated both instruction and register processing to use new unified database path

The build system now properly integrates with the riscv-unified-db directory structure
while maintaining the ability to fall back to legacy formats if needed.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
… extensions and recursive processing

This commit adds a comprehensive set of RISC-V instruction YAML files
organized in their proper directory structure, covering all major
RISC-V extensions from the riscv-unified-db repository.

## Changes
- Added 1,126 YAML files organized in 30+ subdirectories covering:

  ### Standard Extensions:
  - I/ - Base integer instructions (75 files)
  - M/ - Integer multiplication/division (8 files)
  - F/ - Single-precision floating-point (26 files)
  - D/ - Double-precision floating-point (26 files)
  - C/ - Compressed instructions (60 files)
  - B/ - Bit manipulation (62 files)
  - V/ - Vector instructions (660 files)

  ### Standard Extension Subsets:
  - Zba/ - Address generation instructions
  - Zbb/ - Basic bit manipulation
  - Zbc/ - Carry-less multiplication
  - Zbs/ - Single-bit instructions
  - Zbkb/ - Bit manipulation for cryptography
  - Zbkx/ - Bitmanip (crossbar permutation)
  - Zicbom/ - Cache block management (base)
  - Zicbop/ - Cache block prefetch
  - Zicboz/ - Cache block zero
  - Zifencei/ - Instruction-fetch fence

  ### Additional Extensions:
  - Zaamo/ - Atomic memory operations (19 files)
  - Zabha/ - Byte and halfword atomic memory operations
  - Zacas/ - Atomic compare-and-swap
  - Zalasr/ - Atomic load-acquire/store-release
  - Zalrsc/ - Atomic load-reserved/store-conditional
  - Zawrs/ - Wait-on-reservation-set
  - Zcb/ - Additional simple compressed instructions
  - Zcd/ - Compressed double-precision FP
  - Zcf/ - Compressed single-precision floating-point
  - Zcmop/ - Compressed may-be-operations
  - Zcmp/ - Compressed push/pop
  - Zcmt/ - Compressed table jump
  - Zfa/ - Additional floating-point instructions
  - Zfh/ - Half-precision floating-point
  - Zihintntl/ - Non-temporal locality hints
  - Zihintpause/ - Pause hint
  - Zimop/ - May-be-operations

- Updated process_riscv_yaml.rs to recursively traverse directories
- Added find_yaml_files_recursive() function for proper directory handling
- Proper directory structure matching riscv-unified-db source
- Comprehensive coverage of RISC-V instruction set including:
  * All base and standard extensions
  * Vector, floating-point, and bit manipulation extensions
  * Atomic operations and memory management
  * Compressed instructions and hints
  * Cache management and performance extensions

The organized directory structure provides better maintainability, matches
the upstream riscv-unified-db repository layout, and offers comprehensive
coverage of the RISC-V ecosystem.
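
The recursive traversal mentioned above can be done with the standard library alone. The sketch below borrows the name `find_yaml_files_recursive` from the commit message, but the signature and body are assumptions, not the code in `process_riscv_yaml.rs`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Collect every `.yaml` / `.yml` file under `dir`, descending into
/// subdirectories. Signature and body are an illustrative assumption.
fn find_yaml_files_recursive(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            // Recurse into extension directories such as I/ or V/.
            find_yaml_files_recursive(&path, out)?;
        } else if matches!(
            path.extension().and_then(|e| e.to_str()),
            Some("yaml") | Some("yml")
        ) {
            out.push(path);
        }
    }
    Ok(())
}
```

`read_dir` returns entries in an unspecified order, so a caller that wants reproducible output would sort the collected paths. `Path::is_dir` follows symbolic links, so a tree containing link cycles would need extra care.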

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
This commit removes the legacy riscv_unified_db.json file that was
interfering with the new unified database format.

The file contained old format data that was conflicting with the
new riscv-unified-db YAML processing system.

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
…ction parsing

This commit fixes a bug where the --unified-db flag was only checked
when the input path was a directory, but not when it was a file.

## Changes
- Updated asm_docs_parsing to check for --unified-db flag even when
  input is a single file (not just directories)
- Added proper handling for RISC-V unified database format when
  processing single JSON files
- Maintains backward compatibility with existing parsing methods

The fix ensures that RISC-V instructions in unified database format
can be properly parsed whether the input is a directory of YAML files
or a single consolidated JSON file.
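
The shape of this fix can be sketched as a decision made on the flag and the input kind together, rather than nesting the flag check inside the directory branch. `ParseMode` and `select_mode` are hypothetical names; the real `asm_docs_parsing` code is not shown in this PR description.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum ParseMode {
    UnifiedDbDir,
    UnifiedDbFile,
    LegacyDir,
    LegacyFile,
}

/// Consider the flag for both files and directories, so a single
/// consolidated JSON file is not silently routed to the legacy parser.
fn select_mode(unified_db: bool, input: &Path) -> ParseMode {
    match (unified_db, input.is_dir()) {
        (true, true) => ParseMode::UnifiedDbDir,
        (true, false) => ParseMode::UnifiedDbFile,
        (false, true) => ParseMode::LegacyDir,
        (false, false) => ParseMode::LegacyFile,
    }
}
```

Matching on the tuple makes all four combinations explicit, so the compiler rejects the code if a case is left out.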

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>
@WillLillis
Collaborator

Looks very promising, I'll try to take a look at this soon!
