AFOliveira
Collaborator

Implements a new backend generator for SystemVerilog output, matching
the exact format used by riscv-opcodes/inst.sverilog. This provides
direct compatibility with hardware designs using the riscv-opcodes
SystemVerilog package format.

Features:

  • Generates SystemVerilog package with instruction and CSR definitions
  • Outputs 32-bit instruction patterns with proper bit encoding
  • Handles compressed (16-bit) instructions correctly
  • Supports all standard RISC-V extensions
  • Integrated with the ./do build system as gen:sverilog task

The generator produces output identical to riscv-opcodes format:

  • Instructions as 'localparam [31:0] NAME = 32'bpattern'
  • CSRs as 'localparam logic [11:0] CSR_NAME = 12'haddr'
  • Proper alignment and formatting for readability

Tested against riscv-opcodes/inst.sverilog to ensure format compatibility.
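The two line shapes listed above can be sketched as follows. This is a minimal illustration, not the generator's actual code: the helper names `format_instruction` and `format_csr` and the 18-character name column are assumptions. The ADD bit pattern and the `mstatus` address (0x300) used in the usage note come from the RISC-V specification.

```python
def format_instruction(name: str, pattern: str, width: int = 18) -> str:
    """Return one instruction localparam line.

    `pattern` is a 32-character string of '0', '1' and '?' (MSB first).
    The column width is a hypothetical choice made for alignment only.
    """
    if len(pattern) != 32:
        raise ValueError(f"{name}: expected 32 pattern bits, got {len(pattern)}")
    return f"  localparam [31:0] {name:<{width}} = 32'b{pattern};"


def format_csr(name: str, addr: int, width: int = 18) -> str:
    """Return one CSR localparam line with a 12-bit hexadecimal address."""
    if not 0 <= addr < 0x1000:
        raise ValueError(f"{name}: CSR address {addr:#x} does not fit in 12 bits")
    full_name = "CSR_" + name.upper()
    return f"  localparam logic [11:0] {full_name:<{width}} = 12'h{addr:03x};"
```

For example, `format_instruction("ADD", "0000000" + "?" * 10 + "000" + "?" * 5 + "0110011")` yields a line ending in `32'b0000000??????????000?????0110011;`, and `format_csr("mstatus", 0x300)` yields a line ending in `12'h300;`.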

    UDB currently allows two schemas for instructions while the sub-type
    development is ongoing. Some instructions now use a 'format' field with
    'opcodes' instead of the traditional 'encoding' field with 'match' and
    'variables'. Generators failed on these instructions because they could
    not extract the bit-pattern matching information.

    Changes:
    - Added build_match_from_format() function to convert format.opcodes
      to match strings compatible with existing generator logic
    - Enhanced encoding detection in load_instruction() to handle both
      old schema (encoding.match) and new schema (format.opcodes)
    - Maintains full backward compatibility with existing instructions
    - No functional changes to generated output format

    The fix ensures generators can process the complete UDB instruction
    set regardless of which schema format individual instructions use.

    Signed-off-by: Afonso Oliveira <[email protected]>
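The conversion described in this commit message could look roughly like the sketch below. The input layout is an assumption made for illustration only: each entry under `format.opcodes` is taken to carry a `location` string such as `"31-25"` (or a single bit such as `"12"`) and an integer `value`. The real UDB schema and the real `build_match_from_format()` may differ.

```python
def build_match_from_format(opcodes: dict, width: int = 32) -> str:
    """Build an MSB-first match string ('0', '1', '-') from opcode fields.

    Assumed (hypothetical) field shape: {"location": "hi-lo", "value": int}.
    Bits not covered by any opcode field stay '-' (don't care).
    """
    bits = ["-"] * width
    for field_name, field in opcodes.items():
        hi_text, _, lo_text = str(field["location"]).partition("-")
        hi = int(hi_text)
        lo = int(lo_text) if lo_text else hi  # single-bit field such as "12"
        nbits = hi - lo + 1
        value = field["value"]
        if nbits <= 0 or hi >= width or not 0 <= value < (1 << nbits):
            raise ValueError(f"opcode field {field_name!r} is out of range")
        # The string is MSB first, so bit `hi` sits at index width - 1 - hi.
        start = width - 1 - hi
        bits[start:start + nbits] = list(format(value, f"0{nbits}b"))
    return "".join(bits)
```

With the three fixed fields of ADD (`funct7` at 31-25, `funct3` at 14-12, `opcode` at 6-0), the result is the same string that an `encoding.match` entry would hold, so downstream generator logic does not need to know which schema an instruction used.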
@AFOliveira
Collaborator Author

CC @jordancarlin If you can test this or spot any errors/improvements, it would be awesome. If you are already working on something similar I can drop this, I just wanted to give a little help and improve UDB's outputs

codecov bot commented Sep 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.05%. Comparing base (a83d966) to head (cfbf86c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1054   +/-   ##
=======================================
  Coverage   46.05%   46.05%           
=======================================
  Files          11       11           
  Lines        4942     4942           
  Branches     1345     1345           
=======================================
  Hits         2276     2276           
  Misses       2666     2666           
Flag Coverage Δ
idlc 46.05% <ø> (ø)

@jordancarlin
Contributor

CC @jordancarlin If you can test this or spot any errors/improvements, it would be awesome. If you are already working on something similar I can drop this, I just wanted to give a little help and improve UDB's outputs

Thanks @AFOliveira. I started on it a while ago, but haven't touched it since I ran into #893, so let's go with this version since it is more up to date.

This is looking great so far! I have a few comments with differences from the riscv-opcodes version and then some additional comments for things we should improve compared to the riscv-opcodes version since we're reimplementing it anyway:

  • Differences from riscv-opcodes to address
    • There are a number of instructions that have _RV32 versions even though the encoding is the same for both rv32 and rv64. It would be good to avoid this if possible. I noticed it for AMOCAS_D_RV32, C_LD_RV32, C_LDSP_RV32, C_SD_RV32, C_SDSP_RV32, LD_RV32, and SD_RV32.
    • All of the *h CSRs have _RV32 at the end of the name. This convention makes sense for instruction encodings when they are different for rv32/rv64, but the *h CSRs only have one version so it is unnecessary. The riscv-opcodes version also does not do this, so migration will be harder.
    • Beyond missing extensions (which just need to get added to UDB), I also noticed that riscv-opcodes has C_ILLEGAL as an instruction and it is missing from the UDB generated version.
  • Improvements that we should make over the riscv-opcodes version
    • Instructions are of type localparam while CSRs are localparam logic. It would be best to standardize on one, and localparam logic is probably the right choice here.
    • *.sverilog is an unusual file extension to use. *.svh would be a more traditional choice for a SystemVerilog header file.
    • Similar to the C header, it might be worth including the exception causes in the generated SystemVerilog header as localparams.

I'll leave some implementation specific comments inline.

Comment on lines +18 to +21
if name.startswith("c."):
    name = "C_" + name[2:]
# Replace dots with underscores and convert to uppercase
return name.replace(".", "_").upper()

Seems like there is no reason to special-case compressed instructions here? The c. --> C_ conversion should be handled by the standard name.replace(".", "_").upper().
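A quick way to check this claim is to compare the special-cased version with the plain one. The function names below are hypothetical and exist only for the comparison.

```python
def sv_name_special_cased(name: str) -> str:
    """Name conversion as in the excerpt under review."""
    if name.startswith("c."):
        name = "C_" + name[2:]
    return name.replace(".", "_").upper()


def sv_name(name: str) -> str:
    """Simplified conversion: dots become underscores, then uppercase."""
    return name.replace(".", "_").upper()


# Both versions agree, including on compressed instructions.
for example in ["c.addi", "c.ld", "amocas.d", "fcvt.s.w", "add"]:
    assert sv_name(example) == sv_name_special_cased(example)
```

Since `"c."` becomes `"c_"` through the replace and `"C_"` through `upper()`, the prefix check adds nothing.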

def match_to_sverilog_bits(match_str, is_compressed=False):
    """Convert a match string to SystemVerilog bit pattern."""
    if not match_str:
        return "32'b" + "?" * 32

Shouldn't this throw an error/warning instead of silently generating a match string of all wildcards that will match any instruction? That would be very problematic in use because it might shadow an actual match later.
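One possible shape for the stricter behavior is sketched below. The exception class and helper name are hypothetical, not part of the PR.

```python
class MissingEncodingError(ValueError):
    """Raised when an instruction has no usable match string (hypothetical)."""


def require_match(inst_name: str, match_str) -> str:
    """Return `match_str` unchanged, or fail loudly if it is missing.

    Refusing to continue avoids emitting an all-wildcard pattern that
    would match every instruction word and could shadow a later entry.
    """
    if not match_str:
        raise MissingEncodingError(
            f"instruction {inst_name!r} has no encoding match; refusing to emit all wildcards"
        )
    return match_str
```

A generator could also collect these errors and report them together at the end of the run, at the cost of a little more bookkeeping.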

Comment on lines +40 to +42
elif len(match_str) < 32:
    # For other cases, pad on the right
    match_str = match_str + "-" * (32 - len(match_str))

What cases do you envision this matching? If none, we're probably better off throwing an error here.

return "32'b" + "?" * 32

# For compressed instructions (16-bit), we need to handle them differently
# The riscv-opcodes format puts the 16-bit pattern in the lower 16 bits

Probably better for comments to stand alone instead of justifying choices based on riscv-opcodes.

Comment on lines +46 to +52
for bit in match_str:
    if bit == "0":
        result.append("0")
    elif bit == "1":
        result.append("1")
    else:  # '-' or any other character
        result.append("?")

Could this whole block be simplified to just match_str.replace("-", "?")?
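The two approaches agree on well-formed input but not on malformed input: the loop turns any unexpected character into a wildcard, while a bare `replace` leaves it in place. The sketch below shows both, plus a strict variant that rejects unexpected characters. All three function names are hypothetical.

```python
def bits_loop(match_str: str) -> str:
    """Per-character conversion, as in the excerpt under review."""
    result = []
    for bit in match_str:
        if bit in ("0", "1"):
            result.append(bit)
        else:  # '-' or any other character
            result.append("?")
    return "".join(result)


def bits_replace(match_str: str) -> str:
    """The one-line alternative suggested in the comment."""
    return match_str.replace("-", "?")


def bits_strict(match_str: str) -> str:
    """One-line conversion that first rejects anything but '0', '1', '-'."""
    invalid = set(match_str) - set("01-")
    if invalid:
        raise ValueError(f"unexpected characters in match string: {sorted(invalid)}")
    return match_str.replace("-", "?")
```

For a string such as `"01-x"`, `bits_loop` returns `"01??"`, `bits_replace` returns `"01?x"`, and `bits_strict` raises, which is in line with the other review comments that prefer errors over silent wildcards.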

Comment on lines +79 to +81
else:
    # If no match field, use all wildcards
    match = "-" * 32

As above, please don't generate all wildcards if we don't find a match. A warning/error seems like a better choice here.

match = "-" * 32

# Check if this is a compressed instruction
is_compressed = name.startswith("c.")

Is the is_compressed logic necessary since you also check if the instruction encoding is 16 bits in the match_to_sverilog_bits function? Seems like it should be fine to just use that to check instead of having both the string-based and encoding-based approaches.
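A length-based variant could look like the sketch below. It is not the PR's implementation. It assumes that a 16-bit pattern is placed in the lower 16 bits with wildcards in the upper half, as the code comment quoted earlier describes, and that any length other than 16 or 32 is an error.

```python
def match_to_sverilog_bits(match_str: str) -> str:
    """Convert an MSB-first match string to a 32-bit SystemVerilog pattern.

    Compressed instructions are recognized by their 16-character encoding,
    so no name-based `is_compressed` flag is needed.
    """
    if len(match_str) == 16:
        # Assumption: the upper 16 bits of a compressed entry are don't-care.
        match_str = "-" * 16 + match_str
    elif len(match_str) != 32:
        raise ValueError(f"expected a 16- or 32-bit match string, got {len(match_str)} bits")
    return "32'b" + match_str.replace("-", "?")
```

With this shape, the name of the instruction never influences its encoding, so a future 16-bit instruction without a `c.` prefix would still be handled.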

)
parser.add_argument(
    "--output",
    default="inst.sverilog",

What about something like this instead?

Suggested change
- default="inst.sverilog",
+ default="riscv_decode_package.svh",

Comment on lines +122 to +126
parser.add_argument(
    "--extensions",
    default="A,D,F,I,M,Q,Zba,Zbb,Zbs,S,System,V,Zicsr,Smpmp,Sm,H,U,Zicntr,Zihpm,Smhpm",
    help="Comma-separated list of enabled extensions. Default includes standard extensions.",
)

This should not default to a subset of (seemingly arbitrary) extensions. I'm not sure what the help text means when it says it defaults to the standard extensions. Similar to the generated C header, the default should be to include all extensions.

Comment on lines +127 to +132
parser.add_argument(
    "--arch",
    default="RV64",
    choices=["RV32", "RV64", "BOTH"],
    help="Target architecture (RV32, RV64, or BOTH). Default is RV64.",
)

The generated C header does not have this option. I think it makes sense to keep the same CLI interface for both of them. The default output should also include everything, not just RV64.
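Taken together with the previous comment on `--extensions`, the requested CLI could be sketched as below: no `--arch` option, and an omitted `--extensions` meaning "everything". The helper names and the exact options are assumptions; the C header generator's real interface is not shown in this conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: everything is included unless the user narrows it."""
    parser = argparse.ArgumentParser(
        description="Generate a SystemVerilog decode package (sketch)."
    )
    parser.add_argument(
        "--output",
        default="riscv_decode_package.svh",
        help="Output file name.",
    )
    parser.add_argument(
        "--extensions",
        default=None,
        help="Comma-separated list of extensions to include. Default: all extensions.",
    )
    return parser


def select_extensions(requested, available):
    """Resolve the --extensions value against the extensions that exist."""
    if requested is None:
        return sorted(available)  # default: include everything
    wanted = [ext.strip() for ext in requested.split(",") if ext.strip()]
    unknown = [ext for ext in wanted if ext not in available]
    if unknown:
        raise ValueError(f"unknown extensions: {', '.join(unknown)}")
    return wanted
```

Using `None` as the default keeps the list of extensions out of the source code, so new extensions added to the database are picked up without touching the CLI.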

@AFOliveira
Collaborator Author

Thanks for the review, @jordancarlin. I tend to agree with all of them. I wrote this on a late-night flight, hence the number of small inconsistencies. I'll fix them ASAP.

Collaborator

@dhower-qc dhower-qc left a comment

Great work guys. Two high-level comments:

  • Once this is ready, I think it'd be a good candidate for golden output checking like we do with the instruction appendix
  • Not a must, but I do have a long-term vision of transitioning everything out of backends/ and into tools/ to finish up the file reorg. If it isn't too much work, we could move it there now and save some work down the road.

@AFOliveira
Collaborator Author

  • Once this is ready, I think it'd be a good candidate for golden output checking like we do with the instruction appendix

Sure, no problem.

  • Not a must, but I do have a long-term vision of transitioning everything out of backends/ and into tools/ to finish up the file reorg. If it isn't too much work, we could move it there now and save some work down the road.

To keep the commit history organized, can we first merge these two (this PR and #1051) and then I may move the full generator folder from one place to another?

@jordancarlin
Contributor

  • Once this is ready, I think it'd be a good candidate for golden output checking like we do with the instruction appendix

Sure, no problem.

We should also add the generated C header as a golden output to avoid issues like #893 from popping up again.
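A golden-output check for generated headers can be as small as a text diff against a checked-in reference. The sketch below uses Python's standard `difflib`; the function name and the idea of passing file contents as strings are assumptions, and the project's existing golden check for the instruction appendix may work differently.

```python
import difflib


def diff_against_golden(generated: str, golden: str) -> list[str]:
    """Return unified-diff lines between golden and generated text.

    An empty list means the generated output matches the golden copy.
    """
    return list(
        difflib.unified_diff(
            golden.splitlines(),
            generated.splitlines(),
            fromfile="golden",
            tofile="generated",
            lineterm="",
        )
    )
```

A CI task could print the returned lines and fail when the list is non-empty, which would catch unintended changes in either the SystemVerilog package or the C header.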

@dhower-qc
Collaborator

To keep the commit history organized, can we first merge these two (this PR and #1051) and then I may move the full generator folder from one place to another?

Works for me.
