[lldb] Support disassembling RISC-V proprietary instructions #145793
RISC-V supports proprietary extensions, where the TD files don't know about certain instructions, and the disassembler can't disassemble them. Internal users want to be able to disassemble these instructions.
With llvm-objdump, the solution is to pipe the output of the disassembly through a filter program. This patch modifies LLDB's disassembly to look more like llvm-objdump's, and includes an example python script that adds a command "fdis" that will disassemble, then pipe the output through a specified filter program. This has been tested with crustfilt, a sample filter located at https://github.com/quic/crustfilt .
Changes in this PR:
- Decouple "can't disassemble" from "instruction size". DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for valid disassembly, and has the size as an out parameter. Use the size even if the disassembly is invalid. Disassemble if the disassembly is valid.
- Always print out the opcode when -b is specified. Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does. Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.
- Print <unknown> for instructions that can't be disassembled, matching llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.
Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-lldb
Author: tedwoodward
Full diff: https://github.com/llvm/llvm-project/pull/145793.diff
6 Files Affected:
diff --git a/lldb/examples/python/filter_disasm.py b/lldb/examples/python/filter_disasm.py
new file mode 100644
index 0000000000000..adb3455209055
--- /dev/null
+++ b/lldb/examples/python/filter_disasm.py
@@ -0,0 +1,87 @@
+"""
+Defines a command, fdis, that does filtered disassembly. The command does the
+lldb disassemble command with -b and any other arguments passed in, and
+pipes that through a provided filter program.
+
+The intention is to support disassembly of RISC-V proprietary instructions.
+This is handled with llvm-objdump by piping the output of llvm-objdump through
+a filter program. This script is intended to mimic that workflow.
+"""
+
+import lldb
+import subprocess
+
+filter_program = "crustfilt"
+
+def __lldb_init_module(debugger, dict):
+    debugger.HandleCommand(
+        'command script add -f filter_disasm.fdis fdis')
+    print("Disassembly filter command (fdis) loaded")
+    print("Filter program set to %s" % filter_program)
+
+
+def fdis(debugger, args, result, dict):
+    """
+    Call the built in disassembler, then pass its output to a filter program
+    to add in disassembly for hidden opcodes.
+    Except for get and set, use the fdis command like the disassemble command.
+    By default, the filter program is crustfilt, from
+    https://github.com/quic/crustfilt . This can be changed by changing
+    the global variable filter_program.
+
+    Usage:
+        fdis [[get] [set <program>] [<disassembly options>]]
+
+    Choose one of the following:
+        get
+            Gets the current filter program
+
+        set <program>
+            Sets the current filter program. This can be an executable, which
+            will be found on PATH, or an absolute path.
+
+        <disassembly options>
+            If the first argument is not get or set, the args will be passed
+            to the disassemble command as is.
+
+    """
+
+    global filter_program
+    args_list = args.split(' ')
+    result.Clear()
+
+    if len(args_list) == 1 and args_list[0] == 'get':
+        result.PutCString(filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    if len(args_list) == 2 and args_list[0] == 'set':
+        filter_program = args_list[1]
+        result.PutCString("Filter program set to %s" % filter_program)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
+        return
+
+    res = lldb.SBCommandReturnObject()
+    debugger.GetCommandInterpreter().HandleCommand('disassemble -b ' + args, res)
+    if (len(res.GetError()) > 0):
+        result.SetError(res.GetError())
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+    output = res.GetOutput()
+
+    try:
+        proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+    except (subprocess.SubprocessError, OSError) as e:
+        result.PutCString("Error occurred. Original disassembly:\n\n" + output)
+        result.SetError(str(e))
+        result.SetStatus(lldb.eReturnStatusFailed)
+        return
+
+    print(proc.stderr)
+    if proc.stderr:
+        pass
+        #result.SetError(proc.stderr)
+        #result.SetStatus(lldb.eReturnStatusFailed)
+    else:
+        result.PutCString(proc.stdout)
+        result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
diff --git a/lldb/include/lldb/Core/Opcode.h b/lldb/include/lldb/Core/Opcode.h
index f72f2687b54fe..88ef17093d3f3 100644
--- a/lldb/include/lldb/Core/Opcode.h
+++ b/lldb/include/lldb/Core/Opcode.h
@@ -200,6 +200,7 @@ class Opcode {
}
int Dump(Stream *s, uint32_t min_byte_width);
+ int DumpRISCV(Stream *s, uint32_t min_byte_width);
const void *GetOpcodeBytes() const {
return ((m_type == Opcode::eTypeBytes) ? m_data.inst.bytes : nullptr);
diff --git a/lldb/source/Core/Disassembler.cpp b/lldb/source/Core/Disassembler.cpp
index 833e327579a29..f95e446448036 100644
--- a/lldb/source/Core/Disassembler.cpp
+++ b/lldb/source/Core/Disassembler.cpp
@@ -658,8 +658,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
// the byte dump to be able to always show 15 bytes (3 chars each) plus a
// space
if (max_opcode_byte_size > 0)
- m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
- else
+ // make RISC-V opcode dump look like llvm-objdump
+ if (exe_ctx &&
+ exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
+ m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
+ else
+ m_opcode.Dump(&ss, max_opcode_byte_size * 3 + 1);
+ else
m_opcode.Dump(&ss, 15 * 3 + 1);
} else {
// Else, we have ARM or MIPS which can show up to a uint32_t 0x00000000
@@ -685,10 +690,13 @@ void Instruction::Dump(lldb_private::Stream *s, uint32_t max_opcode_byte_size,
}
}
const size_t opcode_pos = ss.GetSizeOfLastLine();
- const std::string &opcode_name =
+ std::string &opcode_name =
show_color ? m_markup_opcode_name : m_opcode_name;
const std::string &mnemonics = show_color ? m_markup_mnemonics : m_mnemonics;
+ if (opcode_name.empty())
+ opcode_name = "<unknown>";
+
// The default opcode size of 7 characters is plenty for most architectures
// but some like arm can pull out the occasional vqrshrun.s16. We won't get
// consistent column spacing in these cases, unfortunately. Also note that we
diff --git a/lldb/source/Core/Opcode.cpp b/lldb/source/Core/Opcode.cpp
index 3e30d98975d8a..dbcd18cc0d8d2 100644
--- a/lldb/source/Core/Opcode.cpp
+++ b/lldb/source/Core/Opcode.cpp
@@ -78,6 +78,44 @@ lldb::ByteOrder Opcode::GetDataByteOrder() const {
return eByteOrderInvalid;
}
+// make RISC-V byte dumps look like llvm-objdump, instead of just dumping bytes
+int Opcode::DumpRISCV(Stream *s, uint32_t min_byte_width) {
+ const uint32_t previous_bytes = s->GetWrittenBytes();
+ // if m_type is not bytes, call Dump
+ if (m_type != Opcode::eTypeBytes)
+ return Dump(s, min_byte_width);
+
+ // from RISCVPrettyPrinter in llvm-objdump.cpp
+ // if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
+ // else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
+ // else fall back and print bytes
+ for (uint32_t i = 0; i < m_data.inst.length;) {
+ if (i > 0)
+ s->PutChar(' ');
+ if (!(m_data.inst.length % 4)) {
+ s->Printf("%2.2x%2.2x%2.2x%2.2x", m_data.inst.bytes[i + 3],
+ m_data.inst.bytes[i + 2],
+ m_data.inst.bytes[i + 1],
+ m_data.inst.bytes[i + 0]);
+ i += 4;
+ } else if (!(m_data.inst.length % 2)) {
+ s->Printf("%2.2x%2.2x", m_data.inst.bytes[i + 1],
+ m_data.inst.bytes[i + 0]);
+ i += 2;
+ } else {
+ s->Printf("%2.2x", m_data.inst.bytes[i]);
+ ++i;
+ }
+ }
+
+ uint32_t bytes_written_so_far = s->GetWrittenBytes() - previous_bytes;
+ // Add spaces to make sure bytes display comes out even in case opcodes aren't
+ // all the same size.
+ if (bytes_written_so_far < min_byte_width)
+ s->Printf("%*s", min_byte_width - bytes_written_so_far, "");
+ return s->GetWrittenBytes() - previous_bytes;
+}
+
uint32_t Opcode::GetData(DataExtractor &data) const {
uint32_t byte_size = GetByteSize();
uint8_t swap_buf[8];
diff --git a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
index ed6047f8f4ef3..eeb6020abd73a 100644
--- a/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
+++ b/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp
@@ -61,6 +61,8 @@ class DisassemblerLLVMC::MCDisasmInstance {
uint64_t GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
lldb::addr_t pc, llvm::MCInst &mc_inst) const;
+ bool GetMCInst(const uint8_t *opcode_data, size_t opcode_data_len,
+ lldb::addr_t pc, llvm::MCInst &mc_inst, size_t &size) const;
void PrintMCInst(llvm::MCInst &mc_inst, lldb::addr_t pc,
std::string &inst_string, std::string &comments_string);
void SetStyle(bool use_hex_immed, HexImmediateStyle hex_style);
@@ -524,11 +526,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
const addr_t pc = m_address.GetFileAddress();
llvm::MCInst inst;
- const size_t inst_size =
- mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
- if (inst_size == 0)
- m_opcode.Clear();
- else {
+ size_t inst_size = 0;
+ m_is_valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+ pc, inst, inst_size);
+ m_opcode.Clear();
+ if (inst_size != 0) {
m_opcode.SetOpcodeBytes(opcode_data, inst_size);
m_is_valid = true;
}
@@ -604,10 +606,11 @@ class InstructionLLVMC : public lldb_private::Instruction {
const uint8_t *opcode_data = data.GetDataStart();
const size_t opcode_data_len = data.GetByteSize();
llvm::MCInst inst;
- size_t inst_size =
- mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
-
- if (inst_size > 0) {
+ size_t inst_size = 0;
+ bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc,
+ inst, inst_size);
+
+ if (valid && inst_size > 0) {
mc_disasm_ptr->SetStyle(use_hex_immediates, hex_style);
const bool saved_use_color = mc_disasm_ptr->GetUseColor();
@@ -1206,9 +1209,10 @@ class InstructionLLVMC : public lldb_private::Instruction {
const uint8_t *opcode_data = data.GetDataStart();
const size_t opcode_data_len = data.GetByteSize();
llvm::MCInst inst;
- const size_t inst_size =
- mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len, pc, inst);
- if (inst_size == 0)
+ size_t inst_size = 0;
+ const bool valid = mc_disasm_ptr->GetMCInst(opcode_data, opcode_data_len,
+ pc, inst, inst_size);
+ if (!valid)
return;
m_has_visited_instruction = true;
@@ -1337,19 +1341,18 @@ DisassemblerLLVMC::MCDisasmInstance::MCDisasmInstance(
m_asm_info_up && m_context_up && m_disasm_up && m_instr_printer_up);
}
-uint64_t DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
+bool DisassemblerLLVMC::MCDisasmInstance::GetMCInst(
const uint8_t *opcode_data, size_t opcode_data_len, lldb::addr_t pc,
- llvm::MCInst &mc_inst) const {
+ llvm::MCInst &mc_inst, size_t &size) const {
llvm::ArrayRef<uint8_t> data(opcode_data, opcode_data_len);
llvm::MCDisassembler::DecodeStatus status;
- uint64_t new_inst_size;
- status = m_disasm_up->getInstruction(mc_inst, new_inst_size, data, pc,
+ status = m_disasm_up->getInstruction(mc_inst, size, data, pc,
llvm::nulls());
if (status == llvm::MCDisassembler::Success)
- return new_inst_size;
+ return true;
else
- return 0;
+ return false;
}
void DisassemblerLLVMC::MCDisasmInstance::PrintMCInst(
diff --git a/lldb/source/Utility/ArchSpec.cpp b/lldb/source/Utility/ArchSpec.cpp
index 70b9800f4dade..7c71aaae6bcf2 100644
--- a/lldb/source/Utility/ArchSpec.cpp
+++ b/lldb/source/Utility/ArchSpec.cpp
@@ -228,9 +228,9 @@ static const CoreDefinition g_core_definitions[] = {
{eByteOrderLittle, 4, 4, 4, llvm::Triple::hexagon,
ArchSpec::eCore_hexagon_hexagonv5, "hexagonv5"},
- {eByteOrderLittle, 4, 2, 4, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
+ {eByteOrderLittle, 4, 2, 8, llvm::Triple::riscv32, ArchSpec::eCore_riscv32,
"riscv32"},
- {eByteOrderLittle, 8, 2, 4, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
+ {eByteOrderLittle, 8, 2, 8, llvm::Triple::riscv64, ArchSpec::eCore_riscv64,
"riscv64"},
{eByteOrderLittle, 4, 4, 4, llvm::Triple::loongarch32,
|
You can test this locally with the following command:
darker --check --diff -r HEAD~1...HEAD lldb/examples/python/filter_disasm.py lldb/test/Shell/Commands/Inputs/dis_filt.py
View the diff from darker here.
--- examples/python/filter_disasm.py 2025-07-14 15:47:07.000000 +0000
+++ examples/python/filter_disasm.py 2025-07-14 15:49:15.459984 +0000
@@ -20,31 +20,31 @@
print("Filter program set to %s" % filter_program)
def fdis(debugger, args, exe_ctx, result, dict):
"""
- Call the built in disassembler, then pass its output to a filter program
- to add in disassembly for hidden opcodes.
- Except for get and set, use the fdis command like the disassemble command.
- By default, the filter program is crustfilt, from
- https://github.com/quic/crustfilt . This can be changed by changing
- the global variable filter_program.
+ Call the built in disassembler, then pass its output to a filter program
+ to add in disassembly for hidden opcodes.
+ Except for get and set, use the fdis command like the disassemble command.
+ By default, the filter program is crustfilt, from
+ https://github.com/quic/crustfilt . This can be changed by changing
+ the global variable filter_program.
- Usage:
- fdis [[get] [set <program>] [<disassembly options>]]
+ Usage:
+ fdis [[get] [set <program>] [<disassembly options>]]
- Choose one of the following:
- get
- Gets the current filter program
+ Choose one of the following:
+ get
+ Gets the current filter program
- set <program>
- Sets the current filter program. This can be an executable, which
- will be found on PATH, or an absolute path.
+ set <program>
+ Sets the current filter program. This can be an executable, which
+ will be found on PATH, or an absolute path.
- <disassembly options>
- If the first argument is not get or set, the args will be passed
- to the disassemble command as is.
+ <disassembly options>
+ If the first argument is not get or set, the args will be passed
+ to the disassemble command as is.
"""
global filter_program
args_list = args.split(" ")
@@ -60,25 +60,33 @@
result.PutCString("Filter program set to %s" % filter_program)
result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
return
res = lldb.SBCommandReturnObject()
- debugger.GetCommandInterpreter().HandleCommand("disassemble -b " + args, exe_ctx, res)
+ debugger.GetCommandInterpreter().HandleCommand(
+ "disassemble -b " + args, exe_ctx, res
+ )
if len(res.GetError()) > 0:
result.SetError(res.GetError())
result.SetStatus(lldb.eReturnStatusFailed)
return
output = res.GetOutput()
try:
- proc = subprocess.run([filter_program], capture_output=True, text=True, input=output)
+ proc = subprocess.run(
+ [filter_program], capture_output=True, text=True, input=output
+ )
except (subprocess.SubprocessError, OSError) as e:
result.PutCString("Error occurred. Original disassembly:\n\n" + output)
result.SetError(str(e))
result.SetStatus(lldb.eReturnStatusFailed)
return
if proc.returncode:
- result.PutCString("warning: {} returned non-zero value {}".format(filter_program, proc.returncode))
+ result.PutCString(
+ "warning: {} returned non-zero value {}".format(
+ filter_program, proc.returncode
+ )
+ )
result.PutCString(proc.stdout)
result.SetStatus(lldb.eReturnStatusSuccessFinishResult)
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
Before this change, disassembly of the crustfilt test program looks like this:
Note that the instruction at 0x3aa is an 8 byte instruction, but lldb's disassembler is incorrectly treating it as a 2 byte instruction, then incorrectly disassembling the following addresses as 2 byte instructions.
After this change, disassembly looks like this:
The instruction at 0x3aa is identified as a 64 bit instruction, the opcode is shown, and the instruction is marked as <unknown>.
The output from fdis looks like this:
The filter replaces the <unknown> instructions with instructions that it can disassemble.
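To illustrate why separating "instruction size" from "can't disassemble" matters, here is a minimal Python sketch. It is not LLDB code: `toy_decode` is a hypothetical stand-in for the LLVM disassembler that reports a size for every instruction but only knows one mnemonic, and the bytes in `SAMPLE` are made up for illustration.

```python
from typing import List, Tuple

MIN_INST_SIZE = 2  # smallest RISC-V instruction size in bytes (compressed)

# Made-up bytes: one 8-byte instruction the toy decoder does not know,
# followed by a 2-byte instruction it does know.
SAMPLE = bytes([0x3F, 0x00, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00])


def toy_decode(data: bytes, offset: int) -> Tuple[bool, int, str]:
    """Hypothetical decoder returning (valid, size_in_bytes, text)."""
    first = data[offset]
    if first & 0x03 != 0x03:
        size = 2
    elif first & 0x7F == 0x3F:
        size = 8
    else:
        size = 4
    # The only instruction this toy "knows" is the 2-byte pattern 01 00.
    if data[offset:offset + size] == bytes([0x01, 0x00]):
        return True, size, "c.nop"
    return False, size, ""


def walk(data: bytes, trust_size_on_failure: bool) -> List[Tuple[int, int, str]]:
    """Walk the buffer and return (offset, size, text) for each step."""
    rows = []
    offset = 0
    while offset < len(data):
        valid, size, text = toy_decode(data, offset)
        if not valid:
            if not trust_size_on_failure:
                size = MIN_INST_SIZE  # old policy: fall back to the minimum size
            text = "<unknown>"
        rows.append((offset, size, text))
        offset += size
    return rows
```

With `trust_size_on_failure=True` the walk steps over the whole 8-byte instruction and stays in sync. With `False` it advances by 2 bytes and then decodes the middle of that instruction as if it were separate instructions.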
Thanks @tedwoodward! |
I see the idea but am a bit wary of making the output look like objdump as a general goal just because they are not used for the same thing, debuggers do sometimes have different goals. In this case I think it's OK because you're improving a part of the disassembly that wouldn't be useful anyway.
Is there a reason we couldn't handle this the same way for all targets?
(lldb) dis
test.o`main:
-> 0xaaaaaaaaa714 <+0>: adrp x0, 17
0xaaaaaaaaa718 <+4>: add x0, x0, #0x10 ; a
0xaaaaaaaaa71c <+8>: mov w1, #0x2 ; =2
0xaaaaaaaaa720 <+12>: str w1, [x0]
0xaaaaaaaaa724 <+16>: adrp x0, 17
0xaaaaaaaaa728 <+20>: add x0, x0, #0x10 ; a
0xaaaaaaaaa72c <+24>: ldr w0, [x0]
0xaaaaaaaaa730 <+28>: ret
(lldb) dis -b
test.o`main:
-> 0xaaaaaaaaa714 <+0>: 0xb0000080 adrp x0, 17
0xaaaaaaaaa718 <+4>: 0x91004000 add x0, x0, #0x10 ; a
0xaaaaaaaaa71c <+8>: 0x52800041 mov w1, #0x2 ; =2
0xaaaaaaaaa720 <+12>: 0xb9000001 str w1, [x0]
0xaaaaaaaaa724 <+16>: 0xb0000080 adrp x0, 17
0xaaaaaaaaa728 <+20>: 0x91004000 add x0, x0, #0x10 ; a
0xaaaaaaaaa72c <+24>: 0xb9400000 ldr w0, [x0]
0xaaaaaaaaa730 <+28>: 0xd65f03c0 ret
The missing part is knowing how to split up that encoding value, isn't it? For AArch64 you'd just print it because we only have 32-bit, Intel you would roll dice to randomly decide what to do and RISC-V we have these 2/3 formats.
Arm M profile might benefit from it but I have never been asked to support such a thing. With the thumb 16 and 32-bit encodings plus the sort of custom instruction extension it has.
Because as usual, anything like "if certain architecture" in generic code isn't great, but hiding that behind a virtual method is just the same thing with more steps.
So I'm not against this, and I agree that RISC-V is a special case in that it leans harder into custom instructions than other architectures. Arm M Profile has a sort of custom instruction thing, and you've been able to use coprocessor encodings for years on many architectures but no one bothered to add this to tools (which is not on you of course, that's their loss).
So this being RISC-V specific, kinda has to be. But it's one generic method away from being a generic "format in a way that's more useful", if we needed it to be in future.
I think you need to expand the comments in this formatting code to say both:
- This is mimicking the objdump format and
- It is doing so because this is easier to read / substitute / whatever
Then I don't have to go look at what objdump does to figure out the goals of this code.
How standard are these ways of printing the byte encodings? Is it recommended / used by the spec, gnu objdump and llvm objdump? That's the ideal.
If it's non-standard my worry would be you doing this to fit one filtering application and another coming along and wanting it in a different format.
Also this needs tests.
- See if we have tests for -b, if not, write some for something that is not RISC-V. Easy way to check is to mess up the formatting and see what fails.
- Add some -b tests for RISC-V specifically, with all the different formats.
- Add tests for the filtering script that use another script that pretends to be a filtering program. As this script is less of a demo, more of a real feature for your customers at least.
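A stand-in filter for such a test could be as small as the sketch below. This is only an illustration under stated assumptions: the input line shape ("address: opcode groups, then <unknown>") is assumed, and the `FAKE_DECODINGS` table, its encodings, and the `fake.*` mnemonics are hypothetical, not taken from crustfilt or from this PR.

```python
import re
from typing import TextIO

# Hypothetical table: opcode dump text -> replacement text.
FAKE_DECODINGS = {
    "0000000b": "fake.insn32",
    "0000003f 00000000": "fake.insn64",
}

# Assumed line shape: "<anything>: <hex groups> <unknown>"
UNKNOWN_RE = re.compile(
    r"^(?P<prefix>.*:\s+)(?P<opcode>[0-9a-f]+(?: [0-9a-f]+)*)\s+<unknown>\s*$"
)


def filter_line(line: str) -> str:
    """Replace <unknown> with a fake mnemonic when the opcode is in the table."""
    m = UNKNOWN_RE.match(line)
    if m is None:
        return line
    replacement = FAKE_DECODINGS.get(m.group("opcode"))
    if replacement is None:
        return line
    return m.group("prefix") + m.group("opcode") + "  " + replacement


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Copy src to dst line by line, rewriting lines that filter_line recognises."""
    for line in src:
        dst.write(filter_line(line.rstrip("\n")) + "\n")
```

Calling `filter_stream(sys.stdin, sys.stdout)` at the bottom of the script (with `import sys`) would turn it into a program that can be used in place of the real filter.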
    print(proc.stderr)
    if proc.stderr:
        pass
        #result.SetError(proc.stderr)
        #result.SetStatus(lldb.eReturnStatusFailed)
I presume this needs to be swapped. Put stderr in the error so that the user will see it.
I'm not sure the presence of data on stderr means a failure. I'll look into error handling more.
It looks like python's subprocess.run returns a subprocess.CompletedProcess object which has a returncode ivar, and the usual convention in Unix is 0 means success. I wouldn't treat the presence of stderr as indicative of an error by itself.
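A minimal sketch of that suggestion, outside of LLDB: `run_filter` is a hypothetical helper (not part of the patch) that decides success from `returncode` and hands stderr back as diagnostics instead of treating it as a failure.

```python
import subprocess
from typing import List, Tuple


def run_filter(filter_cmd: List[str], disassembly: str) -> Tuple[bool, str, str]:
    """Pipe `disassembly` through `filter_cmd`.

    Returns (ok, stdout, diagnostics). `ok` reflects only the exit status;
    anything the filter wrote to stderr is returned as diagnostics.
    """
    try:
        proc = subprocess.run(
            filter_cmd, capture_output=True, text=True, input=disassembly
        )
    except (subprocess.SubprocessError, OSError) as e:
        # The filter could not be started: hand back the unfiltered text.
        return False, disassembly, str(e)
    return proc.returncode == 0, proc.stdout, proc.stderr
```

A caller inside the fdis command could then print the filtered text in both cases and add a warning only when `ok` is false, so a chatty but successful filter does not hide the disassembly.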
Few more things:
|
@asb Just pinging you on the chance you have heard discussions about doing this with other tools. What approaches, if any, did they take? |
The llvm disassembler returns the size, and the lldb disassembler sets the Opcode (class instantiation) to a certain type - 8 bit, 16 bit, 16 bit thumb, 32 bit, 64 bit, bytes. For RISC-V, it's set to bytes, which means it prints out 1 byte at a time. I didn't want to add a type, because "bytes" works well for RISC-V, except when it comes to print them out. That is an option, though, if we didn't want to have special pretty printers for the "this is a weird thing" cases.
Sure, I'll do that.
For RISC-V, llvm-objdump mimics gnu objdump, so this change makes the RISC-V byte encodings match both.
That's one thing I'm trying to avoid - we don't want a filter app for objdump and another for the debugger. That's why I changed how the bytes are displayed, and changed "can't disassemble" from blank to <unknown>.
Will do.
I think that's a good idea. Do you know how to add something to release notes? I've never done it upstream.
I don't know what people are doing with GDB. Possibly nothing yet, since this is pretty new. |
Great, that's standard enough.
Yeah that makes sense, the glue between the tool and the filter may change a bit but a portable filter you can take between tools and toolchains is nice. Which reminds me I didn't ask about the alternatives. I can think of a few, with their own positives and negatives:
Looking at it that way, I think another way to describe this proposal is:
And that I definitely agree with. Then you can multiply those pros and cons a bunch if you are a vendor, to public or private customers. You have the effort of telling them no you have to use
But now I'm making your argument for you I think. You get the idea.
The only real "gotcha" question I can come up with is, wouldn't a user who is interested in custom instructions already have access to a compiler that produces them and that compiler is likely to be llvm, so is lldb from that llvm not within reach already? Educate me on that, I don't know how folks approach developing these extensions. Maybe code generation is the last thing they do, after they know it's worth pursuing. So you've got a lot of hand written assembler and
LLDB's release notes live in a section of LLVM's - https://github.com/llvm/llvm-project/blob/main/llvm/docs/ReleaseNotes.md#changes-to-lldb. |
@JDevlieghere / @jasonmolenda you have been in the RISC-V world lately, what do you think to this approach? |
Just doing some googling, mostly people talk about modifying the binutils in the gnu toolchain for custom instructions and rebuilding it. This refers to some XML file, but I can't tell if they're talking about an upstream patch, or their own product based on GDB:
Might be my search skills, but I only see references to XML for target descriptions in upstream GDB. Anyway, that's another approach that I guess would require each tool to add an option to load that file. Rather than your proposal which wraps the filter around the tool (kind of, the custom command is still inside of lldb from the user's perspective). You could build the filter options into the disassemble command, but that's more complicated and you can argue that this PR is simply making the output more useful. Which has the side effect of being suitable for filtering. In future someone could come along and propose changing the builtin command. |
This particular PR has to do with Qualcomm's push to be more open source. We don't want to provide custom toolchains for things with good upstream support. That's why we're pushing a bunch of our former downstream code upstream; Polly vectorizer changes, LLVM vectorizer changes, etc. It's made upstream clang performance much closer to downstream clang performance. We got the Xqci guys to open source their extension definition so we could upstream the TD files for it. But...not everyone wants to upstream their extensions, and some people don't want to build a custom toolchain for every little extension. So they decided to use the .insn assembler directive (supported for at least 5 years, according to a quick google search), and a filter program to pipe objdump output through.
Thank you! |
I built that crustfilt precisely because I did not have the ability to add to the compiler's disassembler, especially en masse. The sideband decoder approach was much faster to support and was the only reason we were able to move most of our custom architecture to RISC-V. It also still covers some of the custom DSP stuff we cannot share / wouldn't be generally useful for people. I'd also argue a lot of RTL development happens in walled gardens now, which increases the difficulty of building fundamental tools. A python script has very few dependencies that the RTL team likely doesn't already have.
To chime in as one of the devisors and proponents on the toolchain side of this approach:
Precisely we do have a lot of Qualcomm's approach here is twofold:
One approach is to have a downstream-only toolchain, which we control very carefully and which diverges slowly but inevitably from LLVM upstream.
Upon realising that almost all of our custom extensions are only used in assembly, and we had no plans to implement code generation for most of the extensions we can't publicise, I realised that there was the ability for us to take a different approach. This approach keeps assembly and disassembly support for their proprietary custom extensions outside the toolchain itself.
I describe exactly how to follow the approach here: https://github.com/quic/crustfilt/blob/main/docs/supporting-external-instructions-on-llvm.md but broadly the idea (stolen from Luke Wren's Hazard3 docs (Chapter 4)) is to use assembly macros to hide the
On the disassembly side, I agree there are a few different ways this could be supported, a real plugin "I failed to disassemble, can you manage?" interface is one approach, but has a lot of drawbacks compared to just passing the output of objdump through another text-based tool, which is where
This then gives us a few different advantages:
And that's how we get here: filtering the output of |
To also respond to something earlier in the thread, where there is a little complexity:
One "weird" bit of the approach is that we actually still rely on LLVM's MC-layer to understand the length of the instruction. RISC-V currently has only 2 ratified lengths (16 and 32-bit), but describes an encoding scheme for longer instructions which both GNU objdump and LLVM's MC-layer understand when disassembling. RISC-V does not, at the moment, have a maximum length of instruction, but our callback only implements the scheme up to 176-bit long instructions. On the assembler side, we can only assemble up to 64-bit instructions, so we ensure our teams keep to this lower limit. There are two relevant callbacks on MC's
|
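The length encoding mentioned above can be sketched in Python as follows. This is a sketch of my reading of the expanded instruction-length encoding described in the RISC-V unprivileged specification, not of LLVM's MC-layer code; only the 16-bit and 32-bit lengths are ratified, and `riscv_inst_length` is a hypothetical helper name.

```python
def riscv_inst_length(first_halfword: int) -> int:
    """Return the instruction length in bytes implied by the lowest 16 bits
    of an instruction, or 0 for encodings beyond the 176-bit scheme."""
    if first_halfword & 0x3 != 0x3:
        return 2  # bits [1:0] != 11: 16-bit (compressed)
    if first_halfword & 0x1C != 0x1C:
        return 4  # bits [4:2] != 111: 32-bit
    if first_halfword & 0x3F == 0x1F:
        return 6  # bits [5:0] == 011111: 48-bit
    if first_halfword & 0x7F == 0x3F:
        return 8  # bits [6:0] == 0111111: 64-bit
    # Remaining case: bits [6:0] == 1111111, length is (80 + 16 * nnn) bits
    nnn = (first_halfword >> 12) & 0x7  # bits [14:12]
    if nnn != 0x7:
        return 10 + 2 * nnn  # 80-bit (nnn = 0) up to 176-bit (nnn = 6)
    return 0  # nnn == 111 is reserved for even longer encodings
```

The point for this PR is that the length comes from a few low bits of the first halfword, so it can be known even when nothing else about the instruction is.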
Which is surprising to me, but if I looked at all the assembly only extensions for Arm I'd find the same thing. I just rarely have to deal with those.
Yes I see how you'd get there. Even if we want to be super reductive, printing the bytes out at least means you can decode by hand. Which you can't do today. So I like the direction.
I should check how this ends up displaying in lldb. AArch64 unlikely to be a problem but Thumb is variable so I hope we at least show the bytes. @tedwoodward I think this change might be more clearly pitched as:
Which I think is a net positive. Still unsure about Arm M profile has https://developer.arm.com/Architectures/Arm%20Custom%20Instructions, I'll look into that and see if it would benefit from a change of output format too. Just in theory though, I don't think we shipped support for open source debuggers and I'm not about to implement it myself. |
lldb/source/Core/Disassembler.cpp
Outdated
// make RISC-V opcode dump look like llvm-objdump
if (exe_ctx &&
    exe_ctx->GetTargetSP()->GetArchitecture().GetTriple().isRISCV())
  m_opcode.DumpRISCV(&ss, max_opcode_byte_size * 3 + 1);
Pull the minimum size out into:
auto min_byte_width = max_opcode_byte_size * 3 + 1;
Then use it in both calls, so it's clear they use the same minimum.
CDE works by assigning one of the co-processors to be the encodings for the custom instructions and it has a few fixed formats you can use. It's not like RISC-V where the layout can be anything you want. Which means that the disassembler would probably print some form, with encoding, and that can already be used to filter if you wanted to. I also checked how AArch64 works:
objdump shows the bytes:
As does LLDB:
So for AArch64 we happen to mostly match objdump's output, and it's useful enough for humans and scripts because it can only ever be 32-bit encodings. Also having looked at the pretty printer system in llvm-objdump, I think |
lldb/source/Core/Opcode.cpp
Outdated
// from RISCVPrettyPrinter in llvm-objdump.cpp
// if size % 4 == 0, print as 1 or 2 32 bit values (32 or 64 bit inst)
// else if size % 2 == 0, print as 1 or 3 16 bit values (16 or 48 bit inst)
// else fall back and print bytes
I would put these comments next to the ifs they refer to. I have to read the code anyway, so it makes more sense to me to have the commentary interleaved.
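The grouping rule in the quoted comments can be sketched in a few lines. This is a minimal Python illustration, not the code from the PR: the function name `format_riscv_opcode` is hypothetical, and it assumes the instruction bytes are little-endian.

```python
def format_riscv_opcode(data: bytes) -> str:
    """Group raw instruction bytes following the rule in the quoted comments.

    Hypothetical helper for illustration; assumes little-endian instruction bytes.
    """
    size = len(data)
    if size % 4 == 0:
        width = 4  # 32-bit or 64-bit instruction: one or two 32-bit values
    elif size % 2 == 0:
        width = 2  # 16-bit or 48-bit instruction: one or three 16-bit values
    else:
        width = 1  # odd size: fall back to printing individual bytes
    chunks = (data[i:i + width] for i in range(0, size, width))
    # Each chunk is read as a little-endian integer and printed zero-padded.
    return " ".join(f"{int.from_bytes(c, 'little'):0{width * 2}x}" for c in chunks)
```

For example, `format_riscv_opcode(bytes.fromhex("3f00400920002000"))` groups an eight-byte instruction into two 32-bit values, while a six-byte input is printed as three 16-bit values.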
LLDB uses the LLVM disassembler to determine the size of instructions and to do the actual disassembly. Currently, if the LLVM disassembler can't disassemble an instruction, LLDB will ignore the instruction size, assume the instruction size is the minimum size for that device, print no useful opcode, and print nothing for the instruction. This patch changes this behavior to separate the instruction size and "can't disassemble". If the LLVM disassembler knows the size, but can't disassemble the instruction, LLDB will use that size. It will print out the opcode, and will print "<unknown>" for the instruction. This is much more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when the LLVM disassembler doesn't understand all of the instructions. RISC-V supports proprietary extensions, where the TD files don't know about certain instructions, and the disassembler can't disassemble them. Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly through a filter program. This patch modifies LLDB's disassembly to look more like llvm-objdump's, and includes an example Python script that adds a command "fdis" that will disassemble, then pipe the output through a specified filter program. This has been tested with crustfilt, a sample filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size". DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for valid disassembly, and has the size as an out parameter. Use the size even if the disassembly is invalid. Disassemble if disassembly is valid.
- Always print out the opcode when -b is specified. Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does. Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.
- Print <unknown> for instructions that can't be disassembled, matching llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
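The idea of decoupling "size" from "can't disassemble" can be sketched as follows. This is a toy Python model, not LLDB code: `riscv_inst_length`, `decode_one`, `list_instructions` and the `known` table are all hypothetical names, and the dictionary lookup merely stands in for the LLVM disassembler. The length rule follows the standard RISC-V instruction-length encoding in the low bits of the first halfword; encodings longer than 64 bits are not handled here.

```python
def riscv_inst_length(first_halfword: int) -> int:
    """Return the instruction size in bytes from the low bits, or 0 if unsupported."""
    if first_halfword & 0b11 != 0b11:
        return 2  # compressed 16-bit instruction
    if first_halfword & 0b11100 != 0b11100:
        return 4  # standard 32-bit instruction
    if first_halfword & 0b111111 == 0b011111:
        return 6  # 48-bit instruction
    if first_halfword & 0b1111111 == 0b0111111:
        return 8  # 64-bit instruction
    return 0      # longer encodings: not covered by this sketch


def decode_one(data: bytes, offset: int, known: dict):
    """Return (valid, size, text); size is reported even when valid is False."""
    half = int.from_bytes(data[offset:offset + 2], "little")
    size = riscv_inst_length(half)
    if size == 0 or offset + size > len(data):
        return False, 0, ""
    raw = int.from_bytes(data[offset:offset + size], "little")
    text = known.get(raw)  # toy stand-in for the real disassembler
    return text is not None, size, text or "<unknown>"


def list_instructions(data: bytes, known: dict) -> list:
    """Walk the buffer, always advancing by the reported size."""
    lines, offset = [], 0
    while offset < len(data):
        valid, size, text = decode_one(data, offset, known)
        if size == 0:
            break  # size itself is unknown: stop rather than guess
        raw_hex = data[offset:offset + size].hex(" ")
        lines.append(f"{offset:#06x}: {raw_hex:<24}{text}")
        offset += size
    return lines


# Assumed sample: a known 32-bit instruction, an unknown 64-bit one, an unknown 16-bit one.
sample = bytes.fromhex("13000000" "3f00400920002000" "0145")
listing = list_instructions(sample, {0x00000013: "nop"})
```

Because the size is still used when decoding fails, the instruction following the unknown 64-bit encoding is listed at the correct offset instead of being misaligned.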
For what it's worth I structured my PR #148105 to include all of the changes in this PR to these files,
If we can agree that adding a new Opcode::Type for a disassembly style with a mix of UInt16's and UInt32's instead of adding a separate Dump method is the right choice, I'm fine if you want to adopt my versions of these changes, or we can merge that PR and then take this PR on top of that, whatever is easier for you. My PR should have the same behavior that you need to accomplish this PR's goals, just a slightly different way of implementing it.
At this point, the only thing in this PR that I don't like is the DumpRISCV method; the rest of it looks good to me. (That's the only reason I put up the other PR: to make it clear what I think the cleaner approach is, which we can merge, incorporate into this PR, or discuss more.)
I'm working on incorporating your changes. One concern I have - on a big endian host,
will print out the data incorrectly. The printed opcode 0940003f 00200020 (which matches llvm-objdump output) looks like this: So I'm going to keep the style
Good point, I didn't think of the lldb host being BE.
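To illustrate the concern in general terms (this is not the code under discussion, which is not shown here): if grouped values are produced by reinterpreting the instruction bytes in host byte order, a big-endian host gets different words from the same bytes. A small Python sketch, using `struct` with explicit byte-order prefixes to stand in for a little-endian and a big-endian host; the byte sequence and the helper name `words` are assumptions for the example.

```python
import struct

# Assumed example: eight instruction bytes as they sit in little-endian target memory.
raw = bytes.fromhex("3f00400920002000")


def words(data: bytes, byte_order: str) -> str:
    """Interpret data as 32-bit words; byte_order is '<' (little) or '>' (big)."""
    count = len(data) // 4
    values = struct.unpack(f"{byte_order}{count}I", data)
    return " ".join(f"{value:08x}" for value in values)


as_little_endian = words(raw, "<")  # interpretation on a little-endian host
as_big_endian = words(raw, ">")     # naive reinterpretation on a big-endian host
```

Here `as_little_endian` is `0940003f 00200020`, while `as_big_endian` is `3f004009 20002000`. Decoding with the target's byte order stated explicitly keeps the printed opcode independent of the host.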
LLDB uses the LLVM disassembler to determine the size of instructions and to do the actual disassembly. Currently, if the LLVM disassembler can't disassemble an instruction, LLDB will ignore the instruction size, assume the instruction size is the minimum size for that device, print no useful opcode, and print nothing for the instruction. This patch changes this behavior to separate the instruction size and "can't disassemble". If the LLVM disassembler knows the size, but can't disassemble the instruction, LLDB will use that size. It will print out the opcode, and will print "<unknown>" for the instruction. This is much more useful to both a user and a script.

The impetus behind this change is to clean up RISC-V disassembly when the LLVM disassembler doesn't understand all of the instructions. RISC-V supports proprietary extensions, where the TD files don't know about certain instructions, and the disassembler can't disassemble them. Internal users want to be able to disassemble these instructions.

With llvm-objdump, the solution is to pipe the output of the disassembly through a filter program. This patch modifies LLDB's disassembly to look more like llvm-objdump's, and includes an example Python script that adds a command "fdis" that will disassemble, then pipe the output through a specified filter program. This has been tested with crustfilt, a sample filter located at https://github.com/quic/crustfilt .

Changes in this PR:
- Decouple "can't disassemble" from "instruction size". DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for valid disassembly, and has the size as an out parameter. Use the size even if the disassembly is invalid. Disassemble if disassembly is valid.
- Always print out the opcode when -b is specified. Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does. Code for the new Opcode Type eType16_32Tuples by Jason Molenda.
- Print <unknown> for instructions that can't be disassembled, matching llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.
- Added disassembly byte test for x86 with known and unknown instructions. Added disassembly byte test for riscv32 with known and unknown instructions, with and without filtering. Added test from Jason Molenda to RISC-V disassembly unit tests.

Change-Id: Ie5a359d9e87a12dde79a8b5c9c7a146440a550c5
@jasonmolenda I've incorporated your changes, as requested.
LGTM, thanks for answering all the questions on this one.
lldb/unittests/Disassembler/RISCV/TestMCDisasmInstanceRISCV.cpp
LGTM
Please fix the formatting before landing - https://github.com/llvm/llvm-project/actions/runs/16232166455/job/45836933279?pr=145793.
I disagree with the Python formatting here. It doesn't need to indent inside a Also is something like
really better than
? I did fix the formatting issue I missed in DisassemblerLLVMC.cpp, though!
@tedwoodward Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
After changes in #145793.

/home/david.spickett/llvm-project/lldb/source/Plugins/Disassembler/LLVMC/DisassemblerLLVMC.cpp:1360:49: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned int')
 1360 |   status = m_disasm_up->getInstruction(mc_inst, size, data, pc, llvm::nulls());
      |                                                 ^~~~
/home/david.spickett/llvm-project/llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h:135:64: note: passing argument to parameter 'Size' here
  135 |   virtual DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
      |                                                                ^
1 error generated.

The type used in the LLVM method we call is uint64_t, so use that instead. It's overkill for what it is, but that's a separate issue if anyone cares.

Also removed the unused form of GetMCInst.
This changes the example command added in llvm#145793 so that the fdis program does not have to be a single program name. Doing so also means we can run the test on Windows where the program needs to be "python.exe script_name". I've changed "fdis set" to treat the rest of the command as the program. Then store that as a list to be passed to subprocess. If we just use a string, Python will think that "python.exe foo" is the name of an actual program. This will still break if the paths have spaces in, but I'm trying to do just enough to fix the test here without rewriting all the option handling.
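The difference between passing a single string and a list to `subprocess` can be sketched like this. It is a minimal illustration, not the fdis script itself: `parse_filter_command` and `run_filter` are hypothetical names, and the stand-in filter is just the current Python interpreter running a one-line script.

```python
import subprocess
import sys


def parse_filter_command(rest_of_command: str) -> list:
    """Split the text after a hypothetical 'fdis set' into program + arguments.

    Plain whitespace splitting, so paths containing spaces would still break.
    """
    return rest_of_command.split()


def run_filter(filter_argv: list, disassembly: str) -> str:
    """Pipe disassembly text through the filter program and return its output."""
    result = subprocess.run(filter_argv, input=disassembly,
                            capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in filter for the demo: upper-case everything read from stdin.
# It is built as a list directly because the script text contains spaces.
demo_filter = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read().upper())"]
filtered = run_filter(demo_filter, "<unknown>\n")
```

With a list, the first element is the program and the remaining elements are its arguments, so `parse_filter_command("python.exe filter.py")` yields two separate items rather than one program name containing a space.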
…ndows Added by llvm/llvm-project#145793 Failing on our Windows on Arm bot: llvm/llvm-project#145793 Shebang lines don't work on Windows, and we can't pass "python script_name" to the "fdis set" command because of the way arguments are currently parsed. I can fix that, but that needs review, so disable the test for now.
After changes in llvm#145793: the same size_t to uint64_t build fix described above, with the unused form of GetMCInst also removed. (cherry picked from commit a64bfd8)
…y (#148823) This changes the example command added in llvm/llvm-project#145793 so that the fdis program does not have to be a single program name. Doing so also means we can run the test on Windows where the program needs to be "python.exe script_name". I've changed "fdis set" to treat the rest of the command as the program. Then store that as a list to be passed to subprocess. If we just use a string, Python will think that "python.exe foo" is the name of an actual program instead of a program and an argument to it. This will still break if the paths have spaces in, but I'm trying to do just enough to fix the test here without rewriting all the option handling.
RISC-V supports proprietary extensions, where the TD files don't know about certain instructions, and the disassembler can't disassemble them. Internal users want to be able to disassemble these instructions in LLDB.
With llvm-objdump, the solution is to pipe the output of the disassembly through a filter program. This patch modifies LLDB's disassembly to look more like llvm-objdump's, and includes an example python script that adds a command "fdis" that will disassemble, then pipe the output through a specified filter program. This has been tested with crustfilt, a sample filter located at https://github.com/quic/crustfilt .
Changes in this PR:
- Decouple "can't disassemble" from "instruction size". DisassemblerLLVMC::MCDisasmInstance::GetMCInst now returns a bool for valid disassembly, and has the size as an out parameter. Use the size even if the disassembly is invalid. Disassemble if disassembly is valid.
- Always print out the opcode when -b is specified. Previously it wouldn't print out the opcode if it couldn't disassemble.
- Print out RISC-V opcodes the way llvm-objdump does. Add DumpRISCV method based on RISC-V pretty printer in llvm-objdump.cpp.
- Print <unknown> for instructions that can't be disassembled, matching llvm-objdump, instead of printing nothing.
- Update max riscv32 and riscv64 instruction size to 8.
- Add example "fdis" command script.