Skip to content

Commit 34d4ef9

Browse files
committed
Introduce Basic Single Pass Parser
= Motivation The current parsing logic of sdb based on split() and the shlex library has resulted in multiple workarounds in the implementation of commands and the overall user-experience of the tool. In addition, its lack of proper error-handling and reporting frequently result in the user not knowing what is wrong with his input. Trying to fix the aforementioned shortcomings in the existing logic has proven difficult as fixing one problem brings up a new one. = Patch This patch replaces this code a simple hand-written parser that provides the bare-minimum that we need and improved error-reporting. Unit tests are also provided for the parser to test its behavior and also highlight its behavior in extreme cases of input. This patch also does a first pass in undoing most of the workarounds that we have in existing commands due to the old parsing logic. = Things To Note: Quoted Strings Proper support for single and double quote strings is added with this patch. Double-quote strings are allowed to escape a double-quote by inserting a backslash before it. Single-quote strings can escape a single quote the same way. E.g. the following examples are valid: ``` ... | filter "obj.spa_name == 'rpool'" | ... ... | filter "obj.spa_name == \"rpool\"" | ... ... | filter 'obj.spa_name == "rpool"' | ... ... | filter 'obj.spa_name == \'rpool\'' | ... ``` The purpose of strings is solely to allow the ability to pass multiple words separated by space as a single argument to commands. The `filter` examples show above get the whole predicate passed in string form as a single argument. The actual quotes of the string are not part of the arguments passed to the command. This behavior was modelled after bash. = Examples of new errors ``` // Before sdb> echo 1 2 3 | sdb: cannot recognize command: // After sdb: syntax error: freestanding pipe with no command echo 1 2 3 | ^ ``` ``` // Before sdb> echo 1 2 3 | filter obj != 1 sdb: filter: invalid input: comparison operator is missing // After sdb> echo 1 2 3 | filter obj != 1 sdb: syntax error: predicates that use != as an operator should be quoted echo 1 2 3 | filter obj != 1 ^ ``` ``` // Before sdb> echo 1 2 3 | filter "obj != 1 sdb encountered an internal error due to a bug. Here's the information you need to file the bug: ---------------------------------------------------------- Target Info: ProgramFlags.IS_LIVE|IS_LINUX_KERNEL Platform(<Architecture.X86_64: 1>, <PlatformFlags.IS_LITTLE_ENDIAN|IS_64_BIT: 3>) Traceback (most recent call last): File "/usr/lib/python3/dist-packages/sdb/internal/repl.py", line 107, in eval_cmd for obj in invoke(self.target, [], input_): File "/usr/lib/python3/dist-packages/sdb/pipeline.py", line 107, in invoke all_tokens = list(lexer) File "/usr/lib/python3.6/shlex.py", line 295, in __next__ token = self.get_token() File "/usr/lib/python3.6/shlex.py", line 105, in get_token raw = self.read_token() File "/usr/lib/python3.6/shlex.py", line 187, in read_token raise ValueError("No closing quotation") ValueError: No closing quotation ---------------------------------------------------------- Link: https://github.com/delphix/sdb/issues/new // After sdb> echo 1 2 3 | filter "obj != 1 sdb: syntax error: unfinished string expression echo 1 2 3 | filter "obj != 1 ^ ``` ``` // Before sdb> ! pwd sdb: cannot recognize command: sdb> ! echo hello! Multiple ! not supported // After sdb> ! pwd /export/home/delphix/sdb sdb> ! echo hello! hello! ```
1 parent 0e1988e commit 34d4ef9

File tree

74 files changed

+1097
-180
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

74 files changed

+1097
-180
lines changed

sdb/__init__.py

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -30,7 +30,8 @@
3030
#
3131
from sdb.error import (Error, CommandNotFoundError, CommandError,
3232
CommandInvalidInputError, SymbolNotFoundError,
33-
CommandArgumentsError, CommandEvalSyntaxError)
33+
CommandArgumentsError, CommandEvalSyntaxError,
34+
ParserError)
3435
from sdb.target import (create_object, get_object, get_prog, get_typed_null,
3536
get_type, get_pointer_type, get_target_flags,
3637
get_symbol, type_canonical_name, type_canonicalize,
@@ -53,6 +54,7 @@
5354
'Error',
5455
'InputHandler',
5556
'Locator',
57+
'ParserError',
5658
'PrettyPrinter',
5759
'SingleInputCommand',
5860
'SymbolNotFoundError',

sdb/command.py

Lines changed: 39 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -215,13 +215,47 @@ def help(cls, name: str) -> None:
215215

216216
input_type: Optional[str] = None
217217

218-
def __init__(self, args: str = "", name: str = "_") -> None:
218+
def __init__(self,
219+
args: Optional[List[str]] = None,
220+
name: str = "_") -> None:
219221
self.name = name
220222
self.isfirst = False
221223
self.islast = False
222224

223225
self.parser = type(self)._init_parser(name)
224-
self.args = self.parser.parse_args(args.split())
226+
227+
#
228+
# The if-else clauses below may seem like it can be avoided by:
229+
#
230+
# [1] Passing the `args` function argument to parse_args() even if
231+
# it is None - the call won't blow up.
232+
#
233+
# or [2] Setting the default value of `args` to be [] instead of None.
234+
#
235+
# Solution [1] doesn't work because parse_args() actually distinguishes
236+
# between None and [] as parameters. If [] is passed it returns an
237+
# argparse.Namespace() with default values for all the fields that the
238+
# command specified in _init_parser(), which is what we want. If None
239+
# is passed then argparse's default logic is to attempt to parse
240+
# `_sys.argv[1:]` (reference code: cpython/Lib/argparse.py) which is
241+
# the arguments passed to the sdb from the shell. This is far from what
242+
# we want.
243+
#
244+
# Solution 2 is dangerous as default arguments in Python are mutable(!)
245+
# and thus invoking a Command with arguments that doesn't specify the
246+
# __init__() method can pass its arguments to a similar Command later
247+
# in the pipeline even if the latter Command didn't specify any args.
248+
# [docs.python-guide.org/writing/gotchas/#mutable-default-arguments]
249+
#
250+
# We still want to set self.args to an argparse.Namespace() with the
251+
# fields specific to our self.parser, thus we are forced to call
252+
# parse_args([]) for it, even if `args` is None. This way commands
253+
# using arguments can always do self.args.<expected field> without
254+
# having to check whether this field exist every time.
255+
#
256+
if args is None:
257+
args = []
258+
self.args = self.parser.parse_args(args)
225259

226260
def __init_subclass__(cls, **kwargs: Any) -> None:
227261
"""
@@ -365,7 +399,9 @@ def _init_parser(cls, name: str) -> argparse.ArgumentParser:
365399
parser.add_argument("type", nargs=argparse.REMAINDER)
366400
return parser
367401

368-
def __init__(self, args: str = "", name: str = "_") -> None:
402+
def __init__(self,
403+
args: Optional[List[str]] = None,
404+
name: str = "_") -> None:
369405
super().__init__(args, name)
370406
if not self.args.type:
371407
self.parser.error("the following arguments are required: <type>")

sdb/commands/container_of.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,7 @@ class ContainerOf(sdb.Command):
4040
4141
sdb> addr init_task | cast void *
4242
(void *)0xffffffffa8217740
43-
sdb> addr init_task | member comm | addr | container_of struct task_struct comm | cast void *
43+
sdb> addr init_task | member comm | addr | container_of task_struct comm | cast void *
4444
(void *)0xffffffffa8217740
4545
4646
"""

sdb/commands/filter.py

Lines changed: 25 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
# pylint: disable=missing-docstring
1818

1919
import argparse
20-
from typing import Iterable
20+
from typing import Iterable, List, Optional
2121

2222
import drgn
2323
import sdb
@@ -55,16 +55,31 @@ def _init_parser(cls, name: str) -> argparse.ArgumentParser:
5555
parser.add_argument("expr", nargs=argparse.REMAINDER)
5656
return parser
5757

58-
def __init__(self, args: str = "", name: str = "_") -> None:
58+
@staticmethod
59+
def _parse_expression(input_expr: str) -> List[str]:
60+
pass
61+
62+
def __init__(self,
63+
args: Optional[List[str]] = None,
64+
name: str = "_") -> None:
5965
super().__init__(args, name)
6066
if not self.args.expr:
61-
self.parser.error("the following arguments are required: expr")
67+
self.parser.error("no expression specified")
68+
69+
#
70+
# This is a stop-gap solution until we figure out
71+
# exactly how we want the filter command to behave.
72+
#
73+
if len(self.args.expr) == 1:
74+
self.expr = self.args.expr[0].split()
75+
else:
76+
self.expr = self.args.expr
6277

6378
index = None
6479
operators = ["==", "!=", ">", "<", ">=", "<="]
6580
for operator in operators:
6681
try:
67-
index = self.args.expr.index(operator)
82+
index = self.expr.index(operator)
6883
# Use the first comparison operator we find.
6984
break
7085
except ValueError:
@@ -83,22 +98,22 @@ def __init__(self, args: str = "", name: str = "_") -> None:
8398
raise sdb.CommandInvalidInputError(
8499
self.name, "left hand side of expression is missing")
85100

86-
if index == len(self.args.expr) - 1:
101+
if index == len(self.expr) - 1:
87102
# If the index is found to be at the very end of the list,
88103
# this means there's no right hand side of the comparison to
89104
# compare the left hand side to. This is an error.
90105
raise sdb.CommandInvalidInputError(
91106
self.name, "right hand side of expression is missing")
92107

93108
try:
94-
self.lhs_code = compile(" ".join(self.args.expr[:index]),
95-
"<string>", "eval")
96-
self.rhs_code = compile(" ".join(self.args.expr[index + 1:]),
97-
"<string>", "eval")
109+
self.lhs_code = compile(" ".join(self.expr[:index]), "<string>",
110+
"eval")
111+
self.rhs_code = compile(" ".join(self.expr[index + 1:]), "<string>",
112+
"eval")
98113
except SyntaxError as err:
99114
raise sdb.CommandEvalSyntaxError(self.name, err)
100115

101-
self.compare = self.args.expr[index]
116+
self.compare = self.expr[index]
102117

103118
def _call_one(self, obj: drgn.Object) -> Iterable[drgn.Object]:
104119
try:

sdb/commands/internal/util.py

Lines changed: 5 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -27,11 +27,8 @@ def get_valid_type_by_name(cmd: sdb.Command, tname: str) -> drgn.Type:
2727
corresponding drgn.Type object.
2828
2929
This function is used primarily by commands that accept a type
30-
name as an argument and exists mainly for 2 reasons:
31-
[1] There is a limitation in the way the SDB lexer interacts with
32-
argparse making it hard for us to parse type names more than
33-
1 token wide (e.g. 'struct task_struct'). [bad reason]
34-
[2] We save some typing for the user. [good reason]
30+
name as an argument and exist only to save keystrokes for the
31+
user.
3532
"""
3633
if tname in ['struct', 'enum', 'union', 'class']:
3734
#
@@ -43,8 +40,9 @@ def get_valid_type_by_name(cmd: sdb.Command, tname: str) -> drgn.Type:
4340
# user-friendly and thus we just avoid that situation
4441
# by instructing the user to skip such keywords.
4542
#
46-
raise sdb.CommandError(cmd.name,
47-
f"skip keyword '{tname}' and try again")
43+
raise sdb.CommandError(
44+
cmd.name,
45+
f"skip keyword '{tname}' or quote your type \"{tname} <typename>\"")
4846

4947
try:
5048
type_ = sdb.get_type(tname)

sdb/commands/linux/per_cpu.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
# pylint: disable=missing-docstring
1818

1919
import argparse
20-
from typing import Iterable
20+
from typing import Iterable, List, Optional
2121

2222
import drgn
2323
import drgn.helpers.linux.cpumask as drgn_cpumask
@@ -51,7 +51,9 @@ def _init_parser(cls, name: str) -> argparse.ArgumentParser:
5151
parser.add_argument("cpus", nargs="*", type=int)
5252
return parser
5353

54-
def __init__(self, args: str = "", name: str = "_") -> None:
54+
def __init__(self,
55+
args: Optional[List[str]] = None,
56+
name: str = "_") -> None:
5557
super().__init__(args, name)
5658
self.ncpus = len(
5759
list(drgn_cpumask.for_each_possible_cpu(sdb.get_prog())))

sdb/commands/linux/slabs.py

Lines changed: 0 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -148,14 +148,6 @@ def no_input(self) -> Iterable[drgn.Object]:
148148
def __pp_parse_args(self) -> Tuple[str, List[str], Dict[str, Any]]:
149149
fields = self.DEFAULT_FIELDS
150150
if self.args.o:
151-
#
152-
# HACK: Until we have a proper lexer for SDB we can
153-
# only pass the comma-separated list as a
154-
# string (e.g. quoted). Until this is fixed
155-
# we make sure to unquote such strings.
156-
#
157-
if self.args.o[0] == '"' and self.args.o[-1] == '"':
158-
self.args.o = self.args.o[1:-1]
159151
fields = self.args.o.split(",")
160152
elif self.args.v:
161153
fields = list(Slabs.FIELDS.keys())

sdb/commands/pyfilter.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
# pylint: disable=missing-docstring
1818

1919
import argparse
20-
from typing import Iterable
20+
from typing import Iterable, List, Optional
2121

2222
import drgn
2323
import sdb
@@ -33,7 +33,9 @@ def _init_parser(cls, name: str) -> argparse.ArgumentParser:
3333
parser.add_argument("expr", nargs=argparse.REMAINDER)
3434
return parser
3535

36-
def __init__(self, args: str = "", name: str = "_") -> None:
36+
def __init__(self,
37+
args: Optional[List[str]] = None,
38+
name: str = "_") -> None:
3739
super().__init__(args, name)
3840
if not self.args.expr:
3941
self.parser.error("the following arguments are required: expr")

sdb/commands/spl/spl_kmem_caches.py

Lines changed: 0 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -148,14 +148,6 @@ def no_input(self) -> Iterable[drgn.Object]:
148148
def __pp_parse_args(self) -> Tuple[str, List[str], Dict[str, Any]]:
149149
fields = SplKmemCaches.DEFAULT_FIELDS
150150
if self.args.o:
151-
#
152-
# HACK: Until we have a proper lexer for SDB we can
153-
# only pass the comma-separated list as a
154-
# string (e.g. quoted). Until this is fixed
155-
# we make sure to unquote such strings.
156-
#
157-
if self.args.o[0] == '"' and self.args.o[-1] == '"':
158-
self.args.o = self.args.o[1:-1]
159151
fields = self.args.o.split(",")
160152
elif self.args.v:
161153
fields = list(SplKmemCaches.FIELDS.keys())

sdb/commands/stacks.py

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@
1717
# pylint: disable=missing-docstring
1818

1919
import argparse
20-
from typing import Dict, Iterable, List, Tuple
20+
from typing import Dict, Iterable, List, Optional, Tuple
2121
from collections import defaultdict
2222

2323
import drgn
@@ -143,7 +143,9 @@ class Stacks(sdb.Locator, sdb.PrettyPrinter):
143143
input_type = "struct task_struct *"
144144
output_type = "struct task_struct *"
145145

146-
def __init__(self, args: str = "", name: str = "_") -> None:
146+
def __init__(self,
147+
args: Optional[List[str]] = None,
148+
name: str = "_") -> None:
147149
super().__init__(args, name)
148150
self.mod_start, self.mod_end = 0, 0
149151
self.func_start, self.func_end = 0, 0

0 commit comments

Comments
 (0)