@corona10 corona10 commented Aug 29, 2021

@corona10
Member Author


+---------------+--------+----------------------+
| Benchmark     | base   | opt                  |
+===============+========+======================+
| bench pattern | 482 ns | 417 ns: 1.15x faster |
+---------------+--------+----------------------+

         goto fail;
     }
-    values = PyList_New(0);
+    values = PyTuple_New(nkeys);
Member Author

The size of the tuple is predictable.
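
To illustrate the point about predictable size: when the number of results is known up front (one value per key here), a fixed-size container can be allocated once and filled by index instead of being grown by appends. The sketch below is a toy model in plain C, not CPython code. `fill_append`, `fill_prealloc`, `counted_realloc`, and the doubling growth policy are all assumptions made for illustration and do not reflect how `PyList` actually resizes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: counts allocator calls to contrast the two strategies.
   None of these names exist in CPython. */
static int alloc_calls = 0;

static void *counted_realloc(void *p, size_t size) {
    alloc_calls++;
    return realloc(p, size);
}

/* Strategy 1: size treated as unknown, so start empty and grow on demand
   (list-like). The doubling policy is an assumption for this sketch. */
int *fill_append(size_t n) {
    int *items = NULL;
    size_t len = 0, cap = 0;
    for (size_t i = 0; i < n; i++) {
        if (len == cap) {
            size_t new_cap = cap ? cap * 2 : 1;
            int *grown = counted_realloc(items, new_cap * sizeof(int));
            if (grown == NULL) {
                free(items);
                return NULL;
            }
            items = grown;
            cap = new_cap;
        }
        items[len++] = (int)i;
    }
    return items;
}

/* Strategy 2: size known up front, so allocate once and store by index
   (tuple-like). */
int *fill_prealloc(size_t n) {
    assert(n > 0);
    int *items = counted_realloc(NULL, n * sizeof(int));
    if (items == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        items[i] = (int)i;
    }
    return items;
}
```

In this model the preallocating version makes a single allocator call regardless of `n`, while the appending version makes one call per capacity bump.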

Python/ceval.c Outdated
         }
-        PyObject *value = PyObject_CallFunctionObjArgs(get, key, dummy, NULL);
+        PyObject *args[] = { key, dummy };
+        PyObject *value = PyObject_Vectorcall(get, args, 2, NULL);
Member Author

Just replacing PyObject_CallFunctionObjArgs with PyObject_Vectorcall gives about a 2% speedup on the microbenchmark.
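
A rough picture of where such a saving may come from: a NULL-terminated variadic call has to walk its arguments to count them and pack them into an array before dispatching, whereas an array-plus-count call passes them straight through. The following is a toy model in plain C that does not use the CPython API. `call_varargs`, `call_array`, `sum_args`, and `MAX_ARGS` are hypothetical names introduced only for this sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

#define MAX_ARGS 8

/* Callee with an array-plus-count signature (vectorcall-like in shape only). */
typedef long (*array_fn)(long *const *args, size_t nargs);

long sum_args(long *const *args, size_t nargs) {
    long total = 0;
    for (size_t i = 0; i < nargs; i++) {
        total += *args[i];
    }
    return total;
}

/* Variadic entry point: arguments are pointers terminated by a NULL sentinel.
   It must scan and repack the arguments before it can dispatch. */
long call_varargs(array_fn fn, ...) {
    long *packed[MAX_ARGS];
    size_t n = 0;
    va_list ap;
    va_start(ap, fn);
    for (;;) {
        long *arg = va_arg(ap, long *);
        if (arg == NULL) {
            break;
        }
        assert(n < MAX_ARGS);
        packed[n++] = arg;
    }
    va_end(ap);
    return fn(packed, n);
}

/* Array entry point: the caller already has the array and the count,
   so there is nothing to scan or copy. */
long call_array(array_fn fn, long *const *args, size_t nargs) {
    return fn(args, nargs);
}
```

Both entry points reach the same callee with the same arguments; the array form simply skips the scan-and-pack step.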

Fidget-Spinner commented Aug 29, 2021

The changes LGTM. Tested locally on Win64:

python -m test test_patma -R 3:3
0:00:00 Run tests sequentially
0:00:00 [1/1] test_patma
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

BTW, I was wondering whether using _PyObject_GetMethod instead of _PyObject_GetAttrId would make your benchmark faster. The diff against your current version is not too large:

@@ -846,7 +846,9 @@ match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)
     // - Don't cause key creation or resizing in dict subclasses like
     //   collections.defaultdict that define __missing__ (or similar).
     _Py_IDENTIFIER(get);
-    PyObject *get = _PyObject_GetAttrId(map, &PyId_get);
+    PyObject *get_name = _PyUnicode_FromId(&PyId_get); // borrowed
+    PyObject *get = NULL;
+    int meth_found = _PyObject_GetMethod(map, get_name, &get);
     if (get == NULL) {
         goto fail;
     }
@@ -873,8 +875,14 @@ match_keys(PyThreadState *tstate, PyObject *map, PyObject *keys)
             }
             goto fail;
         }
-        PyObject *args[] = { key, dummy };
-        PyObject *value = PyObject_Vectorcall(get, args, 2, NULL);
+        PyObject *args[] = { map, key, dummy };
+        PyObject *value = NULL;
+        if (meth_found) {
+            value = PyObject_Vectorcall(get, args, 3, NULL);
+        }
+        else {
+            value = PyObject_Vectorcall(get, &args[1], 2, NULL);
+        }
         if (value == NULL) {
             goto fail;
         }
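
The idea in the diff, roughly: keep `map` in slot 0 of the argument array so that one array serves both cases. If the lookup found an unbound method, all three slots are passed with the receiver first; otherwise the callable is already bound, so slot 0 is skipped and two arguments are passed. Below is a toy model of that convention in plain C, independent of CPython internals. `Map`, `Getter`, `lookup_get`, `call_getter`, and `map_get` are made-up names, and the `want_method` flag is an assumption standing in for whatever the real lookup decides.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int key; int value; } Entry;
typedef struct { const Entry *entries; size_t len; } Map;

/* A callable in this toy model: either unbound (expects the receiver as
   args[0]) or bound (already holds its receiver). Not CPython API. */
typedef struct {
    const Map *bound_self;   /* NULL when unbound */
} Getter;

/* Returns a pointer to the stored value, or `dummy` if the key is absent. */
static const void *get_impl(const Map *self, const int *key, const void *dummy) {
    for (size_t i = 0; i < self->len; i++) {
        if (self->entries[i].key == *key) {
            return &self->entries[i].value;
        }
    }
    return dummy;
}

/* Array-plus-count call, vectorcall-like in shape only. */
const void *call_getter(const Getter *g, const void *const *args, size_t nargs) {
    if (g->bound_self != NULL) {
        assert(nargs == 2);              /* (key, dummy) */
        return get_impl(g->bound_self, args[0], args[1]);
    }
    assert(nargs == 3);                  /* (map, key, dummy) */
    return get_impl(args[0], args[1], args[2]);
}

/* Returns 1 if an unbound method was found, 0 if a bound callable was
   produced, loosely mirroring the meth_found flag in the diff. */
int lookup_get(const Map *map, int want_method, Getter *out) {
    out->bound_self = want_method ? NULL : map;
    return want_method;
}

const void *map_get(const Map *map, int want_method,
                    const int *key, const void *dummy) {
    Getter get;
    int meth_found = lookup_get(map, want_method, &get);
    /* One array for both paths: slot 0 holds the receiver. */
    const void *args[] = { map, key, dummy };
    if (meth_found) {
        return call_getter(&get, args, 3);
    }
    return call_getter(&get, &args[1], 2);
}
```

Either path yields the same result; the unbound path just avoids building an intermediate bound object, which is presumably where the gain comes from.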

corona10 commented Aug 29, 2021

@Fidget-Spinner
Yeah it's better!


➜  cpython git:(bpo-45045) ✗ ./python.exe -m pyperf compare_to --table base.json suggestion.json
+---------------+--------+----------------------+
| Benchmark     | base   | suggestion           |
+===============+========+======================+
| bench pattern | 482 ns | 373 ns: 1.29x faster |
+---------------+--------+----------------------+
➜  cpython git:(bpo-45045) ✗ ./python.exe -m pyperf compare_to --table opt.json suggestion.json
+---------------+--------+----------------------+
| Benchmark     | opt    | suggestion           |
+===============+========+======================+
| bench pattern | 417 ns | 373 ns: 1.12x faster |
+---------------+--------+----------------------+

@corona10
Member Author

With the new commit:

0:00:00 load avg: 5.05 Run tests sequentially
0:00:00 load avg: 5.05 [1/1] test_patma
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 1.1 sec
Tests result: SUCCESS

@Fidget-Spinner Fidget-Spinner left a comment

LGTM. Thanks!

corona10 commented Aug 30, 2021

@Fidget-Spinner Thanks for the review.

Here is the final benchmark from an optimized build with ThinLTO :)

+---------------+---------------+----------------------+
| Benchmark     | thin_lto_base | thin_lto_opt         |
+===============+===============+======================+
| bench pattern | 357 ns        | 287 ns: 1.24x faster |
+---------------+---------------+----------------------+

@corona10 corona10 deleted the bpo-45045 branch August 30, 2021 10:04