
eendebakpt
Contributor

@eendebakpt eendebakpt commented May 22, 2024

(updated description)

Writing JSON files (or encoding to a string) is not thread-safe in the sense that, when data is encoded to JSON while another thread is mutating it, the result is not well-defined (this is true for both the normal and the free-threading build). The free-threading build, however, can also crash the interpreter while writing JSON, because the encoder uses functions like PySequence_Fast_GET_ITEM, which access the items of a list without any locking. In this PR we make the free-threading build safe by adding locks in three places in the JSON encoder.

Reading from a JSON file is safe: the objects being constructed are only known to the executing thread. Encoding data to JSON needs more care: mutable Python objects such as a list or a dict could be modified by another thread during encoding.
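Even with this PR, the encoded output is only well-defined if nothing mutates the data while it is being encoded. A minimal sketch of how application code could get a consistent snapshot, assuming a hypothetical `LockedData` wrapper (not part of `json` or of this PR) that guards both mutation and `json.dumps` with a single `threading.Lock`:

```python
import json
import threading


class LockedData:
    """Hypothetical wrapper: one lock guards every mutation and every dump."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"items": []}

    def append(self, value):
        with self._lock:
            self._data["items"].append(value)

    def dumps(self, **kwargs):
        # While the lock is held no other thread can mutate _data, so the
        # output always describes one consistent state of the data.
        with self._lock:
            return json.dumps(self._data, **kwargs)


shared = LockedData()
workers = [
    threading.Thread(target=lambda: [shared.append(i) for i in range(100)])
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(json.loads(shared.dumps())["items"]))  # 400
```

The trade-off is that writers are blocked for the whole duration of the encoding; taking a (deep) copy under the lock and encoding the copy outside it is an alternative.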

  • When encoding a list, we use Py_BEGIN_CRITICAL_SECTION_SEQUENCE_FAST to protect the list against mutation.
  • When encoding a dict, we use a critical section for iteration over exact dicts (PyDict_Next is used there). Non-exact dicts use PyMapping_Items to create a list of (key, value) tuples. PyMapping_Items itself is assumed to be thread-safe, but the resulting list is not necessarily a copy (it can be the very list returned by the object's items() method) and can be mutated.
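The note about PyMapping_Items can be observed from Python: for a dict subclass the encoder goes through the object's `items()` method, and CPython's PyMapping_Items hands back the returned object unchanged when it is already a list. A minimal sketch with a hypothetical `SharedItemsDict` whose `items()` always returns the same list object (an assumption made purely for illustration):

```python
import json


class SharedItemsDict(dict):
    """Hypothetical dict subclass whose items() returns one shared list."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Build the list once; items() hands out this same object every time.
        self._items = list(super().items())

    def items(self):
        return self._items


d = SharedItemsDict(a=1)
print(json.dumps(d))       # {"a": 1}
d._items.append(("b", 2))  # mutate the list the encoder will iterate over
print(json.dumps(d))       # {"a": 1, "b": 2}
```

The underlying dict still holds only `"a"`, yet the second dump contains `"b"`: the encoder iterated the shared list. Any thread holding a reference to that list could mutate it during encoding, so locking the dict alone would not protect it.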

Update 2025-02-10: refactored to avoid using Py_EXIT_CRITICAL_SECTION_SEQUENCE_FAST

  • The script below was used to test the free-threading implementation. Similar code was added to the tests.
Test script
import json
from threading import Thread
import time

class JsonThreadingTest:

    def __init__(self, number_of_threads=4, number_of_json_dumps=10):
        self.data = [[], [], {}, {}, {}]
        self.json = {str(ii): d for ii, d in enumerate(self.data)}
        self.results = []
        self.number_of_threads = number_of_threads
        self.number_of_json_dumps = number_of_json_dumps
            
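    def modify(self, index):
        # Keep mutating the shared containers until test() clears the flag
        while self.continue_thread:
            for d in self.data:
                if isinstance(d, list):
                    if len(d) > 20:
                        d.clear()
                    else:
                        d.append(index)
                else:
                    if len(d) > 20:
                        try:
                            # another thread may already have removed this key
                            d.pop(list(d)[0])
                        except KeyError:
                            pass
                    else:
                        if index % 2:
                            d[index] = index
                        else:
                            # bytes keys are not valid JSON keys (see skipkeys in test())
                            d[bytes(index)] = bytes(index)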
                    
    def test(self):
        self.continue_thread = True
        self.modifying_threads = []
        for ii in range(self.number_of_threads):
            t = Thread(target=self.modify, args=[ii])
            self.modifying_threads.append(t)

        self.results.clear()
        for t in self.modifying_threads:
            print(f'start {t}')
            t.start()

        for ii in range(self.number_of_json_dumps):
            print(f'dump {ii}')
            time.sleep(0.01)

            indent = ii if ii % 3 == 0 else None
            # Even-numbered threads insert bytes keys, which json.dumps rejects
            # with TypeError unless skipkeys=True; skip such dumps instead of
            # appending a stale (or undefined) result.
            skipkeys = ii % 5 == 0
            try:
                j = json.dumps(self.data, indent=indent, skipkeys=skipkeys)
            except TypeError:
                continue
            self.results.append(j)
        self.continue_thread = False
        for t in self.modifying_threads:
            t.join()

        print([hash(r) for r in self.results])
            


t = JsonThreadingTest(number_of_json_dumps=102, number_of_threads=8)
t0 = time.time()
t.test()
dt = time.time() - t0
print(t.results[-1])
print(f'Done: {dt:.2f}')
  • The test script with t = JsonThreadingTest(number_of_json_dumps=102, number_of_threads=8) runs a factor of 25 faster on the free-threading build. Nice!
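A reduced, self-terminating variant of the test script, sketched under the assumption that only a list of ints is shared (so no `TypeError` from bytes keys is expected), checks the property that matters for this PR: every dump completes and parses back as valid JSON. `stress_list_encoding` is a hypothetical name; this is not the test that was added to CPython.

```python
import json
import threading


def stress_list_encoding(n_threads=4, n_dumps=200):
    """Encode a shared list while other threads mutate it.

    Hypothetical helper; returns the parsed result of every dump.
    """
    shared = []
    stop = threading.Event()

    def mutate(value):
        # Grow the list, clear it once it gets long, repeat until stopped.
        while not stop.is_set():
            if len(shared) > 20:
                shared.clear()
            else:
                shared.append(value)

    threads = [threading.Thread(target=mutate, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    try:
        dumps = [json.dumps(shared) for _ in range(n_dumps)]
    finally:
        # Always stop and join the mutators, even if a dump raised.
        stop.set()
        for t in threads:
            t.join()
    # Each dump must be valid JSON; json.loads raises if it is not.
    return [json.loads(s) for s in dumps]


parsed = stress_list_encoding()
print(len(parsed))  # 200
```

Bounding the run by the number of dumps rather than by time, and joining the mutator threads in a `finally` block, keeps the check deterministic in duration and leaves no threads running afterwards.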

@nineteendo
Contributor

You need to include the file that defines that macro.

Contributor

@nineteendo nineteendo left a comment

Revert newlines

@eendebakpt eendebakpt changed the title Draft: gh-116738: Make _json module thread-safe #117530 gh-116738: Make _json module thread-safe #117530 May 31, 2024
@eendebakpt eendebakpt changed the title gh-116738: Make _json module thread-safe #117530 gh-116738: Make _json module thread-safe May 31, 2024
@eendebakpt eendebakpt changed the title gh-116738: Make _json module thread-safe gh-116738: Make _json module safe in the free-threading build Aug 14, 2024
Contributor

@kumaraditya303 kumaraditya303 left a comment

Thanks!

@kumaraditya303 kumaraditya303 merged commit 4357302 into python:main Aug 31, 2025
45 checks passed
if len(d) > 5:
    try:
        key = list(d)[0]
        d.pop()
Contributor

Shouldn't this be d.pop(key)?

Contributor Author

Yes. See #138339
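Unlike `list.pop()`, `dict.pop()` requires a key, so `d.pop()` raises `TypeError` instead of removing anything. A corrected eviction step, sketched as a hypothetical helper `evict_first` (not necessarily the change made in #138339), keeps the snippet's `list(d)[0]` approach and passes a default to `pop` so that a key already removed by another thread does not raise:

```python
def evict_first(d):
    """Remove and return the first key of d, or None if d is empty.

    Hypothetical helper for a stress test's mutator thread.
    """
    keys = list(d)  # snapshot of the keys at this moment
    if not keys:
        return None
    key = keys[0]
    # dict.pop needs the key; the default avoids KeyError if another
    # thread already removed it.
    d.pop(key, None)
    return key


d = {"a": 1, "b": 2}
print(evict_first(d), d)  # a {'b': 2}
try:
    d.pop()  # the original bug: no key given
except TypeError as exc:
    print("TypeError:", exc)
```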

lkollar pushed a commit to lkollar/cpython that referenced this pull request Sep 9, 2025