Skip to content

Conversation

@WillAyd
Copy link
Member

@WillAyd WillAyd commented Apr 29, 2019

It looks like the extension module is unnecessarily incrementing the reference count for numeric objects and never releasing it, which causes a leak in to_json.

Still trying to grok exactly what the purpose of npyCtxtPassthru (comment in same file mentions it is required when encoding multi-dimensional arrays).

Removing the check for PyArray_ISDATETIME caused segfaults but that doesn't appear to leak memory anyway so most likely intentional to increment that ref count.

ASV results:

       before           after         ratio
     [9feb3ad9]       [9bfd45d3]
     <master>         <json-mem-fix~1>
-           69.1M            57.7M     0.84  io.json.ToJSONMem.peakmem_float
-           69.1M            57.7M     0.83  io.json.ToJSONMem.peakmem_int

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@WillAyd WillAyd added the IO JSON read_json, to_json, json_normalize label Apr 29, 2019
@codecov
Copy link

codecov bot commented Apr 29, 2019

Codecov Report

Merging #26239 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26239      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52368              
==========================================
- Hits        48164    48160       -4     
- Misses       4204     4208       +4
Flag Coverage Δ
#multiple 90.52% <ø> (ø) ⬆️
#single 40.69% <ø> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...24e1a23. Read the comment docs.

1 similar comment
@codecov
Copy link

codecov bot commented Apr 29, 2019

Codecov Report

Merging #26239 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26239      +/-   ##
==========================================
- Coverage   91.97%   91.96%   -0.01%     
==========================================
  Files         175      175              
  Lines       52368    52368              
==========================================
- Hits        48164    48160       -4     
- Misses       4204     4208       +4
Flag Coverage Δ
#multiple 90.52% <ø> (ø) ⬆️
#single 40.69% <ø> (-0.15%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 96.9% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feb3ad...24e1a23. Read the comment docs.

@jreback jreback added this to the 0.25.0 milestone Apr 30, 2019
@jreback jreback merged commit 5cb006f into pandas-dev:master Apr 30, 2019
@jreback
Copy link
Contributor

jreback commented Apr 30, 2019

thanks @WillAyd

@WillAyd WillAyd deleted the json-mem-fix branch January 16, 2020 00:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO JSON read_json, to_json, json_normalize

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory leak in df.to_json

2 participants