Issue Description:
Hello.
I have discovered a performance degradation in the .fillna function in pandas 1.4.1 (versions below 1.4.2), and I noticed that this repository depends on pandas 1.4.1 in scripts/eval/science-world/requirements.txt. I am not sure whether this performance problem affects this repository. There are related discussions on the pandas GitHub, including #46149 and #46204.
I also found that scripts/data/interaction/collection/convert_outputs.ipynb and scripts/eval/mint-bench/convert_outputs.py use the affected API. There may be more files that use the affected API with a pandas version below 1.4.2.
Suggestion:
I would recommend upgrading to pandas >= 1.4.2, or exploring other ways to avoid the slowdown.
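In case it helps, here is a minimal sketch of how an environment could be checked against the 1.4.2 threshold mentioned above. The helper names (parse_version, is_below_fix, installed_pandas_version) are hypothetical and not part of this repository or of pandas. The sketch also assumes that every version below 1.4.2 should be flagged, which is the assumption made in this issue and may be broader than the set of versions that are actually slow.

```python
import re
from importlib import metadata

# Threshold taken from this issue; treating everything below it as
# affected is an assumption, not something verified here.
FIXED_IN = (1, 4, 2)


def parse_version(text):
    """Return the leading numeric components, e.g. '1.4.1' -> (1, 4, 1).

    Suffixes such as 'rc0' are ignored, so pre-releases compare equal
    to their final release.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_below_fix(version_text):
    """True if the given version is lower than FIXED_IN."""
    return parse_version(version_text) < FIXED_IN


def installed_pandas_version():
    """Return the installed pandas version string, or None if absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pandas_version()
    if found is None:
        print("pandas is not installed in this environment")
    elif is_below_fix(found):
        print(f"pandas {found} is below 1.4.2; consider upgrading")
    else:
        print(f"pandas {found} is at or above 1.4.2")
```

The same is_below_fix helper could be applied to the version pinned in a requirements file, for example is_below_fix("1.4.1") for the pin reported above.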
Any other workarounds or solutions would be greatly appreciated.
Thank you!