This repository was archived by the owner on Sep 10, 2025. It is now read-only.

@mstfbl mstfbl commented Apr 5, 2021

This PR fixes an issue flagged by Bandit: torchtext parsed untrusted data with the standard-library xml module. It now uses defusedxml instead.

Bandit output:

>> Issue: [B314:blacklist] Using xml.etree.ElementTree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace xml.etree.ElementTree.parse with its defusedxml equivalent function or make sure defusedxml.defuse_stdlib() is called
   Severity: Medium   Confidence: High
   Location: ./torchtext/data/datasets_utils.py:24
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b313-b320-xml-bad-elementtree
23	    with codecs.open(f_txt, mode='w', encoding='utf-8') as fd_txt:
24	        root = ET.parse(f_xml).getroot()[0]
25	        for doc in root.findall('doc'):

>> Issue: [B314:blacklist] Using xml.etree.ElementTree.parse to parse untrusted XML data is known to be vulnerable to XML attacks. Replace xml.etree.ElementTree.parse with its defusedxml equivalent function or make sure defusedxml.defuse_stdlib() is called
   Severity: Medium   Confidence: High
   Location: ./torchtext/legacy/datasets/translation.py:169
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b313-b320-xml-bad-elementtree
168	            with codecs.open(f_txt, mode='w', encoding='utf-8') as fd_txt:
169	                root = ET.parse(f_xml).getroot()[0]
170	                for doc in root.findall('doc'):
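
For illustration, the change Bandit recommends could look roughly like the sketch below. It assumes the third-party `defusedxml` package is installed; the function name `xml_docs_to_text` and the inner loop over `seg` elements are hypothetical, since the Bandit excerpt only shows the three lines around the `ET.parse` call. This is not necessarily the exact diff merged in this PR.

```python
import codecs

# Third-party package (pip install defusedxml). Its ElementTree module mirrors
# the parsing functions of xml.etree.ElementTree, so the import is the only change.
import defusedxml.ElementTree as ET


def xml_docs_to_text(f_xml, f_txt):
    """Hypothetical helper: copy the text of each <seg> inside each <doc> to f_txt.

    The lines around ET.parse follow the Bandit excerpt; the <seg> loop is an
    assumption made so the example is complete.
    """
    with codecs.open(f_txt, mode='w', encoding='utf-8') as fd_txt:
        # Same call shape as before, now backed by defusedxml
        root = ET.parse(f_xml).getroot()[0]
        for doc in root.findall('doc'):
            for seg in doc.findall('seg'):
                fd_txt.write((seg.text or '').strip() + '\n')
```

With its default settings, `defusedxml` rejects documents that declare entities by raising `defusedxml.EntitiesForbidden`, so callers handling untrusted files may want to catch that exception.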

codecov bot commented Apr 5, 2021

Codecov Report

Merging #1279 (3ea63f7) into master (c37f8a0) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1279      +/-   ##
==========================================
+ Coverage   78.80%   78.84%   +0.03%     
==========================================
  Files          67       67              
  Lines        3624     3630       +6     
==========================================
+ Hits         2856     2862       +6     
  Misses        768      768              

| Impacted Files | Coverage Δ |
|---|---|
| torchtext/data/datasets_utils.py | 90.76% <100.00%> (+0.14%) ⬆️ |
| torchtext/legacy/datasets/translation.py | 37.23% <100.00%> (+2.06%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@seemethere seemethere merged commit 5efd71c into pytorch:master Apr 7, 2021