jamesbraza
Collaborator

Seen in logs today:

Traceback (most recent call last):
  ...
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/clients/__init__.py", line 153, in query
    await gather_with_concurrency(
  File "/srv/.venv/lib/python3.12/site-packages/lmi/utils.py", line 100, in gather_with_concurrency
    return await asyncio.gather(*(sem_coro(c) for c in coros))
  File "/srv/.venv/lib/python3.12/site-packages/lmi/utils.py", line 87, in sem_coro
    return await coro
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/clients/client_models.py", line 110, in query
    return await self._query(client_query)
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/clients/semantic_scholar.py", line 358, in _query
    return await get_s2_doc_details_from_doi(
  File "/srv/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 189, in async_wrapped
    return await copy(fn, *args, **kwargs)
  File "/srv/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/srv/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 153, in iter
    result = await action(retry_state)
  File "/srv/.venv/lib/python3.12/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/srv/.venv/lib/python3.12/site-packages/tenacity/__init__.py", line 398, in <lambda>
    self._add_action_func(lambda rs: rs.outcome.result())
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/srv/.venv/lib/python3.12/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/clients/semantic_scholar.py", line 307, in get_s2_doc_details_from_doi
    return await parse_s2_to_doc_details(
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/clients/semantic_scholar.py", line 196, in parse_s2_to_doc_details
    doc_details = DocDetails(
  File "/srv/.venv/lib/python3.12/site-packages/pydantic/main.py", line 253, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/types.py", line 949, in validate_all_fields
    data = cls.populate_bibtex_key_citation(data)
  File "/srv/.venv/lib/python3.12/site-packages/paperqa/types.py", line 860, in populate_bibtex_key_citation
    iter(Parser().parse_string(data["bibtex"]).entries.values())
  File "/srv/.venv/lib/python3.12/site-packages/pybtex/database/input/bibtex.py", line 407, in parse_string
    self.process_entry(entry_type, *entry[1])
  File "/srv/.venv/lib/python3.12/site-packages/pybtex/database/input/bibtex.py", line 370, in process_entry
    entry.add_person(Person(name), field_name)
  File "/srv/.venv/lib/python3.12/site-packages/pybtex/database/__init__.py", line 620, in __init__
    self._parse_string(string)
  File "/srv/.venv/lib/python3.12/site-packages/pybtex/database/__init__.py", line 749, in _parse_string
    report_error(InvalidNameString(name))
  File "/srv/.venv/lib/python3.12/site-packages/pybtex/errors.py", line 78, in report_error
    raise exception
pybtex.database.InvalidNameString: Too many commas in 'Kyriacos, Κυριάκος, Athanasiou, Αθανασίου'

It seems an author name coming from Semantic Scholar (S2) that carries both Latin and Greek spellings, and therefore three commas, can crash DocDetails creation. This PR just prevents the crash and enhances logging to include the DOI/title.
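
To illustrate why this particular name trips the parser: BibTeX name syntax allows at most two commas ("von Last, Jr, First"), so a name that splits into more than three comma-separated parts is rejected regardless of script. The sketch below is a rough stand-in for that rule, not pybtex's actual implementation; the helper names `split_top_level_commas` and `has_too_many_commas` are hypothetical, and the brace handling is an assumption about how a BibTeX-aware splitter would treat commas inside `{...}`.

```python
def split_top_level_commas(name: str) -> list[str]:
    """Split a BibTeX-style name on commas that are not inside braces."""
    parts: list[str] = []
    current: list[str] = []
    depth = 0  # current brace nesting level
    for char in name:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        if char == "," and depth == 0:
            # A top-level comma closes the current part.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    parts.append("".join(current).strip())
    return parts


def has_too_many_commas(name: str, max_parts: int = 3) -> bool:
    """Return True if the name has more parts than "von Last, Jr, First" allows."""
    return len(split_top_level_commas(name)) > max_parts


if __name__ == "__main__":
    # The name from the traceback splits into four parts.
    print(has_too_many_commas("Kyriacos, Κυριάκος, Athanasiou, Αθανασίου"))
    # A conventional "Last, First" name splits into two.
    print(has_too_many_commas("Athanasiou, Kyriacos"))
```

A check like this could only serve as a diagnostic; the PR itself takes the simpler route of catching the parser's exception.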

@jamesbraza self-assigned this on Aug 6, 2025
@jamesbraza added the bug label (Something isn't working) on Aug 6, 2025
@dosubot (bot) added the size:S label (This PR changes 10-29 lines, ignoring generated files.) on Aug 6, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes a crash in DocDetails creation caused by an author name that the pybtex library cannot parse. In the reported case the name lists both Latin and Greek spellings, which produces more commas than pybtex accepts ("Too many commas"). The fix adds error handling and more informative logging.

  • Catches InvalidNameString exceptions raised by pybtex when it cannot parse an author name, such as the one in the traceback
  • Improves error logging to include DOI and title information for better debugging context

@dosubot (bot) added the lgtm label (This PR has been approved by a maintainer) on Aug 6, 2025
@jamesbraza merged commit d1cde22 into main on Aug 6, 2025
6 checks passed
@jamesbraza deleted the handling-invalid-names branch on August 6, 2025 18:23
jamesbraza added a commit that referenced this pull request Aug 6, 2025
jamesbraza added a commit that referenced this pull request Aug 6, 2025