
Determine the best sorting order #72

@gaurav


On a development version of NameRes, I searched for COVID with MONDO|HP prefix filtering turned on, Biolink type filtering set to Disease, and results sorted by shortest_name_length. I got back the following results in this order:

  1. MONDO:0100233 "long COVID-19"
  2. MONDO:0100163 "COVID-19–associated multisystem inflammatory syndrome in children"
  3. MONDO:0100319 "COVID-19–associated multisystem inflammatory syndrome in adults"
  4. MONDO:0100096 "COVID-19"

This is because MONDO:0100233 ("PASC") and MONDO:0100163 ("MISC", "PIMS", "PMIS") have shorter synonyms (4 characters) than MONDO:0100319 ("MIS-A") and MONDO:0100096 ("β-CoV") (5 characters each). Since the latter two have the same shortest-synonym length, shortest_name_length alone gives us no way of ordering one before the other.
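A minimal Python sketch of why the tie happens, assuming each concept's names are only the label and synonyms quoted above (the real cliques have more synonyms). Sorting on the shortest name length alone reproduces the order we got, and the last two entries are tied at 5, so their relative order comes only from the input order (Python's `sorted` is stable), not from anything meaningful.

```python
# Assumption: only the names and synonyms quoted in this issue, not full cliques.
CANDIDATES = {
    "MONDO:0100233": ["long COVID-19", "PASC"],
    "MONDO:0100163": [
        "COVID-19–associated multisystem inflammatory syndrome in children",
        "MISC", "PIMS", "PMIS",
    ],
    "MONDO:0100319": [
        "COVID-19–associated multisystem inflammatory syndrome in adults",
        "MIS-A",
    ],
    "MONDO:0100096": ["COVID-19", "β-CoV"],
}


def shortest_name_length(names: list[str]) -> int:
    """Length of the shortest name, counted in Unicode code points."""
    return min(len(name) for name in names)


def sort_by_shortest_name(candidates: dict[str, list[str]]) -> list[tuple[str, int]]:
    scored = [(curie, shortest_name_length(names)) for curie, names in candidates.items()]
    # sorted() is stable: tied entries keep their input order, because the
    # sort key itself has nothing that separates them.
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for curie, length in sort_by_shortest_name(CANDIDATES):
        print(curie, length)
```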

This isn't too bad, since COVID-19 is still in the first five results, but it's not ideal.

There are other stats we could measure to help improve this situation:

  • preferred_name_length: The length of the preferred name
  • information_content: The information content of the clique (this will be missing for lots of identifiers: we could sort them after every concept we have an information concept value for)
  • ??? (@cbizon any ideas?)

Options for a sort order (">" means "sort by the left-hand stat first, then break ties with the next one"):

  1. preferred_name_length > shortest_name_length
  2. information_content > preferred_name_length > shortest_name_length
  3. information_content > shortest_name_length > preferred_name_length
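The three options are lexicographic sorts, which in Python map directly onto tuple sort keys. This is a sketch, not NameRes code: the `Concept` class and the `option_*` functions are hypothetical, and sorting information_content ascending is an assumption (the issue doesn't say which direction is better; negate it to flip). Concepts without an information content value go after every concept that has one, as suggested above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    # Hypothetical record; field names follow the stats discussed in this issue.
    curie: str
    preferred_name: str
    synonyms: list[str] = field(default_factory=list)
    information_content: Optional[float] = None  # missing for many identifiers

    @property
    def preferred_name_length(self) -> int:
        return len(self.preferred_name)

    @property
    def shortest_name_length(self) -> int:
        # The preferred name counts as one of the names.
        return min(len(name) for name in [self.preferred_name, *self.synonyms])


def _ic_key(c: Concept) -> tuple[bool, float]:
    # False sorts before True, so missing-IC concepts come last.
    # Ascending IC within the rest is an assumption.
    if c.information_content is None:
        return (True, 0.0)
    return (False, c.information_content)


def option_1(c: Concept) -> tuple:
    return (c.preferred_name_length, c.shortest_name_length)


def option_2(c: Concept) -> tuple:
    return (*_ic_key(c), c.preferred_name_length, c.shortest_name_length)


def option_3(c: Concept) -> tuple:
    return (*_ic_key(c), c.shortest_name_length, c.preferred_name_length)


if __name__ == "__main__":
    # Assumption: the quoted result names are the preferred names.
    results = [
        Concept("MONDO:0100233", "long COVID-19", ["PASC"]),
        Concept("MONDO:0100163", "COVID-19–associated multisystem inflammatory syndrome in children", ["MISC", "PIMS", "PMIS"]),
        Concept("MONDO:0100319", "COVID-19–associated multisystem inflammatory syndrome in adults", ["MIS-A"]),
        Concept("MONDO:0100096", "COVID-19", ["β-CoV"]),
    ]
    for c in sorted(results, key=option_1):
        print(c.curie, c.preferred_name)
```

With only the names quoted in this issue, option 1 puts MONDO:0100096 ("COVID-19", 8 characters) first, ahead of "long COVID-19".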

Probably the next step will be to include all three of these stats in the Solr database, and then we can make a Vue application to compare different sorting strategies until we find the one we like best.
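Once the stats are stored in Solr, each option becomes a value of Solr's standard `sort` request parameter (comma-separated `field direction` pairs). A minimal sketch for building those requests for a comparison tool, assuming the Solr fields are named after the stats in this issue and using a hypothetical base URL. It only builds URLs; it does not send anything. To get "missing information_content sorts last" in Solr, the field type would need `sortMissingLast="true"` in the schema.

```python
from urllib.parse import urlencode

# Assumption: Solr field names match the stat names used in this issue.
STRATEGIES = {
    "option_1": ["preferred_name_length", "shortest_name_length"],
    "option_2": ["information_content", "preferred_name_length", "shortest_name_length"],
    "option_3": ["information_content", "shortest_name_length", "preferred_name_length"],
}


def solr_sort_param(fields: list[str], direction: str = "asc") -> str:
    """Build a Solr `sort` value such as 'a asc, b asc' (same direction for every field)."""
    return ", ".join(f"{name} {direction}" for name in fields)


def select_url(base_url: str, query: str, strategy: str, rows: int = 10) -> str:
    params = {"q": query, "sort": solr_sort_param(STRATEGIES[strategy]), "rows": rows}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core URL, for illustration only.
    base = "http://localhost:8983/solr/name_lookup/select"
    for name in STRATEGIES:
        print(name, select_url(base, "COVID", name))
```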

We can also implement more complex metrics if we need to by adding them to Babel and summarizing them into a score field that we can sort on.
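One possible shape for such a score field is a weighted sum that Babel could precompute and Solr could sort on ascending. Everything here is hypothetical: the weights are placeholders rather than tuned values, and the missing-IC penalty is just an assumed constant large enough to push concepts without information content below the rest. Unlike the strict ">" orderings above, a weighted sum lets a large difference in one stat outweigh a small difference in another, which is the main reason to consider it.

```python
from typing import Optional

# Hypothetical weights: placeholders to be tuned, not recommended values.
WEIGHTS = {
    "information_content": 1.0,
    "preferred_name_length": 0.5,
    "shortest_name_length": 0.25,
}
# Assumption: large enough that any concept missing an IC value scores worse
# (higher) than every concept that has one, for realistic name lengths.
MISSING_IC_PENALTY = 1_000.0


def sort_score(
    information_content: Optional[float],
    preferred_name_length: int,
    shortest_name_length: int,
    weights: dict[str, float] = WEIGHTS,
) -> float:
    """Combined score; lower is better, so sort ascending."""
    ic = MISSING_IC_PENALTY if information_content is None else information_content
    return (
        weights["information_content"] * ic
        + weights["preferred_name_length"] * preferred_name_length
        + weights["shortest_name_length"] * shortest_name_length
    )
```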
