Skip to content

Conversation

@toph-allen
Copy link
Collaborator

The actual issue describing the "table view" goes into more detail than just a table view — I'm guessing that's actually referring to later issues in the series (https://github.com/posit-dev/connect/issues/30286).

Dogfood: https://rsc.radixu.com/who-is-visiting-this-content/
connect.posit.it: https://connect.posit.it/who-is-visiting-this-content/

Most basic possible approach: paste in a content GUID and get a table of individual visits and a table of total activity, sorted by descending date and number of visits respectively. Shows a summary sentence of total visits.

Screen Shot 2025-03-13 at 06 15 55 PM@2x

Notes:

  • Uses the firehose on dev builds and the existing APIs if it isn't available, so the intersection of the available data is displayed (i.e. just username and timestamp info).
  • Data are for the last week.
  • Shows a message asking you to paste in a GUID if you haven't entered anything.
  • The summary sentence is going to look bad for content that doesn't have a title (which might mainly be a problem on Dogfood)

@toph-allen toph-allen marked this pull request as draft March 13, 2025 22:23
@jonkeane
Copy link
Collaborator

The actual issue describing the "table view" goes into more detail than just a table view — I'm guessing that's actually referring to later issues in the series (https://github.com/posit-dev/connect/issues/30286).

💯 right about this, drafting error when we (I, I think in this case) was breaking things down.

Comment on lines +66 to +74
observe({
if (nchar(input$content_guid) == 0) {
show("guid_input_msg")
hide("content_visit_tables")
} else {
hide("guid_input_msg")
show("content_visit_tables")
}
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would go here, but what do you think about adding in a parser that lets someone paste in a full URL that includes the GUID? This is a bit of an extension / scope creep so if we want to make a separate issue for it and do that in a follow on that's totally fine. But the UX of needing to select just the GUID form the URL (or get to it in the settings panel) could be better. We had good success with the publisher where the field that accepts GUID also accepts the URL (and, in fact, in the publisher it's the GUID alone that is the hidden easter egg that it just works, we though that folks would be more confident / able to / comfortable with grabbing the URL)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep this as minimal as possible, figuring it wouldn't be the final UI for selecting a piece of content.

Perhaps it would be fine to add a bit of extra logic to extract the GUID if a URL is pasted in. OTOH, perhaps even having this hide / show logic and a "paste in a content guid" message is too much code to pour into something that won't be a final UI. I can really see both arguments. I'd rather not put more effort in here till a later edition, at the very least.

Comment on lines +28 to +30
# FIXME for connectapi: This is slow, it's where most of the slowness is with
# the new endpoint.
usage_parsed <- connectapi:::parse_connectapi_typed(usage_raw, usage_dtype)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we already have an issue in connectapi for this? Also if it's getting in our way (by being slow) here, do we even need to enforce typing like this here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is now an issue to track it: posit-dev/connectapi#383 :)

It's not horrendously slow, but it is definitely at least worth investigating as a connectapi issue.

And as for typing enforcement — I guess that depends on what you mean by "need". It's nice: parse_connectapi_typed() "automatically" handles deserializing the JSON from Connect into the correct R types and structure, abstracting away a lot of stuff that's fiddly in R, and there's value in that.

Another way I'm thinking about it is… if I just coded from scratch all the things parse_connectapi_typed does to shape the data into the shape I want, maybe be just as slow. And if that were the case, then I'd want to keep it, because it makes life a lot simpler in a few ways.

However, if, say, the base parse_connectapi() function, which doesn't do any type enforcement, was much faster, than it'd probably be better to use that and handle type conversion in a different way.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't quite tell from this part of your comment;

However, if, say, the base parse_connectapi() function, which doesn't do any type enforcement, was much faster, than it'd probably be better to use that and handle type conversion in a different way.

Is parse_conenctapi() much faster in this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure when I wrote the comment (I wanted to avoid opening cans of worms while I was responding to your comments).

It isn't much faster:

> length(usage_raw)
[1] 485794
> system.time(connectapi:::parse_connectapi_typed(usage_raw, usage_dtype))
   user  system elapsed 
 28.245   1.488  29.798 
> system.time(connectapi:::parse_connectapi(usage_raw))
   user  system elapsed 
 17.602   0.330  18.128

I'm also going to post this info to the connectapi issue so that it doesn't get lost.

Comment on lines +79 to +87
# Default dates. "This week" is best "common sense" best represented by six
# days ago thru the end of today. Without these, content takes too long to
# display on some servers.
date_range <- reactive({
list(
from_date = today() - ddays(6),
to_date = today()
)
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great default + kick the can of "which is the best period? Should I do forever? etc" for when we get to the filtering edition later 💯

Comment on lines +119 to +128
aggregated_visits_data <- reactive({
usage_data() |>
filter(content_guid == input$content_guid) |>
group_by(user_guid) |>
summarize(n_visits = n()) |>
left_join(user_names(), by = "user_guid") |>
replace_na(list(full_name = "[Anonymous]")) |>
arrange(desc(n_visits)) |>
select(n_visits, full_name, username)
}) |> bindCache(date_range()$from_date, date_range()$to_date, input$content_guid)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might very well be another issue/PR later, since it wasn't in the original and is conceptually a bit of an addition, but:

If I'm remembering / understanding correctly items that have multiple URL routes and that folks navigate through will have a bunch of lines in the firehosue endpoint that are really one session of visits (to use the correct word to describe it, but we have overloaded that slightly with our old API endpoint! I'm meaning the abstract / general use not specifically what comes from our shiny sessions endpoint). What do you think about adding in a collapse here too? Something like "entries within N minutes of each other are considered one visit"? We definitely wouldn't want only one measure or the other — both are valuable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is definitely a priority!

Following our discussion What do you think about separating this investigation out as its own issue? Kind of a spike on computing the right metric? It'll involve some research and experimentation, and possibly discussion with @marcosnav.

Copy link
Collaborator Author

@toph-allen toph-allen Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created https://github.com/posit-dev/connect/issues/30515 about figuring out hits vs sessions

@toph-allen toph-allen marked this pull request as ready for review March 18, 2025 18:00
Copy link
Contributor

@marcosnav marcosnav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far looks good to me, some minor feedback from running it.

  • Table headers, timestamp -> Time, full_name -> Full Name, n_visits -> Total Visits
  • Values on the timestamp column to be human friendly

Comment on lines +10 to +12
shinyOptions(
cache = cachem::cache_disk("./app_cache/cache/", max_age = 60 * 60 * 8)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, requests for this dashboard respond pretty fast, do you think managing cache is still needed?

It'll be nice if this one can be simplified by removing cache operations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one will eventually be a sub-view of the other dashboard, and at that point it'll just probably get its data passed in from that dashboard's cache.

For now, since it needs to load the firehouse, and there is some time taken e.g. on Dogfood for processing the ~500000 records returned, caching is helpful for demo purposes.

That's from memory -- I can definitely try removing the cache to see if it performs sufficiently well.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It takes 30 seconds to process the firehose endpoint in connectapi. There is an issue I opened to see if that performance cost can be reduced, but for now, caching after that processing step makes the app responsive faster after loading.

@toph-allen
Copy link
Collaborator Author

So far looks good to me, some minor feedback from running it.

  • Table headers, timestamp -> Time, full_name -> Full Name, n_visits -> Total Visits

  • Values on the timestamp column to be human friendly

Thanks for taking a look!

As part of the feedback from the review session, we're gonna investigate changing the table display framework we're using. That will change how we perform date formatting and header renaming, so in a discussion with @jonkeane I decided to skip adding those for this version.

@toph-allen
Copy link
Collaborator Author

As I got two comments about it, I added column names and date formatting to this version, rather than waiting till a later iteration.

Copy link
Contributor

@marcosnav marcosnav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing previous comments

@toph-allen toph-allen merged commit 50668cf into main Mar 18, 2025
2 checks passed
@cgraham-rs cgraham-rs deleted the toph/who-is-visiting-this-content-table-edition branch June 26, 2025 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants