who is visiting this content (table view) #38

toph-allen · 2025-03-13T22:17:04Z

The actual issue describing the "table view" goes into more detail than just a table view — I'm guessing that's actually referring to later issues in the series (https://github.com/posit-dev/connect/issues/30286).

Dogfood: https://rsc.radixu.com/who-is-visiting-this-content/
connect.posit.it: https://connect.posit.it/who-is-visiting-this-content/

Most basic possible approach: paste in a content GUID and get a table of individual visits and a table of total activity, sorted by descending date and number of visits respectively. Shows a summary sentence of total visits.

Notes:

Uses the firehose on dev builds and the existing APIs if it isn't available, so the intersection of the available data is displayed (i.e. just username and timestamp info).
Data are for the last week.
Shows a message asking you to paste in a GUID if you haven't entered anything.
The summary sentence is going to look bad for content that doesn't have a title (which might mainly be a problem on Dogfood)

jonkeane · 2025-03-14T12:55:01Z

The actual issue describing the "table view" goes into more detail than just a table view — I'm guessing that's actually referring to later issues in the series (https://github.com/posit-dev/connect/issues/30286).

💯 right about this, drafting error when we (I, I think in this case) was breaking things down.

jonkeane · 2025-03-14T13:01:13Z

extensions/who-is-visiting-this-content/app.R

+  observe({
+    if (nchar(input$content_guid) == 0) {
+      show("guid_input_msg")
+      hide("content_visit_tables")
+    } else {
+      hide("guid_input_msg")
+      show("content_visit_tables")
+    }
+  })


I think this would go here, but what do you think about adding in a parser that lets someone paste in a full URL that includes the GUID? This is a bit of an extension / scope creep so if we want to make a separate issue for it and do that in a follow on that's totally fine. But the UX of needing to select just the GUID form the URL (or get to it in the settings panel) could be better. We had good success with the publisher where the field that accepts GUID also accepts the URL (and, in fact, in the publisher it's the GUID alone that is the hidden easter egg that it just works, we though that folks would be more confident / able to / comfortable with grabbing the URL)

I was trying to keep this as minimal as possible, figuring it wouldn't be the final UI for selecting a piece of content.

Perhaps it would be fine to add a bit of extra logic to extract the GUID if a URL is pasted in. OTOH, perhaps even having this hide / show logic and a "paste in a content guid" message is too much code to pour into something that won't be a final UI. I can really see both arguments. I'd rather not put more effort in here till a later edition, at the very least.

jonkeane · 2025-03-14T13:03:09Z

extensions/who-is-visiting-this-content/get_usage.R

+  # FIXME for connectapi: This is slow, it's where most of the slowness is with
+  # the new endpoint.
+  usage_parsed <- connectapi:::parse_connectapi_typed(usage_raw, usage_dtype)


Do we already have an issue in connectapi for this? Also if it's getting in our way (by being slow) here, do we even need to enforce typing like this here?

There is now an issue to track it: posit-dev/connectapi#383 :)

It's not horrendously slow, but it is definitely at least worth investigating as a connectapi issue.

And as for typing enforcement — I guess that depends on what you mean by "need". It's nice: parse_connectapi_typed() "automatically" handles deserializing the JSON from Connect into the correct R types and structure, abstracting away a lot of stuff that's fiddly in R, and there's value in that.

Another way I'm thinking about it is… if I just coded from scratch all the things parse_connectapi_typed does to shape the data into the shape I want, maybe be just as slow. And if that were the case, then I'd want to keep it, because it makes life a lot simpler in a few ways.

However, if, say, the base parse_connectapi() function, which doesn't do any type enforcement, was much faster, than it'd probably be better to use that and handle type conversion in a different way.

I can't quite tell from this part of your comment;

However, if, say, the base parse_connectapi() function, which doesn't do any type enforcement, was much faster, than it'd probably be better to use that and handle type conversion in a different way.

Is parse_conenctapi() much faster in this case?

I wasn't sure when I wrote the comment (I wanted to avoid opening cans of worms while I was responding to your comments).

It isn't much faster:

> length(usage_raw) [1] 485794 > system.time(connectapi:::parse_connectapi_typed(usage_raw, usage_dtype)) user system elapsed 28.245 1.488 29.798 > system.time(connectapi:::parse_connectapi(usage_raw)) user system elapsed 17.602 0.330 18.128

I'm also going to post this info to the connectapi issue so that it doesn't get lost.

jonkeane · 2025-03-14T13:07:57Z

extensions/who-is-visiting-this-content/app.R

+  # Default dates. "This week" is best "common sense" best represented by six
+  # days ago thru the end of today. Without these, content takes too long to
+  # display on some servers.
+  date_range <- reactive({
+    list(
+      from_date = today() - ddays(6),
+      to_date = today()
+    )
+  })


This is a great default + kick the can of "which is the best period? Should I do forever? etc" for when we get to the filtering edition later 💯

extensions/who-is-visiting-this-content/app.R

jonkeane · 2025-03-14T13:12:45Z

extensions/who-is-visiting-this-content/app.R

+  aggregated_visits_data <- reactive({
+    usage_data() |>
+      filter(content_guid == input$content_guid) |>
+      group_by(user_guid) |>
+      summarize(n_visits = n()) |>
+      left_join(user_names(), by = "user_guid") |>
+      replace_na(list(full_name = "[Anonymous]")) |>
+      arrange(desc(n_visits)) |>
+      select(n_visits, full_name, username)
+  }) |> bindCache(date_range()$from_date, date_range()$to_date, input$content_guid)


This might very well be another issue/PR later, since it wasn't in the original and is conceptually a bit of an addition, but:

If I'm remembering / understanding correctly items that have multiple URL routes and that folks navigate through will have a bunch of lines in the firehosue endpoint that are really one session of visits (to use the correct word to describe it, but we have overloaded that slightly with our old API endpoint! I'm meaning the abstract / general use not specifically what comes from our shiny sessions endpoint). What do you think about adding in a collapse here too? Something like "entries within N minutes of each other are considered one visit"? We definitely wouldn't want only one measure or the other — both are valuable.

Yes, this is definitely a priority!

Following our discussion What do you think about separating this investigation out as its own issue? Kind of a spike on computing the right metric? It'll involve some research and experimentation, and possibly discussion with @marcosnav.

I created https://github.com/posit-dev/connect/issues/30515 about figuring out hits vs sessions

extensions/who-is-visiting-this-content/app.R

marcosnav

So far looks good to me, some minor feedback from running it.

Table headers, timestamp -> Time, full_name -> Full Name, n_visits -> Total Visits
Values on the timestamp column to be human friendly

marcosnav · 2025-03-18T18:14:44Z

extensions/who-is-visiting-this-content/app.R

+shinyOptions(
+  cache = cachem::cache_disk("./app_cache/cache/", max_age = 60 * 60 * 8)
+)


I think, requests for this dashboard respond pretty fast, do you think managing cache is still needed?

It'll be nice if this one can be simplified by removing cache operations.

This one will eventually be a sub-view of the other dashboard, and at that point it'll just probably get its data passed in from that dashboard's cache.

For now, since it needs to load the firehouse, and there is some time taken e.g. on Dogfood for processing the ~500000 records returned, caching is helpful for demo purposes.

That's from memory -- I can definitely try removing the cache to see if it performs sufficiently well.

It takes 30 seconds to process the firehose endpoint in connectapi. There is an issue I opened to see if that performance cost can be reduced, but for now, caching after that processing step makes the app responsive faster after loading.

toph-allen · 2025-03-18T18:24:06Z

So far looks good to me, some minor feedback from running it.

Table headers, timestamp -> Time, full_name -> Full Name, n_visits -> Total Visits

Values on the timestamp column to be human friendly

Thanks for taking a look!

As part of the feedback from the review session, we're gonna investigate changing the table display framework we're using. That will change how we perform date formatting and header renaming, so in a discussion with @jonkeane I decided to skip adding those for this version.

toph-allen · 2025-03-18T18:42:40Z

As I got two comments about it, I added column names and date formatting to this version, rather than waiting till a later iteration.

marcosnav

Thank you for addressing previous comments

toph-allen added 2 commits March 13, 2025 18:09

first commit

d89f596

fix summary sentence

4a11ae0

toph-allen requested review from Lytol, dethmasque, jonkeane and marcosnav March 13, 2025 22:17

toph-allen marked this pull request as draft March 13, 2025 22:23

jonkeane reviewed Mar 14, 2025

View reviewed changes

extensions/who-is-visiting-this-content/app.R Show resolved Hide resolved

jonkeane reviewed Mar 14, 2025

View reviewed changes

extensions/who-is-visiting-this-content/app.R Show resolved Hide resolved

toph-allen mentioned this pull request Mar 17, 2025

investigate parse_connectapi_typed performance posit-dev/connectapi#383

Open

clarify comment

b991f17

toph-allen marked this pull request as ready for review March 18, 2025 18:00

marcosnav reviewed Mar 18, 2025

View reviewed changes

add date formatting and column renaming

d317c06

toph-allen requested review from jonkeane and marcosnav March 18, 2025 18:42

marcosnav approved these changes Mar 18, 2025

View reviewed changes

toph-allen merged commit 50668cf into main Mar 18, 2025
2 checks passed

cgraham-rs deleted the toph/who-is-visiting-this-content-table-edition branch June 26, 2025 14:57

who is visiting this content (table view) #38

who is visiting this content (table view) #38

Uh oh!

Conversation

toph-allen commented Mar 13, 2025

Uh oh!

jonkeane commented Mar 14, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

toph-allen Mar 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marcosnav left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

toph-allen commented Mar 18, 2025

Uh oh!

toph-allen commented Mar 18, 2025

Uh oh!

marcosnav left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

toph-allen Mar 18, 2025 •

edited

Loading