Merged
1 change: 1 addition & 0 deletions extensions/who-is-visiting-this-content/.Rprofile
@@ -0,0 +1 @@
source("renv/activate.R")
2 changes: 2 additions & 0 deletions extensions/who-is-visiting-this-content/.gitignore
@@ -0,0 +1,2 @@
.posit/
app_cache/
156 changes: 156 additions & 0 deletions extensions/who-is-visiting-this-content/app.R
@@ -0,0 +1,156 @@
library(shiny)
library(bslib)
library(shinyjs)
library(connectapi)
library(dplyr)
library(glue)
library(lubridate)
library(tidyr)

shinyOptions(
cache = cachem::cache_disk("./app_cache/cache/", max_age = 60 * 60 * 8)
)
Comment on lines +10 to +12
Contributor

I think requests for this dashboard respond pretty fast. Do you think managing a cache is still needed?

It'd be nice if this one could be simplified by removing the cache operations.

Collaborator Author

This one will eventually be a sub-view of the other dashboard, and at that point it'll probably just get its data passed in from that dashboard's cache.

For now, since it needs to load the firehose, and processing the ~500,000 records returned takes some time (e.g. on Dogfood), caching is helpful for demo purposes.

That's from memory -- I can definitely try removing the cache to see if it performs sufficiently well.

Collaborator Author

It takes 30 seconds to process the firehose endpoint's response in connectapi. There is an issue I opened to see if that performance cost can be reduced, but for now, caching after that processing step makes the app responsive sooner after loading.


source("get_usage.R")

ui <- page_fillable(
useShinyjs(),
theme = bs_theme(version = 5),

card(
card_header("Who is Visiting This Content?"),
layout_sidebar(
sidebar = sidebar(
title = "No Filters Yet",
open = FALSE,

actionButton("clear_cache", "Clear Cache", icon = icon("refresh"))
),

textInput(
"content_guid",
"Content GUID"
),

h4(
id = "guid_input_msg",
"Please enter a content GUID"
),

textOutput("summary_message"),

tabsetPanel(
id = "content_visit_tables",
tabPanel(
"List of Visits",
tableOutput("all_visits")
),
tabPanel(
"Aggregated Visits",
tableOutput("aggregated_visits")
)
)
)
)
)

server <- function(input, output, session) {
# Cache invalidation button ----
cache <- cachem::cache_disk("./app_cache/cache/")
observeEvent(input$clear_cache, {
print("Cache cleared!")
cache$reset() # Clears all cached data
session$reload() # Reload the app to ensure fresh data
})

observe({
if (nchar(input$content_guid) == 0) {
show("guid_input_msg")
hide("content_visit_tables")
} else {
hide("guid_input_msg")
show("content_visit_tables")
}
})
Comment on lines +66 to +74
Collaborator

I think this would go here, but what do you think about adding a parser that lets someone paste in a full URL that includes the GUID? This is a bit of an extension / scope creep, so if we want to make a separate issue for it and do it in a follow-on, that's totally fine. But the UX of needing to select just the GUID from the URL (or get to it in the settings panel) could be better. We had good success with the publisher, where the field that accepts a GUID also accepts the URL (in fact, in the publisher it's the bare GUID that is the hidden easter egg that just works; we thought that folks would be more confident / able / comfortable grabbing the URL).

Collaborator Author

I was trying to keep this as minimal as possible, figuring it wouldn't be the final UI for selecting a piece of content.

Perhaps it would be fine to add a bit of extra logic to extract the GUID if a URL is pasted in. OTOH, perhaps even having this hide / show logic and a "paste in a content guid" message is too much code to pour into something that won't be a final UI. I can really see both arguments. I'd rather not put more effort in here till a later edition, at the very least.
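
For illustration, here's a minimal sketch of such a parser in base R; the helper name and regex are hypothetical, not part of this PR:

# Hypothetical helper, not in this PR: accept either a bare content GUID or a
# full Connect URL containing one, and return the GUID (or NA if none found).
extract_guid <- function(x) {
  guid_pattern <- "[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
  m <- regmatches(x, regexpr(guid_pattern, x))
  if (length(m) == 0) NA_character_ else m
}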


# Loading and processing data ----
client <- connect()

# Default dates. "This week" is, as a common-sense default, best represented
# by six days ago through the end of today. Without these defaults, content
# takes too long to display on some servers.
date_range <- reactive({
list(
from_date = today() - ddays(6),
to_date = today()
)
})
Comment on lines +79 to +87
Collaborator

This is a great default + it kicks the can of "which is the best period? Should I do forever? etc." down to when we get to the filtering edition later 💯


content <- reactive({
# Grab the entire content data frame here and filter it using the pasted-in
# GUID to obtain content title and other metadata, rather than making a
# request to `v1/content/{GUID}`. If this were a prod, standalone dashboard,
# might be better to call that endpoint.
get_content(client)
}) |> bindCache("static_key")

user_names <- reactive({
get_users(client) |>
mutate(full_name = paste(first_name, last_name)) |>
select(user_guid = guid, full_name, username)
}) |> bindCache("static_key")

usage_data <- reactive({
get_usage(
client,
from = date_range()$from_date,
to = date_range()$to_date + hours(23) + minutes(59) + seconds(59)
)
}) |> bindCache(date_range()$from_date, date_range()$to_date)

# Compute data
all_visits_data <- reactive({
usage_data() |>
filter(content_guid == input$content_guid) |>
left_join(user_names(), by = "user_guid") |>
replace_na(list(full_name = "[Anonymous]")) |>
arrange(desc(timestamp)) |>
select(timestamp, full_name, username)
}) |> bindCache(date_range()$from_date, date_range()$to_date, input$content_guid)

aggregated_visits_data <- reactive({
usage_data() |>
filter(content_guid == input$content_guid) |>
group_by(user_guid) |>
summarize(n_visits = n()) |>
left_join(user_names(), by = "user_guid") |>
replace_na(list(full_name = "[Anonymous]")) |>
arrange(desc(n_visits)) |>
select(n_visits, full_name, username)
}) |> bindCache(date_range()$from_date, date_range()$to_date, input$content_guid)
Comment on lines +121 to +130
Collaborator

This might very well be another issue/PR later, since it wasn't in the original and is conceptually a bit of an addition, but:

If I'm remembering / understanding correctly, items that have multiple URL routes that folks navigate through will produce a bunch of lines in the firehose endpoint that are really one session of visits. ("Session" is the right word to describe it, but we've overloaded that slightly with our old API endpoint! I mean the abstract / general use, not specifically what comes from our Shiny sessions endpoint.) What do you think about adding in a collapse here too? Something like "entries within N minutes of each other are considered one visit"? We definitely wouldn't want only one measure or the other; both are valuable.

Collaborator Author

Yes, this is definitely a priority!

Following our discussion, what do you think about separating this investigation out as its own issue? Kind of a spike on computing the right metric? It'll involve some research and experimentation, and possibly discussion with @marcosnav.

Collaborator Author
@toph-allen, Mar 18, 2025

I created https://github.com/posit-dev/connect/issues/30515 about figuring out hits vs sessions
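
For illustration, a rough sketch of that collapse using dplyr (which app.R already attaches); the helper name and the 30-minute gap are assumptions, not anything decided in this thread:

# Hypothetical sketch, not part of this PR: treat hits by the same user on the
# same content within `gap_minutes` of the previous hit as one visit.
collapse_visits <- function(usage, gap_minutes = 30) {
  usage |>
    arrange(user_guid, content_guid, timestamp) |>
    group_by(user_guid, content_guid) |>
    mutate(
      mins_since_prev = as.numeric(
        difftime(timestamp, lag(timestamp), units = "mins")
      ),
      new_visit = is.na(mins_since_prev) | mins_since_prev > gap_minutes,
      visit_id = cumsum(new_visit)
    ) |>
    group_by(user_guid, content_guid, visit_id) |>
    summarize(visit_start = min(timestamp), n_hits = n(), .groups = "drop")
}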


summary_message <- reactive({
content_title <- content() |>
filter(guid == input$content_guid) |>
pull(title)
hits <- all_visits_data()
glue(
"Content '{content_title}' had {nrow(hits)} between ",
"{min(hits$timestamp)} and {max(hits$timestamp)}."
)
})


output$summary_message <- renderText(summary_message())
output$all_visits <- renderTable(
all_visits_data() |>
transmute(timestamp = format(timestamp, "%Y-%m-%d %H:%M:%S"), full_name, username) |>
rename("Time" = timestamp, "Full Name" = full_name, "Username" = username)
)
output$aggregated_visits <- renderTable(
aggregated_visits_data() |>
rename("Total Visits" = n_visits, "Full Name" = full_name, "Username" = username)
)
}

shinyApp(ui, server)
60 changes: 60 additions & 0 deletions extensions/who-is-visiting-this-content/get_usage.R
@@ -0,0 +1,60 @@
library(connectapi)

# This file contains functions that will more likely than not ultimately be
# part of connectapi. As such, I'm not using dplyr or pipes here (bind_rows()
# below is the one exception; app.R attaches dplyr before sourcing this file).

NA_datetime_ <- vctrs::new_datetime(NA_real_, tzone = "UTC")
NA_list_ <- list(list())

usage_dtype <- tibble::tibble(
"id" = NA_integer_,
"user_guid" = NA_character_,
"content_guid" = NA_character_,
"timestamp" = NA_datetime_,
"data" = NA_list_
)

# A rough implementation of how a new firehose usage function would work in
# `connectapi`.
get_usage_firehose <- function(client, from = NULL, to = NULL) {
usage_raw <- client$GET(
connectapi:::unversioned_url("instrumentation", "content", "hits"),
query = list(
from = from,
to = to
)
)

# FIXME for connectapi: This is slow; this call is where most of the slowness
# with the new endpoint comes from.
usage_parsed <- connectapi:::parse_connectapi_typed(usage_raw, usage_dtype)
Comment on lines +28 to +30
Collaborator

Do we already have an issue in connectapi for this? Also, if it's getting in our way (by being slow) here, do we even need to enforce typing like this?

Collaborator Author

There is now an issue to track it: posit-dev/connectapi#383 :)

It's not horrendously slow, but it is definitely at least worth investigating as a connectapi issue.

And as for typing enforcement — I guess that depends on what you mean by "need". It's nice: parse_connectapi_typed() "automatically" handles deserializing the JSON from Connect into the correct R types and structure, abstracting away a lot of stuff that's fiddly in R, and there's value in that.

Another way I'm thinking about it is… if I just coded from scratch all the things parse_connectapi_typed does to shape the data into the shape I want, it might be just as slow. And if that were the case, then I'd want to keep it, because it makes life a lot simpler in a few ways.

However, if, say, the base parse_connectapi() function, which doesn't do any type enforcement, were much faster, then it'd probably be better to use that and handle type conversion in a different way.

Collaborator

I can't quite tell from this part of your comment:

However, if, say, the base parse_connectapi() function, which doesn't do any type enforcement, were much faster, then it'd probably be better to use that and handle type conversion in a different way.

Is parse_connectapi() much faster in this case?

Collaborator Author

I wasn't sure when I wrote the comment (I wanted to avoid opening cans of worms while I was responding to your comments).

It isn't much faster:

> length(usage_raw)
[1] 485794
> system.time(connectapi:::parse_connectapi_typed(usage_raw, usage_dtype))
   user  system elapsed 
 28.245   1.488  29.798 
> system.time(connectapi:::parse_connectapi(usage_raw))
   user  system elapsed 
 17.602   0.330  18.128

I'm also going to post this info to the connectapi issue so that it doesn't get lost.
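
To make that concrete, here's a hypothetical sketch of the alternative: parse without type enforcement, then coerce only the columns this app needs. The assumption that timestamps arrive as ISO-8601 strings isn't verified against parse_connectapi()'s actual output:

# Hypothetical alternative, not in this PR: skip typed parsing, then coerce
# just the columns this app needs. Assumes `timestamp` arrives as an ISO-8601
# character vector, which may not match reality.
usage_parsed <- connectapi:::parse_connectapi(usage_raw)
usage_parsed$timestamp <- as.POSIXct(
  usage_parsed$timestamp,
  format = "%Y-%m-%dT%H:%M:%SZ",
  tz = "UTC"
)
usage_parsed[c("user_guid", "content_guid", "timestamp")]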


usage_parsed[c("user_guid", "content_guid", "timestamp")]
}

get_usage_legacy <- function(client, from = NULL, to = NULL) {
shiny_usage <- get_usage_shiny(client, limit = Inf, from = from, to = to)
shiny_usage_cols <- shiny_usage[c("user_guid", "content_guid")]
shiny_usage_cols$timestamp <- shiny_usage$started

static_usage <- get_usage_static(client, limit = Inf, from = from, to = to)
static_usage_cols <- static_usage[c("user_guid", "content_guid")]
static_usage_cols$timestamp <- static_usage$time

bind_rows(shiny_usage_cols, static_usage_cols)
}

get_usage <- function(client, from = NULL, to = NULL) {
from <- format(from, "%Y-%m-%dT%H:%M:%SZ")
to <- format(to, "%Y-%m-%dT%H:%M:%SZ")
tryCatch(
{
print("Trying firehose usage endpoint.")
get_usage_firehose(client, from, to)
},
error = function(e) {
print("Could not use firehose endpoint; trying legacy usage endpoints.")
get_usage_legacy(client, from, to)
}
)
}