@shweta487
Contributor

  • Queried ACS data via the Census API and uploaded results to a GCS bucket for later use.
  • Processed Census Tract geometry.
  • Queried organization data from the data warehouse and stored it in GCS.
  • Queried bridge organization GTFS datasets and merged them with the dimension organizations table.
  • Loaded transit stop data and merged it with organization information.
  • Conducted spatial analysis: stop buffers and census tract intersections.
  • Adjusted population and demographic metrics for stop service areas.

@shweta487
Contributor Author

#1696

@github-actions

github-actions bot commented Nov 3, 2025

nbviewer URLs for impacted notebooks:

@shweta487 requested a review from hhmckay November 4, 2025 16:05
Contributor

@hhmckay left a comment

Nice work! Apologies for not commenting on specific lines; nbviewer sometimes lets me do that, but it didn't work here. Let me know if you have any questions and I can provide more clarity. A few overall comments:

  • Where do the definitions for the income categories [extremely low, very low, and low income] come from? If there is a specific statutory definition here, all good. If not, I'd consider the AB-1550 definition. It's somewhat tricky to operationalize but is county specific and more nuanced, especially in higher cost of living areas. I can share R code that I've used for this purpose.
  • Do the Census tract geometries include water areas? It may be more accurate to use tract geometries that exclude water, but this is only a "nice to have" and shouldn't skew the results much.
  • How did you arrive at the 500 meter buffer? That's probably fine, but let's use a more standardized transit stop catchment area, such as half a mile (804.672 meters).
  • Since we are calculating these metrics at the agency level, we will want to add a spatial operation to avoid double counting. After buffering the stops, but before intersecting with tracts, dissolve the buffered stops by agency so that each agency has one feature. Overlap between agencies is okay, but there shouldn't be overlap between individual stops of the same agency (hence the need for a dissolve). I think dissolve() in geopandas should achieve this, but there may be other ways as well.
  • There's a simpler way to achieve the desired outcomes from cell 51. I'd modify cell 49 to just calculate the ratio between the adjusted area and original area. Then in cell 51, you can just apply it to all the population figures as you currently do, without having to recalculate the weights using the population data.
  • Once each adjusted population figure has been calculated, you need to aggregate (sum) by agency. You could also sum by route, or whatever the desired level of aggregation is.
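
To make the buffer, dissolve, area-ratio, and aggregation comments above concrete, here is a minimal geopandas sketch. It is not taken from the notebook: the toy stops and tracts, the column names (`agency`, `tract_id`, `total_pop`), and the choice of EPSG:3310 (a projected CRS in meters) are all assumptions for illustration, so treat it as one possible shape for the workflow rather than the required implementation.

```python
import geopandas as gpd
from shapely.geometry import Point, box

HALF_MILE_M = 804.672  # half a mile expressed in meters

# Hypothetical toy data in a projected CRS whose units are meters.
stops = gpd.GeoDataFrame(
    {"agency": ["A", "A", "B"]},
    geometry=[Point(0, 0), Point(500, 0), Point(3000, 0)],
    crs="EPSG:3310",
)
tracts = gpd.GeoDataFrame(
    {"tract_id": ["t1", "t2"], "total_pop": [1000, 2000]},
    geometry=[box(-2000, -2000, 1000, 2000), box(1000, -2000, 5000, 2000)],
    crs="EPSG:3310",
)

# 1. Buffer each stop, then dissolve by agency so overlapping buffers
#    from the same agency collapse into a single feature.
buffered = stops.copy()
buffered["geometry"] = buffered.geometry.buffer(HALF_MILE_M)
service_areas = buffered.dissolve(by="agency").reset_index()

# 2. Record the original tract area, intersect with the service areas,
#    and compute the ratio of adjusted (intersected) area to original area.
tracts["tract_area"] = tracts.geometry.area
pieces = gpd.overlay(service_areas, tracts, how="intersection")
pieces["area_ratio"] = pieces.geometry.area / pieces["tract_area"]

# 3. Apply the same ratio to every population column.
pop_cols = ["total_pop"]  # assumed column list; extend with demographic columns
for col in pop_cols:
    pieces[f"{col}_adj"] = pieces[col] * pieces["area_ratio"]

# 4. Aggregate (sum) the adjusted figures by agency.
agency_totals = pieces.groupby("agency")[[f"{c}_adj" for c in pop_cols]].sum()
print(agency_totals)
```

The ratio in step 2 assumes population is spread evenly across each tract, which is the same assumption the area-based adjustment already makes. Swapping `"agency"` for a route identifier in steps 1 and 4 would give route-level totals instead.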
