@charliepark charliepark commented Oct 15, 2025

This PR implements silo-level networking restrictions that let administrators limit networking operations to Silo Admins. When a silo's restrict_network_actions value is true, only users with Silo Admin privileges can create, modify, or delete networking resources (VPCs, subnets, routers, routes, internet gateways, firewall rules, and internet gateway IP pool and IP address attachments). Read and list operations remain available to all users with appropriate project-level permissions. When restrict_network_actions is false, the normal rules apply, and project collaborators can configure networking resources.

So how does it work? A little background will help …

Authorization in Omicron uses the Polar policy engine, and so does this PR, apart from the create method for VPCs (which we'll get into). See the end of omicron.polar for the new rules. They required a new Polar snippet for networking resources (look for PolarSnippet::InProjectNetworking). In Polar, permissions stack: if any snippet grants permission to an actor, that actor can execute the requested action. That meant we could not use the existing (and permissive) InProject snippet, and instead had to build up a snippet (InProjectNetworking) that assumes no permissions unless explicitly granted.
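
To illustrate why the permissive snippet had to go, here is a minimal Rust model of "permissions stack". It is not Polar and not Omicron's code: the Role and Snippet types and the can_modify function are hypothetical names invented for this sketch, and the model assumes the actor is already a member of the project.

```rust
/// Hypothetical actor roles for this sketch; not Omicron's real role types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    ProjectCollaborator,
    SiloAdmin,
}

/// Hypothetical stand-ins for the two Polar snippets discussed above.
#[derive(Clone, Copy, Debug)]
pub enum Snippet {
    InProject,
    InProjectNetworking,
}

impl Snippet {
    /// A snippet can only grant a permission; it can never take one away.
    fn grants_modify(self, role: Role, silo_restricted: bool) -> bool {
        match self {
            // Permissive: any project member may modify.
            Snippet::InProject => true,
            // Restrictive: grant only if the silo is unrestricted
            // or the actor is a Silo Admin.
            Snippet::InProjectNetworking => {
                !silo_restricted || role == Role::SiloAdmin
            }
        }
    }
}

/// Permissions stack: the action is allowed if ANY snippet grants it.
pub fn can_modify(snippets: &[Snippet], role: Role, silo_restricted: bool) -> bool {
    snippets.iter().any(|s| s.grants_modify(role, silo_restricted))
}
```

In this model, a resource that carries both snippets still lets a project collaborator through in a restricted silo, because InProject grants the permission regardless of what InProjectNetworking says. Only when InProjectNetworking is the sole snippet does the restriction take effect.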

This pattern is applied to:

  • VPCs: modify, delete
  • VPC Subnets: create, modify, delete
  • VPC Routers: create, modify, delete
  • Router Routes: create, modify, delete
  • Internet Gateways: create, delete
  • IGW IP Pools: attach (create_child), detach (delete)
  • IGW IP Addresses: attach (create_child), detach (delete)
  • VPC Firewall Rules: modify

[Note as of 6pm Pacific on 2025-10-23:]
At the moment, Network Interfaces are not create-able, as the Polar rules restrict the create_child method on VPC Subnets. Angela and Charlie are talking about whether NICs and Subnets should be CRUD-able or not.

The curious case of VPC Create

There's a problem with the Polar implementation, though. Our existing Polar rules ask a question before granting create permissions to a user: "Does this actor have create_child permissions at the parent layer?" Because VPCs are children of projects, the Polar question there is simply "can this user (let's assume they're a project collaborator) create 'children' of projects?" "Children" here would be … disks, images, and … VPCs. There's no way to specify in the Polar check that we're asking about creating a specific kind of resource, namely a VPC. So! For the special case of VPCs, we have an app-level Rust check, self.check_networking_restrictions(opctx).await?, that determines whether the user has the appropriate permissions, based on the silo's restrict_network_actions value.
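
As a rough picture of that app-level check, here is a simplified, synchronous sketch. The real check_networking_restrictions is an async method that takes an OpContext; the Actor, Silo, and Error types below, and the vpc_create signature, are hypothetical stand-ins made up for illustration.

```rust
/// Hypothetical error type; the real code returns Omicron's own error type.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Forbidden,
}

/// Hypothetical actor and silo models for this sketch.
pub struct Actor {
    pub is_silo_admin: bool,
}

pub struct Silo {
    pub restrict_network_actions: bool,
}

/// Returns Err if the silo restricts networking and the actor is not
/// a Silo Admin (mirrors the doc comment on the real function).
pub fn check_networking_restrictions(actor: &Actor, silo: &Silo) -> Result<(), Error> {
    if silo.restrict_network_actions && !actor.is_silo_admin {
        return Err(Error::Forbidden);
    }
    Ok(())
}

/// VPC creation. The generic create_child check on the project is assumed
/// to have passed already; it cannot tell a VPC apart from a disk or an
/// image, so the networking-specific check runs here at the app layer.
pub fn vpc_create(actor: &Actor, silo: &Silo, name: &str) -> Result<String, Error> {
    check_networking_restrictions(actor, silo)?;
    Ok(format!("vpc:{}", name))
}
```

The point of the sketch is the ordering: the kind-agnostic create_child check stays in Polar, and the VPC-specific restriction is layered on top of it in Rust.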

But wait! There's more!

What's the deal with sagas?

When a project is created in Omicron, we have a saga that automatically creates the networking resources that are children of that project. This is fine in a world where all project collaborators have create permissions on networking resources, but it causes problems in the world we're moving into: if a user isn't supposed to have permission to create networking resources, having those resources show up automatically after project creation would be a convenient end-run around the restriction.

So … the saga now accepts a create_default_vpc parameter that controls whether the VPC subsaga runs or not. Before creating the saga, the app layer (nexus/src/app/project.rs) determines this value, based on the silo's restrict_network_actions value:

  • Silo with restrict_network_actions set to false: always create the default VPC (the default / current behavior)
  • Silo with restrict_network_actions set to true, actor is a Silo Admin: create the default VPC
  • Silo with restrict_network_actions set to true, actor is not a Silo Admin: skip default VPC creation entirely
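
The three cases above collapse into one small decision function. This is a sketch under assumptions: SiloAuthnPolicy here is a hypothetical one-field struct rather than Omicron's real type, and the Silo Admin test is passed in as a plain boolean instead of being derived from an authorization call.

```rust
/// Hypothetical, one-field stand-in for the silo's authentication policy.
pub struct SiloAuthnPolicy {
    pub restrict_network_actions: bool,
}

/// Decide whether project creation should also create the default VPC.
/// A missing policy is treated as "no restrictions".
pub fn should_create_default_vpc(
    policy: Option<&SiloAuthnPolicy>,
    actor_is_silo_admin: bool,
) -> bool {
    match policy {
        // Restricted silo: only Silo Admins get the default VPC.
        Some(p) if p.restrict_network_actions => actor_is_silo_admin,
        // Unrestricted silo, or no policy at all: always create it.
        _ => true,
    }
}
```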

The saga builder (nexus/src/app/sagas/project_create.rs) conditionally constructs the DAG, omitting the VPC subsaga and all related nodes for non-admins in restricted silos. Since the VPC subsaga creates all child resources (routers, routes, subnets, internet gateway, firewall rules), this single conditional prevents creation of the entire networking stack.
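
A toy version of that conditional construction is sketched below. It does not use the real saga framework: the DAG is modeled as a plain list of node names, and both the Params struct and the node names are hypothetical.

```rust
/// Hypothetical saga parameters; only the flag relevant here is modeled.
pub struct Params {
    pub create_default_vpc: bool,
}

/// Build the list of saga nodes for project creation.
/// The node names are illustrative, not the ones used in project_create.rs.
pub fn build_project_create_dag(params: &Params) -> Vec<&'static str> {
    let mut nodes = vec!["project_create_record"];
    if params.create_default_vpc {
        // The subsaga is what creates the routers, routes, subnets,
        // internet gateway, and firewall rules, so omitting this one
        // node omits the whole networking stack.
        nodes.push("vpc_create_subsaga");
    }
    nodes
}
```

Because the builder only reads the flag, the policy decision stays with the caller and the saga itself never inspects the actor's role.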

Database changes

  • Added restrict_network_actions boolean column to silo table (defaults to false)
  • Schema version bumped from 201 to 202
  • Migration: schema/crdb/restrict-network-actions/up.sql

API changes

  • Added optional restrict_network_actions field to SiloCreate params
  • Field appears in Silo views (always present, defaults to false)
  • OpenAPI spec updated accordingly

Testing

For the individual resources (test_vpc_networking_restrictions), the basic pattern is: set up a silo with restrict_network_actions set to true. Then have a user with admin permissions create and update a networking resource, to verify that that's possible. Downgrade them to a project collaborator. Attempt to create the resource (should fail), attempt to modify the admin-created resource (should fail), and attempt to delete the admin-created resource (should fail). Finally, upgrade them to an admin again and delete the resource (should succeed).

There are also some permission matrices run on both a restricted silo (test_vpc_networking_permissions_restricted) and an unrestricted silo (test_vpc_networking_permissions_unrestricted).

Pertains to https://github.com/oxidecomputer/customer-support/issues/416

@charliepark charliepark marked this pull request as ready for review October 23, 2025 22:08

@davepacheco davepacheco left a comment

Cool! I have some suggestions here, some of which are a bit deeper. Mentioning here for the record that we also discussed in chat an approach that used a new role rather than a custom silo property. Let me know if you want to talk through any of this.

Comment on lines +293 to +294
/// When true, restricts networking actions to Silo Admins only
restrict_network_actions: bool,

I'd suggest making this an enum with two explicit values, like:

enum ProjectRolesConfigureNetworking {
    Allowed,
    Disallowed,
}
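
One way the suggested enum could sit alongside the existing boolean column is sketched below. The derives, the Default choice, and the From<bool> conversion are assumptions added for illustration; only the enum name and its two variants come from the suggestion above.

```rust
/// Whether project-level roles may configure networking in a silo.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum ProjectRolesConfigureNetworking {
    /// Matches the column default of restrict_network_actions = false.
    #[default]
    Allowed,
    Disallowed,
}

impl From<bool> for ProjectRolesConfigureNetworking {
    /// Map the boolean restrict_network_actions value onto the enum
    /// (hypothetical conversion, assumed for this sketch).
    fn from(restrict_network_actions: bool) -> Self {
        if restrict_network_actions {
            Self::Disallowed
        } else {
            Self::Allowed
        }
    }
}
```

Call sites then match on Allowed or Disallowed instead of testing a bare true or false, which makes the direction of the flag harder to misread.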

Comment on lines +457 to +459
# NOTE: No permission rules defined here!
# All permissions controlled by custom networking restriction
# rules in omicron.polar (can_modify_networking_resource)

I'm curious why you can't put the stuff that's copy/pasted for each networking resource in omicron.polar in here instead.

admin_group_name: None,
tls_certificates: vec![],
mapped_fleet_roles: Default::default(),
restrict_network_actions: None, // Default: no restrictions

I think it might be worthwhile to modify this test to run all the silo resources (or at least all the networking ones) twice: once with this set and once without. You could probably skip some of the hand-written tests, it should automatically catch everything, and then we'll have the current behavior documented in the test output file.

ALTER TABLE omicron.public.silo
ADD COLUMN IF NOT EXISTS restrict_network_actions BOOL
NOT NULL
DEFAULT FALSE;

Comment on lines +63 to +64
// Only create default VPC if allowed (i.e., networking is not restricted
// or the actor is a Silo Admin)

Suggested change
- // Only create default VPC if allowed (i.e., networking is not restricted
- // or the actor is a Silo Admin)
+ // Only create default VPC if requested.

I'd suggest that the question of whether this is set is a policy choice by the caller and this code doesn't need to care about when or why it's set.

.internal_context("creating a Project")?;
opctx.authorize(authz::Action::CreateChild, &authz_silo).await?;

// Determine if we should create a default VPC.

This is potentially fine but I find it pretty confusing when code behaves differently depending on what permissions you have (aside from failing a permission check, I mean) so I'm just thinking out loud if there's a cleaner way to do this. Some ideas:

  • Do this all the time if networking is not restricted to silo admins. Never do this if networking is restricted to silo admins. In this world, the "restrict networking config to silo admins" is more like "silos have two modes: one where networking is configured separately from projects and only silo admins can do it, and one where networking is fully configurable within a project and project admins can do it too".
  • Make this an option in the API and either:
    • put this logic in the console and/or CLI so it behaves the same way. This sounds the same but I think it's a big difference if the API's behavior is deterministic given its input vs. using other system state to determine what to do.
    • make the user choose this (i.e., have a checkbox)

I admit I might be overthinking this. With the behavior as coded, I guess the story is: "this creates a project and, if you have permission to do so, creates the VPC too." I've certainly seen tools that do that, but I usually assume it's a bug that they didn't fail or warn me if they couldn't do some of the stuff they normally do.

Comment on lines +59 to +74
if let Some(policy) = opctx.authn.silo_authn_policy() {
    if policy.restrict_network_actions() {
        // Networking is restricted - only create VPC if user is Silo Admin
        // (i.e., has Modify permission on the Silo)
        opctx
            .authorize(authz::Action::Modify, &authz_silo)
            .await
            .is_ok()
    } else {
        // No networking restrictions, create VPC
        true
    }
} else {
    // No policy, create VPC
    true
};

It feels like this is all duplicating policy logic that should exist in the Polar file, though it's admittedly tricky to figure out which resource and action we'd want to authorize here. In an ideal world I feel like this would be something like:

let create_default_vpc = opctx.authorize(authz::Action::Create, authz_project.vpc_list()).await.is_ok();

That's not quite right -- you don't want to ignore all errors here, only authz errors.

And vpc_list() doesn't exist -- I was imagining we created a per-project synthetic resource which was "this project's list of VPCs" (or "this project's networking config") and that this function returned that.

Speaking of which, I wonder if that approach would simplify a lot of this? If there were a synthetic resource within the project that referred to the networking config, that's where you could put the special Polar rules, and all the other networking resources would have a simple snippet that's like "you have this permission if you have it on the parent project's networking config".
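
A rough sketch of how those two ideas could fit together: a synthetic per-project resource that carries the networking rules, and a caller that treats only an authz failure as "no". Everything here is hypothetical: ProjectNetworkingConfig, authorize_modify, and the Error variants are invented for illustration, and the check is synchronous to keep the example self-contained.

```rust
/// Hypothetical error type with one authz variant and one non-authz variant.
#[derive(Debug, PartialEq, Eq)]
pub enum Error {
    Forbidden,
    ServiceUnavailable,
}

pub struct Actor {
    pub is_silo_admin: bool,
}

/// Hypothetical synthetic resource: "this project's networking config".
pub struct ProjectNetworkingConfig {
    pub silo_restricts_networking: bool,
    /// Models a backend failure unrelated to authorization.
    pub backend_available: bool,
}

impl ProjectNetworkingConfig {
    /// Stand-in for an authorize call against the synthetic resource.
    pub fn authorize_modify(&self, actor: &Actor) -> Result<(), Error> {
        if !self.backend_available {
            return Err(Error::ServiceUnavailable);
        }
        if self.silo_restricts_networking && !actor.is_silo_admin {
            return Err(Error::Forbidden);
        }
        Ok(())
    }
}

/// Only an authz failure means "skip the VPC"; any other error propagates.
pub fn should_create_default_vpc(
    actor: &Actor,
    config: &ProjectNetworkingConfig,
) -> Result<bool, Error> {
    match config.authorize_modify(actor) {
        Ok(()) => Ok(true),
        Err(Error::Forbidden) => Ok(false),
        Err(other) => Err(other),
    }
}
```

Compared with calling is_ok() on the result, the match keeps a backend outage from being silently read as "not allowed".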

///
/// Returns Err if the silo restricts networking and the actor is not
/// a Silo Admin.
pub(crate) async fn check_networking_restrictions(

This also feels like it should be expressible as a call to authorize.
