[Bug]: Incorrect GPU name displayed in Hotaisle VM Fleet Resources #2900

@Bihan

Description

Steps to reproduce

  1. Create a Hotaisle fleet with the following config:
type: fleet
name: hotaisle-fleet-vm

placement: any

ssh_config:
  user: hotaisle
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 23.183.40.86
  2. Once the fleet is created, check it with the command below:
    dstack fleet

Actual behaviour

FLEET              INSTANCE  BACKEND       RESOURCES                                                  PRICE  STATUS  CREATED     
 hotaisle-fleet-vm  0         ssh (remote)  cpu=13 mem=220GB disk=11451GB AMD Radeon Graphics:192GB:1  $0     idle    28 mins ago 

Expected behaviour

FLEET              INSTANCE  BACKEND       RESOURCES                                     PRICE  STATUS  CREATED     
 hotaisle-fleet-vm  0         ssh (remote)  cpu=13 mem=220GB disk=11451GB MI300X:192GB:1  $0     idle    28 mins ago 

dstack version

0.19.18

Server logs

...
...
[18:09:01] DEBUG    dstack._internal.server.background.tasks.process_instances:450 Received a host_info {'gpu_vendor': 'amd',          
                    'gpu_name': 'AMD Radeon Graphics', 'gpu_memory': 196288, 'gpu_count': 1, 'addresses': ['23.183.40.86/26',          
                    'fe80::5054:ff:fe54:bfcd/64', '172.17.0.1/16', 'fe80::1c6c:22ff:fee5:2656/64'], 'disk_size': 12295348760576,       
                    'cpus': 13, 'memory': 236435066880}                                                                                
[18:09:02] INFO     dstack._internal.server.background.tasks.process_instances:286 The instance hotaisle-fleet-vm-0 (23.183.40.86) was 
                    successfully added
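
To make the link between the logged host_info and the RESOURCES column explicit, here is a minimal sketch of how the string shown under "Actual behaviour" can be derived. The function format_resources is hypothetical and is not dstack's actual code. It assumes 'memory' and 'disk_size' are in bytes and 'gpu_memory' is in MiB, which is consistent with the numbers in the log above.

```python
# Hypothetical helper for illustration only; NOT dstack's implementation.
# Assumed units: 'memory' and 'disk_size' in bytes, 'gpu_memory' in MiB.
GIB = 1024 ** 3


def format_resources(host_info: dict) -> str:
    mem_gb = round(host_info["memory"] / GIB)
    disk_gb = round(host_info["disk_size"] / GIB)
    gpu_gb = round(host_info["gpu_memory"] / 1024)
    # gpu_name is passed through verbatim, so whatever the host reports
    # ends up in the fleet table unchanged.
    return (
        f"cpu={host_info['cpus']} mem={mem_gb}GB disk={disk_gb}GB "
        f"{host_info['gpu_name']}:{gpu_gb}GB:{host_info['gpu_count']}"
    )


# Values copied from the host_info in the server log above.
host_info = {
    "gpu_vendor": "amd",
    "gpu_name": "AMD Radeon Graphics",
    "gpu_memory": 196288,
    "gpu_count": 1,
    "disk_size": 12295348760576,
    "cpus": 13,
    "memory": 236435066880,
}

print(format_resources(host_info))
# cpu=13 mem=220GB disk=11451GB AMD Radeon Graphics:192GB:1
```

If this reading is right, the CPU, memory, disk, and GPU memory figures are all reported correctly, and only the 'gpu_name' value received from the host is wrong.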

Additional information

Running rocm-smi --showproductname inside the VM reports the GPU as an MI300X:

============================ ROCm System Management Interface ============================
====================================== Product Info ======================================
GPU[0]		: Card Series: 		AMD Instinct MI300X VF
GPU[0]		: Card Model: 		0x74b5
GPU[0]		: Card Vendor: 		Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]		: Card SKU: 		M3000100
GPU[0]		: Subsystem ID: 	0x74a1
GPU[0]		: Device Rev: 		0x00
GPU[0]		: Node ID: 		1
GPU[0]		: GUID: 		22361
GPU[0]		: GFX Version: 		gfx942
==========================================================================================
================================== End of ROCm SMI Log ===================================
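
One possible way to get the expected name is to read it from the "Card Series" line of this output. The sketch below is only an illustration: short_gpu_name is a hypothetical helper, and the regular expressions are assumptions based on the single sample above, not a description of how dstack detects AMD GPUs.

```python
import re
from typing import Optional


def short_gpu_name(rocm_smi_output: str) -> Optional[str]:
    """Return a short model name such as 'MI300X' from the first
    'Card Series' line, or None if no such line is present."""
    for line in rocm_smi_output.splitlines():
        m = re.match(r"GPU\[\d+\]\s*:\s*Card Series:\s*(.+)", line)
        if m:
            series = m.group(1).strip()
            # Assumption: Instinct model names look like 'MI' + digits + letters.
            model = re.search(r"\bMI\d+[A-Z]*\b", series)
            # Fall back to the full series string for other naming schemes.
            return model.group(0) if model else series
    return None


# Lines copied from the rocm-smi output above.
sample = (
    "GPU[0]\t\t: Card Series: \t\tAMD Instinct MI300X VF\n"
    "GPU[0]\t\t: Card Model: \t\t0x74b5\n"
    "GPU[0]\t\t: Card SKU: \t\tM3000100\n"
)

print(short_gpu_name(sample))  # MI300X
```

In practice the input would be the text printed by rocm-smi --showproductname on the host. The "VF" suffix in the series name suggests a virtual function, so the name may need similar handling on other virtualized AMD hosts.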
