This guide will help you create and manage your first HyperPod cluster using the CLI.
Before you begin, ensure you have:
- An AWS account with appropriate permissions for SageMaker HyperPod
- AWS CLI configured with your credentials
- HyperPod CLI installed (pip install sagemaker-hyperpod)
Note
Region Configuration: For commands that accept the --region option, if no region is explicitly provided, the command will use the default region from your AWS credentials configuration.
Cluster stack names must be unique within each AWS region. If you attempt to create a cluster stack with a name that already exists in the same region, the deployment will fail.
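The region fallback described above can be sketched as follows. This is a minimal illustration, not the CLI's actual lookup: the function name resolve_region is hypothetical, and the precedence (explicit option, then the AWS_DEFAULT_REGION environment variable, then the profile in the AWS config file) is an assumption based on how AWS tooling commonly behaves.

```python
import configparser


def resolve_region(explicit, env, config_text, profile="default"):
    """Return the region a command would likely use, or None if undetermined.

    Hypothetical helper: the real CLI may resolve the region differently.
    """
    if explicit:  # an explicit --region value always wins
        return explicit
    if env.get("AWS_DEFAULT_REGION"):  # assumed environment override
        return env["AWS_DEFAULT_REGION"]
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In ~/.aws/config, non-default profiles are named "profile <name>".
    section = "default" if profile == "default" else f"profile {profile}"
    if parser.has_section(section):
        return parser.get(section, "region", fallback=None)
    return None


# Sample text in the layout of ~/.aws/config (values are placeholders).
sample_config = "[default]\nregion = us-west-2\n\n[profile dev]\nregion = eu-west-1\n"
print(resolve_region(None, {}, sample_config))
```

Passing the environment and config text as arguments keeps the sketch free of side effects; in practice you would pass os.environ and the contents of your own AWS config file.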
It's recommended to start with a new and clean directory for each cluster configuration:
.. code-block:: bash

   mkdir my-hyperpod-cluster
   cd my-hyperpod-cluster

.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp init cluster-stack

This creates three files:
- config.yaml: The main configuration file you'll use to customize your cluster
- cfn_params.jinja: A reference template for CloudFormation parameters
- README.md: Usage guide with instructions and examples
Important
The resource_name_prefix parameter in the generated config.yaml file serves as the primary identifier for all AWS resources created during deployment. Each deployment must use a unique resource name prefix to avoid conflicts. During cluster creation, a unique identifier is automatically appended to this prefix to ensure resource uniqueness.
You can configure your cluster in two ways:
Option 1: Edit config.yaml directly
The config.yaml file contains key parameters like:
.. code-block:: yaml

   template: cluster-stack
   namespace: kube-system
   stage: gamma
   resource_name_prefix: sagemaker-hyperpod-eks

Option 2: Use CLI/SDK commands (Pre-Deployment)

.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp configure --resource-name-prefix your-resource-prefix

Note
The hyp configure command only modifies local configuration files. It does not affect existing deployed clusters.
Warning
Cluster Stack Name Uniqueness: Cluster stack names must be unique within each AWS region. Ensure your resource_name_prefix in config.yaml generates a unique stack name for the target region to avoid deployment conflicts.
.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp create --region your-region

This will:
- Validate your configuration
- Create a timestamped folder in the run directory
- Initialize the cluster creation process
Check the status of your cluster:
.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp describe cluster-stack your-cluster-name --region your-region

   .. tab-item:: SDK

      .. code-block:: python

         from sagemaker.hyperpod.cluster_management.hp_cluster_stack import HpClusterStack

         # Describe a specific cluster stack
         response = HpClusterStack.describe("your-cluster-name", region="your-region")
         print(f"Stack Status: {response['Stacks'][0]['StackStatus']}")
         print(f"Stack Name: {response['Stacks'][0]['StackName']}")

Note
Region-Specific Stack Names: Cluster stack names are unique within each AWS region. When describing a stack, ensure you specify the correct region where the stack was created, or the command will fail to find the stack.
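Because cluster creation takes time, it can be convenient to poll the status rather than check it by hand. The following is a minimal sketch under stated assumptions: wait_for_stack is a hypothetical helper, the describe callable is expected to return a response shaped like the SDK example above, and any status ending in _IN_PROGRESS is treated as "still working" (the usual CloudFormation naming convention). A stand-in describe function is used so the sketch runs without AWS access.

```python
import time


def wait_for_stack(describe, name, region, poll_seconds=30, max_polls=120,
                   sleep=time.sleep):
    """Poll describe(name, region=...) until the stack leaves an *_IN_PROGRESS state.

    Hypothetical helper; returns the first status that is not in progress.
    """
    for _ in range(max_polls):
        response = describe(name, region=region)
        status = response["Stacks"][0]["StackStatus"]
        if not status.endswith("_IN_PROGRESS"):
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"{name} still in progress after {max_polls} polls")


# Stand-in for a real describe call: yields two in-progress results, then success.
_fake_statuses = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])


def fake_describe(name, region=None):
    return {"Stacks": [{"StackName": name, "StackStatus": next(_fake_statuses)}]}


final = wait_for_stack(fake_describe, "your-cluster-name", "your-region",
                       sleep=lambda seconds: None)  # skip real waiting in the demo
print(final)
```

In real use you would presumably pass HpClusterStack.describe as the describe argument and keep the default sleep; a returned status such as ROLLBACK_COMPLETE or CREATE_FAILED indicates the deployment did not succeed.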
List all clusters:
.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp list cluster-stack --region your-region

   .. tab-item:: SDK

      .. code-block:: python

         from sagemaker.hyperpod.cluster_management.hp_cluster_stack import HpClusterStack

         # List all CloudFormation stacks (including cluster stacks)
         stacks = HpClusterStack.list(region="your-region")
         for stack in stacks['StackSummaries']:
             print(f"Stack: {stack['StackName']}, Status: {stack['StackStatus']}")

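Since stack names must be unique within a region, a listing like the one above can be used to check whether a name is already taken before you deploy. The sketch below is illustrative only: active_stack_names and is_name_available are hypothetical helpers, the input is assumed to have the StackSummaries shape shown above, and stacks with status DELETE_COMPLETE are ignored on the assumption that the listing may include deleted stacks, whose names CloudFormation allows you to reuse.

```python
def active_stack_names(list_response):
    """Names of stacks that still exist, from a response with a 'StackSummaries' list."""
    return {
        summary["StackName"]
        for summary in list_response["StackSummaries"]
        if summary["StackStatus"] != "DELETE_COMPLETE"  # deleted names can be reused
    }


def is_name_available(list_response, candidate):
    """True if no existing (non-deleted) stack in the listing uses this name."""
    return candidate not in active_stack_names(list_response)


# Sample data in the assumed response shape (names are placeholders).
sample_response = {
    "StackSummaries": [
        {"StackName": "team-a-cluster", "StackStatus": "CREATE_COMPLETE"},
        {"StackName": "old-cluster", "StackStatus": "DELETE_COMPLETE"},
    ]
}

print(is_name_available(sample_response, "team-a-cluster"))
print(is_name_available(sample_response, "old-cluster"))
```

The check is only a snapshot of one region at one moment; the deployment itself remains the authoritative test.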
Important
Runtime vs Configuration Commands:
- hyp update cluster modifies existing, deployed clusters (runtime settings like instance groups, node recovery)
- hyp configure modifies local config.yaml files before cluster creation
Use the appropriate command based on whether your cluster is already deployed or not.
.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp update cluster \
           --cluster-name your-cluster-name \
           --instance-groups "[]" \
           --region your-region

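The --instance-groups option takes a JSON array as a single string, which is easy to misquote in a shell. One way to avoid that is to build the command programmatically. This is a rough sketch: build_update_command is a hypothetical helper, and the keys in the example group (InstanceGroupName, InstanceType, InstanceCount) are assumptions for illustration; check hyp update cluster --help for the fields the CLI actually expects.

```python
import json
import shlex


def build_update_command(cluster_name, region, instance_groups):
    """Return a shell-safe 'hyp update cluster' command line (hypothetical helper)."""
    args = [
        "hyp", "update", "cluster",
        "--cluster-name", cluster_name,
        "--instance-groups", json.dumps(instance_groups),  # JSON array as one argument
        "--region", region,
    ]
    return shlex.join(args)  # quotes each argument as needed for a POSIX shell


# Example group with assumed key names; adjust to the CLI's real schema.
example_groups = [
    {"InstanceGroupName": "worker-group", "InstanceType": "ml.g5.8xlarge", "InstanceCount": 2}
]

print(build_update_command("your-cluster-name", "your-region", []))
print(build_update_command("your-cluster-name", "your-region", example_groups))
```

Serializing with json.dumps and quoting with shlex.join means the JSON reaches the CLI as one intact argument regardless of the spaces and quotes it contains.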
.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp reset

Always validate your configuration before submission:
.. tab-set::

   .. tab-item:: CLI

      .. code-block:: bash

         hyp validate

Note
This command performs only syntactic validation of the config.yaml file against the appropriate schema. It checks:
- YAML syntax: Ensures the file is valid YAML
- Required fields: Verifies all mandatory fields are present
- Data types: Confirms field values match expected types (string, number, boolean, array)
- Schema structure: Validates against the template's defined structure
The command does not check whether the values themselves are valid (e.g., whether AWS regions exist, instance types are available, or resources can be created).
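To make the limits of syntactic validation concrete, here is a toy checker that tests only for required fields and data types. It is not the CLI's validator: the SCHEMA dictionary and validate_syntax function are hypothetical, and treating these four fields as required strings is an assumption made for the example.

```python
# Hypothetical schema: field name -> expected Python type. Not the real template schema.
SCHEMA = {
    "template": str,
    "namespace": str,
    "stage": str,
    "resource_name_prefix": str,
}


def validate_syntax(config, schema=SCHEMA):
    """Return a list of structural problems: missing fields and wrong types only."""
    errors = []
    for field, expected in schema.items():
        if field not in config:
            errors.append(f"missing required field: {field}")
        elif not isinstance(config[field], expected):
            actual = type(config[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


good = {"template": "cluster-stack", "namespace": "kube-system",
        "stage": "gamma", "resource_name_prefix": "sagemaker-hyperpod-eks"}
wrong_type = dict(good, stage=123)
# Structurally fine, but the value may be meaningless to the service.
nonsense = dict(good, stage="not-a-real-stage")

print(validate_syntax(good))
print(validate_syntax(wrong_type))
print(validate_syntax(nonsense))
```

The last configuration passes because its shape is correct, even though the value may be rejected later. That is the gap between syntactic validation and an actual deployment.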
- Use meaningful resource prefixes to easily identify your clusters
- Monitor cluster status regularly after creation
- Keep your configuration files in version control for reproducibility
After creating your cluster, you can:
- Connect to your cluster:

  .. tab-set::

     .. tab-item:: CLI

        .. code-block:: bash

           hyp set-cluster-context --cluster-name your-cluster-name

- Start training jobs with PyTorch
- Deploy inference endpoints
- Monitor cluster resources and performance
For more detailed information on specific commands, use the --help flag:
hyp <command> --help