oci_datascience_model_deployment
This resource provides the Model Deployment resource in the Oracle Cloud Infrastructure Data Science service. API documentation for the resource: https://docs.oracle.com/iaas/api/#/en/data-science/latest/ModelDeployment
Example Terraform configurations for this resource: https://github.com/oracle/terraform-provider-oci/tree/master/examples/datascience
Creates a new model deployment.
Example Usage
resource "oci_datascience_model_deployment" "test_model_deployment" {
#Required
compartment_id = var.compartment_id
model_deployment_configuration_details {
#Required
deployment_type = var.model_deployment_model_deployment_configuration_details_deployment_type
#Optional
environment_configuration_details {
#Required
environment_configuration_type = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_environment_configuration_type
#Optional
cmd = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_cmd
custom_http_endpoints {
#Optional
endpoint_uri_suffix = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_custom_http_endpoints_endpoint_uri_suffix
http_methods = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_custom_http_endpoints_http_methods
}
default_environment_variables = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_default_environment_variables
entrypoint = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_entrypoint
environment_variables = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_environment_variables
health_check_port = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_health_check_port
image = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_image
image_digest = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_image_digest
predict_api_specification = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_predict_api_specification
image_signature_id = oci_datascience_image_signature.test_image_signature.id
server_port = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_server_port
}
infrastructure_configuration_details {
#Required
instance_configuration {
#Required
instance_shape_name = oci_core_shape.test_shape.name
#Optional
model_deployment_instance_shape_config_details {
#Optional
cpu_baseline = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_instance_configuration_model_deployment_instance_shape_config_details_cpu_baseline
memory_in_gbs = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_instance_configuration_model_deployment_instance_shape_config_details_memory_in_gbs
ocpus = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_instance_configuration_model_deployment_instance_shape_config_details_ocpus
}
network_access_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_instance_configuration_network_access_type
private_endpoint_id = oci_dataflow_private_endpoint.test_private_endpoint.id
subnet_id = oci_core_subnet.test_subnet.id
}
#Optional
bandwidth_mbps = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_bandwidth_mbps
maximum_bandwidth_mbps = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_maximum_bandwidth_mbps
scaling_policy {
#Required
policy_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_policy_type
#Optional
auto_scaling_policies {
#Required
auto_scaling_policy_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_auto_scaling_policy_type
initial_instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_initial_instance_count
maximum_instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_maximum_instance_count
minimum_instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_minimum_instance_count
rules {
#Required
metric_expression_rule_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_metric_expression_rule_type
scale_in_configuration {
#Optional
instance_count_adjustment = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_instance_count_adjustment
pending_duration = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_pending_duration
query = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_query
scaling_configuration_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_scaling_configuration_type
threshold = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_threshold
}
scale_out_configuration {
#Optional
instance_count_adjustment = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_instance_count_adjustment
pending_duration = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_pending_duration
query = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_query
scaling_configuration_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_scaling_configuration_type
threshold = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_threshold
}
#Optional
metric_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_metric_type
}
}
cool_down_in_seconds = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_cool_down_in_seconds
instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_instance_count
is_enabled = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_is_enabled
}
}
model_configuration_details {
#Optional
instance_configuration {
#Optional
instance_shape_name = oci_core_shape.test_shape.name
model_deployment_instance_shape_config_details {
#Optional
cpu_baseline = var.model_deployment_model_deployment_configuration_details_model_configuration_details_instance_configuration_model_deployment_instance_shape_config_details_cpu_baseline
memory_in_gbs = var.model_deployment_model_deployment_configuration_details_model_configuration_details_instance_configuration_model_deployment_instance_shape_config_details_memory_in_gbs
ocpus = var.model_deployment_model_deployment_configuration_details_model_configuration_details_instance_configuration_model_deployment_instance_shape_config_details_ocpus
}
network_access_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_instance_configuration_network_access_type
private_endpoint_id = oci_dataflow_private_endpoint.test_private_endpoint.id
subnet_id = oci_core_subnet.test_subnet.id
}
model_id = oci_datascience_model.test_model.id
#Optional
bandwidth_mbps = var.model_deployment_model_deployment_configuration_details_model_configuration_details_bandwidth_mbps
maximum_bandwidth_mbps = var.model_deployment_model_deployment_configuration_details_model_configuration_details_maximum_bandwidth_mbps
scaling_policy {
#Required
policy_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_policy_type
#Optional
auto_scaling_policies {
#Required
auto_scaling_policy_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_auto_scaling_policy_type
initial_instance_count = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_initial_instance_count
maximum_instance_count = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_maximum_instance_count
minimum_instance_count = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_minimum_instance_count
rules {
#Required
metric_expression_rule_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_metric_expression_rule_type
scale_in_configuration {
#Optional
instance_count_adjustment = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_instance_count_adjustment
pending_duration = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_pending_duration
query = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_query
scaling_configuration_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_scaling_configuration_type
threshold = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_in_configuration_threshold
}
scale_out_configuration {
#Optional
instance_count_adjustment = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_instance_count_adjustment
pending_duration = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_pending_duration
query = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_query
scaling_configuration_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_scaling_configuration_type
threshold = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_out_configuration_threshold
}
#Optional
metric_type = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_auto_scaling_policies_rules_metric_type
}
}
cool_down_in_seconds = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_cool_down_in_seconds
instance_count = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_instance_count
is_enabled = var.model_deployment_model_deployment_configuration_details_model_configuration_details_scaling_policy_is_enabled
}
}
}
project_id = oci_datascience_project.test_project.id
#Optional
category_log_details {
#Optional
access {
#Required
log_group_id = oci_logging_log_group.test_log_group.id
log_id = oci_logging_log.test_log.id
}
predict {
#Required
log_group_id = oci_logging_log_group.test_log_group.id
log_id = oci_logging_log.test_log.id
}
}
defined_tags = {"Operations.CostCenter"= "42"}
description = var.model_deployment_description
display_name = var.model_deployment_display_name
freeform_tags = {"Department"= "Finance"}
opc_parent_rpt_url = var.model_deployment_opc_parent_rpt_url
}
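The example above lists every argument with placeholder variables. For comparison, here is a minimal sketch of a bring-your-own-container (OCIR_CONTAINER) deployment with concrete values. It assumes deployment_type = "SINGLE_MODEL", a FIXED_SIZE scaling policy on model_configuration_details, and the test_project and test_model resources referenced above. The image path, entrypoint, cmd, environment variable, and port values are hypothetical placeholders; the shape, OCPU, and memory values are picked from the VM.Standard.E4.Flex ranges listed in the Argument Reference.

```hcl
# Sketch only: every literal value below is a hypothetical placeholder.
resource "oci_datascience_model_deployment" "byoc_model_deployment" {
  compartment_id = var.compartment_id
  project_id     = oci_datascience_project.test_project.id
  display_name   = "byoc-model-deployment"

  model_deployment_configuration_details {
    deployment_type = "SINGLE_MODEL" # assumed value for a single-model deployment

    environment_configuration_details {
      environment_configuration_type = "OCIR_CONTAINER"

      # Format: <region>.ocir.io/<registry>/<image>:<tag> (placeholder path).
      image = "iad.ocir.io/example-namespace/example-inference-server:1.0"

      # CMD and ENTRYPOINT are lists of strings; combined size must stay under 2048 bytes.
      entrypoint = ["python", "-m", "server"]
      cmd        = ["--workers", "2"]

      # Ports must be within 1024-65535 and must not be 24224, 8446, or 8447.
      server_port       = 8080
      health_check_port = 8080

      environment_variables = {
        APP_LOG_LEVEL = "info"
      }
    }

    model_configuration_details {
      model_id = oci_datascience_model.test_model.id

      instance_configuration {
        instance_shape_name = "VM.Standard.E4.Flex"

        model_deployment_instance_shape_config_details {
          ocpus         = 1  # E4.Flex range: 1 to 64 OCPUs
          memory_in_gbs = 16 # E4.Flex range: 6 to 1024 GB
        }
      }

      # A fixed number of instances, no autoscaling.
      scaling_policy {
        policy_type    = "FIXED_SIZE"
        instance_count = 1
      }
    }
  }
}
```

Keeping the example fixed-size avoids the nested auto_scaling_policies block. Switching policy_type to AUTOSCALING makes auto_scaling_policies required.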
Example Terraform config for deployment_type = "SINGLE_MODEL_FLEX" with infrastructure_type = "MANAGED_COMPUTE_CLUSTER" and autoscaling:
resource "oci_datascience_model_deployment" "test_single_model_flex_model_deployment" {
#Required
compartment_id = var.compartment_id
model_deployment_configuration_details {
#Required
deployment_type = "SINGLE_MODEL_FLEX"
#Optional
environment_configuration_details {
#Required
environment_configuration_type = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_environment_configuration_type
#Optional
cmd = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_cmd
default_environment_variables = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_default_environment_variables
entrypoint = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_entrypoint
environment_variables = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_environment_variables
health_check_port = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_health_check_port
image = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_image
image_digest = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_image_digest
image_signature_id = oci_datascience_image_signature.test_image_signature.id
server_port = var.model_deployment_model_deployment_configuration_details_environment_configuration_details_server_port
}
infrastructure_configuration_details {
#Required
compute_target_id = oci_datascience_compute_target.test_compute_target.id
infrastructure_type = "MANAGED_COMPUTE_CLUSTER"
#Optional
model_deployment_resource_configuration {
#Required
resource_request_configuration {
#Required
memory_in_gbs = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_model_deployment_resource_configuration_resource_request_configuration_memory_in_gbs
ocpus = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_model_deployment_resource_configuration_resource_request_configuration_ocpus
#Optional
gpus = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_model_deployment_resource_configuration_resource_request_configuration_gpus
}
#Optional
resource_limit_configuration {
#Required
memory_in_gbs = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_model_deployment_resource_configuration_resource_limit_configuration_memory_in_gbs
ocpus = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_model_deployment_resource_configuration_resource_limit_configuration_ocpus
}
}
scaling_policy {
#Required
policy_type = "AUTOSCALING"
#Optional
auto_scaling_policies {
#Required
auto_scaling_policy_type = "THRESHOLD"
initial_instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_initial_instance_count
maximum_instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_maximum_instance_count
minimum_instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_minimum_instance_count
rules {
#Required
metric_expression_rule_type = "TARGET_PREDEFINED_EXPRESSION"
metric_type = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_metric_type
scale_configuration {
#Required
target_scaling_configuration_type = "THRESHOLD"
threshold = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_configuration_threshold
}
}
rules {
#Required
metric_expression_rule_type = "TARGET_CUSTOM_EXPRESSION"
scale_configuration {
#Required
query = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_configuration_query
target_scaling_configuration_type = "QUERY"
threshold = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_configuration_threshold
#Optional
metric_namespace = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_rules_scale_configuration_metric_namespace
}
}
scale_in_policy {
#Optional
cool_down_in_seconds = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_scale_in_policy_cool_down_in_seconds
instance_count_adjustment = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_scale_in_policy_instance_count_adjustment
pending_duration = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_scale_in_policy_pending_duration
}
scale_out_policy {
#Optional
cool_down_in_seconds = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_scale_out_policy_cool_down_in_seconds
instance_count_adjustment = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_scale_out_policy_instance_count_adjustment
pending_duration = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_auto_scaling_policies_scale_out_policy_pending_duration
}
}
is_enabled = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_is_enabled
}
}
model_configuration_details {
#Required
model_id = oci_datascience_model.test_model.id
}
}
#Required
project_id = oci_datascience_project.test_project.id
#Optional
category_log_details {
#Optional
access {
#Required
log_group_id = oci_logging_log_group.test_log_group.id
log_id = oci_logging_log.test_log.id
}
predict {
#Required
log_group_id = oci_logging_log_group.test_log_group.id
log_id = oci_logging_log.test_log.id
}
}
defined_tags = {"Operations.CostCenter"= "42"}
description = var.model_deployment_description
display_name = var.model_deployment_display_name
freeform_tags = {"Department"= "Finance"}
opc_parent_rpt_url = var.model_deployment_opc_parent_rpt_url
}
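The autoscaling example above takes its instance counts and durations from separate input variables. As a sketch, some of the documented relationships can be checked at plan time with Terraform variable validation blocks. The variable names autoscaling_instance_counts and scale_in_pending_duration are hypothetical. The ordering minimum <= initial <= maximum is an assumption inferred from the descriptions of the three counts. The PT3M to PT1H bounds come from the pending_duration description for scale_in_configuration and are assumed here to also apply to scale_in_policy.

```hcl
# Hypothetical variable grouping the three autoscaling instance counts.
variable "autoscaling_instance_counts" {
  type = object({
    minimum = number
    initial = number
    maximum = number
  })
  default = {
    minimum = 1
    initial = 1
    maximum = 3
  }

  validation {
    # Assumed invariant: minimum <= initial <= maximum.
    condition = (
      var.autoscaling_instance_counts.minimum <= var.autoscaling_instance_counts.initial &&
      var.autoscaling_instance_counts.initial <= var.autoscaling_instance_counts.maximum
    )
    error_message = "Autoscaling instance counts must satisfy minimum <= initial <= maximum."
  }
}

# Hypothetical variable for an ISO 8601 pending duration.
variable "scale_in_pending_duration" {
  type    = string
  default = "PT3M"

  validation {
    # Accepts PT1H, or whole minutes PTnM with 3 <= n <= 60. Other valid ISO 8601
    # forms (seconds, combined hours and minutes) are rejected by this simplified check.
    # try() turns a failed regex match into false instead of an error.
    condition = var.scale_in_pending_duration == "PT1H" || try(
      tonumber(regex("^PT([0-9]+)M$", var.scale_in_pending_duration)[0]) >= 3 &&
      tonumber(regex("^PT([0-9]+)M$", var.scale_in_pending_duration)[0]) <= 60,
      false
    )
    error_message = "pending_duration must be between PT3M and PT1H, for example PT5M or PT1H."
  }
}
```

In the autoscaling example these would replace the placeholders, for instance minimum_instance_count = var.autoscaling_instance_counts.minimum and pending_duration = var.scale_in_pending_duration.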
Example Terraform config for deployment_type = "SINGLE_MODEL_FLEX" with infrastructure_type = "MANAGED_COMPUTE_CLUSTER" and fixed-size scaling:
resource "oci_datascience_model_deployment" "test_single_model_flex_fixed_size_model_deployment" {
  #Required
  compartment_id = var.compartment_id
  model_deployment_configuration_details {
    #Required
    deployment_type = "SINGLE_MODEL_FLEX"
    infrastructure_configuration_details {
      #Required
      compute_target_id   = oci_datascience_compute_target.test_compute_target.id
      infrastructure_type = "MANAGED_COMPUTE_CLUSTER"
      scaling_policy {
        #Required
        instance_count = var.model_deployment_model_deployment_configuration_details_infrastructure_configuration_details_scaling_policy_instance_count
        policy_type    = "FIXED_SIZE"
      }
    }
    model_configuration_details {
      #Required
      model_id = oci_datascience_model.test_model.id
    }
  }
  #Required
  project_id = oci_datascience_project.test_project.id
}
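All three examples pass container settings through input variables. Below is a minimal sketch of variable declarations that enforce, at plan time, the port and environment-variable rules from the Argument Reference. The variable names are hypothetical, and the same port rule applies to health_check_port. The reserved Model Deployment variable names are not listed in this document, so that rule is not checked. The 2048-byte limit is approximated by the character length of the JSON encoding, which need not match how the service measures size.

```hcl
# Hypothetical variable for server_port (the same rule applies to health_check_port).
variable "server_port" {
  type    = number
  default = 8080

  validation {
    # Allowed range 1024-65535, excluding the reserved ports 24224, 8446 and 8447.
    condition = (
      var.server_port >= 1024 &&
      var.server_port <= 65535 &&
      !contains([24224, 8446, 8447], var.server_port)
    )
    error_message = "server_port must be between 1024 and 65535 and must not be 24224, 8446 or 8447."
  }
}

# Hypothetical variable for environment_variables.
variable "environment_variables" {
  type    = map(string)
  default = {}

  validation {
    # Each key starts with a letter, uses only letters, digits and underscores,
    # is 2 to 31 characters long, and does not end with an underscore.
    condition = alltrue([
      for k in keys(var.environment_variables) :
      can(regex("^[A-Za-z][A-Za-z0-9_]{0,29}[A-Za-z0-9]$", k))
    ])
    error_message = "Each key must start with a letter, contain only letters, digits and underscores, be 2 to 31 characters long, and not end with an underscore."
  }

  validation {
    # Approximation of the 2048-byte total: counts characters of the JSON encoding.
    condition     = length(jsonencode(var.environment_variables)) < 2048
    error_message = "The environment variables must total less than 2048 bytes."
  }
}
```

Inside environment_configuration_details these would be referenced as server_port = var.server_port and environment_variables = var.environment_variables.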
Argument Reference
The following arguments are supported:
category_log_details - (Optional) (Updatable) The log details for each category.
  access - (Optional) (Updatable) The log details.
  predict - (Optional) (Updatable) The log details.
compartment_id - (Required) (Updatable) The OCID of the compartment where you want to create the model deployment.
defined_tags - (Optional) (Updatable) Defined tags for this resource. Each key is predefined and scoped to a namespace. See Resource Tags. Example: {"Operations.CostCenter": "42"}
description - (Optional) (Updatable) A short description of the model deployment.
display_name - (Optional) (Updatable) A user-friendly display name for the resource. It does not have to be unique and can be modified. Avoid entering confidential information. Example: My ModelDeployment
freeform_tags - (Optional) (Updatable) Free-form tags for this resource. Each tag is a simple key-value pair with no predefined name, type, or namespace. See Resource Tags. Example: {"Department": "Finance"}
model_deployment_configuration_details - (Required) (Updatable) The model deployment configuration details.
  deployment_type - (Required) (Updatable) The type of the model deployment.
  environment_configuration_details - (Optional) (Updatable) The configuration that carries the environment details used when creating the model deployment.
    cmd - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The container image run CMD as a list of strings. Use CMD as arguments to the ENTRYPOINT, or as the only command to run in the absence of an ENTRYPOINT. The combined size of CMD and ENTRYPOINT must be less than 2048 bytes.
    custom_http_endpoints - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) List of custom inference HTTP endpoints configured on the model deployment instance for inferencing.
      endpoint_uri_suffix - (Required when environment_configuration_type=OCIR_CONTAINER) (Updatable) The suffix part of the endpoint that is allowed for invocation.
      http_methods - (Required when environment_configuration_type=OCIR_CONTAINER) (Updatable) List of HTTP methods accepted by the URI.
    default_environment_variables - (Applicable when environment_configuration_type=OCIR_CONTAINER) Service-injected environment variables set for the web server container. They cannot be set or modified by the user.
    entrypoint - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The container image run ENTRYPOINT as a list of strings. It accepts the CMD as extra arguments. The combined size of CMD and ENTRYPOINT must be less than 2048 bytes. See the Docker documentation for how CMD and ENTRYPOINT interact.
    environment_configuration_type - (Required) (Updatable) The environment configuration type.
    environment_variables - (Optional) (Updatable) Environment variables to set for the web server container. Keys should be under 32 characters, contain only letters, digits, and underscores (_), start with a letter, have at least 2 characters, and not end with an underscore (e.g. TEST_). A key, if added, cannot be empty, and it cannot be a reserved Model Deployment environment variable. Values can be empty and have no individual size limit, but the environment variables as a whole must be less than 2048 bytes.
    health_check_port - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The port on which the container HEALTHCHECK listens. The port can be anything between 1024 and 65535. The following ports cannot be used: 24224, 8446, 8447.
    image - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The full path to the Oracle Cloud Infrastructure Registry (OCIR) registry, image, and tag in a canonical format. The container image is optional when using a service-managed open source foundation model. Acceptable formats: <region>.ocir.io/<registry>/<image>:<tag> or <region>.ocir.io/<registry>/<image>:<tag>@digest
    image_digest - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The digest of the container image. For example, sha256:881303a6b2738834d795a32b4a98eb0e5e3d1cad590a712d1e04f9b2fa90a030
    predict_api_specification - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The chosen specification from a predefined set of endpoints that a user can access. For example, if the value is 'openai', the user can access OpenAI-compliant endpoints such as /v1/completions, /v1/chat/completions, and /v1/models for inference.
    image_signature_id - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The OCID of the container image signature.
    server_port - (Applicable when environment_configuration_type=OCIR_CONTAINER) (Updatable) The port on which the web server serving the inference is running. The port can be anything between 1024 and 65535. The following ports cannot be used: 24224, 8446, 8447.
  model_configuration_details - (Required) (Updatable) The model configuration details.
    bandwidth_mbps - (Optional) (Updatable) The minimum network bandwidth for the model deployment.
    instance_configuration - (Required) (Updatable) The model deployment instance configuration.
      instance_shape_name - (Required) (Updatable) The shape used to launch the model deployment instances. When using a service-managed open source foundation model, the supported shapes can be retrieved using the get model API /models/{modelId}/definedMetadata/deploymentConfiguration/artifact/content.
      model_deployment_instance_shape_config_details - (Optional) (Updatable) Details for the model deployment instance shape configuration.
        cpu_baseline - (Optional) (Updatable) The baseline OCPU utilization for a subcore burstable VM instance. If this attribute is left blank, it defaults to BASELINE_1_1. The following values are supported:
          BASELINE_1_8 - baseline usage is 1⁄8 of an OCPU.
          BASELINE_1_2 - baseline usage is 1⁄2 of an OCPU.
          BASELINE_1_1 - baseline usage is an entire OCPU. This represents a non-burstable instance.
        memory_in_gbs - (Optional) (Updatable) For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, memory can be specified in the range of 6 to 1024 GB. The VM.Standard3.Flex memory range is 6 to 512 GB, and the VM.Optimized3.Flex memory range is 6 to 256 GB.
        ocpus - (Optional) (Updatable) For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, the OCPU count can be specified in the range of 1 to 64. The VM.Standard3.Flex OCPU range is 1 to 32, and the VM.Optimized3.Flex OCPU range is 1 to 18.
      network_access_type - (Optional) (Updatable) The network access type of the model deployment.
      private_endpoint_id - (Optional) (Updatable) The OCID of a Data Science private endpoint.
      subnet_id - (Optional) (Updatable) A model deployment instance is provided with a VNIC for network access. This specifies the OCID of the subnet to create the VNIC in. The subnet should be in a VCN with a NAT/SGW gateway for egress.
    maximum_bandwidth_mbps - (Optional) (Updatable) The maximum network bandwidth for the model deployment.
    model_id - (Required) (Updatable) The OCID of the model you want to deploy.
    scaling_policy - (Optional) (Updatable) The scaling policy to apply to each model of the deployment.
      auto_scaling_policies - (Required when policy_type=AUTOSCALING) (Updatable) The list of autoscaling policy details.
        auto_scaling_policy_type - (Required) (Updatable) The type of autoscaling policy.
        initial_instance_count - (Required) (Updatable) For a threshold-based autoscaling policy, this value is the initial number of instances to launch in the model deployment immediately after autoscaling is enabled. Any time this value is updated, the number of instances is reset to this value. After autoscaling retrieves performance metrics, the number of instances is automatically adjusted from this initial number to a number based on the limits that you set.
        maximum_instance_count - (Required) (Updatable) For a threshold-based autoscaling policy, this value is the maximum number of instances the model deployment is allowed to increase to (scale out).
        minimum_instance_count - (Required) (Updatable) For a threshold-based autoscaling policy, this value is the minimum number of instances the model deployment is allowed to decrease to (scale in).
        rules - (Required) (Updatable) The list of autoscaling policy rules.
          metric_expression_rule_type - (Required) (Updatable) The metric expression for creating the alarm used to trigger autoscaling actions on the model deployment. The following values are supported:
            PREDEFINED_EXPRESSION: An expression built using CPU or memory metrics emitted by Model Deployment Monitoring.
            CUSTOM_EXPRESSION: A custom Monitoring Query Language (MQL) expression.
          metric_type - (Required when metric_expression_rule_type=PREDEFINED_EXPRESSION) (Updatable) The metric type.
          scale_in_configuration - (Required) (Updatable) The scaling configuration for the predefined metric expression rule.
            instance_count_adjustment - (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The number of instances by which to adjust the instance count.
            pending_duration - (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The period of time that the condition defined in the alarm must persist before the alarm state changes from “OK” to “FIRING” or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to “FIRING”; likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to “OK.” The duration is specified as a string in ISO 8601 format (PT10M for ten minutes or PT1H for one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.
            query - (Required when metric_expression_rule_type=CUSTOM_EXPRESSION) (Updatable) The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). Examples of threshold alarms:
              CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 25
              CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 75
            scaling_configuration_type - (Required when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The type of scaling configuration.
            threshold - (Required when metric_expression_rule_type=PREDEFINED_EXPRESSION) (Updatable) The metric value at which the scaling operation is triggered.
scale_out_configuration- (Required) (Updatable) The scaling configuration for the predefined metric expression rule.instance_count_adjustment- (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The value is used for adjusting the count of instances by.pending_duration- (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The period of time that the condition defined in the alarm must persist before the alarm state changes from “OK” to “FIRING” or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to “FIRING”; likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to “OK.”The duration is specified as a string in ISO 8601 format (
PT10Mfor ten minutes orPT1Hfor one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.query- (Required when metric_expression_rule_type=CUSTOM_EXPRESSION) (Updatable) The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval:1m-60m(also1h). You can optionally specify dimensions and grouping functions. Supported grouping functions:grouping(),groupBy().Example of threshold alarm:
CPUUtilization[1m]{resourceId = “MODEL_DEPLOYMENT_OCID”}.grouping().mean() < 25 CPUUtilization[1m]{resourceId = “MODEL_DEPLOYMENT_OCID”}.grouping().mean() > 75
scaling_configuration_type- (Required when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The type of scaling configuration.threshold- (Required when metric_expression_rule_type=PREDEFINED_EXPRESSION) (Updatable) A metric value at which the scaling operation will be triggered.
cool_down_in_seconds- (Applicable when policy_type=AUTOSCALING) (Updatable) For threshold-based autoscaling policies, this value is the minimum period of time to wait between scaling actions. The cooldown period gives the system time to stabilize before rescaling. The minimum value is 300 seconds, which is also the default. The cooldown period starts when the model deployment becomes ACTIVE after the scaling operation.instance_count- (Required when policy_type=FIXED_SIZE) (Updatable) The number of instances for the model deployment.is_enabled- (Applicable when policy_type=AUTOSCALING) (Updatable) Whether the autoscaling policy is enabled.policy_type- (Required) (Updatable) The type of scaling policy.
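The following is a sketch of a threshold-based autoscaling policy that uses a predefined CPU expression, written as a `scaling_policy` block. Several enum values are assumptions, because this section does not list them: `THRESHOLD` for `auto_scaling_policy_type` and `scaling_configuration_type`, and `CPU_UTILIZATION` for `metric_type`. Verify them against the API reference before use. The thresholds 25 and 75 are illustrative and simply mirror the MQL examples above.

```hcl
# Fragment: nests inside the block that owns scaling_policy.
# Assumed enum values: "THRESHOLD" and "CPU_UTILIZATION".
scaling_policy {
  policy_type          = "AUTOSCALING"
  is_enabled           = true
  cool_down_in_seconds = 600 # must be at least 300, which is also the default

  auto_scaling_policies {
    auto_scaling_policy_type = "THRESHOLD"
    initial_instance_count   = 1
    minimum_instance_count   = 1
    maximum_instance_count   = 4

    rules {
      metric_expression_rule_type = "PREDEFINED_EXPRESSION"
      metric_type                 = "CPU_UTILIZATION"

      # Scale-in rule: threshold 25 mirrors the "< 25" MQL example.
      scale_in_configuration {
        scaling_configuration_type = "THRESHOLD"
        threshold                  = 25
        instance_count_adjustment  = 1
        pending_duration           = "PT5M" # within PT3M..PT1H
      }

      # Scale-out rule: threshold 75 mirrors the "> 75" MQL example.
      scale_out_configuration {
        scaling_configuration_type = "THRESHOLD"
        threshold                  = 75
        instance_count_adjustment  = 1
        pending_duration           = "PT5M"
      }
    }
  }
}
```

Changing `initial_instance_count` later resets the instance count to that value, so it is usually kept stable once the deployment is live.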
model_configuration_details - (Required when deployment_type=SINGLE_MODEL) (Updatable) The model configuration details.
bandwidth_mbps - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) The minimum network bandwidth for the model deployment.
instance_configuration - (Required when deployment_type=SINGLE_MODEL) (Updatable) The model deployment instance configuration.
instance_shape_name - (Required when deployment_type=SINGLE_MODEL) (Updatable) The shape used to launch the model deployment instances. When using a service-managed open source foundation model, the supported shapes can be retrieved with the get model API: /models/{modelId}/definedMetadata/deploymentConfiguration/artifact/content.
model_deployment_instance_shape_config_details - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) Details for the model deployment instance shape configuration.
cpu_baseline - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) The baseline OCPU utilization for a subcore burstable VM instance. If this attribute is left blank, it defaults to BASELINE_1_1. The following values are supported:
BASELINE_1_8 - baseline usage is 1⁄8 of an OCPU.
BASELINE_1_2 - baseline usage is 1⁄2 of an OCPU.
BASELINE_1_1 - baseline usage is an entire OCPU. This represents a non-burstable instance.
memory_in_gbs - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, memory can be specified within the range of 6 to 1024 GB. The VM.Standard3.Flex memory range is 6 to 512 GB, and the VM.Optimized3.Flex memory range is 6 to 256 GB.
ocpus - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, the OCPU count can be specified within the range of 1 to 64. The VM.Standard3.Flex OCPU range is 1 to 32, and the VM.Optimized3.Flex OCPU range is 1 to 18.
network_access_type - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) The network access type of the model deployment.
private_endpoint_id - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) The OCID of a Data Science private endpoint.
subnet_id - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) A model deployment instance is provided with a VNIC for network access. This specifies the OCID of the subnet to create a VNIC in. The subnet should be in a VCN with a NAT/SGW gateway for egress.
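The following is a minimal sketch of a `SINGLE_MODEL` deployment on a flex shape. `var.compartment_id` comes from the example usage. `var.project_id`, `var.model_id` and `var.subnet_id` are hypothetical input variables holding OCIDs. The shape and sizes are illustrative values within the ranges listed above. A fixed-size scaling policy keeps the sketch short, and optional blocks such as `environment_configuration_details` are left out.

```hcl
resource "oci_datascience_model_deployment" "single_model_example" {
  compartment_id = var.compartment_id
  project_id     = var.project_id # hypothetical variable

  model_deployment_configuration_details {
    deployment_type = "SINGLE_MODEL"

    model_configuration_details {
      model_id = var.model_id # hypothetical variable holding the model OCID

      instance_configuration {
        instance_shape_name = "VM.Standard.E4.Flex"
        subnet_id           = var.subnet_id # hypothetical; VCN needs NAT/SGW egress

        model_deployment_instance_shape_config_details {
          ocpus         = 2  # E4.Flex allows 1 to 64 OCPUs
          memory_in_gbs = 32 # E4.Flex allows 6 to 1024 GB
        }
      }

      scaling_policy {
        policy_type    = "FIXED_SIZE"
        instance_count = 1
      }
    }
  }
}
```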
maximum_bandwidth_mbps - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) The maximum network bandwidth for the model deployment.
model_id - (Required when deployment_type=SINGLE_MODEL) (Updatable) The OCID of the model you want to deploy.
scaling_policy - (Applicable when deployment_type=SINGLE_MODEL) (Updatable) The scaling policy to apply to each model of the deployment.
auto_scaling_policies - (Required when policy_type=AUTOSCALING) (Updatable) The list of autoscaling policy details.
auto_scaling_policy_type - (Required) (Updatable) The type of autoscaling policy.
initial_instance_count - (Required) (Updatable) For a threshold-based autoscaling policy, the initial number of instances to launch in the model deployment immediately after autoscaling is enabled. Whenever this value is updated, the number of instances is reset to it. After autoscaling retrieves performance metrics, the number of instances is adjusted automatically from this initial number to a number based on the limits that you set.
maximum_instance_count - (Required) (Updatable) For a threshold-based autoscaling policy, the maximum number of instances the model deployment is allowed to scale out to.
minimum_instance_count - (Required) (Updatable) For a threshold-based autoscaling policy, the minimum number of instances the model deployment is allowed to scale in to.
rules - (Required) (Updatable) The list of autoscaling policy rules.
metric_expression_rule_type - (Required) (Updatable) The metric expression for creating the alarm used to trigger autoscaling actions on the model deployment. The following values are supported:
PREDEFINED_EXPRESSION: An expression built using CPU or memory metrics emitted by Model Deployment Monitoring.
CUSTOM_EXPRESSION: A custom Monitoring Query Language (MQL) expression.
metric_type - (Required when metric_expression_rule_type=PREDEFINED_EXPRESSION) (Updatable) The metric type.
scale_in_configuration - (Required) (Updatable) The scaling configuration for the predefined metric expression rule.
instance_count_adjustment - (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The number of instances by which to adjust the instance count.
pending_duration - (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The period of time that the condition defined in the alarm must persist before the alarm state changes from "OK" to "FIRING" or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to "FIRING". Likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to "OK". The duration is specified as a string in ISO 8601 format (PT10M for ten minutes or PT1H for one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.
query - (Required when metric_expression_rule_type=CUSTOM_EXPRESSION) (Updatable) The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). Examples of threshold alarms:
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 25
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 75
scaling_configuration_type - (Required when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The type of scaling configuration.
threshold - (Required when metric_expression_rule_type=PREDEFINED_EXPRESSION) (Updatable) The metric value at which the scaling operation is triggered.
scale_out_configuration - (Required) (Updatable) The scaling configuration for the predefined metric expression rule.
instance_count_adjustment - (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The number of instances by which to adjust the instance count.
pending_duration - (Applicable when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The period of time that the condition defined in the alarm must persist before the alarm state changes from "OK" to "FIRING" or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to "FIRING". Likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to "OK". The duration is specified as a string in ISO 8601 format (PT10M for ten minutes or PT1H for one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.
query - (Required when metric_expression_rule_type=CUSTOM_EXPRESSION) (Updatable) The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). Examples of threshold alarms:
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 25
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 75
scaling_configuration_type - (Required when metric_expression_rule_type=CUSTOM_EXPRESSION | PREDEFINED_EXPRESSION) (Updatable) The type of scaling configuration.
threshold - (Required when metric_expression_rule_type=PREDEFINED_EXPRESSION) (Updatable) The metric value at which the scaling operation is triggered.
cool_down_in_seconds - (Applicable when policy_type=AUTOSCALING) (Updatable) For threshold-based autoscaling policies, the minimum period of time to wait between scaling actions. The cooldown period gives the system time to stabilize before rescaling. The minimum value is 300 seconds, which is also the default. The cooldown period starts when the model deployment becomes ACTIVE after the scaling operation.
instance_count - (Required when policy_type=FIXED_SIZE) (Updatable) The number of instances for the model deployment.
is_enabled - (Applicable when policy_type=AUTOSCALING) (Updatable) Whether the autoscaling policy is enabled.
policy_type - (Required) (Updatable) The type of scaling policy.
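For a `CUSTOM_EXPRESSION` rule, `query` carries an MQL expression in place of `metric_type` and `threshold`. The sketch below reuses the two threshold-alarm examples from the argument list. Two things are assumptions. First, the `scaling_configuration_type` value `QUERY` does not appear in this section. Second, `MODEL_DEPLOYMENT_OCID` is kept as the literal placeholder from those examples, because a resource cannot reference its own OCID while it is being created.

```hcl
# Fragment: a rules block inside auto_scaling_policies.
rules {
  metric_expression_rule_type = "CUSTOM_EXPRESSION"

  scale_in_configuration {
    scaling_configuration_type = "QUERY" # assumed enum value
    instance_count_adjustment  = 1
    pending_duration           = "PT5M"
    # MODEL_DEPLOYMENT_OCID is a placeholder; substitute a real OCID.
    query = "CPUUtilization[1m]{resourceId = \"MODEL_DEPLOYMENT_OCID\"}.grouping().mean() < 25"
  }

  scale_out_configuration {
    scaling_configuration_type = "QUERY" # assumed enum value
    instance_count_adjustment  = 1
    pending_duration           = "PT5M"
    query = "CPUUtilization[1m]{resourceId = \"MODEL_DEPLOYMENT_OCID\"}.grouping().mean() > 75"
  }
}
```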
model_group_configuration_details - (Required when deployment_type=MODEL_GROUP) (Updatable) The model group configuration details.
model_group_id - (Required when deployment_type=MODEL_GROUP) (Updatable) The OCID of the model group you want to deploy.
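A model group is deployed by switching `deployment_type` to `MODEL_GROUP` and supplying `model_group_configuration_details` instead of `model_configuration_details`. The sketch below shows only that part. `var.model_group_id` is a hypothetical variable. Any other blocks this deployment type requires, such as infrastructure details, are not covered by this section and are omitted.

```hcl
# Fragment: replaces the single-model configuration block.
model_deployment_configuration_details {
  deployment_type = "MODEL_GROUP"

  model_group_configuration_details {
    model_group_id = var.model_group_id # hypothetical variable holding the model group OCID
  }

  # Other blocks this deployment type may need are omitted from this sketch.
}
```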
opc_parent_rpt_url - (Optional) URL to fetch the Resource Principal Token from the parent resource.
project_id - (Required) The OCID of the project to associate with the model deployment.
state - (Optional) (Updatable) The target state for the model deployment. Can be set to ACTIVE or INACTIVE.
** IMPORTANT ** Any change to a property that does not support update will force the destruction and recreation of the resource with the new property values.
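Because `state` is updatable, a deployment can be deactivated and reactivated in place rather than destroyed and recreated. One way to guard the value is an input variable with a validation block. The variable name `model_deployment_state` is hypothetical.

```hcl
variable "model_deployment_state" {
  description = "Target state for the model deployment."
  type        = string
  default     = "ACTIVE"

  validation {
    condition     = contains(["ACTIVE", "INACTIVE"], var.model_deployment_state)
    error_message = "The model deployment state must be ACTIVE or INACTIVE."
  }
}

# Inside the oci_datascience_model_deployment resource:
#   state = var.model_deployment_state
```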
Attributes Reference
The following attributes are exported:
category_log_details - The log details for each category.
compartment_id - The OCID of the model deployment's compartment.
created_by - The OCID of the user who created the model deployment.
defined_tags - Defined tags for this resource. Each key is predefined and scoped to a namespace. See Resource Tags. Example: {"Operations.CostCenter": "42"}
description - A short description of the model deployment.
display_name - A user-friendly display name for the resource. It does not have to be unique and can be modified. Avoid entering confidential information. Example: My ModelDeployment
freeform_tags - Free-form tags for this resource. Each tag is a simple key-value pair with no predefined name, type, or namespace. See Resource Tags. Example: {"Department": "Finance"}
id - The OCID of the model deployment.
lifecycle_details - Details about the state of the model deployment.
model_deployment_configuration_details - The model deployment configuration details.
deployment_type - The type of the model deployment.
environment_configuration_details - The configuration that carries the environment details used when creating the model deployment.
cmd - The container image run CMD as a list of strings. Use CMD as arguments to the ENTRYPOINT, or as the only command to run in the absence of an ENTRYPOINT. The combined size of CMD and ENTRYPOINT must be less than 2048 bytes.
custom_http_endpoints - List of custom inference HTTP endpoints configured on the model deployment instance for inferencing.
endpoint_uri_suffix - The suffix part of the endpoint that will be allowed for invocation.
http_methods - List of HTTP methods accepted by the URI.
default_environment_variables - Service-injected environment variables set for the web server container. These cannot be set or modified by the user.
entrypoint - The container image run ENTRYPOINT as a list of strings. Accepts the CMD as extra arguments. The combined size of CMD and ENTRYPOINT must be less than 2048 bytes. For details on how CMD and ENTRYPOINT interact, see the Dockerfile reference.
environment_configuration_type - The environment configuration type.
environment_variables - Environment variables to set for the web server container. The size of envVars must be less than 2048 bytes. Keys must be fewer than 32 characters, contain only letters, digits, and underscores (_), start with a letter, be at least 2 characters long, and not end with an underscore (for example, TEST_ is not allowed). A key, if added, cannot be empty, but a value can be empty. There is no size limit on individual values, but the environment variables as a whole are limited to 2048 bytes. Keys cannot be reserved Model Deployment environment variables.
health_check_port - The port on which the container HEALTHCHECK listens. The port can be anything between 1024 and 65535. The following ports cannot be used: 24224, 8446, 8447.
image - The full path to the Oracle Cloud Infrastructure Registry (OCIR) registry, image, and tag in a canonical format. The container image is optional when using a service-managed open source foundation model. Acceptable formats:
<region>.ocir.io/<registry>/<image>:<tag>
<region>.ocir.io/<registry>/<image>:<tag>@digest
image_digest - The digest of the container image. For example, sha256:881303a6b2738834d795a32b4a98eb0e5e3d1cad590a712d1e04f9b2fa90a030
predict_api_specification - The chosen specification from a predefined set of endpoints that a user can access. For example, if the value is 'openai', the user can access OpenAI-compliant endpoints such as /v1/completions, /v1/chat/completions, and /v1/models for inference.
server_port - The port on which the web server serving the inference is running. The port can be anything between 1024 and 65535. The following ports cannot be used: 24224, 8446, 8447.
image_signature_id - The OCID of the container image signature.
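The key rules for `environment_variables` can be checked before `terraform apply` with an input variable validation. The sketch below uses a hypothetical variable named `environment_variables`. The regex encodes the listed key rules: 2 to 31 characters, a leading letter, only letters, digits and underscores, and no trailing underscore. The 2048-byte limit is only approximated, because Terraform's `length()` counts characters, not bytes. Reserved Model Deployment variable names are not checked.

```hcl
variable "environment_variables" {
  description = "Environment variables for the web server container."
  type        = map(string)
  default     = {}

  # Key rules: 2-31 chars, leading letter, [A-Za-z0-9_] only, no trailing underscore.
  validation {
    condition = alltrue([
      for k in keys(var.environment_variables) :
      can(regex("^[A-Za-z][A-Za-z0-9_]{0,29}[A-Za-z0-9]$", k))
    ])
    error_message = "Each key must be 2 to 31 characters, start with a letter, use only letters, digits, or underscores, and not end with an underscore."
  }

  # Approximate size check: counts characters of keys plus values, not bytes.
  validation {
    condition     = length(join("", [for k, v in var.environment_variables : "${k}${v}"])) < 2048
    error_message = "The combined size of the environment variables must be less than 2048 bytes."
  }
}
```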
infrastructure_configuration_details - The infrastructure configuration details.
bandwidth_mbps - The minimum network bandwidth for the model deployment.
infrastructure_type - The type of the model deployment infrastructure.
instance_configuration - The model deployment instance configuration.
instance_shape_name - The shape used to launch the model deployment instances. When using a service-managed open source foundation model, the supported shapes can be retrieved with the get model API: /models/{modelId}/definedMetadata/deploymentConfiguration/artifact/content.
model_deployment_instance_shape_config_details - Details for the model deployment instance shape configuration.
cpu_baseline - The baseline OCPU utilization for a subcore burstable VM instance. If this attribute is left blank, it defaults to BASELINE_1_1. The following values are supported:
BASELINE_1_8 - baseline usage is 1⁄8 of an OCPU.
BASELINE_1_2 - baseline usage is 1⁄2 of an OCPU.
BASELINE_1_1 - baseline usage is an entire OCPU. This represents a non-burstable instance.
memory_in_gbs - For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, memory can be specified within the range of 6 to 1024 GB. The VM.Standard3.Flex memory range is 6 to 512 GB, and the VM.Optimized3.Flex memory range is 6 to 256 GB.
ocpus - For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, the OCPU count can be specified within the range of 1 to 64. The VM.Standard3.Flex OCPU range is 1 to 32, and the VM.Optimized3.Flex OCPU range is 1 to 18.
network_access_type - The network access type of the model deployment.
private_endpoint_id - The OCID of a Data Science private endpoint.
subnet_id - A model deployment instance is provided with a VNIC for network access. This specifies the OCID of the subnet to create a VNIC in. The subnet should be in a VCN with a NAT/SGW gateway for egress.
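The per-shape OCPU and memory ranges above can be encoded in a validation so that an out-of-range size fails at plan time. The sketch below uses a hypothetical object variable `flex_shape_config`. It knows only the four shapes listed here, and any other shape fails validation. It does not model any per-OCPU memory ratio the service may also enforce.

```hcl
variable "flex_shape_config" {
  description = "Flex shape name with its OCPU and memory sizing."
  type = object({
    shape         = string
    ocpus         = number
    memory_in_gbs = number
  })
  default = {
    shape         = "VM.Standard.E4.Flex"
    ocpus         = 2
    memory_in_gbs = 32
  }

  # OCPU range is 1 to a per-shape maximum; unknown shapes map to 0 and fail.
  validation {
    condition = var.flex_shape_config.ocpus >= 1 && var.flex_shape_config.ocpus <= lookup({
      "VM.Standard.E3.Flex" = 64
      "VM.Standard.E4.Flex" = 64
      "VM.Standard3.Flex"   = 32
      "VM.Optimized3.Flex"  = 18
    }, var.flex_shape_config.shape, 0)
    error_message = "The OCPU count is outside the range allowed for this shape."
  }

  # Memory range is 6 GB to a per-shape maximum; unknown shapes map to 0 and fail.
  validation {
    condition = var.flex_shape_config.memory_in_gbs >= 6 && var.flex_shape_config.memory_in_gbs <= lookup({
      "VM.Standard.E3.Flex" = 1024
      "VM.Standard.E4.Flex" = 1024
      "VM.Standard3.Flex"   = 512
      "VM.Optimized3.Flex"  = 256
    }, var.flex_shape_config.shape, 0)
    error_message = "The memory size is outside the range allowed for this shape."
  }
}
```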
maximum_bandwidth_mbps - The maximum network bandwidth for the model deployment.
scaling_policy - The scaling policy to apply to each model of the deployment.
auto_scaling_policies - The list of autoscaling policy details.
auto_scaling_policy_type - The type of autoscaling policy.
initial_instance_count - For a threshold-based autoscaling policy, the initial number of instances to launch in the model deployment immediately after autoscaling is enabled. Whenever this value is updated, the number of instances is reset to it. After autoscaling retrieves performance metrics, the number of instances is adjusted automatically from this initial number to a number based on the limits that you set.
maximum_instance_count - For a threshold-based autoscaling policy, the maximum number of instances the model deployment is allowed to scale out to.
minimum_instance_count - For a threshold-based autoscaling policy, the minimum number of instances the model deployment is allowed to scale in to.
rules - The list of autoscaling policy rules.
metric_expression_rule_type - The metric expression for creating the alarm used to trigger autoscaling actions on the model deployment. The following values are supported:
PREDEFINED_EXPRESSION: An expression built using CPU or memory metrics emitted by Model Deployment Monitoring.
CUSTOM_EXPRESSION: A custom Monitoring Query Language (MQL) expression.
metric_type - The metric type.
scale_in_configuration - The scaling configuration for the predefined metric expression rule.
instance_count_adjustment - The number of instances by which to adjust the instance count.
pending_duration - The period of time that the condition defined in the alarm must persist before the alarm state changes from "OK" to "FIRING" or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to "FIRING". Likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to "OK". The duration is specified as a string in ISO 8601 format (PT10M for ten minutes or PT1H for one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.
query - The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). Examples of threshold alarms:
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 25
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 75
scaling_configuration_type - The type of scaling configuration.
threshold - The metric value at which the scaling operation is triggered.
scale_out_configuration - The scaling configuration for the predefined metric expression rule.
instance_count_adjustment - The number of instances by which to adjust the instance count.
pending_duration - The period of time that the condition defined in the alarm must persist before the alarm state changes from "OK" to "FIRING" or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to "FIRING". Likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to "OK". The duration is specified as a string in ISO 8601 format (PT10M for ten minutes or PT1H for one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.
query - The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). Examples of threshold alarms:
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 25
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 75
scaling_configuration_type - The type of scaling configuration.
threshold - The metric value at which the scaling operation is triggered.
cool_down_in_seconds - For threshold-based autoscaling policies, the minimum period of time to wait between scaling actions. The cooldown period gives the system time to stabilize before rescaling. The minimum value is 300 seconds, which is also the default. The cooldown period starts when the model deployment becomes ACTIVE after the scaling operation.
instance_count - The number of instances for the model deployment.
is_enabled - Whether the autoscaling policy is enabled.
policy_type - The type of scaling policy.
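The `pending_duration` bounds (PT3M to PT1H) can be enforced with a regex validation. The sketch below uses a hypothetical variable named `pending_duration`. It deliberately accepts only whole-minute values from PT3M to PT59M plus PT1H. Other valid ISO 8601 spellings, such as seconds-based durations, would be rejected, which is a simplification.

```hcl
variable "pending_duration" {
  description = "ISO 8601 duration an alarm condition must persist (PT3M to PT1H)."
  type        = string
  default     = "PT3M"

  # Accepts PT3M..PT59M (no leading zeros) and PT1H.
  validation {
    condition     = can(regex("^PT(([3-9]|[1-5][0-9])M|1H)$", var.pending_duration))
    error_message = "The pending duration must be between PT3M and PT1H, written as PT<minutes>M or PT1H."
  }
}
```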
server_port - The port on which the web server serving the inference is running. The port can be anything between 1024 and 65535. The following ports cannot be used: 24224, 8446, 8447.
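The port rule for `server_port` (and the identical rule for `health_check_port`) can be expressed as a single validation condition. The variable name is hypothetical, and the default of 8080 is only an illustrative value, not a documented service default.

```hcl
variable "server_port" {
  description = "Port of the inference web server."
  type        = number
  default     = 8080

  # Integer in 1024..65535, excluding the reserved ports 24224, 8446 and 8447.
  validation {
    condition = (
      var.server_port >= 1024 &&
      var.server_port <= 65535 &&
      floor(var.server_port) == var.server_port &&
      !contains([24224, 8446, 8447], var.server_port)
    )
    error_message = "The port must be an integer from 1024 to 65535, excluding 24224, 8446 and 8447."
  }
}
```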
model_configuration_details - The model configuration details.
bandwidth_mbps - The minimum network bandwidth for the model deployment.
instance_configuration - The model deployment instance configuration.
instance_shape_name - The shape used to launch the model deployment instances. When using a service-managed open source foundation model, the supported shapes can be retrieved with the get model API: /models/{modelId}/definedMetadata/deploymentConfiguration/artifact/content.
model_deployment_instance_shape_config_details - Details for the model deployment instance shape configuration.
cpu_baseline - The baseline OCPU utilization for a subcore burstable VM instance. If this attribute is left blank, it defaults to BASELINE_1_1. The following values are supported:
BASELINE_1_8 - baseline usage is 1⁄8 of an OCPU.
BASELINE_1_2 - baseline usage is 1⁄2 of an OCPU.
BASELINE_1_1 - baseline usage is an entire OCPU. This represents a non-burstable instance.
memory_in_gbs - For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, memory can be specified within the range of 6 to 1024 GB. The VM.Standard3.Flex memory range is 6 to 512 GB, and the VM.Optimized3.Flex memory range is 6 to 256 GB.
ocpus - For a model deployment instance of type VM.Standard.E3.Flex or VM.Standard.E4.Flex, the OCPU count can be specified within the range of 1 to 64. The VM.Standard3.Flex OCPU range is 1 to 32, and the VM.Optimized3.Flex OCPU range is 1 to 18.
network_access_type - The network access type of the model deployment.
private_endpoint_id - The OCID of a Data Science private endpoint.
subnet_id - A model deployment instance is provided with a VNIC for network access. This specifies the OCID of the subnet to create a VNIC in. The subnet should be in a VCN with a NAT/SGW gateway for egress.
maximum_bandwidth_mbps- The maximum network bandwidth for the model deployment.model_id- The OCID of the model you want to deploy.scaling_policy- The scaling policy to apply to each model of the deployment.auto_scaling_policies- The list of autoscaling policy details.auto_scaling_policy_type- The type of autoscaling policy.initial_instance_count- For a threshold-based autoscaling policy, this value is the initial number of instances to launch in the model deployment immediately after autoscaling is enabled. Note that anytime this value is updated, the number of instances will be reset to this value. After autoscaling retrieves performance metrics, the number of instances is automatically adjusted from this initial number to a number that is based on the limits that you set.maximum_instance_count- For a threshold-based autoscaling policy, this value is the maximum number of instances the model deployment is allowed to increase to (scale out).minimum_instance_count- For a threshold-based autoscaling policy, this value is the minimum number of instances the model deployment is allowed to decrease to (scale in).rules- The list of autoscaling policy rules.metric_expression_rule_type- The metric expression for creating the alarm used to trigger autoscaling actions on the model deployment.The following values are supported:
PREDEFINED_EXPRESSION: An expression built using CPU or Memory metrics emitted by the Model Deployment Monitoring.CUSTOM_EXPRESSION: A custom Monitoring Query Language (MQL) expression.
metric_type- Metric typescale_in_configuration- The scaling configuration for the predefined metric expression rule.instance_count_adjustment- The value is used for adjusting the count of instances by.pending_duration- The period of time that the condition defined in the alarm must persist before the alarm state changes from “OK” to “FIRING” or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to “FIRING”; likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to “OK.”The duration is specified as a string in ISO 8601 format (
PT10Mfor ten minutes orPT1Hfor one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.query- The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval:1m-60m(also1h). You can optionally specify dimensions and grouping functions. Supported grouping functions:grouping(),groupBy().Example of threshold alarm:
CPUUtilization[1m]{resourceId = “MODEL_DEPLOYMENT_OCID”}.grouping().mean() < 25 CPUUtilization[1m]{resourceId = “MODEL_DEPLOYMENT_OCID”}.grouping().mean() > 75
scaling_configuration_type- The type of scaling configuration.threshold- A metric value at which the scaling operation will be triggered.
scale_out_configuration - The scaling configuration for the predefined metric expression rule.
instance_count_adjustment - The number of instances by which to adjust the instance count.
pending_duration - The period of time that the condition defined in the alarm must persist before the alarm state changes from "OK" to "FIRING" or vice versa. For example, a value of 5 minutes means that the alarm must persist in breaching the condition for five minutes before the alarm updates its state to "FIRING"; likewise, the alarm must persist in not breaching the condition for five minutes before the alarm updates its state to "OK". The duration is specified as a string in ISO 8601 format (for example, PT10M for ten minutes or PT1H for one hour). Minimum: PT3M. Maximum: PT1H. Default: PT3M.
query - The Monitoring Query Language (MQL) expression to evaluate for the alarm. The Alarms feature of the Monitoring service interprets results for each returned time series as Boolean values, where zero represents false and a non-zero value represents true. A true value means that the trigger rule condition has been met. The query must specify a metric, statistic, interval, and trigger rule (threshold or absence). Supported values for interval: 1m-60m (also 1h). You can optionally specify dimensions and grouping functions. Supported grouping functions: grouping(), groupBy(). Examples of threshold alarms:
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() < 25
CPUUtilization[1m]{resourceId = "MODEL_DEPLOYMENT_OCID"}.grouping().mean() > 75
scaling_configuration_type - The type of scaling configuration.
threshold - A metric value at which the scaling operation is triggered.
cool_down_in_seconds - For threshold-based autoscaling policies, the minimum period of time to wait between scaling actions. The cooldown period gives the system time to stabilize before rescaling. The minimum value is 300 seconds, which is also the default. The cooldown period starts when the model deployment becomes ACTIVE after the scaling operation.
instance_count - The number of instances for the model deployment.
is_enabled - Whether the autoscaling policy is enabled.
policy_type - The type of scaling policy.
model_deployment_system_data - Model deployment system data.
current_instance_count - The current number of model deployment instances.
model_type - The type of the deployed model.
system_infra_type - The infrastructure type of the model deployment.
model_deployment_url - The URL to interact with the model deployment.
project_id - The OCID of the project associated with the model deployment.
state - The state of the model deployment.
time_created - The date and time the resource was created, in the timestamp format defined by RFC3339. Example: 2019-08-25T21:10:29.41Z
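The scale-in and scale-out attributes above are easier to follow in a configuration. The following is a minimal sketch of a threshold-based autoscaling policy for a SINGLE_MODEL deployment that reuses the two MQL examples from the query attribute. The nested block names mirror the exported attributes; the enum values CUSTOM_EXPRESSION and QUERY for the default deployment type, the variable names, and the chosen counts and durations are assumptions for illustration, so check them against the argument reference for your provider version.

```hcl
# Sketch only: enum values and variable names are assumptions;
# confirm them against the provider's argument reference.
resource "oci_datascience_model_deployment" "autoscaled_model_deployment" {
  compartment_id = var.compartment_id
  project_id     = var.project_id

  model_deployment_configuration_details {
    deployment_type = "SINGLE_MODEL"

    model_configuration_details {
      model_id = var.model_id

      instance_configuration {
        instance_shape_name = var.instance_shape_name
      }

      scaling_policy {
        policy_type          = "AUTOSCALING"
        is_enabled           = true
        cool_down_in_seconds = 600 # minimum and default are 300 seconds

        auto_scaling_policies {
          auto_scaling_policy_type = "THRESHOLD"
          initial_instance_count   = 1
          minimum_instance_count   = 1
          maximum_instance_count   = 4

          rules {
            metric_expression_rule_type = "CUSTOM_EXPRESSION"

            # Remove one instance once mean CPU has stayed below 25 for 5 minutes.
            scale_in_configuration {
              scaling_configuration_type = "QUERY"
              query                      = "CPUUtilization[1m]{resourceId = \"MODEL_DEPLOYMENT_OCID\"}.grouping().mean() < 25"
              instance_count_adjustment  = 1
              pending_duration           = "PT5M" # between PT3M and PT1H
            }

            # Add one instance once mean CPU has stayed above 75 for 5 minutes.
            scale_out_configuration {
              scaling_configuration_type = "QUERY"
              query                      = "CPUUtilization[1m]{resourceId = \"MODEL_DEPLOYMENT_OCID\"}.grouping().mean() > 75"
              instance_count_adjustment  = 1
              pending_duration           = "PT5M"
            }
          }
        }
      }
    }
  }
}
```

The MODEL_DEPLOYMENT_OCID token is kept exactly as in the query examples above. This page does not say whether the service substitutes it or expects the literal deployment OCID, so confirm that in the Data Science documentation before applying.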
SINGLE_MODEL_FLEX Exported Attributes
The following attributes are exported for deployment_type = SINGLE_MODEL_FLEX.
model_deployment_configuration_details
deployment_type - SINGLE_MODEL_FLEX
model_configuration_details - The single model configuration details.
model_id - The OCID of a model.
infrastructure_configuration_details - The infrastructure configuration details for a managed compute cluster type compute target.
infrastructure_type - MANAGED_COMPUTE_CLUSTER
compute_target_id - The OCID of a compute target.
model_deployment_resource_configuration - Resource configuration details for deploying the model on a managed compute cluster type compute target.
resource_request_configuration - Resource request configuration for running the workload on a managed compute cluster type compute target.
gpus - The total number of GPUs to allocate to the workload.
memory_in_gbs - The memory, in GB, to allocate for running the workload.
ocpus - The number of OCPUs to allocate for running the workload.
resource_limit_configuration - Resource limit configuration details for the workload on a managed compute cluster type compute target.
memory_in_gbs - Burstable limit for memory.
ocpus - Burstable limit for CPU.
scaling_policy - The scaling policy to apply to workloads on a managed compute cluster type compute target.
policy_type - The type of scaling policy. Supported values are FIXED_SIZE and AUTOSCALING.
instance_count - Returned for policy_type = FIXED_SIZE. The number of instances for the workload.
is_enabled - Returned for policy_type = AUTOSCALING. Whether the autoscaling policy is enabled.
auto_scaling_policies - Returned for policy_type = AUTOSCALING. The list of autoscaling policy details.
auto_scaling_policy_type - The type of autoscaling policy. The supported value is THRESHOLD.
initial_instance_count - For a threshold-based autoscaling policy, the initial number of workload instances to launch immediately after autoscaling is enabled.
maximum_instance_count - For a threshold-based autoscaling policy, the maximum number of workload instances the policy can scale out to.
minimum_instance_count - For a threshold-based autoscaling policy, the minimum number of workload instances the policy can scale in to. Set this to zero to allow scale-to-zero.
rules - The list of autoscaling policy rules.
metric_expression_rule_type - The metric expression used to create the alarm that triggers autoscaling actions for the workload. Supported values are TARGET_PREDEFINED_EXPRESSION and TARGET_CUSTOM_EXPRESSION.
metric_type - Returned for TARGET_PREDEFINED_EXPRESSION. The metric type.
scale_configuration - The scaling configuration for the target-based workload rule.
target_scaling_configuration_type - The type of target scaling configuration. Supported values are THRESHOLD and QUERY.
threshold - A metric value at which the scaling operation is triggered.
query - Returned for TARGET_CUSTOM_EXPRESSION. The Monitoring Query Language (MQL) expression to evaluate for the alarm.
metric_namespace - Returned for TARGET_CUSTOM_EXPRESSION. The namespace to read the metrics from.
scale_in_policy - Workload scaling policy configuration for workloads on a managed compute cluster type compute target.
cool_down_in_seconds - The length of the time window used to keep the instance count from flapping while the scaling metrics fluctuate.
instance_count_adjustment - The number of instances by which to adjust the instance count.
pending_duration - The period of time that the alarm condition must persist before the alarm state changes.
scale_out_policy - Workload scaling policy configuration for workloads on a managed compute cluster type compute target.
cool_down_in_seconds - The length of the time window used to keep the instance count from flapping while the scaling metrics fluctuate.
instance_count_adjustment - The number of instances by which to adjust the instance count.
pending_duration - The period of time that the alarm condition must persist before the alarm state changes.
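A minimal, hypothetical sketch of a SINGLE_MODEL_FLEX deployment on a managed compute cluster with a fixed instance count follows. It is assembled only from the exported attribute names listed above. The exact nesting (in particular, placing infrastructure_configuration_details directly under model_deployment_configuration_details and scaling_policy under infrastructure_configuration_details), whether each block is accepted as an input argument, the variable names, and the resource sizes are all assumptions, so verify them against the argument reference before use.

```hcl
# Hypothetical sketch: nesting and argument support are inferred from the
# exported attributes; verify against the provider's argument reference.
resource "oci_datascience_model_deployment" "flex_model_deployment" {
  compartment_id = var.compartment_id
  project_id     = var.project_id

  model_deployment_configuration_details {
    deployment_type = "SINGLE_MODEL_FLEX"

    model_configuration_details {
      model_id = var.model_id
    }

    infrastructure_configuration_details {
      infrastructure_type = "MANAGED_COMPUTE_CLUSTER"
      compute_target_id   = var.compute_target_id

      model_deployment_resource_configuration {
        # Resources required to run each workload instance.
        resource_request_configuration {
          ocpus         = 2
          memory_in_gbs = 16
          gpus          = 0
        }

        # Burstable limits; assumed to be no lower than the requests above.
        resource_limit_configuration {
          ocpus         = 4
          memory_in_gbs = 32
        }
      }

      # FIXED_SIZE keeps a constant instance count (no autoscaling).
      scaling_policy {
        policy_type    = "FIXED_SIZE"
        instance_count = 1
      }
    }
  }
}
```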
Timeouts
The timeouts block allows you to specify timeouts for certain operations:
* create - (Defaults to 20 minutes), when creating the Model Deployment
* update - (Defaults to 20 minutes), when updating the Model Deployment
* delete - (Defaults to 20 minutes), when destroying the Model Deployment
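As a sketch, these defaults can be overridden with a timeouts block inside the resource. The durations below are arbitrary illustrative values, and the comment line stands in for the required arguments shown in Example Usage.

```hcl
resource "oci_datascience_model_deployment" "test_model_deployment" {
  # ... required arguments from Example Usage ...

  # Override the 20-minute defaults with duration strings such as "45m" or "1h".
  timeouts {
    create = "45m"
    update = "45m"
    delete = "30m"
  }
}
```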
Import
Model deployments can be imported using the id, for example:
$ terraform import oci_datascience_model_deployment.test_model_deployment "id"
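On Terraform 1.5 or later, the same import can also be declared in configuration with an import block, which terraform plan and terraform apply then process. A minimal sketch; the id string is a placeholder for the model deployment's id, which for this resource is its OCID.

```hcl
# Requires Terraform 1.5 or later.
import {
  to = oci_datascience_model_deployment.test_model_deployment
  id = "<model_deployment_ocid>" # placeholder: replace with the real OCID
}
```

If no matching resource block exists yet, running terraform plan -generate-config-out=generated.tf can draft one from the imported state for review.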