Configuring Alert Policies with Terraform

Last updated: 2024-05-29 17:15:22

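Prerequisites

Installing Terraform

Tencent Cloud Cloud Shell is a free operations tool that comes with the Terraform components preinstalled and temporary Tencent Cloud credentials already configured.
If you do not use Cloud Shell, see Installing and Configuring Terraform Locally for installation instructions.
Note:
Terraform v1.6.3 or later is required. Run terraform --version to check the installed version.
If you manage resources from the cloud, see Cloud Installation. Access keys are obtained the same way as for a local installation.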
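The version requirement above can be checked in a script before running any other command. The following is a minimal sketch, not part of the original guide: the helper names are hypothetical, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Minimum Terraform version required by this guide.
REQUIRED_VERSION="1.6.3"

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Sorts the two versions and checks that B is the smaller (or equal) one.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# installed_terraform_version: prints the bare version number.
# The first line of `terraform version` looks like "Terraform v1.6.3".
installed_terraform_version() {
  terraform version | head -n 1 | sed -E 's/^Terraform v//'
}

# check_terraform_version: hypothetical helper combining the two functions above.
check_terraform_version() {
  current="$(installed_terraform_version)"
  if version_ge "$current" "$REQUIRED_VERSION"; then
    echo "OK: Terraform $current"
  else
    echo "Terraform $current is older than $REQUIRED_VERSION" >&2
    return 1
  fi
}
```

Calling `check_terraform_version` at the top of a deployment script stops the run early when the installed version is too old.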

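Configuring Tencent Cloud Account Information

Before using Terraform for the first time, go to Cloud API Keys to apply for the security credentials SecretId and SecretKey. Skip this step if you already have usable credentials.
1. Log in to the CAM console and choose Access Key > API Key Management in the left sidebar.
2. On the API Key Management page, click Create Key to create a SecretId / SecretKey pair.

You can configure Tencent Cloud account information in either of the following two ways:

Static credential authentication
Create a provider.tf file in your user directory with the following content. Replace my-secret-id and my-secret-key with your SecretId and SecretKey.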
provider "tencentcloud" {
  secret_id  = "my-secret-id"
  secret_key = "my-secret-key"
}
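Environment variable authentication
Set the environment variables on your local machine or in the cloud environment by running the following commands. Replace YOUR_SECRET_ID and YOUR_SECRET_KEY with your SecretId and SecretKey.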
export TENCENTCLOUD_SECRET_ID=YOUR_SECRET_ID
export TENCENTCLOUD_SECRET_KEY=YOUR_SECRET_KEY
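With environment variable authentication, a missing or empty variable only surfaces when Terraform contacts the API. The sketch below checks both variables up front. The helper names are hypothetical; the variable names are the ones exported above. The values themselves are never printed.

```shell
# require_env NAME: fails with a message when the variable NAME is unset or empty.
require_env() {
  # Indirect lookup of the variable whose name is passed as $1.
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "Missing environment variable: $1" >&2
    return 1
  fi
}

# check_tencentcloud_credentials: hypothetical helper that verifies both
# credentials used by the provider are present in the environment.
check_tencentcloud_credentials() {
  require_env TENCENTCLOUD_SECRET_ID && require_env TENCENTCLOUD_SECRET_KEY
}
```

A typical use is `check_tencentcloud_credentials && terraform plan`, so that the plan only runs when both variables are set.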

Adding an Alert Policy to a Prometheus Instance with Terraform

1. Create a new Terraform configuration file. Create a new directory, and in it create a main.tf file with the following content:
# Specify the provider configuration

terraform {
  required_providers {
    tencentcloud = {
      source = "tencentcloudstack/tencentcloud"
    }
  }
}

# Prometheus alert rule (configured on the Cloud Monitor side)
# Assumes tencentcloud_monitor_tmp_instance.foo is already declared in this configuration.

resource "tencentcloud_monitor_tmp_alert_rule" "foo" {
  duration    = "2m"
  expr        = "avg by (instance) (mysql_global_status_threads_connected) / avg by (instance) (mysql_global_variables_max_connections) > 0.8"
  instance_id = tencentcloud_monitor_tmp_instance.foo.id
  receivers   = ["notice-zmjsavnp"] # the notification template can be created with the Cloud Monitor Terraform resources
  rule_name   = "MySQL 连接数过多--tf-云监控test"
  rule_state  = 2
  type        = "MySQL/MySQL 连接数过多"

  annotations {
    key   = "description"
    value = "MySQL 连接数过多, 实例: {{$labels.instance}},当前值: {{ $value | humanizePercentage }}。"
  }
  annotations {
    key   = "summary"
    value = "MySQL 连接数过多(>80%)"
  }

  labels {
    key   = "severity"
    value = "warning"
  }
}
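Note:
The configured fields are as follows:
duration: how long the expression must hold before the alert fires (2m in the example).
expr: the alert expression.
instance_id: the instance ID.
receivers: the list of alert recipients.
rule_name: the alert rule name.
rule_state: the alert rule state.
type: the alert rule type.
For more detailed parameters, see Tencent Cloud integration access on GitHub, or refer to the Terraform Tencent Cloud provider.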
2. Initialize the Terraform working directory by running the following command:
terraform init
Expected output:

Initializing the backend...

Initializing provider plugins...
- Reusing previous version of tencentcloudstack/tencentcloud from the dependency lock file
- Using previously-installed tencentcloudstack/tencentcloud v1.81.32

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
3. Generate the execution plan by running the following command:
terraform plan
Expected output:
tencentcloud_monitor_tmp_instance.foo: Refreshing state... [id=prom-jh0zntj2]
tencentcloud_monitor_tmp_exporter_integration.tmpExporterIntegration: Refreshing state... [id=balck-box-tf-test#prom-jh0zntj2#1##blackbox-exporter]
tencentcloud_monitor_tmp_tke_cluster_agent.foo: Refreshing state... [id=prom-jh0zntj2#cls-1uary7z2#eks]
tencentcloud_monitor_tmp_exporter_integration.tmpExporterMointor: Refreshing state... [id=tf-test-cjtest#prom-jh0zntj2#1##qcloud-exporter]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
  + create

Terraform will perform the following actions:

  # tencentcloud_monitor_tmp_alert_rule.foo will be created
  + resource "tencentcloud_monitor_tmp_alert_rule" "foo" {
      + duration    = "2m"
      + expr        = "avg by (instance) (mysql_global_status_threads_connected) / avg by (instance) (mysql_global_variables_max_connections) > 0.8"
      + id          = (known after apply)
      + instance_id = "prom-jh0zntj2"
      + receivers   = [
          + "notice-zmjsavnp",
        ]
      + rule_name   = "MySQL 连接数过多--tf-云监控test"
      + rule_state  = 2
      + type        = "MySQL/MySQL 连接数过多"

      + annotations {
          + key   = "description"
          + value = "MySQL 连接数过多, 实例: {{$labels.instance}},当前值: {{ $value | humanizePercentage }}。"
        }
      + annotations {
          + key   = "summary"
          + value = "MySQL 连接数过多(>80%)"
        }

      + labels {
          + key   = "severity"
          + value = "warning"
        }
    }

Plan: 1 to add, 0 to change, 0 to destroy.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if
you run "terraform apply" now.
4. Create the alert policy by running the following command:
terraform apply
Expected output:
tencentcloud_monitor_tmp_instance.foo: Refreshing state... [id=prom-jh0zntj2]
tencentcloud_monitor_tmp_exporter_integration.tmpExporterIntegration: Refreshing state... [id=balck-box-tf-test#prom-jh0zntj2#1##blackbox-exporter]
tencentcloud_monitor_tmp_exporter_integration.tmpExporterMointor: Refreshing state... [id=tf-test-cjtest#prom-jh0zntj2#1##qcloud-exporter]
tencentcloud_monitor_tmp_tke_cluster_agent.foo: Refreshing state... [id=prom-jh0zntj2#cls-1uary7z2#eks]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
+ create

Terraform will perform the following actions:
Resource details...
Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
Terraform will perform the actions described above.
Only 'yes' will be accepted to approve.

Enter a value: yes

tencentcloud_monitor_tmp_alert_rule.foo: Creating...
tencentcloud_monitor_tmp_alert_rule.foo: Creation complete after 2s [id=prom-jh0zntj2#arule-n76kqshg]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
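
The plan output in step 3 ends with a note that, without the -out option, Terraform cannot guarantee that terraform apply takes exactly the planned actions. One way to address this is to save the plan to a file and apply that file. The sketch below is an illustration only: the function name and the default file name tfplan are assumptions, not part of the original steps.

```shell
# plan_then_apply [PLAN_FILE]: save the plan to a file, then apply exactly that plan.
# PLAN_FILE defaults to "tfplan", an arbitrary name chosen for this sketch.
plan_then_apply() {
  plan_file="${1:-tfplan}"
  terraform plan -out="$plan_file" || return 1
  # Print the saved plan in human-readable form for review.
  terraform show "$plan_file" || return 1
  # Applying a saved plan file runs without the interactive "yes" prompt.
  terraform apply "$plan_file"
}
```

Because applying a saved plan skips the confirmation prompt, review the `terraform show` output before letting the function proceed in an automated pipeline.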

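Viewing Prometheus Instance Status

Log in to Tencent Cloud Observability Platform and choose Prometheus Monitoring in the left sidebar. The existing instances are shown in the Prometheus instance list.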
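
The created rule can also be inspected from the command line. In the apply output the rule ID is reported as prom-jh0zntj2#arule-n76kqshg, which appears to join the instance ID and the rule ID with a "#". The sketch below assumes that layout; the helper names are hypothetical.

```shell
# split_rule_id ID: prints the instance ID and the rule ID on separate lines.
# Assumes the "<instance-id>#<rule-id>" layout seen in the apply output.
split_rule_id() {
  echo "${1%%#*}"   # everything before the first '#'
  echo "${1#*#}"    # everything after the first '#'
}

# show_alert_rule: print the attributes Terraform recorded for the rule in its state.
# The resource address matches the main.tf example.
show_alert_rule() {
  terraform state show tencentcloud_monitor_tmp_alert_rule.foo
}
```

The rule ID printed by `split_rule_id` can then be used to locate the rule in the console.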

Deleting the Prometheus Instance Resources

Destroy the resources by running the following command:
terraform destroy
Expected output:
tencentcloud_monitor_tmp_instance.foo: Refreshing state... [id=prom-8dyb6iny]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
- destroy

Terraform will perform the following actions:
Resource details...

Do you really want to destroy all resources?
Terraform will destroy all your managed infrastructure, as shown above.
There is no undo. Only 'yes' will be accepted to confirm.

Enter a value: yes

tencentcloud_monitor_tmp_instance.foo: Destroying... [id=prom-8dyb6iny]
tencentcloud_monitor_tmp_instance.foo: Destruction complete after 6s

Destroy complete! Resources: 1 destroyed.
Note:
When Destroy complete! Resources: (number of existing instances) destroyed. appears, the instance has been deleted.
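
terraform destroy removes every resource tracked in the working directory's state, which is why the sample output shows the Prometheus instance itself being destroyed. To remove only the alert rule and keep the instance, one option is the -target flag, sketched below with a hypothetical helper name. Terraform's documentation describes -target as intended for exceptional cases; deleting the resource block from main.tf and running terraform apply is the routine alternative.

```shell
# destroy_alert_rule_only: destroy just the alert rule, leaving other resources in place.
# The resource address matches the main.tf example; adjust it if yours differs.
destroy_alert_rule_only() {
  # Preview what a targeted destroy would remove.
  terraform plan -destroy -target=tencentcloud_monitor_tmp_alert_rule.foo || return 1
  # Destroy only the targeted resource (still prompts for "yes").
  terraform destroy -target=tencentcloud_monitor_tmp_alert_rule.foo
}
```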