ECS task metadata API `stats` endpoint undercounts bytes transmitted/received on EC2 with `awsvpc` network mode

I used [this CDK stack](https://github.com/isker/ecs_exporter/blob/6ac05f5c991c6015ab9754b48f8bb9afeb8e9441/tools/cdk/main.go) to launch two ECS tasks, one running on Fargate and one on EC2 (with `awsvpc` network mode, to best mirror the Fargate task). Each task runs an Alpine container, which I used ECS Exec to get a shell in.

In each task, I then installed [iperf3](https://github.com/esnet/iperf) and [jq](https://jqlang.org/) using `apk add iperf3 jq`. I then used iperf3 to send one gigabyte of TCP packets from one task to the other, and vice versa: the receiving task runs `iperf3 -s` to start a server, and the sending task runs `iperf3 -c $OTHER_TASK_IPV4_ADDR -n 1G` to send 1GB to the receiving task.

In each task, I then printed the alpine container's network stats using the [ECS task metadata API](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html): `wget -q -O- ${ECS_CONTAINER_METADATA_URI_V4}/stats | jq .networks`.
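As a possible cross-check on the API's numbers, the kernel's own per-interface counters can be read from sysfs inside the same container. This is only a sketch: the `iface_bytes` helper and its `root` override are hypothetical names introduced here, and it assumes the container sees the task's interface under `/sys/class/net`, which I have not verified for every launch type.

```shell
# Hypothetical helper: print the kernel's byte counters for one interface.
# $1: interface name (defaults to eth0, the name shown in the EC2 output).
# $2: sysfs root (defaults to /sys/class/net; overridable for testing).
iface_bytes() {
  iface="${1:-eth0}"
  root="${2:-/sys/class/net}"
  # These are the standard Linux per-interface statistics files.
  rx=$(cat "$root/$iface/statistics/rx_bytes")
  tx=$(cat "$root/$iface/statistics/tx_bytes")
  echo "$iface rx_bytes=$rx tx_bytes=$tx"
}
```

Running `iface_bytes eth0` (or `iface_bytes eth1` on the Fargate task) right after the `wget` call would give numbers to compare against the `rx_bytes` and `tx_bytes` fields in the API response.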

The Fargate task reports reasonable numbers, namely a bit over 1GB transmitted and received (we have of course done other things in these tasks beyond just invoking `iperf3`):

```
# wget -q -O- ${ECS_CONTAINER_METADATA_URI_V4}/stats | jq .networks
{
  "eth1": {
    "rx_bytes": 1096752652,
    "rx_packets": 159456,
    "rx_errors": 0,
    "rx_dropped": 0,
    "tx_bytes": 1083080110,
    "tx_packets": 139570,
    "tx_errors": 0,
    "tx_dropped": 0
  }
}
```

The EC2 task does not report the expected numbers:

```
# wget -q -O- ${ECS_CONTAINER_METADATA_URI_V4}/stats | jq .networks
{
  "eth0": {
    "rx_bytes": 361389142,
    "rx_packets": 46892,
    "rx_errors": 0,
    "rx_dropped": 0,
    "tx_bytes": 361363480,
    "tx_packets": 51774,
    "tx_errors": 0,
    "tx_dropped": 0
  }
}
```

That is ~361MB transmitted and received, roughly a third of the ~1GB that `iperf3` sent in each direction, so a substantial undercount.
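To put a rough number on the undercount, the byte counts from the two outputs can be compared directly. This sketch assumes that each task's traffic was dominated by the two `iperf3` runs, so that what the EC2 task sent should roughly match what the Fargate task received, and vice versa. The variable names are mine.

```shell
# Byte counts copied from the Fargate and EC2 outputs in this report.
fargate_rx=1096752652
fargate_tx=1083080110
ec2_rx=361389142
ec2_tx=361363480

# Integer percentages; shell arithmetic truncates toward zero.
ec2_tx_pct=$(( ec2_tx * 100 / fargate_rx ))
ec2_rx_pct=$(( ec2_rx * 100 / fargate_tx ))

echo "EC2 tx_bytes is ${ec2_tx_pct}% of Fargate rx_bytes"
echo "EC2 rx_bytes is ${ec2_rx_pct}% of Fargate tx_bytes"
```

With these values the two percentages come out to 32 and 33, so the EC2 task reports about one third of the bytes seen on the Fargate side.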

The discrepancy in this synthetic test matches what we have seen the same API endpoint report for services we run in ECS. We have been comparing Fargate and EC2 to determine which better suits our workloads. We rely on these container stats, as consumed by the [Prometheus ECS exporter](https://github.com/prometheus-community/ecs_exporter), to monitor our services, so the fact that the EC2 data is meaningfully incorrect makes this comparison a challenge. Please fix the EC2 network stats.
