11. Metrics Backend: Atlas

Atlas was developed by Netflix to manage dimensional time-series data for near real-time operational insight. Atlas features in-memory data storage, letting it gather and report large numbers of metrics quickly.

Atlas captures operational intelligence. Whereas business intelligence is data gathered for analyzing trends over time, operational intelligence provides a picture of what is currently happening within a system.

Spring Cloud provides a spring-cloud-starter-netflix-atlas starter that has all the dependencies you need. After adding it to your build, annotate your Spring Boot application with @EnableAtlas and tell it where your running Atlas server is by setting the netflix.atlas.uri property.

11.1 Global Tags

Spring Cloud lets you add tags to every metric sent to the Atlas backend. Global tags can be used to separate metrics by application name, environment, region, and so on.

Each bean implementing AtlasTagProvider contributes to the global tag list, as shown in the following example:

@Bean
AtlasTagProvider atlasCommonTags(
    @Value("${spring.application.name}") String appName) {
  return () -> Collections.singletonMap("app", appName);
}
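
To make the idea of several providers contributing to one tag list concrete, the following plain-Java sketch merges the maps returned by two providers. TagProvider is a hypothetical stand-in for AtlasTagProvider so that the example compiles without Spring, and the tag values are made up. The rule that a later provider overrides an earlier one on a duplicate key is an assumption of this sketch, not documented Spring Cloud behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AtlasTagProvider so the sketch compiles without Spring.
interface TagProvider {
    Map<String, String> defaultTags();
}

class GlobalTagsSketch {

    // Merge the tags from every provider into one map, in provider order.
    // Assumption of this sketch: on a duplicate key, the later provider wins.
    static Map<String, String> mergeTags(List<TagProvider> providers) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (TagProvider provider : providers) {
            merged.putAll(provider.defaultTags());
        }
        return merged;
    }

    public static void main(String[] args) {
        // One provider contributes the application name (illustrative value).
        TagProvider appTags = () -> Map.of("app", "orders-service");

        // A second provider contributes deployment details (illustrative values).
        TagProvider deploymentTags = () -> {
            Map<String, String> tags = new LinkedHashMap<>();
            tags.put("env", "staging");
            tags.put("region", "us-east-1");
            return tags;
        };

        // Prints {app=orders-service, env=staging, region=us-east-1}
        System.out.println(mergeTags(List.of(appTags, deploymentTags)));
    }
}
```

Splitting tags across providers like this keeps application identity and deployment details in separate beans, so each can be sourced from its own configuration.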

11.1.1 Using Atlas

To bootstrap an in-memory standalone Atlas instance, use the following commands:

$ curl -LO https://github.com/Netflix/atlas/releases/download/v1.4.2/atlas-1.4.2-standalone.jar
$ java -jar atlas-1.4.2-standalone.jar

Tip: An Atlas standalone node running on an r3.2xlarge (61GB RAM) can handle roughly 2 million metrics per minute for a given six-hour window.

Once the application is running and has published a handful of metrics, you can verify that your setup is correct by listing tags on the Atlas server, as shown in the following example (replace ATLAS with the host and port of your Atlas server):

$ curl http://ATLAS/api/v1/tags
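
The same check can be made from code, for example in a smoke test. The following is a minimal sketch using the JDK's java.net.http client (Java 11 or later). The class and method names are hypothetical, and the base URL is whatever host and port your Atlas server listens on; http://localhost:7101 in the usage message is only an assumed example for a local standalone instance.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class AtlasTagsCheck {

    // Build a GET request for the same tag-listing endpoint the curl command calls.
    static HttpRequest tagsRequest(String atlasBaseUrl) {
        // Drop a trailing slash so the path is not doubled up.
        String base = atlasBaseUrl.endsWith("/")
                ? atlasBaseUrl.substring(0, atlasBaseUrl.length() - 1)
                : atlasBaseUrl;
        return HttpRequest.newBuilder(URI.create(base + "/api/v1/tags"))
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No server given: explain usage instead of guessing an address.
            System.out.println("usage: java AtlasTagsCheck <atlas-base-url>, e.g. http://localhost:7101");
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(tagsRequest(args[0]), HttpResponse.BodyHandlers.ofString());
        // Print the status code and the raw body returned by the server.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If the response body lists no tags, the application has most likely not published any metrics yet, or netflix.atlas.uri points at a different server.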

Tip: After running several requests against your service, you can get basic request latency information by pasting the following URL into your browser: http://ATLAS/api/v1/graph?q=name,rest,:eq,:avg
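
The q parameter in that URL is an Atlas stack-language expression written in postfix form: name,rest,:eq selects the time series whose name tag equals rest, and :avg averages them. As a rough sketch, the URL can be assembled in code as follows. The class name, method name, and the localhost address are illustrative assumptions. The expression is URL-encoded here because it is being built programmatically, whereas a browser accepts the raw form shown above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class AtlasGraphUrl {

    // Join stack-language tokens with commas and URL-encode them for the q parameter.
    static String graphUrl(String atlasBaseUrl, String... tokens) {
        String expression = String.join(",", tokens);
        String encoded = URLEncoder.encode(expression, StandardCharsets.UTF_8);
        return atlasBaseUrl + "/api/v1/graph?q=" + encoded;
    }

    public static void main(String[] args) {
        // Prints http://localhost:7101/api/v1/graph?q=name%2Crest%2C%3Aeq%2C%3Aavg
        System.out.println(graphUrl("http://localhost:7101", "name", "rest", ":eq", ":avg"));
    }
}
```

Keeping the tokens separate until the last step makes it easier to swap the metric name or the aggregation without hand-editing an encoded string.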

The Atlas wiki contains a compilation of sample queries for various scenarios.

See also the Atlas alerting philosophy and the Atlas documentation on using double exponential smoothing to generate dynamic alert thresholds.