To support aggregations and other operations that require looking up field values on a per-document basis, Elasticsearch uses a data structure called doc values. Term-based field types such as keyword store their doc values using an ordinal mapping for a more compact representation. This mapping works by assigning each term an incremental integer or ordinal based on its lexicographic order. The field’s doc values store only the ordinals for each document instead of the original terms, with a separate lookup structure to convert between ordinals and terms.When used during aggregations, ordinals can greatly improve performance. As an example, the terms aggregation relies only on ordinals to collect documents into buckets at the shard-level, then converts the ordinals back to their original term values when combining results across shards.Each index segment defines its own ordinal mapping, but aggregations collect data across an entire shard. So to be able to use ordinals for shard-level operations like aggregations, Elasticsearch creates a unified mapping called global ordinals. The global ordinal mapping is built on top of segment ordinals, and works by maintaining a map from global ordinal to the local ordinal for each segment.Global ordinals are used if a search contains any of the following components:Certain bucket aggregations on keyword, ip, and flattened fields. This includes terms aggregations as mentioned above, as well as composite, diversified_sampler, and significant_terms.Bucket aggregations on text fields that require fielddata to be enabled.Operations on parent and child documents from a join field, including has_child queries and parent aggregations.The global ordinal mapping uses heap memory as part of the field data cache. Aggregations on high cardinality fields can use a lot of memory and trigger the field data circuit breaker.
indices.fielddata.cache.size (Static) The max size of the field data cache, eg 38% of node heap space, or an absolute value, eg 12GB. Defaults to unbounded. If you choose to set it, it should be smaller than Field data circuit breaker limit.
indices.breaker.fielddata.limit (Dynamic) Limit for fielddata breaker. Defaults to 40% of JVM heap.
DocValues 存储格式及压缩实现https://mp.weixin.qq.com/s/kP5Pza2xRtBlcJs5WYvgjADisk-Based Field Data a.k.a. Doc Valueshttps://www.elastic.co/cn/blog/disk-based-field-data-a-k-a-doc-valuesDoc Values and Fielddatahttps://www.elastic.co/guide/en/elasticsearch/guide/current/docvalues-and-fielddata.html