Oracle Coherence memory usage, indexes
In previous posts we discussed memory consumption in Oracle Coherence. We loaded 1M DomainObj instances into a cache and looked into memory dumps acquired with the jmap utility. Now we will talk about the memory overhead caused by another powerful feature of Coherence: indexes.
Coherence supports indexes. Indexes are another major memory consumer, and if you are going to use them you should plan memory for them. We will use a distributed scheme with a local scheme as the backing map in this case. An index is created for each of the DomainObjAttrib fields, and each indexed field has about 1M/16 = 62,500 unique values (each value of an indexed field matches 16 objects).
<distributed-scheme>
<scheme-name>simple-distributed-scheme</scheme-name>
<backing-map-scheme>
<local-scheme/>
</backing-map-scheme>
<backup-count>0</backup-count>
</distributed-scheme>
First, let's look at Coherence 3.5:

- row 6 (com.tangosol.util.SegmentedHashMap$Entry) – 1M instances are related to main storage and 1062.5k to the index (1M entries in the reverse map plus 62.5k in the forward map);
- row 7 ([Lcom.tangosol.util.SegmentedHashMap$Entry;) – 19 MB is related to main storage, the rest to the index;
- row 12 – 7.2 MB is related to main storage, the rest to the index.
Internally the index is constructed from 2 maps:
- Forward map: maps an indexed attribute value to a set of cache entries (I suspect that a reference to the key is actually stored, but I am not sure);
- Reverse map: maps cache entry (or its key) to the value of the indexed attribute for this entry.
In our case we have 62,500 unique values of indexed fields.
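To make the two-map layout concrete, here is a minimal sketch in plain Java. It is only an illustration built on `java.util` collections, not Coherence's actual implementation; the class name `TwoMapIndexSketch` and its methods are hypothetical, and it assumes the forward map stores keys rather than whole entries and that attribute values are non-null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy two-map index; hypothetical names, not part of the Coherence API.
final class TwoMapIndexSketch<K, V> {
    // Forward map: indexed attribute value -> keys of the entries holding that value.
    private final Map<V, Set<K>> forward = new HashMap<>();
    // Reverse map: entry key -> indexed attribute value of that entry.
    private final Map<K, V> reverse = new HashMap<>();

    void put(K key, V attributeValue) {
        remove(key); // drop a stale mapping if the key was already indexed
        reverse.put(key, attributeValue);
        forward.computeIfAbsent(attributeValue, v -> new HashSet<>()).add(key);
    }

    void remove(K key) {
        V old = reverse.remove(key);
        if (old == null) {
            return; // key was not indexed
        }
        Set<K> keys = forward.get(old);
        keys.remove(key);
        if (keys.isEmpty()) {
            forward.remove(old); // no entries left for this attribute value
        }
    }

    Set<K> keysFor(V attributeValue) {
        return forward.getOrDefault(attributeValue, Collections.emptySet());
    }

    int distinctValues() { return forward.size(); } // M in the formulas
    int entries() { return reverse.size(); }        // N in the formulas

    public static void main(String[] args) {
        TwoMapIndexSketch<Integer, Integer> index = new TwoMapIndexSketch<>();
        for (int key = 0; key < 160; key++) {
            index.put(key, key / 16); // 16 entries per distinct value, as in the test data
        }
        System.out.println(index.entries() + " entries, "
                + index.distinctValues() + " distinct values");
    }
}
```

The forward map grows with the number of distinct values plus one set member per entry, while the reverse map grows with the number of entries, which is why both M and N appear in the size formulas.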
Now let's compare the memory picture with 3.4:

Rows related to the index are highlighted in yellow. In addition:
- row 6 ([Lcom.tangosol.util.SafeHashMap$Entry;) – 7.2 MB is related to main storage, the rest to the index.
Summary for Coherence 3.4: [summary table image not available]
Summary for Coherence 3.5: [summary table image not available]
Again, we can see that Coherence 3.5 uses additional data structures to make partition operations more efficient. In the case of indexes, however, removing duplicated attribute values from the heap makes 3.5 more memory efficient.
Let's find a formula for the index size (N is the number of entries, M is the number of unique indexed field values).
Coherence 3.4:
- Forward map
  - 4 * M – reference in hash table
  - 24 * M – SafeHashMap$Entry
  - 16 * M – SafeHashSet
  - 56 * M – SafeHashMap (backend for SafeHashSet)
  - 4 * N – reference in hash table of hash set
  - 24 * N – SafeHashMap$Entry
- Reverse map
  - 4 * N – references in hash table
  - 24 * N – SafeHashMap$Entry
  - <Size of attribute value> * N – stored attribute value
- Total index size – 100 * M + 56 * N + N * <Size of attribute value>
In our case the estimated index size for 3.4 is 100 * 62,500 + 56 * 1M + 24,000,000 = 86,250,000 bytes (about 82 MB). The formula assumes a 100% fill ratio of the hash tables, so it produces a lower bound for the index size. The actual size will always be a little bigger (unless the hash table fill ratio exceeds 100%, which should never happen).
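The 3.4 breakdown can be turned into a small calculator. This is a sketch under the assumptions of this post: the per-item byte costs are the ones listed above, the class and method names are hypothetical, and the attribute size of 24 bytes is inferred from the 24,000,000-byte term for 1M entries. M is taken as 62,500, the number of unique values stated earlier.

```java
// Lower-bound estimate of index size for Coherence 3.4, following the breakdown in this post.
// Hypothetical helper, not part of any Coherence API.
final class IndexSize34 {
    // Forward map: per distinct value a table reference (4), an entry (24),
    // a SafeHashSet (16) and its SafeHashMap (56); per entry a table
    // reference (4) and an entry (24) inside the hash set.
    static long forwardMapBytes(long n, long m) {
        return (4 + 24 + 16 + 56) * m + (4 + 24) * n;
    }

    // Reverse map: per entry a table reference (4), an entry (24)
    // and a separate copy of the attribute value.
    static long reverseMapBytes(long n, long attrBytes) {
        return (4 + 24) * n + attrBytes * n;
    }

    static long totalBytes(long n, long m, long attrBytes) {
        return forwardMapBytes(n, m) + reverseMapBytes(n, attrBytes);
    }

    public static void main(String[] args) {
        long n = 1_000_000; // entries in the cache
        long m = 62_500;    // distinct indexed values
        long a = 24;        // assumed attribute value size in bytes
        System.out.println(totalBytes(n, m, a)); // prints 86250000
    }
}
```

Keeping the forward and reverse maps as separate methods makes it easy to see that the per-entry attribute copies in the reverse map dominate as N grows.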
Coherence 3.5:
- Forward map – 100 * M + 28 * N bytes in total
  - 4 * M – reference in hash table
  - 24 * M – SegmentedHashMap$Entry
  - 16 * M – SafeHashSet
  - 56 * M – SafeHashMap (backend for SafeHashSet)
  - 4 * N – reference in hash table of hash set
  - 24 * N – SafeHashMap$Entry
- Reverse map – 28 * N bytes in total, plus the stored value objects
  - 4 * N – references in hash table
  - 24 * N – SegmentedHashMap$Entry
  - <Size of attribute value> * M – stored attribute value (each distinct value is kept only once)
- Total index size – 100 * M + 56 * N + M * <Size of attribute value>
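Plugging the test data into the 3.5 total gives a concrete number. The sketch below only evaluates the formula above; the class name is hypothetical and the 24-byte attribute size is an assumption inferred from the 24,000,000-byte term used in the 3.4 estimate.

```java
// Lower-bound estimate of index size for Coherence 3.5, following the total above.
// Hypothetical helper, not part of any Coherence API.
final class IndexSize35 {
    static long totalBytes(long n, long m, long attrBytes) {
        long perDistinctValue = 100 * m;  // forward map structures per distinct value
        long perEntry = 56 * n;           // 28 bytes per entry in each of the two maps
        long values = attrBytes * m;      // each distinct attribute value stored once
        return perDistinctValue + perEntry + values;
    }

    public static void main(String[] args) {
        // 1M entries, 62,500 distinct values, assumed 24-byte attribute values
        System.out.println(totalBytes(1_000_000, 62_500, 24)); // prints 63750000
    }
}
```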
Summary
Let's put the results of our experiments in a single table. N is the number of entries in the cache, M is the number of distinct values of the indexed field.
Coherence 3.4: >100 * M + 56 * N + N * A
Coherence 3.5: >100 * M + 56 * N + M * A
A is the size of an indexed attribute value in bytes.
Because the fill ratio of the hash table will always be below 100%, in practice, the cache will always consume slightly more memory than the value calculated by the formulas above.
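Subtracting the two lower bounds shows where the saving in 3.5 comes from. This is a worked example only, assuming the hash table overheads are the same in both versions (as the formulas suggest) and an attribute size of A = 24 bytes, which is inferred from the 24,000,000-byte term for 1M entries rather than measured separately:

```latex
\begin{align*}
S_{3.4} - S_{3.5} &= (100M + 56N + NA) - (100M + 56N + MA) \\
                  &= (N - M)\,A \\
                  &= (1\,000\,000 - 62\,500) \times 24 = 22\,500\,000 \text{ bytes}
\end{align*}
```

The saving grows with the number of duplicated attribute values, so it disappears when every entry has a unique indexed value (M = N).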
Conclusion
When working with Coherence caches, it is important to remember that both the main data and the indexes are stored in main memory. Here I have tried to provide a simple tool for getting a good estimate of memory consumption.
I hope you will find it useful.
Labels: coherence, data grid, memory, ~Alexey Ragozin



1 Comments:
how do you calculate the size of attribute? (to be 240)