October 23, 2009

Oracle Coherence, memory structure of cache

Imagine that you are working on a project involving an in-memory data grid. You have analyzed your requirements and found that you need to store 10 million objects in the grid. The next question is: how much physical memory do you need to provide that capacity? I recently faced the same question and want to share some findings about how Oracle Coherence (a popular in-memory data grid middleware) uses memory.

My approach is very simple: I create a cache, put 1 million objects into it, and then analyze heap memory usage.

Local scheme

Let's start with the local scheme. While it is not often used by itself, a local scheme may serve as a backing map for other schemes.


<local-scheme>
  <scheme-name>local-scheme</scheme-name>
</local-scheme>

The memory picture for one million objects is:


(The Sun JDK contains a 'jmap' tool that can display memory usage by object class for a live Java process. It is an extremely handy tool for memory profiling.)


In the diagram we can see our domain objects (DomainObjAttrib, DomainObject, DomainObjKey). We can also see 1M com.tangosol.net.cache.LocalCache$Entry objects. They are part of the hash table implemented in Coherence. If you put the same objects in a java.util.HashMap, the picture will be almost the same, but you will see 1M java.util.HashMap$Entry objects instead.
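
The measurement approach can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the harness used for the numbers in this post: it fills a java.util.HashMap as a stand-in for the cache, the class name, key type, and value type are hypothetical, and the heap delta it prints is only approximate because System.gc() is just a hint to the JVM. A per-class heap histogram gives a far more detailed picture than this single number.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness: fills a plain HashMap (a stand-in for the cache)
// and reports the approximate growth of the used heap.
class HeapUsageSketch {

    // Approximate used heap in bytes; System.gc() is only a hint to the JVM.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Builds a map with `count` entries; key and value types are placeholders.
    static Map<Long, String> fill(int count) {
        Map<Long, String> map = new HashMap<>();
        for (long i = 0; i < count; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        int count = 1_000_000;
        long before = usedHeap();
        Map<Long, String> map = fill(count);
        long after = usedHeap();
        // The map is still referenced here, so its entries are still on the heap.
        System.out.printf("entries=%d, approx bytes per entry=%.1f%n",
                map.size(), (after - before) / (double) count);
    }
}
```

The printed figure mixes the data itself with the map's own structures, which is why the breakdown by class matters for separating the two.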

Let's summarize memory consumption:


  • Domain objects: 140.6Mb
  • Hash table: 77.3Mb
  • Other: 4.8Mb

In short, we have an overhead of about 77 bytes per cache entry. How is this space used by Coherence?

  • 72 bytes – size of LocalCache$Entry (BTW, the size of HashMap$Entry is just 24 bytes because it does not need to store additional data needed to support eviction and statistics collection);
  • 4 bytes – reference from hash table;
  • Because the hash table fill ratio is always less than 100%, some references are unused. This adds about 1.3 bytes per entry in our case.
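
The breakdown above can be checked with a few lines of arithmetic. The sketch below only adds up the three figures from the list and scales them to the 1 million entries used in the test. It assumes that "Mb" in this post means 10^6 bytes, and the class and constant names are made up for illustration.

```java
// Adds up the per-entry overhead components listed above for the local scheme.
class LocalSchemeOverhead {
    static final double ENTRY_BYTES = 72.0;       // size of LocalCache$Entry
    static final double TABLE_REF_BYTES = 4.0;    // reference from the hash table
    static final double UNUSED_SLOT_BYTES = 1.3;  // unused table slots, amortized per entry

    // Total hash table overhead per cache entry, in bytes.
    static double perEntryBytes() {
        return ENTRY_BYTES + TABLE_REF_BYTES + UNUSED_SLOT_BYTES;
    }

    // Overhead for the given number of entries, in Mb (assumed to be 10^6 bytes).
    static double totalMb(long entries) {
        return perEntryBytes() * entries / 1_000_000.0;
    }

    public static void main(String[] args) {
        System.out.printf("per entry: %.1f bytes%n", perEntryBytes());
        System.out.printf("1M entries: %.1f Mb%n", totalMb(1_000_000L));
    }
}
```

For 1 million entries the sum matches the 77.3Mb reported for the hash table in the summary.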

Distributed scheme

Now let's switch to a distributed scheme (distributed hash table implementation of Coherence). As in the first case, we will use a simple configuration with a local scheme on the back end.


<distributed-scheme>
  <scheme-name>simple-distributed-scheme</scheme-name>
  <backing-map-scheme><local-scheme/></backing-map-scheme>
  <backup-count>0</backup-count>
</distributed-scheme>

The memory picture for one million objects is:
Now things look much more complicated. First, there are no domain objects on the heap. That is because in the distributed scheme objects are stored in serialized form. Coherence uses com.tangosol.util.Binary objects to wrap the actual byte arrays (this way they can be stored in the backing map), so we have 2M wrapper objects (one for each key and one for each value). The next strange thing is SegmentedHashMap, which is not a part of the local scheme, so it should be related to the distributed scheme. In Coherence prior to version 3.5, the data for all partitions was stored in a single backing map instance, and relocating a partition's data out of the backing map required a full scan. Version 3.5 introduced a new mechanism: an additional index that remembers the key set for each partition. It improves partition relocation performance drastically but increases memory usage. Finally, there is a pack of int[] objects, but this is still a mystery to me.
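
The idea of a per-partition key index can be illustrated with a small conceptual sketch. This is not Coherence's implementation: the class, the String keys and values, and the hash-based partition assignment are all assumptions made for the example. It only shows why such an index trades memory for speed: each key is referenced a second time, but a partition can be extracted without scanning the entries of other partitions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Conceptual sketch only; this is NOT Coherence's actual implementation.
class PartitionKeyIndexSketch {
    private final int partitionCount;
    private final Map<String, String> backingMap = new HashMap<>();
    // Extra index: partition id -> keys stored in that partition.
    private final Map<Integer, Set<String>> keysByPartition = new HashMap<>();

    PartitionKeyIndexSketch(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    // Hypothetical key-to-partition assignment based on the key's hash code.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    void put(String key, String value) {
        backingMap.put(key, value);
        keysByPartition.computeIfAbsent(partitionOf(key), p -> new HashSet<>()).add(key);
    }

    // Removes and returns one partition's entries by following the index,
    // so entries that belong to other partitions are never visited.
    Map<String, String> extractPartition(int partition) {
        Map<String, String> extracted = new HashMap<>();
        Set<String> keys = keysByPartition.remove(partition);
        if (keys != null) {
            for (String key : keys) {
                extracted.put(key, backingMap.remove(key));
            }
        }
        return extracted;
    }

    int size() {
        return backingMap.size();
    }

    public static void main(String[] args) {
        PartitionKeyIndexSketch cache = new PartitionKeyIndexSketch(4);
        for (int i = 0; i < 20; i++) {
            cache.put("key-" + i, "value-" + i);
        }
        Map<String, String> moved = cache.extractPartition(0);
        System.out.println("moved " + moved.size() + " entries, " + cache.size() + " remain");
    }
}
```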

I think it is worthwhile to compare the 3.5 results with the 3.4 results at this point. With exactly the same configuration, the memory picture for Coherence 3.4 is:
Two things to note:

  1. The size of LocalCache$Entry in 3.4 is 64 bytes instead of 72 in 3.5.
  2. There are no SegmentedHashMap-related structures in 3.4.

Let's summarize memory usage.

Coherence 3.4:

  • Domain objects: 227.3Mb
  • Overhead: 116.4Mb
  • Other: 11Mb

Coherence 3.5:

  • Domain objects: 227.3Mb
  • Overhead: 167Mb
  • Other: 8Mb

Conclusion:

  • In the distributed scheme we have 2 additional Binary objects per entry (+48 bytes).
  • Version 3.5 also adds data structures that consume about 50 bytes per entry.

The data structures added in 3.5 are a trade-off to provide better performance in partition-related operations. They certainly have their merits, but it would still be preferable to have an option to turn them off.
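
The "about 50 bytes" figure follows directly from the two overhead totals above. The sketch below only redoes that arithmetic. It assumes that "Mb" means 10^6 bytes and that both totals refer to the same 1 million entries; the class and constant names are hypothetical.

```java
// Derives the per-entry overhead of each version from the measured totals.
class VersionOverheadDelta {
    static final long ENTRIES = 1_000_000L;
    static final double OVERHEAD_34_MB = 116.4; // Coherence 3.4, from the summary above
    static final double OVERHEAD_35_MB = 167.0; // Coherence 3.5, from the summary above

    // Converts a total in Mb (assumed 10^6 bytes) to bytes per entry.
    static double bytesPerEntry(double totalMb) {
        return totalMb * 1_000_000.0 / ENTRIES;
    }

    // Extra bytes per entry that 3.5 uses compared with 3.4.
    static double deltaBytesPerEntry() {
        return bytesPerEntry(OVERHEAD_35_MB) - bytesPerEntry(OVERHEAD_34_MB);
    }

    public static void main(String[] args) {
        System.out.printf("3.4: %.1f bytes/entry%n", bytesPerEntry(OVERHEAD_34_MB));
        System.out.printf("3.5: %.1f bytes/entry%n", bytesPerEntry(OVERHEAD_35_MB));
        System.out.printf("delta: %.1f bytes/entry%n", deltaBytesPerEntry());
    }
}
```

The difference comes to roughly 50.6 bytes per entry, which covers both the larger LocalCache$Entry and the new per-partition structures.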

External scheme

A local scheme is not the only option for a backing map. Coherence also has a so-called external scheme, which can store data off the heap using the BinaryStore plugin (plugins for NIO memory storage and several disk-based storage back ends are supported out of the box). Let's analyze the distributed scheme with an external scheme (NIO memory) as the back end.


<distributed-scheme>
  <scheme-name>external-distributed-scheme</scheme-name>
  <backing-map-scheme>
    <external-scheme>
      <nio-memory-manager/>
    </external-scheme>
  </backing-map-scheme>
  <backup-count>0</backup-count>
</distributed-scheme>

The memory picture for one million objects is:


Please keep in mind that we are analyzing only the Java heap and in this case some amount of data is stored out of the heap using direct memory buffers. You may expect that all your business data will be stored out of the heap, but still we can see plenty of binary objects in memory. These binaries are keys. Yes, while values are stored entirely in external storage, keys are stored in the heap (actually they are stored in both the heap and external storage). Coherence 3.4 shows a similar picture, so I will just omit it.


In this case the diagram does not show the full memory picture (only the heap) so you should not compare it directly to the previous cases.

Coherence 3.5:

  • Key duplicates: 32Mb
  • Overhead: 66Mb
  • Other: 18Mb

Conclusion:

  • While the external scheme uses non-heap memory for data storage, it still consumes a considerable amount of heap for keys and other structures.

Replicated cache

Cache configuration:

<replicated-scheme>
  <scheme-name>simple-replicated-scheme</scheme-name>
  <backing-map-scheme>
    <local-scheme/>
  </backing-map-scheme>
</replicated-scheme>

The memory picture for one million objects is:
I rarely use a replicated scheme, so the number of additional data structures is a little surprising. As you can see, the replicated scheme stores plain Java objects (unlike the distributed scheme, which always operates on serialized blobs).

Memory summary (for Coherence 3.5):

  • Domain objects: 140.6Mb
  • Overhead: 263.3Mb
  • Other: 7.6Mb



Conclusion

Let's put all results in a simple table with capacity formulae:


Here N is the number of entries and K is the size of the key set.
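
For back-of-the-envelope sizing, the per-entry overheads measured in the sections above can be extrapolated to other entry counts. The sketch below is a rough stand-in under stated assumptions, not the capacity formulae themselves: it assumes the overhead scales linearly with N, that "Mb" means 10^6 bytes, and that the Coherence 3.5 figures measured for 1 million entries apply. It covers only the structural overhead, not the data itself, and the enum and method names are hypothetical.

```java
// Hypothetical estimator: scales the per-entry overheads measured for
// 1 million entries (Coherence 3.5) linearly to other entry counts.
class OverheadEstimator {

    enum Scheme {
        LOCAL(77.3),
        DISTRIBUTED_LOCAL_BACKING(167.0),
        DISTRIBUTED_EXTERNAL_HEAP_ONLY(66.0), // excludes the on-heap key copies
        REPLICATED(263.3);

        final double overheadBytesPerEntry;

        Scheme(double overheadBytesPerEntry) {
            this.overheadBytesPerEntry = overheadBytesPerEntry;
        }
    }

    // Estimated structural overhead in Mb (assumed 10^6 bytes), excluding the data itself.
    static double overheadMb(Scheme scheme, long entries) {
        return scheme.overheadBytesPerEntry * entries / 1_000_000.0;
    }

    public static void main(String[] args) {
        long entries = 10_000_000L; // the 10 million objects from the introduction
        for (Scheme s : Scheme.values()) {
            System.out.printf("%-32s %8.1f Mb%n", s, overheadMb(s, entries));
        }
    }
}
```

Treat the results as a starting point only; settings such as backup count and eviction were not part of these measurements and may change the numbers.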

Stay tuned
I hope this will help you estimate the memory consumption of your Coherence cluster more precisely. In the next blog post I will describe the structure and overhead of secondary indexes in Oracle Coherence. Stay tuned!




2 Comments:

Blogger Cameron said...

Nice article. :-)

If you use the "partitioned" option, then the separate index of keys per partition is delegated to the PartitionAwareBackingMap. That means with off-heap, it shouldn't use any heap memory for the key index.

Peace,

Cameron Purdy | Oracle Coherence

October 29, 2009 12:55 PM  
Blogger Alexey Ragozin said...

Hi Cameron,

I have missed <partitioned> option among 3.5 features. It is very interesting. Storing keys in heap is a serious limitation for large scale off-heap storage.
I tried this option, but results were unexpected.
I think it is better to discuss them on Coherence forum, please see my post.

Anyway I'm going to continue measuring different storage schemes and post report. Looks like things like eviction also have great impact on heap consumption and I cannot ignore them.

Regards,
Alexey

October 31, 2009 1:53 AM  
