Binary Calculator Project - What is the footprint of my GigaSpaces Entries?
In the initial stages of every Data Grid project, it is essential to get good estimates of memory requirements. How much memory will my domain objects consume once they are converted to Entries, and what will the indexing overhead be? The answer determines what JVM heap size to choose and how many JVMs will be needed to store the intended dataset - or, conversely, how much data can be stored within a given hardware footprint.
For GigaSpaces, it is hard to provide accurate theoretical estimates of your Entry size. Here are the reasons:
- The Space doesn't store entries as heap objects - they are stored decomposed into fields
- A String UID is generated and stored along with each entry
- Index overhead depends on the type of index (ordered or unordered) and on the cardinality of the indexed field's values.
First, collect the basic statistics on the efficiency of storage of the individual entities:
1. Connect to the remote space
2. Get a batch of test entries from some entry source
3. Write the batch to the remote space
4. Trigger garbage collection on the remote JVM
5. Measure memory usage
6. Repeat from step 2
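The loop above can be sketched in plain Java. This is only an illustration under loud assumptions: an in-process `ArrayList` stands in for the remote space, memory is read from the local JVM's `MemoryMXBean` rather than from a remote one, and the names `FootprintSketch`, `measure` and `bytesPerEntry` are hypothetical, not part of BinaryCalculator or GigaSpaces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical sketch: none of these names come from BinaryCalculator or GigaSpaces.
final class FootprintSketch {

    /** Requests a collection (only a hint to the JVM) and returns used heap in bytes. */
    static long usedHeapAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Heap growth since the baseline, averaged over all entries written so far. */
    static double bytesPerEntry(long baselineBytes, long usedBytes, long entriesWritten) {
        if (entriesWritten <= 0) {
            throw new IllegalArgumentException("entriesWritten must be positive");
        }
        return (double) (usedBytes - baselineBytes) / entriesWritten;
    }

    /** Writes batches into the stand-in store and re-estimates the footprint after each one. */
    static double measure(IntFunction<Object> entrySource, int batches, int batchSize) {
        List<Object> store = new ArrayList<>();   // assumption: stands in for the remote space
        long baseline = usedHeapAfterGc();        // heap usage before any entry is written
        double estimate = 0.0;
        for (int batch = 1; batch <= batches; batch++) {
            for (int i = 0; i < batchSize; i++) {
                store.add(entrySource.apply(store.size()));   // get and write the batch
            }
            long used = usedHeapAfterGc();                    // collect garbage, then measure
            estimate = bytesPerEntry(baseline, used, store.size());
            System.out.printf("batch %d: %d entries, ~%.1f bytes/entry%n",
                    batch, store.size(), estimate);
        }
        return estimate;
    }

    public static void main(String[] args) {
        // Example entry source: 1000-byte payloads, 5 batches of 10,000 entries each.
        measure(i -> new byte[1000], 5, 10_000);
    }
}
```

Because a garbage collection request is only a hint, individual readings are noisy; averaging the heap growth over every entry written so far lets the estimate settle as more batches are loaded.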
Implementing this idea, we have built an initial version of the Binary Calculator, which can be used as a toolkit for measuring the footprint of arbitrary entries. It has a very simple GUI that shows the progress of the memory experiment.


We are planning to turn this simple toolkit into a much more powerful tool, which will generate entries on the fly, based on user-supplied metadata. This way, the user can specify an Entry Description as a simple table in a GUI:

| Type | Indexed | Number of fields | Avg Length |
|---|---|---|---|
| Long | Yes | 1 | N/A |
| String | No | 3 | 1000 |
| String | Yes | 2 | 5000 |
| Integer | No | 3 | N/A |

BinaryCalculator will generate Entries at runtime based on this description, populate them with random data, perform memory experiments and show the estimated entry size.
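As a rough illustration of how such a description could drive random data generation, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the names `EntryDescriptionSketch`, `FieldGroup` and `randomEntry` are hypothetical, an entry is modelled as a plain map of field name to value rather than a real space Entry, every string is generated at exactly the average length, and the `indexed` flag is only carried along because a real space would need it when declaring indexes.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: not the planned BinaryCalculator implementation.
final class EntryDescriptionSketch {

    enum FieldType { LONG, STRING, INTEGER }

    /** One row of the description table; avgLength is ignored for numeric types. */
    static final class FieldGroup {
        final FieldType type;
        final boolean indexed;
        final int count;
        final int avgLength;

        FieldGroup(FieldType type, boolean indexed, int count, int avgLength) {
            this.type = type;
            this.indexed = indexed;
            this.count = count;
            this.avgLength = avgLength;
        }
    }

    /** The example description from the table above. */
    static List<FieldGroup> exampleDescription() {
        return Arrays.asList(
                new FieldGroup(FieldType.LONG, true, 1, 0),
                new FieldGroup(FieldType.STRING, false, 3, 1000),
                new FieldGroup(FieldType.STRING, true, 2, 5000),
                new FieldGroup(FieldType.INTEGER, false, 3, 0));
    }

    /** Builds one entry as field name -> random value, one field per described column. */
    static Map<String, Object> randomEntry(List<FieldGroup> description, Random random) {
        Map<String, Object> entry = new LinkedHashMap<>();
        for (int g = 0; g < description.size(); g++) {
            FieldGroup group = description.get(g);
            for (int i = 0; i < group.count; i++) {
                entry.put("f" + g + "_" + i, randomValue(group, random));
            }
        }
        return entry;
    }

    private static Object randomValue(FieldGroup group, Random random) {
        switch (group.type) {
            case LONG:
                return random.nextLong();
            case INTEGER:
                return random.nextInt();
            default:
                // Simplification: fixed length equal to the configured average.
                char[] chars = new char[group.avgLength];
                for (int i = 0; i < chars.length; i++) {
                    chars[i] = (char) ('a' + random.nextInt(26));
                }
                return new String(chars);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> entry = randomEntry(exampleDescription(), new Random());
        System.out.println("generated fields: " + entry.size());
    }
}
```

For the example table this yields nine fields per entry, five of them strings. A real generator would also want to vary string lengths around the average and control value cardinality, since index overhead depends on it.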
Also, we are planning to build a lightweight plugin system for supplying a custom EntrySource, for example your own random entry generator or a JDBC or Hibernate data source. This will make it much easier to perform full-fledged capacity experiments that load real data from a database.
We hope that this tool will be quite useful for GigaSpaces implementors in the field.
Labels: binary calculator, gigaspaces, grid consulting, grid dynamics, testing, ~Eugene Steinberg

2 Comments:
Eugene,
Very nice initiative.
A few comments:
1) Don't carry the GigaSpaces jars with you; rather, ask the user to point to the GigaSpaces install dir. That will keep you loosely coupled and let you run against different GigaSpaces versions.
2) It would be good to have the size measurement implemented in the space process itself, i.e. add your processing bean / filter statistics to collect this data (the user configures which classes to monitor), and then the UI is used only to display the results.
3) Add best practices and suggestions, i.e. help the user implement different size reduction mechanisms, e.g. Externalizable support. See http://www.gigaspaces.com/wiki/display/OLH/Externalizable+Support for some details.
-Guy
Guy,
Thank you for your interest in this project and for the valuable comments. Let me answer them one by one.
1) The first version was built with Ant as an early prototype; the next version has already moved to Maven 2, so this issue is addressed.
2) We thought about this approach and decided not to go that way, as it adds complexity and it was unclear to us what additional value it brings compared with the simple approach of writing a batch, running GC and getting memory stats through JMX. If you have ideas on that, please share.
Moreover, the current architecture is decoupled from the IMDG implementation and can easily be adapted to measure the capacity of other IMDG products.
3) Absolutely. Besides, managing entry sizes is the scope of another OpenSpaces project we run, PackRat.
We plan to bring the PackRat demo into BinaryCalculator to visually present PackRat's value.