August 19, 2010

Coherence Production Checklist describes TTL expiry attack

Recently I've found a curious contradiction between advice given by software experts and a real hardware. Let's have a look to Coherence Production Checklist and TTL tuning for production. Does it make sense? At the first glance, yes it does - you secure production cluster from multicast IP clashes by setting ttl to 1 or even 0.
But there is some objections here. Have a look to Catalyst 6500/6000 Switch High CPU Utilization and TTL Expiry . In two words setting ttl<=1 lead to high load of switch/router device, because router prefers to use hardware matrix to route packets. Instead of this, expired packets should be processed by CPU to issue ICMP Type 11, Code 0 - "Time Exceeded", even such replies is disabled. As result router goes crazy, and you have a really odd network behavior.
That's what you should do instead: implement multicast assignment policy in your datacenter; disable or properly control multicast routing - just kill mcast packets at the network segments boundaries by the routing table; and of course do not follow software experts advice when you deal with the real hardware.

Labels:

July 16, 2010

Coherence, ReflectionPofSerializer now supports POF extractor

A year ago I have implemented and open sourced ReflectionPofSerializer. This class has saved me from implementing thousands of lines of boring serialization code for various Filters, EntryProcessors, Aggregators , Invocables and other mobile objects in Coherence.

Recently I was asked about POF extrator support in ReflectionPofSerializer. My answer was no, it doesn't support extraction from POF directly. But it turns out what ReflectionPofSerializer can be easily extended to support it. A bit of coding and woala, let me introduce ReflectionPofExtrator.

ReflectionPofExtractor is an extractor implementation which can be used in Coherence for filtering and indexing. It’s usage is very similar to standard ReflectionExtractor (except it accessing attributes by field names not by calling methods). It can extract a field of stored object directly from POF binary, also it can parse simple paths to extracted attributes (e.g. path "instrument.exchange" will make extractor first extract filed "instrument" then extract field "exchange" from instrument and return it). If ReflectionPofSerializer have been used to serialize object, ReflectionPofExtractor can read value directly from binary POF format (what is it, without deserialization cost). If another type of serializer was used, object will be deserialized and field will be extracted via reflection.

It did some micro benchmarking, results are below:

For cache filled with 500k objects, a full scan filter is executed, below is filter execution time for different size of objects:

Binary object size: 180 bytes

  • ReflectionExtractor execution time: 20s
  • ReflectionPofExtractor execution time: 14s


Binary object size: 604 bytes

  • ReflectionExtractor execution time: 23s
  • ReflectionPofExtractor execution time: 24s


Binary object size: 1804 bytes

  • ReflectionExtractor execution time: 150s
  • ReflectionPofExtractor execution time: 117s


This result proves 2 things:

  1. Never relay on micro benchmark – do proper performance testing of your application.
  2. PofExtractor is not always a performance win, use advice above.

But anyway now you can estimate effect of using PofExtractors before writing boiler plate of custom PofSerializer/PofNavigator.


Labels: , ,

July 2, 2010

War story: optimizing one Hadoop job

Recently, we faced a problem of categorization of shopping items : applying a complex set of regular expressions to product description, figure out category of product and extract category-dependent fields. Since original data is a result of web crawling, majority of records do not change since previous crawl while 1% of records do actually change. This led to idea of incremental processing when we initially select altered records and then process only those that had changed.


Algorithm


Original algorithm was to run a set of map-reduce jobs:
  1. Diff”. Compare older data and new data to fetch updated records. Create a 'diff' that contains input for updated / new records and list of deleted keys.
  2. Extract”. Convert diff to hadoop MapFile format, along with fetching fields (desired processing) for updated & new records. Deleted records are mapped to deletion marker.
  3. New”. Rip out new records to a separate output which along with output of step 4 forms final result.
  4. Merge”. Go through previous processing results, using diff as a dictionary, replacing updated records and removing deleted ones.



Optimization


Initial implementation performed really bad, it worked even longer than just processing of all fields on a single machine. So, we started optimization which included following steps:

1. Use custom WritableComparable.

Initially, we used CSV text file as intermediate format. Now, we decided to implement custom Writable class (fig. 1) for values and WritableComparable (fig. 2) class for keys of intermediate data.

figure 1: RecordKeyWritable.java



figure 2: RecordWritable.java

Surprisingly, this optimization produced a little impact, about a 1% of total runtime, mostly by eliminating time for string splitting.


2. Adding combiner

Next, we decided to implement efficient combiner that will combine all records of the same product for both Diff and Merge stages. See figure 3 for example of combiner for Diff stage.


figure 3: RecordCombiner.java

After enabling combiner for Diff we won 15% of time, and Merge time decreased by 20%. Data transfer volume decreased significantly (up to 2x).

3. Upgrade Hadoop 0.18.3 to 0.20.2.

We did not expected that, but just this cut total processing time by 30%, mainly because shuffle performance increased dramatically. At the time of our tests, Hadoop 0.20 could not be deployed on Amazon EC2 very easy, so we made a couple of fixes in scripts. Anyway this should be fixed in Hadoop repository by now.

4. Enable GZIP compression for input, output and intermediate data.

We've tried to compress input data, providing input format compression, output compression and intermediate (sequence file) format compression. However, we could not notice impact of any of these optimizations

Output:

SequenceFileOutputFormat.setCompressOutput(conf, true);
SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

or

FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
FileOutputFormat.setCompressOutput(conf, true);

figure 4: Snippets enabling GZIP compression for M/R output

Gzipped input is understood by fileInputFormat & SequenceFileInputFormat transparently


5. Enable LZO compression for intermediate data.

LZO compression achieves medium compression rate at a pretty good speed of compression and outstanding speed of decompresion. This makes LZO codec ideal for compression of intermediate data (Mapper's output) so it allows to highly reduce traffic between mappers and reducers at a low computational price. Additionally, it supports independent block decompression. We tried LZOcodec from hadoop-0.18 pack and it allowed us to save 5% of total processing time. Unfortunately, LZO support was removed from hadoop-0.20 due to license restriction. There're several alternative projects in progress, but at the time of our tests there was no out-of-the-box solution.


conf.setCompressMapOutput(true);
conf.setMapOutputCompressorClass(LzoCodec.class);

figure 5: snippets enabling LZO compression for traffic between mappers & reducers


6. Tune number of mappers and reducers

After a set of experiments we found out that manual setting number of mappers and reducers can speed up M/R substantially. First, check that map tasks are small enough so all nodes are equally loaded and they are not too small to suffer from initialization overhead (map task execution time lower than 20 seconds is known to be bad).

Further optimization is achieved by tuning ratio of number of mappers to number of reducers. Just by setting appropriate values via conf.setNumMapTasks() and conf.setNumReduceTasks() we were able to cut total processing time by 30%! Unfortunately, optimal parameter values strongly depends on data size and distribution. It appears that taking 3 * input size / block_size mappers and 1 reducer per slave node is a good point to start optimizing.


7. Map side join

We have also tried map-side join for Merge. In order to do this we have to get data for “Merge” step sorted and partitioned the same way. This allows us to perform merge join – our mapper reads two input simultaneously matching records of the same key. This also required extra-map reduce to merge old results (as we did not merged results of update&delete with new before). Also we could not avoid sort & shuffle since we needed IdentityReducer to keep processing results in single file. So finally we could not achieve significant performance increase (5% of Merge time savings) at a price of much higher complexity.

8. Put diff map to local disk instead of HDFS

The next step was to put diff-file to each merging node in local file system, instead of sharing it by hdfs. This saved us 10% more of merge time. Anyway map-file lookup was still too slow taking 95% of merging time so we looked for keeping it as in-memory distributed hash map (DHT).

9. Replace MapFile with in-memory cache

Instead of hadoop map file (which is intended to keep index in memory) we switched to use of in-memory hash map keeping both index and values in memory. Result was outstanding: Merge step became 70% faster and New step got 60% faster. Since our diff was small enough to fit in memory of slave node, we used our own simple HT:
  • - it serves as a http-service
  • - on start it reads the diff file and populates hash table with it
  • - the service is terminted when Merge join is over

If diff size gets bigger so it could not fit in memory of a single slave node, we will have to look up for more complicated solution. Probably one of available in-memory hash table implementation would fit, so this needs further investigation..

Conclusion

If you would face a problem of optimizing of map-reduce task here is our suggestions:

  1. Start with tuning map-reduce tasks number.
  2. Try latest hadoop version
  3. Avoid map-file as a dictionary data : try to put it in memory, reconsider algorithm, look for third-party solution.
  4. Add combiners and switch to custom record format.
  5. Try replace reduce-side join with map-side one

P.S.

This table uncovers some details of optimizations tried and level of effort for each of those:


Improvement

Effect

Effort

Custom record format

Total time reduced by 1%

2 hours of coding.

Adding combiners

This, along with Custom record
formal>
, reduced total time by 17%. Diff time reduced by 15% and Merge time reduced by 20%.

2 hours of coding.

Update hadoop

Total time reduced by 30% by similar speed up of all steps

Had to fix deployment scripts for
EC2, this should be already fixed in codebase.

GZip compression of input/output

No visible effect

Two lines in JobDriver
code.

LZO compression of intermediate data

Total processing time reduces by 5%

Two lines of code in JobDriver in Hadoop 0.18.
Significant effort (2-3 days) of adding LZO-like compression to Hadoop 0.20 (until the compression in incorporated in Hadoop distribution)

Map-reduce task tuning

Total processing reduced by 30%

A week of experiments. Note that this
should be revisited when data size or distribution changes (and we know it
always grows).

Map side join

Negative: total processing time increased.

A week of coding & experiments.

Put dictionary data in local file system

10% saved on Merge,
total time reduces by 7%

Couple of lines in running scripts.

Put dictionary data in memory

70% Merge speed up, 60%
New speed up, total time reduced by 50%.

2 days of coding for simple in-memory hash-table. In case when dictionary data does not fit in memory, reconsideration is
required. Custom in-memory DHT may take a bliss of time, commercial promises
to be pricey, open solutions should be investigated..

Labels: ,

June 30, 2010

Coherence. Read through and operation bundling.

If you are using read-write-backing-map, you should know what CacheLoader interface has methods load(...) and loadAll(…) for bulk loading. And if your cache loader is loading data from DB, your implementation of loadAll(…) is probably designed to load all objects via single DB query (cause it is usually order of magnitude more efficient than issue DB query per requested key). Lets assume your read through cache is working just fine, and at some point you decided to implement cache preloading – a popular pattern. To implement preloading you have found some way to collect keys (e.g. via query to DB) and wanted to use getAll(…) call to load data into cache. Everything looks ok, unit tests have passed, but on real data it takes forever to preload cache! Let’s investigate this …

Problem quickly reveals itself, Coherence is not calling loadAll(…), and instead it calls load(…) for each key in single thread. Culprit here is partitioned (distributed) cache. Your call to loadAll(…) produces single GetAllRequest (one request per storage member owning part of key set to be precise). While processing this request partitioned service is looping over keys from request at call get(…) for each key from backing map. Partitioned service never calls getAll(…) from backing map and it gives no chance for read-write-backing-map implementation to call loadAll(…). Bummer.

My favorite solution for this issue is to load objects from DB directly and then put them to cache (looks a bit ugly, but as you will see below it is the most clean workaround). Sometimes, it may not be an option. Do we have other alternatives?

cachestore-scheme element has a configuration option named operation-bundling. Let’s try to use it.



No effect. What is a problem?

Let me explain how operation bundling works. Operation bundler intercepts method calls ( load(…) for this case) and delay them a little (1ms in config above). After delay it check if there are pending requests in other threads, and if they are, it gathers all requests for all threads and executes them in single loadAll(…), then dispatch results back to callers in other threads (and finally returns its own result of cause). This means max number of request in bundle is limited by number of threads which are accessing read-write-backing-map. By default we have just one thread per partitioned service, that's why nothing interesting happens.

Let’s add thread pool config to distributed scheme.



Test it. Nothing has changed. What next?

Yes we have thread pool now, but partitioned service have received just one GetAllRequest and each request is processed by single thread, so we still have one thread that sequentially calls to get(…) method. Blast!

Go back to our getAll(…) call site, create thread pool, separate keys between threads, call multiple getAll(…) in parallel.



Finally, loadAll(…) is being called now, but number of keys in single call to getAll(…) is still limited to size of partitioned service thread pool, and we also have to have same number of parallel getAll(…) requests.

So, for me this solution is a way more hacky than just load object for DB without Coherence help. I have notified Coherence team on this problem, hope they will fix it in some future release.

Labels: ,

June 22, 2010

Deploying to Hyper-V: Windows over Windows

Recently I was able to spend some time to getting acquainted with Microsoft Powershell and Microsoft Hyper-V. The task was to imitate elastic cloud service on existing hardware running Windows Server 2008 R2 with Hyper-V. Main goal was to learn how to create and shutdown running windows-based virtual machines on-demand.

Hyper-V provides WMI management interface which is available for use in Powershell, thanks to PowerShell Management Library for Hyper-V.

While solving my task I faced several issues I would like to share

Backup & Restore


Unfortunatelly, current version of NTFS does not support neither data de-duplication nor copy-on-write. Each of these technologies could dramatically speed-up creation of running instance from snapshot. For now, making a copy of virtual machine snapshot (including disk image snapshot) takes about 95% of instance creation time.

Hence, when you need fast instance creation, it might worth to use some NAS that supports de-duplication.

The other option is to use Hyper-V Differential Disk feature, but it requires to mangle or recreate virtual machine config file to avoid original disk image corruption. The other caveat is that if original image becomes corrupted, all differential disks become corrupted too.

Avoiding duplicate VM names


An obvious way to maintain reusable VM image is to prepare virtual machine with all needed software pre-installed, and then shutdown and 'export' it. When importing VM, Hyper-V uses disk image in-place, and rewrites config files making image unusable. Hence, for getting working copy of VM, we need to make a copy of exported image in some other place and then 'import' it.

But when we restore two such virtual machines, we get two identical instances with the same names. We need to give these instances different names to distinguish them. The easiest way is to generate GUID and use it as a VM name.

The VM name is specified in two places in the exported virtual machine XML configuration file. It would be very easy to use XPath to specify nodes for update, but unfortunately, Hyper-V config file parser is very formatting-sensitive. Hyper-V does not accept MSXML-parsed and saved VM config file, because it is saved slightly re-formatted. Hence, we have to replace original VM name with replacement pattern and consider xml file as plaintext.

Update: It is possible to use XML object OuterXml() method to get config file Hyper-V understands. Hence, config file update script might look like:


$hostname = [System.Guid]::NewGuid().ToString()
$cfgfile = Resolve-Path("$VMsPath\$hostname\Virtual Machines\*.exp")
$xml = [xml](Get-Content $cfgfile)
$xml.SelectSingleNode('DECLARATIONS/DECLGROUP/VALUE.OBJECT/INSTANCE[@CLASSNAME="Msvm_VirtualSystemGlobalSettingData"]/PROPERTY[@NAME="ElementName"]/VALUE').'#text' = $hostname
$xml.SelectSingleNode('DECLARATIONS/DECLGROUP/VALUE.OBJECT/INSTANCE[@CLASSNAME="Msvm_VirtualSystemSettingData"]/PROPERTY[@NAME="ElementName"]/VALUE').'#text' = $hostname
$xml.OuterXml | Set-Content $cfgfile


Avoiding duplicate hostnames and SIDs


After automatic virtual machine creation was implemented, the next issue was machine names and IP addresses. Obviously, when a complete copy of the VM is created, the copy will have the same hostname as original machine and some IP address obtained via DHCP.

Avoiding machine SID and hostname duplication is pretty simple. Microsoft provides freeware system preparation tool in Windows Resource Kit (sysprep.exe). It has different configuration file formats for different windows versions, but setupctl.exe tool from Windows Resource Kit generates valid config files for any windows version.

Sysprep.exe moves Windows back to sealed state (the state in which pre-installed windows exists on new computers), removes itself from disk and shuts down the Windows.

In sealed state all host-specific information is misssing: hostname, SID, Windows Product Key, activation date and so on. On first boot Windows starts post-install process, queries user for hostname, product key, generates system SID and makes several other steps. Setupctl.exe allows to create unattended post-install file. One of important unattended setup features is optional random hostname generation.

Hence, duplicated SIDs and hostnames can be avoided using sysprep and setupctl at the last step of preparing reusable VM image.

Acquiring IP addresses of the VM


IP address of running virtual machine is available in VM key-value pairs. Hyper-V acquires it through Integration Services. Note, that on some early versions of Hyper-V it is impossible to acquire IP address of the running virtual machine. Hence, if you installed Integration Services on VM, type in powershell:

Get-VMKVP "vm-name"

if you don't see IP address among other values, you should upgrade Hyper-V.

Conclusion


Microsoft Hyper-V with some workarounds may be used as local cloud-like infrastructure. Combining such infrastructure with public cloud services like Amazon EC2 and GoGrid may provide almost unlimited infrastructure on demand with reduced cost as a result of local infrastructure usage when demand is moderate.

Have a nice day!

Labels: , , ,

May 10, 2010

Facebook gives a powerful impulse in the spread of Semantic Web

At its third f8 conference Facebook introduces the next version of their platform, which opens very interesting perspective for all people who develop Semantic Web applications.

In a nutshell, the idea of this Facebook's innovation is to integrate user's web pages into the social graph. All such web pages become a part of ontology, and this ontology will be available from each web page integrated to this graph. For example, when user browses sports web site, the information about his friends favorite sportsmen could be available there.

In order to help web developers to integrate their sites to social graph, Facebook provides Open Graph protocol. According to this protocol all semantic information is stored in additional <meta> tags in <header> of html-page.

There are four required properties for every page: og:title, og:type, og:image and og:url. There are optional properties also: og:description and og:site_name. The Open Graph protocol supports location and contact information properties as well.

This is a how it could look like for Grid Dynamics site:
<html xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>Grid Dynamics : Experts in scaling Mission-Critical Systems</title>
<meta property="og:title" content="Grid Dynamics : Experts in scaling Mission-Critical Systems"/>
<meta property="og:type" content="company"/>
<meta property="og:url" content="http://www.griddynamics.com/"/>
<meta property="og:image" content="http://www.griddynamics.com/templates/rhuk_milkyway/favicon.ico"/>
...
</head>
...
</html>

There is nothing technically new in Open Graph protocol. Most important thing that Facebook, with all power of its users behind it, promotes Semantic Web now. By Facebook statistics, over 50,000 websites already adopted Facebook’s new social plugins. That means that many people will study RDF, SPARQL and all this Semantic Web staff. We will have structured information and unbounded possibility to create new services to analyze it very soon. Enjoy it!

Labels: ,

May 5, 2010

Optional arguments as a new language feature of C# 4.0. Improvement or a pitfall?

Visual Studio 2010 has been launched, and thus C# 4.0 has become the latest standard-de-facto in the world of .Net development.
Previous major modification of the language has brought us many attractive features, which no doubt could significantly increase the development efficiency and simplify our life. So, it is quite predictable that all C#-developers were anticipating finding out even more improvements in the new release.

The release is here, and we’ve finally got three brand-new features: dynamic binding, optional and named arguments and the variance annotations. While the first and the last are specifically designed for a custom set of use-cases and scenarios, the optional and named arguments look like the simple time saving improvement, intended to be widely used in almost any C# code. They do not add any new functionality or technology – they just simplify things that we have been always doing. This is what is usually called “syntax sugar”, and such sugar features usually tend to become extremely popular, so most probably very soon we will see a lot of code written with heavy usage of optional arguments.

Before I continue, let me briefly describe the feature itself. It is quite obvious and easy, and has been used in many other programming languages for ages; however it is new in C#-world, so here it is. If you already know what are the optional arguments, then simply skip the next section.

“New” feature


Imagine a situation when you need to pass default values to the method or an object constructor. There may be different sets of defaults values for different scenarios. What would you do? Right, you would create a set of method overloads, do all the actual logic in the body of the method having most complete and specific set of the arguments, and call this method from other methods, substituting the default argument values in that calls. It would look something like this:



If the number of possible overrides is large, then the code becomes ugly and difficult to read and maintain. C++ and VB developers would use default arguments syntax here – and finally we’ve got a possibility to do it as well:



In this case arg1 is a required parameter, while arg2 and arg3 are optional parameters. So, the usage is obvious: calling DoSomeMethod(11) is equivalent to calling DoSomeMethod(11,0,12). In the same fashion you can explicitly specify the value of arg2 and skip only arg3.
You cannot omit first optional parameter while explicitly specifying the last one – this would significantly degrease the readability of such code (imagine the need to count commas in some multi-parametric method). Because of that, you cannot call anything like DoSomeMethod(11,,12) – this will lead to a compile-time error.
Another part of the new feature – so called “named arguments” comes to help in this case. You may explicitly specify the name of the parameter when entering argument values, which will allow you to enter them in any given order (probably, different from the order in which they are declared) and omitting any optional parameters. Like this:



That’s easy and simple, right? Most of the people who see this feature immediately like it and most probably are going to use it in their code. But some details remain hidden, and even reading Microsoft’s C# specification may not clear them all. And the details – as always – are the residence of the Devil.

Fun begins


Let’s start with the simple case.
Imagine we have the following code:



What do you expect to see as an output? Which method will be called? Or will we get a compilation error?
As for me, I would really prefer to see the compilation error here, because this really looks as an ambiguity. However, it is not: no error appears, the code compiles and executes successfully, and we see that a parameterless method is called. This happens because the member which matches the exact list of arguments is considered better then the one which matches the list of required arguments only. So, there is a reasonable explanation, yet this behavior still may very confusing.

Ok, let’s move on, to a more complicated case: inheritance and virtual methods.



Here we have a virtual method with an optional argument, its override in the derived class and a new method with no parameters defined in the derived class only. So, what will happen if we execute the following code:



This is actually the same situation as we had in previous example: parametersless method is a closer match then the one with the optional arguments, so we’ll get “Derived called” output.
But notice, that if you declare o as Base, you’ll get completely different story: call to Foo() will become a call to the virtual method of Base, and so the method with the optional argument will be executed: you’ll get “Overriden called” as an output. This is expected behavior, at least if you understand how virtual members are executed.

But now let’s turn the thing inside out: let’s move a parameterless method to Base and make it virtual, while putting the Foo(string bar = null) inside the derived class:



Can you guess what will happen now if we call the same code?



According to what we’ve seen before, we may expect that the overridden method will be called, because its signature is a closer match. However, it is not! We see a “Derived called” output, which means that in this particular case an overload was resolved in favor of a method with optional argument, which is non-virtual and is defined only in Derived.
Sounds crazy, doesn’t it?

The only explanation why this happens which I may suggest is that the search for the best match has a preference for methods explicitly defined in this particular class for which the call is made. So, if you put a new keyword instead of override (or simply omit this keyword, which is a bad practice, but means the same), you’ll get back on track: a parameterless version of the method will be called. But as soon as the method becomes virtual – bang! – it gets less priority when compared with others.

Conclusion


There is no clear clue for this behavior in MSDN, and even C# specification does not tell this explicitly. So, this is really, really dangerous pitfall. Imagine a vast code refactoring which introduces new method overloads and changing the behavior of existing methods... This may lead to a real pain.
So, be careful when using optional arguments and try to obey the following simple rules to minimize ambiguity:

  • When introducing a new method with optional parameters, remove old method overloads which were used for the same purpose before.
  • When overriding some method with optional parameters make sure that there are no new methods of the same signature at this class
  • When creating a new method in a class, make sure that it has different name with the optional-argument methods defined in the base class even if their argument signatures are different.

Personal note as a bottom-line


When C# first appeared, it was a clear and easy-to-learn & easy-to-use language, with minimum number of assumptions and ambiguous rules, implicit casts and overloads. It didn’t require you to know hundreds of hidden dangers and pitfalls; there were not features which were available only theoretically, but practically lead to bad coding practices and so on.

Now – when more and more “syntax sugar” extensions come out – the language is moving fast towards being complex, universal, but very generally defined tool, which has many unwanted surprises for unprepared developer.
Probably, this is a logical way of the maturity of any development tool or product. However, I personally feel little bit sad about that. The time of clarity and formality in the C#-world seems to be passing away.

Labels: , , ,