June 19, 2008

Grid technologies in middle-size applications

Grid technologies were born to solve extreme problems, and currently they are used primarily by large-scale applications. However, just as computers were initially used only for complex scientific tasks and later found their way into almost every house, grid technologies are now coming into the middle-size enterprise market. In this article I will try to answer the question of how a middle-size application can take advantage of grid technologies.

When writing large applications that solve extreme problems, scalability is always an issue and you have no choice: you must invest resources in scalability, and developers usually have enough time and knowledge to solve this problem. When writing small applications, you usually do not care about this kind of problem, because everything will work fine on a single server. But when writing a middle-size application, you are in trouble: you are already big enough to start thinking about scalability, but not big enough to invest a lot of resources in solving it. The trouble becomes really serious when a middle-size application has grown out of a small one. However, scalability problems in middle-size applications are usually caused by a very limited set of architectural decisions. Knowing these causes and their remedies will help you build a more scalable application.

Whatever architecture you choose for a middle-size application, it will usually have a web server and a database. In the best case, both run on a single physical machine. In the worst case, web servers, application servers and the database server run on separate machines. In all cases, the request processing chain includes several processes and takes a lot of time. The database server is also usually the bottleneck of the entire system, because it does not scale well. While resolving these problems, mankind invented caches. Caches store data that is costly to compute or retrieve, and they can greatly reduce both database load and request processing time. However, caches can also be scaling killers.
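To make the idea concrete, here is a minimal Python sketch of the kind of simple local cache discussed in the rest of this article: look the value up in memory first and go to the database only on a miss (the cache-aside pattern). The class name, the time-to-live parameter and the `load_user` function that stands in for a database query are illustrative assumptions, not part of any particular product.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class LocalCache:
    """A minimal in-process cache with a per-entry time-to-live (TTL)."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl = ttl_seconds
        # key -> (expiry time, cached value)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[Hashable], Any]) -> Any:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: the expensive call is skipped
        value = loader(key)  # miss or expired entry: compute or query the database
        self._entries[key] = (now + self.ttl, value)
        return value

    def remove(self, key: Hashable) -> None:
        self._entries.pop(key, None)


calls = []


def load_user(user_id):
    calls.append(user_id)  # hypothetical stand-in for a slow database query
    return {"id": user_id}


cache = LocalCache(ttl_seconds=30)
cache.get_or_load(42, load_user)
cache.get_or_load(42, load_user)
print(len(calls))  # 1: the second request never reached the "database"
```

On a single server this is all you need; the trouble starts when several servers each keep their own private copy of `_entries`.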

Assume you have an architecture with a dedicated web server that hosts your application process. It is relatively easy to maintain a cache on a single server. But suppose you want to scale, and the single server becomes a load-balanced cluster in which each machine maintains its own cache, and each cache must be kept synchronized with the caches on the other servers. For example, if an item is removed from the cache on one server, it must immediately be removed from the cache on every other server in the cluster. This is extremely hard to accomplish. Yet because developers often implement simple local caches themselves, they try to extend these caches to support distributed behavior. The problem is that they often lack the knowledge and experience to do this correctly. Fortunately, the problem of distributed caching is well known, and solutions already exist.
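The following Python sketch, under heavily simplifying assumptions, shows why a load-balanced cluster of local caches goes wrong and what the naive fix looks like. Two in-process `Node` objects (hypothetical, standing in for two web servers) share a dict that plays the role of the database. Clearing only the local cache leaves the other server serving stale data; broadcasting the invalidation fixes this toy example. Real networks, however, lose and reorder messages, and servers join and leave the cluster, which is exactly what makes a home-grown version so hard to get right.

```python
from typing import Any, Dict, List

database: Dict[str, Any] = {"price:1": 100}  # stands in for the shared database


class Node:
    """One web server in a load-balanced cluster, with its own local cache."""

    def __init__(self, name: str, cluster: List["Node"]) -> None:
        self.name = name
        self.cluster = cluster  # every node in the cluster, including this one
        self.cache: Dict[str, Any] = {}

    def read(self, key: str) -> Any:
        if key not in self.cache:
            self.cache[key] = database[key]  # miss: load from the database
        return self.cache[key]

    def invalidate(self, key: str, broadcast: bool = True) -> None:
        self.cache.pop(key, None)
        if broadcast:
            # Naive "distributed" behavior: tell every peer to drop the key too.
            # A real implementation must cope with lost messages and failed peers.
            for peer in self.cluster:
                if peer is not self:
                    peer.invalidate(key, broadcast=False)


cluster: List[Node] = []
a, b = Node("a", cluster), Node("b", cluster)
cluster.extend([a, b])
a.read("price:1")
b.read("price:1")  # both servers now cache 100

database["price:1"] = 120
a.invalidate("price:1", broadcast=False)  # only the local cache is cleared
before = (a.read("price:1"), b.read("price:1"))
print(before)  # (120, 100): server b still serves the stale price

database["price:1"] = 130
a.invalidate("price:1")  # broadcast the removal to every peer
after = (a.read("price:1"), b.read("price:1"))
print(after)  # (130, 130)
```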

The solution is to use third-party distributed caches or In Memory Data Grids (IMDG). In Memory Data Grids were created to solve the problem of scaling data in extreme applications, where the cluster contains hundreds of servers. Data stored on these servers can be either partitioned across them or replicated. If data is partitioned, each server holds only one chunk of the data, and each chunk is stored on multiple servers to provide failover. This allows huge amounts of data to be stored in memory. If data is replicated, the full data set is stored on each server, and the copies are kept synchronized with one another. This allows very fast access to data, because it is always available locally. In Memory Data Grids provide many other interesting features, which would take an entire book to describe. Distributed caches are essentially a simplified form of an In Memory Data Grid. There are currently many implementations of both IMDGs and distributed caches, so if you choose to use these technologies, you have a number of options.
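A toy Python model can make the difference between partitioning and replication concrete. It is a sketch, not how any particular IMDG works: the hypothetical `owners` function assigns each key to a primary server by hashing it and uses the next server(s) in the list as backups. Real products typically use consistent hashing or a fixed partition table so that adding a server moves only a few keys; all names here are assumptions for illustration.

```python
import hashlib
from typing import AbstractSet, Dict, List


def owners(key: str, servers: List[str], backups: int = 1) -> List[str]:
    """Primary server for `key`, followed by up to `backups` backup servers."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    count = min(backups + 1, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(count)]


class PartitionedStore:
    """Each server holds only its chunk of the keys; every chunk has backup copies."""

    def __init__(self, servers: List[str], backups: int = 1) -> None:
        self.servers = servers
        self.backups = backups
        self.memory: Dict[str, Dict[str, object]] = {s: {} for s in servers}

    def put(self, key: str, value: object) -> None:
        for server in owners(key, self.servers, self.backups):
            self.memory[server][key] = value

    def get(self, key: str, down: AbstractSet[str] = frozenset()) -> object:
        # Try the primary first, then fall back to the backups (failover).
        for server in owners(key, self.servers, self.backups):
            if server not in down:
                return self.memory[server][key]
        raise KeyError(key)


servers = ["s1", "s2", "s3", "s4"]
store = PartitionedStore(servers, backups=1)
for i in range(100):
    store.put(f"item:{i}", i)

copies = sum(len(m) for m in store.memory.values())
print(copies)  # 200: each item lives on 2 of 4 servers (full replication needs 400)

primary = owners("item:7", servers)[0]
print(store.get("item:7", down={primary}))  # 7, served by the backup
```

With four servers and one backup, each server holds on average half of the data instead of all of it, which is what lets a partitioned grid keep large data sets in memory; full replication gives up that capacity in exchange for local reads on every server.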

The concrete choice depends on the technology or framework your application uses:
  • If you are using .NET, you may use ScaleOut, NCache or Microsoft’s new distributed cache Velocity, which is currently available as a Community Technology Preview. They all provide an ASP.NET session state provider, which is the easiest way to benefit from grid technologies. With such a session state provider you do not need to maintain a dedicated SQL Server for storing ASP.NET session data, because the session data is distributed across the web servers in the cluster in a reliable and robust way.
  • If you are using Java, Oracle Coherence and GigaSpaces are the best-known In Memory Data Grids and, as such, can also be used as distributed caches. Both provide a second-level cache for Hibernate, so if you use Hibernate, you can scale easily with no additional development effort.
  • If you are using C++, PHP, Ruby or Python, you should consider memcached. This is a well-known distributed cache, originally developed for LiveJournal, which has already helped scale many extreme applications such as Wikipedia, YouTube and Facebook.
All these implementations will help you solve distributed cache problems and scale well. In simple cases, all you need to do to start using them is replace your old local caches with distributed ones. If you only need to scale a Hibernate second-level cache or ASP.NET session state, you will not need to write any code at all. However, if you face complex and serious scalability problems, you can consult with us at Grid Dynamics at any time, and we will help you solve them in the most effective way.
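Replacing local caches with distributed ones is easiest when application code depends only on a small cache interface. The Python sketch below assumes such an interface (a hypothetical `Cache` protocol with `get`, `set` and `delete`). memcached clients and the other products listed above offer comparable operations, but their exact APIs differ, so a thin adapter class (not shown, and hypothetical) would map a real client onto this interface.

```python
from typing import Callable, Dict, Optional, Protocol


class Cache(Protocol):
    """The only cache operations the application code depends on."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...
    def delete(self, key: str) -> None: ...


class LocalDictCache:
    """The old per-server cache: fine on one machine, inconsistent in a cluster."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def product_name(cache: Cache, product_id: int, load: Callable[[int], str]) -> str:
    """Cache-aside lookup written against the interface, not a concrete cache."""
    key = f"product:{product_id}:name"
    name = cache.get(key)
    if name is None:
        name = load(product_id)  # e.g. a database query
        cache.set(key, name)
    return name


# Today: a local cache. After scaling out, pass an adapter around a distributed
# cache client instead (hypothetical here); product_name() itself does not change.
cache: Cache = LocalDictCache()
print(product_name(cache, 1, lambda pid: f"widget-{pid}"))  # widget-1
```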

1 Comment:

Jimekus said...

I stumbled onto how to get my Ingridx to spit out a histogram of seven words which uniquely identify a document, such that Google always makes it their number one.

I want to pass these for selection from a series of searches to each visitor in my blog-grid and run a script which displays the 1st google article for them.

My visitor/browsers are to be invited to then click icons. These icon/comments are saved by my script, for Ingridx to slowly adjust over time, the seven words.

Given my unique reading list, presented as a series of entries, each of seven words, I need help to make such a browser extension for my webpage.

Please don't decry this as a mere tautology.

Can anyone help?

February 21, 2009 4:38 AM  