Match Making in Grid Computing
Data partitioning is widely used to store and manage huge amounts of data across several servers. Often, global distributed transactions are needed to keep the data consistent. Those global transactions impose a natural limit on the scalability of the entire system.
Project Convergence was born as an example of managing the interaction between an in-memory data grid and a computational grid, specifically to address the problem of data consistency in systems with global distributed transactions. It is based on the idea of data-aware routing, which improves network utilization and overall performance. In addition, data-aware routing can also improve scalability.
How does it work? Let me illustrate the solution with a simple example. You are the owner of a dating agency. A candidate comes to your agency looking for a person with specific parameters, such as a love of opera, an interest in cooking and a liking for dogs. Let's assume that each person has exactly one set of these parameters and that all parameters need to match. So, your agency will either find him a perfect match or put him into your database as a new single customer waiting for a match. Considering your candidate’s specific parameters, you expect that it will take some time to find him an exact match.
Your company has an outstanding reputation, and you are flooded with more new candidates than it can handle. You realize that you need a business partner, so you partner with another dating agency in town. You also hire a secretary who greets your candidates and sends them randomly to one of the two offices.
One day, two candidates with exactly the same parameters (a perfect match) arrive at your agency, and your secretary sends one of them to your office and the other to your business partner’s office. Both of you simultaneously and independently look for an exact match, but neither of you can find one in your own database. As a result, you and your partner both mark your candidates as single and waiting for a match, and you miss the opportunity to earn a fee from the match. How can you prevent this unfortunate incident?
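The missed match can be reproduced in a few lines. This is only a minimal sketch: the `process` function, the two dictionaries and the candidate names are hypothetical and do not come from Project Convergence. The secretary's random choice is written out explicitly so that the unlucky case is repeatable.

```python
# Hypothetical model: each office keeps its own waiting list,
# mapping a parameter profile to the name of the waiting candidate.
office_a = {}
office_b = {}


def process(office, name, profile):
    """Search only this office's waiting list; if nothing matches, wait there."""
    partner = office.pop(profile, None)
    if partner is None:
        office[profile] = name
    return partner


profile = ("opera", "cooking", "dogs")

# The random assignment happens to split the pair across the two offices.
match_1 = process(office_a, "Alice", profile)
match_2 = process(office_b, "Bob", profile)

print(match_1, match_2)  # neither office finds a match
```

Both candidates end up waiting in different offices, even though they are a perfect match for each other.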
First, you can have your secretary manage the assignment better by having her search the combined database of both offices. But while she is using the database, nobody else can access it until she is done with her search. This will find the match, but it creates a bottleneck and hence hurts the performance of your matchmaking.
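This first option behaves like a global lock around one shared data set. The sketch below assumes a single in-process dictionary guarded by a `threading.Lock`; the names `combined`, `db_lock` and `process_with_global_lock` are made up for illustration, and a real system would use a distributed transaction rather than a local lock.

```python
import threading

# Hypothetical combined database shared by both offices.
combined = {}
db_lock = threading.Lock()
results = []

profile = ("opera", "cooking", "dogs")


def process_with_global_lock(name, profile):
    """Search the combined database; everyone else waits while the lock is held."""
    with db_lock:
        partner = combined.pop(profile, None)
        if partner is None:
            combined[profile] = name
        return partner


def worker(name):
    results.append(process_with_global_lock(name, profile))


threads = [threading.Thread(target=worker, args=(n,)) for n in ("Alice", "Bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever candidate is processed second is matched with the first, so no match is lost. The price is that every request, whatever its parameters, has to queue for the same lock.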
Second, you can divide all candidates into two groups (for example, those who like dogs and those who do not, assuming the two groups are about the same size). This way, you and your partner are each responsible for roughly the same number of candidates. You instruct your secretary to send incoming candidates to one office or the other depending on whether they like dogs. Because each office searches only half of the candidates, with one parameter already matched, you will find matches faster than before. And because two candidates with identical parameters always land in the same office, no match is missed.
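The second option can be sketched as routing on a partition key. This is an illustrative sketch under stated assumptions, not the Project Convergence implementation: `Candidate`, `Office` and `route` are hypothetical names, and the key is simply the `likes_dogs` flag from the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    name: str
    likes_opera: bool
    likes_cooking: bool
    likes_dogs: bool

    def profile(self):
        # All parameters must match, so the full tuple is the lookup key.
        return (self.likes_opera, self.likes_cooking, self.likes_dogs)


@dataclass
class Office:
    # profile -> candidate waiting in this office
    waiting: dict = field(default_factory=dict)

    def process(self, candidate):
        """Return the matched partner, or None if the candidate has to wait."""
        partner = self.waiting.pop(candidate.profile(), None)
        if partner is None:
            self.waiting[candidate.profile()] = candidate
        return partner


def route(candidate, offices):
    """The secretary: pick the office from the partition key (likes_dogs)."""
    return offices[int(candidate.likes_dogs)]


offices = [Office(), Office()]
alice = Candidate("Alice", True, True, True)
bob = Candidate("Bob", True, True, True)

first = route(alice, offices).process(alice)   # Alice waits
second = route(bob, offices).process(bob)      # Bob is matched with Alice
```

Each office touches only its own waiting list, so neither a shared lock nor a global transaction is needed. The scheme balances load only if the partition key splits the candidates evenly, which is the assumption made in the example.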
This is exactly how in-memory data grids and computational grids interact, and how data-aware job scheduling can address the "matching of data grid and compute grid requests." In our example, the offices' databases with all the parameters of the candidates represent a database or an in-memory data grid (such as Oracle Coherence or GigaSpaces XAP). You and your business partner represent the computational grid (such as GridGain, DataSynapse GridServer or Sun Grid Engine), and your secretary fulfills the function of a data-aware job scheduler.
Labels: convergence, grid computing, ~Alexander Kusnetsov
