Tour Web Services Atomic Transaction operations

Beginner's guide to classic transactions, data recovery, and mapping to WS-AtomicTransaction
Level: Introductory

Thomas Freund, Senior Technical Staff Member, IBM

02 Sep 2004

Explore how transactions work in one common and classic form to preserve data integrity, and apply that classical transaction description to the operations of the new Web Services Atomic Transactions (WS-AT) and related Web Services Coordination (WS-C) specifications. Mapping classical transactions to Web services transactions shows that WS-AT embodies long-standing industry best practices for one kind of transaction. Whereas classical transaction processing most often relied on non-universal mechanisms and interoperated in an ad hoc manner, if at all, the new WS-AT embodiment is based on widely accepted interoperability standards such as XML and WSDL. Using Web services mechanisms results in better flexibility and interoperability.

Most people know that transactions are vital to any business. Fewer people know how transaction processing works behind the scenes. In this paper, we illustrate how transactions work using a simple example that addresses, and provides a solution for, a common transactional problem: losing a customer's money. First we illustrate the problem using a classic style of transaction processing; then we map the old onto a new Web services-based mechanism with new advantages in flexibility and interoperability.
Not losing money is quite important. Just ask Waldo. Waldo's situation typifies the need for a transaction. Waldo uses an ATM (or a browser) to move some money from one account to another. These accounts may be in different branches of the same financial institution, or in different institutions altogether. It is never acceptable to Waldo for his money to disappear. Should Waldo ever doubt the safety of his money, he would probably switch financial institutions.

Waldo's money is represented by data in two databases that cooperate to ensure that the data they contain is always in a known and consistent state. That is, these two databases allow actions or tasks between them to occur within a common activity or work scope (see Figure 1). Put yet another way, a single transaction can manipulate data in both databases, and something will guarantee that only one of two possible outcomes occurs: either all the changes are made successfully or none of the changes are made at all.

Figure 1. Common activity encompasses various recoverable actions

The something that guarantees the common outcome of all the actions is a protocol supported by both databases, along with some supporting middleware. The protocol the databases use to keep data (such as Waldo's balances) coordinated is called two-phase commit, or simply 2PC. Our example uses a common variation of 2PC called presumed abort, where the default behavior in the absence of a successful outcome is to roll back, or undo, all actions in the activity.

From a programming perspective, there are different ways to specify that multiple actions should fall within the scope of a single transaction. One particularly clear way to specify transactional behavior is shown in Listing 1. The code is the small piece of logic running somewhere behind the ATM Waldo is using -- perhaps within the datacenter of one of the financial institutions involved. Clearly a lot is left out of Listing 1. We show only enough to illustrate the 2PC protocol being used to coordinate two actions: taking money out of one account and putting it into another account.

Listing 1. Pseudo-code for Waldo's transaction
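A minimal pseudo-code sketch of the kind of logic Listing 1 describes follows. The statement forms BeginTransaction, fromAcct -= amount, and CommitTransaction come from the surrounding text; the credit statement, the variable names, and the comments are filled in here for illustration.

```
BeginTransaction          // open a transactional work scope
    fromAcct -= amount    // recoverable action 1: debit the source account (Database-1)
    toAcct   += amount    // recoverable action 2: credit the target account (Database-2)
CommitTransaction         // request that both actions be made permanent
```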
Another way to specify that a transaction is needed is to use J2EE's Container Managed Transactions (CMT), but we use the code in Listing 1 for now because it is a clear and easy match for Waldo's simple transaction. With J2EE CMT, the BeginTransaction and CommitTransaction calls are implied automatically, without any explicit lines of code.
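For illustration only, here is a rough sketch of how such a CMT declaration might look in an EJB 2.x deployment descriptor (ejb-jar.xml). The bean name TransferBean and method name transfer are hypothetical and not part of the original article.

```xml
<!-- Illustrative fragment of ejb-jar.xml: the container, not the application code,
     begins and commits the transaction around the transfer method. -->
<assembly-descriptor>
  <container-transaction>
    <method>
      <ejb-name>TransferBean</ejb-name>      <!-- hypothetical bean name -->
      <method-name>transfer</method-name>    <!-- hypothetical method name -->
    </method>
    <trans-attribute>Required</trans-attribute>
  </container-transaction>
</assembly-descriptor>
```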
So what is a classic transaction?

Abstractly, a classic transaction is just a grouping of recoverable actions whose guaranteed outcome is that either all the actions are taken or none of them are (see Figure 1). For our simple purposes, a recoverable action is anything that modifies protected data. For example, taking money out of one of Waldo's accounts (fromAcct -= amount) is a recoverable action that can be reversed up to the end of the transaction.

In Waldo's case, his transaction comprises two actions: taking money out of one account and putting money into another account. It's okay for both of these actions to occur, and it's even okay if neither of them occurs. It's never okay for one action to occur without the other, which would corrupt the data and cause either Waldo's net worth or the bank's assets to disappear or appear from nowhere. Hence, both actions need to be within a single transaction with a single outcome: either both actions occur (a commit outcome), or neither action occurs (a rollback outcome).

Assuming no errors happen, the code in Listing 1 shows that a commit outcome is desired. The code could just as easily have specified rollback instead of commit (for when Waldo hits the <Cancel> key on the ATM), which means reverse all actions in the transactional work scope (between beginning and end). The transaction monitor, which is the underlying middleware helping the code in Listing 1 support transaction processing, would automatically specify rollback if the program suffered an unhandled exception. Such an automatic rollback on the part of the transaction monitor is a protection mechanism to make sure that data is not corrupted: even if the ATM application fails unexpectedly, the middleware will "clean up" and guarantee the outcome.

For this introductory paper, we ignore truly catastrophic events, such as one of Waldo's banks being entirely swallowed by a sinkhole, thereby precluding final outcome processing of his transaction. Really bad events, which are orders of magnitude less common than regular outcome processing, are the subject of something called heuristics. Heuristics are beyond our scope and are likely to involve human intervention at one of the banks (the one that didn't fall into a sinkhole).

Classical transaction processing by the numbers

Now let's see how one common variant of 2PC (presumed abort) can be used to carry out Waldo's transaction and move money from one account to another in a recoverable way. A key part of this illustration is to see that no matter what kind of failure occurs (ignoring sinkholes), data integrity is preserved and Waldo remains a loyal customer.

Figure 2 shows Waldo's transaction on a timeline with all of the interacting components needed to execute the logic shown in Listing 1. The ATM application itself is the top line. The next two lines represent the account databases that the application manipulates; the databases are the transactional participants. The next line is the transactional coordinator, the middleware that orchestrates the 2PC protocol. The state of the transaction dictates recovery processing in the event of a failure. The colored line at the very bottom indicates the state of Waldo's transaction at different points in time. The lines for Database-1, Database-2, and the Coordinator represent both time (flowing left to right) and some key records written to a recovery log.
The recovery log is used to ensure data integrity during recovery processing, which we illustrate later. Safety of the recovery log is vital; the log may even be specially protected using hardware if the data is important enough (and Waldo's money is very important). It is critical that the recovery log not be lost, so it is protected with redundancy and security commensurate with the value of the data (secured access, RAID storage, physically separated redundant storage, and so on).

Most optimizations, checkpoint processing (aggregating intermediate results so that recovery never needs to read the entire log), and fringe cases are beyond our scope. For example, we are a little liberal about what it means to write to the log. Some things must be written to the log before processing can continue, but many things can be written lazily. In the two cases where it really matters (numbered steps 12 and 13 below), we specify that the log is forced, meaning the writes occur before any further steps are taken.

We are using just one of many possible variations on 2PC for our example. Our databases use their logs for before and after images of data (Undo and Do records) and for state information. We could have simplified even further and skipped the Do records, but sizeable, high-performance databases would probably use them, so we do too (their usage becomes clear in the following steps).
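As a purely illustrative sketch (the record layout, the transaction identifier Tx42, and the account number are assumptions, not taken from this article), the kinds of records a database might write to its log for Waldo's debit could look something like this:

```
<Undo,  Tx42, Acct#1001, balance=500>   // before-image: enough to reverse the debit
<Do,    Tx42, Acct#1001, balance=400>   // after-image: enough to redo the debit
<State, Tx42, Prepared>                 // transaction state; forced to disk before
                                        // replying to the coordinator's prepare request
```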
Waldo's transaction on a timeline

Now let's walk through Waldo's transaction. Below, when we talk about the ATM application, take it to mean either the application itself or some middleware supporting the application. For example, when we say the application begins a transactional scope, it could be that middleware begins the transactional scope on behalf of the application.

Figure 2: Waldo's ATM transaction behind the scenes

Here is narration to help explain the numbered steps shown in Figure 2 (the application pseudo-code is shown in Listing 1):
Figure 2 illustrates that a lot goes on behind the scenes. Next we show how the recoverable resource managers (participants) in this example can recover from failures and ensure data integrity.
Recovery processing isn't really part of two-phase commit; rather, 2PC enables recovery. That is, because all the resource managers used 2PC, they are able to perform actions such as those described here and guarantee data integrity. Integrity is maintained even across databases by making sure that all the recoverable actions either go forward or go back.

Figure 3 is a copy of Figure 2 with three red vertical lines inserted. These red lines represent failures. We assume the worst possible failure that is not permanent: everything fails due to a massive, region-wide power failure. Smaller failures are mostly just a subset of this massive failure (for example, only the application fails, only a database fails, or only the coordinator fails). After the failure, the databases and the coordinator can eventually restart and use their recovery logs and recovery rules to ensure data integrity, as illustrated in Figure 3.

We won't care much about the ATM application itself. If the application alone fails, the middleware underneath it will drive rollback (assuming the application dies before saying Commit). If the application goes down and comes back up, it can be blissfully ignorant of what it was doing before the failure, since the middleware and databases guarantee data integrity. This makes the application itself vastly easier to write and maintain.

Data recovery uses rules based on the state of the transaction as recorded by the coordinator and the participants. Here are our recovery rules (for this example, using 2PC with presumed abort and our specific optimizations). After a failure:
Figure 3: Maintaining data integrity over failures
Failure 1: When Waldo returns to the ATM, all his accounts are in the same state they were in before he started the transaction. He thought the transaction was in progress when a massive power failure took out the ATM and the two banks he was transferring money between. He is relieved to find that his balances have not changed at all.
Failure 2: As with Failure 1, when Waldo returns to the ATM, his accounts are again in the same state as before he started the transaction that was interrupted by the massive power failure.
Failure 3: Once again, when Waldo returns to the ATM after the failure, his money is in a known state, but this time the transaction completed successfully (money was taken out of one account and put into the other). Waldo is very impressed with the quality of service provided by his financial institutions, which completed his transaction even through the massive power failure.

It's important to remember that this example used just one kind of 2PC with some specific optimizations -- and in fact some things were left out (for example, when can one of the databases forget about a transaction?). More optimizations and a lot of heuristic processing go on in any real system. A useful transactional protocol requires that all the recovery cases for failure be covered in some way, either in the protocol or in the participants, and be enabled by the protocol. This example illustrated only a few failure cases.
Mapping from classical transactions to Web Services Atomic Transactions

In Figures 2 and 3 we didn't mention how Database-1 contacted the Coordinator, nor did we specify how the Application called the databases. In fact, we didn't specify the mechanisms for anything to contact anything else. In the past, these were mostly non-universal mechanisms that sometimes worked only between certain combinations of entities (applications, resource managers, and coordinators or transaction monitors). The combination of Web services, Web Services Coordination (WS-C), and Web Services Atomic Transactions (WS-AT) maps all of the flows shown in Figure 2 and specifies precise communication mechanisms for achieving the same results. However, instead of working only between certain combinations, the Web services-based flows can work with just about anything.

Figure 4: Waldo's transaction revised

In Figure 4, the classic flows (Figures 2 and 3) are converted to Web services as follows. Significantly changed steps are numbered and described below. As before, when we say application, take it to mean the application or some helper middleware. Likewise, when we say database, it might mean the actual database or some helper middleware.
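To give a feel for what flows on the wire in the converted picture, here is a rough sketch of the kind of WS-Coordination context that the application's messages might carry so that the databases can register with the Coordinator. The element structure follows WS-C, but the namespace URIs, identifier, and registration address are illustrative and depend on the specification revision and deployment in use.

```xml
<!-- Illustrative only: a CoordinationContext SOAP header identifying the
     transaction (atomic transaction coordination type) and telling
     participants where to register with the Coordinator. -->
<wscoor:CoordinationContext
    xmlns:wscoor="http://schemas.xmlsoap.org/ws/2004/10/wscoor"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing">
  <wscoor:Identifier>urn:uuid:example-transaction-id</wscoor:Identifier>
  <wscoor:CoordinationType>
    http://schemas.xmlsoap.org/ws/2004/10/wsat
  </wscoor:CoordinationType>
  <wscoor:RegistrationService>
    <wsa:Address>https://coordinator.example.com/registration</wsa:Address>
  </wscoor:RegistrationService>
</wscoor:CoordinationContext>
```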
From Figure 4, it is clear that atomic transactions using Web services (WS-C and WS-AT) are substantially the same as atomic transactions without Web services (Figures 2 and 3, for example). The primary differences are almost cosmetic from the outside and involve how entities communicate with each other, not the substance of what they communicate. However, these differences in how the entities communicate have a big impact on flexibility and interoperability. You can achieve universal interoperability with Web services because, instead of changing resource manager X to interoperate with transaction monitor Y, you can change both X and Y to use Web services and then interoperate with many other resource managers and transaction monitors. So instead of two-at-a-time interoperability, or interoperability only within a specific kind of domain, n-way universal interoperability is possible.

Recovery processing using Web services between the interested parties is again the same as before Web services. Resource managers are the only ones that know their resources and how to commit them or roll them back. As an example, suppose Failure 1 (from Figure 3) happens, but after the conversion to Web services. Database-1 comes back up and, just as before Web services, it reads its log and realizes that it needs to contact the Coordinator. Information on how to contact the Coordinator is in the state saved on its recovery log; with Web services it is an endpoint reference (EPR, defined in WS-Addressing). Database-1 contacts the Coordinator at that EPR with a message defined in WS-AT called Replay. Replay causes the Coordinator to resend the last protocol message to Database-1, which lets Database-1 deduce the transaction state and then apply the appropriate recovery rule.
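Sketched as a message sequence, that recovery exchange might look roughly like the following. The transaction identifier Tx42 is hypothetical, and the exact message names and formats are defined by the WS-AT protocol rather than quoted from this article.

```
Database-1 restarts, reads its log, and finds an in-doubt transaction Tx42
Database-1 --> Coordinator EPR (from the log) : Replay for Tx42
Coordinator --> Database-1                    : resends its last protocol message
                                                for Tx42 (for example, Commit)
Database-1 applies its Do records for Tx42, releases its locks,
and acknowledges the outcome to the Coordinator (for example, Committed)
```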
Isolation and other transaction options

Data integrity depends on a set of actions atomically moving data from one well-defined state to another. Our example showed Waldo's transaction with two actions inside a common all-or-nothing activity: taking money out of one account and putting it into another account. What if Mrs. Waldo entered the picture by using an ATM across town at the same time and attempted to take money from one of the same accounts?

If Mrs. Waldo tries to take money out of the account held by Database-1 between numbered steps 4 and 16 of Figure 2, her attempted transaction might fail or might have to wait for Waldo's transaction to finish. The reason is that Waldo is manipulating that same account data in Database-1. The database record(s) for that account are locked (depending on the database management system, more than just a single balance might be locked). A timeline sketch of this contention appears at the end of this section.

Having the database records locked is both a good thing and a bad thing. It's good because data integrity must be preserved, so the actions representing Waldo's transaction need to complete (in other words, be isolated) before another transaction is allowed to manipulate the same data; otherwise the recovery steps needed to guarantee data integrity might be extremely complex. However, having the database records locked is also bad because, for example, Mrs. Waldo can't access the money. Holding locks for long periods can reduce the amount of concurrency that a database can support. Lack of concurrency inconveniences Mrs. Waldo, but it can be an even bigger concern in other usage scenarios.

How long is Mrs. Waldo blocked from the joint account? Suppose a failure occurs between numbered steps 4 and 16 of Figure 3. Database-1 records remain locked somewhere in that span, perhaps for an indeterminate length of time. Observe that access to Database-1 records can depend on the availability of Database-2 (because between steps 4 and 16 a lot of Database-2 processing also occurs while Database-1 locks are held). Mrs. Waldo is accidentally demonstrating a drawback of 2PC: database locks are needed to guarantee isolation, and this can have negative consequences for data access (and concurrency). This is one reason that 2PC is generally considered appropriate for more controlled environments, where the participants (such as Database-1 and Database-2) can be expected to behave according to relatively strict policies.

Relaxing isolation while still preserving transactional semantics (and eventual data integrity) would allow better flexibility, but 2PC (classic or Web services) is not well suited for this. Relaxed isolation, among other things, is the subject of a different Web services transaction specification aimed at more loosely controlled environments: Web Services Business Activity (WS-BA). WS-BA is outside the scope of this article, but will be the subject of a future article.

Most people will probably not use the various features defined in WS-C and WS-AT directly. Rather, middleware will make much of this processing transparent (and easier). However, it's nice to know what goes on behind the scenes, because it can help you use these facilities most efficiently, and because you never know -- you might need to write a resource manager someday.
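The following timeline is a rough sketch of the lock contention described above, not taken from the article; the timing points and the variable otherAmount are illustrative and assume the row-level locking behavior discussed earlier.

```
Time  Waldo's transaction (Database-1)       Mrs. Waldo's transaction (Database-1)
----  ---------------------------------      -------------------------------------
 t1   BeginTransaction
 t2   fromAcct -= amount   (row locked)
 t3   ... 2PC in progress, including         BeginTransaction
      Database-2 work and coordinator
      messages (roughly steps 4-16) ...
 t4                                          fromAcct -= otherAmount
                                             (blocks: the row is still locked)
 t5   CommitTransaction    (locks released)
 t6                                          proceeds (or has timed out and
                                             rolled back while waiting)
```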
We looked at classical transaction processing and how it enables data integrity across a number of actions by enforcing an all-or-none semantic on the set of actions. Transaction processing using Web services is logically very similar to classical transaction processing. There is a relatively straightforward mapping from classic to Web services transactions, and we mapped the Durable 2PC variety of Web Services Atomic Transaction as an illustration.