I've moved...

This blog now has a new home - please update your shortcuts and readers to: www.jeffkemponoracle.com. Sorry for any inconvenience!

Friday, October 21, 2005

Avoiding Lost Updates: Protecting Data in a Multi-user environment

There is a particular problem with Oracle and other databases where access to data is not serialized (by default), and there are a number of ways this problem is generally dealt with. The problem crops up when you have many users, some of whom may wish to update the same data at around the same time.

[Full Article]

9 comments:

  1. 10g offers ORA_ROWSCN (either row or block dependencies) automatically. This would work the same as the timestamp but without the need for an explicit timstamp column or reliance that the update does actually set it.

    Row dependencies would take a little more storage. Block dependency would give a few false positives as it is only able to say whether one or more records on the block has been updated and can't say which one.

    PS. I think, when you mentioned the poorly designed app, you meant to say "readers block readers" instead of "writers block readers" as you go on to say that the locks are taken when the rows were read

    ReplyDelete
  2. Hi Gary,

    Thanks for the note about ORA_ROWSCN - I've never used this pseudocolumn before and so hadn't considered it for this purpose. I would probably use the ROWDEPENDENCIES option - and the extra 6 bytes is more than compensated by the removal of the timestamp or version columns.

    You're also right about the poorly designed app - readers indeed blocked readers.

    Cheers
    Jeff

    P.S. I'm relatively new to this blogging thing - can you tell me how you found my blog?

    ReplyDelete
  3. Noticed some other references to this usage of ORA_ROWSCN, including:

    "Optimistic Locking in 10G" (Ask Tom)
    http://asktom.oracle.com/pls/ask/f?p=4950:9:8443264549509927838::NO:9:F4950_P9_DISPLAYID:22948373947565

    ReplyDelete
  4. Hi Jeff,

    This topic has moved to:

    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:22948373947565


    Great post, btw :)

    ReplyDelete
  5. Hi Jeff,

    Thanks for this post, and although it was many moons ago, I thought I'd add a thought.

    Column Compare is a superior method to the others because it allows dev teams over years and staff changes to still executed OCC correctly (provided they've been taught how to do Column Compare):

    All method except #1 involve a vector-valued function from f(columns to be updated) -> vector-field, and differ only in the vector-field codomain.

    - column compare: f(cols to be updated) -> (cols to be updated)
    - hash: f(cols to be updated) -> long
    - timestamp: f(cols to be updated) -> datetime
    - version: f(cols to be updated) -> long

    In a dev team, suppose all devs are taught to use column compare. Then N different code areas that overlap in the data being written, each with N different functions but s.t. the (cols to be updated) vectors have a non-empty intersection, will obey OCC.

    Here's a simple ex of how Version Compare fails easily:

    - TxA checks (colA, colB) with (version1) and TxB checks (colB,colC) with (version2), then a lost update occurs when:
    - TxB reads (colB,colC)
    - TxA read (colA,colB)
    - TxA commits (colA,colB) after checking (version1)
    - TxB commits (colB,colC) after checking (version2).
    The colB update from TxA has been lost.

    If both Tx's were using Column Compare, then TxB would not have committed since colB had changed since its read.

    Dev teams are always changing staff and knowledge. It's very unlikely that Dev2 understands the version locking used by Dev1 at any given time. But if all use Column Compare, checking for all fields they're about to write, they remain OCC-correct.

    ReplyDelete
  6. Hi jd,

    Thanks for your thought!

    It certainly is a good idea for all developers to understand how their system operates in this regard. I don't quite agree with the rest of your comment though.

    "Column Compare is a superior method to the others because it allows dev teams over years and staff changes to still executed OCC correctly (provided they've been taught how to do Column Compare)"

    If all your dev teams understand Column Compare, that may certainly be a reasonable argument for choosing that method, but it isn't the only criterion. It might be ok for you, but what about my dev team who are more accustomed to Version compare? Are you arguing for an industry-wide standard?

    I suggest that the business requirements need to be considered as well.

    "If both Tx's were using Column Compare, then TxB would not have committed since colB had changed since its read."

    No, as long as the Version Compare method was implemented correctly, TxB would still not have committed because the version number on the row would have been changed by TxA. All except method #1 (Last Update Wins) will protect against lost updates, as long as they are implemented correctly.

    ReplyDelete
  7. Thanks Jeff, I agree with you that all except #1 will prevent lost updates if implemented correctly, and I was not suggesting we standardize on Column Compare. I suggested that the latter is easiest to communicate to a team "read what your going to write, and use UPDATE..new ..WHERE..old=new". But you're right, version or hash are really no more complicated to teach and maintain over staff changes.

    One question for you and your readership - I've seen a Version table used, with cols for the versions, apart from the tables being updated - ie the 'version' col is not co-located on the table being updated. This was likely done as a strictly additive delta to the schema, to avoid having to add version columns to data tables.

    What is your opinion on this design? It seems lost updates are still possible (since row-level locks in the UPDATE..WHERE cannot be used to lock the row while validating on the version and writing). It seems that unless a critical section in the app code synch's all threads trying to write, there is a chance for a lost update. The 4 schemes we are speaking about in this article use UPDATE..WHERE on the row being written, and so validation and writing happen under the same row lock. Do we need to underline that the version must be on the row, so as to use the same row lock?

    Here's a design for your evaluation (assume READ_COMMITTED isolation):
    1. within txA, read the version from the version table's correct column
    2. in the UPDATE, use a subquery in the WHERE clause:
    UPDATE..SET... WHERE versionReadin#1 = (SELECT versionCol FROM VERSIONTABLE);

    Question - does that subquery within an update cause a row-lock to be placed on the VERSIONTABLE as well (even though reading, since within an UPDATE)? If so, I can see how a version table can be used. Else, I suspect a lost udpate can still occur. What SQL is optimal in using a disjoint version table for versioning?

    thanks in advance...

    ReplyDelete
  8. I also would be interested in what other ideas people have on the concept of a separate "versions" table.

    If the version number is on a separate table for some reason, it would mean the UPDATE must update both tables atomically, otherwise, as you say, updates will be lost.

    For this to work the row in the "versions" table must be updated at the same time as the row in the data table. This could be done in one statement by updating via a view. This update might require the use of a instead-of update trigger. If not in the same statement, the row in the versions table would require an explicit exclusive lock prior to doing the update on the data table. And, since in Oracle readers do not block writers, just doing a subquery on the versions table will not lock the row.

    In addition, the "versions" table would need a foreign key to the data table, and each data table to be protected would need its own "versions" table (unless it uses some sort of "generic" ID for the reference, which means it can no longer constrain referential integrity).

    Seems to me that this architecture requires a great deal of additional complexity with little gain. For the sake of not adding a single column, we'd have to introduce a whole lot of coding to ensure it works correctly. How hard is it to add a column to a table? It's probably more a political question - in some places it is harder to get one additional column approved by all the layers of architects and DBAs, than to get approval to change the code.

    If the designer doesn't want an additional "version" column on each table, I'd suggest they use one of the other methods (e.g. ROW_SCN, described in the next post after this one) that give the same benefit, are relatively simple to implement, but don't require any additional columns.

    ReplyDelete
  9. Sorry, I mean to mention the ORA_ROWSCN method as described in Gary's comment, above.

    ReplyDelete

Note: only a member of this blog may post a comment.