


Data Conversion and Oracle’s Bulk Processing

Several areas can complicate a timely system cutover, and data conversion is one of the most time consuming. As we bring our latest client onto MemberzPlus, the cutover involves staging, mapping, and converting the client’s data. During the conversion, we reviewed the code that queries, loops, inserts, updates, and deletes data to make sure it performs well. Using Oracle’s bulk processing greatly improved conversion performance, letting us retrieve and modify large volumes of data quickly. The key is to reduce context switching between Oracle’s PL/SQL and SQL engines.


A context switch is essentially a hand-off between the two engines. For example, a PL/SQL unit with no SQL statements is processed entirely by the PL/SQL engine. Otherwise, the PL/SQL engine passes each SQL statement to the SQL engine for execution. Once the SQL is processed, the SQL engine returns the result to the PL/SQL engine for further processing: a costly context switch. In our situation, we identified fetch/loop code processing millions of rows: a context switch for every row fetched.
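
To make that cost concrete, here is a minimal sketch (using the same client_customers and client_customers_dg tables that appear in the example later in this post) contrasting a classic row-by-row loop with a single set-based statement. In practice you would use one approach or the other; both are shown only for comparison.

begin
  -- Row-by-row: each loop iteration hands one INSERT to the SQL engine and then
  -- returns control to the PL/SQL engine, so there is a context switch per row.
  for rec in (select customer_id, membership_id from client_customers) loop
    insert into client_customers_dg (customer_id, membership_id)
    values (rec.customer_id, rec.membership_id);
  end loop;

  -- Set-based: the same work expressed as a single SQL statement, which means a
  -- single hand-off to the SQL engine regardless of the row count.
  insert into client_customers_dg (customer_id, membership_id)
  select customer_id, membership_id
  from client_customers;

  commit;
end;
/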

PL/SQL – SQL Details

PL/SQL unit: a function, procedure, package spec/body, trigger, type/type body, or PL/SQL block
SQL statement: DML and queries
PL/SQL engine: located in the database or in an application development tool (e.g., TOAD, Oracle Forms)
SQL engine: located in the database

[Diagram: PL/SQL – SQL Engines]

In some cases it may not be possible to remedy all row-by-row processing. But handling as much as possible directly in SQL, and refactoring classic cursor-for-loop code with the BULK COLLECT and FORALL clauses, avoids row-by-row processing by storing the result set in a PL/SQL collection and handing it to the SQL engine in batches. Be aware that a collection can grow large and, if not controlled, can hurt database performance by exhausting database session memory. To prevent a memory hit, use the LIMIT clause to control the number of records in each batch. If the overall result set is 1,000,000 records and LIMIT is set to 100, the work is sent in 10,000 batches of 100 records each: 10,000 hand-offs to the SQL engine instead of 1,000,000. Based on local testing, a LIMIT value of 100 seems to be a sweet spot for performance and memory management.
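
As a minimal sketch of that difference (again assuming the client_customers table from the example below), an unbounded BULK COLLECT pulls the entire result set into session memory at once, while adding LIMIT keeps the collection to a fixed batch size:

declare
  type id_tt is table of client_customers.customer_id%type index by binary_integer;
  ids id_tt;
  cursor c is select customer_id from client_customers;
begin
  -- Unbounded: every row is loaded into the collection, so session (PGA) memory
  -- grows with the size of the result set.
  select customer_id bulk collect into ids from client_customers;

  -- Bounded: at most 100 rows are held in the collection at any one time.
  open c;
  loop
    fetch c bulk collect into ids limit 100;
    exit when ids.count = 0;
    -- process the batch here (for example, with a FORALL statement)
    null;
  end loop;
  close c;
end;
/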

Bulk Processing Details

BULK COLLECT: retrieves a collection (multiple records of data) in a single fetch.
LIMIT: limits the number of records per batch when processing a collection.
FORALL: inserts, updates, or deletes an entire collection quickly.

Bulk Processing Example

create or replace procedure client_customer_bulk_test_dg is
 cursor customer_cursor is
  select customer_id, membership_id
  from client_customers c;

  --Declare the following:
  --a type of table based on the customer_cursor,
  --a collection of the table type used to store results,
  --a collection limiter: c_col_limit, to set the result-batch per collection.
  type customer_tt is table of customer_cursor%rowtype index by binary_integer;
  customer_col customer_tt;
  c_col_limit number := 100;
Begin
  --Open/loop/fetch c_col_limit records at a time and collect into the
  --customer_col collection. Note: customer_cursor_loop is not a row-by-row loop in the
  --classic sense as it works with the BULK COLLECT and LIMIT clauses, building the
  --result set collection 100 records at a time, in this case (c_col_limit = 100) until
  --all of the customer_cursor records are collected.
  open customer_cursor;
  <<customer_cursor_loop>>
  loop
    fetch customer_cursor bulk collect
    into customer_col limit c_col_limit;

      --Stop fetching when all records have been collected
      exit when customer_col.count = 0;

      --Insert collection data
      --FORALL, not a loop, but a statement to bundle all the DML (that would normally
      --process row-by-row) and send it to the SQL engine in batches of 100 records at
      --a time (c_col_limit = 100).
      begin
        FORALL indx IN 1 .. customer_col.COUNT --SAVE EXCEPTIONS
        insert into client_customers_dg(customer_id, membership_id)
        values(customer_col(indx).customer_id,customer_col(indx).membership_id)

          --track insert errors to clean and reprocess
          --Here, I'm using the DBMS_ERRLOG pkg for logging vs. SAVE EXCEPTIONS.
          --This DBMS_ERRLOG package allows DML operations to continue working properly
          --despite errors that might occur during the insert. Errors are inserted into
          --this table for future analysis and repair.
          log errors into err$_client_customers_dg('create_customer - dg_customer')
          reject limit unlimited;
      commit; --each batch of 100 records
    end;
  end loop customer_cursor_loop;
  commit; --just in case
  close customer_cursor;
End client_customer_bulk_test_dg;
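
One setup note: the err$_client_customers_dg table referenced by the LOG ERRORS clause has to exist before the procedure runs. A minimal sketch, assuming the default error-log table name that DBMS_ERRLOG generates:

begin
  -- Creates ERR$_CLIENT_CUSTOMERS_DG with Oracle's error-detail columns
  -- (ORA_ERR_NUMBER$, ORA_ERR_MESG$, ORA_ERR_TAG$, etc.) plus VARCHAR2 copies
  -- of the target table's columns.
  dbms_errlog.create_error_log(dml_table_name => 'CLIENT_CUSTOMERS_DG');
end;
/

-- After a conversion run, rejected rows can be reviewed (and reprocessed) using
-- the tag supplied in the LOG ERRORS clause.
select ora_err_number$, ora_err_mesg$, ora_err_tag$, customer_id, membership_id
from err$_client_customers_dg;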

Conclusion

Simple, efficient, and fast. The cursor-for-loop code ran for about 13 hours and inserted ~80,000 records. After refactoring with bulk processing, we now see 9.5+ million records inserted/updated in about two hours.

Go-Live approaches with a little less stress.

Credits:


Don Garrett
Senior Systems Engineer

Mr. Garrett joined the Ross Group Veterinary Solutions Division in 2002 as an experienced Oracle and PL/SQL developer. He has been a technical lead handling systems and application life-cycle management at numerous large veterinary teaching hospitals and diagnostic labs, and now handles systems administration for MemberzPlus. Mr. Garrett has over 25 years of IT experience in accounting, banking, telecom, health, and memberships. He holds a Bachelor of Science degree in Management Information Systems from the University of West Florida.