Data Conversion and Oracle’s Bulk Processing
Several areas can complicate a timely system cutover, and data conversion is one of the most time-consuming. As we bring our latest client onto MemberzPlus, the cutover involves staging, mapping, and converting the client’s data. During that process, we inspected the code that queries, loops, inserts, updates, and deletes data to wring out the best possible performance. Using Oracle’s bulk processing greatly improved conversion performance, letting us retrieve and modify large volumes of data quickly. The key is to reduce context switching between Oracle’s PL/SQL and SQL engines.
A context switch is essentially a processing hand-off. A PL/SQL unit containing no SQL statements is processed entirely by the PL/SQL engine. Otherwise, the PL/SQL engine passes each SQL statement to the SQL engine for execution, and once the SQL is processed, the SQL engine returns the result to the PL/SQL engine for further processing: a costly context switch. In our situation, we identified fetch/loop code processing millions of results: a context switch for every row fetched.
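To make the contrast concrete, here is a minimal, illustrative sketch of the kind of explicit fetch/loop code described above (not our actual conversion code; the client_customers and client_customers_dg tables and their customer_id and membership_id columns are borrowed from the bulk processing example later in this article). Every single-row fetch and every single-row insert is handed from the PL/SQL engine to the SQL engine and back.

--Row-by-row pattern: one fetch and one insert per customer record.
declare
cursor customer_cursor is
select customer_id, membership_id
from client_customers;
customer_rec customer_cursor%rowtype;
begin
open customer_cursor;
loop
fetch customer_cursor into customer_rec; --one context switch per row fetched
exit when customer_cursor%notfound;
insert into client_customers_dg(customer_id, membership_id) --another switch per row inserted
values(customer_rec.customer_id, customer_rec.membership_id);
end loop;
close customer_cursor;
commit;
end;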
PL/SQL – SQL Details
PL/SQL unit: a function, procedure, package spec/body, trigger, type/type body, or anonymous PL/SQL block
SQL statement: DML statements and queries
PL/SQL engine: located in the database or in an application development tool such as TOAD or Oracle Forms
SQL engine: located in the database
PL/SQL - SQL Engines
In some cases it may not be possible to remedy all row-by-row processing. But handling as much as possible within SQL, and refactoring as much classic cursor-for-loop code as possible with the BULK COLLECT and FORALL clauses, avoids row-by-row processing: the result set is stored in a PL/SQL collection and the DML is sent to the SQL engine in batches. Be aware that a collection can grow large and, if not controlled, can hurt database performance by exhausting session memory. To prevent that, use the LIMIT clause to manage the number of records in each batch. If the overall result set is 1,000,000 records and LIMIT is set to 100, the data moves in 10,000 batches of 100 records: 10,000 context switches instead of 1,000,000. Based on local testing, a LIMIT value of 100 seems to be a sweet spot for performance and memory management.
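As a quick illustration of that arithmetic, the following anonymous block (a minimal sketch, reusing the client_customers table from the example below) fetches 100 rows at a time and counts the non-empty batches; for 1,000,000 rows it reports 10,000 batches.

--Count how many bulk fetches it takes to walk the table 100 rows at a time.
declare
cursor customer_cursor is
select customer_id, membership_id
from client_customers;
type customer_tt is table of customer_cursor%rowtype index by binary_integer;
customer_col customer_tt;
c_col_limit constant pls_integer := 100;
l_batches pls_integer := 0;
begin
open customer_cursor;
loop
fetch customer_cursor bulk collect into customer_col limit c_col_limit;
exit when customer_col.count = 0;
l_batches := l_batches + 1; --one context switch per batch of up to 100 rows
end loop;
close customer_cursor;
dbms_output.put_line('Batches fetched: ' || l_batches);
end;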
Bulk Processing Details
BULK COLLECT: retrieves a collection (multiple records of data) in a single fetch
LIMIT: caps the number of records per batch when filling a collection
FORALL: applies the inserts, updates, or deletes for an entire collection in bulk
Bulk Processing Example
create or replace procedure client_customer_bulk_test_dg is
cursor customer_cursor is
select customer_id, membership_id
from client_customers c;
--Declare the following:
--a type of table based on the customer_cursor,
--a collection of the table type used to store results,
--a collection limiter: c_col_limit, to set the result-batch per collection.
type customer_tt is table of customer_cursor%rowtype index by binary_integer;
customer_col customer_tt;
c_col_limit number := 100;
Begin
--Open/loop/fetch c_col_limit records at a time and collect into the
--customer_col collection. Note: customer_cursor_loop is not a row-by-row loop in the
--classic sense as it works with the BULK COLLECT and LIMIT clauses, building the
--result set collection 100 records at a time, in this case (c_col_limit = 100) until
--all of the customer_cursor records are collected.
open customer_cursor;
<<customer_cursor_loop>>
loop
fetch customer_cursor bulk collect
into customer_col limit c_col_limit;
--Stop fetching when all records have been collected
exit when customer_col.count = 0;
--Insert collection data
--FORALL, not a loop, but a statement to bundle all the DML (that would normally
--process row-by-row) and send it to the SQL engine in batches of 100 records at
--a time (c_col_limit = 100).
begin
FORALL indx IN 1 .. customer_col.COUNT --SAVE EXCEPTIONS
insert into client_customers_dg(customer_id, membership_id)
values(customer_col(indx).customer_id,customer_col(indx).membership_id)
--track insert errors to clean and reprocess
--Here, I'm using the DBMS_ERRLOG pkg for logging vs. SAVE EXCEPTIONS.
--The DBMS_ERRLOG approach allows the DML to keep running despite errors that
--occur during the insert. Failed rows are written to the err$_client_customers_dg
--error log table for later analysis and repair.
log errors into err$_client_customers_dg('create_customer - dg_customer')
reject limit unlimited;
commit; --each batch of 100 records
end;
end loop customer_cursor_loop;
commit; --just in case
close customer_cursor;
End client_customer_bulk_test_dg;
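One setup note: the LOG ERRORS clause requires a DML error log table to exist before the procedure runs. Below is a minimal sketch of that setup and a test run; DBMS_ERRLOG's default naming (ERR$_ plus the table name) produces the err$_client_customers_dg table the procedure references, and the ORA_ERR_% columns shown are the standard ones Oracle generates.

--Create the DML error log table for client_customers_dg (defaults to err$_client_customers_dg).
begin
dbms_errlog.create_error_log(dml_table_name => 'CLIENT_CUSTOMERS_DG');
end;
/

--Run the conversion procedure.
begin
client_customer_bulk_test_dg;
end;
/

--Review rejected rows for cleanup and reprocessing.
select ora_err_number$, ora_err_mesg$, ora_err_tag$, customer_id, membership_id
from err$_client_customers_dg;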
Conclusion
Simple, efficient, and fast. The cursor-for-loop code ran for about 13 hours and inserted ~80,000 records. After refactoring with bulk processing, we now see 9.5+ million records inserted/updated in about two hours.
Go-Live approaches with a little less stress.
Credits:
Don Garrett
Senior Systems Engineer
Mr. Garrett joined the Ross Group Veterinary Solutions Division in 2002 as an experienced Oracle and PL/SQL developer. He has served as a technical lead handling systems and application life-cycle management at numerous large veterinary teaching hospitals and diagnostic labs, and now handles systems administration for MemberzPlus. Mr. Garrett has over 25 years of IT experience in accounting, banking, telecom, health, and memberships. He holds a Bachelor of Science degree in Management Information Systems from the University of West Florida.