Table-Valued Parameters (TVPs) are an alternative, recommended approach to overcome the performance issues of the simple JDBC batch approach in SQL Server (mssql-jdbc.jar).
How does regular batching work in the MS SQL Server implementation?
First, we must understand that a network round trip is a costly activity: every database operation issued from the application involves communication between the application and the database server. In bulk DDL or DML workloads this adds up to a huge amount of time, both because of these network exchanges and because of the commit performed on the database for every record (SQL Server auto-commits by default).
This can be avoided by using Batching. Batch Processing allows you to group related SQL statements into a batch and submit them with one call to the database. When you send several SQL statements to the database at once, you reduce the amount of communication overhead, thereby improving performance.
Steps to follow for regular JDBC batching (a sketch follows the list):
- First, set auto-commit to false on the connection.
- Instead of executeUpdate(), use addBatch().
- Once your batch size is reached, call executeBatch().
- After the batch execution, call clearBatch().
- Commit.
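For reference, here is a minimal sketch of the regular JDBC batching flow described above. The table name TVPEXAMPLE, its columns, and the connection URL are assumptions for illustration; adapt them to your environment.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchInsert {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - replace with your own server/credentials.
        String url = "jdbc:sqlserver://localhost:1433;databaseName=testdb;user=sa;password=secret";
        String sql = "INSERT INTO TVPEXAMPLE (name, age, sex, address, pin) VALUES (?, ?, ?, ?, ?)";
        int batchSize = 500;

        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement ps = con.prepareStatement(sql)) {

            con.setAutoCommit(false);                // 1. disable auto-commit

            for (int i = 1; i <= 100_000; i++) {
                ps.setString(1, "name-" + i);
                ps.setInt(2, 30);
                ps.setString(3, "M");
                ps.setString(4, "addr-" + i);
                ps.setString(5, "500001");
                ps.addBatch();                       // 2. queue the row instead of executeUpdate()

                if (i % batchSize == 0) {
                    ps.executeBatch();               // 3. send the batch once the batch size is reached
                    ps.clearBatch();                 // 4. clear the queued statements
                }
            }
            ps.executeBatch();                       // flush any remaining rows
            con.commit();                            // 5. commit
        }
    }
}
```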
This is supposed to send all the batched operations in one shot; however, on the database side we found that it still executes each and every record one by one, which degrades performance. The behaviour remained the same even after applying "SET NOCOUNT ON".
Here is the SQL Server Profiler trace showing the queries executed during the JDBC batch run. [click here for the traces]
To overcome this, Microsoft came up with the Table-Valued Parameter.
With a table-valued parameter, instead of adding each record to a batch, we add it to a SQLServerDataTable (the in-memory equivalent of a user-defined table type in the database) and pass the SQLServerDataTable to the query as a single query parameter.
Steps to follow (a sketch follows the list):
1. First, set auto-commit to false on the connection.
2. In the query, instead of keeping individual columns as query parameters, keep only one query parameter, to which the SQLServerDataTable will be passed.
3. Create a user-defined table type in the database for the desired table.
4. Create a SQLServerDataTable with the same schema as the table type created in the database, using the addColumnMetadata() method.
5. Add every record to the SQLServerDataTable using the addRow() method.
6. Using the setStructured() method on SQLServerPreparedStatement, bind the SQLServerDataTable and the table type name to the query parameter. The driver uses the SQLServerDataTable and the user-defined table type name to pass the bulk data from the application to the database server in one shot and insert it into the table.
7. Execute the statement and clear the SQLServerDataTable object.
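As mentioned in the list above, here is a minimal sketch of the TVP-based insert. The table TVPEXAMPLE, the user-defined table type name TVPEXAMPLE_TYPE, the column types, and the connection details are assumptions for illustration; adapt the names and schema to your own database.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Types;

import com.microsoft.sqlserver.jdbc.SQLServerDataTable;
import com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement;

public class TvpBatchInsert {

    // Step 4: build an empty in-memory table whose schema mirrors the
    // user-defined table type; column names/types here are assumptions.
    private static SQLServerDataTable newDataTable() throws Exception {
        SQLServerDataTable table = new SQLServerDataTable();
        table.addColumnMetadata("name", Types.VARCHAR);
        table.addColumnMetadata("age", Types.INTEGER);
        table.addColumnMetadata("sex", Types.VARCHAR);
        table.addColumnMetadata("address", Types.VARCHAR);
        table.addColumnMetadata("pin", Types.VARCHAR);
        return table;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection details - replace with your own server/credentials.
        String url = "jdbc:sqlserver://localhost:1433;databaseName=testdb;user=sa;password=secret";

        // Step 3: one-time DDL, run separately in the database (example definition):
        // CREATE TYPE TVPEXAMPLE_TYPE AS TABLE
        //   (name VARCHAR(100), age INT, sex VARCHAR(10), address VARCHAR(200), pin VARCHAR(10));

        // Step 2: a single parameter receives the whole data table.
        String sql = "INSERT INTO TVPEXAMPLE SELECT * FROM ?";
        int batchSize = 500;

        try (Connection con = DriverManager.getConnection(url);
             PreparedStatement ps = con.prepareStatement(sql)) {

            con.setAutoCommit(false);                                          // step 1
            SQLServerPreparedStatement tvpPs = ps.unwrap(SQLServerPreparedStatement.class);

            SQLServerDataTable dataTable = newDataTable();
            for (int i = 1; i <= 100_000; i++) {
                dataTable.addRow("name-" + i, 30, "M", "addr-" + i, "500001"); // step 5

                if (i % batchSize == 0) {
                    tvpPs.setStructured(1, "TVPEXAMPLE_TYPE", dataTable);      // step 6
                    tvpPs.execute();                                           // step 7
                    dataTable = newDataTable();  // start a fresh table for the next chunk
                }
            }
            con.commit();
        }
    }
}
```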
Unlike JDBC batching, this reduces the number of network exchanges.
Here is the SQL Server Profiler trace showing the queries executed during the TVP batch run. [click here for the traces]
Runtimes for inserting 100,000 records with a batch size of 500:
JDBC batching:
Insert statement: INSERT INTO TVPEXAMPLE (name,age,sex,address,pin) values ( ?, ?, ?, ?, ?);set nocount on
Time taken for batch execution: 1557 ms
TVP batching:
TVP insert statement: INSERT INTO TVPEXAMPLE SELECT * FROM ?; set nocount on
Time taken for TVP execution: 1056 ms
You can see an improvement of roughly 32% (1056 ms vs. 1557 ms) when the batch is done with a TVP object.
However, Oracle batching is still much faster than the TVP batching approach in SQL Server, because of Oracle's driver implementation.
You can download the project from here.