Workload Simulation With SQL Data Generator

Patients of Dr.Psycho are glad he performs simulated operations

Patients of Dr.Psycho are glad he performs simulated operations

Introduction

In college one of my mentors was a jedi  master of simulations and testing.  The guy could write a simulation for just about anything you can think of, and with anything you can think of.  I’m pretty sure that he once made a neural network for simulating dinosaur procreation rates…using sticks and mud (true story).  Simulating workloads can help you thoroughly test ideas and present data to management that will allow them to make informed decisions.  IRL, your career will be much happier if you can test your ideas BEFORE deploying them to production.  No one is asking you to have good ideas all the time, but the business is depending on you to come up with ideas that will work as intended.  The only way to accomplish this responsibly, is to test.

For the sake of this post, let’s start with the wiki definition of Simulation Software, “essentially, a program that allows the user to observe an operation through simulation without actually performing that operation.”   In our case, let’s go ahead and revise that a bit so that the simulation software will allow us to observe how the system reacts without actually performing the operation…in production.

DBAs are known to be some of the most creative and awesome people in the world, and because of this we are usually just thinking up cool shit all the time!  As I’m sitting here typing, I’m thinking about how telekinesis would be so rad.  I can easily hypothesize about the good (telekinetic levitation from the couch to the fridge), and I can just as easily hypothesize about the bad (actually, it’s not that easy, I can’t think of anything that could go wrong)  Clearly, if I had the ability to simulate telekinetic situations, I could make a better decision before borrowing some powers from my friend.

The Red Gate SQL Data Generator [SDG] is a great tool for  generating realistic test data.  I use this tool all the time for populating tables in order to check out growth rates, or just to have some data to test with.  It is fast, and extremely easy to use.  One day while I was thinking up some cool shit, I thought…Wouldn’t it be like whoa if I could use SDG to simulate workloads on various designs so that I could observe the outcomes?!?

TO THE BATCAVE!!!

Time to make SCIENCE!

Time to make SCIENCE!

The Lab

Using Hyper-V, SQL Server 2012, SDG, andPoSh I was able to construct a system for performing inserts, selects, updates and deletes on a system.  I chose PoSh because I’m a big fan of the scripting language.  I pretty much believe you can automate anything you can imagine on Windows using PoSh.  The lab setup can be duplicated with the following walk through.

First create as many sql users as you want for the workload.  Use some T-SQL or PoSh, or click if you prefer.  I named my users rg_vu and set all the passwords to testing (so please don’t hax my vms).  I created 5 total.

Next you will want to create a SDG project.  For this experiment I created 3 unique projects each with their own number of rows.  I used 100k, 500k, and a milli.

RGDG_Screen1

I then edited the project options and changed the amount of rows per batch, I felt this would add a little character to each project so  I went with 10, 100, and 1000.  I do this while cooking too, sometimes I throw a little of this in, sometimes a little of that.  I follow my nose and eventually it’s a masterpiece.

RGDG_Screen_4

At this point you kind of see where this is going.  Each project is now generating stuff uniquely, but that is just insert traffic 🙁  We need to spice this bad boy up and not only generate stuff, but also DO stuff!  This is where you can use SDG pre and post scripts.  Since I wanted to get as close to an active system as I could, I whipped up 3 quick loops (you can cut n paste these)

This statement runs in the pre (helping to keep the size of the database down)

DELETE TOP (50) PERCENT
FROM
LoadLab.dbo.BeatMe

This statement runs in the post (updating the data, like trifling applications do)

UPDATE dbo.BeatMe SET beatMeDate = (GETDATE() - ABS(CHECKSUM(NEWID())) % 25)
WHERE beatMeId IN (SELECT TOP (25) PERCENT beatMeId FROM dbo.BeatMe);

This loop runs in the post as well (selecting the data, like trifling users do)

DECLARE @n INT;
SET @n = ABS(CHECKSUM(NEWID())) % 25;
DECLARE @sqlstmt NVARCHAR(MAX);
SET @sqlstmt = 'SELECT TOP ' + CAST(@n AS NVARCHAR(MAX)) + '* FROM LoadLab.dbo.BeatMe';
DECLARE @counter INT;
SET @counter = 0;
WHILE @counter < 25
BEGIN
EXEC
sys.sp_executesql @stmt = @sqlstmt;
SET @counter = @counter + 1;
END

RGDG_Screen2

Ok, now that your SDG projects are all setup and ready to serve, you need a way to execute them.  Thanks to PoSh this is not a problem, although it is rather cumbersome.  First I created a single ps1 script that executes the command line for SDG.  The command I used specifies the server, database and user (this way I can run the same projects with multiple users in hopes of laying a true smack down on the db).   I also named my scripts with the postfix of _vu, just so I can keep things organized.

&’C:\Program Files (x86)\Red Gate\SQL Data Generator 3\SQLDataGenerator.exe’ /project:”C:\Lab\BeatMeBabyBear.sqlgen” /server:sandbox-sql1 /database:LoadLab /username:rg_vu1 /password:testing

After creating <n> scripts, I created a master script that calls each script through Start-Job and then sleeps in-between so that they each start-up a different times.

Start-Job -filepath C:\Lab\BeatMeBabyBear_vu1.ps1
Start-Sleep -Seconds 3
Start-Job -filepath C:\Lab\BeatMeMommaBear_vu2.ps1
Start-Sleep -Seconds 2
Start-Job -filepath C:\Lab\BeatMePapaBear_vu3.ps1
Start-Sleep -Seconds 1
Start-Job -filepath C:\Lab\BeatMeBabyBear_vu4.ps1
Start-Sleep -Seconds 5
Start-Job -filepath C:\Lab\BeatMePapaBear_vu5.ps1

The Results

Once debugging was completed, I was able to test the process and see the load being generated via Task Manager, Resource Monitor, Perfmon, and SQL Profiler.  This was the intended result of the experiment, so I viewed it as a success. The CPU spiked, and profiler scrolled like a madman.  Indexes on the table showed updated stats, and triggers fired as intended.  I now have a way to generate workloads for various ideas, and test how they will perform.  One of the things I noted was that the SDG uses ‘insert bulk’ so at this time I’m unsure how this affects the simulation.

The experiment also helped me come up with more features I would like to see in SDG, or that I will need to hack myself.   First, I really wish SDG had a random rows setting.  I would have loved to put in a min/max value for the amount of rows and let it run.  Second, if the software offered a way to run scripts after rows had been inserted, then we could run the selects, updates and deletes in a more erratic fashion, better yet a run scripts after rows are inserted.  Third, if Red Gate would open up the API for the SDG , someone could probably make this happen.

At this time it is possible to edit the .gen files in notepad.  They use xml so some possible future work could be to automate the scripting of the above steps using the xml. After I had completed this experiment, @Kendra_Little informed me that HammerDB can be scripted to hit user databases using TCL.  Since HammerDB is designed to simulate workloads I think my time is better spent learning how to integrate it in with my tests.

Cheers

5 comments

    1. Just FYI, it’s been pointed out to me that one way to generate a massive INSERT script in the meantime would be to point SQL Data Compare at the database that SDG has filled – and the Comparison tools *do* have an API via the Comparison SDK 🙂

      1. Thanks Chris. I am familiar with the Compare API (which is why I thought it would be cool to have a SDG API).

        So you are saying;
        1. Create 2 tables table1 and table1copy
        2. Generate data into table1copy (a milli rows)
        3. Compare the original and the copy to generate the insert script?

        It’s definitely an idea, but I’m not sure how it would fit in with the experiment I ran.

        1. Yup, that’s what I’ve heard suggested at this stage. You might get some more suggestions via the feature request forum. I’ll let you know if I hear anything more at this end!

Leave a Reply

Your email address will not be published. Required fields are marked *