Welcome to Freedom Technologies, In this post, we’ll discuss inserting large amounts of random data into SQL tables for performance testing in the next post we will use these large tables to test the performance of queries that use Joins and subqueries.
How populate sql table with random data?
-- If Table [TblStudent] and [TblDonation] exists then drop the tables If (Exists (select * from information_schema.tables where table_name = 'TblStudent')) Begin Drop Table TblStudent End If (Exists (select * from information_schema.tables where table_name = 'TblDonation')) Begin Drop Table TblDonation End -- Recreate [TblStudent] and [TblDonation] tables CREATE TABLE [dbo].[TblStudent]( [Id] [int] IDENTITY(1,1) NOT NULL, [StudentName] [nvarchar](max) NOT NULL, [FatherName] [nvarchar](max) NOT NULL, CONSTRAINT [PK_TblStudent] PRIMARY KEY CLUSTERED ( [Id] ASC ) ) CREATE TABLE [dbo].[TblDonation]( [Id] [int] IDENTITY(1,1) NOT NULL, [StudentId] [int] NULL, [DonationAmount] [float] NULL, CONSTRAINT [PK_TblDonation] PRIMARY KEY CLUSTERED ( [Id] ASC ) ) GO --We are Inserting Sample dummy data into the TblStudent table Declare @Id int Set @Id = 1 While(@Id <= 200000) Begin Insert into TblStudent values('Student Name - ' + CAST(@Id as nvarchar(20)), 'Father Name - ' + CAST(@Id as nvarchar(20))) Print @Id Set @Id = @Id + 1 End -- Declare variables to hold a random StudentId from the TblStudent table, -- DonationAmount declare @RandomStudentId int declare @RandomDonationAmount float -- Declare and set variables to generate a -- random StudentId between 1 and 200000 declare @UpperLimitForStudentId int declare @LowerLimitForStudentId int set @UpperLimitForStudentId = 1 set @LowerLimitForStudentId = 200000 -- Declare and set variables to generate a -- random UnitPrice between 10 and 500 declare @UpperLimitForDonationAmount int declare @LowerLimitForDonationAmount int set @LowerLimitForDonationAmount = 10 set @UpperLimitForDonationAmount = 500 --Insert Sample data into TblDonation table Declare @Counter int Set @Counter = 1 While(@Counter <= 500000) Begin select @RandomStudentId = Round(((@UpperLimitForStudentId - @LowerLimitForStudentId) * Rand() + @LowerLimitForStudentId), 0) select @RandomDonationAmount = Round(((@UpperLimitForDonationAmount - @LowerLimitForDonationAmount) * Rand() + @LowerLimitForDonationAmount), 0) Insert into [TblDonation] values(@RandomStudentId,@RandomDonationAmount) Print @Counter Set @Counter = @Counter + 1 End
There are three parts to the SQL script.
The first part will actually check if the tables exist, TblStudent and TblDonation
If the tables exist, they will be dropped. And then the second part, we recreate those tables and the final part will insert random sample data into those tables. Let’s see how we are checking for the existence of those tables and dropping them.
So to check the existence of tables, I’m using the system table called information_schema.tables. Let’s see what this table contains. So let’s execute query, select * from information_schema.tables
And look at this. This query has returned me with a list of table names. Where are these tables present? These tables are present in the TestDatabase because we executed a query within the context of the TestDatabase database.
I got all the tables that are present within that TestDatabase database and we are checking here if the table name is TblStudent. So do we have TblStudent within information_schema.tables? Yes, and we are passing that to Exists() function.
Exist() function will return true if the table exists, otherwise, it’s going to return false. So if this function returns true, that means the table is present. We want to drop it. So drop table TblStudent.
then the second part will actually recreate those tables, create table TblStudent and TblDonation table. and the important thing to notice here is that StudentId is a foreign key. So this is referring to the ID column in the TblStudent table.
So the third and final step is to actually insert sample data, inserting sample data into tables. It’s very simple and straightforward.
So if you look at the TblStudent table, it has got three columns within that id, StudentName , FatherName Id column is the identity column.Meaning when we insert a row into table TblStudent, we don’t have to supply a value for the identity column.
We only need to supply values for name and description columns which are of type and [nvarchar]. so the name is going to be dynamically computed. So the name of the student, since this random data I’m going to call my student says student1,student2,student3 …Studentn, etc.. Similarly, the father’s name description is going to be Father Name 1, Father Name 2, etc.
So while we declare a variable of type integer @Id and the @Id is initialized to 1, so while that @Id is less than or equal to 200000, which means we are going to loop through 200000 times, I’m going to insert 200000 students into the TblStudent table. So what we are doing insert into TblStudent values. So first we need to supply the value for the name column and then for the first name.
And to that what we are doing, we are concatenating the @Id value first-time @Id value will be 1. We are converting that to nvarchar because @Id is an integer variable. But we are since we are concatenating that to another string, we need to typecast that to be of type nvarchar, which is what we are doing here.
So that will give value for the name column, Student Name-1, Student Name-2, etc.
And ‘Father Name – ‘ + CAST(@Id as nvarchar(20)) will give me Father Name -1, which is going to be the value for the Father Name column.
Now you execute the above query it will take approx 2-3 minutes.