Often I come across the following problem when designing a database:
I need to test my queries, but to do that, I need some data. Sometimes large amounts in order to properly test them.
Now I could enter all of my data manually, but if I have many fields, or need a lot of diversity in my data, that will take forever.
The solution: Excel!
Let’s instead generate our data by using excel and excel formulas. For this demonstration, i will be using the following data dictionary:
ID | first_name | last_name | called_by_first | postcode | gender | DOB | |
---|---|---|---|---|---|---|---|
AutoValue INT | Text [VARCHAR[ | Text [VARCHAR] | Yes/No [Boolean] | Text [VARCHAR] | Int | Char | Date |
The ID is easy, we can just use an incrementinfg number.
First name is where it starts to get harder… To do this, I’ve actually made a custom webpage that generates any amount of space seporated names as you want! (Although >1000 names tends to lag your computer).
Then I use the “text to columns” tool in excel (Under Data>Text to columns>Delimited>Tick the “Space” option) to seporate into the first_name and last_name fields.
Next, to generate a random “Yes/No” answer, I used excels =CHOOSE
formula, which, given a number, chooses between options. For example, =CHOOSE(1, YES, NO)
would give us “Yes” and a 2 would give us “No”.
This gives us our next function:=CHOOSE(RANDBETWEEN(1,2), YES, NO)
, which randomly gives us either yes or no.
Next, our email. To do this, I made a nieve approach. To do this, i simply used the concat function=CONCAT(B2, C2, RANDBETWEEN(10, 99), "@example.gov.au")
Where B2 is our first name, and C2 our last.
The postcode is very simple, as we can just use =RANDBETWEEN(3000, 3999)
(This is the range for victorian postcodes).
Gender, once again, is fairly simple, and similar to the called_by_first
solution:
=CHOOSE(RANDBETWEEN(1,3), "M", "F", "O")
Finally, our most challenging field (in my opinion), the date. This is difficult because base demands we must format it with padded zero’s (e.g. not 3/4/2022 but 03/04/2022).
For our day, we can use the following:
=TEXT(RANDBETWEEN(1, 28), REPT("0", 2))
Which says: Join a random number with enough zero’s to make it 2 digits wide.
We can repeat this for the month: =TEXT(RANDBETWEEN(1, 12), REPT("0", 2))
And the year is fairly simple: =RANDBETWEEN(1900, 2012)
Then, we join all of these, making this monster of a formula:
=CONCAT(TEXT(RANDBETWEEN(1, 28), REPT("0", 2)), "/", TEXT(RANDBETWEEN(1, 12), REPT("0", 2)), "/", RANDBETWEEN(1900, 2012))
Now obviously this has a few holes, such as not including > 28th day of any months, but this is perfectly fine for a bit of “filler” data.
So, to recap, this is our final formula set:
first_name | last_name | called by first | postcode | Gender | DOB | |
---|---|---|---|---|---|---|
generator into text to columns | Split first name | =CHOOSE(RANDBETWEEN(1,2), YES, NO) | =CONCAT(B2, C2, RANDBETWEEN(10, 99), “@example.gov.au”)` | =RANDBETWEEN(3000, 3999) | =CHOOSE(RANDBETWEEN(1,3), “M”, “F”, “O”) | =CONCAT(TEXT(RANDBETWEEN(1, 28), REPT(“0”, 2)), “/“, TEXT(RANDBETWEEN(1, 12), REPT(“0”, 2)), “/“, RANDBETWEEN(1900, 2012)) |