Table of Contents
List of Examples
Table of Contents
This book is the official user guide for ElGenerador.
ElGenerador is a tabular data generator written in java. It can generate tables with random or user supplied data. The process is configured in a XML file supplied from the command line. It can read meta-data (data dictionary) from a JDBC connection, also can write data to CSV files or another JDBC connection, all these parameters are specified in a XML file. You can attach different generators to each table column, so you can specify integer columns with a given numeric range, or string columns, or email columns, or word-list loaded columns.
It has, among others, the following features:
Reads meta-data from a JDBC data source. So you can design your database and then attach generators for each column of each table.
Writes data to a JDBC data source or a group of CSV files.
Has different kinds of generators, like the following:
Boolean Generator.
Integer Generator.
Real Generator.
String Generator.
Printf Generator.
Word-List Generator.
PhoneNumber Generator.
Script Language Generator.
You can see other generators in the generator reference, also you can create and use your own generators in a pluggable way.
Supports CSV Files, Apache Derby, MySQL, Microsoft SQL Server and PostgreSQL systems.
This software needs at least a Java SE 6 or compatible runtime (like OpenJDK). Can run in any operating system that supports the previous platform.
Just unpack the release and execute the
elgenerador.jar
with your
preferred Java runtime.
You can find further information in the project web site.
Table of Contents
In this tutorial we will explain how to use this tool with some examples.
Lets say that you want to generate just one table with the following diagram:
You have to create the following XML file:
Example 2.1. Hello World XML file.
<?xml version="1.0" encoding="UTF-8"?> <project name="HelloProject"> <metadata create="true"> <table name="HelloWorld"> <generator row-count="1000" /> <column name="id" data-type="integer"> <generator> <integer min="0" /> </generator> </column> <column name="message" data-type="string" size="15"> <generator> <script language="javascript"> value = "Hello World!"; </script> </generator> </column> <column name="some_integer" ds-type="INT"> <generator> <integer min="-24" max="45" /> </generator> </column> <column name="some_real" data-type="real"> <generator> <real min="-1" max="8" precision="3" /> </generator> </column> <column name="some_date" data-type="DATE"> <generator> <date min="01/01/01" max="01/01/10" /> </generator> </column> <column name="some_time" data-type="TiMe"> <generator> <time min="1:20 AM" max="6:30 PM" /> </generator> </column> <column name="some_phone" ds-type="varchar" size="20"> <generator> <phone template="+56-32-XXXXXXX" /> </generator> </column> <column name="some_string" data-type="STRING" size="10"> <generator> <printf format="%d-%s"> <integer min="1" max="4" /> <string length="4" /> </printf> </generator> </column> <primary-key> <column-ref column="id" /> </primary-key> </table> </metadata> <output> <csv header="true" prefix="table" suffix=".csv" /> </output> </project>
Save the file as hello.xml
. To generate
the data run the following command:
bash$
java -jar elgenerador.jarhello.xml
You will see how the program generates the data and creates in the working directory a file named tableHelloWorld.csv that contains the generated comma separated values.
The XML file always must start with a
project
tag. Optionally you can specify a name for this project.
You can specify a random-seed
attribute,
that will be user as a numeric seed for the generators,
so if you supply the same seed, the generated data
will be identical.
Then you can specify the
metadata
that defines the generated tables with its columns
and data types. If the create
attribute
is true
the tables will be re-created
in the output data source.
Define one or several
table
tags. They must have at least a name
attribute defined, and is preferrable to define the
generator
tag and set as attribute the row-count
that will define the number of rows to generate. If
this tag is not included, the program will not
generate the table.
Define one or several
column
tags.
They must have at least a name
and a data-type
or ds-type
tags defined. The tag data-type
stands
for the application generic data type, there are
several generic data types available. The tag
ds-type
stands for data-source data type,
this type can be any type that the underliying
RDBMS supports. Data types are case insensitive.
If you select any string or byte string data type
you must supply the size, if it is not supplied,
the application will take 32 characters or bytes
of length.
Define a
generator
tag. That will contain the selected column
generator. If a generator is not supplied, the
program will pick one compatible with the data
type with a default configuration.
Define the concrete generator to use, the program will not check for compatibility, for now you can put a string generator for a integer column, but it will raise an error at the data insertion moment.
Define, if it is necessary, one or more contraints
like, primary keys
or foreign keys.
In this example, id
is the primary key
column, but the program supports multiple columns
as primary keys.
Define an
output
tag that will specify the data source used for
data output.
In this case, a
csv
data source was chosen, with a prefix for its tables
and a suffix to be used. If the table is called
HelloWorld
, with this configuration, the
CSV file will be tableHelloWorld.csv
.
If you don't want to type all those xml tags to generate a simple table, you can ask the program to scan an already present meta-data from a JDBC connection to write those tags for you. Then you can edit only the column definitions to specify the generator that you want to use.
Example 2.2. JDBC Meta Data Input
<project name="InputProject"> <input> <jdbc> <driver name="postgresql"> <jar>/usr/share/java/postgresql.jar</jar> <native></native> <class>org.postgresql.Driver</class> </driver> <connection> <catalog>mbassale</catalog> <schema>public</schema> <url>jdbc:postgresql:mbassale</url> <username>mbassale</username> <password>12345</password> </connection> </jdbc> </input> </project>
Save the file as inputTest.xml
and type
the following command:
bash$
java -jar elgenerador.jar -ehello.xml
The option -e stands for extract. The
program will make the connection and read the metadata for you.
Then it will write a file called inputTest.xml.out
with the table and column definitions present in the data base.
This file can be edited and used to generate the data for those
tables.
You must supply a jdbc data source as input
specification. If the -e
switch is supplied, it will ignore the
metadata
and output
tags.
The JDBC data source is defined in two parts.
The first, driver
defines the location of the
.jar
files, native library directories
and the class name, necessary to load the
driver at runtime and make the connection.
The second, connection
defines the catalog (database name),
schema, url, username and password
to use to connect to the RDBMS.
If the password tag it is not supplied,
it will ask using the standard input.
The program will generate a file with the following contents:
Example 2.3. Input Result
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <project name="inputTest"> <metadata create="false" name="Model@7b2be1bd"> <table name="table0_pkey"> <generator row-count="0" /> <column data-type="string" ds-type="varchar" name="col0" size="30"> <generator> <string length="30" /> </generator> </column> </table> <table name="table1_pkey"> <generator row-count="0" /> <column data-type="integer" ds-type="int4" name="col0" size="10"> <generator> <integer max="2147483647" min="-2147483648" /> </generator> </column> </table> <table name="table0"> <generator priority="1" row-count="0" /> <column data-type="string" ds-type="varchar" name="col0" size="30"> <generator> <string length="30" /> </generator> </column> <column data-type="integer" ds-type="int4" name="col1" size="10"> <generator> <integer max="2147483647" min="-2147483648" /> </generator> </column> <primary-key name="table0_pkey"> <column-ref column="col0" /> </primary-key> <foreign-key foreign-table="" name="table0_col1_fkey"> <column-ref foreign="col0" local="col1" /> </foreign-key> </table> <table name="table1"> <generator row-count="0" /> <column data-type="integer" ds-type="int4" name="col0" size="10"> <generator> <integer max="2147483647" min="-2147483648" /> </generator> </column> <column data-type="real" ds-type="float4" name="col1" size="8"> <generator> <real max="1.7976931348623157E308" min="4.9E-324" precision="-1.0" /> </generator> </column> <column data-type="string" ds-type="varchar" name="col2" size="25"> <generator> <string length="25" /> </generator> </column> <primary-key name="table1_pkey"> <column-ref column="col0" /> </primary-key> </table> </metadata> </project>
As you can see, this output must be edited to suit your needs. Also it lacks an output definition, necessary to tell the application were to store the data.
Table of Contents
project — Document root element.
<project [name="some string"] [random-seed="some number"]> ... </project>
metadata — Used for meta-data definition.
<metadata [create="booleanValue
"]>
...
</metadata>
table — Define a table to be generated.
<table name="name
" [random-seed="longValue
"]> ... </table>
This element define a table to be generated. The name
attribute is obligatory and must be compatible with the
underlying data source naming conventions. A table must
have at least one column definition.
generator — Define a table generator.
<generator [row-count="count
"] [random-seed="longValue
"]> ... </generator>
This element define a table generator. The parameter row-count
define how many rows has the application to generate.
The number of rows to generate.
The random seed used when creating the random data generators.
Example 9. Table Generator example
<project name="testProject"> ... <metadata create="true"> <table name="table1"> <generator row-count="1000" random-seed="5" /> ... <column> ... </column> </table> <table name="table2"> <generator row-count="10000" /> ... <column> ... </column> </table> </metadata> ... </project>
column — Define a column to be generated.
<column name="name
" [data-type="data type
"|ds-type="data type
"] [size="number
"] [random-seed="longValue
"]> ... </column>
This element define a column to be generated. At least one
of data-type
or ds-type
must be
specified.
The column name (obligatory).
The column data type. This data types are speficic to the program.
The column data source specific type. It must be the name of a data type define in the underlying data source DBMS.
The column size or length. For character string or byte string types, this attribute should be defined or the application will pick up 32 bytes or characters as value.
The random seed used when creating the random data generators.
Example 10. Column example
<project name="testProject"> ... <metadata> <table name="table1"> <generator ... /> ... <column name="column1" data-type="string" size="60"> ... </column> </table> <table name="table2"> <generator ... /> ... <column name="column2" ds-type="INT" random-seed="0"> ... </column> </table> </metadata> ... </project>
generator — Define a column generator.
<generator> ... </generator>
This element define a a column generator. The you must explicitly define the generator to use.
Example 11. Column Generator example
<project name="testProject"> ... <metadata> <table name="table1"> <generator ... /> ... <column name="column1" data-type="string" size="60"> <generator> ... </generator> </column> </table> <table name="table2"> <generator ... /> ... <column name="column2" ds-type="INT"> <generator> ... </generator> </column> </table> </metadata> ... </project>
primary-key — Define a primary-key.
<primary-key> <column-ref ... /> ... </primary-key>
This element define a primary key used for the table. At least a column reference should be made to a column previously defined in the table.
A reference to another column, the
column
attribute must match a
column name in the same table. Multiple
column-ref
elements can be
inserted inside a primary-key
element.
Example 12. Primary Key example
<project name="testProject"> ... <metadata> <table name="table1"> <generator ... /> ... <generator row-count="1000" /> <column name="id" data-type="integer"> <generator> <integer min="0" /> </generator> </column> <column name="some_date" data-type="date"> <generator> <date /> </generator> </column> ... <primary-key> <column-ref column="id" /> <column-ref column="some_date" /> </primary-key> </table> </metadata> ... </project>
foreign-key — Define a foreign-key.
<foreign-key foreignTable="foreignTableName"> <column-ref local="localColumnName" foreign="foreignColumnName" /> ... </foreign-key>
This element define a foreign key used for the table.
At least a column reference should be made to a
column previously defined in the table and in another
table. The column reference attribute local
stands for the local column that is foreign key, and the
attribute foreign
stands for the foreign
column that is primary key in the foreign table.
The foreign-table
attribute define the name
of the referenced foreign table.
A reference to another column, the
local
attribute must match a
column name in the same table. The foreign
attribute must match a foreign table column name.
Multiple column-ref
elements can be
inserted inside a foreign-key
element.
Example 13. Foreign Key example
<project name="testProject"> ... <metadata> <table name="Suppliers"> <column name="supplier_id" data-type="integer"> ... </column> ... <primary-key> <column-ref column="supplier_id" /> </primary-key> </table> <table name="Supplier_Addresses"> <column name="supplier_address_id" data-type="integer"> </column> ... <column name="supplier_id" data-type="integer"> </column> ... <primary-key> <column-ref column="supplier_address_id" /> </primary-key> ... <foreign-key foreign-table="Suppliers"> <column-ref local="supplier_id" foreign="supplier_id" /> </foreign-key> </table> </metadata> ... </project>
csv — Defines a CSV data source.
<csv [path="/path/to/dir
"] [header="booleanValue
"] [prefix="prefix
"] [suffix="suffix
"]/>
This element defines a CSV data source, for now this data source
its user for output only. This data source will create a file for
each table in the metadata definition in the selected directory.
The name of the generated files will be:
prefix + tableName + suffix
.
The filename prefix used for the CSV files.
The filename suffix used for the CSV files.
Indicates if the generated files should have as
first column the name of the columns. true
will have the columns in the file, false
otherwise.
The path in witch generated files should go.
jdbc — Defines a JDBC data source.
<jdbc> ... </jdbc>
This element is the beggining of a JDBC data source definition.
It must be followed by driver
element and a
connection
element. A jdbc
data
source can appear at an input
or
output
definition.
driver — Defines the JDBC driver to load.
<driver> <jar>JAR-path
</jar> <native>native-path
</native> <class>jdbc-class
</class> </driver>
This attribute is for logging purposes only, the JDBC driver provides the real driver name.
The file path of the driver jar file.
Directory of native libraries files used by the driver (if necessary).
Fully qualified JDBC Driver class name.
Example 17. JDBC data source driver definition.
<project name="testProject"> ... <metadata> ... </metadata> <output> <jdbc> <driver name="derby"> <jar>/home/mbassale/devel/libs/db-derby-10.5.1.1-bin/lib/derby.jar</jar> <native></native> <class>org.apache.derby.jdbc.EmbeddedDriver</class> </driver> <connection> ... </connection> </jdbc> </output> </project>
connection — Defines the JDBC connection to make
<connection> <catalog>catalog
</catalog> <schema>schema
</schema> <url>url
</url> <username>username
</username> <password>password
</password> </connection>
The database catalog name to connect to (if applicable), most of the time a catalog name is the same as database name.
If applicable, the shema to use.
JDBC connection url (obligatory).
The username to use.
The password to use. If it is empty
(<password></password>
,
it means an empty password, if this element
is not present, it will ask the password via
command line.
Example 18. JDBC data source driver definition.
<project name="testProject"> ... <metadata> ... </metadata> <output> <jdbc> <driver name="postgresql"> <jar>/usr/share/java/postgresql.jar</jar> <native></native> <class>org.postgresql.Driver</class> </driver> <connection> <catalog>testdb</catalog> <schema>public</schema> <url>jdbc:postgresql:testdb</url> <username>mbassale</username> <password>12345</password> </connection> </jdbc> </output> </project>
Table of Contents
integer — Defines an integer generator.
<integer [min="minValue
"] [max="maxValue
"] [autoincrement="booleanValue
"] />
This element define an integer generator. This generator
generates 32 bits integer values between min
and max
. If autoincrement
is not
present or, false
, the number will be random
generated, if autoincrement
is true
,
the number will be the successor of the last value, respecting
maximum and minimum values.
The minimum value (inclusive). If it is not
present, the minimum will be the minimum value
for a java programming language int
primitive type.
The maximum value (inclusive). If it is not present,
the maximum will be the maximum value for a java
programming language int
primitive type.
Set true
to make the next value the
succesor of the previous one. Set to false
or non present to generate a random value.
Example 20. Integer generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="col0" data-type="integer"> <generator> <integer min="-273" max="5" autoincrement="true" /> </generator> </column> ... </table> ... </metadata> ... </project>
real — Defines a real number generator.
<real [min="minValue
"] [max="maxValue
"] [precision="integerValue
"] />
This element define a real generator. This generator
generates real numbers between min
and
max
. If precision
is defined,
the number will be round to the specified number of
decimal places.
The minimum value (inclusive). If it is
not present, the minimum will be the
minimum value for a java programming
language double
primitive type.
The maximum value (inclusive). If it is not
present, the maximum will be the maximum value
for a java programming language double
primitive type.
The precision, in decimal places, of the generated value. For example, a precision of 2, will generate numbers like 3.45, 2.67, -4.5, -5.02, etc.
string — Defines a string generator.
<string [length="integerValue
"]/>
date — Defines a date generator.
<date [min="minValue
"] [max="maxValue
"]/>
This element defines a date generator. Generates random
dates from min
to max
(inclusive).
The date must be specified in US locale (mm/dd/yy).
The minimum date. If not specified, it will take the Unix Epoch (1/1/1970) as minimum date.
The maximum date. If not specified, it will take the current date.
Example 23. Date generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="some_date" ds-type="DATE"> <generator> <date /> </generator> </column> ... <column name="some_another_date" data-type="DATE"> <generator> <date min="01/01/01" max="01/01/10" /> </generator> </column> ... </table> ... </metadata> ... </project>
time — Defines a time generator.
<time [min="minValue
"] [max="maxValue
"]/>
This element defines a time generator. Generates random
times from min
to max
(inclusive).
The time must be specified in US locale (hh:mm PM or AM).
timestamp — Defines a timestamp generator.
<timestamp [min="minValue
"] [max="maxValue
"]/>
This element defines a timestamp generator. Generates random
timestamps from min
to max
(inclusive).
The timestamp must be specified in US locale
(mm/dd/yy hh:mm PM or AM).
The minimum timestamp. If not specified, it will take "1/1/1970 00:00 AM" as minimum.
The maximum timestamp. If not specified, it will take the current date and "11:59 PM" as maximum.
Example 25. Timestamp generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="some_timestamp" data-type="timestamp"> <generator> <timestamp min="8/15/1984 1:20 AM" max="1/1/2012 6:30 PM" /> </generator> </column> ... </table> ... </metadata> ... </project>
printf — Defines a printf style generator.
<printf [format="format
"]>
... other generators here ...
</printf>
This element defines a printf style generator. Generates
random character strings according to the printf-style
format specifier. The format syntax is similar to the
C-language function printf
(see man 2 printf).
Specify any needeed generators that will supply the data to the format specifiers.
Example 26. Printf generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="some_string" data-type="STRING" size="10"> <generator> <printf format="%d-%s"> <integer min="1" max="4" /> <string length="4" /> </printf> </generator> </column> ... </table> ... </metadata> ... </project>
word-list — Defines a word-list generator.
<word-list file="filePath
"/>
This element defines a word-list generator. Loads the selected word-list file into memory and pick a random word to generate a string. The target column must have sufficient space to save the string.
Example 27. Word-list generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="some_string" data-type="STRING" size="10"> <generator> <word-list file="/home/mbassale/devel/workspace/generador/censusNames.txt /> </generator> </column> ... </table> ... </metadata> ... </project>
phone — Defines a phone number generator.
<phone template="template
"/>
This element defines a phone number generator. Using the template string, generates a random phone number string.
The phone number template. For example, the template "+56-32-2XXXXXX" will generate the phone numbers "+56-32-2012316", "+56-32-2938452", etc. The 'X' character will be replaced by a random number digit.
Example 28. Word-list generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="some_phone" ds-type="varchar" size="20"> <generator> <phone template="+56-32-XXXXXXX" /> </generator> </column> ... </table> ... </metadata> ... </project>
script — Defines a script generator.
<script language="sourceLanguage
">
... script language source code ...
</script>
This element defines a generator that executes a script to
create the next value. Every time the application needs
a new value, it will execute the script source code that it is
inside this element. For now, only a javascript engine is
available. The code execution must set the value
variable to the result of the calculation.
The javascript engine used is the JDK 1.6 scripting engine, see JSR 223 specification for details.
Inside the script you will find the following variables already set by the application:
The randomSeed used to generate the data.
An object with several methods to generate
random data initialized with the randomSeed
(**TODO** insert the function descriptions).
Example 29. Script generator.
<project name="postgresqlTest"> <metadata> <table name="table0"> <generator row-count="10000" /> ... <column name="some_phone" ds-type="varchar" size="20"> <generator> <script language="javascript"> var val = randomData.nextInt(0, 40) * randomSeed; value = "Hello World!"; </script> </generator> </column> ... </table> ... </metadata> ... </project>
ElGenerador uses software from the following organizations:
Apache Commons Lang.
Apache Commons Lang Copyright 2001-2008 The Apache Software Foundation This product includes software developed by The Apache Software Foundation (http://www.apache.org/).
Apache Commons CLI.
Apache Commons CLI Copyright 2001-2009 The Apache Software Foundation This product includes software developed by The Apache Software Foundation (http://www.apache.org/).
Apache Commons Math.
Apache Commons Math Copyright 2001-2008 The Apache Software Foundation This product includes software developed by The Apache Software Foundation (http://www.apache.org/).
SLF4J
Copyright (c) 2004-2008 QOS.ch All rights reserved. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
This software is released under the following conditions:
Copyright (c) 2009, Marco Bassaletti Olivos. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.