Difference between revisions of "Transforming Data with Kettle"

From Opentaps Wiki
Jump to navigationJump to search
(Testing your Configuration)
 
(8 intermediate revisions by 2 users not shown)
Line 5: Line 5:
 
== Setting up Kettle ==
 
== Setting up Kettle ==
  
If you want to design transformation with Kettle, you have first to setup your databases connections. To do this you have to edit the file simple-jndi/jdbc.properties in the Kettle installation directory.
+
=== Setting up jdbc.properties ===
  
Here is an example on how to setup this file
+
If you want to design transformation with Kettle, you have first to setup your databases connections.  To do this you have to edit the file <tt>simple-jndi/jdbc.properties</tt> in the Kettle installation directory.  (On OS X it was <tt>Kettle.app/Contents/Resources/simple-jndi/jdbc.properties</tt>)  It should define connections to your operational ERP (opentaps) database and to your opentaps analytics database.  Usually, you can copy over the file config/jdbc.properties from opentaps analytics.  Here is an example:
 +
<pre>
 +
operational/type=javax.sql.DataSource
 +
operational/driver=com.mysql.jdbc.Driver
 +
operational/url=jdbc:mysql://127.0.0.1/opentaps_testing
 +
operational/user=opentaps
 +
operational/password=opentaps
 +
analytics/type=javax.sql.DataSource
 +
analytics/driver=com.mysql.jdbc.Driver
 +
analytics/url=jdbc:mysql://localhost/analytics_1_7
 +
analytics/user=opentaps
 +
analytics/password=opentaps
 +
analytics_testing/type=javax.sql.DataSource
 +
analytics_testing/driver=com.mysql.jdbc.Driver
 +
analytics_testing/url=jdbc:mysql://localhost/opentaps_analytics_testing
 +
analytics_testing/user=operational
 +
analytics_testing/password=operational
 +
</pre>
 +
 
 +
=== Configuring Databases ===
 +
 
 +
When you first start Spoon, you will need to set up your databases.  Click on the [New] next to the "Repository" input, and then click on "New" to set up a database connection.  You would need to set up to database connections, one for analytics and one for the operational database.  Select the type of the database, set JNDI as the method of access, and then entered the database name as it is described in your jdbc.properties file above:
 +
[[Image:Kettle_database_configuration.png]]
  
<pre>
+
Then, to start Spoon, click on the [No Repository]  button. You do not need a repository database because your transformations are stored in .ktr files in the opentaps analytics application, not in a kettle transformations database.
opentaps/type=javax.sql.DataSource
 
opentaps/driver=org.postgresql.Driver
 
opentaps/url=jdbc:postgresql://127.0.0.1/ofbiz
 
opentaps/user=ofbiz
 
opentaps/password=ofbiz
 
  
opentaps_analytics/type=javax.sql.DataSource
 
opentaps_analytics/driver=com.mysql.jdbc.Driver
 
opentaps_analytics/url=jdbc:mysql://localhost/opentaps_analytics
 
opentaps_analytics/user=opentaps
 
opentaps_analytics/password=opentaps
 
  
opentaps_analytics_testing/type=javax.sql.DataSource
 
opentaps_analytics_testing/driver=com.mysql.jdbc.Driver
 
opentaps_analytics_testing/url=jdbc:mysql://localhost/opentaps_analytics_testing
 
opentaps_analytics_testing/user=opentaps
 
opentaps_analytics_testing/password=opentaps
 
</pre>
 
  
 
== Designing Transformation and Job Templates ==
 
== Designing Transformation and Job Templates ==
Line 39: Line 45:
 
### Test the connection and ensure it works
 
### Test the connection and ensure it works
  
On the right is a screenshot of the way to configure a database.
+
You can use templates/opentaps/DateDimensionSeedTemplate.ktr as a starting pointImport it into Spoon, change the database configuration if necessary, and edit the template.
 
 
[[Image:Kettle_database_configuration.png|thumb]]
 
 
 
I have created a kettle template for generating the seed dataYou can import this in Spoon and change the database configuration if necessary.  That is only if you need to edit the template.
 
  
 
== Editing a Transformation ==
 
== Editing a Transformation ==
Line 51: Line 53:
 
== Debuging a Transformation ==
 
== Debuging a Transformation ==
  
After clicking on the run button in the menubar, there is an option to activate logs with more or less verbosity.
+
If you have your databases configured correctly, you can run your transformation in Spoon to debug the results.  You can click on the [Run] button at the top of the menu, watch the results, which will show you how many rows of data were read and written.  This will help you identify any steps that may have failed.  If a particular step has encountered an error, it will be highlighted in red.  You can click on this step and then click on show logs to see the log file of the errors.
  
You can use the preview button in the menubar, which let watch the whole result of one step
+
You can also use the [Preview] button, which will let you select steps in the transformation and see the rows of data after a particular step has been executed.

Latest revision as of 17:23, 14 August 2009

These are preliminary notes.

Setting up Kettle

Setting up jdbc.properties

If you want to design transformation with Kettle, you have first to setup your databases connections. To do this you have to edit the file simple-jndi/jdbc.properties in the Kettle installation directory. (On OS X it was Kettle.app/Contents/Resources/simple-jndi/jdbc.properties) It should define connections to your operational ERP (opentaps) database and to your opentaps analytics database. Usually, you can copy over the file config/jdbc.properties from opentaps analytics. Here is an example:

operational/type=javax.sql.DataSource
operational/driver=com.mysql.jdbc.Driver
operational/url=jdbc:mysql://127.0.0.1/opentaps_testing
operational/user=opentaps
operational/password=opentaps
analytics/type=javax.sql.DataSource
analytics/driver=com.mysql.jdbc.Driver
analytics/url=jdbc:mysql://localhost/analytics_1_7
analytics/user=opentaps
analytics/password=opentaps
analytics_testing/type=javax.sql.DataSource
analytics_testing/driver=com.mysql.jdbc.Driver
analytics_testing/url=jdbc:mysql://localhost/opentaps_analytics_testing
analytics_testing/user=operational
analytics_testing/password=operational

Configuring Databases

When you first start Spoon, you will need to set up your databases. Click on the [New] next to the "Repository" input, and then click on "New" to set up a database connection. You would need to set up to database connections, one for analytics and one for the operational database. Select the type of the database, set JNDI as the method of access, and then entered the database name as it is described in your jdbc.properties file above: Kettle database configuration.png

Then, to start Spoon, click on the [No Repository] button. You do not need a repository database because your transformations are stored in .ktr files in the opentaps analytics application, not in a kettle transformations database.


Designing Transformation and Job Templates

To create our date dimension seed data using Spoon, the steps are,

  1. Create a New Transformation using New -> New Transformation
  2. On the left, you'll see three collapsed sections, Main Tree, Core Objects and Favorite Steps. Expand the Main Tree section.
    1. Under Transformations right click Database Connections and select New
      1. Specify the connection information to opentaps_analytics, our data warehouse. Also ensure characterEncoding of UTF8 in options as above.
      2. Test the connection and ensure it works

You can use templates/opentaps/DateDimensionSeedTemplate.ktr as a starting point. Import it into Spoon, change the database configuration if necessary, and edit the template.

Editing a Transformation

To edit an existing transformation in Spoon, use File > Import from an XML file, and import the file (for example, StoreDimensionTemplate.ktr from analytics/templates/ directory.) When you are done, use File > Export to an XML file to save the transformation as an XML file again, instead of storing it in the database.

Debuging a Transformation

If you have your databases configured correctly, you can run your transformation in Spoon to debug the results. You can click on the [Run] button at the top of the menu, watch the results, which will show you how many rows of data were read and written. This will help you identify any steps that may have failed. If a particular step has encountered an error, it will be highlighted in red. You can click on this step and then click on show logs to see the log file of the errors.

You can also use the [Preview] button, which will let you select steps in the transformation and see the rows of data after a particular step has been executed.