Transforming Data with Kettle
These are preliminary notes.
Setting up Kettle
If you want to design transformations between the opentaps enterprise database and the warehouse, then it is recommended to also set up a database for the Pentaho Data Integration project (also known as Kettle).
create database opentaps_analytics_kettle default character set utf8 collate utf8_general_ci; grant all privileges on opentaps_analytics_kettle.* to 'opentaps'@'localhost' identified by 'opentaps';
Next we need to start Spoon and have it populate the kettle database with information. When you start Spoon, create a new repository named Opentaps Analytics Kettle. We will have to add a Database Connection, call it Opentaps Analytics Kettle. Specify the connection information to localhost and add the following option under the Options tab,
MySQL characterEncoding UTF8
You can use the Test button to make sure you can connect. Once done, use the Create or Update button to populate this database with kettle seed data. Now you can log in with user admin password admin.
Designing Transformation and Job Templates
To create our date dimension seed data using Spoon, the steps are,
- Create a New Transformation using New -> New Transformation
- On the left, you'll see three collapsed sections, Main Tree, Core Objects and Favorite Steps. Expand the Main Tree section.
- Under Transformations right click Database Connections and select New
- Specify the connection information to opentaps_analytics, our data warehouse. Also ensure characterEncoding of UTF8 in options as above.
- Test the connection and ensure it works
- Under Transformations right click Database Connections and select New
I have created a kettle template for generating the seed data. You can import this in Spoon and change the database configuration if necessary. That is only if you need to edit the template.