Business Intelligence - Pentaho BI - Pentaho Data Integration
What is Data Integration?
Many organizations have valuable information scattered across disparate applications and databases. Often it’s hard, if not impossible, for business users to get a clear view into customer service, sales performance, internal efficiency, or global performance without an integrated view of information across these data silos. Pentaho Data Integration unlocks, cleanses and integrates this valuable information and puts it in the hands of your business users.
Access and integrate disparate data: ERP and CRM applications, legacy systems, and databases.
Scale up to meet large data requirements: sophisticated caching, clustering, and optimized SQL.
Maximize administrator productivity: 100% metadata-driven, a rich graphical user interface, and wizards to streamline common tasks.
How does Pentaho facilitate Data Integration?
Pentaho Data Integration delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. The ease of use of our graphical, drag-and-drop design increases productivity, and our extensible, standards-based architecture ensures that you will never be forced to adopt proprietary methodologies into your ETL solution.
What is ETL?
Extract, Transform, and Load (ETL) is a process in data warehousing that involves:
extracting data from outside sources,
transforming it to fit business needs (which can include quality levels), and ultimately
loading it into the end target, i.e. the data warehouse.
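The three steps above can be sketched in a few lines of Python. This is only an illustration of the general ETL pattern, not how Pentaho Data Integration implements it: the sample CSV text, the `sales` table, the quality rule, and all function names are assumptions made up for this example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for an outside source (e.g. a CSV export).
RAW_CSV = """customer,amount
alice, 10.50
bob,not-a-number
carol,7
"""

def extract(text):
    """Extract: read rows from the outside source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names and drop rows that fail a quality check."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # assumed quality rule: skip rows whose amount is not numeric
        clean.append((row["customer"].strip().title(), amount))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def run_etl(text, conn):
    """Chain the three steps: extract, then transform, then load."""
    load(transform(extract(text)), conn)

# An in-memory SQLite database plays the role of the warehouse here.
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
```

Keeping each step in its own function mirrors the definition: the transform step can enforce quality levels without knowing where the data came from or where it will be loaded.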
ETL is important because it is the way data actually gets loaded into the warehouse. This article assumes that data is always loaded into a data warehouse, although the term ETL can in fact refer to a process that loads any database. ETL can also be used for integration with legacy systems. ETL implementations usually store an audit trail of successful and failed process runs. In almost all designs, this audit trail is not granular enough to reproduce the ETL's result if the raw data were no longer available.
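As a rough illustration of such an audit trail, the sketch below records one row per process run, whether it succeeded or failed. The `etl_audit` table, its columns, and the `run_with_audit` helper are hypothetical names chosen for this example and are not taken from Pentaho. Note that it stores only run-level facts (status, row count, error message), which is why a trail like this could not be used to rebuild the loaded data.

```python
import sqlite3
from datetime import datetime, timezone

def run_with_audit(conn, job_name, job):
    """Run `job` (a callable returning a row count) and record the outcome."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_audit ("
        "job TEXT, started_at TEXT, status TEXT, rows_loaded INTEGER, error TEXT)"
    )
    started = datetime.now(timezone.utc).isoformat()
    try:
        rows = job()
        record = (job_name, started, "success", rows, None)
    except Exception as exc:
        # A failed run is logged too, with the error message but no raw data.
        record = (job_name, started, "failure", 0, str(exc))
    conn.execute("INSERT INTO etl_audit VALUES (?, ?, ?, ?, ?)", record)
    conn.commit()
    return record[2]

def failing_job():
    """Hypothetical job that simulates an unreachable source system."""
    raise ValueError("source unavailable")

audit_conn = sqlite3.connect(":memory:")
ok_status = run_with_audit(audit_conn, "nightly_sales", lambda: 42)
bad_status = run_with_audit(audit_conn, "nightly_sales", failing_job)
```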