ETL - principles, applications, tools XML interfaces for PHP PB138 - Markup Languages Tomas Pitner May 11, 2013 To i ETL - principles, applications, tools XML interfaces for PHP Q ETL - principles, applications, tools $ XML interfaces for PHP ETL - principles, applications, tools XML interfaces for PHP ETL: Extract-Transform-Lo Extract-Transform-Load (ETL) are data integration practices and tools: Extract data-mining from different sources, different data formats, ... Transform transformation of data to a desired form Load loading/storing of data to/from a target database/data warehouse To i ETL - principles, applications, tools XML interfaces for PHP ETL Applications ETL tools has many application areas today: O Different sources and formats data integration (text documents, CSV, XLS spreadsheets, databases, XML data, ...) O Data consolidation (transformations of data and data " cleaning") O Storing of data into huge databases - data warehouses for management applications O Data migration (data transfers to different platforms, databases, etc. ETL systems are called as "a critical building block to a successful business intelligence deployment". ETL - principles, applications, tools XML interfaces for PHP Implementation ■ There is a lot of (not only Java-based) implementations, many have a GUI, allowing to graphically design transformation flows. Clover ETL http://www.cloveretl.org - open source ETL tool including GUI (http://www.cloveretl.org/ _img/clovergui/Graf.png) Microsoft SQL Server Integration Services http://www.microsoft.com/sql/technologies/ integration/default.mspx Octopus Java/XML ETL Tool http://octopus.enhydra.org/ java-etl http://code.google.com/p/java-etl/ Kettle http://kettle.pentaho.org/ To i ETL - principles, applications, tools XML interfaces for PHP Examples - Clover ETL Commercially developed (Javlin (http://www.javlin.cz) company, Fl industry partner, http://www.cloveretl.org) open-source tool containing: Clover Engine The kernel processing transformations. Contains connectors to external data sources and targets. Clover Server deployment platform for transformation execution (incl. planing and monitoring) in a real life. Clover Designer tool for graphical design of transformation graphs (based on Eclipse platform). To i ETL - principles, applications, tools XML interfaces for PHP Clover ETL - Designer I Ckjver.ETL-Ccitipan;- . . .-........... liWfa iC^ = il e Edit Clover.ETL Navigate ?ear-:h Prcje-:t Run Fielt! assist in (Id, Help ö - y a I »f I °3 & e° I °li D?5 □?? I B DDI I iura ardersForCustomers-.grf £3 J- Navigator S3N B data-tmp E3 Orders ForCustomers.grf . lookup & meta customertfrnt odetails.DBF.fmt orders BF.fmt orderslnfc.fmt B seq & trans [>] ,cla«p-ath 1 .project [H worltspace.prrni "fJE Outline " n' B Compon.rrt. Compares all orders with specified customers Palette i Edge QNot, ^Readers 0 J— IM. ORDERS • + ; EKT.SORT.ORD-EFiS 0 : S» ■ = JOIN.ORDERS j IN_D ETAILS • # ; EXT_S QRT_DETAILS IrJ.CUSTOMERS £■ UNQUOTE ■ a ■ = JOIM.CUSTOMER s^UniversalDataRe.. [^Delimited Data Re. ij Ccn:cle Prcblems Clever - P.egei Tester Clover - Graph tracking Clover - Log Q □ To i ETL - principles, applications, tools XML interfaces for PHP Questions ETL implementation and deployment on a huge data involves some problems, that are not usual in a different areas: • transformations should be optimized to a speed as well as to allow huge data processing. • effective memory models used to (temporal) store XML data -unable to use common "in memory" tree models. • definability, maintainability and verifiability of wide transformation networks - visual tools + formal methods To i Principially the same as in Java, there are: tree-oriented interfaces DOM (http: //php.net/manual/en/book.dom.php) full repertoire of operations (read, validate, write incl. prettyprinting, programmatic creation of docs, elements, etc.) stream-based (pull) SimpleXML (http: //php.net/manual/en/book.simplexml.php)-since PHP 5.0 part of the core PHP, very simple an frequently used interface, enables direct iteration (traversal) through XML elements, direct evaluation of XPath expressions etc. Also see SimpleXML (PHP) at W3Schools (http://www.w3schools.com/php/php_xml_ simplexml.asp) event-driven SAX ETL - principles, applications, tools XML interfaces for PHP Example (1) - DOM The following code reads (analyses, "parses") XML document and writes it back to file (serializes it). $dom = new DDMDocument(); // configuration for read $dom->preserveWhiteSpace = FALSE; $dom->load(;input.xml;); // configuration for write $dom->formatOutput = TRUE; $dom->encoding = ;utf-8;; $dom$\to$save(;output.xml;); To i ETL - principles, applications, tools XML interfaces for PHP Example (2) - SAX The following code reads an XML file with book records and prints info about htem them (from Reading and writing the XML DOM with PHP Using the DOM library, SAX parser and regular expressions, Jack Herrington, IBM 2005) XPath Example xpath(V/title;) as $title) { echo "

" . $title . "

"; } ?> To i DOM Very good (English written) intra to XML in PHP at IBM Developerworks: Reading and writing the XML DOM with PHP (http://www.ibm.com/ developerworks/library/os-xmldomphp/) SimpleXML Elliotte Rusty Harold: SimpleXML processing with PHP A markup-specific library for XML processing in PHP (http://www.ibm.com/developerworks/ library/x-simplexml.html) XML in PHP Jiří Kosek /in Czech/: Very good Czech-written intra series on XML in PHP Processing by Jirka Kosek (www.zdrojak.cz) (http://www.zdrojak. cz/serialy/prehled-podpory-xml-v-php5/) ETL - principles, applications, tools XML interfaces for PHP Jin Kosek: PHP a XML (in Czech) Grada Publishing, 2010 - excellent, well readable, educative, includes not only info on PHP processing but in general on XML: Schema, Relax NG, XSLT, web services To i