The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleanin
List Price: $45.00
Price: $32.18
Availability: Usually ships in 24 hours
Ships from and sold by Amazon.com
56 new or used available from $26.55
Product Description
* Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold more than 150,000 copies
* Delivers real-world solutions for the most time- and labor-intensive portion of data warehousing: data staging, or the extract, transform, load (ETL) process
* Delineates best practices for extracting data from scattered sources, removing redundant and inaccurate data, transforming the remaining data into correctly formatted data structures, and then loading the end product into the data warehouse
* Offers proven time-saving ETL techniques, comprehensive guidance on building dimensional structures, and crucial advice on ensuring data quality
Product Details
- Amazon Sales Rank: #102527 in Books
- Published on: 2004-09-13
- Original language: English
- Number of items: 1
- Binding: Paperback
- 528 pages
Features
- ISBN13: 9780764567575
- Condition: NEW
- Notes: Brand New from Publisher. No Remainder Mark.
Editorial Reviews
From the Back Cover
The single most authoritative guide on the most difficult phase of building a data warehouse
The extract, transform, and load (ETL) phase of the data warehouse development life cycle is far and away the most difficult, time-consuming, and labor-intensive phase of building a data warehouse. Done right, companies can maximize their use of data storage; if not, they can end up wasting millions of dollars storing obsolete and rarely used data. Bestselling author Ralph Kimball, along with Joe Caserta, shows you how a properly designed ETL system extracts the data from the source systems, enforces data quality and consistency standards, conforms the data so that separate sources can be used together, and finally delivers the data in a presentation-ready format.
Serving as a road map for planning, designing, building, and running the back-room of a data warehouse, this book provides complete coverage of proven, timesaving ETL techniques. Beginning with a quick overview of ETL fundamentals, it then looks at ETL data structures, both relational and dimensional. The authors show how to build useful dimensional structures, providing practical examples of techniques.
Along the way you’ll learn how to:
- Plan and design your ETL system
- Choose the appropriate architecture from the many possible options
- Build the development/test/production suite of ETL processes
- Build a comprehensive data cleaning subsystem
- Tune the overall ETL process for optimum performance
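The four-stage flow described on the back cover (extract from source systems, enforce data quality, conform separate sources, deliver presentation-ready data) can be sketched roughly as follows. This is only an illustrative sketch and is not code from the book: the sample rows, the `FIELD_MAP` and `COUNTRY_MAP` tables, and all function names are hypothetical, and a real ETL system would read from actual source systems and write to a warehouse.

```python
# Hypothetical rows from two source systems with different column names
# and inconsistent country codes (assumed data, for illustration only).
CRM_ROWS = [
    {"id": "1", "name": " Alice ", "country": "us"},
    {"id": "2", "name": "", "country": "DE"},
]
BILLING_ROWS = [
    {"cust_id": "3", "cust_name": "Bob", "country_code": "U.S."},
]

# Per-source mapping from logical fields to each system's column names.
FIELD_MAP = {
    "crm": {"id": "id", "name": "name", "country": "country"},
    "billing": {"id": "cust_id", "name": "cust_name", "country": "country_code"},
}

# Conformed country codes shared by all sources.
COUNTRY_MAP = {"us": "US", "u.s.": "US", "de": "DE"}


def extract() -> list[dict]:
    """Extract: pull raw rows and tag each one with its source system."""
    rows = [{"source": "crm", **row} for row in CRM_ROWS]
    rows += [{"source": "billing", **row} for row in BILLING_ROWS]
    return rows


def clean(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Clean: enforce a simple quality rule (name must not be blank)."""
    accepted, rejected = [], []
    for row in rows:
        name_key = FIELD_MAP[row["source"]]["name"]
        if row[name_key].strip():
            accepted.append(row)
        else:
            rejected.append(row)
    return accepted, rejected


def conform(rows: list[dict]) -> list[dict]:
    """Conform: map every source onto one shared set of columns and codes."""
    conformed = []
    for row in rows:
        keys = FIELD_MAP[row["source"]]
        conformed.append({
            "customer_id": int(row[keys["id"]]),
            "customer_name": row[keys["name"]].strip(),
            "country": COUNTRY_MAP[row[keys["country"]].lower()],
            "source": row["source"],
        })
    return conformed


def deliver(rows: list[dict]) -> list[dict]:
    """Deliver: hand over rows in a stable, presentation-ready order."""
    return sorted(rows, key=lambda row: row["customer_id"])


def run_pipeline() -> tuple[list[dict], list[dict]]:
    """Run the four stages in order and return (delivered, rejected)."""
    accepted, rejected = clean(extract())
    return deliver(conform(accepted)), rejected
```

Keeping rejected rows alongside the delivered ones, instead of silently dropping them, is a design choice made here so that quality failures stay visible; it is not taken from the book.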
About the Author
RALPH KIMBALL, PhD, founder of the Kimball Group, has been a leading visionary in the data warehousing industry since 1982 and is one of today’s best-known speakers and educators. He is the author of several bestselling titles published on data warehousing, including The Data Warehouse Toolkit (Wiley).
JOE CASERTA is the founder of Caserta Concepts, LLC, a data warehousing consulting firm. He writes frequently for print and online magazines, and is an active contributor to DWList, the major online community for data warehousing professionals.
Customer Reviews
A Survival Guide and a Must
A survival guide and a must-have for every data warehouse architect. This book is written for architects, not for ETL developers. It is written from the 10,000-foot level; many of the architectures and designs are 'nice to haves' that would require tremendous commitments of resources to implement, and thus may be too lofty for many organizations. However, it is better to have a theoretical bull's-eye to aim for, and to take small steps toward implementing the optimal solution, than to have no ideal to strive for at all.
Looking for a comparison of ETL tools and which ones do what best? You will not find this here.
A great resource for DW architects who may have many years of experience on data warehouse projects but have not had the opportunity to implement more elaborate metadata-driven cleaning and conforming schemas. It is a truly interesting approach, yet I'm not sure Ralph Kimball's design with the 'survivorship support metadata' schema could perform fast enough for the data warehouse loading needs of larger organizations.
Separating critical issues from insignificant ones is difficult from the reading. However, the framework and methodical approach to the steps of Extract => Clean => Conform => Deliver, and the roles and responsibilities of the actors (i.e., DW architect, dimensional manager, fact table provider, etc.), give the reader/architect a clear division of duties that is more than likely not clearly defined within the corporation.
Plenty of ERDs, actual SQL statements, templates and diagrams to use in your existing projects.
(...)
Another strong Data Warehousing book from Ralph Kimball
In this book Ralph lays down a framework for constructing the DW ETL. This is useful not just in constructing quality ETL processes, but also because Ralph's works tend to 'set' standards in data warehousing. The format of this book is similar to the Lifecycle Toolkit. Ralph takes a very staged, logical approach to the material. Some sections are just great, e.g., the chapters on Extraction and Development. A small amount of the material is repeated from the Lifecycle Toolkit and Dimensional Modeling books, but no more than is needed to make this book stand on its own.
Also like the other books, this one takes a vendor-agnostic approach. While this may increase the shelf life of the book, I would have appreciated some comparisons between the major vendors out there today.
Overall: I recommend this one as a buy, even if you have Ralph's other books.
An almost complete dwh design with ETL orientation
This book takes almost all of the issues in data warehouse design and presents them from an ETL perspective. In fact, ETL covers more or less the whole of the data warehouse, so the need to describe these issues makes the book a self-contained work that you can read without referring to Kimball's previous books. Besides, I think some technical descriptions are handled better here: in my experience it is impossible to undertake dwh activities without (at least) a sound knowledge of general RDBMS features (indexes, use of a bulk loader vs. INSERT, etc.), and this book addresses them conveniently. On the other hand, the flat style fails to highlight the most significant issues, which end up mixed in with less important statements. That demands close attention while reading, but a blurred boundary between subtleties and trivialities seems to be a common shortcoming in dwh literature. Even with that flaw, the ETL Toolkit turns out to be an outstanding reference on the state of the art of dwh technology.




