Programming Spiders, Bots, and Aggregators in Java
Product Description
Spiders, bots, and aggregators are all so-called intelligent agents, which execute tasks on the Web without the intervention of a human being. Spiders go out on the Web and identify multiple sites with information on a chosen topic and retrieve the information. Bots find information within one site by cataloging and retrieving it. Aggregators gather data from multiple sites and consolidate it on one page, such as credit card, bank account, and investment account data. This book offers a complete toolkit for the Java programmer who wants to build bots, spiders, and aggregators. It teaches the basic low-level HTTP/network programming Java programmers need to get going and then dives into how to create useful intelligent agent applications. It is aimed not just at Java programmers but at JSP programmers as well. The CD-ROM includes all the source code for the author's intelligent agent platform, which readers can use to build their own spiders, bots, and aggregators.
Product Details
- Amazon Sales Rank: #1070570 in Books
- Published on: 2002-02
- Original language: English
- Number of items: 1
- Binding: Paperback
- 512 pages
Editorial Reviews
From the Back Cover
The content and services available on the web continue to be accessed mostly through direct human control. But this is changing. Increasingly, users rely on automated agents that save them time and effort by programmatically retrieving content, performing complex interactions, and aggregating data from diverse sources. Programming Spiders, Bots, and Aggregators in Java teaches you how to build and deploy a wide variety of these agents, from single-purpose bots to exploratory spiders to aggregators that present a unified view of information from multiple user accounts.
You will build on your basic knowledge of Java to quickly master the techniques that are essential to this specialized world of programming, including parsing HTML, interpreting data, working with cookies, reading and writing XML, and managing high-volume workloads. You'll also learn about the ethical issues associated with bot use and the limitations imposed by some websites.
This book offers two levels of instruction, both of which are focused on the library of routines provided on the companion CD. If your main concern is adding ready-made functionality to an application, you'll achieve your goals quickly thanks to step-by-step instructions and sample programs that illustrate effective implementations. If you're interested in the technologies underlying these routines, you'll find in-depth explanations of how they work and the techniques required for customization.
About the Author
Jeff Heaton is a computer programmer, college instructor, and author. Currently a Java software designer for the Reinsurance Group of America, Inc. (RGA), he has previously applied his web, database, and artificial intelligence programming skills on behalf of a number of companies, including MasterCard, Anheuser-Busch, and Boeing. Jeff teaches Java and C++ at St. Louis Community College at Meramec and is a graduate student at Washington University in St. Louis.
Customer Reviews
not for serious programmers
The code presented in this book is painful to look at. For one thing, the author is not familiar with basic Java coding conventions and continues to use C conventions instead.
In addition to not knowing proper coding conventions, this guy has no clue about writing Java UIs - the code listed in this book actually has Visual Cafe tags all over the place!
As far as info regarding spiders/bots/aggregators - there is decent high-level overview info in this book, but nothing for a real programmer. You will not learn how to build these things on your own, and the book relies on the helper libraries included on the CD-ROM to accomplish anything. If you are hoping to build anything useful after purchasing this book, understand that you will only succeed if you include the com.heaton.* libraries included on the CD.
Lots of working code but not much of a tutorial
Bots are the simplest form of Internet-aware programs in that they simply carry out a repetitive task once unleashed on the web. A spider travels the web in a complex fashion, moving from one part of the World Wide Web to another collecting information from one site and then jumping to another based on that information. An aggregator is a bot that is designed to log into several user accounts and retrieve similar information.
If you need a complete bot, spider, or aggregator written in Java, complete with source code and a detailed manual about that source code so that you can customize it to suit your needs, this is a five-star book. However, if you are looking for a book about information storage and retrieval and network programming that focuses on the theory of operation of such software with application code written in Java, you will be sorely disappointed.
The author did such a fine job of documenting his work with excellent diagrams, comments, and the book that reads like a user's manual, that I easily took his Web spider code and modified it to perform many additional tasks that his basic package does not do. All of the hooks are available in his code for you to modify it to collect or examine just about any kind of data accessible via the web.
I highly recommend this book if you are taking an information storage and retrieval class and you would like to read and study something applied on spiders, bots, and aggregators versus the theory you get in most textbooks. Just understand you are getting code plus a user's manual, not a tutorial. You are definitely going to need other resources on Java network programming if you want to study, understand, or modify the included source code. I suggest the latest edition of "Java Network Programming" by Elliotte Rusty Harold for help with the network programming part of bots, spiders, and aggregators. I also suggest you look at "Spidering Hacks", which has many good ideas for features you can add to your web spider.
Create an Object-Oriented Bot Package Step by Step
I use this book as a supplement to a class that I teach, as it gives the students the necessary skills to programmatically spider, and generally access, information on the Net.
As some of the other reviewers point out, this book does center around the creation of a "bot package". However, I see this as one of the book's greatest strengths. The author explains step by step how to take basic concepts, continually build upon them, and progress onward to more complex spiders and bots. Specifically:
1. Create an advanced HTTP object that overcomes many of the shortcomings of the one built into Java (namely cookie support, referrer support, HTTP authentication, and more).
2. Add forms/page processing on top of the HTTP object. You are shown step by step how to process the data you collect from step 1.
3. Create a bot that wields the page/form processing created in step 2.
4. Create a spider that, using steps 1-3, can access pages across an entire site.
5. Expand the spider to support thread pooling and a JDBC database.
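To make steps 2 and 4 more concrete, here is a minimal sketch of link extraction and a same-site crawl. It is not the book's bot package: `SpiderSketch`, `extractLinks`, and `crawl` are hypothetical names chosen for illustration, the `fetch` function stands in for the HTTP object of step 1, and the regular expression is a naive substitute for real HTML parsing.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: these names are hypothetical and do not come from
// the book's library.
final class SpiderSketch {

    // Naive matcher for <a href="..."> values; a real spider would use
    // an HTML parser instead of a regular expression.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Page processing (step 2): collect the links on a page and resolve
    // each one against the page's own URL so relative links become absolute.
    static List<URI> extractLinks(URI base, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                links.add(base.resolve(m.group(1).trim()));
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs.
            }
        }
        return links;
    }

    // Spider (step 4): breadth-first walk that stays on the start URL's host.
    // 'fetch' returns a page's HTML, or null if the page cannot be retrieved.
    // The result lists every URL the spider attempted, in visiting order.
    static Set<URI> crawl(URI start, Function<URI, String> fetch, int maxPages) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI page = queue.poll();
            if (!visited.add(page)) {
                continue; // already attempted this URL
            }
            String html = fetch.apply(page);
            if (html == null) {
                continue; // nothing to parse
            }
            for (URI link : extractLinks(page, html)) {
                boolean sameHost = start.getHost() != null
                        && start.getHost().equalsIgnoreCase(link.getHost());
                if (sameHost && !visited.contains(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the crawl logic separate from networking, so it can be tried against an in-memory map of pages before pointing it at a real site. Step 5 (thread pooling and a database) is left out of this sketch.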
Rather than providing a bunch of disjoint code samples, as many books do, the author guides you step by step through the above path, revealing the techniques at every step. Readers who do not care about the intricate nature of bot programming (sadly, some of my students) can skip to the API documentation and get right to creating their own bots. You can also download updated versions of the "bot package" from the author's site. I actually did this before buying the book.
The downside of the book is the example programs' use of GUIs. I would rather every example had been a straight console program; for a book targeting bot programming, the GUI only gets in the way. The author also, very annoyingly, puts an underscore in front of every class-instance variable, which gives some of the code something of a C++ look, I suppose.
If you are already programming bots and spiders of your own, I don't think this book will teach you much beyond what you are likely already doing.
But for someone who wants to get started in this exciting area, there is nothing else like it, and I highly recommend it.




