Java/Groovy Web Crawler Framework

Java core framework for building scriptable Web crawlers with Groovy.

svn checkout http://svn.hyperkit-software.com/webcrawler/

Introduction

The aim of this project is to provide a stable Web crawler core written in Java. The task-specific part, that is, the data extraction and processing logic, is implemented in the scripting language Groovy. This gives you the best of both worlds: (1) a strongly typed, compiled, and well-architected core for reliability, and (2) dynamically interpreted data extraction and processing logic for agility. One use case ships with the source code: a crawling example for ImmobilienScout24, a German real estate website.
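The split described above can be illustrated with a minimal, self-contained sketch. It is not the project's actual code: the names CrawlerSketch, PageScript, PageResult and crawl are hypothetical, and a plain Java lambda stands in for the Groovy script so that the example runs with the JDK alone. The core only knows how to walk pages and avoid visiting a URL twice; what to extract and which links to follow is decided by the pluggable script.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: a fixed crawl loop (the "core") with pluggable page logic (the "script"). */
class CrawlerSketch {

    /** What a script returns for one page: extracted records and links to follow next. */
    static final class PageResult {
        final List<String> records;
        final List<String> links;

        PageResult(List<String> records, List<String> links) {
            this.records = records;
            this.links = links;
        }
    }

    /** Stand-in for a Groovy script (assumption: the real project would load this logic from .groovy files). */
    interface PageScript {
        PageResult process(String url, String html);
    }

    /** The core: breadth-first traversal with de-duplication; it knows nothing about page contents. */
    static List<String> crawl(String startUrl, Function<String, String> fetcher,
                              PageScript script, int maxPages) {
        Queue<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        List<String> records = new ArrayList<>();
        queue.add(startUrl);
        seen.add(startUrl);
        int visited = 0;
        while (!queue.isEmpty() && visited < maxPages) {
            String url = queue.remove();
            String html = fetcher.apply(url);
            visited++;
            if (html == null) {
                continue; // page could not be fetched; skip it
            }
            PageResult result = script.process(url, html);
            records.addAll(result.records);
            for (String link : result.links) {
                if (seen.add(link)) { // add() returns false for URLs already queued
                    queue.add(link);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // In-memory "web" so the sketch runs without network access.
        Map<String, String> pages = Map.of(
                "list", "expose-1 expose-2",
                "expose-1", "price=100",
                "expose-2", "price=200");
        // The list page yields links, every other page yields one record.
        PageScript script = (url, html) -> url.equals("list")
                ? new PageResult(List.of(), List.of(html.split(" ")))
                : new PageResult(List.of(url + ":" + html), List.of());
        System.out.println(crawl("list", pages::get, script, 10));
    }
}
```

In this sketch, changing what the crawler does means swapping the PageScript only; the traversal loop stays untouched. That is the property the Java/Groovy split is meant to provide, with the script being edited without recompiling the core.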

Activities (Subversion)

Actions

[Chart: activity by month, 2009 to 2011. added: 23, modified: 14, deleted: 1, replaced: 3]

Developers

[Chart: activity by month, 2009 to 2011. georg: 23, 18]

Files (Subversion)

Filename                      Size      Author  Time        Revision
pom.xml                       3 KB      georg   2011/05/09  3
src                           0 Byte    georg   2011/05/09  3
  main                        0 Byte    georg   2011/05/09  3
    groovy                    0 Byte    georg   2011/05/09  3
      immobilienscout24       0 Byte    georg   2011/05/09  3
        Expose.groovy         2 KB      georg   2011/05/09  3
        List.groovy           1 KB      georg   2011/05/09  3
    java                      0 Byte    georg   2011/05/09  3
      com                     0 Byte    georg   2011/05/09  3
        hyperkit              0 Byte    georg   2011/05/09  3
          crawler             0 Byte    georg   2011/05/09  3
            Manager.java      994 Byte  georg   2011/05/09  3
            Program.java      1 KB      georg   2011/05/09  3
            Task.java         2 KB      georg   2011/05/09  3