
aggregation-framework

A Swiss-army knife library for scraping and processing data from the web. It provides a unified interface over multiple HTTP clients, plus convenience functions for parsing and preprocessing data before your applications use it.

  • Quickly build HTTP requests for a variety of data formats and APIs.
  • Parse common data formats such as XML, HTML, and JSON.
  • Push your aggregated data automatically to your preferred datastore (such as Kafka, MySQL, or Postgres).
  • Write your own collectors for non-standard data formats.

graph LR
    EXT1[(External HTTP API)]
    EXT2[(External HTTP API)]
    EXT3[(External HTTP API)]

    COL1[/Collector/]
    COL2[/Collector/]
    COL3[/Collector/]
    
    DB[(Application Database)]
    BE1[Backend Application]
    BE2[Backend Application]
    BE3[Backend Application]
  
    subgraph AP[Aggregation Framework]
        COL1
        COL2
        COL3
    end
    
    EXT1 --> COL1
    EXT2 --> COL2
    EXT3 --> COL3
    
    COL1 & COL2 & COL3 --> DB --> BE1 & BE2 & BE3
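
The data flow in the diagram can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `Collector`, `Datastore`, `InMemoryDatastore`, `CsvPriceCollector`, and `Price` are names invented for this illustration and are not the library's actual API (see the tutorial for that). The response is canned rather than fetched over HTTP so the example runs without a network connection.

```scala
// Hypothetical sketch: none of these names come from aggregation-framework itself.
import scala.collection.mutable.ListBuffer

object CollectorSketch {
  /** Where parsed records end up (stands in for Kafka, MySQL, Postgres, ...). */
  trait Datastore[A] {
    def push(record: A): Unit
  }

  /** In-memory datastore so the example runs without external services. */
  final class InMemoryDatastore[A] extends Datastore[A] {
    private val buffer = ListBuffer.empty[A]
    def push(record: A): Unit = { buffer += record; () }
    def records: List[A] = buffer.toList
  }

  /** One collector per external source: fetch raw data, parse it, push the result. */
  trait Collector[A] {
    def fetch(): String             // an HTTP request in a real collector
    def parse(raw: String): List[A] // JSON, XML, or HTML parsing in a real collector
    def run(store: Datastore[A]): Unit = parse(fetch()).foreach(store.push)
  }

  final case class Price(symbol: String, value: BigDecimal)

  /** Collector for a made-up "SYMBOL,VALUE" line format; malformed lines are skipped. */
  final class CsvPriceCollector(response: String) extends Collector[Price] {
    def fetch(): String = response
    def parse(raw: String): List[Price] =
      raw.linesIterator
        .map(_.trim)
        .filter(_.nonEmpty)
        .map(_.split(","))
        .collect { case Array(symbol, value) => Price(symbol.trim, BigDecimal(value.trim)) }
        .toList
  }

  def main(args: Array[String]): Unit = {
    // Several collectors feed one shared datastore, as in the diagram.
    val store = new InMemoryDatastore[Price]
    val collectors = List(
      new CsvPriceCollector("AAA,1.50\nBBB,2.25"),
      new CsvPriceCollector("CCC,3.00")
    )
    collectors.foreach(_.run(store))
    store.records.foreach(println)
  }
}
```

Keeping `fetch` and `parse` separate is a design choice of this sketch: it lets the parsing step be tested against canned input, independent of whichever HTTP client performs the request.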

Get Started

Add Aggregation Framework and your preferred extensions to your project. For sbt:

// add Forge as a resolver
resolvers += "Gitea Package API" at "https://forge.cptlobster.dev/api/packages/cptlobster/maven"

libraryDependencies += "dev.cptlobster" %% "aggregation-framework-core" % "0.1.0-SNAPSHOT"
// for JSON parsing
libraryDependencies += "dev.cptlobster" %% "aggregation-framework-json" % "0.1.0-SNAPSHOT"

Note: Snapshot versions are available at forge.cptlobster.dev. Release versions will be made available on Maven Central at a future date.

To create a consumer, follow the tutorial.

Target Artifacts

The project is split into a core package and a set of extension packages, so that you only pull in the external dependencies you actually use.

The core package is located under /core in this repository, and the extension packages are located under their own subdirectories in /ext. Each extension package has its own README that describes it in more detail.

graph BT
    CORE[aggregation-framework-core]
    JSON[aggregation-framework-json]
    KAFKA[aggregation-framework-kafka]
    SEL[aggregation-framework-selenium]
    RUNNER[aggregation-framework-runner]
    CORE --> JSON & KAFKA & SEL & RUNNER
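
For readers unfamiliar with sbt multi-project builds, a layout like the one described above is typically wired together roughly as follows. This is a minimal sketch and not the project's actual `build.sbt`: the subdirectory names under `ext/` and the `root` project are assumptions, and only two of the extensions are shown (the others would follow the same pattern).

```scala
// build.sbt (sketch): directory names under ext/ are assumptions
lazy val core = (project in file("core"))
  .settings(name := "aggregation-framework-core")

// Each extension depends on core, matching the dependency graph above.
lazy val json = (project in file("ext/json"))
  .dependsOn(core)
  .settings(name := "aggregation-framework-json")

lazy val kafka = (project in file("ext/kafka"))
  .dependsOn(core)
  .settings(name := "aggregation-framework-kafka")

// Aggregating root: a task run at the top level also runs in every listed module.
lazy val root = (project in file("."))
  .aggregate(core, json, kafka)
```

With an aggregating root like this, running a task such as `compile` at the top level runs it for every listed module, while `sbt json/compile` would build a single extension together with the core it depends on.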

Development

This project uses sbt for project and dependency management. Install sbt via your preferred package manager; if you use IntelliJ, it can manage sbt for you.

To build the entire project:

sbt compile

License

This program is licensed under the GNU Lesser General Public License, version 3.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License (and the GNU General Public License) along with this program. If not, see https://www.gnu.org/licenses/.