What is it used for?

As a content acquisition, processing, and indexing platform, Aspire offers different features for each task. The main ones are listed here.

Content Acquisition with Connectors

  • Built-in connectors to dozens of different data sources (see the list of available connectors)

    • Scalable:  Automatically distributes ingestion jobs across a cluster of nodes

    • Elastic:  Add and remove nodes at any time

    • Resilient:  Crawl state is carefully tracked at all points

      • Jobs on failed nodes are automatically picked up by other nodes

      • After a full system crash, crawling restarts from where it left off

    • High Performance:  Crawl speed is typically limited only by what the source system can deliver

    • Incremental:  Automatically identifies incremental changes and processes only those changes

      • The method for detecting incremental changes is based on what is provided by the underlying content storage technology.
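
To illustrate the incremental idea above: when a content source offers no native change feed, one common generic fallback is to compare a snapshot of item signatures from the previous crawl against the current one. The sketch below shows only that fallback. The function name `diff_snapshots`, the signature strings, and the paths are hypothetical and are not taken from Aspire itself, which may use a different mechanism for any given source.

```python
def diff_snapshots(previous, current):
    """Compare two {item_id: signature} snapshots and classify the changes."""
    # Items present now but not in the previous crawl
    added = sorted(item for item in current if item not in previous)
    # Items present in both crawls whose signature differs
    updated = sorted(
        item for item in current
        if item in previous and current[item] != previous[item]
    )
    # Items seen in the previous crawl that have disappeared
    deleted = sorted(item for item in previous if item not in current)
    return {"add": added, "update": updated, "delete": deleted}


# Hypothetical snapshots: item path -> signature (e.g. a hash or modified date)
previous = {"/docs/a.txt": "sig-1", "/docs/b.txt": "sig-2", "/docs/c.txt": "sig-3"}
current = {"/docs/a.txt": "sig-1", "/docs/b.txt": "sig-9", "/docs/d.txt": "sig-4"}

changes = diff_snapshots(previous, current)
print(changes)
```

Only the items reported under add, update, and delete would need to be processed again; unchanged items such as `/docs/a.txt` are skipped.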

Content Publication with Publishers

  • Built-in publishers for the most commonly used search engines

    • Including but not limited to:

      • Elasticsearch

      • Solr

      • SharePoint

      • Google Cloud Search

      • Amazon Kendra

  • Content migration publishers for Cloud-based storage solutions

  • Real-Time Streaming systems such as

    • Amazon Kinesis

    • Apache Kafka

Metadata/Content extraction & manipulation

  • Built-in components for many common content processing tasks

    • Such as text extraction, OCR, field mapping, domain mapping, archive file extraction, etc.

  • Scripting for easy manipulation of metadata

  • Document rendering as images (for thumbnail previews)

Document Level Security

  • Fully understands document-level security

    • Ingests ACLs for each content source

    • Provides cached, high-performance group-expansion* for each content source

      • *group-expansion is a process that flattens user-group memberships so that, for any given user, a flat list of all of that user's groups is available, including the parent groups of the groups directly assigned to the user.

        • For example

          • user Ann has been assigned to the Developers group only

          • group Developers is part of another group called IT_Operations

          • After the group-expansion process, Ann is listed as a member of both Developers and IT_Operations

    • Multi-domain identity extraction and mapping

    • Identity publication
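
The group-expansion example above (Ann, Developers, IT_Operations) can be sketched as a simple graph traversal. This is a minimal illustration of the flattening idea only, not Aspire's implementation; the function name `expand_groups` and the two dictionaries are hypothetical.

```python
from collections import deque


def expand_groups(user, direct_groups, parent_groups):
    """Return every group the user belongs to, directly or through nesting."""
    seen = set()
    queue = deque(direct_groups.get(user, ()))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue  # already visited; also guards against cyclic nesting
        seen.add(group)
        # Follow the group's own memberships up to its parent groups
        queue.extend(parent_groups.get(group, ()))
    return sorted(seen)


# Hypothetical data mirroring the example in the text
direct_groups = {"Ann": ["Developers"]}            # user -> directly assigned groups
parent_groups = {"Developers": ["IT_Operations"]}  # group -> groups it belongs to

print(expand_groups("Ann", direct_groups, parent_groups))
# prints ['Developers', 'IT_Operations']
```

Caching the flattened result per user is what lets a security check at query time avoid walking the group hierarchy on every request.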

Customizable components

Aspire is designed to host and use independent components (connectors, publishers, and content-manipulation tasks are all implemented as components). If no built-in component covers what you need, Aspire provides a set of SDK frameworks for developing new ones.

So you can:

  • Create custom connectors and publishers

  • Create custom pipelines and workflow controls

  • Create custom components

Ease of deployment

  • Components and configurations are deployed through Maven

  • Properties allow for anything to be parameterized (e.g. server destinations, credentials, file directory locations, etc.)

  • Content source configurations can be exported from any cluster and imported on another

  • Container images for easy deployment in container orchestration tools such as Kubernetes
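
The parameterization idea above can be illustrated with a generic sketch: a configuration template carries placeholders, and each environment supplies its own property values. The `${name}` placeholder syntax, the property names, and the `render_config` helper below are assumptions made for illustration; they are not Aspire's actual configuration format.

```python
from string import Template


def render_config(template_text, properties):
    """Fill ${name} placeholders; raises KeyError if a property is missing."""
    return Template(template_text).substitute(properties)


# Hypothetical configuration template shared by all environments
template_text = (
    "search.url=http://${search_host}:${search_port}\n"
    "data.dir=${data_dir}"
)

# Hypothetical property values for one environment
dev = {"search_host": "localhost", "search_port": "9200", "data_dir": "/tmp/aspire"}

rendered = render_config(template_text, dev)
print(rendered)
```

Keeping environment-specific values such as server destinations, credentials, and directories in properties means the same configuration can be moved between clusters unchanged.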