What is it used for?
As a content acquisition, processing, and indexing platform, Aspire offers different features for each task. The main ones are listed here.
Content Acquisition with Connectors
Built-in connectors for dozens of different data sources (see the list of available connectors)
Scalable: Automatically distributes ingestion jobs across a cluster of nodes
Elastic: Add and remove nodes at any time
Resilient: Crawl state is carefully tracked at all points
Jobs on failed nodes are automatically picked up by other nodes
After a full system crash, crawling restarts from where it left off
High Performance: Crawl speed is typically limited only by the source system
Incremental: Automatically identifies incremental changes and processes only those changes
The method used to detect incremental changes depends on what the underlying content storage technology provides.
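To illustrate the idea behind incremental processing, here is a minimal sketch of one common approach: comparing a stored snapshot of item signatures against the current state of the source. This is not Aspire's implementation; the function name `diff_snapshots` and the use of a last-modified string as the signature are assumptions made for the example. A source that exposes its own change log would not need this comparison.

```python
def diff_snapshots(previous, current):
    """Compare two {item_id: signature} snapshots and classify the changes.

    The signature can be anything the source provides that changes when the
    item changes (a last-modified timestamp, a version number, a hash).
    """
    added = sorted(item for item in current if item not in previous)
    deleted = sorted(item for item in previous if item not in current)
    updated = sorted(
        item for item in current
        if item in previous and current[item] != previous[item]
    )
    return {"added": added, "updated": updated, "deleted": deleted}


# Hypothetical snapshots from two consecutive crawls.
previous = {"doc0": "2024-01-01", "doc1": "2024-01-01", "doc2": "2024-01-01"}
current = {"doc1": "2024-01-01", "doc2": "2024-02-15", "doc3": "2024-02-16"}

changes = diff_snapshots(previous, current)
print(changes)
# {'added': ['doc3'], 'updated': ['doc2'], 'deleted': ['doc0']}
```

Only the items in `added`, `updated`, and `deleted` would need to be processed; `doc1` is unchanged and is skipped.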
Content Publication with Publishers
Built-in publishers for the most commonly available search engines
Including but not limited to:
Elasticsearch
Solr
SharePoint
Google Cloud Search
Amazon Kendra
Content migration publishers for Cloud-based storage solutions
Real-time streaming systems, such as:
Amazon Kinesis
Apache Kafka
Metadata/Content extraction & manipulation
Built-in components for many common content processing tasks
Such as text extraction, OCR, field mapping, domain mapping, archive file extraction, etc.
Scripting for easy manipulation of metadata
Document rendering as images (for thumbnail previews)
Document Level Security
Fully understands document-level security
Ingests ACLs for each content source
Provides cached, high-performance group-expansion* for each content source
*Group expansion is a process that flattens user-group memberships so that, for any given user, a single flat list contains every group the user belongs to, including the parent groups of the groups assigned to them directly.
For example
user Ann has been assigned to the Developers group only
group Developers is part of another group called IT_Operations
After group expansion, Ann is listed as a member of both Developers and IT_Operations
Multi-domain identity extraction and mapping
Identity publication
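The group-expansion process described above can be sketched as a graph traversal over group memberships. The following is a minimal illustration, not Aspire's implementation: the function name `expand_groups` and the two dictionary inputs are assumptions made for the example, and it leaves out the caching mentioned above.

```python
from collections import deque


def expand_groups(direct_groups, parent_groups):
    """Return a flat mapping of user -> every group the user belongs to.

    direct_groups: {user: [groups assigned directly to the user]}
    parent_groups: {group: [groups that this group is a member of]}
    """
    expanded = {}
    for user, groups in direct_groups.items():
        seen = set()
        queue = deque(groups)
        while queue:
            group = queue.popleft()
            if group in seen:
                continue  # already visited; also guards against cyclic nesting
            seen.add(group)
            # Walk upward to the parents of this group.
            queue.extend(parent_groups.get(group, ()))
        expanded[user] = sorted(seen)
    return expanded


# The example from the text: Ann -> Developers -> IT_Operations.
direct_groups = {"Ann": ["Developers"]}
parent_groups = {"Developers": ["IT_Operations"]}

flat = expand_groups(direct_groups, parent_groups)
print(flat)
# {'Ann': ['Developers', 'IT_Operations']}
```

With the flat list computed ahead of time, a security check at query time only needs to compare a document's ACL against the user's expanded groups, without walking the group hierarchy again.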
Customizable components
Aspire is designed to host and use independent components (connectors, publishers, and content manipulation are all implemented as components). If no built-in component covers what you need, Aspire provides a set of SDK frameworks for developing new ones.
So you can:
Create custom connectors and publishers
Create custom pipelines and workflow controls
Create custom components
Ease of deployment
Components and configurations are deployed through Maven
Properties allow for anything to be parameterized (e.g. server destinations, credentials, file directory locations, etc.)
Content source configurations can be exported from any cluster and imported on another
Container images for easy deployment with container orchestration tools such as Kubernetes
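As a rough illustration of what parameterizing a configuration with properties means, the sketch below substitutes placeholder values into a configuration at deployment time. It uses Python's standard `string.Template` and its `${name}` placeholder syntax purely for demonstration; the placeholder syntax, the `resolve` function, and the property names are assumptions and do not reflect Aspire's actual property format.

```python
from string import Template


def resolve(config, properties):
    """Replace ${name} placeholders in each config value with property values.

    Raises KeyError if a placeholder has no matching property, so a missing
    setting fails at deployment time rather than silently.
    """
    return {
        key: Template(value).substitute(properties)
        for key, value in config.items()
    }


# Hypothetical configuration shared by every environment.
config = {
    "publisher_url": "https://${search_host}:${search_port}/index",
    "data_dir": "${base_dir}/crawl-data",
}

# Hypothetical properties for one specific environment.
properties = {
    "search_host": "search.example.com",
    "search_port": "9200",
    "base_dir": "/opt/data",
}

resolved = resolve(config, properties)
print(resolved["publisher_url"])  # https://search.example.com:9200/index
print(resolved["data_dir"])       # /opt/data/crawl-data
```

Keeping environment-specific values in properties lets the same exported configuration be imported on another cluster with only the property values changed.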