Package lumis.search

Provides classes for implementing search services integrated with Lumis Portal.


Package lumis.search Description

Provides classes for implementing search services integrated with Lumis Portal.

1. Introduction

This package contains classes that portal services can use to index their data so that it can later be searched using user-entered criteria. In other words, these classes make up an extensible search engine architecture for portal projects based on Lumis Portal.

Lumis Portal comes with a search engine implementation based on Apache Lucene, which is the default search engine. Portals based on Lumis Portal can be configured to use one or more compatible search engines to index and search service data.

Data that needs to be searchable must be indexed first. Indexing is performed by an Indexer implementation. Later, when a user requests a data search, for example, the search is executed by a Searcher implementation.

2. Main Concepts and Classes

2.1. Searchable Information

Searchable information is any textual information that can be stored by a search engine and later found using textual criteria such as keywords and text fragments.

Portal services can deal with several kinds of information, such as news, articles, images and files. Any kind of information can be considered "searchable" as long as it has some textual portion. For example, a news item is typical textual content and is obviously searchable. On the other hand, an image can be considered "searchable" only if it has some associated textual information, such as its author or description. A file is searchable only if its format allows text extraction.

Once the searchable text has been provided or extracted from the original information, it can be indexed and searched using the infrastructure in this package.

Searchable information is encapsulated in a SearchContent object.

2.1.1. Fields

Searchable information is composed of one or more fields, each having a name and one or more string values.

A field is represented by a SearchContentField object.
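The "name plus one or more string values" shape described above can be sketched as a small data structure. This is only an illustration: FieldedContentSketch, addValue and getValues are hypothetical names invented for this example and are not the SearchContent or SearchContentField API, whose actual methods are documented in their own class pages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the lumis.search API.
class FieldedContentSketch {
    // Field name -> one or more string values, kept in insertion order.
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Adds a value to the named field, creating the field on first use.
    void addValue(String fieldName, String value) {
        fields.computeIfAbsent(fieldName, k -> new ArrayList<>()).add(value);
    }

    // Returns the values of the named field, or an empty list if it is absent.
    List<String> getValues(String fieldName) {
        return fields.getOrDefault(fieldName, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        FieldedContentSketch news = new FieldedContentSketch();
        news.addValue("title", "Portal 7 released");
        news.addValue("keyword", "release");
        news.addValue("keyword", "portal");
        System.out.println(news.getValues("keyword")); // prints [release, portal]
    }
}
```

A multi-valued field such as the hypothetical "keyword" above is the reason each field maps to a list rather than to a single string.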

There are five predefined field types:

2.2. Search Engine

A search engine is a subsystem that provides indexing and searching functionality to the portal infrastructure.

Basically, a search engine is a set of Indexer and Searcher implementations that has its own section in the search configuration file.

The default search engine that comes with Lumis Portal is an implementation based on Apache Lucene, a popular information retrieval (IR) framework that stores searchable information in indexes on the file system.

Custom search engines can be implemented to integrate a portal solution with other search engines and IR frameworks.

2.3. Indexing

Indexing is the action of storing searchable information in a way that allows it to be found later using text search criteria, such as keywords or fragments of the information.

A typical data indexing cycle is composed of the following steps:

Some search engines can pre-process the information before it is stored, performing common actions such as case normalization and removal of common words.
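As a rough illustration of this kind of pre-processing, the sketch below lower-cases the text, splits it into terms and drops a few common words. The class PreprocessSketch, its preprocess method and the tiny stop-word list are assumptions made for this example; they do not describe how the default search engine or any Lumis class actually processes text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; NOT part of lumis.search.
class PreprocessSketch {
    // Illustrative stop-word list; a real engine would use a larger,
    // language-specific one.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and", "is"));

    // Normalizes case, splits on non-alphanumeric characters and
    // removes common words.
    static List<String> preprocess(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("The Launch of the Portal")); // prints [launch, portal]
    }
}
```

Applying the same normalization to both the indexed text and the search criteria is what lets a query such as "PORTAL" match content that was stored as "Portal".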

A search engine provides indexing functionality through an implementation of the Indexer class, which receives the searchable information to be indexed in a SearchContent object containing a collection of SearchContentField objects.

2.4. Searching

Searching is the action of finding information based on certain criteria, usually a portion of the original information. This portion can be a keyword given by the user or any fragment of information that can be used to match one or more previously indexed pieces of information.

A typical search cycle consists of the following steps:

Some search engines, like the default search engine, provide advanced features that can be used in search criteria, such as wildcards and boolean operators.

A search engine provides searching functionality through an implementation of the Searcher class, which receives the search criteria in a SearchQuery object. The search engine returns the results in a SearchResults object, which has one SearchHit object for each piece of searchable information that matches the given search criteria.
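The index-then-search cycle can be illustrated with a toy in-memory inverted index. This is a minimal sketch under stated assumptions: ToySearchEngine and its index and search methods are hypothetical names created for this example, content is reduced to an id plus a single text string, and a query matches only content containing every query term. It is not the Indexer, Searcher, SearchQuery or SearchResults API and does not reflect how the Lucene-based default engine works internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy engine for illustration only; hypothetical names, NOT the lumis.search API.
class ToySearchEngine {
    // Inverted index: term -> ids of the contents that contain the term.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Splits text into lower-cased alphanumeric terms.
    private static List<String> terms(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    // Indexing: record the content id under each term of its text.
    void index(String contentId, String text) {
        for (String term : terms(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(contentId);
        }
    }

    // Searching: return the ids of contents that contain every query term.
    Set<String> search(String query) {
        Set<String> hits = null;
        for (String term : terms(query)) {
            Set<String> matches = postings.getOrDefault(term, Collections.<String>emptySet());
            if (hits == null) {
                hits = new TreeSet<>(matches);
            } else {
                hits.retainAll(matches);
            }
        }
        return hits == null ? new TreeSet<String>() : hits;
    }

    public static void main(String[] args) {
        ToySearchEngine engine = new ToySearchEngine();
        engine.index("news-1", "Portal search released");
        engine.index("file-7", "Search engine configuration guide");
        System.out.println(engine.search("search"));        // prints [file-7, news-1]
        System.out.println(engine.search("search portal")); // prints [news-1]
    }
}
```

In this sketch each returned id plays the role that a single hit plays in a result set: one entry per piece of indexed content that matches the criteria.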

3. The Search Configuration File

The Lumis Portal search infrastructure is configured through the searchconfig.xml file, which can be found in the lumisdata/config directory.

Since:
4.0.4
Version:
$Revision: 10348 $ $Date: 2009-04-13 17:31:28 -0300 (Mon, 13 Apr 2009) $

Lumisportal  7.1.1.140331 - Copyright © 2006–2014 Lumis EIP Tecnologia da Informação LTDA. All Rights Reserved.