
Score

Score is intended as a system for document retrieval in a web context, based on Formal Concept Analysis (FCA). Its basic parts are:

Indexer
An indexer scans a number of documents and finds attributes to describe them, typically keywords.
Database
A relational database stores the references to the documents, the attributes and their relation.
FCA Module
An FCA module creates a formal concept lattice from the database and offers a query API.
Frontend
A Java Servlet will offer the query features to the user.
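
To illustrate what the FCA module has to compute, here is a minimal Python sketch that derives all formal concepts from a small document/attribute relation. It is not Score's code: the toy context, the function names and the naive subset enumeration are assumptions chosen for readability, and a real module would use a more efficient lattice algorithm.

```python
from itertools import combinations

# Hypothetical toy context: document -> set of keyword attributes.
CONTEXT = {
    "doc1": {"lattice", "retrieval"},
    "doc2": {"lattice", "servlet"},
    "doc3": {"lattice", "retrieval", "servlet"},
}


def extent(attrs, context):
    """All documents that carry every attribute in attrs."""
    return frozenset(d for d, a in context.items() if attrs <= a)


def intent(docs, context):
    """All attributes shared by every document in docs."""
    all_attrs = set().union(*context.values())
    return frozenset(a for a in all_attrs if all(a in context[d] for d in docs))


def concepts(context):
    """Enumerate formal concepts by closing every attribute subset.

    Naive and exponential in the number of attributes; for illustration only.
    """
    all_attrs = sorted(set().union(*context.values()))
    found = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            ext = extent(set(subset), context)
            found.add((ext, intent(ext, context)))
    return found
```

Each returned pair is one formal concept: a set of documents (the extent) together with exactly the attributes they all share (the intent). Ordering these pairs by extent inclusion gives the concept lattice that the query API would navigate.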

The subsystems will communicate through ODBC and CORBA, which ensures interoperability between platforms, programming languages and networks. The database structure will be simple but extensible, so the code we create should be easily reusable.
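
A "simple but extensible" database structure could look roughly like the following three-table layout: documents, attributes and the relation between them. This is only a sketch under assumptions. The table and column names are hypothetical, and Python's built-in sqlite3 module stands in for an ODBC connection so that the example is self-contained.

```python
import sqlite3

# Hypothetical schema: two entity tables plus the incidence relation.
SCHEMA = """
CREATE TABLE document  (id INTEGER PRIMARY KEY, url  TEXT NOT NULL UNIQUE);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE has_attribute (
    document_id  INTEGER NOT NULL REFERENCES document(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    PRIMARY KEY (document_id, attribute_id)
);
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO document (id, url) VALUES (?, ?)",
                     [(1, "a.html"), (2, "b.html")])
    conn.executemany("INSERT INTO attribute (id, name) VALUES (?, ?)",
                     [(1, "lattice"), (2, "servlet")])
    conn.executemany("INSERT INTO has_attribute VALUES (?, ?)",
                     [(1, 1), (2, 1), (2, 2)])
    return conn


def load_context(conn):
    """Read the relation back as a mapping: document URL -> set of attribute names."""
    rows = conn.execute(
        "SELECT d.url, a.name FROM has_attribute h "
        "JOIN document d ON d.id = h.document_id "
        "JOIN attribute a ON a.id = h.attribute_id"
    )
    context = {}
    for url, name in rows:
        context.setdefault(url, set()).add(name)
    return context
```

Keeping the relation in its own table means new kinds of attributes can be added by the indexer without changing the schema, which is one way to read the extensibility goal.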

Targeted applications are:

Ontology guided document retrieval
A set of documents with a specific topic can be indexed using an ontology system and specific documents can be retrieved.
Mailing list archives
Mails in a mailing list archive can be queried for keywords, which are generated from the header information and with the help of idf (inverse document frequency).
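
As a reminder of how idf weights terms, the sketch below computes the common variant log(N / df) over a few tokenised subject lines. The sample mails and the function name are made up for illustration, and how Score would turn these scores into keywords is not specified here.

```python
import math
from collections import Counter


def idf_scores(documents):
    """Map each term to log(N / df), where df is the number of documents containing it."""
    n = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n / count) for term, count in df.items()}


# Hypothetical subject lines, already split into lowercase tokens.
mails = [
    ["re", "lattice", "drawing"],
    ["re", "servlet", "setup"],
    ["re", "lattice", "layout"],
]
scores = idf_scores(mails)
```

A term such as "re" that occurs in every mail gets a score of zero, while rarer terms score higher and are therefore better candidates for descriptive keywords.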

There are a number of advantages of this approach compared to classic retrieval systems:

  • We can avoid empty result sets
  • We can offer refinements that are always true refinements (including a ranking for their efficiency)
  • We have implicit document clusters (the formal concepts) with a natural distance relation, so we can return documents that match the query nearly rather than exactly, together with a ranking on them.
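
The first two points and the near-match idea can be sketched in a few lines of Python. The toy context and function names are hypothetical, and the distance used here (the number of query attributes a document is missing) is only an assumed stand-in for the distance relation mentioned above.

```python
# Hypothetical toy context: document -> set of keyword attributes.
CONTEXT = {
    "doc1": {"lattice", "retrieval"},
    "doc2": {"lattice", "servlet"},
    "doc3": {"lattice", "retrieval", "servlet"},
}


def extent(attrs, context):
    """Documents that carry every attribute in attrs."""
    return {d for d, a in context.items() if attrs <= a}


def refinements(query, context):
    """Attributes that narrow the result to a strictly smaller, non-empty set."""
    current = extent(query, context)
    candidates = set().union(*context.values()) - query
    result = {}
    for attr in candidates:
        narrowed = extent(query | {attr}, context)
        if narrowed and narrowed != current:
            result[attr] = narrowed
    return result


def near_matches(query, context):
    """Rank documents by how many query attributes they miss (0 = exact match)."""
    return sorted((len(query - attrs), doc) for doc, attrs in context.items())
```

For the query {"lattice"}, refinements offers "retrieval" and "servlet", each leading to a smaller but non-empty result. For {"retrieval", "servlet"}, near_matches lists doc3 as an exact match, followed by doc1 and doc2 at distance 1.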

Status: This project is currently on hold. We might get back to it one day, but for now there are other things we consider more important for our research interests. Check out our personal document management tool Docco, which picks up similar interests in a smaller context.
