CARROT2 MANUAL PDF
Quickly try Carrot2 with your own data and tune Carrot2 clustering settings in real time. Download the Carrot2 User and Developer Manual. Carrot² is an open source search results clustering engine. It can automatically cluster small . with Carrot² clustering, a radically simplified Java API, a re-implemented search results clustering web application, and a user manual available. This manual provides detailed information about the Carrot Search Lingo3G document The dependency on Carrot2 framework has been updated to , .
|Published (Last):||21 December 2004|
For the clustering controller API and other miscellaneous examples, refer to the Carrot 2 project documentation. You can use the components available in Carrot 2 to fetch documents from various sources (public search engines, Lucene, Solr), serialize the results to JSON or XML and much more, while the clusters are generated by Lingo3G.
Below is some example code for the most common use cases. You can also browse the Carrot 2 code repository for further examples. The easiest way to get started with Lingo3G is to cluster a collection of Documents.
Overview (Lingo3G v API Documentation (JavaDoc))
Each document can consist of a title, content (a snippet or the full text) and a URL.
To make the example short, the code shown below clusters only 5 documents. Use at least 20 to get reasonable clusters. If you have access to the query that generated the documents being clustered, you should also provide it to Lingo3G to get better clusters.
If your production code needs to fetch documents from popular search engines, it is very important that you generate and use your own API key.
You can pass the API key along with the query and the requested number of results in an attribute map. The Lingo3G manual lists all supported attributes along with their keys, types and allowed values.
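The attribute map described above is an ordinary java.util.Map from string keys to values. The following is a minimal, library-free sketch of assembling one. The class name and the three keys are hypothetical placeholders, not real Lingo3G or Carrot 2 attribute keys; the actual keys are listed in the Lingo3G manual.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the keys below are placeholders, NOT real Lingo3G attribute keys.
class RawAttributeMapSketch {
    // Collects the API key, query and requested number of results in one map.
    static Map<String, Object> searchAttributes(String apiKey, String query, int results) {
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("example.api-key", apiKey);   // placeholder key
        attributes.put("example.query", query);      // placeholder key
        attributes.put("example.results", results);  // placeholder key; stored as Integer
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = searchAttributes("my-own-key", "data mining", 50);
        System.out.println(attributes);
    }
}
```

Because values are stored as Object, a mistyped key or a value of the wrong type in such a map is only noticed at run time.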
The code shown below fetches and clusters 50 results from org.
You can change the default behaviour of Lingo3G by changing its attributes. For a complete list of available attributes, their identifiers, types and allowed values, please see the Lingo3G manual.
To pass attributes to Lingo3G, put them into a Map, along with the documents being clustered. The code shown below searches the web using org. As an alternative to the raw attribute map used in the previous example, you can use attribute map builders.
Lingo3G v1.16.0 API Documentation
Attribute map builders have a number of advantages:. A possible disadvantage of attribute builders is that one algorithm’s attributes can be divided among a number of builders and hence may not be readily available in your IDE’s auto-complete window.
Please consult the attribute documentation in the Lingo3G manual for pointers to the appropriate builder classes and methods.
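To illustrate the general idea of an attribute map builder, here is a minimal sketch of a typed wrapper that writes into a raw attribute map. The class, its methods and the keys are hypothetical and are not the builder classes shipped with Lingo3G or Carrot 2; the manual points to the real ones.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical builder: class, method and key names are illustrative only.
class ExampleAttributeBuilder {
    private final Map<String, Object> attributes;

    // The builder writes into a map supplied by the caller.
    ExampleAttributeBuilder(Map<String, Object> attributes) {
        this.attributes = attributes;
    }

    // The parameter type is checked by the compiler; the key is spelled once, here.
    ExampleAttributeBuilder query(String query) {
        attributes.put("example.query", query);
        return this;
    }

    // Values can be validated before they reach the map.
    ExampleAttributeBuilder results(int results) {
        if (results <= 0) {
            throw new IllegalArgumentException("results must be positive");
        }
        attributes.put("example.results", results);
        return this;
    }

    public static void main(String[] args) {
        Map<String, Object> attributes = new HashMap<>();
        new ExampleAttributeBuilder(attributes).query("data mining").results(50);
        System.out.println(attributes);
    }
}
```

In this sketch the map remains the single carrier of settings, so builder calls and direct put calls can be mixed on the same map.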
The code shown below clusters an example collection of Documents using Lingo3G tuned to return slightly fewer clusters than by default, but with increased hierarchy depth.
The examples shown above used a simple controller to manage the clustering process.
While the simple controller is enough for one-shot requests, for long-running applications, such as web applications, it’s better to use a controller which supports pooling of processing component instances and caching of processing results.
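The benefit of caching processing results can be sketched without the library. The class below is a hypothetical stand-in, not the Carrot 2 controller: it assumes a request is fully described by its attribute map and simply reuses the stored result when an equal map is seen again. Pooling of component instances is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of result caching; not the Carrot 2 controller implementation.
class CachingProcessorSketch<R> {
    private final Map<Map<String, Object>, R> cache = new ConcurrentHashMap<>();
    private final Function<Map<String, Object>, R> processor;
    // Counts how often the (expensive) processor actually ran.
    final AtomicInteger invocations = new AtomicInteger();

    CachingProcessorSketch(Function<Map<String, Object>, R> processor) {
        this.processor = processor;
    }

    R process(Map<String, Object> attributes) {
        // Use an immutable copy as the key so that later changes made by the
        // caller to its own map cannot corrupt the cache.
        return cache.computeIfAbsent(Map.copyOf(attributes), key -> {
            invocations.incrementAndGet();
            return processor.apply(key);
        });
    }

    public static void main(String[] args) {
        CachingProcessorSketch<String> controller = new CachingProcessorSketch<String>(
                attrs -> "clusters for " + attrs.get("example.query")); // placeholder work
        Map<String, Object> attributes = new HashMap<>();
        attributes.put("example.query", "data mining"); // placeholder key
        controller.process(attributes);
        controller.process(attributes); // served from the cache
        System.out.println("processor runs: " + controller.invocations.get());
    }
}
```

A long-running application would also need an eviction policy, which this sketch leaves out.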
This example shows how to cluster non-English content. By default Lingo3G assumes that the documents provided for clustering are written in English. When clustering content written in a different language, it is important to indicate the language to Lingo3G, so that it can use the lexical resources (stop words, tokenizer, stemmer) appropriate for that language.
If the language of the documents is unknown, it can be detected automatically by setting the language-recognition attribute to true.
A document's URL is used by ByUrlClusteringAlgorithm and ignored by other algorithms. The simplest yet least flexible way to do it is to use the Controller. The code shown below retrieves search results for the query "data mining" from org.
There are two ways to indicate the desired clustering language to Lingo3G.
The first is to set the language of each document in their Document. The language does not necessarily have to be the same for all documents on the input; the algorithm can handle multiple languages in one document set as well. Please see the MultilingualClustering.
The second is to set the fallback language. For documents with undefined Document. You can change the fallback language by setting the MultilingualClustering.
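The interplay of per-document language and fallback language can be sketched as follows. This is an assumption-laden illustration, not Lingo3G's internal logic: the Doc class, the use of null for an undefined language and the language codes are all hypothetical. It only shows how an effective language could be resolved for each document in a mixed-language input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; Doc is a stand-in, not the Carrot 2 Document class.
class LanguageFallbackSketch {
    static final class Doc {
        final String title;
        final String language; // null means "language undefined"

        Doc(String title, String language) {
            this.title = title;
            this.language = language;
        }
    }

    // Resolves each document's effective language and lists titles per language.
    static Map<String, List<String>> titlesByLanguage(List<Doc> docs, String fallbackLanguage) {
        Map<String, List<String>> byLanguage = new TreeMap<>();
        for (Doc doc : docs) {
            // A document-level language wins; otherwise the fallback applies.
            String effective = doc.language != null ? doc.language : fallbackLanguage;
            byLanguage.computeIfAbsent(effective, k -> new ArrayList<>()).add(doc.title);
        }
        return byLanguage;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("Data mining", "en"),
                new Doc("Eksploracja danych", "pl"),
                new Doc("Untitled", null));
        System.out.println(titlesByLanguage(docs, "en"));
    }
}
```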
Lingo3G clustering algorithm component; the algorithm uses the infrastructure defined by the Carrot 2 framework. Definitions of Carrot 2 core interfaces and their implementations.
Attribute annotations for Carrot 2 core interfaces.