Lucene Indexing

Lucene is a rich and powerful full-text search library written in Java. It can provide full-text indexing for database records and many other kinds of content. Content is first added to a full-text index, and queries are then run against that index.

Lucene returns results quickly because it searches an index instead of scanning the text directly. This is similar to looking up a term in the index at the back of a book to get a page number, rather than reading every page.
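To make the book-index analogy concrete, here is a minimal sketch in plain Java of a toy inverted index. It is only an illustration of the idea and not how Lucene stores its index internally; the class name `InvertedIndexSketch` and the sample texts are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each lower-cased word to the ids of the texts containing it.
// Illustration only; this is not Lucene's real data structure.
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text into words once, at indexing time, and record where each word occurs.
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
            }
        }
    }

    // A search is a single map lookup; none of the original texts are read again.
    Set<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> pages = List.of(
                "Lucene is a search library",
                "An index makes search fast",
                "Documents contain fields");
        InvertedIndexSketch index = new InvertedIndexSketch();
        for (int i = 0; i < pages.size(); i++) {
            index.add(i + 1, pages.get(i)); // page numbers start at 1, like a book
        }
        System.out.println(index.search("search")); // [1, 2]
        System.out.println(index.search("fields")); // [3]
    }
}
```

The cost of splitting the text into words is paid once when a page is added, so each later lookup does not depend on how long the pages are.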

In Lucene, a document is the unit of indexing. An index contains one or more documents, and a document contains one or more fields. For example, a set of users can be treated as an index: a single user record is a document, and the user's name and address are fields.

Indexing

Indexing is adding documents to an index via an IndexWriter.

Searching

Searching is retrieving documents from an index via an IndexSearcher. A query is passed to the IndexSearcher, and queries can be written in the Lucene query syntax.
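The two steps can be sketched together as follows, using the user example from earlier. This is a minimal sketch, not a definitive setup: it assumes Lucene 9.5 or later with the lucene-core and lucene-queryparser modules on the classpath, and it keeps the index in memory. The class name `UserIndexSketch`, the field names and the sample users are made up for illustration. On older Lucene versions, stored fields are read with `searcher.doc(...)` instead of `searcher.storedFields().document(...)`.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Sketch only: assumes Lucene 9.5+; names and sample data are hypothetical.
final class UserIndexSketch {

    // One user record becomes one document with two fields.
    static Document user(String name, String address) {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        doc.add(new TextField("address", address, Field.Store.YES));
        return doc;
    }

    // Indexes two sample users in memory, runs the query, and returns the matching names.
    static List<String> indexAndSearch(String queryText) throws Exception {
        List<String> names = new ArrayList<>();
        try (Analyzer analyzer = new StandardAnalyzer();
             Directory directory = new ByteBuffersDirectory()) {

            // Indexing: add documents through an IndexWriter; closing it commits them.
            try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
                writer.addDocument(user("Nimal Perera", "12 Lake Road, Colombo"));
                writer.addDocument(user("Jane Smith", "7 Hill Street, Kandy"));
            }

            // Searching: parse the query text and pass the query to an IndexSearcher.
            try (DirectoryReader reader = DirectoryReader.open(directory)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // "name" is the default field when the query text names no field.
                Query query = new QueryParser("name", analyzer).parse(queryText);
                for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
                    names.add(searcher.storedFields().document(hit.doc).get("name"));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(indexAndSearch("address:colombo"));
    }
}
```

The query `address:colombo` uses the `field:term` form of the query syntax, so it should match only the user whose address contains "Colombo". The same analyzer is used for indexing and for parsing the query, so both sides break text into terms the same way.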
