How to perform full text retrieval and search in Java
How to perform full-text retrieval and search in Java
Full-text retrieval and search is a technique for finding specific keywords or phrases in large-scale text data. In applications that process large amounts of text data, such as search engines, email systems, and document management systems, full-text retrieval and search functions are very important.
As a widely used programming language, Java provides a wealth of libraries and tools that can help us implement full-text retrieval and search functions. This article will introduce how to use the Lucene library to implement full-text retrieval and search, and provide some specific code examples.
1. Introduce the Lucene library
First, we need to introduce the Lucene library into the project. The Lucene library can be introduced into the Maven project in the following ways:
<dependencies> <dependency> <groupId>org.apache.lucene</groupId> <artifactId>lucene-core</artifactId> <version>8.10.1</version> </dependency> <dependency> <groupId>org.apache.lucene</groupId> <artifactId>lucene-analyzers-common</artifactId> <version>8.10.1</version> </dependency> </dependencies>
2. Create an index
Before performing full-text search, we need to create an index first. This index contains relevant information about the text data to be searched, so that we can perform subsequent search operations. The following is a simple example code for creating an index:
import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.document.Field; import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriterConfig; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import java.io.IOException; import java.nio.file.Paths; public class Indexer { private IndexWriter indexWriter; public Indexer(String indexDir) throws IOException { Directory dir = FSDirectory.open(Paths.get(indexDir)); Analyzer analyzer = new StandardAnalyzer(); IndexWriterConfig config = new IndexWriterConfig(analyzer); indexWriter = new IndexWriter(dir, config); } public void close() throws IOException { indexWriter.close(); } public void addDocument(String content) throws IOException { Document doc = new Document(); doc.add(new TextField("content", content, Field.Store.YES)); indexWriter.addDocument(doc); } }
In the above example code, we use IndexWriter
to create the index and TextField
to define the Indexed fields. When adding content to be indexed to the index, we need to first create a Document
object, then add fields to the object, and finally call the addDocument
method to add Document
Object is added to the index.
3. Perform search
After creating the index, we can perform search operations. The following is a simple search sample code:
import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.index.DirectoryReader; import org.apache.lucene.index.IndexReader; import org.apache.lucene.queryparser.classic.QueryParser; import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.Query; import org.apache.lucene.search.ScoreDoc; import org.apache.lucene.search.TopDocs; import org.apache.lucene.store.Directory; import org.apache.lucene.store.FSDirectory; import java.io.IOException; import java.nio.file.Paths; public class Searcher { private IndexSearcher indexSearcher; private QueryParser queryParser; public Searcher(String indexDir) throws IOException { Directory dir = FSDirectory.open(Paths.get(indexDir)); Analyzer analyzer = new StandardAnalyzer(); IndexReader indexReader = DirectoryReader.open(dir); indexSearcher = new IndexSearcher(indexReader); queryParser = new QueryParser("content", analyzer); } public ScoreDoc[] search(String queryString, int numResults) throws Exception { Query query = queryParser.parse(queryString); TopDocs topDocs = indexSearcher.search(query, numResults); return topDocs.scoreDocs; } public Document getDocument(int docID) throws IOException { return indexSearcher.doc(docID); } }
In the above sample code, we use IndexSearcher
to perform the search operation. Before performing a search, we need to create a Query
object to represent the query to be searched, and use QueryParser
to parse the query string into a Query
object. We then use the search
method of IndexSearcher
to perform the search and return the ranking of the search results.
4. Usage example
The following is a sample code that uses the full-text retrieval and search function:
public class Main { public static void main(String[] args) { String indexDir = "/path/to/index/dir"; try { Indexer indexer = new Indexer(indexDir); indexer.addDocument("Hello, world!"); indexer.addDocument("Java is a programming language."); indexer.addDocument("Lucene is a full-text search engine."); indexer.close(); Searcher searcher = new Searcher(indexDir); ScoreDoc[] results = searcher.search("Java", 10); for (ScoreDoc result : results) { Document doc = searcher.getDocument(result.doc); System.out.println(doc.getField("content").stringValue()); } } catch (IOException e) { e.printStackTrace(); } catch (Exception e) { e.printStackTrace(); } } }
In the above sample code, we first create a Indexer
to create an index and add some text data. Then, we create a Searcher
to perform the search and print out the text content of the search results.
Through the above sample code, we can use the Lucene library to easily implement full-text retrieval and search functions in Java. Using Lucene, we can efficiently find specific keywords or phrases in large-scale text data, thereby improving the efficiency and performance of text processing applications.
The above is the detailed content of How to perform full text retrieval and search in Java. For more information, please follow other related articles on the PHP Chinese website!

Start Spring using IntelliJIDEAUltimate version...

When using MyBatis-Plus or other ORM frameworks for database operations, it is often necessary to construct query conditions based on the attribute name of the entity class. If you manually every time...

Java...

How does the Redis caching solution realize the requirements of product ranking list? During the development process, we often need to deal with the requirements of rankings, such as displaying a...

Conversion of Java Objects and Arrays: In-depth discussion of the risks and correct methods of cast type conversion Many Java beginners will encounter the conversion of an object into an array...

Solutions to convert names to numbers to implement sorting In many application scenarios, users may need to sort in groups, especially in one...

Detailed explanation of the design of SKU and SPU tables on e-commerce platforms This article will discuss the database design issues of SKU and SPU in e-commerce platforms, especially how to deal with user-defined sales...

How to set the SpringBoot project default run configuration list in Idea using IntelliJ...


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

Atom editor mac version download
The most popular open source editor