In text retrieval, full text search (also called free search text) refers to a technique for searching a computer-stored document or database; in a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user. Full-text searching techniques became common in online bibliographic databases in the 1970s. Most Web sites and application programs (such as word processing software) provide full text search capabilities. Some Web search engines, such as AltaVista employ full text search techniques, while others index only a portion of the Web pages examined by its indexing system.
The most common approach to full text search is to generate a complete index or concordance for all of the searchable documents. For each word (excepting stop words which are too common to be useful) an entry is made which lists the exact position of every occurrence of it within the database of documents. From such a list it is relatively simple to retrieve all the documents that match a query, without having to scan each document. Although for very small document collections full-text searching can be done by serial scanning, indexing is the preferred method for almost all full-text searching.
As anyone who has performed a free text search will readily recognize, free text searching is likely to retrieve many documents that are not relevant to the search question. Such documents are called false positives. The retrieval of irrelevant documents is often caused by the inherent ambiguity of natural language; for example, in the United States, football refers to what is called American football outside the U.S.; throughout the rest of the world, football refers to what Americans call soccer. A search for football may retrieve documents that are about two completely different sports.
Due to the ambiguities of natural language, a full text search typically produces a retrieval list that has low precision: most of the items retrieved are irrelevant. Controlled-vocabulary searching solves this problem by tagging the documents in such a way that the ambiguities are eliminated. However, a controlled vocabulary search may have low recall: it may fail to retrieve some documents that are actually relevant to the search question. Despite the presence of many irrelevant documents in a free text search's retrieval list, a free text search may be able to locate a document that a controlled vocabulary search failed to retrieve.
The deficiencies of free text searching have been addressed in two ways: By providing users with tools that enable them to express their search questions more precisely, and by developing new search algorithms that improve retrieval precision.
Technological advances have greatly improved the performance of free text searching. For example, Google's PageRank algorithm gives more prominence to documents to which other Web pages have linked. This algorithm dramatically improves users' perception of search precision, a fact that explains its popularity among Internet users. See search engine for additional examples.
Volltextrecherche | Carian Teks Penuh | Full text-zoeken | 全文檢索
This article is licensed under the GNU Free Documentation License.
It uses material from the
"Full text search".
Home Page • arts • business • computers • games • health • hospitals • home • kids & teens • news • physicians • recreation• reference • regional • science • shopping • society • sports • world