What are directory index search engines?
Baidu, Google, Yahoo, Sohu Search, Sina Search, NetEase Search, Zhongsou, Tianwang Search, China Search, Extreme Search, Netfrog Search, 3721 Search, TOM Search
What directory index search engines are there?
They are generally divided into two types: those covering ordinary website content and those covering music.
Baidu, Google, Yahoo, Sohu Search, Sina Search, NetEase Search, Zhongsou, Tianwang Search, Zhonghua Search, Extreme Search, Netfrog Search, 3721 Search, QQ Search, TOM Search
How does a search engine operate?
Main technologies
A search engine consists of four parts: the crawler (also called the searcher), the indexer, the retriever, and the user interface.
Crawler
The crawler's job is to roam the Internet, discovering and collecting information. It is usually a computer program that runs around the clock. It must collect new information of all types as completely and as quickly as possible. At the same time, because information on the Internet changes very quickly, it must regularly refresh the information it has already collected to avoid dead and invalid links. There are currently two strategies for collecting information:
● Start from a set of seed URLs and follow the hyperlinks on those pages, traversing in breadth-first, depth-first, or heuristic order to discover information on the Internet. The seed URLs can be any URLs, but they are often very popular sites that contain many links (such as Yahoo!).
● Divide the Web space by domain name, IP address, or country domain, with each crawler responsible for exhaustively searching one subspace.
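The first strategy can be sketched in a few lines. This is a minimal illustration, not how any particular engine works: the `LINKS` table is a made-up in-memory stand-in for the Web, and `crawl_breadth_first` and `get_links` are hypothetical names. A real crawler would fetch pages over HTTP and parse out their hyperlinks.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://d.example/", "http://a.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}


def get_links(url):
    """Return the outgoing links of a page (assumed helper for this sketch)."""
    return LINKS.get(url, [])


def crawl_breadth_first(seeds, get_links, max_pages=100):
    """Visit pages breadth-first from the seed URLs and return the visit order."""
    seen = set(seeds)
    queue = deque(seeds)
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()        # FIFO queue gives breadth-first order
        order.append(url)
        for link in get_links(url):
            if link not in seen:     # skip URLs that are already scheduled
                seen.add(link)
                queue.append(link)
    return order


print(crawl_breadth_first(["http://a.example/"], get_links))
```

Replacing `queue.popleft()` with `queue.pop()` turns the queue into a stack and gives a roughly depth-first traversal; a heuristic strategy would use a priority queue instead.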
Crawlers collect many types of information, including HTML, XML, newsgroup articles, FTP files, word processing documents, and multimedia information.
Crawlers are often implemented with distributed and parallel computing techniques to speed up the discovery and updating of information. Commercial search engines can discover millions of web pages per day.
Indexer
The function of the indexer is to understand the information collected by the crawler, extract index items from it, and use them to represent documents and to generate the index table of the document library.
There are two types of index items, objective index items and content index items. Objective index items are unrelated to the semantic content of the document, such as author name, URL, update time, encoding, length, and link popularity. Content index items reflect the content of the document, such as keywords and their weights, phrases, and single words. Content index items can be divided into single index items and multiple index items (or phrase index items). For English, single index items are English words, which are easy to extract because there are natural separators (spaces) between words; for languages written without spaces, such as Chinese, the text must first be segmented into words.
In search engines, a single index item is generally assigned a weight that indicates how well the item discriminates between documents and that is also used to calculate the relevance of query results. The weighting methods generally include statistical, information-theoretic, and probabilistic methods. Methods for extracting phrase index items include statistical, probabilistic, and linguistic methods.
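As one example of a statistical weighting method, the sketch below computes TF-IDF weights. The text above does not name a specific formula, so TF-IDF is an assumption chosen for illustration, and `tf_idf` is a hypothetical helper name. A term that appears in few documents discriminates well and gets a high weight; a term that appears in every document gets a weight of zero.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists.
    weight = (term count / document length) * log(N / document frequency)
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))       # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return weights


docs = [
    "search engine index".split(),
    "directory index".split(),
    "search engine crawler".split(),
]
for doc_weights in tf_idf(docs):
    print(doc_weights)
```

In the second document, "directory" occurs in only one of the three documents and so receives a higher weight than "index", which occurs in two.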
The index table generally takes some form of inverted list (inverted index), in which the index item is used to look up the corresponding documents. The index table may also record the positions at which index items appear in a document so that the retriever can calculate the adjacency or proximity between index items.
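A minimal sketch of such a table, assuming whitespace tokenization and small in-memory documents: `build_inverted_index` maps each term to the documents and word positions where it occurs, and `near` uses those positions for a simple proximity check. Both function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]}; docs is {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def near(index, a, b, max_distance=1):
    """Return ids of documents where a and b occur within max_distance words."""
    postings_a = index.get(a, {})
    postings_b = index.get(b, {})
    hits = []
    for doc_id in postings_a.keys() & postings_b.keys():
        if any(abs(pa - pb) <= max_distance
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


docs = {
    1: "full text search engine",
    2: "search the directory by category not by engine",
}
index = build_inverted_index(docs)
print(near(index, "search", "engine"))      # adjacent only in document 1
print(near(index, "search", "engine", 7))   # a wider window also matches document 2
```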
Indexers can use either a centralized indexing algorithm or a decentralized indexing algorithm. When the amount of data is large, instant indexing must be implemented, otherwise it will not be able to keep up with the rapid increase in the amount of information. Indexing algorithms have a great impact on indexer performance (such as response speed during large-scale peak queries). The effectiveness of a search engine depends largely on the quality of its index.
Retriever
The function of the retriever is to quickly look up documents in the index database according to the user's query, evaluate the relevance of each document to the query, sort the results to be output, and implement some kind of user relevance feedback mechanism.
Retrievers commonly use four information retrieval models: the set-theoretic model, the algebraic model, the probabilistic model, and the hybrid model.
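To make the algebraic model concrete, here is a small sketch of vector-space ranking with cosine similarity. It assumes raw term counts as vector components and whitespace tokenization; the `cosine` and `rank` helpers are illustrative names, not part of any engine described here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (doc_id, score) pairs, best match first, dropping zero scores."""
    q = Counter(query.lower().split())
    scored = [(doc_id, cosine(q, Counter(text.lower().split())))
              for doc_id, text in docs.items()]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: -s[1])


docs = {
    "d1": "directory index of websites",
    "d2": "full text search engine index",
    "d3": "music download",
}
print(rank("search engine", docs))
print(rank("index", docs))
```

A set-theoretic (Boolean) model would instead treat each document as a set of terms and return every document that satisfies the query's logical conditions, without a graded score.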
User interface
The function of the user interface is to accept user queries, display query results, and provide a user relevance feedback mechanism. Its main purpose is to make the search engine convenient to use, so that users can obtain useful and timely information efficiently and in multiple ways. User interfaces are designed and implemented using the theories and methods of human-computer interaction so that they fit human habits of thought. The user input interface can be divided into two types: simple and complex.
The simple interface provides only a text box for entering a query string; the complex interface lets users restrict queries with, for example, logical operations (AND, OR, NOT; +, -), proximity relationships (adjacent, NEAR), domain name range (such as .edu, .), position of occurrence (such as title, content), information time, length, and so on. Some companies and institutions are considering developing standards for query options.
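The `+` and `-` operators mentioned above can be illustrated with a small query parser. The exact semantics vary between engines, so the rule used here is an assumption: `+term` must be present, `-term` must be absent, and if plain terms are given, at least one of them must be present. `parse_query` and `matches` are hypothetical names.

```python
def parse_query(query):
    """Split a query into required (+term), excluded (-term) and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def matches(text, query):
    """True if text has every required term, no excluded term, and,
    when optional terms are given, at least one of them."""
    words = set(text.lower().split())
    required, excluded, optional = parse_query(query)
    if any(t not in words for t in required):
        return False
    if any(t in words for t in excluded):
        return False
    return not optional or any(t in words for t in optional)


print(matches("yahoo directory index", "+directory -music"))
print(matches("music directory", "+directory -music"))
```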
Features
First, search engines retrieve websites automatically, whereas directory indexes rely entirely on manual work. After a user submits a website, directory editors personally browse the site and decide whether to accept it based on a set of in-house criteria or even the editor's subjective impression.
Second, when a search engine includes a website, submission usually succeeds as long as the site does not violate the relevant rules. Directory indexes have much higher requirements, and sometimes a site is not accepted even after being submitted several times. For a major index like Yahoo, getting listed is harder still.
In addition, when submitting to a search engine, we generally do not need to consider how the website is classified. When submitting to a directory index, the website must be placed in the most appropriate directory.
Finally, in a search engine the information about each website is extracted automatically from the user's web pages, so from the user's perspective we have more autonomy. A directory index requires website information to be filled in manually, subject to various restrictions. What is more, if the staff consider the directory or website information you submitted inappropriate, they can adjust it at any time without consulting you in advance.
A directory index, as the name suggests, stores websites in directories by category. When looking for information, users can therefore either search by keyword or browse level by level through the category directory. A keyword search returns results like those of a search engine, ranked by relevance, but with more human factors involved. When browsing by directory, the order of websites within a directory is determined alphabetically by title (with exceptions).
Is the meta tag effective for directory index search engines?
There is an important piece of code, "<meta>" (commonly known as the META tag), in the HTML source of a web page. The META tag describes attributes of an HTML file, such as author, date and time, page description, keywords, and page refresh.
What are the directory index, meta, and full-text search engines?
Search engines can be divided into three categories according to how they work:
1. Directory search engines: mainly Yahoo!, LookSmart, About, DMOZ, Galaxy, etc.
2. Full-text search engines: mainly Google, Baidu, AltaVista, Inktomi, Alltheweb, etc.
3. Meta-search engines: mainly InfoSpace, Dogpile, Vivisimo, Peking University Skynet, Sohu, Lycos, MetaCrawler, etc.
Who knows the advantages and disadvantages of directory index search engines, full-text search engines, and meta-search engines? Please be as detailed as possible.
Give me money! Urgent use
■ Full-text search engine
A full-text search engine is a search engine in the true sense. Representative ones abroad include Google, Fast/AllTheWeb, AltaVista, Inktomi, Teoma, and WiseNut; in China, the best known is Baidu. They all build databases by extracting information (mainly web page text) from websites across the Internet, retrieve the records that match the user's query, and return the results to the user in a certain order, which is why they are true search engines.
From the perspective of where the search results come from, full-text search engines can be subdivided into two types. One has its own indexing program (Indexer), commonly known as a "spider" or "robot" program, builds its own web page database, and serves search results directly from that database, such as the seven engines mentioned above. The other rents the database of another engine and presents the search results in its own format, such as Lycos.
■ Directory Index
Although a directory index has a search function, it is not a real search engine in the strict sense. It is just a list of website links classified by directory. Users do not need to perform keyword searches at all and can find the information they need by relying only on the category directory. The most representative directory index is the famous Yahoo!. Other well-known ones include Open Directory Project (DMOZ), LookSmart, and About. In China, the searches offered by Sohu, Sina, and NetEase also fall into this category.
■ Meta Search Engine (META Search Engine)
When it receives a user query, a meta-search engine searches several other engines at the same time and returns the results to the user. Famous meta-search engines include InfoSpace, Dogpile, and Vivisimo. The most representative Chinese meta-search engine is the Souxing search engine. As for how results are arranged, some list the results by source engine, such as Dogpile, while others rearrange and combine the results according to their own rules, such as Vivisimo.
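The two arrangement styles can be sketched as follows. The engine names and result lists are made up, and the reciprocal-rank scoring in `merge_by_rank` is only one possible "custom rule" chosen for illustration; it is not the actual method used by Dogpile, Vivisimo, or any other engine named above.

```python
def group_by_source(results):
    """Keep each engine's list intact, one group per source engine."""
    return [(engine, list(urls)) for engine, urls in results.items()]


def merge_by_rank(results):
    """Combine all lists: score each URL by its summed reciprocal rank,
    so duplicates are merged and URLs ranked highly by several engines rise."""
    scores = {}
    for urls in results.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    # Sort by score, highest first; break ties alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical results returned by two underlying engines for the same query.
results = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "w.example"],
}
print(group_by_source(results))
print(merge_by_rank(results))
```

Here "y.example" is returned by both engines, so in the merged list it moves ahead of "x.example" even though one engine ranked "x.example" first.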
In addition to the above three categories of engines, there are also the following non-mainstream forms:
1. Aggregated search engines: such as the engine launched by HotBot at the end of 2002. This engine is similar to the META search engine, but the difference is that instead of calling multiple engines for search at the same time, the user selects from the four provided engines, so it is more accurate to call it a "collective" search engine.
2. Portal search engines: Although they provide search services, such as AOL Search and MSN Search, they have neither classified directories nor web databases, and their search results come entirely from other engines.
3. Free For All Links (FFA): This type of website generally just lists link entries in a scrolling arrangement. A few have simple categories, but they are much smaller in scale than directory indexes such as Yahoo.
Which websites use directory index search engines?
All