Metasearch engine
== Operation ==
A metasearch engine accepts a single search request from the [[User (computing)|user]]. This request is then passed on to the [[database]]s of other search engines. A metasearch engine does not create its own database of [[web page]]s; instead it generates a [[federated database system]] that performs [[data integration]] across multiple sources.<ref name="Meng2008">{{cite web | last=Meng | first=Weiyi | date=May 5, 2008 | title=Metasearch Engines | url=http://www.cs.binghamton.edu/~meng/pub.d/EDBS_Metasearch.pdf}}</ref><ref>{{cite web | url=https://homes.cs.washington.edu/~etzioni/papers/ieee-metacrawler.html | title=The MetaCrawler architecture for resource aggregation on the Web | publisher=IEEE expert | year=1997 | last1=Selberg | first1=Erik | last2=Etzioni | first2=Oren | pages=11–14}}</ref><ref>{{cite web | url=https://research.ijcaonline.org/volume74/number5/pxc3889739.pdf | title=Design and Development of a Programmable Meta Search Engine | publisher=Foundation of Computer Science | date=July 2013 | last1=Manoj | first1=M | last2=Jacob | first2=Elizabeth | pages=6–11}}</ref> Since each search engine is unique and uses different [[algorithms]] to generate ranked data, the combined results will also contain duplicates. To remove these duplicates, a metasearch engine processes the data and applies its own algorithm.
A revised list is produced as an output for the user.{{citation needed|date=November 2019}}

When a metasearch engine contacts other search engines, each of them responds in one of three ways:
* It can cooperate and provide complete access to its interface, including private access to the index database, and inform the metasearch engine of any changes made to that database;
* It can behave in a non-cooperative manner, neither denying nor providing access to its interfaces;
* It can be completely hostile, refusing the metasearch engine any access to its database and, in serious circumstances, pursuing [[legal]] action.<ref name=retrieval>{{cite web | last1=Manoj | first1=M. | last2=Jacob | first2=Elizabeth | date=October 2008 | title=Information retrieval on Internet using meta-search engines: A review. | url=http://nopr.niscair.res.in/bitstream/123456789/2243/1/JSIR%2067(10)%20739-746.pdf | publisher=[[Council of Scientific and Industrial Research]]}}</ref>

=== Architecture of ranking ===
Web pages that are highly ranked on many search engines are likely to be more [[Relevance (information retrieval)|relevant]] in providing useful information.<ref name=retrieval/> However, search engines assign different ranking scores to each website, and most of the time these scores are not the same. This is because search engines prioritise different criteria and methods for scoring, so a website might be ranked highly on one search engine and poorly on another. This is a problem because metasearch engines rely heavily on the consistency of this data to generate reliable results.<ref name=retrieval/>

=== Fusion ===
[[File:DFIG Model.jpg|thumb|Data Fusion Model|286x286px]]
A metasearch engine uses the process of fusion to filter data for more efficient results. The two main fusion methods used are Collection Fusion and Data Fusion.
* Collection Fusion: also known as distributed retrieval, deals specifically with search engines that index unrelated data. To determine how valuable these sources are, Collection Fusion looks at their content and ranks each source by how likely it is to provide information relevant to the query. From this ranking, Collection Fusion picks out the best resources, which are then merged into a list.<ref name=retrieval/>
* Data Fusion: deals with information retrieved from search engines that index common data sets. The process is very similar. The initial rank scores of the data are merged into a single list, after which the original ranks of each of these documents are analysed. Data with high scores indicate a high level of relevancy to a particular query and are therefore selected. To produce a list, the scores must be normalized and then combined using algorithms such as CombSum. This is because search engines use different scoring algorithms, so the raw scores they produce are not directly comparable.<ref>{{cite book | last1=Wu | first1=Shengli | last2=Crestani | first2=Fabio | last3=Bi | first3=Yaxin | title=Information Retrieval Technology | chapter=Evaluating Score Normalization Methods in Data Fusion | year=2006 | volume=4182 | pages=642–648 | doi=10.1007/11880592_57 | series=Lecture Notes in Computer Science | isbn=978-3-540-45780-0 | citeseerx=10.1.1.103.295}}</ref><ref>{{cite web | last1=Manmatha | first1=R. | last2=Sever | first2=H. | year=2014 | title=A Formal Approach to Score Normalization for Meta-search. | url=http://maroo.cs.umass.edu/pdf/IR-242.pdf | access-date=2014-10-27 | archive-url=https://web.archive.org/web/20190930051034/http://maroo.cs.umass.edu/pdf/IR-242.pdf | archive-date=2019-09-30 | url-status=dead }}</ref>
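The data fusion step described above can be illustrated with a minimal Python sketch. It is not taken from any particular metasearch engine. It assumes min-max normalization as the rescaling step (one of several possible normalization methods) and then applies CombSum, which sums each document's normalized scores across engines. The function names <code>min_max_normalize</code> and <code>comb_sum</code>, the engine score tables, and the <code>.example</code> addresses are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one engine's raw scores to [0, 1] (assumed normalization method)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every result as equally relevant.
        return {url: 1.0 for url in scores}
    return {url: (s - lo) / (hi - lo) for url, s in scores.items()}


def comb_sum(result_lists):
    """Fuse several {url: raw_score} mappings by summing normalized scores."""
    fused = defaultdict(float)
    for scores in result_lists:
        if not scores:
            continue  # an engine that returned nothing contributes nothing
        for url, s in min_max_normalize(scores).items():
            # A URL returned by several engines collapses onto one key,
            # which also removes duplicates from the merged list.
            fused[url] += s
    # Highest fused score first; ties are broken alphabetically by URL.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical raw scores from two engines that use different scales.
engine_a = {"a.example": 10.0, "b.example": 5.0, "c.example": 0.0}
engine_b = {"b.example": 0.75, "c.example": 0.5, "d.example": 0.25}

ranking = comb_sum([engine_a, engine_b])
for url, score in ranking:
    print(url, score)
```

In this sketch, a page returned by both engines accumulates score from each, so it tends to rise above pages returned by only one engine. This matches the idea that pages ranked highly on many search engines are likely to be more relevant.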