ARTICLES IN THE BOOK

  1. Adobe Reader
  2. Adware
  3. Altavista
  4. AOL
  5. Apple Macintosh
  6. Application software
  7. Arrow key
  8. Artificial Intelligence
  9. ASCII
  10. Assembly language
  11. Automatic translation
  12. Avatar
  13. Babylon
  14. Bandwidth
  15. Bit
  16. BitTorrent
  17. Black hat
  18. Blog
  19. Bluetooth
  20. Bulletin board system
  21. Byte
  22. Cache memory
  23. Celeron
  24. Central processing unit
  25. Chat room
  26. Client
  27. Command line interface
  28. Compiler
  29. Computer
  30. Computer bus
  31. Computer card
  32. Computer display
  33. Computer file
  34. Computer games
  35. Computer graphics
  36. Computer hardware
  37. Computer keyboard
  38. Computer networking
  39. Computer printer
  40. Computer program
  41. Computer programmer
  42. Computer science
  43. Computer security
  44. Computer software
  45. Computer storage
  46. Computer system
  47. Computer terminal
  48. Computer virus
  49. Computing
  50. Conference call
  51. Context menu
  52. Creative commons
  53. Creative Commons License
  54. Creative Technology
  55. Cursor
  56. Data
  57. Database
  58. Data storage device
  59. Debuggers
  60. Demo
  61. Desktop computer
  62. Digital divide
  63. Discussion groups
  64. DNS server
  65. Domain name
  66. DOS
  67. Download
  68. Download manager
  69. DVD-ROM
  70. DVD-RW
  71. E-mail
  72. E-mail spam
  73. File Transfer Protocol
  74. Firewall
  75. Firmware
  76. Flash memory
  77. Floppy disk drive
  78. GNU
  79. GNU General Public License
  80. GNU Project
  81. Google
  82. Google AdWords
  83. Google bomb
  84. Graphics
  85. Graphics card
  86. Hacker
  87. Hacker culture
  88. Hard disk
  89. High-level programming language
  90. Home computer
  91. HTML
  92. Hyperlink
  93. IBM
  94. Image processing
  95. Image scanner
  96. Instant messaging
  97. Instruction
  98. Intel
  99. Intel Core 2
  100. Interface
  101. Internet
  102. Internet bot
  103. Internet Explorer
  104. Internet protocols
  105. Internet service provider
  106. Interoperability
  107. IP addresses
  108. IPod
  109. Joystick
  110. JPEG
  111. Keyword
  112. Laptop computer
  113. Linux
  114. Linux kernel
  115. Liquid crystal display
  116. List of file formats
  117. List of Google products
  118. Local area network
  119. Logitech
  120. Machine language
  121. Mac OS X
  122. Macromedia Flash
  123. Mainframe computer
  124. Malware
  125. Media center
  126. Media player
  127. Megabyte
  128. Microsoft
  129. Microsoft Windows
  130. Microsoft Word
  131. Mirror site
  132. Modem
  133. Motherboard
  134. Mouse
  135. Mouse pad
  136. Mozilla Firefox
  137. Mp3
  138. MPEG
  139. MPEG-4
  140. Multimedia
  141. Musical Instrument Digital Interface
  142. Netscape
  143. Network card
  144. News ticker
  145. Office suite
  146. Online auction
  147. Online chat
  148. Open Directory Project
  149. Open source
  150. Open source software
  151. Opera
  152. Operating system
  153. Optical character recognition
  154. Optical disc
  155. Output
  156. PageRank
  157. Password
  158. Pay-per-click
  159. PC speaker
  160. Peer-to-peer
  161. Pentium
  162. Peripheral
  163. Personal computer
  164. Personal digital assistant
  165. Phishing
  166. Pirated software
  167. Podcasting
  168. Pointing device
  169. POP3
  170. Programming language
  171. QuickTime
  172. Random access memory
  173. Routers
  174. Safari
  175. Scalability
  176. Scrollbar
  177. Scrolling
  178. Scroll wheel
  179. Search engine
  180. Security cracking
  181. Server
  182. Simple Mail Transfer Protocol
  183. Skype
  184. Social software
  185. Software bug
  186. Software cracker
  187. Software library
  188. Software utility
  189. Solaris Operating Environment
  190. Sound Blaster
  191. Soundcard
  192. Spam
  193. Spamdexing
  194. Spam in blogs
  195. Speech recognition
  196. Spoofing attack
  197. Spreadsheet
  198. Spyware
  199. Streaming media
  200. Supercomputer
  201. Tablet computer
  202. Telecommunications
  203. Text messaging
  204. Trackball
  205. Trojan horse
  206. TV card
  207. Unicode
  208. Uniform Resource Identifier
  209. Unix
  210. URL redirection
  211. USB flash drive
  212. USB port
  213. User interface
  214. Vlog
  215. Voice over IP
  216. Warez
  217. Wearable computer
  218. Web application
  219. Web banner
  220. Web browser
  221. Web crawler
  222. Web directories
  223. Web indexing
  224. Webmail
  225. Web page
  226. Website
  227. Wiki
  228. Wikipedia
  229. WIMP
  230. Windows CE
  231. Windows key
  232. Windows Media Player
  233. Windows Vista
  234. Word processor
  235. World Wide Web
  236. Worm
  237. XML
  238. X Window System
  239. Yahoo
  240. Zombie computer
 



MY COMPUTER
This article is from:
http://en.wikipedia.org/wiki/Spamdexing

All text is available under the terms of the GNU Free Documentation License: http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License 

Spamdexing

From Wikipedia, the free encyclopedia

 

Spamdexing, or search engine spamming, is the practice of deliberately creating web pages that search engines will index in a way that raises the chance of a website or page appearing near the top of search results, or that influences the category to which the page is assigned. Many web page designers try to get a good ranking in search engines and design their pages accordingly. The word is a portmanteau of spamming and indexing.

Search engines use a variety of algorithms to determine relevancy ranking. Some check whether the search term appears in the META keywords tag; others check whether it appears in the body text or URL of a web page. A variety of techniques are used to spamdex (see below). Many search engines check for instances of spamdexing and will remove suspect pages from their indexes.

The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful. Google produced better search results and combated keyword spamming through its reputation-based PageRank link analysis system, and this success helped it become the dominant search site late in the decade, where it remains. While it has not been rendered useless by spamdexing, Google has not been immune to more sophisticated methods either. Google bombing is another form of search engine result manipulation, which involves placing hyperlinks that directly affect the rank of other sites [1].

Common spamdexing techniques can be classified into two broad classes: content spam and link spam.

Content spam

These techniques involve altering the logical view that a search engine has of the page's contents. They all target variants of the vector space model for information retrieval on text collections.

Hidden or invisible text:

  • Disguising keywords and phrases by making them the same (or almost the same) color as the background, using a tiny font size, or hiding them within the HTML code, for example in "no frame" sections, ALT attributes and "no script" sections. This makes a page appear relevant to a web crawler, so that it is more likely to be found. Example: A promoter of a Ponzi scheme wants to attract web surfers to a site where he advertises his scam. He places hidden text appropriate for a fan page of a popular music group on his page, hoping that the page will be listed as a fan site and receive many visits from music lovers. However, hidden text is not always spamdexing: it can also be used to enhance accessibility.
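As a rough illustration of how a checker might notice the hidden-text tricks described above, the following Python sketch scans inline style attributes for text whose color equals its background color or whose font size is tiny. The class name HiddenTextFinder, the helper parse_style and the list of "tiny" sizes are assumptions made for this example; real pages also hide text through external stylesheets and scripts, which this sketch does not inspect, and a match is only a hint because hidden text can serve accessibility.

```python
from html.parser import HTMLParser


def parse_style(style):
    """Split an inline style string into a {property: value} dict (lowercased)."""
    declarations = {}
    for part in style.split(";"):
        name, sep, value = part.partition(":")
        if sep:
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


class HiddenTextFinder(HTMLParser):
    """Collects tags whose inline style suggests invisible text (hypothetical heuristic)."""

    TINY_SIZES = {"0", "0px", "1px"}  # assumed cut-off for "tiny" fonts

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        css = parse_style(dict(attrs).get("style") or "")
        color = css.get("color")
        if color is not None and color == css.get("background-color"):
            self.suspicious.append((tag, "text color matches background"))
        elif css.get("font-size") in self.TINY_SIZES:
            self.suspicious.append((tag, "tiny font size"))


def find_hidden_text(html):
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.suspicious


sample = (
    '<p style="color: #fff; background-color: #FFF">band band band</p>'
    '<span style="font-size:1px">band</span>'
    "<p>Visible text</p>"
)
print(find_hidden_text(sample))
```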

Keyword stuffing:

  • This involves the insertion of hidden, random text on a webpage to raise the keyword density, that is, the ratio of keywords to other words on the page. Older indexing programs simply counted how often a keyword appeared and used that count to determine relevance. Most modern search engines can analyze a page for keyword stuffing and determine whether the frequency is above a "normal" level.
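The idea of keyword density can be made concrete with a small Python sketch. Here density is taken as the share of all words on the page that equal the keyword, and the 15% threshold standing in for a "normal" level is an arbitrary assumption for illustration; the function names are likewise hypothetical, and no search engine's actual criteria are implied.

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Return the fraction of words in `text` equal to `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


def looks_stuffed(text, keyword, threshold=0.15):
    """Flag text whose keyword density exceeds the assumed threshold."""
    return keyword_density(text, keyword) > threshold


normal = "Our shop sells guitars, strings and picks for every budget."
stuffed = "guitars guitars cheap guitars buy guitars guitars best guitars"

print(looks_stuffed(normal, "guitars"))
print(looks_stuffed(stuffed, "guitars"))
```

A naive counter of the kind older indexing programs used would rank the second text as far more relevant to "guitars"; the density check treats the same signal as a warning sign.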

Meta tag stuffing:

  • Repeating keywords in the META tags, and using keywords that are unrelated to the site's content.

"Gateway" or doorway pages:

  • Creating low-quality web pages that contain very little content but are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results. A doorway page will generally have "click here to enter" in the middle of it.

Scraper sites:

  • Scraper sites, also known as Made for AdSense sites, are created using various programs designed to 'scrape' search engine results pages or other sources of content and create 'content' for a website. These types of websites are generally full of advertising, or redirect the user to other sites.

Link spam

Link spam takes advantage of link-based ranking algorithms, such as Google's PageRank algorithm, which gives a higher ranking to a website the more other highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.
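To see why link-based ranking invites manipulation, consider the following minimal Python sketch of a simplified PageRank computed by repeated redistribution of rank along links. It is a textbook-style approximation, not Google's production algorithm; the damping factor of 0.85, the iteration count and the tiny example graphs are assumptions chosen for illustration.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank: `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# A small web in which nobody links to "target".
base_links = {"a": ["b"], "b": ["a"], "target": ["a"]}
# The same web after three spam pages are created that all link to "target".
farm_links = dict(base_links, f1=["target"], f2=["target"], f3=["target"])

base = pagerank(base_links)
farmed = pagerank(farm_links)
print(base["target"], farmed["target"])
```

In this toy graph the score of "target" rises once the three artificial pages point at it, even though no genuine site has started linking to it, which is the effect that the techniques below try to exploit.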

Link farms:

  • Creating tightly knit communities of pages that reference each other, also known humorously as mutual admiration societies [2].

Hidden links:

  • Putting links where visitors will not see them in order to increase link popularity.

"Sybil attack":

  • This is the forging of multiple identities for malicious intent, named after "Sybil", the pseudonym of the famous multiple personality disorder patient Shirley Ardell Mason. A spammer may create multiple web sites at different domain names that all link to each other, such as fake blogs known as spam blogs.

Wiki spam:

  • Using the open editability of wiki systems to place links from the wiki site to the spam site. Often, the subject of the spam site is totally unrelated to the page on the wiki where the link is added. While many powerful tools exist to filter or block email spam, there are very few tools for blocking wikispam.

Spam in blogs:

  • This is the placing or solicitation of links randomly on other sites, placing a desired keyword into the hyperlinked text of the inbound link. Guest books, forums, blogs and any site that accepts visitors' comments are particular targets. They are often victims of drive-by spamming, in which automated software creates nonsense posts with links that are usually irrelevant and unwanted. See the Comment spam number mystery for a real-world example.

Spam blogs (also known as splogs):

  • A spam blog, on the contrary, is a fake blog created exclusively with the intent of spamming. Spam blogs are similar in nature to link farms.

Page hijacking:

  • This is achieved by creating a rogue copy of a popular website that shows contents similar to the original to a web crawler, but redirects web surfers to unrelated or malicious websites.

Referer log spamming:

  • When someone accesses a web page (the referee) by following a link from another web page (the referer), the person's web browser gives the referee the address of the referer. Some websites have a referer log which shows which pages link to that site. A robot that accesses many sites often enough, with a message or a specific address given as the referer, causes that message or address to appear in the referer log of every site that keeps one. Since some search engines base the importance of a site on the number of different sites linking to it, referer-log spam may be used to increase the search engine rankings of the spammer's sites, by getting the referer logs of many sites to link to them.
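The referer mechanism described above leaves a trace in web server access logs. As a minimal sketch, assuming log lines in the common "combined" format (request, status, size, referer and user agent in that order), the Python below tallies the external hosts named as referers. The host names, sample lines and the function name referer_hosts are made up for illustration, and a high count is only a hint that a referer may be spam.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed layout: ... "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def referer_hosts(log_lines, own_host):
    """Count how often each external host appears as the referer."""
    counts = Counter()
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        # A referer of "-" (none sent) yields no hostname and is skipped.
        host = urlparse(match.group("referer")).hostname
        if host and host != own_host:
            counts[host] += 1
    return counts


sample_log = [
    '203.0.113.5 - - [10/Oct/2006:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "http://spam.example/pills" "Mozilla/5.0"',
    '203.0.113.5 - - [10/Oct/2006:13:55:37 +0000] "GET /about HTTP/1.1" 200 512 "http://spam.example/pills" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2006:14:01:02 +0000] "GET /post HTTP/1.1" 200 900 "http://blog.example/links" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Oct/2006:14:05:00 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '198.51.100.9 - - [10/Oct/2006:14:05:09 +0000] "GET /about HTTP/1.1" 200 512 "http://www.mysite.example/" "Mozilla/5.0"',
]

counts = referer_hosts(sample_log, own_host="www.mysite.example")
print(counts.most_common())
```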

Buying expired domains:

  • Some link spammers monitor DNS records for domains that will expire soon, then buy them when they expire and replace the pages with links to their pages.

Some of these techniques may be applied to create a Google bomb, that is, to cooperate with other users to boost the ranking of a particular page for a particular query.

Other types of spamdexing

Mirror websites:

  • Hosting of multiple websites all with the same content but using different URLs. Some search engines give a higher rank to results where the keyword searched for appears in the URL.

URL redirection:

  • Taking the user to another page without his or her intervention, e.g. using META refresh tags, Java, JavaScript or server-side redirects.
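Of the redirection methods listed above, the META refresh tag is the easiest to spot in a page's source. The following Python sketch, built on the standard library's html.parser module, extracts the delay and target of any META refresh it finds. The class and function names are assumptions for this example, and redirects done through JavaScript, Java or the server are outside its scope.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay, target URL) pairs from <meta http-equiv="refresh"> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute has the form "<seconds>; url=<address>".
        delay, _, rest = attributes.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.targets.append((delay.strip(), rest[4:].strip().strip("'\"")))


def find_meta_refresh(html):
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.targets


page = (
    "<html><head>"
    '<meta http-equiv="refresh" content="0; url=http://other.example/">'
    "</head><body>Moved</body></html>"
)
print(find_meta_refresh(page))
```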

Cloaking refers to any of several means to serve up a different page to the search-engine spider than will be seen by human users. It can be an attempt to mislead search engines regarding the content on a particular web site. However, cloaking can also be used to ethically increase accessibility of a site to users with disabilities, or to provide human users with content that search engines aren't able to process or parse. It is also used to deliver content based on a user's location; Google itself uses IP delivery, a form of cloaking, to deliver results.

One form of this is code swapping: optimizing a page for top ranking, then swapping another page in its place once a top ranking is achieved.
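Because cloaking means serving a search-engine spider a different page than a human visitor sees, one simple way to look for it is to request the same address under two identities and compare the answers. The Python sketch below assumes a caller-supplied fetch function taking a URL and a User-Agent string; the two stand-in "sites" and the agent strings are invented for illustration, and no real network access is performed. Since cloaking also has legitimate uses, and real pages vary for benign reasons such as advertising or localization, a difference is only a starting point for review.

```python
from typing import Callable


def appears_cloaked(fetch: Callable[[str, str], str], url: str,
                    crawler_agent: str = "ExampleBot/1.0",
                    browser_agent: str = "Mozilla/5.0") -> bool:
    """Return True if the page body differs between crawler and browser requests."""
    return fetch(url, crawler_agent) != fetch(url, browser_agent)


def cloaking_site(url: str, user_agent: str) -> str:
    """Stand-in for a site that shows crawlers a different page (hypothetical)."""
    if "bot" in user_agent.lower():
        return "<p>Fan page for a popular music group</p>"
    return "<p>Invest in our scheme today</p>"


def honest_site(url: str, user_agent: str) -> str:
    """Stand-in for a site that serves everyone the same page (hypothetical)."""
    return "<p>Fan page for a popular music group</p>"


print(appears_cloaked(cloaking_site, "http://site.example/"))
print(appears_cloaked(honest_site, "http://site.example/"))
```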

The following techniques are also widely acknowledged as being spam, or "black hat":

  • Doorway pages
  • Link farms
  • Googleating

See also

  • Google bomb
  • Google juice
  • Link farm
  • TrustRank
  • 302 Google Jacking
  • Search engine indexing: overview of search engine indexing technologies
