All text is available under the terms of the GNU Free Documentation License.

Spam in blogs

From Wikipedia, the free encyclopedia


Spam in blogs (also called simply blog spam or comment spam) is a form of spamdexing. It is done by automatically posting random comments or promotions for commercial services to blogs, wikis, guestbooks, or other publicly accessible online discussion boards. Any web application that accepts and displays hyperlinks submitted by visitors may be a target.

Adding links that point to the spammer's web site artificially increases the site's search engine ranking. An increased ranking often results in the spammer's commercial site being listed ahead of other sites for certain searches, increasing the number of potential visitors and paying customers.


History

This type of spam originally appeared in internet guestbooks, where spammers repeatedly filled a guestbook with links to their own site, and no relevant comment, to increase search engine rankings. If an actual comment was given, it was often just "cool page", "nice website", or keywords of the spammed link.

In 2003, spammers began to take advantage of the open nature of comments in blogging software such as Movable Type by repeatedly placing comments on various blog posts that provided nothing more than a link to the spammer's commercial web site. Jay Allen created a free plugin, called MT-BlackList, for the Movable Type weblog tool (versions prior to 3.2) that attempted to alleviate this problem. Many current blogging packages now have methods of preventing or reducing the effect of blog spam, but spammers have become more sophisticated as well. Many of them use special blog spamming tools such as Trackback Submitter to bypass comment spam protection on popular blogging systems such as Movable Type, WordPress and others.

Possible solutions

Blocking by keyword

This is the simplest form of blocking, and it yields very good results: because comment spam is targeted at search engine bots, it must be readable by simple software, so its keywords appear in plain text. A lot of spam can be blocked by banning the names of popular pharmaceuticals and casino games.
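As a rough illustration of keyword blocking, the following Python sketch rejects comments that mention any term on a blocklist. The blocklist contents and the function name `is_keyword_spam` are assumptions chosen for the example, not taken from any particular blogging package.

```python
import re

# Hypothetical blocklist; a real site would maintain its own terms.
BLOCKED_KEYWORDS = {"viagra", "cialis", "poker", "blackjack", "roulette"}

# Match any blocked keyword as a whole word, ignoring case.
_BLOCKED_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in sorted(BLOCKED_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def is_keyword_spam(comment: str) -> bool:
    """Return True if the comment contains any blocked keyword."""
    return _BLOCKED_RE.search(comment) is not None
```

Whole-word matching is a design choice here: it avoids rejecting unrelated words that merely contain a blocked term, at the cost of missing deliberately obfuscated spellings.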


nofollow

In early 2005, Google announced that hyperlinks with the rel="nofollow" attribute would not influence the link target's ranking in the search engine's index.

(rel="nofollow" actually tells a search engine "Don't score this link" rather than "Don't follow this link." This differs from the meaning of nofollow as used within a robots meta tag, which does tell a search engine: "Do not follow any of the hyperlinks in the body of this document.")

Using rel="nofollow" is a much easier solution that makes the improvised techniques above irrelevant. Most weblog software now marks reader-submitted links this way by default (with no option to disable it without code modification). More sophisticated server software could omit nofollow for links submitted by trusted users, such as those who have been registered for a long time, are on a whitelist, or have high karma. Some server software adds rel="nofollow" to pages that have been recently edited but omits it from stable pages, under the theory that stable pages will have had offending links removed by human editors.
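A minimal Python sketch of this idea follows. It renders a reader-submitted link and adds rel="nofollow" unless the submitter is considered trusted. How trust is decided (account age, whitelist, karma) is left to the caller; the function name and the `trusted` flag are assumptions made for the example.

```python
from html import escape


def render_user_link(url: str, text: str, trusted: bool = False) -> str:
    """Render a reader-submitted link, adding rel="nofollow" unless trusted."""
    # Trusted submitters (however the site defines them) get a normal link.
    rel = "" if trusted else ' rel="nofollow"'
    # Escape both the URL and the link text so they cannot break out of the tag.
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'
```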

Some weblog authors object to the use of rel="nofollow", arguing, for example[1], that

  • Link spammers will continue to spam everyone to reach the sites that do not use rel="nofollow".
  • Link spammers will continue to place links for clicking (by surfers), even if those links are ignored by search engines.
  • Google is advocating the use of rel="nofollow" in order to reduce the effect of heavy inter-blog linking on page ranking.

In particular, on the English Wikipedia, after a discussion, it was decided not to use rel="nofollow" in articles and to use a URL blacklist instead. In this way, Wikipedia contributes to the scores of the pages it links to, and expects editors to link to relevant pages. However, Wikipedia does use rel="nofollow" on pages that are not considered to be part of the actual encyclopedia, such as discussion pages, and Wikipedia projects in languages other than English also use it in articles.[2]

Other websites with high user participation, such as Slashdot, use improvised nofollow implementations, for example adding rel="nofollow" only for potentially misbehaving users. Potential spammers posing as users can be identified through various heuristics, such as the age of the registered account. Slashdot also uses the poster's karma as a determinant in attaching a nofollow attribute to user-submitted links.

Turing tests

Various methods that force spammers to post by hand have been attempted. A variety of CAPTCHA gateways have been implemented in an effort to prevent bots from submitting entries. Drawbacks are the annoyance this poses for regular users, the lack of any alternative for visually impaired users, and the ability of some advanced bots to defeat simple CAPTCHAs most of the time.

Disallowing links in posts

There is negligible gain from spam that does not contain links, so currently all spam posts contain an (often excessive) number of links. It is therefore safe to require passing a Turing test only if a post contains links, and to let all other posts through.
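The rule above can be sketched in Python as a check that decides whether a post should be sent through a Turing test. The link patterns and the function name `needs_turing_test` are assumptions for illustration; a real site would tune the patterns to the markup it actually accepts.

```python
import re

# Rough patterns for text that looks like a link; an assumption, not exhaustive.
_LINK_RE = re.compile(r"(?:https?://|www\.|<a\s)", re.IGNORECASE)


def needs_turing_test(post: str) -> bool:
    """Return True only when the post appears to contain a link."""
    return _LINK_RE.search(post) is not None
```

Posts for which this returns False would be accepted directly, so regular commenters who do not post links never see a CAPTCHA.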


Redirects

Instead of displaying a direct hyperlink submitted by a visitor, a web application could display a link to a script on its own website that redirects to the correct URL. This will not prevent all spam, since spammers do not always check for link redirection, but it effectively prevents them from increasing their PageRank, just as rel="nofollow" does. An added benefit is that the redirection script can count how many people visit external URLs, although it will increase the load on the site.

Redirects should be server-side to avoid accessibility issues related to client-side redirects. This can be done via the .htaccess file in Apache.
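As a sketch of the server-side variant, the Python below rewrites a visitor-submitted URL to point at a local redirect script and shows what that script might return. The path `/out`, the `url` query parameter, and both function names are hypothetical. The scheme check is an extra precaution added in this example and is not described in the text above.

```python
from collections import Counter
from urllib.parse import parse_qs, quote, urlsplit

REDIRECT_PATH = "/out"    # hypothetical path of the redirect script
click_counts = Counter()  # counts visits to each external URL


def wrap_outbound_link(target_url: str) -> str:
    """Return a local URL that redirects to target_url."""
    return f"{REDIRECT_PATH}?url={quote(target_url, safe='')}"


def handle_redirect(query_string: str):
    """Return (status, headers) for a request to the redirect script."""
    targets = parse_qs(query_string).get("url", [])
    # Only redirect to ordinary web addresses (an added precaution).
    if not targets or urlsplit(targets[0]).scheme not in ("http", "https"):
        return "400 Bad Request", []
    click_counts[targets[0]] += 1
    return "302 Found", [("Location", targets[0])]
```

The returned status and headers have the shape a WSGI application would pass to `start_response`, but the sketch is not tied to any framework.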

Another way of preventing PageRank leakage is to make use of public redirection services such as TinyURL or My-Own.Net. For example,

<a href="" rel="nofollow" >Link</a>

where 'alias_of_target' is the alias of the target address.

Distributed approaches

This is a relatively new approach to addressing link spam. One of the shortcomings of link spam filters is that most sites receive only one link from each domain that is running a spam campaign. If the spammer varies IP addresses, there is little to no distinguishable pattern left on the vandalized site. The pattern, however, is left across the thousands of sites that were hit quickly with the same links.

A distributed approach, like the free LinkSleeve, uses XML-RPC to communicate between the various server applications (such as blogs, guestbooks, forums, and wikis) and the filter server, in this case LinkSleeve. The URLs are extracted from the posted data, and each URL is checked against URLs recently submitted across the web. If a threshold is exceeded, a "reject" response is returned, and the comment, message, or posting is deleted. Otherwise, an "accept" message is sent.
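The decision made by such a filter server can be sketched in Python as follows. This is not LinkSleeve's actual implementation: the class name, the threshold, the time window, and the URL pattern are all assumptions, and the XML-RPC transport is omitted so that only the accept/reject logic is shown.

```python
import re
import time
from collections import defaultdict, deque

# Rough URL pattern; an assumption for this sketch.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


class LinkFrequencyFilter:
    """Reject postings whose URLs were submitted too often recently."""

    def __init__(self, threshold=3, window_seconds=3600):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # url -> timestamps of recent submissions

    def check(self, posted_text, now=None):
        """Record the URLs in posted_text and return "accept" or "reject"."""
        now = time.time() if now is None else now
        verdict = "accept"
        for url in set(URL_RE.findall(posted_text)):
            stamps = self._seen[url]
            # Forget submissions that fell out of the time window.
            while stamps and now - stamps[0] > self.window_seconds:
                stamps.popleft()
            stamps.append(now)
            if len(stamps) > self.threshold:
                verdict = "reject"
        return verdict
```

Because every participating site reports to the same filter, a link that appears only once on each site still accumulates a count across sites.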

A more robust distributed system is Akismet, which works similarly to LinkSleeve but uses API keys to assign trust to nodes and has wider distribution as a result of being bundled with the 2.0 release of WordPress. Its operators claim that over 140,000 blogs contribute to the system. Akismet libraries have been implemented for Java, Python, Ruby, and PHP, but its adoption may be hindered by the requirement of an API key and by its commercial use restrictions. No such restrictions are in place for LinkSleeve.

Application-specific anti-spam methods

Particularly popular software products such as Movable Type and MediaWiki have developed their own custom anti-spam measures, as spammers focus more attention on targeting those platforms. Whitelists and blacklists that prevent certain IPs from posting, or that prevent people from posting content that matches certain filters, are common defenses. More advanced access control lists require various forms of validation before users can contribute anything like linkspam.

The goal in every case is to allow good users to continue to add links to their comments, as that is considered by some to be a valuable aspect of any comments section.

RSS feed monitoring

Some wikis allow you to access an RSS feed of recent changes or comments. If you add that feed to your news reader and set up a smart search for common spam terms (usually Viagra and other drug names), you can quickly identify and remove the offending spam.
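The same search can be scripted instead of being run in a news reader. The Python sketch below scans an RSS document for items that mention common spam terms. It assumes a plain RSS 2.0 feed without XML namespaces; the term list and the function name `find_suspect_items` are hypothetical.

```python
import xml.etree.ElementTree as ET

SPAM_TERMS = ("viagra", "casino")  # hypothetical list of spam terms


def find_suspect_items(rss_xml: str, terms=SPAM_TERMS):
    """Return titles of RSS items whose title or description mentions a spam term."""
    root = ET.fromstring(rss_xml)
    suspects = []
    # RSS 2.0 places each change or comment in an <item> element.
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        text = f"{title} {description}".lower()
        if any(term in text for term in terms):
            suspects.append(title)
    return suspects
```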

Response tokens

Another filter available to webmasters is to add a hidden session token or hash to their comment form. When a comment is submitted, data associated with the posting, such as the IP address and time of posting, can be compared to the data stored with the session token or hash generated when the user loaded the comment form. Postings that use different IP addresses for loading the comment form and posting it, or postings that took unusually short or long periods of time to compose, can be filtered out. This method is particularly effective against spammers who spoof their IP address in an attempt to conceal their identities.
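One way to build such a token is sketched below in Python: the form carries the load time plus an HMAC over the visitor's IP address and that time, so no server-side session storage is needed. The secret key, the time limits, and both function names are assumptions for this example rather than a description of any specific blog package.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MIN_SECONDS = 5            # assumed: faster than this looks automated
MAX_SECONDS = 3600         # assumed: older forms are treated as stale


def _sign(ip, issued):
    """Compute the HMAC that binds an IP address to a form load time."""
    message = f"{ip}|{issued}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_token(ip, now=None):
    """Create the hidden form value when the comment form is loaded."""
    issued = int(time.time() if now is None else now)
    return f"{issued}:{_sign(ip, issued)}"


def is_plausible_post(token, ip, now=None):
    """Check the token when the comment is submitted."""
    now = time.time() if now is None else now
    try:
        issued_str, signature = token.split(":", 1)
        issued = int(issued_str)
    except ValueError:
        return False  # malformed token
    expected = _sign(ip, issued)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged token, or form loaded from another IP
    return MIN_SECONDS <= now - issued <= MAX_SECONDS
```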


Ajax

Some blog software, such as Typo, allows the blog administrator to accept only comments submitted via Ajax XMLHttpRequests and to discard regular form POST requests. Although Ajax comment forms can be easily defeated after examining the page source, spammers so far have mainly chosen to pass such opportunities by.

Switching off comments

Some bloggers have chosen to turn off comments because of the volume of spam.

See also

  • Social networking spam

External links

  • Anti-spam Features of MediaWiki
  • Article about latest spamming techniques on Search Engine Journal
  • Six Apart Comment Spam Guide, fairly broad overview from Movable Type's authors.
  • The (Evil) Genius of Comment Spammers, an article on link spam from Wired magazine.
  • Gilad Mishne, David Carmel and Ronny Lempel: Blocking Blog Spam with Language Model Disagreement, PDF. From the First International Workshop on Adversarial Information Retrieval (AIRWeb'05) Chiba, Japan, 2005.
  • A Comprehensive Guide to Protecting Your Blog from Spam, a series of measures you can follow to make your WordPress blog spam-free.
  • Spam Huntress, the Norwegian Spam Huntress, Ann Elisabeth.
  • Anti Spam Articles, anti-spam articles and lots of information.
  • SecuriTeam Blogs Spam section, intensive technical postings by Gadi Evron on blog spam techniques and countermeasures.
  • SignedPing, an open specification for blog security to combat spam.