In
computing, a spell checker (or spell check) is an
application program that flags words in a document that may not be
spelt (or
spelled
US English) correctly. Spell checkers may be stand-alone capable of
operating on a block of text, or as part of a larger application, such
as a
word processor,
email client, electronic
dictionary, or
search engine.
Eye have a spelling chequer,
It came with my Pea Sea.
It plane lee marks four my revue
Miss Steaks I can knot sea.
Eye strike the quays and type a
whirred
And weight four it two say
Weather eye am write oar wrong
It tells me straight a weigh.
Eye ran this poem threw it,
Your shore real glad two no.
Its vary polished in its weigh.
My chequer tolled me sew.
A chequer is a bless thing,
It freeze yew lodes of thyme.
It helps me right all stiles of righting,
And aides me when eye rime.
Each frays come posed up on my screen
Eye trussed too bee a joule.
The chequer pours o'er every word
Two cheque sum spelling rule.
The original version of this poem was written by Jerrold H.
Zar in 1992. An unsophisticated spell checker will find
little or no fault with this poem because it checks words in
isolation. A more sophisticated spell checker will make use
of a
language model to consider the context in which a word
occurs.
Design
A spell checker customarily consists of two parts:
- A set of routines for scanning text and extracting words, and
- An algorithm for comparing the extracted words against a known
list of correctly spelled words (i.e., the dictionary).
The scanning routines sometimes include language-dependent algorithms
for handling
morphology. Even for a lightly inflected language like
English, word extraction routines will need to handle such phenomena
as
contractions and
possessives. It is unclear whether morphological analysis provides a
significant benefit for English, though its benefits for highly
synthetic languages such as German, Hungarian or Turkish are clear.
The word list might contain just a list of words, or it might also
contain additional information, such as hyphenation points or lexical
and grammatical attributes.
As an adjunct to these two components, the program's
user interface will allow users to approve replacements and modify
the program's operation.
One exception to the above paradigm are spell checkers which use
solely statistical information, such as
n-grams.
This approach usually requires a lot of effort to obtain sufficient
statistical information and may require a lot more runtime storage.
These methods are not currently in general use. In some cases spell
checkers use a fixed list of misspellings and suggestions for those
misspellings; this less flexible approach is often used in paper-based
correction methods, such as the see also entries of
encyclopedias.
History
Research extends back to 1957, including spelling checkers for bitmap
images of cursive writing and special applications to find records in
databases in spite of incorrect entries. In 1961,
Les Earnest, who headed the research on this budding technology, saw
it necessary to include the first spell checker that accessed a list of
10,000 acceptable words.[1]
Ralph Gorin, a graduate student under Earnest at the time, created the
first true spelling checker program written as an applications program
(rather than research) for general English text: Spell for the DEC
PDP-10 at Stanford University's Artificial Intelligence Laboratory, in
February 1971.[2]
Gorin wrote SPELL in assembly language, for faster action; he made the
first spelling corrector by searching the word list for plausible
correct spellings that differ by a single letter or adjacent letter
transpositions and presenting them to the user. Gorin made SPELL
publicly accessible, as was done with most SAIL (Stanford Artificial
Intelligence Laboratory) programs, and it soon spread around the world
via the new ARPAnet, about ten years before personal computers came into
general use.[3]
Spell, its algorithms and data structures inspired the Unix ispell
program.
The first spell checkers were widely available on mainframe computers
in the late 1970s. A group of six linguists from
Georgetown University developed the first spell-check system for the
IBM corporation.[4]
The company Software Concepts, Inc., founded by
William J. Tobin in 1978, developed one of the first patented
computer software programs in the
United States for spelling verification. The program was used by
most major word-processing and photo-typesetting systems, including
Lanier,
Philips,
and Xerox,
among many others.[5][6]
The patent the company was issued in 1980 for the Spell-Checking program
was one of the first software patents issued in the
United States,
Canada,
and Europe.[6]
The first spell checkers for personal computers appeared for
CP/M and
TRS-80
computers in 1980, followed by packages for the
IBM PC after it was introduced in 1981. Developers such as Maria
Mariani,[4]
Soft-Art, Microlytics, Proximity, Circle Noetics, and Reference Software[citation
needed] rushed
OEM packages or end-user products into the rapidly expanding
software market, primarily for the PC but also for
Apple Macintosh,
VAX, and
Unix. On
the PCs, these spell checkers were standalone programs, many of which
could be run in
TSR mode from within word-processing packages on PCs with sufficient
memory.
However, the market for standalone packages was short-lived, as by
the mid 1980s developers of popular word-processing packages like
WordStar and
WordPerfect had incorporated spell checkers in their packages,
mostly licensed from the above companies, who quickly expanded support
from just
English to
European
and eventually even
Asian languages. However, this required increasing sophistication in
the morphology routines of the software, particularly with regard to
heavily-agglutinative languages like
Hungarian and
Finnish. Although the size of the word-processing market in a
country like
Iceland
might not have justified the investment of implementing a spell checker,
companies like WordPerfect nonetheless strove to localize their software
for as many as possible national markets as part of their global
marketing strategy.
Recently, spell checking has moved beyond word processors as
Firefox
2.0, a
web browser, has spell check support for user-written content, such
as when editing Wikitext, writing on many
webmail
sites,
blogs, and
social networking websites. The web browsers
Google Chrome,
Konqueror, and
Opera, the email client
Kmail and the
instant messaging
client
Pidgin also offer spell checking support, transparently using
GNU
Aspell as their engine.
Mac OS X now has spell check systemwide, extending the service to
virtually all bundled and third party applications.
Functionality
The first spell checkers were "verifiers" instead of "correctors."
They offered no suggestions for incorrectly spelled words. This was
helpful for
typos but it was not so helpful for logical or phonetic errors. The
challenge the developers faced was the difficulty in offering useful
suggestions for misspelled words. This requires reducing words to a
skeletal form and applying pattern-matching algorithms.
It might seem logical that where spell-checking dictionaries are
concerned, "the bigger, the better," so that correct words are not
marked as incorrect. In practice, however, an optimal size for English
appears to be around 90,000 entries. If there are more than this,
incorrectly spelled words may be skipped because they are mistaken for
others. For example, a linguist might determine on the basis of
corpus linguistics that the word
baht is more frequently a misspelling of bath or bat
than a reference to the Thai currency. Hence, it would typically be more
useful if a few people who write about Thai currency were slightly
inconvenienced, than if the spelling errors of the many more people who
discuss baths were overlooked.
A screenshot of the
AbiWord spell checker
The first MS-DOS spell checkers were mostly used in proofing mode
from within word processing packages. After preparing a document, a user
scanned the text looking for misspellings. Later, however, batch
processing was offered in such packages as
Oracle's short-lived CoAuthor. This allowed a user to view the
results after a document was processed and only correct the words that
he or she knew to be wrong. When memory and processing power became
abundant, spell checking was performed in the background in an
interactive way, such as has been the case with the Sector Software
produced Spellbound program released in 1987 and
Microsoft Word since Word 95.
In recent years, spell checkers have become increasingly
sophisticated; some are now capable of recognizing simple
grammatical errors. However, even at their best, they rarely catch
all the errors in a text (such as
homophone errors) and will flag
neologisms and foreign words as misspellings. Nonetheless, spell
checkers can be considered as a type of
foreign language writing aid that non-native language learners can
rely on to detect and correct their misspellings in the target language.[7]
Spell-checking non-English languages
English is unusual in that most words used in formal writing have a
single spelling that can be found in a typical dictionary, with the
exception of some jargon and modified words. In many languages, however,
it is typical to frequently combine words in new ways. In German,
compound nouns are frequently coined from other existing nouns. Some
scripts do not clearly separate one word from another, requiring
word-splitting algorithms. Each of these presents unique challenges to
non-English language spell checkers.
Context-sensitive spell checkers
Recently, research has focused on developing algorithms which are
capable of recognizing a misspelled word, even if the word itself is in
the vocabulary, based on the context of the surrounding words. Not only
does this allow words such as those in the poem above to be caught, but
it mitigates the detrimental effect of enlarging dictionaries, allowing
more words to be recognized. For example, baht in the same
paragraph as Thai or Thailand would not be recognized as a
misspelling of bath. The most common example of errors caught by
such a system are
homophone errors, such as the bold words in the following sentence:
- Their coming too sea if its reel.
The most successful algorithm to date is Andrew Golding and Dan
Roth's "Winnow-based
spelling correction algorithm",[8]
published in 1999, which is able to recognize about 96% of
context-sensitive spelling errors, in addition to ordinary non-word
spelling errors. A
context-sensitive spell checker appears in
Microsoft Office 2007,[9]
Google Wave,[10]
Ginger Software[11]
and in Ghotit Dyslexia Software[12]
context spell checker tuned for people with dyslexia.
Criticism
Some critics[who?]
of technology and computers have attempted to link spell checkers to a
trend of skill losses in writing, reading, and speaking. They claim that
the convenience of computers has led people to become lazy, often not
proofreading written work past a simple pass by a spell checker.
Supporters[who?]
claim that these changes may actually be beneficial to society, by
making writing and learning new languages more accessible to the general
public. They claim that the skills lost by the invention of automated
spell checkers are being replaced by better skills, such as faster and
more efficient research skills. Other supporters of technology point to
the fact that these skills are not being lost to people who require and
make use of them regularly, such as authors, critics, and language
professionals.[13]
An example of the problem of completely relying on spell checkers is
shown in the Spell-checker Poem
[14]
above. It was originally composed by Dr. Jerrold H. Zar[15]
in 1991, assisted by Mark Eckman[16]
with an original length of 225 words, and containing 123 incorrectly
used words. According to most spell checkers, the poem is valid,
although most people would be able to tell at a simple glance that most
words are used incorrectly. As a result, spell checkers are sometimes
derided as spilling chuckers or similar, slightly misspelled
names.
Not all of the critics are opponents of technological progress,
however. An article based on research by Galletta et al.[17]
reports that in the Galletta study, higher verbal skills are needed for
highest performance when using a spell checker. The theory suggested
that only writers with higher verbal skills could recognize and ignore
false positives or incorrect suggestions. However, it was found that
those with the higher skills lost their unaided performance advantage in
multiple categories of errors, performing as poorly as the low verbals
with the spell-checkers turned on. The conclusion points to some
evidence of a loss of skill.
See also
References
-
^
Earnest,
Les.
"The First Three Spelling Checkers". Stanford University.
Retrieved 10 October 2011.
-
^
Peterson,
James (Dec 1980).
Computer Programs for Detecting and Correcting Spelling Errors.
Retrieved 2011-02-18.
-
^
Earnest,
Les.
Visible Legacies for Y3K.
Retrieved 2011-02-18.
-
^
a
b
"Georgetown U Faculty & Staff: The Center for Language, Education &
Development". Retrieved
2008-12-18., citation: "Maria Mariani... was one of a
group of six linguists from Georgetown University who developed the
first spell-check system for the IBM corporation."
-
^
"William J. Tobin biography".
LinkedIn. Retrieved
2011-05-18.
- ^
a
b
"Mr. Tobin has been awarded 15 patents in the past 40 years".
WilliamJTobin.com. Retrieved
2011-05-18.
-
^
Banks, T. (2008).
Foreign Language Learning Difficulties and Teaching Strategies.
(pp. 29). Master's Thesis, Dominican University of California.
Retrieved 19 March 2012.
-
^
Journal Article. SpringerLink.
Retrieved 22 September 2010.
-
^
Walt
Mossberg (4 January 2007).
"Review". Wall Street Journal.
Retrieved 24 September 2010.
-
^
"Google Operating System". googlesystem.blogspot.com.
Retrieved 25 September 2010.
"Google's Context-Sensitive Spell
Checker". May 29, 2009.
-
^
"Ginger Software - The World's Leading Grammar and Spell Checker".
Gingersoftware.com.com. Retrieved
19 June 2011.
-
^
"Ghotit Dyslexia Software for People with Learning Disabilities".
Ghotit.com. Retrieved 25
September 2010.
-
^
Baase, Sara. A Gift of Fire: Social,
Legal, and Ethical Issues for Computing and the Internet. 3. Upper
Saddle River: Prentice Hall, 2007. Pages 357-358.
ISBN 0-13-600848-8.
-
^
Jerrold H.
Zar.
"Candidate for a Pullet Surprise". Northern Illinois
University. Retrieved 24
September 2010.
-
^
"Retired faculty page". NIU.edu.
Retrieved 6 May 2010.
-
^
Richard
Nordquist.
"The Spell Checker Poem, by Mark Eckman and Jerrold H. Zar".
About.com. Retrieved 24 September
2010.
-
^
Education.com Is Spell Check Creating a Generation of Dummies?