Spell checker

In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelt (or spelled, in US English) correctly. Spell checkers may be stand-alone programs capable of operating on a block of text, or part of a larger application, such as a word processor, email client, electronic dictionary, or search engine.

Eye have a spelling chequer,
It came with my Pea Sea.
It plane lee marks four my revue
Miss Steaks I can knot sea.

Eye strike the quays and type a whirred
And weight four it two say
Weather eye am write oar wrong
It tells me straight a weigh.

Eye ran this poem threw it,
Your shore real glad two no.
Its vary polished in its weigh.
My chequer tolled me sew.

A chequer is a bless thing,
It freeze yew lodes of thyme.
It helps me right all stiles of righting,
And aides me when eye rime.

Each frays come posed up on my screen
Eye trussed too bee a joule.
The chequer pours o'er every word
Two cheque sum spelling rule.

The original version of this poem was written by Jerrold H. Zar in 1992. An unsophisticated spell checker will find little or no fault with this poem because it checks words in isolation. A more sophisticated spell checker will make use of a language model to consider the context in which a word occurs.

Design

A spell checker customarily consists of two parts:

A set of routines for scanning text and extracting words, and
An algorithm for comparing the extracted words against a known list of correctly spelled words (i.e., the dictionary).
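The two parts can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's design: the regular expression, the helper names `extract_words` and `flag_unknown`, and the toy word list are all hypothetical, and only ASCII letters (with internal apostrophes) count as word characters.

```python
import re

# Hypothetical word pattern: ASCII letters, optionally joined by internal
# apostrophes ("don't", "Earnest's"). Real checkers use Unicode-aware rules.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def extract_words(text):
    """Scanning routine: yield (offset, word) for each word in the text."""
    for match in WORD_RE.finditer(text):
        yield match.start(), match.group()


def flag_unknown(text, dictionary):
    """Comparison routine: return (offset, word) pairs missing from the dictionary."""
    return [(pos, word) for pos, word in extract_words(text)
            if word.lower() not in dictionary]


# Toy dictionary for illustration only; a real list holds tens of thousands of words.
DICTIONARY = {"the", "cat", "sat", "on", "mat"}

print(flag_unknown("The cat sat on teh mat.", DICTIONARY))  # [(15, 'teh')]
```

Keeping the offset alongside each word lets a user interface highlight the flagged word in place.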

The scanning routines sometimes include language-dependent algorithms for handling morphology. Even for a lightly inflected language like English, word extraction routines will need to handle such phenomena as contractions and possessives. It is unclear whether morphological analysis provides a significant benefit for English, though its benefits for highly synthetic languages such as German, Hungarian or Turkish are clear.
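As an illustration of why even English needs some of this machinery, the sketch below reduces possessives and a few regular inflections before lookup. It is deliberately naive and rests on assumptions: the suffix list, the contraction list, and the function name `is_known` are hypothetical, and real morphology routines (such as the affix rules used by ispell-style dictionaries) also handle spelling changes like stop/stopped, which this sketch ignores.

```python
# Toy data for illustration only (hypothetical, not a real word list).
BASE_WORDS = {"dog", "walk", "cat", "earnest"}
CONTRACTIONS = {"don't", "it's", "can't", "they're"}
# Naive inflectional suffixes; ignores spelling changes such as "stopped".
SUFFIXES = ("ing", "ed", "s")


def is_known(word, base_words=BASE_WORDS, contractions=CONTRACTIONS):
    """Accept a word if it, or a naively reduced form of it, is in the list."""
    w = word.lower()
    if w in contractions:  # contractions are listed whole
        return True
    # Possessives: "dog's" -> "dog", "dogs'" -> "dogs".
    if w.endswith("'s"):
        w = w[:-2]
    elif w.endswith("'"):
        w = w[:-1]
    if w in base_words:
        return True
    # Strip at most one suffix: "walked" -> "walk", "dogs" -> "dog".
    return any(w.endswith(suffix) and w[:-len(suffix)] in base_words
               for suffix in SUFFIXES)
```

For example, `is_known("dogs'")` strips the apostrophe and then the plural "s", while `is_known("walkd")` is rejected because no reduction reaches a listed word.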

The word list might contain just a list of words, or it might also contain additional information, such as hyphenation points or lexical and grammatical attributes.

As an adjunct to these two components, the program's user interface allows users to approve replacements and modify the program's operation.

One exception to this paradigm is the spell checker that relies solely on statistical information, such as n-grams. This approach usually requires considerable effort to obtain sufficient statistical information and may require much more storage at run time. Such methods are not currently in general use. In some cases, spell checkers instead use a fixed list of misspellings and suggestions for those misspellings; this less flexible approach is often used in paper-based correction methods, such as the "see also" entries of encyclopedias.
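A statistical checker of this kind can be sketched with character trigrams: count the trigrams in known-good text, then flag any word containing a trigram that was never (or rarely) seen. This is a simplification under stated assumptions: the `#` boundary padding, the `min_count` cut-off, the tiny training list, and the function names are hypothetical, and practical systems use smoothed probabilities and much larger corpora rather than a hard threshold.

```python
from collections import Counter


def char_trigrams(word):
    """Character trigrams with '#' padding to mark the word boundaries."""
    padded = f"##{word.lower()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpus_words):
    """Count trigrams over a (hypothetical) corpus of correctly spelled words."""
    counts = Counter()
    for word in corpus_words:
        counts.update(char_trigrams(word))
    return counts


def looks_misspelled(word, counts, min_count=1):
    """Flag the word if any of its trigrams occurs fewer than min_count times."""
    return any(counts[t] < min_count for t in char_trigrams(word))


counts = train(["the", "then", "there", "these", "this", "that"])
print(looks_misspelled("there", counts))  # False
print(looks_misspelled("teh", counts))    # True: "#te" never occurs in training
```

Note that no word list is consulted at check time; only the trigram table is stored, which is why the size of that table, not a dictionary, drives the storage cost.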

History

Research extends back to 1957, including spelling checkers for bitmap images of cursive writing and special applications to find records in databases despite incorrect entries. In 1961, Les Earnest, who headed the research on this budding technology, found it necessary to include the first spell checker, which accessed a list of 10,000 acceptable words.^[1] Ralph Gorin, a graduate student under Earnest at the time, created the first true spelling checker program written as an applications program (rather than research) for general English text: SPELL for the DEC PDP-10 at Stanford University's Artificial Intelligence Laboratory, in February 1971.^[2] Gorin wrote SPELL in assembly language for speed, and he made the first spelling corrector: the program searched the word list for plausible correct spellings that differ by a single letter or by a transposition of adjacent letters, and presented them to the user. Gorin made SPELL publicly accessible, as was done with most SAIL (Stanford Artificial Intelligence Laboratory) programs, and it soon spread around the world via the new ARPANET, about ten years before personal computers came into general use.^[3] SPELL, along with its algorithms and data structures, inspired the Unix ispell program.
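The correction strategy described for SPELL can be sketched as follows. This is not Gorin's assembly-language implementation but a hypothetical Python illustration: it generates every string one edit away from the input and keeps those found in the word list, which finds the same words as scanning the list for near matches. Reading "differ by a single letter" as one substitution, insertion, or deletion is an assumption; the adjacent-transposition case comes straight from the description. The names `single_edit_candidates` and `suggest` and the toy word list are illustrative only.

```python
import string


def single_edit_candidates(word, alphabet=string.ascii_lowercase):
    """All strings one deletion, transposition, substitution, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def suggest(word, dictionary):
    """Return sorted dictionary words one edit away, or [] if the word is known."""
    w = word.lower()
    if w in dictionary:
        return []
    return sorted(single_edit_candidates(w) & dictionary)


WORDS = {"the", "then", "them", "tea", "ten"}  # toy word list
print(suggest("teh", WORDS))  # ['tea', 'ten', 'the']
```

Presenting every candidate rather than picking one matches the interactive style described above: the user, not the program, chooses the replacement.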

Spell checkers became widely available on mainframe computers in the late 1970s. A group of six linguists from Georgetown University developed the first spell-check system for the IBM corporation.^[4]

The company Software Concepts, Inc., founded by William J. Tobin in 1978, developed one of the first patented software programs in the United States for spelling verification. The program was used by most major word-processing and phototypesetting systems, including those of Lanier, Philips, and Xerox, among many others.^[5]^[6] The patent issued to the company in 1980 for its spell-checking program was one of the first software patents issued in the United States, Canada, and Europe.^[6]

The first spell checkers for personal computers appeared for CP/M and TRS-80 computers in 1980, followed by packages for the IBM PC after its introduction in 1981. Developers such as Maria Mariani,^[4] Soft-Art, Microlytics, Proximity, Circle Noetics, and Reference Software^[citation needed] rushed OEM packages or end-user products into the rapidly expanding software market, primarily for the PC but also for the Apple Macintosh, VAX, and Unix. On PCs, these spell checkers were standalone programs, many of which could be run in TSR (terminate-and-stay-resident) mode from within word-processing packages on machines with sufficient memory.

However, the market for standalone packages was short-lived: by the mid-1980s, developers of popular word-processing packages such as WordStar and WordPerfect had incorporated spell checkers into their products, mostly licensed from the companies above, which quickly expanded support from English alone to European and eventually Asian languages. This expansion required increasingly sophisticated morphology routines, particularly for heavily agglutinative languages such as Hungarian and Finnish. Although the size of the word-processing market in a country like Iceland might not have justified the investment of implementing a spell checker, companies like WordPerfect nonetheless strove to localize their software for as many national markets as possible as part of their global marketing strategy.

Spell checking has since moved beyond word processors. The Firefox 2.0 web browser added spell check support for user-written content, such as when editing Wikitext or writing on many webmail sites, blogs, and social networking websites. The web browsers Google Chrome, Konqueror, and Opera, the email client KMail, and the instant messaging client Pidgin also offer spell checking support, some of them transparently using GNU Aspell as their engine. Mac OS X provides spell checking system-wide, extending the service to virtually all bundled and third-party applications.

Functionality

The first spell checkers were "verifiers" rather than "correctors": they offered no suggestions for incorrectly spelled words. This was helpful for typos but less helpful for logical or phonetic errors. The main challenge developers faced was offering useful suggestions for misspelled words, which requires reducing words to a skeletal form and applying pattern-matching algorithms.
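The text does not say which skeletal form early correctors used. One well-known example, swapped in here purely for illustration, is the American Soundex code: a word becomes its first letter plus up to three digits for its consonant sounds, so phonetically similar spellings such as Robert and Rupert share a code. The sketch assumes plain ASCII input, and the helper `phonetic_suggestions` is hypothetical.

```python
# American Soundex digit classes; vowels and y get no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the four-character Soundex skeleton of a word, e.g. 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def phonetic_suggestions(word, dictionary):
    """Hypothetical helper: dictionary words sharing the word's Soundex code."""
    code = soundex(word)
    return sorted(w for w in dictionary if soundex(w) == code)


print(soundex("Robert"), soundex("Rupert"))                          # R163 R163
print(phonetic_suggestions("Robbert", ["Robert", "Rupert", "Rubin"]))  # ['Robert', 'Rupert']
```

Soundex is coarse and tuned for English surnames; it illustrates the skeleton-plus-matching idea rather than a production suggestion engine.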

It might seem logical that, where spell-checking dictionaries are concerned, "the bigger, the better," so that correct words are not marked as incorrect. In practice, however, an optimal size for English appears to be around 90,000 entries. Beyond that, misspelled words are more likely to go unflagged because they coincide with rare but valid words. For example, a linguist might determine on the basis of corpus linguistics that the word baht is more frequently a misspelling of bath or bat than a reference to the Thai currency. Hence, it would typically be more useful to slightly inconvenience the few people who write about Thai currency than to overlook the spelling errors of the many more people who discuss baths.

A screenshot of the AbiWord spell checker

The first MS-DOS spell checkers were mostly used in proofing mode from within word-processing packages: after preparing a document, the user scanned the text for misspellings. Later, batch processing was offered in packages such as Oracle's short-lived CoAuthor, which allowed users to view the results after a document was processed and correct only the words they knew to be wrong. When memory and processing power became abundant, spell checking came to be performed interactively in the background, as in Sector Software's Spellbound program, released in 1987, and in Microsoft Word since Word 95.

In recent years, spell checkers have become increasingly sophisticated; some can now recognize simple grammatical errors. However, even at their best, they rarely catch all the errors in a text (homophone errors, for example), and they flag neologisms and foreign words as misspellings. Nonetheless, spell checkers can be considered a type of foreign language writing aid on which non-native learners can rely to detect and correct their misspellings in the target language.^[7]

Spell-checking non-English languages

English is unusual in that most words used in formal writing have a single spelling that can be found in a typical dictionary, with the exception of some jargon and modified words. In many languages, however, words are frequently combined in new ways. In German, for example, compound nouns are often coined from existing nouns. Some scripts do not clearly separate one word from another, which requires word-splitting algorithms. Each of these presents unique challenges to spell checkers for languages other than English.
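For compounding languages, one common technique (assumed here; the text does not name one) is to accept an unknown word if it can be split entirely into dictionary words. The dynamic-programming sketch below does this for the German compound Donaudampfschifffahrt. It ignores linking elements such as the German Fugen-s, and the toy lexicon, the `min_len` parameter, and the name `segment` are hypothetical. The same routine, run over unspaced text, is a simple form of the word splitting needed for scripts without spaces.

```python
def segment(text, lexicon, min_len=2):
    """Split text into lexicon words using dynamic programming.

    Returns one valid split as a list of words, or None if no split exists.
    Linking elements (e.g. the German Fugen-s) are not handled.
    """
    text = text.lower()
    n = len(text)
    # best[i] is a list of words covering text[:i], or None if text[:i] cannot be split.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if best[start] is not None and len(piece) >= min_len and piece in lexicon:
                best[end] = best[start] + [piece]
                break  # keep the first split found for this prefix
    return best[n]


LEXICON = {"donau", "dampf", "schiff", "fahrt"}  # toy lexicon
print(segment("Donaudampfschifffahrt", LEXICON))  # ['donau', 'dampf', 'schiff', 'fahrt']
print(segment("Dampfboot", LEXICON))              # None ("boot" is not in the toy lexicon)
```

The `min_len` guard keeps very short lexicon entries from licensing implausible splits; a real checker would also weigh competing splits rather than keep the first one found.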

Context-sensitive spell checkers

Recent research has focused on developing algorithms that can recognize a misspelled word, even if the word itself is in the vocabulary, based on the context of the surrounding words. Not only does this allow words such as those in the poem above to be caught, but it also mitigates the detrimental effect of enlarging dictionaries, allowing more words to be recognized. For example, baht in the same paragraph as Thai or Thailand would not be flagged as a misspelling of bath. The most common errors caught by such a system are homophone errors, such as the words Their, too, sea, its, and reel in the following sentence:

Their coming too sea if its reel.

The most successful algorithm to date is Andrew Golding and Dan Roth's "Winnow-based spelling correction algorithm",^[8] published in 1999, which recognizes about 96% of context-sensitive spelling errors in addition to ordinary non-word spelling errors. Context-sensitive spell checkers appear in Microsoft Office 2007,^[9] Google Wave,^[10] Ginger Software,^[11] and Ghotit Dyslexia Software,^[12] a context spell checker tuned for people with dyslexia.
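The core idea of a Winnow-based checker can be sketched for a single confusion set. This is a hypothetical, heavily simplified illustration, not Golding and Roth's published algorithm, which is considerably more elaborate. Here each word in the set gets one Winnow unit using Littlestone's multiplicative update (multiply the weights of active features by `alpha` after a missed positive, divide them after a false positive); the only features are the surrounding words; and the four training sentences, the threshold, and the class names are assumptions chosen for the example.

```python
from collections import defaultdict


def context_features(tokens, i, window=2):
    """Binary features: the words within `window` positions of tokens[i]."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {"w:" + tokens[j].lower() for j in range(lo, hi) if j != i}


class Winnow:
    """Littlestone's Winnow over sparse binary features (all weights start at 1)."""

    def __init__(self, threshold=2.0, alpha=2.0):
        self.weights = defaultdict(lambda: 1.0)
        self.threshold = threshold
        self.alpha = alpha

    def score(self, features):
        return sum(self.weights[f] for f in features)

    def update(self, features, is_target):
        predicted = self.score(features) >= self.threshold
        if is_target and not predicted:    # missed: promote active features
            for f in features:
                self.weights[f] *= self.alpha
        elif predicted and not is_target:  # false alarm: demote active features
            for f in features:
                self.weights[f] /= self.alpha


class ConfusionSetChecker:
    """One Winnow unit per word in a confusion set such as {their, there}."""

    def __init__(self, confusion_set, epochs=10):
        self.units = {word: Winnow() for word in confusion_set}
        self.epochs = epochs

    def train(self, sentences):
        for _ in range(self.epochs):
            for tokens in sentences:
                for i, token in enumerate(tokens):
                    if token.lower() in self.units:
                        feats = context_features(tokens, i)
                        for word, unit in self.units.items():
                            unit.update(feats, word == token.lower())

    def best(self, tokens, i):
        """Return the confusion-set word whose unit scores tokens[i]'s context highest."""
        feats = context_features(tokens, i)
        return max(self.units, key=lambda word: self.units[word].score(feats))


TRAINING = [s.split() for s in (
    "they parked their car outside",
    "put the box over there",
    "the kids washed their car",
    "leave it over there please",
)]
checker = ConfusionSetChecker(["their", "there"])
checker.train(TRAINING)
print(checker.best("we washed there car".split(), 2))  # their
print(checker.best("put it over their".split(), 3))    # there
```

The checker is consulted only at positions holding a confusion-set word; if the best-scoring word differs from the one written, the position is flagged. That is how "there car" is caught even though "there" is a correctly spelled dictionary word.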

Criticism

Some critics^[who?] of technology and computers have attempted to link spell checkers to a trend of skill loss in writing, reading, and speaking. They claim that the convenience of computers has made people lazy, so that they often do not proofread written work beyond a single pass with a spell checker. Supporters^[who?] claim that these changes may actually benefit society by making writing and learning new languages more accessible to the general public. They claim that the skills lost to automated spell checkers are being replaced by better ones, such as faster and more efficient research skills. Other supporters of technology point out that these skills are not being lost by people who require and use them regularly, such as authors, critics, and language professionals.^[13]

An example of the problem of relying entirely on spell checkers is the Spell-checker Poem^[14] above. It was originally composed by Dr. Jerrold H. Zar^[15] in 1991, assisted by Mark Eckman;^[16] the original was 225 words long and contained 123 incorrectly used words. According to most spell checkers the poem is valid, although most people can tell at a glance that most of its words are used incorrectly. As a result, spell checkers are sometimes derided as "spilling chuckers" or similar slightly misspelled names.

Not all of the critics are opponents of technological progress, however. An article based on research by Galletta et al.^[17] reports that, in the Galletta study, higher verbal skills were needed for the highest performance when using a spell checker. The researchers' hypothesis was that only writers with higher verbal skills could recognize and ignore false positives or incorrect suggestions. However, they found that those with higher verbal skills lost their unaided performance advantage in multiple categories of errors, performing as poorly as the low-verbal writers when the spell checker was turned on. The conclusion points to some evidence of a loss of skill.

References

^ Earnest, Les. "The First Three Spelling Checkers". Stanford University. Retrieved 10 October 2011.
^ Peterson, James (December 1980). Computer Programs for Detecting and Correcting Spelling Errors. Retrieved 18 February 2011.
^ Earnest, Les. Visible Legacies for Y3K. Retrieved 18 February 2011.
^ "Georgetown U Faculty & Staff: The Center for Language, Education & Development". Retrieved 18 December 2008. Quote: "Maria Mariani... was one of a group of six linguists from Georgetown University who developed the first spell-check system for the IBM corporation."
^ "William J. Tobin biography". LinkedIn. Retrieved 18 May 2011.
^ "Mr. Tobin has been awarded 15 patents in the past 40 years". WilliamJTobin.com. Retrieved 18 May 2011.
^ Banks, T. (2008). Foreign Language Learning Difficulties and Teaching Strategies (p. 29). Master's thesis, Dominican University of California. Retrieved 19 March 2012.
^ Journal article. SpringerLink. Retrieved 22 September 2010.
^ Mossberg, Walt (4 January 2007). "Review". Wall Street Journal. Retrieved 24 September 2010.
^ "Google's Context-Sensitive Spell Checker" (29 May 2009). Google Operating System (googlesystem.blogspot.com). Retrieved 25 September 2010.
^ "Ginger Software - The World's Leading Grammar and Spell Checker". Gingersoftware.com. Retrieved 19 June 2011.
^ "Ghotit Dyslexia Software for People with Learning Disabilities". Ghotit.com. Retrieved 25 September 2010.
^ Baase, Sara (2007). A Gift of Fire: Social, Legal, and Ethical Issues for Computing and the Internet (3rd ed.). Upper Saddle River: Prentice Hall. pp. 357-358. ISBN 0-13-600848-8.
^ Zar, Jerrold H. "Candidate for a Pullet Surprise". Northern Illinois University. Retrieved 24 September 2010.
^ "Retired faculty page". NIU.edu. Retrieved 6 May 2010.
^ Nordquist, Richard. "The Spell Checker Poem, by Mark Eckman and Jerrold H. Zar". About.com. Retrieved 24 September 2010.
^ "Is Spell Check Creating a Generation of Dummies?". Education.com.
