
Geek Trivia: What English town is infamous for its role in testing early profanity filters?

There's a computer security term for filtering errors in which harmless text is blocked because it happens to contain a forbidden string of letters. The term is named after an English town that famously ran afoul of some early online profanity filters.

Scunthorpe, North Lincolnshire, England is an industrial town in eastern central Britain known for its metalworks. However, in computer security circles, it's the namesake of the Scunthorpe Problem, a phenomenon where computer language filters are befuddled by widespread false positives, rendering the filter (and often the system it's defending) useless.

The origin of the Scunthorpe Problem can be traced to 1996, when a number of Scunthorpe residents were suddenly unable to enroll as new customers for America Online's Internet service. AOL's profanity filter mistook the second through fifth letters of the word Scunthorpe for what Captain Kirk would describe as a colorful metaphor and the US Federal Communications Commission would describe as a finable offense if uttered over broadcast airwaves.

The Scunthorpe Problem isn't limited, however, to variations on George Carlin's infamous Seven Dirty Words You Can't Say on Television. The existence of medireview, a non-word created when Yahoo's webmail filter rewrote the script keyword eval as review and so turned medieval into medireview, illustrates how even non-profane words can be mistaken for threatening terms by crudely designed language filters. While profanity filters have matured -- pun intended -- to the point that they avoid false positives for most common curse words, the rise of crude spam filters has caused the Scunthorpe Problem to evolve. For example, attempting to block spam messages for the drug Cialis can result in any use of the word specialist being banned.
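The Cialis/specialist collision is easy to reproduce. The following is a minimal sketch, not the code of any real filter: the function names and the one-word blocklist are assumptions made for illustration. It compares a naive substring check with a check that only matches whole words.

```python
import re

# Hypothetical one-word blocklist; the term comes from the spam example above.
BLOCKLIST = ["cialis"]


def naive_filter(text, blocklist=BLOCKLIST):
    """Flag text if a blocked term appears anywhere, even inside a longer word."""
    lowered = text.lower()
    return any(term in lowered for term in blocklist)


def word_boundary_filter(text, blocklist=BLOCKLIST):
    """Flag text only when a blocked term appears as a whole word."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE)
        for term in blocklist
    )


if __name__ == "__main__":
    for sample in ["Please consult a specialist", "Cheap Cialis here"]:
        print(sample, naive_filter(sample), word_boundary_filter(sample))
```

Matching on word boundaries removes this particular false positive, but it does nothing for words that are spelled identically and mean different things, which is the harder half of the problem.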

The need for a Semantic Web has never been clearer. Until computers can tell the difference between bank, the place where I keep my money, and bank, the side of a river (these are, in fact, two separate places), the Scunthorpe Problem will always be with us.

That's not just some infuriating English-language interpolation; it's an etymologically excellent example of Geek Trivia.

The quibble of the week

If you uncover a questionable fact or debatable aspect of this week's Geek Trivia, just post it in the discussion area of the article. Every week, yours truly will choose the best quibble from our assembled masses and discuss it in a future edition of Geek Trivia.


About

Jay Garmon has a vast and terrifying knowledge of all things obscure, obtuse, and irrelevant. One day, he hopes to write science fiction, but for now he'll settle for something stranger -- amusing and abusing IT pros.

34 comments
wendygoerl

It even predates the Web as we know it. I'm surprised nothing in this article mentions Sussex, or Verizon's ban on the Libshitz Family from using their last name in E-Mail ...

drawman

A few years back a female engineer complained that the company IT system had blocked an email. We were working in the UK for a large USA corporation on a civil engineering project. I soon spotted why a Yankee filter had stopped it (and I'm English) but as she was French she had no idea what the States were objecting to. The email read "if you can't move it that way use a snatch block". Perfectly innocuous in the UK.

ozchorlton

I have the same issue with the name of my home town (on the southern outskirts of Manchester, England), called Chorlton-cum-Hardy. The number of porn filters that block the name can be Very High! Note that the TechRepublic Porn Filter will not allow the middle word of the name in this post; it replaces the letters c, u, and m with *** - my point!

chrisbedford

...are "There, they're and their" heterographic homophones. There and their, certainly, but anyone who pronounces "they're" the same as the other two must have been brought up in a very poor education system. Likewise "your" and "you're". Needless to say, I see examples of both being used incorrectly every day. Mostly by Americans.

victor.gutzler

It would have been geekier to use the term 'homographic' in describing words that are spelled the same but have different meanings. Let's just hope the filters don't turn this comment into 'gaygraphic'....;D

ian_hardie

According to Wikipedia... it is PENISTONE!

apaintz

Blankensure, I see now is incorrect!

Garden Gnome

Wikipedia (fount of all wisdom and knowledge) (?) estimates the population of Scunthorpe at 74,500 in 2010. That may be a small town in the USA, but certainly isn't here. Interesting article. Have you thought of forwarding it to Verbatim? www.verbatimmag.com I think the editor would like it.

CharlieSpencer

A conservative Christian organization that automatically substituted the word 'homosexual' for the word 'gay' in their news summaries. They wound up changing the last name of a medal-winning Olympic athlete.

sboverie

Thanks for this week's geek trivia. I did not know it had a name, but I have heard of other words that were censored because they had pieces of naughty words embedded in them. It was like Beavis and Butthead wrote the filters so they could get a chuckle out of normal words. Have you heard of an association called the Turtles? To become a member you have to answer a few questions correctly, like "What does a man do standing up, a woman does sitting down and a dog does on three legs?" and the answer is proper for mixed company.

whitehound

Scunthorpe - I don't have to look it up, I remember. Some British names for birds were also edited out as indecent because they are spelled the same as various bits of sexual anatomy. A cock in Britain is a male bird, especially a male chicken (although confusingly a cock sparrow isn't a male house sparrow but a separate species with a cocked tail), and a tit is any of several species of small garden birds similar to finches.

AnsuGisalas

Didn't I see a film once starring a person of that name? ;) It was an art film, obviously.

AnsuGisalas

The speed of speaking varies all the time, and there are circumstances where a person will say "they're" in exactly the same way as that same person would say "their" or "there" in other circumstances. The vowel of they're (depending entirely on dialect - NEVER on quality of education) is slightly higher and more monotonous, but these differences do have a strong tendency to disappear in non-distinct speech. Anyone who thinks that the distinct speech patterns of an actor doing Shakespeare are the epitome of language is sadly disturbed. Sure it can be neat, but it's more of a perversion than it is a purified form.

AnsuGisalas

Bank and bank and bank are homonyms.
If they are spelled differently, but spoken the same, they are heterographic homophones. There, they're and their are heterographic homophones.
If they are written the same, but pronounced differently, they are heterophonic homographs. Desert and desert are heterophonic homographs.
There's lots of confusion, which is likely intentional, since this is really just a standard three-bit (triple binary) variation: Meaning match 1 or 0, Spelling match 1 or 0, Pronunciation match 1 or 0 (I'll call this set MSP).
M1S1P1 = these are instances of the same lexeme (same meaning, sound and writing).
M1S0P1 = these are words with one meaning and one pronunciation, but different writings - word sets like "thru/through" and "color/colour".
M1S1P0 = similarly, these are words with one meaning and one spelling but different pronunciations; this obviously is usually a case of dialectal differences, like the famous song by the Gershwins, "Let's call the whole thing off".
M0S1P1 = these are Homographic Homophones (which are called homonyms - so far so good).
M0S0P1 = these are Heterographic Homophones (which are called heterographs - slightly confusing).
M0S1P0 = these are heterophonic homographs (which are very confusingly called heteronyms or, less confusingly, heterophones).
M1S0P0 = these are words which are not spelled the same, nor spoken the same, but mean the same... in other words they are synonyms.
M0S0P0 = this set of values describes the relationship between the vast majority (I believe it's warranted this time, Santeewelding) of lexemes; they don't mean the same, they're not spelled the same, and they're not pronounced the same, either.
Since these are heterophonic heterographs, this is what makes me think the above naming conventions seem to be intentionally confusing ;)
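The three-bit MSP scheme above can be sketched as a lookup table. This is only an illustration: the names MSP_NAMES and classify are hypothetical, and the labels are shortened versions of the ones given in the comment.

```python
# Keys are (meaning_match, spelling_match, pronunciation_match) bits, i.e. M, S, P.
# Labels paraphrase the comment above; the table itself is an illustrative assumption.
MSP_NAMES = {
    (1, 1, 1): "same lexeme",
    (1, 0, 1): "spelling variants",
    (1, 1, 0): "pronunciation variants",
    (0, 1, 1): "homonyms (homographic homophones)",
    (0, 0, 1): "heterographs (heterographic homophones)",
    (0, 1, 0): "heteronyms (heterophonic homographs)",
    (1, 0, 0): "synonyms",
    (0, 0, 0): "unrelated lexemes",
}


def classify(meaning, spelling, pronunciation):
    """Return the label for a pair of words given which of M, S and P match."""
    return MSP_NAMES[(int(meaning), int(spelling), int(pronunciation))]


if __name__ == "__main__":
    # bank (money) vs. bank (river): different meaning, same spelling and sound.
    print(classify(False, True, True))
    # there vs. their: different meaning and spelling, same sound.
    print(classify(False, False, True))
```

A text filter only ever sees the spelling bit, which is why it cannot tell the cases with S=1 and M=0 apart from a genuine match.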

AnsuGisalas

Did they get sued into bankruptcy?!?

sboverie

It is ironic that in your response to the medireview article that you used normal words that have been automatically filled in with asterisks.

chrisbedford

In English ("British" to Americans) English, as opposed to American English, there is a clear distinction. Well, OK, I'm talking about the so-called "received pronunciation", i.e. non-dialectical, non-regional English, but even in most regional accents there would be little confusion in pronunciation. One rhymes with "air" and the other is an almost two-syllable word, the first part of which is clearly "they". Hard to imagine how education could not be part of that. And perhaps that's why more Americans tend to mis-spell those forms. *That* is certainly the result of poor education, because if you write "their going to town" when you mean "they're going to town" your primary education is unquestionably lacking.

RipVan

I use the homophone to call my gay friend. Er, I mean, to RING HIM UP!

GSG

I'll have to quibble your heterophonic homographs. I assume that you mean: Desert: Dry geographic location, such as the Sahara Desert. Dessert: Yummy sweet end to a meal, such as chocolate cake. Spelled differently, pronounced differently, and one is quite delicious!

Andy M

Now why would they be sued for that? It's funny, sure, and embarrassing to the organization, but something to sue over? I don't think so.

sboverie

I liked the one that goes "What is it on a man that is round and hard and sticks so far out of his pajamas that you can hang a hat on it?" There is also another similar question that goes "What is a 4 letter word for a woman ending in the letters "UNT"?" Remember, the correct answer is appropriate for all ages.

chrisbedford

And anyway no-one is really saying that American English is a different language from English English. But no-one can deny they *are* different, and we just say for ease of speaking that they're different "languages" whereas, yes, you are right, they are actually different dialects. No biggy, but if you were to take someone from, say, deeply rural West Virginia and put him down in "deeply rural" England, (not that there really is such a place) - say Somerset, I doubt he'd be able to understand more than a couple of words, or make himself understood either. Who said England and America are two countries "divided by a common tongue"? Sounds like GBS or one of his contemporaries. But I digress. I still maintain anyone who writes "their" when he means "they're" (or "you're" for "your", etc etc) is the victim of poor education, as a result of poor enunciation. How did we get onto this topic anyway?

GSG

I'm American and I speak the language "English", the same way the British speak it, and the Australians, etc... I do, however, speak the American dialect of the language using colloquialisms from my region of the USA. Just because I spell some words differently, or use them differently (such as boot), doesn't mean that I'm speaking the language incorrectly. There are different regional colloquialisms in Britain, and that doesn't make them wrong either. It's just a natural progression and growth of language. The English language that you speak and write today has very little resemblance to the English language that was spoken and written 500 years ago, and when the American dialect split off from the English dialect all those years ago, it also changed in its own direction. Think of it as being similar to Darwin's theory of evolution, but applied to language instead of flora and fauna.

AnsuGisalas

that most people think the things what come out of their mouths is sounding completely diffrently frum whut et rilly zounds laik. It really takes a scientific phonetic study to say anything about what people are actually saying... and, ok, that goes as much against my claim as against yours. A "corpus" is a set of texts or recordings, collected according to strict rules, to ensure that distortions are documentable. It has to be spontaneous speech, because people don't talk normally in an interview or recording situation, they invariably try to make their speech more distinct and more "correct" than it would normally be. It can easily be that you are correct, for the dialects you mean to refer to... but it would take some study to make certain of it.

chrisbedford

Mostly because I have no idea what you are trying to say with "large Corpus of spontaneous speech of the dialects you want covered by your claim". Seriously. But as to "I've read writing from Englishmen poor enough to warrant hospitalization" - well, exactly my point. Poor education. Please note I never said English *education* was any better than anyone else's - just that standard English pronunciation would (or should) distinguish between... well, whatever those fancy words are that you used.

AnsuGisalas

First of all, please give me a large Corpus of spontaneous speech of the dialects you want covered by your claim, so I can verify it. Secondly, I've read writing from Englishmen poor enough to warrant hospitalization (of me - it burnt my eyes), so do you have any evidence for your statement on statistical distribution? Lastly, if you say "they're" indistinctly, unstressed and hurriedly, I'd be very surprised to find a three-way glide in there. The triphthong turns into a diphthong under some circumstances, and when it does, it is the diphthong you can also find in pronunciations of "there" and "their".

chrisbedford

...but there are no end of people who will argue that it is...

AnsuGisalas

Who can think of deserting when there's chocolate cake to be had!

GSG

I realized that later. I'm such a goober. My mind was on the chocolate birthday cake my mom was making for my brother's birthday! Yummm.... chocolate cake....

AnsuGisalas

to desert while in the desert is a bad idea... you'll be a buzzard's dessert :D I lifted the examples from wikipedia, though.

AnsuGisalas

I know I wouldn't. And it doesn't mean that I think "homosexual" is an insult, it's simply that it's not fitting to make statements about a person's sexual orientation... if they do, they can be sued.