I am looking at the question: How many words are there in a language? I’d like to know for languages in general, comparatively, and for pedagogical reasons, in some well-known Western language, which may as well be English.
What I found quite incidentally is a hornet’s nest of curmudgeonistic pedanticmaniacal jibberishosity. (There. Whatever the count was, it is now N+3.)
(For more Falsehoods, click here. Also, listen to “Everything You Know is Sort of Wrong,” on Skeptically Speaking Talk Radio. )
First I want to explain why I was interested in this at all. There has for some years been discussion of the vastosity of language, and how impressive this vastosity is in relation to the ability of a child to enlearn it all. Various studies have shown that children of a certain age know (as in recognize) a waylot of words, a virtual spoorload of lexicon. When you do the maths, it turns out that children are learning some horrific number of words per day from the time they are yajabbering infants in order to reach that number by said age.
Indeed, it has been guestimated that the number of words in English is far, far greater than the number of words we tend to think there are in English, and rugrats know way more of them than anyone has ever ponderified. The usual story goes like this: The English dictionary you can find with the mostest of words is probably Funk & Wagnalls New Standard Unabridged, with just under a hemimillion (maybe 450,000) entries. When this list is adjusted to account for the fact that words are not really what they seem when they are listed in the dictionary, a sublist can be generated. If this list, about a quatromillion in size, is sampled, one can make a test to see how many words a person, perhaps a rugrat, knows. Call the result the Lexiknowitall Quotient, if you will. Or, for simpleness’ sake, “L.” (I will not be using the variable “L” for the rest of this post, so there really was no reason to tell you that.)
Given this, a fully growd adult with a high school education knows about 45,000 words. A six year old knows 13,000. Do the maths. To get from zero to 13,000 in six years, a child has to learn about six new words a day, or roughly one new word every two waking hours. Watch them. You can see them doing it.
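For anyone who would rather have the maths done for them, here is a minimal sketch of the back-of-the-envelope sum. The 13,000 words and the six years come from the figures above; the 12 waking hours per day is my own assumption, and the function name is made up for illustration.

```python
# Back-of-the-envelope check of the "one new word every two hours" claim.
WORDS_AT_SIX = 13_000       # figure quoted in the post
YEARS = 6                   # counting from birth, as the post does
DAYS_PER_YEAR = 365
WAKING_HOURS_PER_DAY = 12   # ASSUMPTION: not stated in the post

def learning_rate(words, years, waking_hours_per_day=WAKING_HOURS_PER_DAY):
    """Return (words learned per day, waking hours per new word)."""
    days = years * DAYS_PER_YEAR
    words_per_day = words / days
    hours_per_word = waking_hours_per_day / words_per_day
    return words_per_day, hours_per_word

per_day, hours_per_word = learning_rate(WORDS_AT_SIX, YEARS)
print(f"{per_day:.1f} words per day, one every {hours_per_word:.1f} waking hours")
```

Under that assumption the rate works out to a bit under six words a day. Change the waking hours and the "every two hours" figure moves with it, so treat it as a rough rate, not a measurement.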
Well, really, you can’t see it. Which is why this is all very interesting. Is it really happening, or is this just some fantasy of Steven Pinker, who would really prefer to think that the words are practically encoded in our genomes somehow? Perhaps, I imagine him thinking, we have a lexinome from which these words spring to be spoken in the context of our grammarome.
Anyway, if you go to Teh Google and ask it “how many words are there, huh?” you will get this one answer that is repeatedly plagiarized, and it is little more than curmudgeonistic pedantistery. In fact, I have identified it as a Falsehood of sorts. It goes something like this:
How can we tell how many words there are!!???? We can’t even tell what a word is!!!???11?? (That’s the falsehood part … that we can’t even tell what a word is.) And these are the reasons given that we can’t tell what a word is:
1) What IS a word? If “run” is a verb, is the noun “run” another word?
OMG. I can’t believe they start out with this one. To run and the run of a mill are utterly different things. r-u-n is a spelling, and ru-nh is a pronunciation. Run the verb and run the noun are two words, and there are many, many things called “run” that are nouns. Each and every one of them is a different homonym, a different word. Duh.
2) What about inflected forms, like ran, runs, and stuff???11??
Ah … no … those are tenses and such. Not different words. And, in the study I mentioned above where the toddlers are learning a new word every couple of hours, run, ran, runs, etc. are NOT counted as different words. Or at least, that is the story as I have gorfed it.
3) Are compounds, such as man-child or man-eater or man-bites-dog different words?
Well, ok, there is a tiny bit of ambiguity here. Man-child, man-eater, and similar cases are clearly words. In English, this is easy to figure out. Take out the dash (or space). Does it still work? Then it’s a distinct word. One example given was “man-bites-dog.” That is not a word. It is a sentence where someone has put dashes in where the spaces are supposed to go. “Manbitesdog” is not a word. For the most part, the “compound” issue is as goosechasingness as one can get.
4) What is English, anyway? What about “veal” which comes from The French. Is that a word!!!??? Huh?!?!!??
More stupidosity. Yes, veal is a word in English. Jeesh. So is spaghetti. And pho. Give me a break.
5) What about obsolete words? Are they words!!!/???? Do you count them?
Well, no. They are words, but we are looking for a lexicon, not a word list. If it’s on the line and still used, however rarely, it’s in; otherwise it’s off the list. Obviously. Duh.
6) What about the names of chemicals and stuff?E?E?E?
Well, there you’ve got me. That is a little ambiguous. Yes, they are words, but no: chemists have a systematic way of creating the words in advance, and there are a lot of combinations of chemicals (even those with a low existostiy index), so we can’t count these any more than we’d count the arbitrary assignation of morphemes to, say, items on a Mexican restaurant menu so we could create a word for every combination of taco, burrito, enchilada, quesadilla, etc., where a given meal can have from one up to six per plate. I totally ate in a place with a menu like that in San Diego once. The menu itself was dozens of pages long, and only a summary of the actual theoretical menu. (“I’ll have the bitacoqadroburrito, please.”)
Either way it is an arbitrary non-lexicographic alinguistic expansion of a word list. Really, it is a verbose numbering system. Numbering systems don’t count.
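To see how fast such a "verbose numbering system" inflates a word list, here is a rough sketch that counts the possible plates. It assumes that order on the plate doesn't matter and that you can repeat an item (two tacos and a burrito is one plate). The function name and the ten-item menu are hypothetical, since the "etc." above never says how long the real menu was.

```python
from math import comb

def plates(n_items, max_per_plate):
    """Count distinct plates: unordered picks, repeats allowed, 1..max items."""
    # Multisets of size k drawn from n_items kinds: C(n_items + k - 1, k)
    return sum(comb(n_items + k - 1, k) for k in range(1, max_per_plate + 1))

# The four items named in the post, one to six per plate.
print(plates(4, 6))    # 209

# ASSUMPTION: a hypothetical menu with ten kinds of item.
print(plates(10, 6))   # 8007
```

Four items already give a couple of hundred "words," and a modestly longer menu gives thousands, none of which anybody had to learn one at a time.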
But yes, “busigator” (the Magic School Bus transformed into an alligator) is a word. Busigator is … the word for the Magic School Bus when it is an alligator. This is not hard.
(The above statements about the hardosity of counting words are cribbed from here and here. See also this, this, this, and this. The discussion of how many words there are is cribbed from The Language Instinct: How the Mind Creates Language and refers mainly to the research of William Nagy and Richard Anderson.)
So how many words are there? Actually, it’s kind of irrelevant because words mean little more than what they mean, and meaning has only a vague association with the details of the lexicon, which gives the curmudgeons and pedants nightmares. Or would, if they noticed. I mean, really, did you have any trouble understanding the meanings of minifalsehood, curmudgeonistic, pedanticmaniacal, vastosity, jibberishosity, spoorload, yajabbering, waylot, ponderified, mostest, hemimillion, quatromillion, Lexiknowitall, existostiy, alinguistic, or hardosity?
No, I mentated negatorially.