It's really easy: once you open the site, choose the file you wish to evaluate or paste the text into the empty text box, then press “START.” The text then appears on the screen, and the results are shown both by color and by percentage. Words in black are common (high-frequency) words, words in orange are mid-frequency words, and words in red are jargon. The table on the right shows the total number of words in the text and, for each frequency level (high, mid, and jargon), the number of words and their percentage of the text.
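The color-and-percentage report described above can be pictured with a minimal sketch. This is not the De-Jargonizer's actual implementation: the tiny word lists `HIGH_FREQ` and `MID_FREQ` and the functions `classify` and `summarize` are hypothetical stand-ins, assuming the tool looks each word up in frequency lists.

```python
import re
from collections import Counter

# Hypothetical toy word lists; the real tool relies on its own frequency data.
HIGH_FREQ = {"the", "a", "of", "in", "are", "heart", "cells", "animal", "weak", "eye"}
MID_FREQ = {"laser", "inject", "protein"}


def classify(word):
    """Return the frequency level of a single word: 'high', 'mid', or 'jargon'."""
    w = word.lower()
    if w in HIGH_FREQ:
        return "high"   # shown in black
    if w in MID_FREQ:
        return "mid"    # shown in orange
    return "jargon"     # shown in red


def summarize(text):
    """Return {level: (word count, percentage of all words)} for a text."""
    words = re.findall(r"[A-Za-z']+", text)
    counts = Counter(classify(w) for w in words)
    total = len(words)
    return {
        level: (counts[level], round(100 * counts[level] / total, 1) if total else 0.0)
        for level in ("high", "mid", "jargon")
    }


report = summarize("The protein in the heart cells recapitulate")
```

Here `report` maps each level to a pair of word count and percentage, mirroring the two columns of the results table.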
The program works with text that is pasted directly into the website or uploaded as a Text (.txt) or Word (.docx) file. It can evaluate texts of any length; however, very long texts may take longer to evaluate than shorter ones.
The results can be used to evaluate a text or speech that communicates science before it is published or presented to non-experts. They can also be used in a pre-post design, comparing texts written before and after a training workshop or course. Naturally, we would expect the percentage of jargon in a text written after a science communication workshop to be lower than in a text written before the workshop (Baram-Tsabari & Lewenstein, 2013; Rakedzon, Segev, & Baram-Tsabari, 2016; Rakedzon & Baram-Tsabari, 2016; Sharon & Baram-Tsabari, 2013).
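A pre-post comparison of this kind comes down to simple arithmetic on the jargon counts reported for each text. The sketch below is one possible way to do it; the function names and the word counts in the usage line are hypothetical and are not data from any of the cited studies.

```python
def jargon_percent(jargon_words, total_words):
    """Percentage of a text's words that were flagged as jargon."""
    if total_words == 0:
        raise ValueError("text has no words")
    return 100.0 * jargon_words / total_words


def pre_post_change(pre, post):
    """Change in jargon percentage, in percentage points.

    pre and post are (jargon_words, total_words) pairs for the texts written
    before and after the workshop. A negative result means less jargon after.
    """
    return jargon_percent(*post) - jargon_percent(*pre)


# Hypothetical counts: 12 jargon words out of 200 before, 6 out of 300 after.
change = pre_post_change((12, 200), (6, 300))
```

Comparing percentages rather than raw counts keeps the comparison fair when the two texts differ in length.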
Studies have shown that readers, including second language readers, ideally need to understand 98% of the words in a text to adequately comprehend its content (Hu & Nation, 2000). Therefore, the percentage of rare words should not exceed 2%. However, some researchers discuss the option of a less stringent 95% (Laufer & Ravenhorst-Kalovski, 2010). According to the literature, the top 2000 high-frequency word families cover on average about 85% of general spoken or written texts (Schmitt & Schmitt, 2014). A word family includes a headword (e.g., develop) and all of its related forms, such as undeveloped, underdeveloped, development, developments, developer, and developers; high-frequency words include words such as weak, eye, and animal. Wherever jargon appears, the writer might consider replacing it with more familiar words or adding an explanation.
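The two thresholds translate into a simple check on the jargon percentage a text receives. The sketch below assumes that every word not flagged as jargon counts as understood; the function name `jargon_within_limit` is hypothetical and is not part of the De-Jargonizer.

```python
def jargon_within_limit(jargon_percent, coverage_threshold=98.0):
    """Return True if the jargon share leaves enough of the text understood.

    coverage_threshold is the percentage of words a reader should understand:
    98 for the stricter criterion, 95 for the less stringent one.
    Assumption: all words not flagged as jargon are understood.
    """
    allowed_jargon = 100.0 - coverage_threshold
    return jargon_percent <= allowed_jargon


# A text with 3% jargon misses the 98% criterion but meets the 95% one.
strict_ok = jargon_within_limit(3.0)
lenient_ok = jargon_within_limit(3.0, coverage_threshold=95.0)
```

A text that fails the check is a candidate for the revisions mentioned above: replacing jargon with more familiar words or adding explanations.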
Researchers have found that mid-frequency words (e.g., laser, inject, protein) make up approximately 10% of a text; these words fall between the high-frequency and low-frequency (rare) bands and should be familiar to intermediate and advanced readers (Schmitt & Schmitt, 2014). In academic texts, research has shown a different breakdown: 5% technical vocabulary, 8-10% academic vocabulary (which overlaps somewhat with mid-frequency vocabulary), and 80% high-frequency vocabulary (Nation, 2001).
References:
Baram-Tsabari, A., & Lewenstein, B. V. (2013). An Instrument for Assessing Scientists’ Written Skills in Public Communication of Science. Science Communication, 35(1), 56–85.
Hu, M., & Nation, I. S. P. (2000). Vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403–430.
Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.
Nation, I. S. P. (2006). How Large a Vocabulary is Needed For Reading and Listening? Canadian Modern Language Review, 63(1), 59–82.
Nation, I. S. P. (2001). Learning Vocabulary in Another Language. New York: Cambridge University Press.
Rakedzon, T., & Baram-Tsabari, A. (2016). Assessing and improving L2 graduate students’ popular science and academic writing in an academic writing course. Educational Psychology.
Rakedzon, T., Segev, E., & Baram-Tsabari, A. (2016). An automatic jargon identifier for scientists engaging with the public and for science communication educators. Manuscript in Preparation.
Schmitt, N., & Schmitt, D. (2014). A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Language Teaching, 47(4), 484–503.
Sharon, A. J., & Baram-Tsabari, A. (2013). Measuring mumbo jumbo: A preliminary quantification of the use of jargon in science communication. Public Understanding of Science (Bristol, England), 23(5), 528–546.
Example texts:
1. A well-written popular text:
Below is a well-written text processed by the De-Jargonizer, followed by its score. The text is intended for general intermediate to advanced readers. The score shows common words (high frequency, black), normal words (mid-frequency, orange), and rare words (jargon, red).
For each level, the percentage of the text's words (left) and the number of words (right) are presented. This text uses only one unknown word, which readers could recognize as a name, along with 10% normal (mid-frequency) and 90% common (high-frequency) words.
2. A text requiring some adaptation for non-experts:
Author: Meital Ben-Ari
This text uses 3% jargon words, 13% normal (mid-frequency) words, and 84% common (high-frequency) words. In this case, the writer should review the words in red and decide whether to leave them, delete them, or provide an explanation. For example, the writer may leave “withholds”; delete “proximal,” which is not necessary to understand the text; and simplify “functionally recapitulate” to “repeat the functions [of adult heart cells].”