linguist.link
Analysis for "Announcing linguist.link: NLP insights for web pages"
Time to read: ~3.51 minutes
Reading level: Professional
Most Surprising Words
- paywall
- hooray
- url
- standalone
- syllables
- retrieves
- ie
- gradient
- wrapper
- turing
Most Common Words
- the (used 37 times)
- a (used 35 times)
- to (used 33 times)
- (used 18 times)
- of (used 17 times)
- on (used 14 times)
- for (used 14 times)
- i (used 13 times)
- this (used 11 times)
- web (used 10 times)
Most Common Bigrams
- a web (used 7 times)
- on a (used 7 times)
- web page. (used 4 times)
Most Common Trigrams
- a web page. (used 3 times)
- on a web (used 3 times)
- corpus of text. (used 2 times)
Most Common Quadgrams
- a large corpus of (used 2 times)
- of new york times (used 2 times)
- new york times articles. (used 2 times)
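The bigram, trigram, and quadgram counts above can be approximated with a sliding window over the page's tokens. The following is a minimal sketch, not linguist.link's actual code: the function name `top_ngrams` is hypothetical, and it assumes lowercasing plus naive whitespace tokenization (which would be consistent with entries such as "web page." keeping their trailing period).

```python
from collections import Counter


def top_ngrams(text, n, k=3):
    """Return the k most common n-grams in text as (ngram, count) pairs."""
    # Assumption: lowercase and split on whitespace, so punctuation stays
    # attached to the word it follows.
    tokens = text.lower().split()
    # Slide a window of n tokens across the text and join each window.
    grams = (" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return Counter(grams).most_common(k)


if __name__ == "__main__":
    sample = "On a web page. Insights on a web page."
    for size, label in ((2, "bigrams"), (3, "trigrams"), (4, "quadgrams")):
        print(label, top_ngrams(sample, size))
```

A text shorter than n tokens simply yields an empty list, since the window range is empty.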
Most Surprising Sentences
- Average reading time (calculated as the number of words in a post divided by 200); Reading score (calculated using the Flesch-Kincaid readability score); Most common words on the page; Most common bigrams (sequences of two words), trigrams (three words), and quadgrams (four words), and; All words on a page with a gradient background depending on how surprising they are.
- When a user requests analytics for a URL, linguist.link retrieves the web page, cleans the page to retrieve relevant words for analysis (a process referred to as stopword removal in NLP), then calculates the aforementioned statistics.
- The Python wrapper for readability.js, maintained by the Alan Turing Institute, meant I could use readability.js in a Python application, my chosen tech stack due to the robust NLP toolset available for the language.
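Two of the steps described in these sentences, the reading-time estimate (word count divided by 200) and stopword removal, can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool's implementation: the function names, the punctuation stripping, and the tiny `STOPWORDS` set are all hypothetical stand-ins.

```python
# Illustrative subset only; a real stopword list would be much longer.
STOPWORDS = {"the", "a", "to", "of", "on", "for", "i", "this"}


def reading_time_minutes(text, words_per_minute=200):
    """Estimate reading time as word count divided by words per minute."""
    return round(len(text.split()) / words_per_minute, 2)


def remove_stopwords(text):
    """Lowercase each word, strip surrounding punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


if __name__ == "__main__":
    print(reading_time_minutes("word " * 702))
    print(remove_stopwords("The analysis of a web page."))
```

Under this formula, a reading time of about 3.51 minutes would correspond to roughly 702 words.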
Named Entities
- Web Mex Natural Language Processing NLP (Misc.)
- New York Times (Organization)
- NLP (Misc.)
- Flesch-Kincaid (Misc.)
- U (Organization)
Surprisal
Citation
Use the following citation to reference this page in academic works.
linguist.link. “Analysis for https://jamesg.blog/2023/06/03/linguist-link/” February 15, 2024. https://linguist.link/?url=https://jamesg.blog/2023/06/03/linguist-link/.