Reading is one of the basic skills in learning, and it is often one of the main skills required in distance learning courses. For this reason, one of the most important characteristics of distance education is the construction and design of the learning message offered to learners during a distance learning course (Agrusti & Vertecchi, 2007). The present article focuses on the Italian core dictionary: it examines some known algorithms and approaches used by Tullio De Mauro in creating that dictionary and adapts them to a larger, yet less reliable, context: the World Wide Web. This synopsis briefly summarizes the approaches adopted to identify the data of interest, collect and filter them, and finally assess the viability and reliability of the newly created dictionary (the Web-based core dictionary). The aim is to keep up with the effects that rapid advances in technology, globalization and connectivity have, not only on our lifestyle, but also on our spoken and written language on a daily basis (Downes, 2008). The research shows how small the share of usable data is relative to the bulk of Big Data collected from the Internet, the need for care in adopting new words, the need for new approaches to creating a web-based core dictionary, and the need to consider new elements to refine the end product.
Keywords: Distance education, web-based basic vocabulary, core dictionaries, web crawling, Big Data, Internet.