Abstract: This report summarizes the activities of the Dictionary Group (currently called the Glossary Group), carried out through email discussions (ed-dictionary@listserv.readadp.com) between January 2 and March 7, 1997.
Patrick Burnstad, Vladimir Goodkovsky, Greg Hume,
Tom Murray, Roy Rada and Jim Schoening.
Contents
Introduction
I. Overview of Progress
- Background
- Dictionaries on the Web
- Users and Use
- Content and Categorization
- Style
- Methodology
- Technical Support
II. Plans
- Requirements to the dictionary:
- Assumptions:
- Methodology:
- Suggested plan:
This report is based on intensive email discussions held within the Dictionary Group. It is an attempt to analyze the results obtained so far and to incorporate, as systematically as possible, the many useful ideas, comments, and suggestions of the group's participants. The sections below also contain some guidelines for the next stage of dictionary design.
At the first stage of dictionary design, the main task was to identify the common terminology used within P1484 and to search for definitions of the corresponding terms. The main goal was to come up with a small prototype that could be enlarged gradually.
Meanwhile, it became clear that the research field covered by the Groups within P1484 corresponds to Computer-Based Learning (CBL) in general. The areas of the Groups' potential terminology within CBL are roughly represented in Fig. 1.
Circles represent conceptual areas; several overlapping circles indicate levels of detail in the description of an area. Groups (in rectangles) are placed approximately around their main concept areas. The group terminologies partly overlap and some specific areas are not covered yet, but the unified terminology area could be called either CBL or "what P1484 is dealing with".
This implies that potentially all terms within this field might be needed by the groups sooner or later. On the other hand, despite the large variety of Web-based dictionaries on computers, communications, the Internet, and the like, the specific terminology of computer-based education is not covered yet. Some steps have been made in this direction: the forthcoming edition of the IEEE Computer Dictionary includes a collection of technical terms, CBL itself and its variants (CAI and the like), as well as some end-user-oriented terms.
Some terms can be found in FOLDOC, as well as in other computer dictionaries such as the ANSI Vocabulary (X3K5), the Software Engineering Dictionary, etc. (examples can be found in the ONELOOK Reference list). Examples of small collections (about 50 entries) of specific terms in the field are the Video-conferencing Dictionary and the Computer-Based Training Dictionary.
Therefore, a dictionary on CBL could be valuable not only to P1484 members but also to the wider community in the "computers in education" field. Orientation toward users "external" to P1484 should not make the dictionary less important for use inside the Group, since the final goal of the Group is to suggest standards that will be accepted by the community.
Given the growing domain terminology, the lack of a fixed structure, the wide use of vague concepts, and the variety of potential users and viewpoints, it is necessary first to reach agreement on the basic rules of dictionary design. For this purpose the following issues should be considered:
People consult dictionaries to get information, not for entertainment or enjoyment. Nevertheless, drawing on basic experience in dictionary design, addressing users' needs, and meeting their expectations all make a dictionary more impressive. Some Web pages describe dictionaries as "large", "popular", "complete", or "good", so it was necessary to find out what makes a dictionary "nice" and "popular". Browsing Web dictionaries revealed some attractive features as well as some that should be avoided.
A dictionary should be attractive on both the cognitive and the perceptual level. This is the first rule of motivation. One can easily find examples of dictionaries that are informative but ugly (e.g. NASA) or nice but superficial (CBT).
The style of specialized dictionaries varies in three dimensions:
The choice of a particular style should be driven by the purpose and aim of the dictionary.
Visualization and search mechanisms also differ.
Finally, a dictionary is attractive if it is easy to use and understand, if information can be easily accessed and retrieved, and if users who are not satisfied with a given explanation can get additional hints, guidelines, or directions on what else to search for.
A preliminary evaluation of on-line dictionaries shows that the following questions can help evaluate a dictionary:
So, let us continue the preliminary discussion by looking at users and user groups.
The primary dictionary users include P1484 Working Groups, namely:
The activity of the Task Ontology (P1484.9) group is closely related to that of the Dictionary group, but its members cannot be considered "users" in the usual sense; they are rather co-designers.
The interests of the other groups shown in Fig. 1 can be roughly described as follows.
Learning Agreements terminology lies on the border between legal terminology and educational standards, dealing with requirements, goals and objectives, assessment, and resources.
Task Model considers real-world and abstract tasks and their relation to domain knowledge, resources, requirements, objectives, assessment, and the like. Session Management considers behavior and interaction, as well as the knowledge needed to control the tutoring process.
Reference Model/Architecture bases its terminology on general software concepts and has much in common with the terminology used by Authoring Tools, Communications between Tools and Instructional Agents and, to a certain extent, Learner Model.
Communications between Tools and Instructional Agents might use concepts shared with the Authoring Tools and Reference Model/Architecture groups. Authoring Tools covers the core terminology in the area, from learning objectives to domain knowledge and educational requirements. It is closely connected with Learner Model and Reference Model/Architecture.
Learner Model deals with a variety of learner characteristics, which are used by all groups with the exception of Communications between Tools and Instructional Agents.
Thus, the "upper level" (not too specific terminology) of Authoring Tools, Reference Model/Architecture and Learner Model forms the basis of the shared vocabulary in the field.
Traditional CBL user groups, both end-users and producers, can be shown on the same picture of CBL concepts. One can see that Reference Model/Architecture and Communications between Tools and Instructional Agents mirror an engineering viewpoint, whereas Authoring Tools mirrors that of instructionists.
Which concepts are represented explicitly neither in the picture nor in the group descriptions? First, some specific technology-based areas, such as distance learning, virtual reality, multimedia, computer-supported collaborative learning, etc. Second, AI-based methods and techniques, which shifted CBL design from "page-turning" electronic books to intelligent systems. These concept groups should also be represented in the dictionary.
Another important issue is:
The diversity of backgrounds and of aims in using the dictionary implies that the concepts it defines should be represented at several levels of detail. To provide better understanding, it is therefore necessary to support additional visualization of the relations between the included concepts.
Further support for this statement is that CBL, as a research field, is still developing and restructuring its concept base. The dictionary could therefore provide an explication of the field as a whole as well as an explanation of each term in isolation. Both term definitions and relations between terms are thus important and should be represented explicitly.
The basic requirements for the dictionary can be formulated as follows:
If not all goals can be achieved simultaneously, priorities should be set.
Dictionary building is traditionally understood as collecting and arranging terms that are already known and recognized by the community. The discussion, if any, mostly concentrates on the precision of the definitions.
In rapidly developing fields such as CBL, the number of less-known (and less-accepted) terms, as well as of concepts that are not yet defined, is significant, which often makes terminological discussion necessary.
The main questions that should be answered about content are:
Taking into account the specifics of the domain and the user categories, the following suggestions on dictionary content can be formulated:
The details of the term collection process are defined by the chosen methodology. The main ones include a possible partitioning of the field into "categories" or sub-domains (according to P1484 groups, user tasks and activities, CBL system structure, etc.) and the rules for choosing terms (statistical evaluation, publication search, expert evaluation, etc.).
Among the variety of styles used in on-line dictionaries, vocabularies, and thesauri, some common features can be traced in those designed for science-related fields. As a starting point for the CBL P1484 dictionary, the style guide of the P610 Computer Dictionary Project was chosen. Using this style guide makes it possible to represent terms in a unified form and to concentrate on term content and dictionary structure. After more careful consideration, the dictionary entry style was enriched with some additional fields to describe a term in a structured way, keeping the main principles:
The enriched list of eligible fields for a term entry includes:
- abbreviation
- definition
- synonyms
- examples
- illustrations
- structure and/or attributes
- use or function
- references to related terms (See, See also, Contrast with)
- origin or genesis notes (incorporating essential additional information which does not match any of the suggested fields).
The fields listed in bold are present in most structured dictionaries, including IEEE. The rest are included in academic dictionaries only. All fields except the definition are optional. If a term is a synonym or abbreviation, a reference (See) to the definition is included instead.
The parent term should be included in the definition when possible. If this is not appropriate for some reason, the terms that correspond to higher-level categories of the term under consideration and are included in the dictionary should be mentioned explicitly.
In the interim version of the dictionary, the status of each term should be defined (e.g. internal_use (P1484, or Group), candidate, etc.), as well as the source of the definition when appropriate.
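As a rough illustration of the entry structure described above, the eligible fields could be held in a simple record. This is only a sketch: the class name `TermEntry`, the attribute names, the default status value, and the `is_valid` rule are assumptions made for the example, and the definition text is a placeholder rather than an agreed definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermEntry:
    """One dictionary entry; attribute names mirror the list of eligible fields."""
    term: str
    definition: Optional[str] = None    # None when the entry only points elsewhere
    abbreviation: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    illustrations: List[str] = field(default_factory=list)
    structure: Optional[str] = None     # structure and/or attributes
    use: Optional[str] = None           # use or function
    see: Optional[str] = None           # reference to the defining entry
    see_also: List[str] = field(default_factory=list)
    contrast_with: List[str] = field(default_factory=list)
    notes: Optional[str] = None         # origin or genesis notes
    status: str = "candidate"           # interim status (assumed default)
    source: Optional[str] = None        # source of the definition, if any

    def is_valid(self) -> bool:
        """Assumed rule: an entry has its own definition or a See reference."""
        return bool(self.definition) or bool(self.see)


# A defining entry and an abbreviation entry that points to it.
cbl = TermEntry(
    term="computer-based learning",
    abbreviation="CBL",
    definition="(placeholder definition text)",
    status="internal_use",
)
alias = TermEntry(term="CBL", see="computer-based learning")
```

Keeping `see` separate from `definition` lets synonym and abbreviation entries stay short while still being checked for completeness.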
Relations between terms will be incorporated into the term descriptions by being mentioned within the appropriate fields. In addition, the relations between terms should be explicitly depicted in taxonomies, or so-called term trees, which are term hierarchies built according to a certain relation between the corresponding concepts. The depth of the trees or taxonomies should be chosen so as to provide both shallower and more specific views of the field.
Each dictionary project has its own methodology for term collection. The choice of methodology is driven by project goals, features of the subject domain, available resources, etc. The many suggestions on term collection, domain partitioning, and layering of terms discussed earlier in the sections on scope, content, and style reflect a variety of approaches. Each has its own potential promise and drawbacks, which sometimes cannot be fully evaluated at the outset.
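A term tree with an adjustable viewing depth might be sketched as follows. The names `TermNode` and `view` are hypothetical, and the small hierarchy is purely illustrative; it is not a taxonomy agreed by the group.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TermNode:
    """A node of a term tree: a term and its more specific child terms."""
    term: str
    children: List["TermNode"] = field(default_factory=list)

    def add(self, term: str) -> "TermNode":
        """Attach a more specific term below this one and return it."""
        child = TermNode(term)
        self.children.append(child)
        return child


def view(node: TermNode, max_depth: int, depth: int = 0) -> List[str]:
    """Return indented lines for the tree, cut off below max_depth."""
    lines = ["  " * depth + node.term]
    if depth < max_depth:
        for child in node.children:
            lines.extend(view(child, max_depth, depth + 1))
    return lines


# Illustrative hierarchy only.
root = TermNode("CBL")
root.add("Authoring Tools")
learner = root.add("Learner Model")
learner.add("learner characteristics")

print("\n".join(view(root, max_depth=1)))  # shallow view
print("\n".join(view(root, max_depth=2)))  # more specific view
```

Choosing `max_depth` per reader is one simple way to offer both a shallow overview and a more detailed view from the same tree.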
The main questions that should be answered for dictionary design are:
In any case, it is important to take into account domain ontologies if they exist.
Tools are needed to support the following stages of the product life cycle:
To choose appropriate tools, their desirable features should be defined. The features listed below are redundant in the sense that a given characteristic may be represented at several levels. At the same time, additional useful features may be revealed during the preliminary evaluation of the tools.
The following features are desirable for the discussion support tool:
The desirable features for the dictionary design and version support tool include:
convenience of use within the chosen methodology for updating, adding, and deleting entries;
In addition to the (Web-based) discussion tool, a mailing list should be available.
Scope: Terms in CBL that all the study groups need.
Tentative scope: Terms that some of the groups, such as Architecture, Authoring, and Learner Model, need.
[1] Term collection [Nagao, 1992]
Terms are collected first according to the following procedures. The key idea here is to build term trees to obtain the contexts of term definition and to make a global structure of the target area explicit.
(A) Mixed approach
(1) Top-down approach
1.1 Construct term trees by consulting sources that keep a balance between old and new tendencies in CBL, as well as existing ontologies. (The term tree construction method is described below.)
1.2 Enumerate terms under the guidance of the term trees obtained.
(2) Bottom-up approach
2.1 Candidate term solicitation from each study group
2.2 Enumeration of possible terms
2.3 Construction of abstraction hierarchies
(B) Adjustment
(3) Adjust term trees and find the missing terms by investigating term trees.
(C) Entry term determination
(4) Determine the entry terms. Pick terms from each term tree from top to bottom until a specified number of terms, say 500, is obtained. When there are thousands of terms, care should be taken to keep the trees balanced.
(5) Terms not picked as entry terms from the term trees can be kept as candidate entry terms, or as index terms if they are used in the definitions of the corresponding entry terms (their parents or siblings), from which readers can learn what they mean.
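Steps (4) and (5) can be sketched as a level-by-level walk over all term trees, taking one term from each tree in turn so that no single tree dominates the selection. This is one possible reading of "top to bottom" and "balanced", not the procedure from [Nagao, 1992]; the function name `select_entry_terms`, the tuple-based tree format, and the sample terms are assumptions made for the example.

```python
from collections import deque
from typing import Deque, List, Tuple

Tree = Tuple[str, list]  # (term, list of child trees)


def select_entry_terms(trees: List[Tree], limit: int) -> Tuple[List[str], List[str]]:
    """Walk all trees breadth-first, taking one term per tree in turn.

    Returns (entry_terms, candidates): the first `limit` terms reached,
    and the remaining terms, kept as candidate or index terms.
    """
    queues: List[Deque[Tree]] = [deque([tree]) for tree in trees]
    entries: List[str] = []
    candidates: List[str] = []
    while any(queues):  # a deque is falsy once it is empty
        for queue in queues:
            if not queue:
                continue
            term, children = queue.popleft()
            queue.extend(children)  # deeper terms wait behind shallower ones
            if len(entries) < limit:
                entries.append(term)
            else:
                candidates.append(term)
    return entries, candidates


# Two small hypothetical term trees.
trees = [
    ("A", [("A1", [("A1a", [])]), ("A2", [])]),
    ("B", [("B1", [])]),
]
entries, candidates = select_entry_terms(trees, limit=4)
print(entries)     # ['A', 'B', 'A1', 'B1']
print(candidates)  # ['A2', 'A1a']
```

Raising `limit` toward the total number of terms simply moves terms from the candidate list into the entry list in the same order.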
[2] Term tree building [Nagao, 1992]
[3] Term definition
An example dictionary
A dictionary of terms in computer science containing 4,500 entry terms, 13,000 index terms, and tens of term trees; 1,200 pages in total (term descriptions: 800 pages, term trees: 100 pages, index: 300 pages).
An example of term tree:
(Omitted)
[Nagao, 1992] Nagao, M., Systematic organization of knowledge of a specific area in the form of dictionary, J. of Jap. Society of AI, Vol.7, No.2, pp.320-328, 1992. (in Japanese).