Dictionary Group:

Overview of progress and plans

K. Sinitsa & R. Mizoguchi

Abstract: This is a report on the activities of the Dictionary Group (currently called the Glossary Group), carried out through email discussions (ed-dictionary@listserv.readadp.com) between January 2nd and March 7th, 1997.

Acknowledgement
Thanks for active participation and valuable comments go to

Patrick Burnstad, Vladimir Goodkovsky, Greg Hume,
Tom Murray, Roy Rada and Jim Schoening.

 

Contents

Introduction
I. Overview of Progress

  1. Background
  2. Dictionaries on the Web
  3. Users and Use
  4. Content and Categorization
  5. Style
  6. Methodology
  7. Technical Support

II. Plans

  1. Requirements to the dictionary
  2. Assumptions
  3. Methodology
  4. Suggested plan


Introduction

  This report is based on intensive email discussions held within the Dictionary Group. It is an attempt to analyze the results obtained so far and to incorporate, as systematically as possible, the many useful ideas, comments and suggestions of the group participants. The sections below also contain some guidelines for the next stage of dictionary design.

I. Overview of Progress

1. Background

At the first stage of dictionary design the main task was to identify the common terminology used within P1484 and to search for definitions of the corresponding terms. The main goal was to come up with a small prototype, which could be enlarged gradually.

  It became clear, meanwhile, that the research field covered by the Groups within P1484 corresponds to Computer-Based Learning (CBL) in general. The areas of the Groups' potential terminology within CBL are roughly represented in Fig. 1.

Circles represent conceptual areas; several overlapping circles indicate levels of detail in the description of an area. Groups (in rectangles) are placed approximately around their main concept areas. The Groups' terminology partly overlaps and some specific areas are not covered yet, but the unified terminology area could be called either CBL or "what P1484 is dealing with".

  This implies that potentially all terms within this field might be needed by the Groups sooner or later. On the other hand, despite the large variety of Web-based dictionaries on computers, communications, the Internet and the like, the specific terminology of computer-based education is not covered yet. Some steps have been made in this direction: the forthcoming edition of the IEEE Computer Dictionary includes a collection of technical terms, CBL itself and its modifications (CAI and the like), as well as some end-user-oriented terms.

  Some terms can be found in FOLDOC, as well as in other computer dictionaries, such as the ANSI Vocabulary (X3K5), the Software Engineering Dictionary, etc. (examples can be found in the ONELOOK Reference list). Examples of small collections (about 50 entries) of specific terms in the field are the Video-conferencing Dictionary and the Computer-Based Training Dictionary.

  Therefore, a dictionary on CBL could be valuable not only for P1484 members but also for the wider community related to the "computers in education" field. Orientation towards users "external" to P1484 should not make the dictionary less important for use inside the Group, as the final goal of the Group is to suggest standards that would be accepted by the community.

  Given the growing domain terminology, the lack of a fixed structure, the wide use of vague concepts and the variety of potential users and viewpoints, it is necessary first to reach agreement on the basic rules of dictionary design. For this purpose the following issues should be considered:

2. Dictionaries on the Web

  People look into dictionaries to get information, not for entertainment or enjoyment. Nevertheless, drawing on basic experience in dictionary design, addressing users' needs and meeting their expectations make a dictionary more impressive. Some Web pages describe dictionaries as "large", "popular", "complete" or "good", so it was necessary to find out what makes a dictionary "nice" and "popular". Browsing Web dictionaries revealed some attractive features as well as some that, on the contrary, should be avoided.

  A dictionary should be attractive on both the cognitive and the perceptual level. This is the first rule of motivation. One can easily find examples of dictionaries that are informative but ugly (e.g. NASA), or nice but superficial (CBT).

  The style of specialized dictionaries varies in three dimensions:

The choice of a certain style should be driven by the purpose and aim of the dictionary.

  Visualization and search mechanisms also differ.

  Finally, a dictionary is attractive if it is easy to use and understand, if information can be easily accessed and retrieved, and if users who are not satisfied with a current explanation can get additional hints, guidelines or directions on what to search for.

  A preliminary evaluation of on-line dictionaries shows that the following questions could help evaluate a dictionary:

  So, let us continue the preliminary discussion by looking at users and user groups.

3. Users and Use

  The primary dictionary users include P1484 Working Groups, namely:

  The activity of the Task Ontology (P1484.9) group is tightly related to that of the Dictionary group, but its members should be considered co-designers rather than "users" in the usual sense.

  The interests of the other groups shown in Fig. 1 can be roughly described as follows.

Learning Agreements terminology lies somewhere on the border between legal terminology and educational standards, dealing with requirements, goals and objectives, assessment and resources.

  Task Model considers real-world and abstract tasks and their relation to domain knowledge, resources, requirements, objectives, assessment and the like.

  Session Management considers behavior and interaction, as well as the knowledge needed to control the tutoring process.

  Reference Model/Architecture bases its terminology on general software concepts and has much in common with the terminology used by Authoring Tools, Communications between Tools and Instructional Agents and, to a certain extent, Learner Model.

  Communications between Tools and Instructional Agents might use concepts shared with the Authoring Tools and Reference Model/Architecture groups.

  Authoring Tools covers the core terminology in the area, from learning objectives to domain knowledge and educational requirements. It is tightly connected with Learner Model and Reference Model/Architecture.

  Learner Model deals with a variety of learner characteristics, which are used by all groups with the exception of Communications between Tools and Instructional Agents.

  Thus the "upper level" (not too specific) terminology of Authoring Tools, Reference Model/Architecture and Learner Model forms the basis of the shared vocabulary in the field.

  Traditional CBL user groups, both end-users and producers, can be shown on the same picture of CBL concepts. One can see that Reference Model/Architecture and Communications between Tools and Instructional Agents mirror an engineering viewpoint, whereas Authoring Tools mirrors that of instructionists.

  Which concepts are not represented explicitly, either in the picture or in the group descriptions? Firstly, some specific technology-based areas, such as distance learning, virtual reality, multimedia, computer-supported collaborative learning, etc. Secondly, the AI-based methods and techniques that shifted CBL design from "page-turning" electronic books to intelligent systems. These concept groups should also be represented in the dictionary.

  Another important issue is:

  The diversity of backgrounds and of aims of dictionary use implies that concepts defined within the dictionary should be represented at several levels of detail. To provide a better understanding, it is also necessary to support visualization of the relations between the concepts included.

  Further support for this statement is found in the fact that CBL, as a research field, is still developing and reconstructing its concept base. The dictionary could therefore provide an explication of the field as a whole as well as explaining each term in isolation. Both term definitions and relations between terms are thus important and should be represented explicitly.

  The basic requirements to the dictionary could be formulated as follows:

  If not all the goals can be achieved simultaneously, certain priorities should be set.

4. Content and Categorization

  Dictionary building is traditionally understood as collecting and arranging terms which are already known and recognized by the community. The discussion, if any, mostly concerns the precision of the definitions.

  In rapidly developing fields such as CBL, the number of less-known (and less-accepted) terms, as well as of concepts that are not yet defined, is significant, which often makes terminological discussion necessary.

  The main questions that should be answered about content are:

  Taking into account specifics of the domain and user categories, the following suggestions on dictionary content can be formulated:

  1. All existing terms within CBL should be considered for inclusion in the dictionary and should have higher priority.
  2. Definitions of existing terms whose meaning has changed can include both the new meaning and the initial one.
  3. Newly suggested terms should be related to recognized concepts, and their entries should give synonyms, i.e. other terms used with the same or a similar meaning in the sources.
  4. Ambiguity in the use of a term should be accommodated in its definition, i.e. the definition should contain only the essential features of the concept, to provide a better platform for agreement.
  5. Terms for local P1484 use need not be based on accepted concepts, though relating them to such concepts is encouraged.
  6. Preference for certain terms over obsolete ones can be expressed by corresponding notes in the term entries and by the handling of term references.
  7. Terms defined in the dictionary can belong to various levels of detail; the relations between lower-level and higher-level concepts should be clearly represented.
  8. Terms defined in other, more general-purpose dictionaries can be included in the dictionary, for instance in the following cases:
    • the term definition should be changed to a more specific one,
    • the term is not easy to find in other dictionaries,
    • knowledge of the definition is essential for a correct understanding of a term defined in the dictionary,
    • the term is used to define a class of concepts.
  9. Dictionary terms can belong to different parts of speech: nouns, verbs, adjectives.

  The details of the term collection process are defined by the chosen methodology. The main ones include a possible partitioning of the field into certain "categories" or sub-domains (according to P1484 groups, user tasks and activities, CBL system structure, etc.) and the rules for choosing terms (statistical evaluation, publication search, expert evaluation, etc.).

5. Style

  Among the variety of styles used in on-line dictionaries, vocabularies and thesauri, some common features can be traced in those designed for certain science-related fields. As a starting point for the CBL P1484 dictionary, the style guide of the P610 Computer Dictionary Project was chosen. Using this style guide allows terms to be represented in a unified form, so that effort can be concentrated on term content and dictionary structure. After more careful consideration, the entry style was enriched with some additional fields to describe a term in a structured way, while keeping the main principles:

  The enriched list of eligible fields for a term entry includes:

  The fields listed in bold are present in most structured dictionaries, including IEEE. The rest are included in academic dictionaries only. All the fields except definition are mandatory. If a term is a synonym or abbreviation, the reference (See) to the definition is included.

  The parent term should be included in a definition when possible. If this is not appropriate for some reason, the terms that correspond to higher-level categories of the term under consideration and are included in the dictionary should be explicitly mentioned.

  In the interim dictionary version, the term status should be defined (e.g. internal_use (P1484 or Group), candidate, etc.), as well as the source of the definition when appropriate.
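
  As an illustration only, an entry of the kind described in this section could be sketched as a small record. The Python names below (TermEntry, is_valid, resolve) and the sample data are hypothetical and are not taken from the P610 style guide; only the fields named in the text above (definition, See reference, parent term, status, source) are modeled, and the rule that an entry carries either a definition or a See reference is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TermEntry:
    """Hypothetical entry record; field names are illustrative assumptions."""
    term: str
    definition: Optional[str] = None   # absent for synonyms and abbreviations
    see: Optional[str] = None          # "See" reference to the defining entry
    parent: Optional[str] = None       # higher-level term, if any
    status: str = "candidate"          # e.g. "candidate" or "internal_use"
    source: Optional[str] = None       # where the definition came from

    def is_valid(self) -> bool:
        # Assumed rule: exactly one of definition / See reference is present.
        return (self.definition is None) != (self.see is None)


def resolve(entries: Dict[str, TermEntry], term: str) -> TermEntry:
    """Follow See references until an entry with a definition is reached."""
    seen = set()
    entry = entries[term]
    while entry.see is not None:
        if entry.term in seen:
            raise ValueError(f"circular See reference at {entry.term!r}")
        seen.add(entry.term)
        entry = entries[entry.see]
    return entry


# Sample data invented for illustration; the definition text is a placeholder.
entries = {
    "computer-based learning": TermEntry(
        term="computer-based learning",
        definition="Placeholder definition text.",
        status="candidate",
    ),
    "CBL": TermEntry(term="CBL", see="computer-based learning"),
}
```

  Looking up the abbreviation with resolve(entries, "CBL") returns the entry for "computer-based learning", which is one way the See handling for synonyms and abbreviations might work.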

  Relations between terms will be incorporated into their descriptions, being mentioned within the appropriate fields. In addition, the relations between terms should be explicitly depicted in taxonomies, or so-called term trees, which are term hierarchies built according to a certain relation between the corresponding concepts. The depth of the trees or taxonomies should be chosen so as to provide both a shallower and a more specific view of the field.

6. Methodology

  Each dictionary project has its own methodology of term collection. The choice of methodology is driven by project goals, features of the subject domain, available resources, etc. The many suggestions on term collection, domain partitioning and layering of terms discussed earlier in the sections on scope, content and style reflect a variety of approaches. Each has its own potential benefits and drawbacks, which sometimes cannot be fully evaluated at the outset.

  The main questions that should be answered for dictionary design are:

  1. how terms are collected:
    • how the dictionary scope is defined;
    • whether there is an initial partitioning or grouping of terms;
    • how terms are searched for (statistically, by terminology refinement or generalization, etc.);
  2. how they are considered:
    • in isolation, or in a group of related terms to refine a definition;
  3. when a term can be put into the dictionary:
    • after initial approval of its definition, without a definition, or with alternative definitions;
  4. how close a prototype version should be to the final product.

In any case, it is important to take into account domain ontologies if they exist.

7. Technical Support

  Tools are needed to support the following stages of the product life cycle:

  To choose appropriate tools, some desirable features of the tools should be defined. The features listed below are redundant in the sense that a given characteristic may be represented on several levels. At the same time, additional useful features could be revealed during the preliminary evaluation of tools.

  The following features are desirable for a discussion support tool:

  The desirable features of a dictionary design and version support tool include:

convenience of use within the chosen methodology for updating, adding and deleting entries;

  In addition to a (Web-based) discussion tool, a mailing list should be available.


II. Plans: A rough plan for the next step

1. Requirements to the dictionary

  1. A term dictionary in an academic field should explicate the research field as a whole as well as explain each term in isolation.
    -> Term trees/meta-dictionary
  2. Relations among terms, and an indication of where each term is located in the research field, are as important as term definitions. -> Rich description format
  3. Easy to use. -> (off-line) Indexes richer than the entry terms alone, term tree representation, pointers to the term trees from each entry word; (on-line) HTML, support tools for retrieval, etc.
  4. Easy for people at various levels to understand. -> Definitions in multiple contexts for developers, learners, teachers, etc.
  5. Easy to extend. -> The dictionary is basically very modular, so extension can be additive. Term trees make it easy to extend the dictionary because they provide us with contexts.
  6. Coping with a large and diverse vocabulary. -> Systematic methodology of development; support tools.

2. Assumptions

  1. Types of users affect the context in which each term is defined, but the results are additive; that is, we can easily add another definition in a different context to the existing ones.
  2. Agreement on the term description format has been obtained.
  3. The final output is both off-line and on-line.
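
  The additivity assumed in item 1 can be pictured with a minimal sketch. The storage layout (term -> user context -> definition), the function name add_definition and the sample wording are all assumptions made for illustration, not an agreed design.

```python
from collections import defaultdict
from typing import Dict

# term -> user context -> definition text (this layout is an assumption)
definitions: Dict[str, Dict[str, str]] = defaultdict(dict)


def add_definition(term: str, context: str, text: str) -> None:
    """Add a definition for one user context without touching the others."""
    if context in definitions[term]:
        raise KeyError(f"{term!r} already has a definition for {context!r}")
    definitions[term][context] = text


# Placeholder wording, invented for illustration only.
add_definition("learner model", "developer", "Placeholder text for developers.")
add_definition("learner model", "teacher", "Placeholder text for teachers.")
```

  Adding the teacher-oriented definition leaves the developer-oriented one unchanged, which is the sense in which the results are additive.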

3. Methodology

3.1 Basic philosophies

  1. 80% agreement policy. -> Don't pursue complete agreement; minimum essential/least-commitment descriptions; leave multiple definitions.
  2. Opportunistic approach. -> Don't stick to one approach; say, a mixed top-down/bottom-up approach followed by concrete prototype building.
  3. Employ email discussion tools (HyperNews, Domino, or Hypermail) and development support tools (RM is currently developing one).
  4. Keep the partnership with the Ontology group.

3.2 A suggested plan

Scope: Terms in CBL: those that all the study groups need.

Tentative scope: Those needed by some of the groups, such as Architecture, Authoring, and Learner Model.

[1] Term collection [Nagao, 1992]

  Terms are first collected according to the following procedure. The key idea here is to build term trees in order to obtain the contexts of term definitions and to make the global structure of the target area explicit.

(A) Mixed approach

(1) Top-down approach

1.1 Construct term trees by consulting sources chosen to keep a balance between old and new tendencies in CBL, as well as existing ontologies. (The term tree construction method is described below.)

1.2 Enumerate terms under the guidance of the term trees obtained.

(2) Bottom-up approach

2.1 Candidate term solicitation from each study group

2.2 Enumeration of possible terms

2.3 Construction of abstraction hierarchies

(B) Adjustment  

(3) Adjust term trees and find the missing terms by investigating term trees.  

(C) Entry term determination

(4) Determine the entry terms. Pick terms from each term tree, from top to bottom, until a specified number of terms, say 500, is obtained. When we get thousands of terms, we should take care to keep the trees balanced.

(5) Terms not picked as entry terms from the term trees can be kept as candidate entry terms, or as index terms if they are used in the definitions of the corresponding entry terms (their parents or siblings), from which readers can learn what they mean.
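
  Steps (4) and (5) can be read as a level-by-level walk over all term trees. The sketch below is one possible reading, not the procedure of [Nagao, 1992]: the tree layout (a mapping from a term to its children), the function name select_entry_terms and the sample terms are assumptions, and the trees are assumed to be acyclic.

```python
from collections import deque
from typing import Dict, List, Tuple

# A term tree as a mapping from a term to its child terms (assumed layout).
Tree = Dict[str, List[str]]


def select_entry_terms(
    trees: List[Tuple[str, Tree]], limit: int
) -> Tuple[List[str], List[str]]:
    """Walk all trees level by level; return (entry terms, leftover candidates)."""
    entry_terms: List[str] = []
    candidates: List[str] = []
    seen = set()
    # Start from every root so that all trees are consumed at the same pace,
    # which keeps the selection balanced across trees.
    queue = deque((root, tree) for root, tree in trees)
    while queue:
        term, tree = queue.popleft()
        if term not in seen:  # a term may appear in overlapping trees
            seen.add(term)
            if len(entry_terms) < limit:
                entry_terms.append(term)
            else:
                candidates.append(term)  # kept as candidate or index term
        queue.extend((child, tree) for child in tree.get(term, []))
    return entry_terms, candidates


# Sample trees invented for illustration.
trees = [
    ("learner model", {"learner model": ["learner characteristics", "learning history"]}),
    ("authoring tools", {
        "authoring tools": ["learning objectives", "domain knowledge"],
        "domain knowledge": ["task"],
    }),
]
entry_terms, candidates = select_entry_terms(trees, limit=4)
```

  With a limit of 4, the two roots and the two children of the first tree become entry terms, and the remaining three terms are kept as candidates.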

[2] Term tree building [Nagao, 1992]

  1. Relations among terms form a network, not trees. For humans, however, a tree representation makes the global structure easier to understand.
  2. Don't expect a complete term tree representation. Much compromise is necessary.
  3. The major relations are is-a and part-whole, but there are others.
  4. When it isn't easy to find appropriate attributes (slot names), attribute names can be omitted. It may be ugly, but there is no neat way of expressing such items, and showing inappropriate attributes is worse.
  5. One important criterion is not to build deep trees.
  6. Build two or three layers of trees: an upper tree and lower trees.
  7. Accept overlaps among term trees to allow different views of term definitions.
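
  The guidelines above can be illustrated with a small sketch in which relations are stored as a network of typed links and shallow tree views are cut out of it, one relation type at a time. The representation, the function name build_tree, the relation labels "is-a" and "part-of", and the sample relations are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Each relation is (child, relation type, parent); together they form a network.
Relation = Tuple[str, str, str]


def build_tree(relations: Set[Relation], root: str, kind: str, max_depth: int) -> Dict:
    """Return a nested dict: the tree view of one relation type under `root`."""
    children: Dict[str, List[str]] = {}
    for child, rel, parent in sorted(relations):
        if rel == kind:
            children.setdefault(parent, []).append(child)

    def expand(term: str, depth: int, path: Tuple[str, ...]) -> Dict:
        if depth == max_depth:  # keep the trees shallow
            return {}
        return {
            c: expand(c, depth + 1, path + (c,))
            for c in children.get(term, [])
            if c not in path  # guard against cycles in the network
        }

    return {root: expand(root, 0, (root,))}


# Sample relations invented for illustration.
relations = {
    ("CAI", "is-a", "CBL"),
    ("ITS", "is-a", "CBL"),
    ("learner model", "part-of", "ITS"),
    ("domain knowledge", "part-of", "ITS"),
    ("domain knowledge", "part-of", "authoring tool"),
}
is_a_tree = build_tree(relations, "CBL", "is-a", max_depth=2)
its_parts = build_tree(relations, "ITS", "part-of", max_depth=1)
tool_parts = build_tree(relations, "authoring tool", "part-of", max_depth=1)
```

  In this sample, "domain knowledge" appears in both part-of trees, which is the kind of overlap item 7 accepts.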

[3] Term definition

  1. Term definition is mainly done in the agreed format after completion of the term trees. However, some terms could be defined in parallel with term collection to obtain initial definitions, since defining terms may lead us to notice unforeseen factors and structures.
  2. Pointers to term trees should be included.

An example dictionary

  A dictionary of terms in computer science containing 4,500 entry terms, 13,000 index terms and tens of term trees; 1,200 pages in total (term descriptions: 800 pages, term trees: 100 pages, index: 300 pages).

An example of term tree:

(Omitted) 


Reference

[Nagao, 1992] Nagao, M., Systematic organization of knowledge of a specific area in the form of dictionary, J. of Jap. Society of AI, Vol. 7, No. 2, pp. 320-328, 1992 (in Japanese).