LNCS_CSCWD06_KC_Recommendation


Olympus: Personal Knowledge Recommendation using Agents, Ontologies and Web Mining

Juliana Lucas de Rezende¹, Vinícios Batista Pereira¹, Geraldo Xexéo¹,², Jano Moreira de Souza¹,²

¹COPPE/UFRJ – Graduate School of Computer Science
²DCC/IM – Institute of Mathematics

Federal University of Rio de Janeiro, PO Box 68.513, ZIP Code 21.945-970, Cidade Universitária - Ilha do Fundão, Rio de Janeiro, RJ, Brazil

{juliana, vinicios, xexeo, jano}@cos.ufrj.br

Abstract. There are many initiatives in the scientific community to produce knowledge management and CSCW systems. However, it remains difficult to promote easy information sharing among learners. In this paper we present Olympus, a multi-agent system that helps learners share not only what the information content is, where the information is, and who has the information the learner needs, but also how to use the available knowledge. Olympus uses agent technology, ontologies and data mining to create knowledge chains semi-automatically, a job that would usually take a lot of effort. One agent monitors the learner's web navigation; another agent then classifies the content of the visited pages using an ontology, and creates and recommends a knowledge chain to the learner. As a by-product of this work we obtain a knowledge base of classified web page contents.

1 Introduction

The Internet has become an important means of making information available to people who need to acquire new knowledge faster and in much greater volume than in the past. Some communities of practice complement teaching in the traditional classroom, support the acquisition of evolving knowledge [1], and improve the learner's performance [2]; these are called learning communities. One of Wenger's principles for cultivating communities of practice is knowledge sharing to improve personal knowledge. Another factor in building a successful community is helping its members build up their personal knowledge [3].

To complement the learning process, we considered a process to promote knowledge building, dissemination, and exchange in learning communities. Because this process requires a number of individuals to work together, it raises problems that belong to the CSCW domain [4].

Knowledge design [5] is defined as the science of selecting, organizing and presenting knowledge from a huge knowledge space in a proper way, so that it can be sensed, digested and utilized by human beings efficiently and effectively. It aims to offer the right knowledge to the right person in the right manner at the right time. According to Xexeo [6], design activity has been described as belonging to a class of problems that have no optimal solution, only satisfactory ones. Such problems are complex, usually interdisciplinary in nature, and require a group of people to solve them. Designing knowledge is similar in principle to designing computer software: it takes time, careful thought and creativity to do it well. The biggest difference is that knowledge cannot simply be loaded into someone's brain the way software is loaded into a computer; an implementation procedure is needed to build the knowledge in the learner's mind [5].

1.1 Motivation

To complement the learning process, a system has been developed to promote knowledge building, dissemination, and exchange in learning communities. This system, called the Knowledge Chains Editor (KCE), is based on a process for building personal knowledge through the exchange of knowledge chains (KCs) [1], and it is implemented over COPPEER¹. The distinguishing feature of this process is that it adds “how to use” the available knowledge to the commonly used “authors” (who), “localization” (where), and “content” (what).

The KC is a structure created to organize knowledge. A KC is made up of a header (which contains basic information related to the chain) and a list of knowledge units (KUs). One way to organize knowledge is by precedence: Fig 1.a presents an example where Class is a prerequisite of Inheritance, and Overriding is a successor of Inheritance. The other way to organize knowledge is by composition. When a KU is composed of other KUs, it can be represented as in Fig 1.b; in this example, Class is composed of Attribute and Method.
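A minimal Python sketch of the KC structure just described may help. The class names `KnowledgeUnit` and `KnowledgeChain` and their fields are hypothetical, not KCE's actual data model; a KU is reduced to a name plus its precedence links (Fig 1.a) and composition links (Fig 1.b).

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    """Hypothetical minimal KU: a name plus its two kinds of links."""
    name: str
    prerequisites: list[str] = field(default_factory=list)  # organization by precedence (Fig 1.a)
    components: list[str] = field(default_factory=list)     # organization by composition (Fig 1.b)


@dataclass
class KnowledgeChain:
    """Hypothetical KC: a header with basic information and a list of KUs."""
    header: dict[str, str]
    units: list[KnowledgeUnit] = field(default_factory=list)

    def get(self, name: str) -> KnowledgeUnit:
        # Raises StopIteration if no KU has this name.
        return next(u for u in self.units if u.name == name)

    def successors(self, name: str) -> list[str]:
        """Names of the KUs that list `name` as a prerequisite."""
        return [u.name for u in self.units if name in u.prerequisites]


kc = KnowledgeChain(
    header={"title": "OO basics", "author": "learner"},
    units=[
        KnowledgeUnit("Class", components=["Attribute", "Method"]),
        KnowledgeUnit("Inheritance", prerequisites=["Class"]),
        KnowledgeUnit("Overriding", prerequisites=["Inheritance"]),
    ],
)
print(kc.successors("Class"))      # ['Inheritance']
print(kc.get("Class").components)  # ['Attribute', 'Method']
```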

Fig. 1. Knowledge Organization

Conceptually, knowledge can be decomposed into smaller units of knowledge (recursive decomposition). For the sake of simplicity, we assume that there is a basic unit, which can be represented as a KU (a structure formed by a set of attributes).

To build a KC, the learner can use the KCE. To ask a question, the learner creates a KU whose state is “question”. At this moment, the system starts a search: it sends messages to the other peers and waits for their answers. Each peer performs an internal search, which consists of checking whether it holds any KUs similar to the one being searched for. All KUs found are returned to the requesting peer [Fig 2].
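The search step can be sketched in Python as follows, assuming that each KU is described by a set of keywords and that "similar" means a Jaccard overlap at or above a threshold. Both the similarity measure and the 0.3 threshold are hypothetical choices, since the text does not specify how similarity is computed; message passing between peers is simulated with plain dictionaries.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two keyword sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def internal_search(peer_kus: dict[str, set[str]], question: set[str],
                    threshold: float = 0.3) -> list[str]:
    """One peer's local search: names of its KUs similar to the question KU."""
    return [name for name, kws in peer_kus.items()
            if jaccard(kws, question) >= threshold]


def search_community(peers: dict[str, dict[str, set[str]]],
                     question: set[str]) -> dict[str, list[str]]:
    """Requesting side: 'send' the question to every peer and gather the answers."""
    answers = {}
    for peer_id, kus in peers.items():
        hits = internal_search(kus, question)
        if hits:
            answers[peer_id] = hits
    return answers


peers = {
    "peerA": {"Inheritance": {"inheritance", "extends", "subclass"}},
    "peerB": {"Threads": {"thread", "runnable"},
              "Subclassing": {"subclass", "extends", "super"}},
}
question = {"inheritance", "subclass", "extends"}
print(search_community(peers, question))
# {'peerA': ['Inheritance'], 'peerB': ['Subclassing']}
```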

1 COPPEER [6] is a framework for creating very flexible collaborative peer-to-peer (P2P) applications. It provides non-specific collaboration tools as plug-ins.


Fig. 2. KCE Architecture

The creation of a KU of type ‘question’ is obviously motivated by the learner’s need to obtain that knowledge. So far, we have considered the existence of two motivating factors for the creation of available KCs. The first would be a matter of recognition by the communities, since each KU created has a registered author. The second would be the case where the professor makes them available “as a job”, with the intention of guiding his students’ studies.

However, we are aware that the learner needs more motivation to create new KCs. To address this problem, in this work we present Olympus, a system intended to foster the creation of new KCs. Olympus's main goal is to recommend potential KCs that the learner can accept, modify or even discard. These KCs are created from the data collected by monitoring the learner's navigation, a task carried out by a software agent². Olympus has been developed based on the proposal presented in [8].

The remainder of this paper is organized as follows. The main concepts of ontologies and web mining are presented in Sections 2 and 3, respectively. Section 4 presents the proposed idea and the prototype developed. Conclusions are given in Section 5.

2 Collaborative Learning Ontologies

An ontology is a formal specification of concepts and their relationships. By defining a common vocabulary, ontologies reduce concept definition mistakes, allowing shared understanding, improved communication, and a more detailed description of resources [9].

According to Guarino [10], ontologies can be categorized into four types: top-level, domain, task and application. Top-level ontologies describe very general concepts like space, time, object, etc., which are independent of a particular problem or domain. Domain ontologies and task ontologies describe, respectively, the vocabulary related to a generic domain (like medicine or automobiles) or a generic task or activity (like diagnosing or selling), by specializing the terms introduced in the top-level ontology. Application ontologies describe concepts that depend on both a particular domain and a particular task, and are often specializations of the related ontologies.

2 A Software Agent [7] can be defined as a complex object with attitude.


A generic ontology can easily be made more specific as needed. However, transforming a specific ontology into a more generic one can be a difficult task. In this work we first created a domain ontology and, from it, a more specific ontology better suited to our needs.

The prototype developed has been instantiated for the Java learning community, and the first ontology created was a domain ontology describing object-oriented (OO) language concepts. Specific properties were then added to this ontology to incorporate thesaurus functionality. In this way, the software agent can look up in the ontology the words found in a text and correlate web pages with ontology concepts; these additions specialize the domain ontology.

All classes that represent concepts of an OO language inherit from a superclass called “Concept”. In our case, this superclass contains a property named keyword, which is used in page classification; if new classification-related properties are needed, it is enough to add them to the Concept class. To transform the extended ontology back into a domain ontology, it is enough to remove the Concept class.

The OO language ontology was instantiated for Java to be used as a specific knowledge base by the application. With the concepts and relations instantiated, it is possible to compare the keywords found in the page mining process with the ontology keywords. Assigning weights to the page keywords makes a probabilistic classification of the page according to the ontology concepts possible.

The relationships between ontology concepts can be used to support decisions about the concept represented by a page. When a page contains keywords for several concepts that are all related to the same concept, the page can be classified as a representation of that common concept.

For example, Fig 3 shows an ontology in which the concept Package is related to the concept Class, and the package java.util is related to Class, Vector and HashTable. If a page has keywords with the same weight referring to the Java classes Vector and HashTable, the system can infer that both are related to the package java.util and classify the page as a reference to Package.
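The reasoning in the java.util example can be sketched as weight propagation over the "related to" links of the ontology. This is a hypothetical illustration, not the prototype's algorithm: the `RELATED` fragment of Fig. 3 is assumed, and a concept counts as "common" when it is reached from at least two distinct page keywords.

```python
from collections import defaultdict

# Hypothetical fragment of the Fig. 3 ontology: concept -> concepts it is related to.
RELATED: dict[str, list[str]] = {
    "Vector": ["java.util"],
    "HashTable": ["java.util"],
    "java.util": ["Package"],
    "Class": ["Package"],
}


def common_concepts(page_keywords: dict[str, float]) -> dict[str, float]:
    """Propagate keyword weights along 'related to' links.

    Returns the concepts reached from at least two distinct page keywords,
    each with the summed weight of the keywords that reach it.
    """
    score: dict[str, float] = defaultdict(float)
    sources: dict[str, set[str]] = defaultdict(set)
    seen: set[tuple[str, str]] = set()  # (keyword, concept) pairs already counted; also stops cycles
    frontier = [(kw, kw, w) for kw, w in page_keywords.items()]
    while frontier:
        origin, concept, weight = frontier.pop()
        for parent in RELATED.get(concept, []):
            if (origin, parent) in seen:
                continue
            seen.add((origin, parent))
            score[parent] += weight
            sources[parent].add(origin)
            frontier.append((origin, parent, weight))
    return {c: s for c, s in score.items() if len(sources[c]) >= 2}


print(common_concepts({"Vector": 1.0, "HashTable": 1.0}))
# {'java.util': 2.0, 'Package': 2.0}
```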

Fig. 3. Example of an Ontology

The collaborative learning ontology [11] is a system of concepts for modeling the collaborative learning process, such as ‘learning goal’, ‘learning group type’, and ‘learning scenario’. When these ontologies are in use, they are usually arranged in three layers. The top layer is the negotiation level, which corresponds to the negotiation ontology and represents the information important for negotiation at an abstract level. The intermediate layer corresponds to the collaborative learning ontology; here, only the abstractions from the agent level that are important for negotiation remain, as the necessary scope of information at an abstract level. The bottom layer is the agent level, which corresponds to the individual learning ontology.


This work covers only the two lower layers of a collaborative learning ontology: it captures the learner's personal learning process, which supports the lowest layer, and it allows learning processes to be exchanged, creating the information needed by the highest layer.

3 Web and Text Mining

In a simplified way, web mining can be used to identify the path taken by users while they navigate the web (Web Usage Mining) and to classify the navigated pages (Web Content Mining) [12, 13]. However, one problem cannot be solved by web mining alone: the difficulty of determining the information hierarchy. This problem can be solved with the use of ontologies.

Besides the fact that text has little (if any) structure, there are other reasons why text mining is so difficult. The concepts found in a text are usually rather abstract and can hardly be modeled using conventional knowledge representation structures. Furthermore, the occurrence of synonyms (different words with the same meaning) and homonyms (words with the same spelling but distinct meanings) makes it difficult to detect valid relationships between different parts of the text [14].

3.1 User Web Navigation

We use web usage mining for data related to user navigation, that is, when we store and analyze the order of the navigated pages, the time spent on each page, and the exit page. Once the pages are classified, the navigation order reveals the order of the navigated concepts; the time spent and the exit page help determine which pages are relevant when the user does not follow the structure of a site and moves to a new site on the same subject, or stops studying the subject [12].
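As an illustration, the three pieces of usage data mentioned above can be derived from a simple visit log. The `Visit` record, the log format and the 30-second relevance threshold in `relevant_pages` are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit:
    url: str
    start: float  # seconds since the session began (hypothetical log format)


def summarize_session(visits: list[Visit], session_end: float):
    """Return (navigation order, seconds spent per page, exit page)."""
    visits = sorted(visits, key=lambda v: v.start)
    order = [v.url for v in visits]
    dwell: dict[str, float] = {}
    for i, cur in enumerate(visits):
        # A visit lasts until the next one starts, or until the session ends.
        end = visits[i + 1].start if i + 1 < len(visits) else session_end
        dwell[cur.url] = dwell.get(cur.url, 0.0) + (end - cur.start)
    exit_page: Optional[str] = order[-1] if order else None
    return order, dwell, exit_page


def relevant_pages(dwell: dict[str, float], min_dwell: float = 30.0) -> list[str]:
    """Pages the user stayed on for at least `min_dwell` seconds (assumed criterion)."""
    return [url for url, t in dwell.items() if t >= min_dwell]


log = [Visit("a.html", 0), Visit("b.html", 40), Visit("c.html", 45), Visit("a.html", 200)]
order, dwell, exit_page = summarize_session(log, session_end=260)
print(order)                  # ['a.html', 'b.html', 'c.html', 'a.html']
print(dwell)                  # {'a.html': 100.0, 'b.html': 5.0, 'c.html': 155.0}
print(exit_page)              # a.html
print(relevant_pages(dwell))  # ['a.html', 'c.html']
```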

3.2 Page Content Analysis and Classification

Once the relevant pages are selected using web usage mining, web content mining can be used to analyze and classify the page content [12]. In this kind of mining, the input data is the HTML code of the page and the output data is one or more possible classifications of the page according to the ontology under consideration.

To simplify page classification, we used an automatic summarization technique (AST) that extracts the most relevant sentences from the page [14]. First, the AST applies several preprocessing methods to the input page, namely case folding, stemming and removal of stop words. The next step is to separate the sentences. The end of a sentence can be marked by a “.” (full stop), an “!” (exclamation mark), a “?” (question mark), etc. In HTML texts, language tags can also be considered as sentence boundaries.
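The preprocessing and sentence-splitting steps can be sketched with Python's standard `html.parser` module. The stop-word list, the set of tags treated as sentence boundaries and the toy suffix-stripping `crude_stem` function are stand-ins chosen for illustration; a real implementation would use a full stop-word list and a proper stemmer such as Porter's.

```python
import re
from html.parser import HTMLParser

STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "this"}
# Tags treated as sentence boundaries (an assumption for this sketch).
BOUNDARY_TAGS = {"title", "p", "br", "li", "div", "td", "h1", "h2", "h3"}


class _TextCollector(HTMLParser):
    """Collects the page text, inserting a full stop at boundary tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BOUNDARY_TAGS:
            self.parts.append(". ")

    def handle_endtag(self, tag):
        if tag in BOUNDARY_TAGS:
            self.parts.append(". ")

    def handle_data(self, data):
        self.parts.append(data)


def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter)."""
    if word.endswith("ss"):
        return word
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def preprocess(html: str) -> list[list[str]]:
    """Split a page into sentences of case-folded, stemmed, non-stop words."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.parts)
    sentences = []
    for raw in re.split(r"[.!?]+", text):
        words = [crude_stem(w) for w in re.findall(r"[a-z0-9]+", raw.casefold())
                 if w not in STOP_WORDS]
        if words:
            sentences.append(words)
    return sentences


page = "<title>Java Classes</title><p>A class defines objects. Methods are functions!</p>"
print(preprocess(page))
# [['java', 'class'], ['class', 'defin', 'object'], ['method', 'function']]
```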

Once all the sentences of the page have been identified, each remaining word is given a "weight" based on its HTML tag [Table 1], and a TF-ISF (term frequency – inverse sentence frequency) measure is computed for each word. For each sentence s, the average TF-ISF weight of the sentence, denoted Avg-TF-ISF(s), is computed as the arithmetic average of the TF-ISF(w,s) weight over all the words w in the sentence. Sentences with high values of Avg-TF-ISF are considered relevant.
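Written out, one common formulation of these measures, by analogy with TF-IDF, is given below. It is stated as an assumption, because the text does not give the formula: TF(w,s) is the number of occurrences of word w in sentence s, SF(w) is the number of sentences containing w, |S| is the number of sentences in the page, and W_s is the set of distinct words of s. How the HTML tag weight enters (for example, by scaling TF) is likewise not specified here.

$$\mathrm{TF\text{-}ISF}(w,s) = \mathrm{TF}(w,s) \cdot \log\frac{|S|}{\mathrm{SF}(w)}$$

$$\mathrm{Avg\text{-}TF\text{-}ISF}(s) = \frac{1}{|W_s|} \sum_{w \in W_s} \mathrm{TF\text{-}ISF}(w,s)$$

A word that occurs in every sentence gets ISF(w) = log 1 = 0, so it contributes nothing to any sentence's score.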

Once the Avg-TF-ISF(s) measure has been computed for each sentence s, the final step is to select the most relevant sentences, i.e. those with the largest Avg-TF-ISF(s) values. In the current version of our system this is done as follows. The system finds the sentence with the largest Avg-TF-ISF(s) value, called the Max-Avg-TF-ISF value. The user specifies a threshold on the percentage of this value, denoted percentage-threshold, and every sentence whose Avg-TF-ISF(s) value is greater than or equal to percentage-threshold × Max-Avg-TF-ISF is selected.
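The scoring and selection steps can be sketched as follows, using the formulation ISF(w) = log(|S| / SF(w)), which is assumed here, and ignoring HTML tag weights for brevity. Input sentences are lists of already preprocessed words; `percentage_threshold` is expressed as a fraction (0.8 means 80% of the Max-Avg-TF-ISF value).

```python
import math
from collections import Counter


def avg_tf_isf(sentences: list[list[str]]) -> list[float]:
    """Avg-TF-ISF(s) for each sentence, with ISF(w) = log(|S| / SF(w))."""
    n = len(sentences)
    sf = Counter(w for s in sentences for w in set(s))  # number of sentences containing w
    scores = []
    for s in sentences:
        tf = Counter(s)  # occurrences of each word in this sentence
        values = [tf[w] * math.log(n / sf[w]) for w in tf]  # one value per distinct word
        scores.append(sum(values) / len(values) if values else 0.0)
    return scores


def select_sentences(sentences: list[list[str]], percentage_threshold: float) -> list[int]:
    """Indices of sentences scoring at least percentage_threshold * Max-Avg-TF-ISF."""
    scores = avg_tf_isf(sentences)
    if not scores:
        return []
    cutoff = percentage_threshold * max(scores)
    return [i for i, score in enumerate(scores) if score >= cutoff]


sentences = [["java", "class", "class"], ["class", "method"], ["thread"]]
print(select_sentences(sentences, 0.8))  # [0, 2]
print(select_sentences(sentences, 1.0))  # [2]
```

Lowering the threshold produces longer summaries; a threshold of 1.0 keeps only the top-scoring sentence(s).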

The selected sentences form a summary of the source text. According to Larocca [14], this technique has been evaluated on real-world documents with satisfactory results.

4 Personal Knowledge Chains Semi-Automatic Building

The main goal of this work is to automatically build knowledge chains to be recommended to learners. As previously stated, the learner can accept, modify or even discard these KCs. To make this possible, we propose extending the Knowledge Chains Editor (KCE) [1] so that it automatically builds personal KCs.

This requires an ontology of the domain under consideration. The goal is to determine the subgroup of navigated concepts (concepts found in the navigated pages) and to relate them to the pages.

The software agent called Argus observes the learner’s navigation through web pages. It sends the web pages to the agent Hera, which stores the page content and the time spent on each page. [Fig 4]

Fig. 4. Olympus Architecture

After this, the agent called Hermes is responsible for mining the navigation and the page content to determine the subgroup of navigated ontology concepts related to the navigated pages, and for building a graph from them. With all this information, Hermes can build a potential KC that will be recommended to the learner [Fig 4].

It should be pointed out that the new KC is recommended to the same learner who is navigating the web. He/she decides whether or not to add the recommended KC to his/her personal knowledge. From this point on, if the learner accepts the KC, it can be exchanged among community members using the KCE.

4.1 Olympus - Knowledge Chains Recommendation System

The Argus agent is responsible for observing the learner's navigation and for sending to the Hera agent, which stores all of this information, the content of each navigated page, the time spent on it and the number of times it has been accessed. In this first stage, the agents only create a database of web pages and access information (Web Usage Mining). At a later stage, with a frequency determined by the user, the Hermes agent selects, from the stored pages, those related to the subject discussed by the community (the subject must be known, because the community needs an ontology for it). This is done by comparing the content of each web page with a set of keywords (ontology concepts) related to the subject in question. In this way the stored pages are filtered, and only those actually of interest to the community remain. This also addresses user privacy concerns, since pages unrelated to a community subject are discarded.
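The filtering performed by Hermes can be sketched as a keyword-overlap test. The tokenizer, the `min_hits` parameter and the example keyword set are hypothetical; the text only states that page content is compared with ontology keywords.

```python
import re

WORD = re.compile(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*")  # also keeps dotted names such as java.util


def filter_pages(pages: dict[str, str], community_keywords: set[str],
                 min_hits: int = 1) -> dict[str, str]:
    """Keep only pages that mention at least `min_hits` distinct ontology keywords."""
    kept = {}
    for url, text in pages.items():
        words = set(WORD.findall(text.lower()))
        if len(words & community_keywords) >= min_hits:
            kept[url] = text
    return kept


pages = {
    "p1": "Java Vector and HashTable live in java.util.",
    "p2": "Football results of the weekend",
}
keywords = {"vector", "hashtable", "java.util", "class"}
print(list(filter_pages(pages, keywords, min_hits=2)))  # ['p1']
```

Discarding `p2` here is the step that also keeps unrelated browsing out of the community's view.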

With this set of stored pages, the system has a directed graph, because the navigation order has been stored. As the goal of the system is to build a KC with the concepts studied by the learner, text mining techniques are needed to classify the pages according to the concepts described in the ontology. This classification is based on the proposal of Desmontils and Jacquin [15]. However, instead of using a thesaurus together with an ontology, we have extended our ontology by adding to every concept a vector of attributes containing the keywords related to that concept. Thus, both mining and classification can be done using only the ontology.
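The directed graph can be built directly from the stored navigation order. In this sketch (an assumption about representation, not the prototype's code), an edge u -> v counts how often page v was visited immediately after page u, and reloads of the same page are ignored.

```python
def navigation_graph(order: list[str]) -> dict[str, dict[str, int]]:
    """Directed graph: edge u -> v counts how often v was visited right after u."""
    graph: dict[str, dict[str, int]] = {}
    for u, v in zip(order, order[1:]):
        if u == v:
            continue  # ignore reloads of the same page
        edges = graph.setdefault(u, {})
        edges[v] = edges.get(v, 0) + 1
    return graph


print(navigation_graph(["a", "b", "c", "a", "b"]))
# {'a': {'b': 2}, 'b': {'c': 1}, 'c': {'a': 1}}
```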

At this stage, the system removes all stop words from the text of a page. Then each remaining word is given a "weight" based on its HTML tag, according to the values in Table 1.

Table 1. Higher coefficients associated with HTML markers [15]

HTML marker description | HTML marker | Weight
Document title | <title></title> | 10
Keyword | <meta name="keywords" … content=…> | 9
Hyperlink | <a href=…></a> | 8
Heading level 1 | <h1></h1> | 3
Bold font | <b></b> | 2
… | … | …
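A sketch of tag-based weighting with the values of Table 1 follows. Treating untagged text as weight 1 and giving a word inside nested tags the largest applicable weight are assumptions, and only the rows shown in Table 1 are encoded; each occurrence of a word adds its weight to the word's score.

```python
import re
from collections import Counter
from html.parser import HTMLParser

TAG_WEIGHTS = {"title": 10, "a": 8, "h1": 3, "b": 2}  # rows of Table 1
META_KEYWORDS_WEIGHT = 9
DEFAULT_WEIGHT = 1  # assumption: text outside the listed tags


class WeightedWords(HTMLParser):
    """Scores each word by summing the Table 1 weight of every occurrence."""

    def __init__(self, stop_words: frozenset = frozenset()) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.scores: Counter = Counter()
        self.stop_words = stop_words

    def _add(self, text: str, weight: int) -> None:
        for w in re.findall(r"[a-z0-9]+", text.lower()):
            if w not in self.stop_words:
                self.scores[w] += weight

    def handle_starttag(self, tag, attrs):
        values = dict(attrs)
        if tag == "meta" and (values.get("name") or "").lower() == "keywords":
            self._add(values.get("content") or "", META_KEYWORDS_WEIGHT)
        elif tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the most recent matching tag (tolerates sloppy nesting).
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Nested tags: use the largest applicable weight (an assumption).
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags), default=DEFAULT_WEIGHT)
        self._add(data, weight)


def weighted_keywords(page_html: str, stop_words: frozenset = frozenset()) -> dict[str, int]:
    parser = WeightedWords(stop_words)
    parser.feed(page_html)
    parser.close()
    return dict(parser.scores)


page_html = ('<html><head><title>Vector class</title>'
             '<meta name="keywords" content="vector, collection"></head>'
             '<body><p>A <b>Vector</b> grows.</p></body></html>')
print(weighted_keywords(page_html, frozenset({"a"})))
# {'vector': 21, 'class': 10, 'collection': 9, 'grows': 1}
```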

Once the frequency and weight of the keywords on a page have been compared with the ontology concepts, the page receives a degree of relevance for each concept. With this relationship between pages and ontology concepts, the graph of pages can be transformed into a knowledge chain. This KC is recommended to the learner, who can decide what to do with it.
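One simple way to turn weighted keyword scores into degrees of relevance is to sum, for each concept, the scores of the page words in that concept's keyword vector and normalize the sums so they add up to 1. This normalization is an assumption for illustration; the text does not define how the degrees are computed, and the concept keyword sets below are invented.

```python
def relevance_degrees(page_scores: dict[str, float],
                      concept_keywords: dict[str, set[str]]) -> dict[str, float]:
    """Normalized relevance of a page to each ontology concept (most relevant first)."""
    raw = {concept: sum(page_scores.get(k, 0) for k in keywords)
           for concept, keywords in concept_keywords.items()}
    total = sum(raw.values())
    if total == 0:
        return {}  # the page is unrelated to the ontology
    ranked = sorted(raw.items(), key=lambda item: item[1], reverse=True)
    return {concept: value / total for concept, value in ranked if value > 0}


scores = {"vector": 21, "class": 10, "collection": 9, "grows": 1}
concepts = {"Vector": {"vector", "collection"}, "Class": {"class", "object"}, "Thread": {"thread"}}
print(relevance_degrees(scores, concepts))  # {'Vector': 0.75, 'Class': 0.25}
```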

As there are many software agents “working” for the learners, many KCs will be created. It is therefore possible to identify concepts absent from one learner's navigation that have already been studied by another, and to recommend KUs, concepts, pages and even the users who know the concepts the learner does not know.
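The cross-learner comparison can be sketched with set operations. Here `studied` is a hypothetical mapping from each community member to the concepts found in his/her KCs; the function lists, for every concept the learner is missing, the members who have already studied it.

```python
def recommend(learner: str, studied: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each concept studied by others but not by `learner`, list who studied it."""
    mine = studied.get(learner, set())
    others = {user: concepts for user, concepts in studied.items() if user != learner}
    missing = set().union(*others.values()) - mine
    return {concept: sorted(user for user, concepts in others.items() if concept in concepts)
            for concept in sorted(missing)}


studied = {
    "ana": {"Class", "Object"},
    "bruno": {"Class", "Inheritance"},
    "carla": {"Inheritance", "Overriding", "Object"},
}
print(recommend("ana", studied))
# {'Inheritance': ['bruno', 'carla'], 'Overriding': ['carla']}
```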

4.2 Example

The following example shows how a KC is created from the learner's navigation through web pages. Fig 5 shows the web pages navigated by the learner, and Fig 6 shows the community ontology (arrows represent a non-hierarchical relationship).

Fig. 5. Web page navigation

Fig. 6. Community Ontology

In the first stage, web mining is performed; depending on the keywords found on a web page, the page may partially match one ontology concept and partially match another. In this case there is a degree of relevance for each concept related to the page. Therefore, for each page the result is:

After relating the web pages to the most relevant ontology concepts, the software agent creates a learning path in the ontology, which is a learning ontology, and starts creating the KUs by mapping the web pages onto the learning ontology.

At this stage the KUs are created using the learning ontology and all the information about the learner's navigation through web pages. In this example, the learner must study “attribute”, then “class”, in order to study “object”.
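A simplified sketch of turning classified pages into an ordered list of KUs follows. It assumes that each page is assigned its most relevant concept and that the navigation order defines the prerequisite links; the prototype uses the learning ontology for this step, which the sketch does not model. The example pages and relevance degrees are invented to reproduce the attribute, class, object path.

```python
def build_chain(order: list[str], degrees: dict[str, dict[str, float]]) -> list[dict]:
    """Ordered KUs: each page goes to its most relevant concept, in navigation order."""
    chain: list[dict] = []
    by_concept: dict[str, dict] = {}
    for page in order:
        relevance = degrees.get(page)
        if not relevance:
            continue  # page not related to any ontology concept
        concept = max(relevance, key=relevance.get)
        if concept in by_concept:
            by_concept[concept]["pages"].append(page)  # revisited concept: no new KU
            continue
        ku = {
            "concept": concept,
            # Simplification: the previously studied concept is taken as the prerequisite.
            "prerequisites": [chain[-1]["concept"]] if chain else [],
            "pages": [page],
        }
        chain.append(ku)
        by_concept[concept] = ku
    return chain


degrees = {
    "p1": {"Attribute": 0.7, "Class": 0.3},
    "p2": {"Attribute": 0.6, "Method": 0.4},
    "p3": {"Class": 0.8, "Object": 0.2},
    "p4": {"Object": 0.9, "Class": 0.1},
}
for ku in build_chain(["p1", "p2", "p3", "p4"], degrees):
    print(ku["concept"], ku["prerequisites"], ku["pages"])
# Attribute [] ['p1', 'p2']
# Class ['Attribute'] ['p3']
# Object ['Class'] ['p4']
```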

As mentioned before, a KU is a structure formed by a set of attributes. These attributes are grouped into categories: General (name, description, keywords, author, creation date, last use date), Life Cycle (history, current state, contributors), Rights (intellectual property rights, conditions of use), Relation (the relationship between knowledge resources), Classification (the KU in relation to a classification system) and Annotation (comments and evaluations of the KUs and their creators). Many of these attributes can be filled in automatically, which facilitates the creation of new KCs.
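The attribute categories can be sketched as a Python dataclass, together with a function that fills in the attributes Olympus could infer from the navigation. Field names, the `autofill` helper and the state value "recommended" are hypothetical; Rights and Annotation attributes are left empty for the learner to complete.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class KU:
    # General
    name: str
    description: str = ""
    keywords: list[str] = field(default_factory=list)
    author: str = ""
    creation_date: Optional[date] = None
    last_use_date: Optional[date] = None
    # Life Cycle
    history: list[str] = field(default_factory=list)
    current_state: str = ""
    contributors: list[str] = field(default_factory=list)
    # Rights
    intellectual_property_rights: str = ""
    conditions_of_use: str = ""
    # Relation, Classification, Annotation
    relations: dict[str, list[str]] = field(default_factory=dict)
    classification: str = ""
    annotations: list[str] = field(default_factory=list)


def autofill(concept: str, keywords: list[str], learner: str,
             prerequisites: list[str], today: date) -> KU:
    """Fill the attributes inferable from the navigation; the rest stay empty."""
    return KU(
        name=concept,
        keywords=list(keywords),
        author=learner,
        creation_date=today,
        last_use_date=today,
        history=[f"recommended by Olympus on {today.isoformat()}"],
        current_state="recommended",  # hypothetical state name
        contributors=[learner],
        relations={"prerequisite": list(prerequisites)},
        classification=concept,  # assumed: position of the KU in the community ontology
    )


ku = autofill("Class", ["class", "object"], "ana", ["Attribute"], date(2006, 5, 3))
print(ku.relations, ku.current_state)  # {'prerequisite': ['Attribute']} recommended
```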


5 Conclusions and Future Work

The growing number of learning communities that communicate online makes it possible to exchange and use chains of explicit knowledge as a strategy for creating personal knowledge. Today, we have the WWW (who, what, where) triad, where “who” is the people who have the knowledge, “what” is the knowledge itself, and “where” is its location (in our case, the peer on which it is located). Using knowledge chains, we hope to add “how to use” the available knowledge to the existing triad.

Apart from KCE, there are other tools that stimulate knowledge sharing in communities. These include WebWatcher [16], a search tool where the learner specifies his/her interests and receives related pages navigated by other community members; OntoShare [17], which uses software agents that allow users to share relevant pages; and MILK [18], which allows communities to manage knowledge produced from metadata. The main difference between these tools and the KCE is that they focus on sharing "where" and/or "with whom" the knowledge can be found, whereas the KCE adds the sharing of "what" the knowledge is and "how to use" it.

As previously stated, to motivate the learner to create new KCs, we propose a personal knowledge recommendation system that uses software agent technology to monitor learner navigation; uses web mining to trace the path taken by the user while navigating the web and to classify the navigated pages; and uses learning ontologies, together with all the information collected, to create new KCs.

We expect the experimental use of the extended KCE to provide evidence for the hypothesis that a learner who uses it to build a personal KC will create more new KCs, reduce the time dedicated to studying a specific subject, and gain a more comprehensive knowledge of the subject studied. To evaluate whether the KCE's goal has been reached, experiments aimed at obtaining the qualitative and quantitative data needed to verify this hypothesis must still be carried out.

It should be emphasized that the goal of this work is not to ensure that the learner has assimilated everything in his/her KCs. Our goal is to stimulate the creation of new KCs, so that the knowledge network can expand and better assist the community members. This is a relevant point, because it is very difficult to motivate users to share knowledge.

Since this work is still in progress, several extensions are planned. The most important are improving the algorithm used to map web pages onto ontology nodes, and extending the monitored domain to consider any media manipulated by the learner, instead of only navigated web pages.

Acknowledgement

This work was partially supported by CAPES and CNPq.


References

1. Rezende, J.L., da Silva, R.L.S., de Souza, J.M., Ramirez, M.: Building Personal Knowledge through Exchanging Knowledge Chains. Proc. of IADIS Int. Conf. on WBC, Algarve, Portugal (2005) 87-94.

2. Pawlowski, S., Robey, D., Raven, A.: Supporting shared information systems: boundary objects, communities, and brokering. Proc. of the 21st Int. Conf. on Information Systems, Brisbane, Australia (2000) 329-338.

3. Tornaghi, A., Vivacqua, A., Souza, J.M.: Creating Educator Communities. Int. Journal of Web Based Communities, Great Britain (2005) 1-15.

4. Rezende, J.L., de Souza, J.F., de Souza, J.M.: Peer-to-Peer Collaborative Integration of Dynamic Ontologies. 9th Int. Conf. on CSCWD, Coventry, UK (2005).

5. Leitch, M.: Human Knowledge Design by undergraduate project, February 1986. Published on the web, 31 July 2002.

6. Xexeo, G., et al.: COE: A Collaborative Ontology Editor Based on a Peer-to-Peer Framework. Int. Journal of Advanced Engineering Informatics, Germany (2004).

7. Bradshaw, J.M.: An Introduction to Software Agents, J. M. Bradshaw Ed., Software Agents, MIT Press (1997).

8. Rezende, J.L., Pereira, V.B., Xexeo, G., de Souza, J.M.: Building a Personal Knowledge Recommendation System using Agents, Learning Ontologies and Web Mining, 10th Int. Conf. on CSCWD, Nanjing, China (2006).

9. Gruber, T.R.: Toward Principles for the Design of Ontologies Used for Knowledge Sharing, Int. Workshop on Formal Ontology (1993).

10. Guarino, N.: Formal Ontology in Information Systems, Proc. of FOIS’98, Trento, Italy, IOS Press, (1998).

11. Supnithi, T., Inaba, A., Ikeda, M., Toyoda, J., Mizoguchi, R.: Learning Goal Ontology Supported by Learning Theories for Opportunistic Group Formation. In: Lajoie, S.P., Vivet, M. (eds.): Artificial Intelligence in Education, IOS Press (1999).

12. Zaïane, O.R.: Web Mining: Concepts, Practices and Research, Conference Tutorial Notes, XIV SBBD, João Pessoa, Paraíba, Brazil, (2000).

13. Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. Proc. of the 9th IEEE Int. Conf. on Tools with Artificial Intelligence, Newport Beach, CA, USA (1997).

14. Neto, J.L., Santos, A.D., Kaestner, C.A.A., Freitas, A.A.: Document clustering and text summarization. In: Proc. of the 4th Int. Conf. on Practical Applications of Knowledge Discovery and Data Mining, London: The Practical Application Company (2000).

15. Desmontils, E., Jacquin, C.: Indexing a web site with a terminology oriented ontology. SWWS (2001) 549-565.

16. Joachims, T., Freitag, D., Mitchell, T.: Webwatcher: A tour guide for the world wide web, Proc. of the 15th IJCAI, Nagoya, Japan, (1997) 770-775.

17. Davies, J., Duke, A., Sure, Y.: OntoShare – A Knowledge Management Environment for Virtual Communities of Practice. Proc. of the Int. Conf. on Knowledge Capture (K-CAP03), Sanibel Island, Florida, USA (2003).

18. Agostini, A., et al.: Stimulating Knowledge Discovery and Sharing. In Int. ACM Conf. on Supporting Group Work, Sanibel, Florida. ACM Press (2003) 248-257.
