
PÁGICO: Evaluating Wikipedia-based information retrieval in Portuguese

SIGA: a System to Manage Information Retrieval Evaluations

LUÍS COSTA, CLÁUDIA FREITAS, CRISTINA MOTA, DIANA SANTOS AND ALBERTO SIMÕES

http://www.linguateca.pt

ACKNOWLEDGMENTS
Linguateca has throughout the years been jointly funded by the Portuguese Government, the European Union (FEDER and FSE), UMIC, FCCN and FCT. Págico was also supported by the Universities of Oslo, PUC-Rio and Coimbra, and by FCT grant SFRH/BPD/73011/2010.

REFERENCES
Diana Santos, Nuno Cardoso, Paula Carvalho, Iustin Dornescu, Sven Hartrumpf, Johannes Leveling, and Yvonne Skalban. 2009. GikiP at GeoCLEF 2008: Joining GIR and QA forces for querying Wikipedia. In C. Peters, T. Deselaers, N. Ferro, J. Gonzalo, G. J. F. Jones, M. Kurimo, T. Mandl, A. Peñas, and V. Petras, eds., Evaluating Systems for Multilingual and Multimodal Information Access: 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Aarhus, Denmark, September 17-19, 2008, Revised Selected Papers. Springer, pp. 894-905.

Diana Santos and Luís Miguel Cabral. 2010. GikiCLEF: Expectations and lessons learned. In Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Peñas, and Giovanna Roda, eds., Multilingual Information Access Evaluation, Vol. I. Springer, pp. 212-222.

Diana Santos, Cristina Mota, Cláudia Freitas, and Luís Costa. 2012. Linguamática 4, number 1, special volume about Págico.

MOTIVATION AND TASK
Is it possible to develop better systems to answer realistic user needs by searching Wikipedia for answers to a particular topic? Is the Portuguese Wikipedia good enough to provide information on lusophone topics? Can we learn from watching people try to answer these needs? Is competition or cooperation between human and automatic participants worth pursuing?

PT WIKIPEDIA IN 150 TOPICS
The 150 topics express information needs related to Portuguese-speaking countries and their history. They have enough coverage in Wikipedia but are not easily browsable through simple categories or infoboxes, and they span areas from History (50) through Geography (26) and Music (19) to Mathematics (1) and Geology (2). Assessing the answers and their justifications was often quite difficult.

PÁGICO COLLECTION
Based on the 25 April 2011 Wikipedia snapshot, converted to XHTML using:
mwlib for the markup conversion;
MediaWiki::DumpFile to control the snapshot parsing;
in-house tools to manage macro expansion.

Collection composition:

Page type              Total docs
Template pages             32 900
Disambiguation pages        5 006
Redirection pages         574 077
Multimedia pages            9 678
Article pages             856 005
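The page types in the table suggest a classification pass over the dump. The following is a minimal Python sketch, not the Págico tooling: the namespace prefixes (`Predefinição:`, `Ficheiro:`, `Imagem:` and their English aliases), the redirect keywords and the `{{Desambiguação}}` marker are assumptions about how the Portuguese Wikipedia labels these pages, and `classify`, `tally` and `shares` are hypothetical helpers.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# ASSUMPTION: Portuguese Wikipedia namespace prefixes, plus their English aliases.
TEMPLATE_PREFIXES = ("Predefinição:", "Template:")
MEDIA_PREFIXES = ("Ficheiro:", "Imagem:", "File:", "Image:")
# ASSUMPTION: redirect keywords, compared in upper case.
REDIRECT_KEYWORDS = ("#REDIRECIONAMENTO", "#REDIRECT")
# ASSUMPTION: disambiguation pages transclude {{Desambiguação}}.
DISAMBIGUATION_MARKER = "{{desambiguação"


def classify(title: str, wikitext: str) -> str:
    """Heuristically assign a page to one of the page types in the table."""
    if wikitext.lstrip().upper().startswith(REDIRECT_KEYWORDS):
        return "redirect"
    if title.startswith(TEMPLATE_PREFIXES):
        return "template"
    if title.startswith(MEDIA_PREFIXES):
        return "multimedia"
    if DISAMBIGUATION_MARKER in wikitext.lower():
        return "disambiguation"
    # Simplification: every remaining page (including other namespaces) counts as an article.
    return "article"


def tally(pages: Iterable[Tuple[str, str]]) -> Counter:
    """Count pages per type; `pages` yields (title, wikitext) pairs."""
    return Counter(classify(title, text) for title, text in pages)


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of the collection taken by each page type, to one decimal."""
    total = sum(counts.values())
    return {kind: round(100 * n / total, 1) for kind, n in counts.items()}


if __name__ == "__main__":
    # Counts copied from the collection table.
    table = {"template": 32900, "disambiguation": 5006, "redirect": 574077,
             "multimedia": 9678, "article": 856005}
    print(sum(table.values()), shares(table))
```

With the table's counts, the collection holds 1 477 666 pages, of which about 57.9% are articles and 38.9% redirects, so redirects make up a large share of the raw snapshot.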

SIGA
Topic creation
System run submission and testing
Human participation interface
Assessment interface
Conflict resolution
Pool browsing
Scoring
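How SIGA's conflict-resolution step works is not described on the poster. As a minimal sketch, assuming each assessor gives every pooled (topic, answer) pair one of the answer categories used in the HUMANS VS. SYSTEMS charts (Correct & Justified, Unjustified, Invalid document, Other), disagreements could be detected as follows; `split_agreements` and its tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

# ASSUMPTION: the four answer categories double as assessment labels.
LABELS = {"Correct & Justified", "Unjustified", "Invalid document", "Other"}

Item = Tuple[int, Hashable]  # (topic id, answer)


def split_agreements(
    judgments: Iterable[Tuple[str, int, Hashable, str]],
) -> Tuple[Dict[Item, str], Dict[Item, Set[str]]]:
    """Separate unanimous judgments from those needing conflict resolution.

    Each judgment is (assessor, topic, answer, label). Returns a pair:
    items on which all assessors agree, mapped to the shared label, and
    items with diverging labels, mapped to the set of labels given.
    """
    labels_by_item: Dict[Item, Set[str]] = defaultdict(set)
    for _assessor, topic, answer, label in judgments:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        labels_by_item[(topic, answer)].add(label)

    agreed: Dict[Item, str] = {}
    conflicts: Dict[Item, Set[str]] = {}
    for item, labels in labels_by_item.items():
        if len(labels) == 1:
            agreed[item] = next(iter(labels))
        else:
            conflicts[item] = labels
    return agreed, conflicts
```

Collecting labels into a set means a single dissenting assessor is enough to send an item to conflict resolution; a majority vote would be a different, more permissive policy.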

HUMANS VS. SYSTEMS

[Figure: Scores. Precision vs. pseudo-recall (0.0 to 1.0) for runs ludIT_1, GLNISTT_1, João Miranda_1, Ângela Mota_1, RAPPORTAGICO_3, RAPPORTAGICO_2, RAPPORTAGICO_1, Bruno Nascimento_1, RENOIR_1, RENOIR_3, RENOIR_2 and their average. Answer categories: Correct & Justified, Unjustified, Invalid document, Other. Series: Both, Systems, Human participants. Panel: Automatic Evaluation.]

[Figure: Final score per subject, Humans vs. Systems: Literature (Letras), Arts (Artes), Geography (Geografia), Culture (Cultura), Politics (Política), Sports (Desporto), Science (Ciência), Economy (Economia).]

[Figure: Precision per subject, Humans vs. Systems, for the same eight subjects.]

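Topics with the most correct answers

ID   Total  Hum  Sys  H & S  Topic
H
135     54   10   44      0  Aves de Angola (Birds of Angola)
19     115   56   35     24  Tribos indígenas que vivem na Amazônia (Indigenous tribes living in the Amazon)
90      34    8   19      7  Filmes brasileiros premiados na categoria Montagem (Brazilian films awarded in the Editing category)
13      23    6   12      5  Dinossauros carnívoros que habitaram o Brasil (Carnivorous dinosaurs that lived in Brazil)
S
19     115   56   35     24  Tribos indígenas que vivem na Amazônia (Indigenous tribes living in the Amazon)
62      30    5    6     19  Praias de Portugal boas para a prática de surf (Beaches in Portugal good for surfing)
7       34   17    0     17  Guitarristas portugueses que também foram compositores (Portuguese guitarists who were also composers)
11      41   20    4     17  Filmes sobre o cangaço (Films about the cangaço)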
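In every row of the topics table, Total equals Hum + Sys + H & S, which suggests that Hum and Sys count correct answers found only by human participants or only by systems, and H & S those found by both. Under that assumption (the poster does not state it), the counts could be derived from the answer pool as in this sketch; `count_by_finder` and its input format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Mapping, Set, Tuple


def count_by_finder(
    found_by: Mapping[Tuple[int, Hashable], Set[str]],
    kind: Mapping[str, str],
) -> Dict[int, Dict[str, int]]:
    """Per-topic Total/Hum/Sys/H & S counts of correct answers.

    found_by maps (topic, answer) to the participants who returned that
    answer correctly; kind maps each participant to "human" or "system".
    """
    table: Dict[int, Dict[str, int]] = defaultdict(
        lambda: {"Total": 0, "Hum": 0, "Sys": 0, "H & S": 0}
    )
    for (topic, _answer), participants in found_by.items():
        if not participants:
            continue  # an answer nobody returned belongs to no column
        kinds = {kind[p] for p in participants}
        row = table[topic]
        row["Total"] += 1
        if kinds == {"human"}:
            row["Hum"] += 1
        elif kinds == {"system"}:
            row["Sys"] += 1
        else:
            row["H & S"] += 1
    return dict(table)
```

Because each answer lands in exactly one of the three columns, Total = Hum + Sys + H & S holds by construction, matching the table.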

EVAL MEASURES

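Precision: $P_{p,c} = \frac{|C_{p,c}|}{|R_{p,c}|}$

Pseudo-recall: $\alpha_{p,c} = \frac{|C_{p,c}|}{|C_{\text{Págico}}| + |C_{\text{aval}}|}$

Pseudo-F-measure: $\phi_{p,c} = \frac{2 \times P_{p,c} \times \alpha_{p,c}}{P_{p,c} + \alpha_{p,c}}$

Originality: $O_{p,c} = \sum_{i=1}^{T} \sum_{j=1}^{|R_{p,c,i}|} o(r_{p,c,i,j})$

Creativity: $K_{p,c} = \sum_{i=1}^{T} \sum_{j=1}^{|R_{p,c,i}|} k(r_{p,c,i,j})$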

Final score: $M_{p,c} = |C_{p,c}| \times P_{p,c}$
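Precision, pseudo-recall, pseudo-F-measure and the final score can be computed directly from answer sets. A minimal Python sketch, assuming answers are hashable identifiers and that the sets cover whatever topics are being scored; `score_run` and its argument names are hypothetical, and the pseudo-recall denominator adds the two set sizes exactly as the formula does.

```python
from dataclasses import dataclass
from typing import Hashable, Set


@dataclass(frozen=True)
class RunScores:
    precision: float
    pseudo_recall: float
    pseudo_f: float
    final_score: float


def score_run(
    returned: Set[Hashable],
    correct: Set[Hashable],
    c_pagico: Set[Hashable],
    c_aval: Set[Hashable],
) -> RunScores:
    """Precision, pseudo-recall, pseudo-F-measure and final score for one run.

    returned: R, answers returned by the participant
    correct:  C, the returned answers judged correct
    c_pagico: C_Pagico, correct answers known to the organisers
    c_aval:   C_aval, correct answers found during assessment
    """
    n_correct = len(correct)
    precision = n_correct / len(returned) if returned else 0.0
    # Pseudo-recall: the two set sizes are added, as in the formula.
    denominator = len(c_pagico) + len(c_aval)
    pseudo_recall = n_correct / denominator if denominator else 0.0
    # Pseudo-F-measure: harmonic mean of precision and pseudo-recall.
    total = precision + pseudo_recall
    pseudo_f = 2 * precision * pseudo_recall / total if total else 0.0
    # Final score: number of correct answers times precision.
    final_score = n_correct * precision
    return RunScores(precision, pseudo_recall, pseudo_f, final_score)
```

Because precision is |C|/|R|, the final score |C| × P equals |C|²/|R|: a run gains from finding more correct answers but is penalised for padding its answer list. For example, 4 correct answers out of 10 returned give a final score of 1.6.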

In addition to the measures used in GikiP and GikiCLEF, we chose to investigate originality and creativity, weighting answers differently according to the number of participants who found them.
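The poster does not spell out the weight functions o and k used in the originality and creativity formulas. As an illustration only, the sketch below reads i as the topic index and j as the answer index, and plugs in a hypothetical weight that gives a correct answer 1/n points when n participants returned it, so answers that few others found count more. `finder_counts`, `weighted_score` and `inverse_finder_weight` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Set, Tuple

Runs = Dict[str, Dict[int, Set[Hashable]]]  # participant -> topic -> answers
Weight = Callable[[int, Hashable], float]


def finder_counts(runs: Runs) -> Counter:
    """Number of participants that returned each (topic, answer) pair."""
    counts: Counter = Counter()
    for per_topic in runs.values():
        for topic, answers in per_topic.items():
            for answer in answers:
                counts[(topic, answer)] += 1
    return counts


def weighted_score(participant: str, runs: Runs, weight: Weight) -> float:
    """Double sum over topics i and the participant's answers j of weight(r_ij)."""
    return sum(
        weight(topic, answer)
        for topic, answers in runs[participant].items()
        for answer in answers
    )


def inverse_finder_weight(counts: Counter, correct: Set[Tuple[int, Hashable]]) -> Weight:
    """HYPOTHETICAL weight: 1/n for a correct answer found by n participants, else 0."""
    def weight(topic: int, answer: Hashable) -> float:
        key = (topic, answer)
        return 1.0 / counts[key] if key in correct else 0.0
    return weight
```

Keeping the weight as a separate function means a different o or k (for example one that only rewards answers nobody else found) can be swapped in without touching the double sum.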

USER BROWSER BEHAVIOUR

[Figure: time spent on topic vs. browsing order, one panel per participant: ludit, angelamota, miranda, Px120, GLNISTT1, GLNISTT2, GLNISTT3, GLNISTT4, GLNISTT5, GLNISTT6, GLNISTT7, GLNISTT8.]
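The USER BROWSER BEHAVIOUR plots show, for each participant, the time spent on each topic against the order in which topics were browsed. The poster does not describe how these values were logged; assuming a hypothetical log of (timestamp, topic) events recorded whenever a participant switched topic, they could be derived as follows. `time_per_topic` is a hypothetical helper, and the time unit is whatever the timestamps use.

```python
from typing import Hashable, List, Sequence, Tuple


def time_per_topic(
    events: Sequence[Tuple[float, Hashable]], session_end: float
) -> List[Tuple[int, Hashable, float]]:
    """Return (browsing order, topic, time spent) for each topic.

    events is a chronological list of (timestamp, topic) pairs, one per switch
    to a topic; each interval lasts until the next switch (or session_end).
    Topics are numbered in the order in which they were first visited, and
    time from repeated visits is accumulated.
    """
    spent = {}
    order = []
    ends = [t for t, _ in events[1:]] + [session_end]
    for (start, topic), end in zip(events, ends):
        if topic not in spent:
            spent[topic] = 0.0
            order.append(topic)
        spent[topic] += end - start
    return [(rank, topic, spent[topic]) for rank, topic in enumerate(order, start=1)]
```

Numbering topics by first visit keeps the x axis a simple browsing order even when a participant returns to an earlier topic; the extra time is added to that topic's bar rather than creating a new point.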

CARTOLA http://www.linguateca.pt/Cartola

[Figure: Págico answers pool. Number of answers and justification documents; percentage of answers and justifications found only in the Portuguese Wikipedia.]