Generated on 11/6/2009 at 12:44:24 with version v0.3.2 5 March 2009 11:12

*Statistics for entities and category structure:


Known entities: 624658

Categories :Categories with 100 or more entries (all words)

:Categories with 100 or more entries (first 2 words)

:Categories with 100 or more entries (first word)

:Clustering results



*Structure info of original file: /DATA_NLG/dewiki-20081011-pages-articles.xml

Stats
Total count : 67163150
Entries : 624648
Redirects : 467735
Homonyms : 193841


Score for the named entity classification task:

Label pers (52) Precision=0.830508474576271 Recall=0.942307692307692 FS=0.882882882882883 (49:59)
Label org (14) Precision=0.5625 Recall=0.642857142857143 FS=0.6 (9:16)
Label date (3) Precision=1 Recall=0.666666666666667 FS=0.8 (2:2)
Label place (42) Precision=0.760869565217391 Recall=0.833333333333333 FS=0.795454545454546 (35:46)
Label unk (50) Precision=0.848484848484849 Recall=0.56 FS=0.674698795180723 (28:33)
Label prod (26) Precision=0.645161290322581 Recall=0.769230769230769 FS=0.701754385964912 (20:31)

Page sets used for test = 209 (in test file 624648 : taken 187) FScore = 0.742465101580511
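
The per-label scores and the overall FScore above can be recomputed from the counts printed on each line. The sketch below is not the tool's own code. It assumes that the number in parentheses after the label is the count of gold items, and that the trailing pair (for example 49:59) is correct:predicted. Under that reading every printed Precision, Recall and FS value is reproduced, the gold counts sum to 187, and the overall FScore matches the unweighted mean of the six per-label FS values. The names COUNTS, prf and macro_f are hypothetical.

```python
# Counts read off the report: label -> (gold, correct, predicted).
# Interpreting "(49:59)" as correct:predicted is an assumption; it is the
# reading that reproduces the printed Precision/Recall/FS values.
COUNTS = {
    "pers": (52, 49, 59),
    "org": (14, 9, 16),
    "date": (3, 2, 2),
    "place": (42, 35, 46),
    "unk": (50, 28, 33),
    "prod": (26, 20, 31),
}


def prf(gold, correct, predicted):
    """Return (precision, recall, F-score) for one label."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def macro_f(counts):
    """Unweighted mean of the per-label F-scores."""
    scores = [prf(*c)[2] for c in counts.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for label, c in COUNTS.items():
        p, r, f = prf(*c)
        print(f"Label {label} ({c[0]}) Precision={p:.6f} Recall={r:.6f} FS={f:.6f}")
    print(f"FScore = {macro_f(COUNTS):.6f}")
```

Because the mean is unweighted, the 3-item date label contributes as much to the overall FScore as the 52-item pers label.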


Stats for graphs and redirections
Number of graphs: 624648
Redirects in graph = 1779131