www.wikimeta.com

About the structure of NLGbAse metadata


NLGbAse metadata can be downloaded from your user account. They are generated from a Wikipedia dump. Because Wikipedia changes continuously, the metadata are in perpetual evolution: they are versioned (following the NLGbAse production process) so that each new version includes the new concepts, writing forms and structural modifications introduced in real time into Wikipedia.

Each set of metadata corresponds to one Linguistic Edition (LE) and consists of two files:

  • LE.data.csv
  • LE.tfidf-label.csv
 
Information about the data files:
The file LE.data.csv contains the lexical networks and class labels. Each record is one line of fields separated by tabs (\t), with the following structure (LE stands for the linguistic edition):

 
  • internal_number
  • class_label (according to the ESTER rules), e.g. LOC.ADMI, PERS.HUM
  • name_key_in_reference_language, e.g. Alabama
  • [name_0 ... name_x]: all available writing forms in the reference language
  • [le:name_0 ... name_x]: all writing forms in linguistic edition le, marked by the edition prefix (e.g. en: for English, de: for German)
  • Link to the related entry point in the Linked Data (Semantic Web) network, e.g. http://www.dbpedia.org/page/Alabama

 
Sample record:
243767 LOC.ADMI   Alabama Alabama (U.S. state)    Alabama, United States  The Yellowhammer State  Alabam  Alabama (state) The Heart of Dixie      22nd State      Alabahmu        State of Alabama        US-AL   de:Alabama      Alabama (Bundesstaat)   fr:Alabama      État de l'Alabama       it:Alabama      es:Alabama      Alabama (estado)        Ala (homonym) dbpedia.org/Alabama
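As an illustration, the record layout above can be read with a short Python sketch. It is not part of NLGbAse: the helper parse_data_line and the DataRecord class are hypothetical. Because the sample above shows tabs as spaces, two assumptions are made and repeated in the code comments: an edition prefix such as de: marks the first form of each linguistic edition's group (the unprefixed forms that follow belong to the same group), and the Linked Data link is the field pointing to dbpedia.org.

```python
import re
from collections.abc import Iterable  # noqa: F401  (kept for readers extending the sketch)
from dataclasses import dataclass, field

# Assumption: a field such as "de:Alabama" opens the group of writing forms
# for linguistic edition "de"; unprefixed fields that follow belong to it.
EDITION_PREFIX = re.compile(r"^([a-z]{2,3}):(.+)$")
# Assumption: the Linked Data link is the field that points to dbpedia.org.
DBPEDIA_LINK = re.compile(r"^(https?://)?(www\.)?dbpedia\.org/")


@dataclass
class DataRecord:  # hypothetical container, not an NLGbAse type
    internal_number: int
    class_label: str
    name_key: str
    reference_forms: list[str] = field(default_factory=list)
    forms_by_edition: dict[str, list[str]] = field(default_factory=dict)
    links: list[str] = field(default_factory=list)


def parse_data_line(line: str) -> DataRecord:
    """Parse one tab-separated LE.data.csv record (hypothetical helper)."""
    # Empty fields (e.g. from doubled tabs) are skipped.
    fields = [f.strip() for f in line.rstrip("\n").split("\t") if f.strip()]
    record = DataRecord(int(fields[0]), fields[1], fields[2])
    group = record.reference_forms  # unprefixed forms: reference language
    for value in fields[3:]:
        if DBPEDIA_LINK.match(value):
            record.links.append(value)
            continue
        match = EDITION_PREFIX.match(value)
        if match:
            edition, form = match.groups()
            group = record.forms_by_edition.setdefault(edition, [])
            group.append(form)
        else:
            group.append(value)
    return record


# Abridged from the sample record above, with tabs restored as assumed.
sample = "\t".join([
    "243767", "LOC.ADMI", "Alabama",
    "Alabama (U.S. state)", "The Heart of Dixie", "US-AL",
    "de:Alabama", "Alabama (Bundesstaat)",
    "fr:Alabama", "État de l'Alabama",
    "dbpedia.org/Alabama",
])
record = parse_data_line(sample)
print(record.forms_by_edition["de"])  # ['Alabama', 'Alabama (Bundesstaat)']
```

Grouping forms by edition makes it easy to look up, for a given entry, all writing forms in one language; if real files turn out to prefix every form, the same loop still works because each prefixed field simply reopens its edition's group.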
 
The file LE.tfidf-label.csv contains, for each entry, words and their respective TF-IDF weights. Each record is likewise one line of tab-separated (\t) fields, with the following structure (LE stands for the linguistic edition):

  • internal_number (same as in the LE.data.csv file)
  • name_key_in_reference_language
  • class_label (according to the ESTER rules)
  • [word_0:tfidf_weight ... word_n:tfidf_weight]
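A similar hypothetical sketch can read LE.tfidf-label.csv. parse_tfidf_line, index_by_number and top_words are illustrative names, not part of NLGbAse. The field order follows the description above (name key before class label), each word:tfidf_weight pair is split at its last colon on the assumption that the weight never contains a colon, and records are keyed by internal_number so they can be matched with LE.data.csv. The words and weights in the sample line are invented to exercise the code; they are not real NLGbAse values.

```python
from collections.abc import Iterable


def parse_tfidf_line(line: str) -> tuple[int, str, str, dict[str, float]]:
    """Parse one tab-separated LE.tfidf-label.csv record (hypothetical helper)."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t") if f.strip()]
    internal_number, name_key, class_label = int(fields[0]), fields[1], fields[2]
    weights: dict[str, float] = {}
    for item in fields[3:]:
        # Split at the LAST colon so that a word containing ":" stays intact.
        word, sep, weight = item.rpartition(":")
        if sep:  # skip malformed items that have no ":weight" part
            weights[word] = float(weight)
    return internal_number, name_key, class_label, weights


def index_by_number(lines: Iterable[str]) -> dict[int, tuple[int, str, str, dict[str, float]]]:
    """Key parsed records by internal_number, the field shared with LE.data.csv."""
    return {rec[0]: rec for rec in map(parse_tfidf_line, lines)}


def top_words(weights: dict[str, float], n: int = 3) -> list[str]:
    """Return the n words with the highest TF-IDF weight."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


# Invented words and weights, for illustration only.
sample = "243767\tAlabama\tLOC.ADMI\tstate:0.42\tmontgomery:0.31\tdixie:0.12"
number, key, label, weights = parse_tfidf_line(sample)
print(number, label, top_words(weights, 2))  # 243767 LOC.ADMI ['state', 'montgomery']
```

In practice, index_by_number would be fed the lines of the whole file, and the resulting dictionary looked up with the internal_number of a record read from LE.data.csv.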