NLGbAse metadata can be downloaded from your user account. The metadata are generated from an edition of the Wikipedia dump, so they evolve continuously: they are versioned (according to the NLGbAse production process) to include new concepts, new writing forms, and the structural changes introduced into Wikipedia in real time.
Each set of metadata consists of two files and represents one Linguistic Edition (LE):
- LE.data.csv
- LE.tfidf-label.csv
Information about the data files:
The file LE.data.csv contains the lexical networks and the class labels. The fields of each record are separated by a tab character (\t), and LE stands for the linguistic edition. Fields, in order:
- internal_number
- class_label (according to the ESTER rules), e.g. LOC.ADMI, PERS.HUM
- name_key_in_reference_language, e.g. Alabama
- [name_0 ... name_x]: all available writing forms in the reference language
- [le:name_0 ... name_x]: all writing forms in the [le] linguistic edition (e.g. en: English, de: German, etc.)
- links to the related entry point in the Linked Data Semantic Web network, e.g. http://www.dbpedia.org/page/Alabama
Sample record:
243767 LOC.ADMI Alabama Alabama (U.S. state) Alabama, United States The Yellowhammer State Alabam Alabama (state) The Heart of Dixie 22nd State Alabahmu State of Alabama US-AL de:Alabama Alabama (Bundesstaat) fr:Alabama État de l'Alabama it:Alabama es:Alabama Alabama (estado) Ala (homonym) dbpedia.org/Alabama
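The LE.data.csv record layout can be read with a short parser. The Python sketch below is not part of NLGbAse and is only one heuristic reading of the format. It assumes that fields are tab-separated, that a field starting with a two- or three-letter lowercase code followed by a colon (such as `de:` or `fr:`) opens the writing forms of that linguistic edition, and that fields containing `http` or `dbpedia.org` are Linked Data links. The names `Entry`, `EDITION_PREFIX` and `parse_data_record` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: an edition prefix is 2-3 lowercase letters followed by ":" (e.g. "de:", "fr:").
EDITION_PREFIX = re.compile(r"^([a-z]{2,3}):(.*)$")


@dataclass
class Entry:
    internal_number: int
    class_label: str  # ESTER label, e.g. "LOC.ADMI"
    name_key: str  # name key in the reference language
    forms: list[str] = field(default_factory=list)  # reference-language writing forms
    forms_by_edition: dict[str, list[str]] = field(default_factory=dict)  # e.g. {"de": [...]}
    links: list[str] = field(default_factory=list)  # Linked Data links


def parse_data_record(line: str) -> Entry:
    """Parse one tab-separated record of LE.data.csv (heuristic sketch)."""
    fields = line.rstrip("\n").split("\t")
    entry = Entry(int(fields[0]), fields[1], fields[2])
    edition = None  # None while still reading reference-language forms
    for value in fields[3:]:
        value = value.strip()
        if not value:
            continue
        # Assumption: links are recognised by "http" or "dbpedia.org".
        if value.startswith("http") or "dbpedia.org" in value:
            entry.links.append(value)
            continue
        match = EDITION_PREFIX.match(value)
        if match:
            # A prefixed field opens a new edition group; its remainder is the first form.
            edition, value = match.group(1), match.group(2)
            entry.forms_by_edition.setdefault(edition, [])
        if not value:
            continue
        if edition is None:
            entry.forms.append(value)
        else:
            # Unprefixed fields after a prefix belong to the current edition.
            entry.forms_by_edition[edition].append(value)
    return entry
```

Grouping the forms by edition code keeps all multilingual writing forms of one entity together, which makes it straightforward to build a per-language lookup table from name to class label.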
The file LE.tfidf-label.csv contains the words and their respective TF-IDF weights. As in LE.data.csv, the fields of each record are separated by a tab character (\t), and LE stands for the linguistic edition. Fields, in order:
- internal_number (same as in the LE.data.csv file)
- name_key_in_reference_language
- class_label (according to the ESTER rules)
- [word_0:tfidf_weight ... word_n:tfidf_weight]
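Because both files share internal_number, a TF-IDF record can be matched to its LE.data.csv entry through that key. The Python sketch below is illustrative only: `TfidfRecord`, `parse_tfidf_record`, `top_words` and `load_tfidf` are hypothetical names, and it assumes that the word:weight pairs are separated by tabs or spaces, that the square brackets in the description mark a repeated group rather than literal characters, that words contain no whitespace, and that the files are UTF-8 encoded.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TfidfRecord:
    internal_number: int  # same key as in LE.data.csv
    name_key: str  # name key in the reference language
    class_label: str  # ESTER class label, e.g. "LOC.ADMI"
    weights: dict[str, float]  # word -> TF-IDF weight


def parse_tfidf_record(line: str) -> TfidfRecord:
    """Parse one record of LE.tfidf-label.csv (sketch under the stated assumptions)."""
    fields = line.rstrip("\n").split("\t")
    weights: dict[str, float] = {}
    # Assumption: pairs may be separated by tabs or spaces, so split on any whitespace.
    for item in " ".join(fields[3:]).split():
        # Split on the LAST colon so a word that itself contains ":" survives.
        word, sep, weight = item.rpartition(":")
        if not sep:
            raise ValueError(f"expected word:weight, got {item!r}")
        weights[word] = float(weight)
    return TfidfRecord(int(fields[0]), fields[1], fields[2], weights)


def top_words(record: TfidfRecord, n: int = 10) -> list[tuple[str, float]]:
    """Return the n words with the highest TF-IDF weight, highest first."""
    return sorted(record.weights.items(), key=lambda kv: kv[1], reverse=True)[:n]


def load_tfidf(path: str) -> dict[int, TfidfRecord]:
    """Index a whole LE.tfidf-label.csv file by internal_number (UTF-8 assumed)."""
    index: dict[int, TfidfRecord] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                record = parse_tfidf_record(line)
                index[record.internal_number] = record
    return index
```

Calling `load_tfidf` on a local copy of an LE.tfidf-label.csv file and looking up an internal number taken from LE.data.csv returns that entity's weighted words; `top_words` then lists its most characteristic terms.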
