Wikidata:WikiProject Languages/Data model
This is a draft description of how items for languages are modelled in Wikidata.
Labels
Labels should not include words like "language" when they are used only for disambiguation. Labels in Wikidata do not have to be unique and are not expected to be identical to the linked Wikipedia article names.
More information: Help:Label
Examples:
- German (Q188)
- In English, the label is "German" because language names do not normally include "language". People say things like "They're learning German", and not "They're learning German language". The English Wikipedia page is "German language", but only because page names have to be unique and "German" is a disambiguation page.
- In German, the label is "Deutsch" (German) for the same reason as English. The German Wikipedia page is "Deutsche Sprache" (German language) and "Deutsch" is a disambiguation page.
- In Japanese, the label is "ドイツ語" (literally: Germany language), because language names are typically formed by adding "語" (language) to a location. This is not disambiguation because without "語", it would mean something different.
- American Sign Language (Q14759)
- In English, the label is "American Sign Language" because "Sign Language" is normally part of the name of a sign language. People typically say things like "They're learning American Sign Language", and not "They're learning American Sign" or "They're learning American". Since it's part of the name, the words are capitalised.
Descriptions
Descriptions should help people identify the item and distinguish it from other similarly named ones. For languages, descriptions typically include the language family and where it is spoken.
Although languages are often closely connected to particular ethnic groups, it is generally not useful to include ethnic groups in the description. The language and the ethnic group often share the same name, and people are not likely to be familiar with the language but not the ethnic group, or vice versa.
Examples:
- Koro (Sino-Tibetan language spoken in India)
- Koro (Oceanic language spoken in Papua New Guinea)
- Koro (Oceanic language spoken in Vanuatu)
- Koro (Mande language spoken in Ivory Coast)
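The pattern in the examples above (language family, then where the language is spoken) can be sketched as a small helper. This is only an illustration: the function name build_description and its parameters are assumptions made for this sketch and are not part of any Wikidata tool.

```python
def build_description(family: str, places: list[str]) -> str:
    """Compose an English description from a language family and places.

    Hypothetical helper: follows the "<family> language spoken in <place>"
    pattern used in the examples on this page.
    """
    if not places:
        # No known location: fall back to the family alone.
        return f"{family} language"
    if len(places) == 1:
        where = places[0]
    else:
        # Join several places as "A, B and C".
        where = ", ".join(places[:-1]) + " and " + places[-1]
    return f"{family} language spoken in {where}"


print(build_description("Oceanic", ["Vanuatu"]))
print(build_description("Mande", ["Ivory Coast"]))
```

Generating the description from structured parts keeps similarly named items, such as the four languages called Koro, distinguishable in a consistent way.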
Basic statements
subclass of (P279) is used to indicate the next level up in a language family tree.
country (P17) is used to indicate the countries where a language is spoken. It does not imply any official status in that country.
indigenous to (P2341) is used to indicate the ethnic groups and the locations that a language is indigenous to.
ethnic group (P172), location (P276) and located in the administrative territorial entity (P131) are not used for this information.
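As a rough illustration of the rule above, the following sketch flags statements on a language item that use ethnic group (P172), location (P276) or located in the administrative territorial entity (P131) where indigenous to (P2341) is expected. The function name, the simplified input format (a mapping from property ID to a list of value IDs) and the placeholder item IDs are assumptions made for this example; they do not reflect the actual Wikibase data format.

```python
# Properties that this data model says should not be used for this information.
DISCOURAGED = {
    "P172": "ethnic group",
    "P276": "location",
    "P131": "located in the administrative territorial entity",
}


def check_language_statements(claims: dict[str, list[str]]) -> list[str]:
    """Return a warning for each discouraged property that has values.

    `claims` is a simplified, hypothetical representation:
    property ID -> list of value item IDs.
    """
    warnings = []
    for property_id, label in DISCOURAGED.items():
        if claims.get(property_id):
            warnings.append(
                f"{property_id} ({label}) is set; "
                "use P2341 (indigenous to) instead"
            )
    return warnings


# Placeholder IDs, not real Wikidata items.
example = {
    "P279": ["Q_PARENT_FAMILY"],
    "P17": ["Q_COUNTRY"],
    "P172": ["Q_ETHNIC_GROUP"],
}
for warning in check_language_statements(example):
    print(warning)
```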
native label (P1705) is used to store the native label (autonym) of the language. It is also sometimes used to add the name in other languages used by the speakers of the language (such as an official language of the country where it is spoken).
If the language isn't available in the list of languages, select mis and add language of work or name (P407) as a qualifier.
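The fallback described above can be sketched as follows. The dictionary layout is a simplified stand-in chosen for this example, not the exact Wikibase JSON, and the function name, the supported_codes parameter and the item ID Q_EXAMPLE_LANGUAGE are hypothetical.

```python
def native_label_statement(text, language_code, language_item, supported_codes):
    """Build a simplified native label (P1705) statement.

    If `language_code` is not in `supported_codes`, the text is stored
    under the code "mis" and the language item is attached with a
    language of work or name (P407) qualifier.
    """
    statement = {"property": "P1705", "qualifiers": {}}
    if language_code in supported_codes:
        statement["value"] = {"text": text, "language": language_code}
    else:
        # Language not selectable: fall back to "mis" plus a P407 qualifier.
        statement["value"] = {"text": text, "language": "mis"}
        statement["qualifiers"]["P407"] = [language_item]
    return statement


# "xx-example" and "Q_EXAMPLE_LANGUAGE" are placeholders, not real values.
print(native_label_statement("Deutsch", "de", "Q188", {"de", "en"}))
print(native_label_statement("autonym", "xx-example", "Q_EXAMPLE_LANGUAGE", {"de", "en"}))
```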
To link to the category related to the language. Such categories often come from Wiktionary projects, especially for minor languages.
Grammatical and phonetic behaviour
External identifiers
- ISO 639-1 code (P218)
- ISO 639-2 code (P219)
- ISO 639-3 code (P220)
- ISO 639-5 code (P1798)
- ISO 639-6 code (P221)
- IETF language tag (P305)
- Ethnologue.com language code (P1627)
- Linguist List code (P1232)
- Glottolog code (P1394)
- Linguasphere code (P1396)
- WALS lect code (P1466)
- WALS genus code (P1467)
- WALS family code (P1468)
- endangeredlanguages.com ID (P2192)
- UNESCO Atlas of the World's Languages in Danger ID (P2355)
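Before adding one of the ISO 639 identifiers listed above, a quick shape check can catch obvious mistakes such as putting a three-letter code in the ISO 639-1 property. The sketch below only tests the length and lowercase-letter form of the code; it does not check that the code is actually assigned to a language. The names CODE_SHAPES and has_expected_shape are made up for this example.

```python
import re

# Expected shape per property: ISO 639-1 codes have two letters,
# ISO 639-2, 639-3 and 639-5 codes have three, ISO 639-6 codes have four.
CODE_SHAPES = {
    "P218": re.compile(r"[a-z]{2}"),   # ISO 639-1
    "P219": re.compile(r"[a-z]{3}"),   # ISO 639-2
    "P220": re.compile(r"[a-z]{3}"),   # ISO 639-3
    "P1798": re.compile(r"[a-z]{3}"),  # ISO 639-5
    "P221": re.compile(r"[a-z]{4}"),   # ISO 639-6
}


def has_expected_shape(property_id: str, code: str) -> bool:
    """Return True if `code` has the form expected for `property_id`."""
    pattern = CODE_SHAPES.get(property_id)
    if pattern is None:
        raise KeyError(f"no shape rule defined for {property_id}")
    return pattern.fullmatch(code) is not None


print(has_expected_shape("P218", "de"))   # two letters in ISO 639-1
print(has_expected_shape("P218", "deu"))  # three letters do not fit ISO 639-1
```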
Regional
- ABS ASCL 2011 code (P1251) (Australia)
- AUSTLANG code (P1252) (Australia)
- Guthrie code (P2161) (Africa)
- Statistics Indonesia language code (P2590) (Indonesia)
- Newguineaworld ID (P11696) (Papua New Guinea)
Other databases which don't yet have an external identifier property can be linked using described at URL (P973), e.g.
- Asian and African Sign Languages (AASL)
- Numeral Systems of the World's Languages
- Austronesian Basic Vocabulary Database (ABVD)
- Turkic Database at Elegant Lexicon
- ScriptSource
CLLD databases
- Austronesian Comparative Dictionary (ACD)
- AfBo: A world-wide survey of affix borrowing
- Atlas of Pidgin and Creole Language Structures (APiCS)
- Automated Similarity Judgment Program (ASJP)
- electronic World Atlas of Varieties of English (eWAVE)
- Grambank
- PHOIBLE
- Uralic Areal Typology Online (UraTyp)
Constructed languages
- Conlang Atlas of Language Structures (CALS)
- The Conlang Database
- FrathWiki
- In the Land of Invented Languages
- Linguifex
- Tolkien Gateway