Wikidata:WikiProject Languages/Data model

From Wikidata

This is a draft description of how items for languages are modelled in Wikidata.

Labels

Labels should not include words like "language" when such words are only there for disambiguation. Labels in Wikidata do not have to be unique and are not expected to be identical to the linked Wikipedia article names.

More information: Help:Label

Examples:

German (Q188)
In English, the label is "German" because language names do not normally include "language". People say things like "They're learning German", not "They're learning German language". The English Wikipedia article is titled "German language", but only because page names have to be unique and "German" is a disambiguation page.
In German, the label is "Deutsch" (German) for the same reason as in English. The German Wikipedia article is titled "Deutsche Sprache" (German language) because "Deutsch" is a disambiguation page.
In Japanese, the label is "ドイツ語" (literally: Germany language), because Japanese language names are typically formed by adding "語" (language) to a place name. Here "語" is not disambiguation: without it, "ドイツ" would mean Germany, the country.
American Sign Language (Q14759)
In English, the label is "American Sign Language" because "Sign Language" is normally part of the name of a sign language. People typically say things like "They're learning American Sign Language", and not "They're learning American Sign" or "They're learning American". Since it's part of the name, the words are capitalised.
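
When starting from an English Wikipedia article title, the English-label rule can be approximated mechanically. The Python sketch below is a hypothetical heuristic, not a Wikidata tool: it assumes that a lowercase " language" suffix is a Wikipedia disambiguator and that a capitalised "Language" belongs to the name, as in the German and American Sign Language examples. It does not handle other languages (in Japanese, "語" must be kept) or titles with parenthetical disambiguation, so its output is only a suggestion to check by hand.

```python
# Hypothetical heuristic: suggest an English label from an English Wikipedia
# article title. A trailing lowercase " language" is assumed to be a title
# disambiguator and is dropped; a capitalised "Language" is assumed to be
# part of the name and is kept. Always review the suggestion manually.
SUFFIX = " language"


def suggest_english_label(article_title: str) -> str:
    """Return the title without a trailing lowercase ' language' suffix."""
    if article_title.endswith(SUFFIX):
        return article_title[: -len(SUFFIX)]
    return article_title


if __name__ == "__main__":
    for title in ["German language", "American Sign Language"]:
        print(f"{title!r} -> {suggest_english_label(title)!r}")
```

The check is case-sensitive on purpose, since capitalisation is what distinguishes a disambiguator from part of a name in these two examples.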

Descriptions

Descriptions should help people identify the item and distinguish it from other similarly named ones. For languages, descriptions typically include the language family and where it is spoken.

Although languages are often closely connected to particular ethnic groups, it is generally not useful to include ethnic groups in the description. The language and the ethnic group often share the same name, and people are not likely to be familiar with the language but not the ethnic group, or vice versa.

Examples:

  • Koro (Sino-Tibetan language spoken in India)
  • Koro (Oceanic language spoken in Papua New Guinea)
  • Koro (Oceanic language spoken in Vanuatu)
  • Koro (Mande language spoken in Ivory Coast)
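
The description pattern used for the Koro items can be sketched in Python, together with a check that no two items end up with the same label and description. The helpers `describe` and `ambiguous_pairs` are hypothetical names for illustration, and the data is taken from the four Koro examples; real descriptions are written by hand and need not follow one fixed template.

```python
from collections import Counter


def describe(family: str, country: str) -> str:
    """Build a description in the pattern '<family> language spoken in <country>'."""
    return f"{family} language spoken in {country}"


# (label, family, country), taken from the Koro examples.
KORO = [
    ("Koro", "Sino-Tibetan", "India"),
    ("Koro", "Oceanic", "Papua New Guinea"),
    ("Koro", "Oceanic", "Vanuatu"),
    ("Koro", "Mande", "Ivory Coast"),
]


def ambiguous_pairs(rows):
    """Return (label, description) pairs that would be shared by more than one item."""
    counts = Counter(
        (label, describe(family, country)) for label, family, country in rows
    )
    return [pair for pair, n in counts.items() if n > 1]


if __name__ == "__main__":
    for label, family, country in KORO:
        print(f"{label} ({describe(family, country)})")
    print("ambiguous:", ambiguous_pairs(KORO))
```

Combining family and location is what keeps the two Oceanic languages apart: the family alone would give them identical descriptions.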

Basic statements

subclass of (P279) is used to indicate the next level up in a language family tree.
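
Because each subclass of (P279) statement points one level up, a language's position in its family tree can be read by following those links. The sketch uses an illustrative, simplified chain of labels rather than real Wikidata statements (actual items may have several P279 values or additional intermediate groupings), and `lineage` is a hypothetical helper that assumes exactly one parent per node.

```python
# Illustrative child -> parent links standing in for "subclass of" (P279).
# Simplified example data, not copied from Wikidata.
PARENT = {
    "German": "West Germanic",
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
}


def lineage(language: str, parent: dict[str, str]) -> list[str]:
    """Follow single parent links upward until a node has no parent."""
    path = [language]
    seen = {language}
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in seen:  # guard against accidental cycles in the data
            raise ValueError(f"cycle detected at {nxt!r}")
        path.append(nxt)
        seen.add(nxt)
    return path


if __name__ == "__main__":
    print(" > ".join(lineage("German", PARENT)))
```

Handling items with several P279 values would need a graph traversal returning multiple paths instead of a single list.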

country (P17) is used to indicate the countries where a language is spoken. It does not imply any official status in that country.

indigenous to (P2341) is used to indicate the ethnic groups and the locations that a language is indigenous to.

ethnic group (P172), location (P276) and located in the administrative territorial entity (P131) are not used for this information.
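
The rules above can be expressed as a simple lint over a language item's statements. In this sketch, statements are simplified to a hypothetical mapping from property ID to a list of value labels (not the real Wikibase JSON), and `check_language_claims` is a hypothetical name. It only flags P172, P276 and P131, the properties this page says are not used for this information.

```python
# Properties this data model says are not used for "where spoken" or
# "indigenous to" information on language items.
NOT_USED = {
    "P172": "ethnic group",
    "P276": "location",
    "P131": "located in the administrative territorial entity",
}


def check_language_claims(claims: dict[str, list[str]]) -> list[str]:
    """Return one warning per discouraged property found in `claims`.

    `claims` is a simplified, hypothetical form: property ID -> value labels.
    """
    warnings = []
    for pid in sorted(claims):
        if pid in NOT_USED:
            warnings.append(
                f"{pid} ({NOT_USED[pid]}) is not used here; "
                "use P17 (country) or P2341 (indigenous to) instead"
            )
    return warnings


if __name__ == "__main__":
    example = {"P17": ["Vanuatu"], "P276": ["Vanuatu"]}
    for warning in check_language_claims(example):
        print(warning)
```

The lint does not check whether P17 or P2341 are present, since a language item may legitimately lack either.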

native label (P1705) is used to store the native label (autonym) of the language. It is also sometimes used to add the name in other languages used by the speakers of the language (such as an official language of the country where it is spoken).

If the language is not available in the list of languages offered for the native label, select the code mis and add language of work or name (P407) as a qualifier.
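
The mis fallback can be illustrated by building the statement in Wikibase JSON form. This is a sketch modelled on the Wikibase JSON serialization as we understand it: `SUPPORTED_CODES` is a tiny hypothetical stand-in for the real list of language codes, `native_label_claim` and `item_value` are hypothetical helpers, and "xx" and "Q123456" in the usage lines are placeholders. Check the shape against the Wikibase data model documentation before sending anything to the API.

```python
import json

# Hypothetical stand-in; the real list of supported codes is much larger.
SUPPORTED_CODES = {"en", "de", "ja"}


def item_value(qid: str) -> dict:
    """Wikibase-style datavalue for an item reference such as 'Q188'."""
    return {
        "type": "wikibase-entityid",
        "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
    }


def native_label_claim(text: str, code: str, language_qid: str) -> dict:
    """Build a native label (P1705) statement.

    If `code` is not supported, fall back to 'mis' and record the language
    with a language of work or name (P407) qualifier.
    """
    qualifiers = {}
    if code not in SUPPORTED_CODES:
        code = "mis"
        qualifiers["P407"] = [{
            "snaktype": "value",
            "property": "P407",
            "datavalue": item_value(language_qid),
            "datatype": "wikibase-item",
        }]
    claim = {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": "P1705",
            "datavalue": {
                "type": "monolingualtext",
                "value": {"text": text, "language": code},
            },
            "datatype": "monolingualtext",
        },
    }
    if qualifiers:
        claim["qualifiers"] = qualifiers
    return claim


if __name__ == "__main__":
    print(json.dumps(native_label_claim("Deutsch", "de", "Q188"), indent=2))
    # Placeholder code and item ID, used only to trigger the 'mis' fallback.
    print(json.dumps(native_label_claim("example autonym", "xx", "Q123456"), indent=2))
```

Keeping the language item in a qualifier means the statement still records which language the text is in, even though the monolingual text value itself only says mis.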

The category related to the language can also be linked. Such categories often come from Wiktionary projects, especially for minor languages.

Grammatical and phonetic behaviour

External identifiers

Regional

Other databases which don't yet have an external identifier property can be linked using described at URL (P973), e.g.

CLLD databases

Constructed languages