User:John Cummings/wikidataimport guide
Data import guide
This guide has been created for anyone wishing to import data into Wikidata. You may also find these related resources helpful:
Data Import Hub
Importing data into Wikidata requires many skills, but the process can be broken down into individual steps, which means the Wikidata community can work together to import data. The prerequisite skills to get started importing data are:
The process of uploading data to Wikidata can be broken down into the steps below, each of which is divided into parts. Preparing the data requires minimal technical skills. The import itself can be done either by requesting that the data be imported by bot (highly recommended) or by importing it yourself (only for experts and not yet documented).
Step 1: Choose data to import
Data imported into Wikidata must be:
If in doubt, please ask about the dataset on the Partnerships and Data Imports discussion page.
Step 2: Create a data import request
Part A: Go to the Data Import Hub and follow the instructions to create a new request.
Part B: Add the table and subheading as outlined in the Instructions for data importers section of the Data Import Hub. Please do this even if you are going to import the data yourself: it allows others to help you and to understand what you have done when the data is updated in future.
Step 3: Describe the dataset
Step 4: Import the data into a spreadsheet
Step 5: Define the structure of the data within Wikidata
This step is often the most difficult, but many knowledgeable people in the Wikidata community can work with you on it via the Wikidata:Partnerships and data imports page.
Part A: Look at the Wikidata glossary to understand the terms used in the following steps.
Part B: Look at examples of potentially similar data within Wikidata to understand what structure is already used for items.
Part C: Outline the structure within Wikidata in the table on the dataset import page. Break the dataset down into which parts of the data will be items, properties and values, and whether any qualifiers are needed. Also record any issues or notes about the data, e.g. whether the data is complete or whether it is related to any other datasets. Add what work has been done and any work still to do, e.g. proposing properties. If you need help defining the structure of the dataset, ask on the data imports talk page.
Part D: Create one or more example items with the data structured in the way described. These practical examples will show how the data will be structured within an item and surface any issues in implementing the proposed data structure.
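The breakdown into items, properties, values and qualifiers described in Part C can be sketched in code before any example item is created. The following is only an illustration under stated assumptions: the class names, the column headings and the property labels in `COLUMN_TO_PROPERTY` are hypothetical, and real property IDs would still need to be looked up or proposed on Wikidata.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property/value pair on an item, with optional qualifiers."""
    prop: str    # property label (hypothetical; look up the real property ID)
    value: str   # value kept as text here; a real import needs typed values
    qualifiers: dict = field(default_factory=dict)


@dataclass
class ItemPlan:
    """How one spreadsheet row is expected to map onto a Wikidata item."""
    label: str
    description: str
    statements: list = field(default_factory=list)


def plan_item(row, column_to_property, label_column, description_column):
    """Turn one spreadsheet row (a dict) into an ItemPlan, skipping empty cells."""
    plan = ItemPlan(label=row[label_column], description=row[description_column])
    for column, prop in column_to_property.items():
        cell = row.get(column, "").strip()
        if cell:
            plan.statements.append(Statement(prop=prop, value=cell))
    return plan


# Hypothetical mapping from column headings to property labels.
COLUMN_TO_PROPERTY = {
    "Country / countries": "country",
    "Designation year": "designation year",
    "Year withdrawn": "year withdrawn",
}

row = {
    "Name of Site": "Yangambi",
    "Description": "Biosphere reserve in Democratic Republic Of The Congo, designated in 1976",
    "Country / countries": "DEMOCRATIC REPUBLIC OF THE CONGO",
    "Designation year": "1976",
    "Year withdrawn": "",
}
plan = plan_item(row, COLUMN_TO_PROPERTY, "Name of Site", "Description")
```

Writing the mapping down this way makes gaps visible early: any column without an entry in the mapping still needs a property decision.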
Step 6: Format the data to be imported
Part A: Duplicate the Original dataset sheet within your spreadsheet and rename the copy Structured for Wikidata.
Part B: Reorder your spreadsheet to use the following structure to make it easier for the people importing the data into Wikidata. A downloadable version of this format is available here.
Unique ID | Name / Title | Description for Wikidata | Description for importing data | URL | More data 1 | More data 2 |
Unique ID: A set of numbers, letters or other characters that uniquely identifies each item in your dataset. This allows us to create a map from your dataset to the corresponding Wikidata items.
Data can be imported without this, but it is strongly recommended to create an ID system if you do not already have one, as the import process becomes significantly easier (there are a range of other benefits too, such as increased discoverability of your content). NOTE: If the donating organisation does not have an ID system and cannot create one internally, the data importer will make up an ID system when they upload the data. The recommended format is FAKE_ID_$ (with $ representing a number).
Name / Title: The name or title of each item that you have some data about.
For example, if you were donating data about people (dates of birth, occupation, place of death etc.), then this column should show the name of each person in the dataset. If you were donating data about books, the title of each book would be shown. NOTE: If you have names of your items in multiple languages, include an additional column for each language.
Description for Wikidata: A short description of the item, from a few words up to a sentence. This will describe the item within Wikidata. Descriptions can be created by combining data fields within the dataset, e.g. for a dataset of Biosphere Reserves where data on the country and year of inscription was available, the description could be 'Biosphere reserve in Democratic Republic Of The Congo, designated in 1976.'.
Description for importing data: A short description of the item, from a few words up to a paragraph. This field can be the same as the Description for Wikidata field. It is not imported into Wikidata; its purpose is to help match items in your dataset with Wikidata items unambiguously.
For example, the description would help us distinguish two people of the same name by providing some extra information about their lives (e.g. occupation and date of birth). NOTE: This column is not essential if you are providing data in other columns that can be used to disambiguate. For example, if 'occupation' and 'country of citizenship' are given in other columns, this would usually be enough to identify a person uniquely (along with their name, of course).
URL: If applicable, include a URL to a page about the item on your website.
For example, a digital collection of a museum would have a page on its site for each item in the collection. NOTE: If your website has a URL pattern for getting to an item's page from the unique ID number, then you can just provide us with one example (e.g. www.example.com/collection/12345). We also need the unique IDs given in column A to make use of the pattern.
More data 1: Any other data about an item that you would like to make available for import into Wikidata.
The heading of this column might be "date of birth", "population", "area in square meters", "occupation", "height", "colour", or any other meaningful type of data that you have for some or all of the items in the dataset.
More data 2: You can add as many additional columns as you like for additional points of data.
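The column layout above, the FAKE_ID_$ fallback and the URL pattern note can be sketched as a small script. This is a minimal illustration rather than an official tool: the function names are hypothetical, rows are assumed to be dictionaries keyed by column heading (as produced by a CSV reader), and the URL pattern is the www.example.com example from the text.

```python
STANDARD_COLUMNS = [
    "Unique ID",
    "Name / Title",
    "Description for Wikidata",
    "Description for importing data",
    "URL",
]


def assign_ids(rows, id_column="Unique ID", prefix="FAKE_ID_"):
    """Fill empty ID cells with FAKE_ID_1, FAKE_ID_2, ... in row order."""
    counter = 0
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original dataset is left untouched
        if not new_row.get(id_column, "").strip():
            counter += 1
            new_row[id_column] = f"{prefix}{counter}"
        result.append(new_row)
    return result


def reorder_columns(row):
    """Put the standard columns first, then any extra data columns."""
    extras = [column for column in row if column not in STANDARD_COLUMNS]
    return {column: row.get(column, "") for column in STANDARD_COLUMNS + extras}


def url_from_id(unique_id, pattern="www.example.com/collection/{id}"):
    """Build an item URL from its unique ID using a site-specific pattern."""
    return pattern.format(id=unique_id)


rows = [
    {"Name / Title": "Yangambi", "Designation year": "1976", "Unique ID": ""},
    {"Name / Title": "Luki", "Designation year": "1976", "Unique ID": "12345"},
    {"Name / Title": "Touran", "Designation year": "1976"},
]
structured = [reorder_columns(row) for row in assign_ids(rows)]
```

Rows that already carry an ID keep it; only the gaps receive placeholder IDs, so the mapping back to the donor's own records is preserved.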
As an example, here is a small section of the spreadsheet structure used to import data from the UNESCO Man and the Biosphere Programme.
Name of Site | Description | URL | Country / countries | Designation year | Year withdrawn | Midpoint Latitude | Midpoint Longitude | Total area of the newest data (ha) | Area of all core zones | Area of all buffer zones | Area of all transition zones |
Yangambi | Biosphere reserve in Democratic Republic Of The Congo, designated in 1976 | http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/africa/democratic-republic-of-the-congo/yangambi/ | DEMOCRATIC REPUBLIC OF THE CONGO | 1976 | | 0.3333333333 | 24.5 | 220000 | 160000 | 60000 | |
Luki | Biosphere reserve in Democratic Republic Of The Congo, designated in 1976 | http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/africa/democratic-republic-of-the-congo/luki/ | DEMOCRATIC REPUBLIC OF THE CONGO | 1976 | | -5.633333333 | -13.18333333 | 32968 | 6816 | 5216 | 20936 |
Touran | Biosphere reserve in Islamic Republic Of Iran, designated in 1976 | http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/asia-and-the-pacific/islamic-republic-of-iran/touran/ | ISLAMIC REPUBLIC OF IRAN | 1976 | | 35.61 | 56.01 | 1459506.2 | 730599.3 | 635003.7 | 93903.2 |
Miankaleh | Biosphere reserve in Islamic Republic Of Iran, designated in 1976 | http://www.unesco.org/new/en/natural-sciences/environment/ecological-sciences/biosphere-reserves/asia-and-the-pacific/islamic-republic-of-iran/miankaleh/ | ISLAMIC REPUBLIC OF IRAN | 1976 | | 36.5 | 53.65 | 96678.5 | 24950 | 42038.5 | 29690 |
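The Description column in this example was built by combining the country and designation year fields. A minimal sketch of that step, assuming the country is stored in upper case as in the table and that the function name and argument order are free choices, might look like this:

```python
def build_description(kind, country, year):
    """Combine dataset fields into a short item description.

    The country is converted from upper case to title case, which is how
    the descriptions in the example table are capitalised.
    """
    return f"{kind} in {country.title()}, designated in {year}"


description = build_description(
    "Biosphere reserve", "DEMOCRATIC REPUBLIC OF THE CONGO", "1976"
)
```

Simple title-casing capitalises every word, including "Of" and "The", so generated descriptions are worth a manual read-through before import.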
Step 7: Importing the data into Wikidata
Option 1: Request that the data be imported by other people
Step A: Request that the data be imported into Wikidata on the Wikidata bot request page. To make a request, click the Add a new request button, then link to your request on the Data Import Hub page.
Step B: Once the import has started, check the Manual work needed section of the table to see what work has to be done manually.
Option 2: Self import
Step 1: Matching
Before you can import data about any list of items, you need to know the corresponding Wikidata ID for each item in the list (essentially each row of your spreadsheet). You will sometimes also need to find Wikidata IDs for the people, places, concepts or other things that are used to describe your main list of items.
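One possible way to look for candidate Wikidata IDs is the `wbsearchentities` action of the Wikidata API, which searches items by label. The sketch below only builds the request URL and reads a response that has already been decoded from JSON; it makes no network call. The helper names, the "accept only a single hit" rule and the sample response (including its placeholder ID) are assumptions for illustration, not part of this guide's documented process.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def search_url(name, language="en", limit=5):
    """Build a wbsearchentities request URL for one item name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def candidates(response):
    """Extract (id, label, description) tuples from a decoded API response."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def pick_match(found):
    """Accept a match only when exactly one candidate exists; otherwise
    return None so that the row is reviewed by hand."""
    return found[0][0] if len(found) == 1 else None


# Made-up response for illustration; the ID is a placeholder, not a real item.
sample_response = {
    "search": [
        {"id": "Q_PLACEHOLDER", "label": "Yangambi", "description": "example"}
    ]
}
url = search_url("Yangambi")
match = pick_match(candidates(sample_response))
```

Names are often ambiguous, so comparing the returned descriptions with the Description for importing data column is a sensible check before accepting any match.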
Step 2: Adding data
Step 8: Check the data import
Once the data has been imported into Wikidata, request a query at Wikidata:Request a query to check that it has been imported correctly and to highlight any issues that arose during the import. A list of useful queries for checking an import will be added here soon.
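As a rough idea of what such a check could involve, the sketch below builds a simple SPARQL query listing every item that has a value for a given property, then compares the IDs that come back with the IDs in the spreadsheet. The function names and the property ID passed in are placeholders, and the query results are assumed to have already been collected into a list.

```python
def build_check_query(property_id, limit=1000):
    """Build a SPARQL query listing items and their value for one property.

    `wdt:` is the prefix the Wikidata Query Service predefines for direct
    property values; `property_id` must be replaced with the real property.
    """
    return (
        "SELECT ?item ?value WHERE {\n"
        f"  ?item wdt:{property_id} ?value .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def find_missing(expected_ids, returned_ids):
    """Return the spreadsheet IDs that the query did not return, in order."""
    returned = set(returned_ids)
    return [unique_id for unique_id in expected_ids if unique_id not in returned]


query = build_check_query("P_PLACEHOLDER")  # placeholder, not a real property
missing = find_missing(["A1", "A2", "A3"], ["A1", "A3"])
```

An empty result from `find_missing` suggests every row reached Wikidata; anything left in the list points to rows that need a second look.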