Grants:Project/0x010C/LinguaLibre

From Meta, a Wikimedia project coordination wiki


status: selected
Lingua Libre
summary: Bring the Lingua Libre project to the next level by moving it to a MediaWiki/Wikibase instance.
target: Wikimedia Commons; Wiktionaries
amount: 30 600 EUR (35 990 USD)
grantee: 0x010C
advisors: Xenophôn, Lyokoï, Yug
contact: wm(_AT_)0x010c.fr
created on: 16:24, 20 July 2017 (UTC)


Project idea

What is the problem you're trying to solve?

What problem are you trying to solve by doing this project? This problem should be small enough that you expect it to be completely or mostly resolved by the end of this project. Remember to review the tutorial for tips on how to answer this question.

Pronunciation audio files are a big help for readers learning how to pronounce a word, but recording them is very time-consuming for volunteers and requires many skills. Pronunciation audio files are of interest to at least 4 Wikimedia projects: Commons (obviously, see WikiProject Pronunciation), Wiktionaries (Template:audio on en, Modèle:prononciation on fr,...), Wikipedias and Wikidata (P443). But only 3% of Wiktionary entries have an audio recording, and the share is even lower for Wikipedia articles or Wikidata items. The Lingua Libre project began two years ago to facilitate such recordings and to encourage contributions in minority languages to Wikimedia projects. It is supported by a team of contributors with different backgrounds, with the help of Wikimedia France.

A lot of progress has been made: the tool is now functional and can carry out massive open audio recordings in many languages, but its effectiveness is still limited. While the tool itself is very efficient (it can record up to 1000 words per hour), the process afterwards, transferring the files to Wikimedia Commons and integrating them into the Wikimedia projects, is quite lengthy. The main reasons are the lack of integration with the Wikimedia projects (users need to create a separate account on a separate website, there is no automatic import process, etc.), a French-only interface, and a sound database that is not easily searchable. These shortcomings waste a lot of volunteers' time and energy, and many volunteers are lost along the way.

What is your solution to this problem?

For the problem you identified in the previous section, briefly describe how you would like to address this problem. We recognize that there are many ways to solve a problem. We’d like to understand why you chose this particular solution, and why you think it is worth pursuing. Remember to review the tutorial for tips on how to answer this question.

To bring Lingua Libre closer to the Wikimedia movement and thus facilitate and encourage contributions, we plan to move the tool to a wiki-based infrastructure. This solution seems very natural for several reasons, including:

  • it will be a well-known environment for experienced Wikimedians
  • it offers many solutions for the needs we have
  • it ensures the sustainability of the project (thanks to its contributors, MediaWiki is a stable and constantly evolving platform), and day-to-day management will be handled by the contributors themselves (as on Wikimedia projects)

To store rich and linguistically useful meta-data on each recording (accent, dialect, gender of the speaker, native language or not, approximate location,...), we also plan to use the Wikibase extension, which gives us an open, flexible and easy-to-explore database (such file meta-data is outside the scope of Wikidata, and Structured Data is not expected to arrive on Commons before 2020). This will allow much more convenient reuse of these freely licensed pronunciation recordings by Wikimedians or third parties such as linguists and language teachers.

Beyond that, we will have to adapt our "recording studio" to work on this new infrastructure. We wish to turn it into a MediaWiki extension (something similar to the UploadWizard, but for direct sound recordings, a kind of RecordingWizard), so that other wikis using MediaWiki can benefit from our work.

Project goals

What are your goals for this project? Your goals should describe the top two or three benefits that will come out of your project. These should be benefits to the Wikimedia projects or Wikimedia communities. They should not be benefits to you individually. Remember to review the tutorial for tips on how to answer this question.

  • Create a recording MediaWiki extension, which can be used by any other wiki
  • Facilitate mass recording, upload of files to Commons and their reuse on other Wikimedia projects
  • Let the general public, Wikimedians or linguists explore and visualize the pronunciation recordings, using a rich metadata system
  • Encourage people from small language communities to join the dynamic of contribution in the Wikimedian universe

Project impact

How will you know if you have met your goals?

For each of your goals, we’d like you to answer the following questions:

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)

For each of your answers, think about how you will capture this information. Will you capture it with a survey? With a story? Will you measure it with a number? Remember, if you plan to measure a number, you will need to set a numeric target in your proposal (i.e. 45 people, 10 articles, 100 scanned documents). Remember to review the tutorial for tips on how to answer this question.

  1. During your project, what will you do to achieve this goal? (These are your outputs.)
    Today, Lingua Libre gathers approximately 130 contributors and has been used to record around 10,000 words in 30 languages. We plan that one year after the start of the grant project (~4 months after its end), we will have 400 contributors and 150,000 files recorded (doubling the number of recordings currently on the English Wiktionary) in 50 languages.
    Furthermore, at the end of the grant period, we will launch an online survey to capture the satisfaction of the contributors, asking them among other things: "Do you find pronunciation recording with Lingua Libre easy?", hoping that at least 85% will agree.
  2. Once your project is over, how will it continue to positively impact the Wikimedia community or projects? (These are your outcomes.)
    Allowing contributions to Wikimedia projects in any language is the objective of Lingua Libre. By improving it, we want to increase participation in minority languages. Contributing through voice will allow new people to enter the Wikimedian universe more easily.

Do you have any goals around participation or content?

Are any of your goals related to increasing participation within the Wikimedia movement, or increasing/improving the content on Wikimedia projects? If so, we ask that you look through these three metrics, and include any that are relevant to your project. Please set a numeric target against the metrics, if applicable.

Briefsäckel (a post box in Alemannisch), recorded using the Lingua Libre tool.
  • Number of content pages created or improved:
    • 150 000 pronunciation files uploaded to Commons
    • 70 000 Wiktionary entries improved with audio records

Project plan

Activities

Tell us how you'll carry out your project. What will you and other organizers spend your time doing? What will you have done at the end of your project? How will you follow-up with people that are involved with your project?


Lingua Libre's current recording studio in use
Develop a new recording Extension for MediaWiki

As Lingua Libre already has a recording tool (which we are very proud of), we will use it as a basis for this new MediaWiki extension, but a lot of development is still needed. We will set up a dedicated page containing a step-by-step procedure (a bit like the UploadWizard), letting the user choose the license, the recording method, the list of words/texts she/he will record and several pieces of meta-data. The core of the tool will also be adapted to comply with MediaWiki's upload process. The user interface will also require some adaptation to fit MediaWiki's environment and respect its guidelines (overhaul to use OOjs-ui, internationalization,...).

Migrate Lingua Libre to a wiki-based architecture

Setting up this wiki will require development work on many fronts. Using both OAuth for authentication (so that contributors can use their Wikimedia account) and Wikibase (to store the meta-data) will require some configuration and adaptation for our use case. We will also need some specific on-wiki JS scripts to help people navigate and contribute (for example to generate random word lists to record in a given language or word lists containing the names of nearby places, to override some Wikibase functionalities to make them easier to use in our case, to generate on-the-fly statistics, etc.). To maintain a visual identity of our own, we will integrate a dedicated skin, based on mock-ups we already have (built on top of an existing skin, maybe Timeless). Finally, we will have to import all the recordings and associated meta-data from our current database, so that previous work doesn't get lost.
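To make the word-list helper mentioned above more concrete, here is a minimal sketch of how a random list of not-yet-recorded words could be drawn. It is written in Python for readability only (the proposal plans on-wiki JS scripts), and the function name `random_word_list` and its parameters are hypothetical, not part of Lingua Libre.

```python
import random


def random_word_list(words, already_recorded, size, seed=None):
    """Pick up to `size` distinct words that have not been recorded yet.

    `words` is the candidate vocabulary for one language and
    `already_recorded` the words the speaker has already recorded.
    Both inputs are assumptions made for this illustration.
    """
    recorded = set(already_recorded)
    # Deduplicate and sort so that a given seed always yields the same list.
    candidates = sorted({w for w in words if w not in recorded})
    rng = random.Random(seed)
    return rng.sample(candidates, min(size, len(candidates)))


if __name__ == "__main__":
    vocabulary = ["chat", "chien", "maison", "pomme", "chat"]
    print(random_word_list(vocabulary, ["pomme"], size=2, seed=42))
```

Passing a seed makes the selection reproducible, which could help when debugging; in normal use it would simply be left out.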

Reuse on Wikimedia wikis

Once we have the infrastructure to perform our mass audio recordings, we still need to make Wikimedia projects benefit from it. Direct upload from the user to Commons is not possible, for two main reasons: Lingua Libre is an external tool (so we run into same-origin policy issues) and the browser audio recording API currently only produces .wav files (which are not allowed on Commons, and thus need conversion). That's why the recorded files will have to pass through our server, which then establishes an OAuth connection with Commons for the final upload. As we aim to provide the easiest possible workflow for contributors, this step will be transparent to them. We will also have to map the collected meta-data to fill in descriptions and categories. For communities that ask for it (currently the French Wiktionary, but ideally all the Wiktionaries and Wikidata; we'll see with them what they think of it), we will develop bot tools that pick up freshly uploaded files and add them to the corresponding entry (when relevant). With that, end users can focus on speaking words, while all the tedious follow-up tasks are done automatically.
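The server-side step described above (convert the .wav file, then map the meta-data to a description and categories) could look roughly like the following sketch. Everything here is an assumption made for illustration: the proposal does not name a conversion tool (ffmpeg with the Vorbis encoder is used as one possible choice), and the meta-data keys and category names are hypothetical. The upload itself, over OAuth, is deliberately left out.

```python
from pathlib import Path


def conversion_command(wav_path):
    """Build (but do not run) an ffmpeg command converting WAV to Ogg Vorbis."""
    ogg_path = Path(wav_path).with_suffix(".ogg")
    return ["ffmpeg", "-i", str(wav_path), "-c:a", "libvorbis", str(ogg_path)]


def commons_description(meta):
    """Map recording meta-data to description wikitext and a category list.

    The keys of `meta` and the category names are hypothetical.
    """
    word = meta["word"]
    language = meta["language"]
    lines = [
        "{{Information",
        f"|description=Pronunciation of the word {word} in {language}",
        f"|date={meta['date']}",
        "|source=Recorded with Lingua Libre",
        f"|author={meta['speaker']}",
        "}}",
    ]
    categories = [f"Lingua Libre pronunciation-{meta['language_code']}"]
    if meta.get("place"):
        categories.append(f"Lingua Libre recordings from {meta['place']}")
    wikitext = "\n".join(lines + [f"[[Category:{c}]]" for c in categories])
    return wikitext, categories


if __name__ == "__main__":
    example = {
        "word": "Briefsäckel",
        "language": "Alemannic",
        "language_code": "gsw",
        "date": "2017-10-24",
        "speaker": "ExampleUser",
        "place": "Alsace",
    }
    print(conversion_command("Briefsäckel.wav"))
    print(commons_description(example)[0])
```

Keeping the command builder separate from the description builder means each piece can be checked without actually calling ffmpeg or contacting Commons.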

Recording session during the French Wikiconvention 2016
Data exploration and visualization

Collecting a lot of useful meta-data and then locking it up is useless. We want to put the meta-data to work by setting up a dedicated SPARQL endpoint to let people create their own visualizations. We will also create a couple of turnkey queries (for example, displaying the pronunciations of a given word according to their location on a map) in a user-friendly interface for the general public.
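A turnkey query of the kind described above might be generated as in the sketch below. The RDF schema is still to be defined in this project, so the `ll:` prefix, its namespace URI and all property names are placeholders invented for this example; only the overall shape of the query is meant to be illustrative.

```python
# Placeholder namespace: the real schema does not exist yet.
PREFIX = "PREFIX ll: <https://example.org/lingualibre/prop/>"


def escape_literal(value):
    """Escape backslashes and double quotes for use in a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def pronunciation_map_query(word, language_code, limit=100):
    """Build a query listing recordings of `word` with the speaker's location."""
    word_lit = escape_literal(word)
    lang_lit = escape_literal(language_code)
    return "\n".join([
        PREFIX,
        "SELECT ?record ?audio ?coordinates WHERE {",
        f'  ?record ll:transcription "{word_lit}" ;',
        f'          ll:languageCode "{lang_lit}" ;',
        "          ll:audioFile ?audio ;",
        "          ll:speakerLocation ?coordinates .",
        "}",
        f"LIMIT {int(limit)}",
    ])


if __name__ == "__main__":
    print(pronunciation_map_query("pain", "fra", limit=50))
```

The coordinates returned for each recording are what a map-based front end would need in order to place the audio samples geographically.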

Outreach

Outreach, communications and learning are the key points that will allow a small community to federate around this project. We will carry out several actions for this purpose:

  • Weekly reports by mail and on a wiki page will inform interested contributors of the progress of the project
  • We will keep the village pumps of several wikis informed each time we reach a significant milestone (something like 3 or 4 times during the grant period)
  • Targeted communication will be directed at small-language wikis shortly before the end of the project, once the tool becomes really functional
  • Give talks and workshops at several events (Wikimedia-related or not) to promote the tool and encourage contributions


Note that all the code written with the help of this grant will be available under the GPL license and accessible on a public git repository.

Budget

How you will use the funds you are requesting? List bullet points for each expense. (You can create a table later if needed.) Don’t forget to include a total amount, and update this amount in the Probox at the top of your page too!

The project is scheduled to run 27 weeks (six and a half months) from December 2017 to June 2018. Only programming tasks, carried out as an independent developer, are counted in the budget; nothing is requested for the project management/administrative part.

| Description | Days | Total (200 EUR/d)[1] |
|---|---|---|
| Develop a new recording Extension for MediaWiki (RecordWizard) | | |
| 1 - Take over and adapt the existing Lingua Libre recording studio | 4 | 800 |
| 2 - Change the interface to use OOjs-ui | 7 | 1400 |
| 3 - Turn it into a MediaWiki extension | 14 | 2800 |
| 4 - Improve the recording studio with alternative recording methods | 9 | 1800 |
| 5 - I18n | 2 | 400 |
| 6 - Develop a guided tour (using the GuidedTour extension) to help newcomers | 5 | 1000 |
| Total | 41 d | 8200 € |
| Migrate Lingua Libre to a wiki-based architecture | | |
| 7 - Setup of a dedicated MediaWiki and Wikibase instance, with the newly-created recording extension | 3 | 600 |
| 8 - Use OAuth for the authentication | 5 | 1000 |
| 9 - Define the RDF Schema | 4 | 800 |
| 10 - Create a dedicated MediaWiki skin | 12 | 2400 |
| 11 - Develop specific on-wiki JS scripts to facilitate the navigation and the modification of items | 18 | 3600 |
| 12 - Initialize the wiki with all the necessary basic Wikibase properties and items | 2 | 400 |
| 13 - Import all our existing sound records in the new database | 4 | 800 |
| Total | 48 d | 9600 € |
| Reuse on Wikimedia wikis | | |
| 14 - Setting up OAuth to allow uploading sounds to Commons | 10 | 2000 |
| 15 - Develop bot-tools to add the uploaded sounds to articles on wikis that asked for it (currently the French Wiktionary) | 15 | 3000 |
| 16 - Contact other communities to extend the reuse of these sounds | 4 | 800 |
| Total | 29 d | 5800 € |
| Data exploration and visualization | | |
| 17 - Setup a SPARQL endpoint | 10 | 2000 |
| 18 - Create turnkey SPARQL queries | 7 | 1400 |
| Total | 17 d | 3400 € |
| Travelling expenses | | |
| 19 - Presentation and workshops at Wikimania 2018 (two people) | / | 3000 |
| 20 - Presentations and workshops across France focused on local languages (Breton, Alemannic, Franco-Provençal) | / | 600 |
| Total | / | 3600 € |
| Total project | 135 d | 30 600 € |
| Total in EUR | | 30 600 € |
| Total in USD | | $35 990[2] |


  1. According to a specialized website [1], the average price of a junior fullstack developer in France is 287 euros per day. As this project represents several months of work, I ask for 200 euros per day (including 23% professional charges payable in France).
  2. Based on the exchange rate on 24 October 2017
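The budget figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the subtotals and the grand total from the day counts, the 200 EUR daily rate and the two travel items listed in the budget; the shortened section labels and variable names are chosen for this example only.

```python
DAILY_RATE_EUR = 200

# Days per task, grouped by budget section (tasks 1-18).
work_packages = {
    "Recording extension": [4, 7, 14, 9, 2, 5],
    "Wiki migration": [3, 5, 4, 12, 18, 2, 4],
    "Reuse on Wikimedia wikis": [10, 15, 4],
    "Data exploration": [10, 7],
}
travel_eur = [3000, 600]  # tasks 19 and 20, not billed per day

subtotals_eur = {
    name: sum(days) * DAILY_RATE_EUR for name, days in work_packages.items()
}
total_days = sum(sum(days) for days in work_packages.values())
total_eur = sum(subtotals_eur.values()) + sum(travel_eur)

# Exchange rate implied by the two totals stated in the budget.
implied_usd_per_eur = 35990 / total_eur

if __name__ == "__main__":
    print(subtotals_eur)
    print(total_days)   # 135
    print(total_eur)    # 30600
    print(round(implied_usd_per_eur, 4))
```

The recomputed values match the table: 135 development days, 27 000 EUR of development work plus 3 600 EUR of travel, for 30 600 EUR in total.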

Community engagement

How will you let others in your community know about your project? Why are you targeting a specific audience? How will you engage the community you’re aiming to serve at various points during your project? Community input and participation helps make projects successful.

Lingua Libre already has a small community of Wikimedians and linguists involved. All these developments will be carried out in collaboration with them, in an iterative pattern. This will allow them to give feedback on a regular basis, and thus make them actively participate in reflections on the growth of the project.

Furthermore, we will regularly inform some communities (Wiktionaries, Commons, etc.) about the major achievements of this project to increase community awareness.

Interested volunteers may take the following roles:

  • Become a Lingua Libre ambassador in a specific community
  • Help translate the website into as many different languages as possible
  • Test the new features, report bugs, give feedback and make suggestions

Get involved

Participants

Please use this section to tell us more about who is working on this project. For each member of the team, please describe any project-related skills, experience, or other background you have that might help contribute to making this idea a success.

| Role | Name | Notes |
|---|---|---|
| Grantee | 0x010C, a.k.a. Antoine Lamielle | Wikimedian volunteer since 2014, sysop on the French Wikipedia, regular Commonist. He is a software engineer, working on many on-wiki technical tasks (maintaining and developing bots, gadgets, templates, modules). |
| Advisor | Nicolas Vion | First developer of Lingua Libre, also known for being the core developer of the Shtooka project (which also deals with massive audio recording). |
| Advisor | Lyokoï | French Wiktionary admin, connoisseur of regional French languages. Co-founder of Lingua Libre, works on the project architecture and its promotion. |
| Advisor | Yug | Community coordination, international workshops (hackathons, Wikimania), Chinese lexicography. |
| Advisor | Xenophôn | Community coordination, national workshops. |
  • Jayreborn (talk) -- Volunteer i have already contributed to LinguaLibre and i like to contribute more for TAMIL.
  • Volunteer CoolCanuck (talk) 02:04, 28 October 2017 (UTC)

Community notification

You are responsible for notifying relevant communities of your proposal, so that they can help you! Depending on your project, notification may be most appropriate on a Village Pump, talk page, mailing list, etc. Please paste links below to where relevant communities have been notified of your proposal, and to any other relevant community discussions. Need notification tips?

Endorsements

Do you think this project should be selected for a Project Grant? Please add your name and rationale for endorsing this project below! (Other constructive feedback is welcome on the discussion page).

  • Lingua Libre is a really nice and innovative tool, which allows to easily bring new audio contents to Wikimedia projects (especially Wiktionary), in various languages from underrepresented parts of the world. But this tool needs to scale up to be widely usable by Wikimedians (including me: I want to use it when it is possible to use my SUL account, and when audio files are automatically uploaded to Commons). 0x010C is a trusted Wikimedian with great skills in programming, and he already worked on the project, with Nicolas Vion, Lyokoï, Yug and Xenophôn. Jules78120 (talk) 20:52, 22 September 2017 (UTC)
  • Serious contributors on a serious project with impressive goals and results. Simon Villeneuve 16:38, 23 September 2017 (UTC)
  • I hereby announce that I endorse this great project that would add immense value to our encyclopedia. 0x010C is a very valuable user of Wikipedia and I'm sure his project is serious. Godspeed! Durifon (talk) 17:41, 23 September 2017 (UTC)
  • Great project, great contributors. Thibaut120094 (talk) 19:40, 23 September 2017 (UTC)
  • This is really great, 0x010c has worked with me on the Wikimooc project and did an amazing job, making interactive tutorials for us. So I strongly support this project which will give a wonderful way to contribute and increase our various projects ! I can't wait :) GrandCelinien (talk) 08:19, 24 September 2017 (UTC)
  • This would be a very useful tool. Trusted and experimented users. ——d—n—f (talk) 08:33, 24 September 2017 (UTC)
  • I really, really like this tool, as it is super-efficient and useful. Indeed, the dis-integration (as in: not existing integration) with the Wikimedia projects is a hard burden to make outreach for it. So I would love to see integrated into the Wikimedia projects. Additionally, by end of 2018, we will have a new Wiktionary on a Wikibase base – just in time for this tool. So please fund it, I think Movement money is really well used with this grant. Best regards, --Jcornelius (talk)
  • The planned execution seems great, and perspectives for the tool are impressive, I would love to be able to read articles about places all over the world and hear how their names are pronounced, same for famous people, … the list of examples could be endless ! Sukkoria (talk) 12:48, 24 September 2017 (UTC)
  • Sounds to be very useful and promising. --Visem (talk) 11:09, 25 September 2017 (UTC)
  • This is a great project. I would be happy to see it help pronouncing African words. BamLifa (talk)
  • The tool is already functional and can be of great help to improve coverage of every languages, including endangered languages with very few access online. As a field linguist, I already did wordlist recordings in the field and the tool I was using was less efficient than Lingua Libre! No doubt of the multiple uses it can have: in Wiktionary, in Wikidata and outside of the wikisphere as well. Noé (talk) 17:13, 25 September 2017 (UTC)
  • Support JackPotte (talk) 17:15, 25 September 2017 (UTC)
  • Support I fully agree with what has been written by other supporters. Pamputt (talk) 18:26, 25 September 2017 (UTC)
  • Support Unsui (talk) 21:06, 25 September 2017 (UTC)
  • Support Very well-scoped proposal, will definitely bring the community together. This tooling is core to a useful shift from (too much) literacy to more orality. It is easy to devise how a better audio wiki coverage could benefit the whole media-wiki applicative ecosystem. Improving user experience in such area to foster content generation is therefore pivotal. Xentyr (talk) 21:23, 26 September 2017 (UTC)
  • I endorse this because it is an important, future-generation project which will be useful for people in all domains. Jayreborn (talk) 09:18, 28 September 2017 (UTC)
  • Support --Psychoslave (talk) 19:50, 28 September 2017 (UTC)
  • Support A very interesting tool for all languages but specially for small languages. As I have already tested it, it's really useful! -Xabier Cañas (talk) 07:51, 29 September 2017 (UTC)
  • Support Yug 12:29, 29 September 2017 (UTC) Lingua Libre has proven a very popular project with the current proof of concept. Infrastructure needs a boost for scalability and internationalisation.
  • Support Audio on Wiktionaries is currently undersupported and has great potential - Jberkel (talk) 12:13, 17 October 2017 (UTC)
User:Amqui is recording User:Cecile169 as part of Atikamekw Language Project.
  • Support I am the volunteer Lingua Libre representative in Canada and I use it to record Canadian Aboriginal languages. The tool is very great and easy to use, I only had good feedback. However, the uploading to Commons part is very time consuming and a lot of files are still waiting on my hard drive waiting for me to have time. This is very much needed. Amqui (talk) 10:17, 20 October 2017 (UTC)
  • Strong support. At Wikimedia Canada, we use Lingua Libre to record words in First Nation languages. And the biggest part is to upload the files. Making this part of the recording process would be more than welcome. Best. Benoit Rochon (talk) 15:13, 20 October 2017 (UTC)
  • Support I discovered Lingua Libre during Wikimania 2017 and the tool is already very interesting and powerful. Integrating it within the MediaWiki environment is the right way to improve it and will make it easier to translate into different languages. This tool really deserves worldwide recognition. Djiboun (talk) 22:11, 27 October 2017 (UTC)
  • Support Anthere (talk)
  • At 31 k€, a project to enrich 70k Wiktionary entries with audio is good value for money. Based on the talk page clarifications and Amqui's comment above about an untapped reserve of recordings to be put into use, it seems there is a good chance that LinguaLibre enable such an achievement, and points 14–16 make a lot of sense to me. I've not reviewed the technical project behind points 1–13 in detail so I won't express an opinion on those. Besides, I met 0x010C at some hackathons and I found he's cooperative with the larger Wikimedia technical community, although I didn't have a chance to work him closely; I've also known Amqui's tireless efforts around Canadian languages (and more) for a few years. Nemo 15:49, 6 November 2017 (UTC)
  • Support Popular project, big outcome Thekidpossum (talk) 21:44, 7 November 2017 (UTC)
  • Support Great project, great contributors, I discovered LinguaLibre during Wikimania 2017, and I use it to record the Shawiya language, uploading to Commons and to the French Wiktionary. Great11 (talk)
At Wikimania 2017 Montreal: Lingua Libre project, Shawiya language