AmericasNLP 2026 Shared Task: Cultural Image Captioning for Indigenous Languages
Important! See the submission procedure!
The AmericasNLP 2026 Shared Task challenges participants to develop systems that generate accurate, culturally grounded captions for images depicting Indigenous cultures of the Americas, written in the Indigenous languages themselves.
GitHub: https://github.com/AmericasNLP/americasnlp2026
Motivation
Many Indigenous languages of the Americas are endangered and lack the resources needed to train NLP systems effectively. Language communities are actively pursuing revitalization, but creating culturally grounded teaching materials is expensive and time-consuming. Image captioning systems offer a way to generate such materials at scale, but doing so requires cultural knowledge as well as linguistic competence: an understanding of the people, traditions, and contexts depicted in the images.
Task Description
Participants are given a dataset of culturally situated images, each paired with a caption in the associated Indigenous language. The goal is to generate captions for unseen images.
Example:
| Image | A wooden structure |
| Target Caption (Wixárika) | Ik+ kareta m+ya kaxetuni wixárika wapait+ yu +kú puti utá, uti xainék+ metá tsiere manapait+ rá ye hupú. |
| English | The so-called carretón, built specifically to store food like corn, is also used as housing for people. |
Rules
- Participants may use the provided training and development data, plus any additional resources (external data, pretrained models, etc.).
- Participants must not create test outputs manually or train on the test sets. (UPDATE: participants are allowed to use the development set for training.)
Evaluation
We adopt a two-stage evaluation protocol:
- Stage 1: All systems are ranked using ChrF++.
- Stage 2: The top-5 systems are evaluated by human judges according to a fixed set of criteria.
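
To give a feel for what the Stage 1 metric rewards, here is a simplified, sentence-level sketch of a ChrF++-style score: an F-beta score over character n-grams (orders 1 to 6) and word n-grams (orders 1 to 2). It is an illustration under assumptions, not the organizers' scorer. The function names, the whitespace handling, and the plain averaging over n-gram orders are choices made for this sketch, so its numbers may differ from those of a standard implementation such as sacreBLEU.

```python
from collections import Counter


def char_ngrams(text, n):
    # Character n-grams with whitespace removed (an assumption of this sketch).
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text, n):
    # Word n-grams over whitespace-separated tokens.
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified ChrF++-style score in [0, 100] for one caption pair."""
    extractors = [(char_ngrams, n) for n in range(1, char_order + 1)]
    extractors += [(word_ngrams, n) for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for extract, n in extractors:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        if not hyp or not ref:
            continue  # skip orders that one of the two strings is too short for
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))

    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    # F-beta with beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

Because the score is dominated by character n-grams, a caption that gets most morphemes right still earns partial credit even when whole words do not match exactly.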
Languages
| Language | Region |
|---|---|
| Bribri | Costa Rica |
| Guaraní | Paraguay |
| Yucatec Maya | Mexico |
| Wixárika | Mexico |
| Surprise language: Nahuatl | Mexico |
Data
Pilot
Pilot data is available under data/pilot/. Each dataset is provided as a JSONL file with corresponding images. See data/pilot/wixarika.jsonl for an example.
⚠ Note: The pilot data includes Spanish captions for reference, but these are provided only in the pilot set. Spanish captions will not be included in the development or test sets and should not be relied upon for building systems.
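
A minimal sketch of reading one of these JSONL files, assuming only that each non-empty line is a single JSON object. The field names inside each record are not specified here, so the `spanish_caption` key in the usage note is a hypothetical placeholder; check the actual keys in the file before relying on it.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, one per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from error
    return records


def drop_fields(records, fields):
    """Return copies of the records without the given keys."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]
```

For example, `drop_fields(load_jsonl("data/pilot/wixarika.jsonl"), {"spanish_caption"})` would remove a Spanish caption field (name assumed) so that a pipeline developed on the pilot set does not come to depend on data that is absent from the development and test sets.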
Development
Development data is available under data/dev/ for Bribri, Guaraní, Maya and Wixárika. Each language folder contains a JSONL file and corresponding images. Please also consider the surprise language: Orizaba Nahuatl (nlv), part of the broader Nahuatl (nah) language family!
Important Dates
| Date | Milestone |
|---|---|
| | Release of pilot data and baseline system |
| | Release of development sets (50 examples) |
| | Release of surprise languages |
| | Release of test sets |
| | Submission of results (shared task deadline) |
| | Winner announcement |
| | Submission of system description paper |
| | Acceptance notification for system description papers |
| May 22, 2026 | Camera-ready version due |
All deadlines are 11:59pm UTC-12h (AoE).
Registration
If you are interested in participating, please register here: Google Form
Submission
Send your submission via email to americas.nlp.workshop@gmail.com. In the email body, include:
- Line 1: Team name
- Line 2: Names of all team members
- Line 3: All languages you are sending submissions for, in the order of your choice (we will use this to double-check that we received all the files you intended to send) [optional]
- Line 4: A link to a GitHub repository with code that can be used to reproduce your results. This is not required to participate in the shared task, but it is strongly encouraged.
Please attach all output files to your email as a single zip file, named after your team (e.g., `TeamName.zip`). Within that zip file, individual files should be named `