GAIA AND NIKE PROJECTS

Here you will find information on the twin projects, Gaia and Nike. The two projects are distinct but closely related!

  • For the Gaia project, you will create prompts with synthetic PII, ensuring alignment with metadata, PII types, and contextual relevance.
  • For the Nike project, you will help us by identifying and labelling PII spans accurately, following the PII type definitions, as well as the format and specific guidelines.

Please find below the links to the relevant resources and training for these projects:

Guidelines:

Gaia (Authoring)
Nike (Labelling)

PII entities:

AR
FI-FI
HI-IN
NB-NO
NL-NL and NL-BE
PL-PL
PT-BR and PT-PT
SV-SE
VI-VN
ZH-CN and ZH-SG

Training Modules:
Gaia Prompt Authoring Orientation Module
Nike PII Annotation Orientation Module

How to check PII is not real (for Gaia):
For the Gaia project, to verify that the PII data points are synthetic (fake) and carry no risk of being linked to real individuals, use the Fake PII Checks resources. The PDF provides instructions and context for interpreting and using the Excel sheet. Refer to the PDF for guidance while performing PII validation tasks in the Excel file.

How to use PII checks excel
PII checks

USING PII ENTITIES FILES

The PII Entities files provide a comprehensive reference for working with PII in this project. They include details about each PII type, such as the domain (e.g., Travel, Finance), description, format requirements, and examples. These files can help you ensure consistent and accurate application of PII types during prompt creation and labeling. Please keep them in mind during your work!

PII Entities files for Prompt Authoring (GAIA)

In this project, you will work with PII types to create prompts accurately. The process involves looking up the PII type required by the metadata and authoring synthetic PII that matches the metadata and domain requirements. Understanding how to correctly apply PII types and formats is essential for ensuring high-quality outputs and compliance with project guidelines.

  • When drafting the prompts, refer to the pii_type row (e.g., NAME, PHONE, EMAIL, PASSPORT_NUMBER_XX) specified in the metadata found in the webapp.
  • Look for the PII Type column in the PII Entities file for your locale.
  • Follow the format described under the Description column for each PII type, ensuring compliance with any locale-specific rules (e.g., Portuguese phone numbers must include a country code).
  • Use the Examples to ensure the generated PII matches the required structure and style. Examples include:
    • NAME: “My name is João Manuel Andrade Souza Pereira.”
    • PHONE: “My cell phone number is +351936658892.”
    • EMAIL: “My e-mail address is sandra.pereira@gmail.com.”
  • Include realistic and contextually relevant PII in the prompts you author, ensuring the format aligns with the metadata. DO NOT just copy the examples found in the Entities files. These are for guidance only!
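
The checks above can be sketched in code. The following is a minimal illustration only, not part of the project tooling: the function name, the set of known examples, and the phone pattern (a leading "+" followed by 7 to 15 digits) are assumptions made for this sketch. The actual format rules are the ones given in the Description column of the Entities file for your locale.

```python
import re

# Assumption for this sketch: "includes a country code" is approximated as
# a leading "+" followed by 7 to 15 digits, the first of which is not 0.
# The authoritative rule is the Description column of the Entities file.
PHONE_WITH_COUNTRY_CODE = re.compile(r"\+[1-9]\d{6,14}")

# Values shown as examples in the Entities files. They are for guidance
# only and must not be reused in authored prompts.
ENTITY_FILE_EXAMPLES = {
    "João Manuel Andrade Souza Pereira",
    "+351936658892",
    "sandra.pereira@gmail.com",
}


def check_authored_phone(value: str) -> list[str]:
    """Return a list of problems found in an authored PHONE value."""
    problems = []
    if not PHONE_WITH_COUNTRY_CODE.fullmatch(value):
        problems.append("missing country code or unexpected format")
    if value in ENTITY_FILE_EXAMPLES:
        problems.append("copied from the entities file examples")
    return problems
```

An empty list means the value passed both checks of this sketch. It does not replace the Fake PII Checks, which are still needed to confirm that the value is not linked to a real individual.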

PII Entities files for Prompt Labelling (NIKE)

In this project, you will work with PII types to label PII spans accurately. The process involves identifying and tagging the correct PII spans based on the required PII type detected in the prompt. Understanding how to correctly apply PII types and formats is essential for ensuring high-quality outputs and compliance with project guidelines.

  • Review the text to identify any spans containing PII (e.g., names, phone numbers, emails, etc.).
  • Once you believe you have identified the PII type, use the PII Type column in the entities file to confirm the type of PII you want to label.
  • Refer to the Description column to ensure the detected PII spans match the required format and comply with any locale-specific rules (e.g., Portuguese passport number format).
  • Use the examples provided in the entities file to validate your labeling. Examples include:
    • NAME: “João Manuel Andrade Souza Pereira”
    • PHONE: “+351936658892”
    • EMAIL: “sandra.pereira@gmail.com”
  • Label realistic and contextually relevant PII spans based on the text provided, ensuring the spans align with the entity descriptions.
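
To make the idea of a span concrete, here is a minimal sketch. It assumes a span is recorded as a pair of character offsets plus a PII type. The labelling tool and its actual data format are not described on this page, so the `Span` class and the `label_span` helper are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: character offsets plus a PII type."""
    start: int  # offset of the first character of the span (inclusive)
    end: int    # offset just past the last character (exclusive)
    pii_type: str


def label_span(text: str, value: str, pii_type: str) -> Span:
    """Locate the first occurrence of value in text and label it."""
    start = text.find(value)
    if start == -1:
        raise ValueError(f"{value!r} does not occur in the text")
    return Span(start, start + len(value), pii_type)


text = "My cell phone number is +351936658892."
span = label_span(text, "+351936658892", "PHONE")

# The span covers only the PII value itself, not the surrounding
# sentence or the trailing full stop.
labelled_text = text[span.start:span.end]
```

Note the contrast with the Gaia examples: there the example is a whole sentence, while here only the PII value inside the sentence is labelled.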
gaia.txt · Last modified: 2025/01/16 15:58 by sergio