BELONGG - AI-BASED INDIAN LANGUAGE CORPUS TRANSLATION TOOL FOR BELONGGAI - A PLATFORM FOR INTERSECTIONAL INCLUSION IN DEVELOPMENT PROGRAMS

Samagra-Code for GovTech
  • Virtual Internship
  • 17-Apr-2024
  • Pan India
  • Start date
    Immediately
  • Duration
    3 Months
  • Stipend
    ₹33000 /month
  • No of Credits
    10
  • Apply by
    08-May-2024

About the program

Belongg is developing BelonggAI, a tool that will help development practitioners, researchers, funders, etc. analyze their proposals, program documents, policy documents, etc. to discover intersectional perspectives (gender, disability, sexual orientation, caste, religion, etc.) that could be added to make the program more inclusive. The tool is based on a RAG architecture with customized prompts and a corpus running into thousands of research papers, media articles, and grey literature. This corpus is constantly growing as we expand our focus areas. However, all of it is still in English. To avoid omitting the knowledge produced by and on marginalized and underserved communities, we are committed to building on the tool's capability to process knowledge produced in languages other than English.

To this end, Belongg would like to undertake a project to develop a tool that translates text documents, audio, and videos in Indian languages into PDFs with English text. The developed tool will be embedded in our existing LLM, which can only process English text.

While working on this project, the intern selected as part of DMP will receive guidance from a Belongg mentor to coordinate with our technology team, which includes an LLM engineer and colleagues from ARTPark, and will receive mentorship from a separate mentor assigned by Samagra to provide technical assistance to the mentee.

Goals and Midpoint Milestone

Goals

  • Development of Multilingual Translation Tool
  • Quality Assurance and Performance Metrics
  • Integration and Usability with BelonggAI
  • Documentation and Training Materials
  • Progress Reporting and Feedback Mechanism
  • Goals Achieved By Midpoint Milestone

Expected Outcome

  • User-Friendly Input Interface: Development of an access-controlled webpage where Belongg team members can submit knowledge assets in various formats (text files, PDFs, audio files, video files) and languages. This interface should allow for the submission of URLs for media content or direct uploads of the files.
  • Batch Processing and Metadata Management: The interface must support batch uploads, enabling users to add multiple knowledge assets simultaneously. Each file should have an option to include metadata (e.g., title, source link, tags related to content type). The Belongg team adds hundreds of knowledge objects to the corpus each week. This feature will ensure organized and efficient handling of large volumes of data.
  • Translation and Conversion to English Text: All uploaded knowledge assets will be automatically processed to convert and translate the content into English text, maintaining the integrity and context of the original materials.
  • Integration with Google Drive and Sheets: The translated and converted content, along with its metadata, will be systematically stored in a designated Google Drive folder. A Google Spreadsheet will be programmatically updated with the status of each submission, including links to the processed files in Drive, ensuring efficient tracking and management of the knowledge assets.

Acceptance Criteria

  • Functionality of the Input Interface: The webpage must be secure, user-friendly, and capable of handling multiple file uploads with associated metadata. Only authorized Belongg team members should access this portal.
  • Accuracy and Reliability of Translation: The system must deliver high-quality translations with a predefined accuracy threshold (e.g., 95% accuracy), ensuring the content is contextually and culturally accurate in English.
  • Efficient Batch Processing: The tool should handle batch uploads seamlessly, with each file's metadata accurately captured and associated with the corresponding translated content in the output.
  • Seamless Integration and Data Management: Successful integration with Google Drive for storing translated files and Google Sheets for real-time status updates. The system should maintain a high level of organization, allowing easy retrieval and tracking of processed knowledge assets.
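As a rough illustration of the batch-upload, metadata, and status-tracking requirements described above, the following Python sketch shows one way a batch could be processed so that each asset produces exactly one status row. Everything named here (`KnowledgeAsset`, `process_batch`, `fake_translate`, the field names, the `example.org` links) is a hypothetical placeholder and not part of BelonggAI. The translator is a stand-in function, and no Google Drive or Sheets calls are made; the returned rows are simply the kind of records that could later be written to a tracking spreadsheet.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class KnowledgeAsset:
    """One submitted item plus the metadata fields mentioned in the posting."""
    title: str
    source_link: str
    language: str       # source language code, e.g. "hi" (assumed convention)
    content_type: str   # "text", "pdf", "audio", or "video"
    text: str           # extracted or transcribed source-language text
    tags: list[str] = field(default_factory=list)


def process_batch(
    assets: Iterable[KnowledgeAsset],
    translate: Callable[[str, str], str],
) -> list[dict[str, str]]:
    """Translate each asset and return one status row per asset.

    A failure on one asset is recorded in its row and does not stop the batch.
    """
    rows = []
    for asset in assets:
        # Metadata is copied into the row so it stays attached to the output.
        row = {
            "title": asset.title,
            "source_link": asset.source_link,
            "language": asset.language,
            "content_type": asset.content_type,
            "tags": ", ".join(asset.tags),
        }
        try:
            row["english_text"] = translate(asset.text, asset.language)
            row["status"] = "done"
        except Exception as exc:
            row["english_text"] = ""
            row["status"] = f"failed: {exc}"
        rows.append(row)
    return rows


def fake_translate(text: str, language: str) -> str:
    """Stand-in translator for the demo; a real system would call an MT model."""
    if not text:
        raise ValueError("empty input")
    return f"[en from {language}] {text}"


batch = [
    KnowledgeAsset("Report A", "https://example.org/a", "hi", "pdf", "namaste", ["gender"]),
    KnowledgeAsset("Clip B", "https://example.org/b", "ta", "audio", ""),
]
rows = process_batch(batch, fake_translate)
for row in rows:
    print(row["title"], "->", row["status"])
```

Passing the translator in as a function keeps the batch logic independent of whichever translation or transcription service is eventually chosen, and recording failures per row means one bad file does not block the rest of a large weekly upload.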

Perks

  1. Lucrative stipend of INR 1 lakh over a period of 3 months
  2. Dedicated 1:1 mentorship by industry experts
  3. Hands-on experience to hone your skills
  4. Access to bootcamps and expert sessions
  5. Potential job or extended internship opportunities
  6. Opportunity to network with global open-source tech leaders

Who can apply?

Only those candidates can apply who:

  1. are from any academic background,
  2. have any specialisation,
  3. are available for the full duration of 3 months,
  4. have relevant skills and interests.

Terms of Engagement

  1. INR 50,000 received on completion of the midpoint milestone, as decided with the mentor
  2. INR 50,000 received on completion of the final milestone, as decided with the mentor
  3. Certificate of completion received on successful completion of the internship

Number of openings

1
