TATTLE - CLUSTERING LARGE AMOUNTS OF AUDIO FOR FELUDA

Samagra-Code for GovTech
  • Virtual Internship
  • 16-Apr-2024
  • PAN India
  • Start date
    Immediately
  • Duration
    3 Months
  • Stipend
    ₹33000 /month
  • No of Credits
    10
  • Apply by
    08-May-2024

About the program

Description

Feluda allows researchers, fact-checkers, and journalists to explore and analyze large quantities of multimedia content. One important modality on Indian social media is audio. The scope of this task is to explore various automated techniques suited for grouping similar audio together and visualizing the groups. After consultation with the team, implement an end-to-end workflow that can be used to surface visual or temporal trends in a large collection of audio.

Goals

  • Review the literature with our team, then research and prototype to evaluate state-of-the-art ML and classical DSP techniques.
  • Optimize the solution for consistent RAM and CPU usage. Limit the spikes caused by variables such as file size and video length, since it will need to scale up to a million videos.
  • Integrate the solution into Feluda by creating an operator that adheres to Feluda's operator interface.

Expected Outcome

Feluda's goal is to provide a simple CLI or scriptable interface for analyzing multimodal social media data. In that vein, all the work that you do should be executable and configurable via scripts and config files. The solution should look at Feluda's architecture and its various components to identify the best ways to enable this.

The solution should have a way to configure the data source (a database with file IDs or an S3 bucket with files), to specify and implement the data processing pipeline, and to configure where the result will be stored. Our current implementation uses S3 and an SQL database as data sources and Elasticsearch for storing results, but additional sources or stores can be added if apt for this project.

Acceptance Criteria

  • Regular interactive demos with the team using a public Jupyter notebook pushed to our experiments repository.
  • A working Feluda operator with tests that can be run as an independent worker in the cloud to schedule processing jobs over a large dataset.
  • Output: structured data that can be passed on to a UI service (web or mobile) for downstream use cases.

Implementation Details

One way we have approached this is by using vector embeddings. We have done this to great success to surface visual trends in images. We used a ResNet model to generate vector embeddings and stored them in Elasticsearch. We also used t-SNE to reduce the dimensions of the vector embeddings and display them in a 2D visualization. It can be viewed here. A detailed report on Feluda's usage in a project to analyze images can be read here. The relevant Feluda operator can be studied here. The code for t-SNE is here. A prior study of various ways to get insights out of images has been documented here.
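
To make the embed-then-group idea above concrete for audio, here is a minimal Python sketch. It is an illustration under stated assumptions, not Feluda code: the function names (`frames`, `embed`, `cluster`, `tone`) are hypothetical, the hand-made band-energy vector (computed with the Goertzel algorithm) merely stands in for a learned embedding model, and the greedy cosine-threshold grouping stands in for whichever clustering method the project ends up choosing. Samples are consumed frame by frame, so memory use does not grow with clip length.

```python
import math
from typing import Dict, Iterable, Iterator, List, Sequence


def frames(samples: Iterable[float], frame_size: int) -> Iterator[List[float]]:
    """Yield fixed-size frames so only one frame is held in memory at a time."""
    buf: List[float] = []
    for sample in samples:
        buf.append(sample)
        if len(buf) == frame_size:
            yield buf
            buf = []
    # A trailing partial frame is dropped to keep every frame the same size.


def goertzel_power(frame: Sequence[float], freq_hz: float, sample_rate: int) -> float:
    """Power of a single frequency within one frame (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def embed(samples: Iterable[float], sample_rate: int = 8000, frame_size: int = 400,
          bands: Sequence[float] = (250.0, 500.0, 1000.0, 2000.0)) -> List[float]:
    """Toy embedding (an assumption): unit-length vector of summed band energies."""
    totals = [0.0] * len(bands)
    for frame in frames(samples, frame_size):
        for i, freq in enumerate(bands):
            totals[i] += goertzel_power(frame, freq, sample_rate)
    norm = math.sqrt(sum(t * t for t in totals)) or 1.0  # avoid dividing by zero
    return [t / norm for t in totals]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; inputs are assumed to be unit-length already."""
    return sum(x * y for x, y in zip(a, b))


def cluster(embeddings: Dict[str, List[float]], threshold: float = 0.9) -> List[List[str]]:
    """Greedy grouping: join the first cluster whose representative is similar enough."""
    groups: List[tuple] = []  # (representative vector, member ids)
    for clip_id, vec in embeddings.items():
        for representative, members in groups:
            if cosine(representative, vec) >= threshold:
                members.append(clip_id)
                break
        else:
            groups.append((vec, [clip_id]))
    return [members for _, members in groups]


def tone(freq_hz: float, sample_rate: int = 8000, seconds: float = 0.5) -> Iterator[float]:
    """Synthetic sine wave used as stand-in audio for the demo."""
    return (math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
            for n in range(int(sample_rate * seconds)))


if __name__ == "__main__":
    clips = {"a": tone(500), "b": (0.5 * x for x in tone(500)), "c": tone(2000)}
    vectors = {clip_id: embed(samples) for clip_id, samples in clips.items()}
    print(cluster(vectors))  # [['a', 'b'], ['c']]
```

In a real pipeline the toy `embed` would be replaced by a proper audio embedding model, and the resulting vectors could be stored and reduced to 2D for visualization in the same way the image workflow above does.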

Perks

  1. Lucrative stipend of INR 1 lakh over a period of 3 months.
  2. Dedicated 1:1 mentorship by industry experts.
  3. Hands-on experience to hone your skills.
  4. Access to bootcamps and expert sessions.
  5. Potential job or extended internship opportunities.
  6. Opportunity to network with global open-source tech leaders.

Who can apply?

Only those candidates can apply who:

  1. are from Any,
  2. and specialisation from Any,
  3. are available for the duration of 3 months
  4. have relevant skills and interests

Terms of Engagement

  1. INR 50,000 received on completion of the midpoint milestone, as decided with the mentor.
  2. INR 50,000 received on completion of the final milestone, as decided with the mentor.
  3. Certificate of completion received on successful completion of the internship.

Number of openings

1
