Add Swahili Language Support for Sentence Parsing and Text Processing #5

Open
m453h opened this issue Mar 27, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

m453h commented Mar 27, 2025

Media Cloud does not need any special language support to collect data in a given language. However, without that support the collected data gets no further processing that would yield more insight. For example, the top word counts shown in the sources application for collections that are predominantly in Swahili consist mostly of stop words.

We would therefore like to add Swahili language support to the system. This would enable tokenization, stemming, and stop word removal for Swahili, which should greatly improve the data returned by the various word-counting API endpoints.
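
To illustrate the effect, here is a minimal standalone sketch of how stop word removal changes a top-words count. It is not Media Cloud code: `SWAHILI_STOP_WORDS`, `tokenize`, and `top_words` are hypothetical names, the stop word set is a small illustrative subset rather than a complete list, and stemming is left out entirely.

```python
import re
from collections import Counter

# Illustrative subset only (assumption): a real integration would need
# a much fuller, reviewed Swahili stop word list.
SWAHILI_STOP_WORDS = frozenset(
    {"na", "ya", "wa", "kwa", "ni", "katika", "la", "za", "hii", "kuwa"}
)


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters.

    Apostrophes are kept so that words such as "ng'ombe" stay whole.
    """
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=3, stop_words=frozenset()):
    """Return the n most frequent tokens, skipping any in stop_words."""
    tokens = [t for t in tokenize(text) if t not in stop_words]
    return Counter(tokens).most_common(n)


sample = (
    "Serikali ya Tanzania na serikali ya Kenya zimekubaliana kuhusu "
    "biashara ya mpakani na usafiri wa mpakani"
)

# Without filtering, the function word "ya" tops the list.
print(top_words(sample, n=3))
# With filtering, content words such as "serikali" and "mpakani" surface.
print(top_words(sample, n=3, stop_words=SWAHILI_STOP_WORDS))
```

On this sample the unfiltered count is led by "ya", while the filtered count is led by "serikali" and "mpakani", which is the kind of improvement expected in the word-counting endpoints.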

@m453h m453h added the enhancement New feature or request label Mar 27, 2025
@m453h m453h self-assigned this Mar 27, 2025
@m453h m453h added this to COMMONS Mar 27, 2025
@m453h m453h moved this to 🚧 In Progress in COMMONS Mar 27, 2025
Projects
Status: 🚧 In Progress
Development

No branches or pull requests

2 participants