Add Swahili Language Support for Sentence Parsing and Text Processing #5

m453h · 2025-03-27T10:51:49Z

Media Cloud does not need any special language support to collect data in a given language. However, without such support the collected data does not undergo the further processing that would provide more insight. For example, the top word counts listed in the sources application for collections that predominantly contain Swahili text will mostly show stop words.

We would therefore like to add Swahili language support to the system, enabling tokenization, stemming, and stop-word removal for Swahili. This should greatly improve the data returned by the various word-counting API endpoints.
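To illustrate the problem, here is a minimal sketch of how stop-word removal changes a top-words count. It is not Media Cloud's implementation: `SAMPLE_SWAHILI_STOP_WORDS`, `tokenize`, and `top_words` are hypothetical names, the stop-word list is a tiny hand-picked sample rather than a real Swahili list, and stemming is left out entirely.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny sample of common Swahili function words.
# A real integration would load a full, curated stop-word list instead.
SAMPLE_SWAHILI_STOP_WORDS = frozenset(
    {"na", "ya", "wa", "kwa", "katika", "ni", "la", "za", "cha", "kuwa"}
)


def tokenize(text):
    """Lowercase the text and split it into runs of letters (naive tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(text, stop_words=frozenset(), n=3):
    """Return the n most frequent tokens, skipping any token in stop_words."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)


text = (
    "Serikali ya Tanzania na serikali ya Kenya zimekubaliana kwa pamoja "
    "kuhusu biashara ya mipakani na usafiri wa mipakani."
)

# Without language support, function words such as "ya" lead the counts.
print(top_words(text))

# With the sample stop words removed, content words such as "serikali" surface.
print(top_words(text, SAMPLE_SWAHILI_STOP_WORDS))
```

Even with this small sample list, the most frequent word shifts from a connective to a content word, which is the effect the word-counting endpoints would benefit from.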

m453h added the enhancement New feature or request label Mar 27, 2025

m453h self-assigned this Mar 27, 2025

m453h added this to COMMONS Mar 27, 2025

m453h moved this to 🚧 In Progress in COMMONS Mar 27, 2025

m453h assigned thepsalmist Mar 27, 2025

m453h mentioned this issue Mar 27, 2025

Ft/Swahili language support #6

Draft
