Paddy (Oryza sativa) is a staple food for more than half of the world's population. Sri Lanka has cultivated paddy for centuries, and with its population growing rapidly, forecasting paddy yield has become essential for food security and agricultural planning.
This research predicts district-wise paddy yield in Sri Lanka using openly available data: CHIRPS 2.0 rainfall estimates, the NASA POWER APIs, the pH and salinity maps of the Sri Lankan Rice Research and Development Institute, and paddy statistics from the Department of Census and Statistics – Sri Lanka for local agro zones.
For the three major agro-climatic zones and 25 administrative districts, data from 2004 to 2024 were collected for the two harvesting seasons "Yala" and "Maha". The target variable is the total paddy production per district (in metric tons), ranging from 185 MT (Mannar, 2006 Yala) to 530,356 MT (Anuradhapura, 2019-2020 Maha).
Using the crop calendar template from the Department of Agriculture – Sri Lanka, end-to-end crop harvesting simulations were constructed. By combining these simulations with climate variables, soil properties, and historical yield records, we created 12 heterogeneous datasets.
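As a rough illustration of how a crop calendar can be tied to climate variables, the sketch below aggregates daily rainfall over a season window. The window dates, the 1 mm wet-day threshold, and the names SEASON_WINDOWS, season_window and seasonal_rainfall are assumptions made for this example; they are not the Department of Agriculture's calendar or the project's actual simulation code.

```python
from datetime import date, timedelta

# Hypothetical season windows as (month, day) pairs. These are placeholders,
# not the official crop calendar used in the study.
SEASON_WINDOWS = {
    "Maha": ((10, 1), (3, 31)),  # crosses the new year
    "Yala": ((4, 1), (9, 30)),
}


def season_window(season, start_year):
    """Return (start_date, end_date) for a season that begins in start_year."""
    (start_month, start_day), (end_month, end_day) = SEASON_WINDOWS[season]
    start = date(start_year, start_month, start_day)
    # If the end falls earlier in the calendar than the start, the season
    # runs into the following year (e.g. the 2019-2020 Maha season).
    crosses_year = (end_month, end_day) < (start_month, start_day)
    end = date(start_year + 1 if crosses_year else start_year, end_month, end_day)
    return start, end


def seasonal_rainfall(daily_mm, season, start_year, wet_day_mm=1.0):
    """Summarise a {date: rainfall_mm} mapping over one season window."""
    start, end = season_window(season, start_year)
    values = [mm for day, mm in daily_mm.items() if start <= day <= end]
    return {
        "total_mm": sum(values),
        "wet_days": sum(1 for mm in values if mm >= wet_day_mm),
        "n_days": len(values),
    }


# Synthetic daily series standing in for a CHIRPS-style rainfall record.
daily = {date(2019, 9, 1) + timedelta(days=i): float(i % 5) for i in range(400)}
maha_2019 = seasonal_rainfall(daily, "Maha", 2019)
yala_2020 = seasonal_rainfall(daily, "Yala", 2020)
```

The same windowing could be applied to temperature, humidity, or solar radiation series, giving one row of seasonal features per district and season.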
The 12 datasets were used to train 12 base models, whose out-of-fold predictions fed into two meta models and, finally, a stacked meta model.
The final model predicted district-level seasonal paddy yield with high accuracy, demonstrating the potential of combining open data with machine learning for sustainable agriculture in Sri Lanka.
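The out-of-fold stacking idea can be sketched as follows. This is a simplified, assumed setup: two synthetic feature sets stand in for the 12 datasets, closed-form ridge regression stands in for whatever base and meta learners the project actually used, and a single meta model replaces the two-level meta structure. All function names here are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_ridge(weights, X):
    return weights[0] + X @ weights[1:]


def out_of_fold_predictions(X, y, n_folds=5, alpha=1.0, seed=0):
    """Predict each row with a model that never saw that row in training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.empty(len(y))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        weights = fit_ridge(X[train], y[train], alpha)
        oof[held_out] = predict_ridge(weights, X[held_out])
    return oof


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


# Synthetic data: two feature sets standing in for heterogeneous datasets.
rng = np.random.default_rng(42)
n = 200
climate = rng.normal(size=(n, 3))
soil = rng.normal(size=(n, 2))
y = 2.0 * climate[:, 0] - 1.0 * soil[:, 1] + rng.normal(scale=0.1, size=n)

# One base model per feature set; each contributes a column of OOF predictions.
feature_sets = {"climate": climate, "soil": soil}
oof_matrix = np.column_stack(
    [out_of_fold_predictions(X, y) for X in feature_sets.values()]
)

# The meta model is trained on the OOF columns only, not the raw features.
meta_weights = fit_ridge(oof_matrix, y, alpha=1e-3)
stacked = predict_ridge(meta_weights, oof_matrix)

base_rmse = {
    name: rmse(oof_matrix[:, i], y) for i, name in enumerate(feature_sets)
}
stacked_rmse = rmse(stacked, y)
```

Training the meta model on out-of-fold predictions, rather than on in-sample fits, keeps it from learning to trust base models that have simply memorised their training rows. Note that stacked_rmse here is an in-sample figure for the meta model; an honest estimate would need a further held-out split.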
Research Collaboration
Ms. Achinthi Premasiri (BSc, University of Jaffna; MPhil, University of Ruhuna)
Joint research combining expertise in agricultural science and machine learning.