Machine learning combined with remote sensing has assisted agricultural specialists in the monitoring, classification and yield estimation of crops. Tobacco is a major taxable crop of Pakistan; however, the existing traditional methods for its monitoring and yield estimation are not only expensive and time consuming but also limited in the accuracy of the data, which is collected by a large number of different human surveyors. Because of these loopholes in the mechanism employed for tobacco crop monitoring and yield estimation, illicit growth and distribution of tobacco are on the rise. In this paper we present a machine learning mechanism for tobacco crop estimation using temporally stacked Sentinel-2 satellite data of Pakistan. Instead of the conventional approach of classifying the target crop from a single remotely sensed image, we propose a machine learning based classification algorithm that takes the phenological cycle of the tobacco crop into account. With the proposed mechanism, the temporal variations within the tobacco crop and their association with the variations of other vegetation are used to improve the classification performance of the employed machine learning algorithm. Furthermore, we investigate how classification performance is affected by stacking vegetation indices derived from the near-infrared and vegetation red-edge bands of Sentinel-2, namely the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Index 45 (NDI45), with the original Sentinel-2 datasets. Ground truth data for training our Artificial Neural Network classifier was obtained using the indigenously developed survey application “GEOSurvey”.
Experiments were conducted using the proposed mechanism with various input data setups, including single-date imagery, temporally stacked datasets based on the phenological cycle of the tobacco crop, and different combinations of NDVI and NDI45 stacking. The setup consisting of temporally stacked imagery with NDVI stacking gives the best classification performance, 95.81%, a performance gain of 7.32% over single-date imagery stacked with NDVI and NDI45.
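
The input setups described above can be illustrated per pixel. The following is a minimal sketch in plain Python, not the implementation used in this work. It assumes the commonly used Sentinel-2 definitions NDVI = (B8 - B4) / (B8 + B4) and NDI45 = (B5 - B4) / (B5 + B4); the band selection, the function names and the flags `add_ndvi` and `add_ndi45` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence

Pixel = Dict[str, float]  # band name -> reflectance for one pixel on one date


def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def ndvi(pixel: Pixel) -> float:
    """NDVI from near infrared (B8) and red (B4); assumed band mapping."""
    return normalized_difference(pixel["B8"], pixel["B4"])


def ndi45(pixel: Pixel) -> float:
    """NDI45 from red edge (B5) and red (B4); assumed band mapping."""
    return normalized_difference(pixel["B5"], pixel["B4"])


def stack_features(
    pixel_series: Sequence[Pixel],
    bands: Sequence[str],
    add_ndvi: bool = True,
    add_ndi45: bool = False,
) -> List[float]:
    """Concatenate bands (and optional indices) of one pixel over all dates.

    A single-date setup is a series of length one; a temporally stacked
    setup passes one entry per acquisition date in the phenological cycle.
    """
    features: List[float] = []
    for pixel in pixel_series:
        features.extend(pixel[band] for band in bands)
        if add_ndvi:
            features.append(ndvi(pixel))
        if add_ndi45:
            features.append(ndi45(pixel))
    return features


# Hypothetical reflectances of one pixel on two acquisition dates.
series = [
    {"B4": 0.10, "B5": 0.20, "B8": 0.50},
    {"B4": 0.08, "B5": 0.22, "B8": 0.60},
]
vector = stack_features(series, bands=("B4", "B5", "B8"), add_ndvi=True)
```

Each resulting vector would be one input sample for the classifier. Switching `add_ndvi` and `add_ndi45`, or passing a series with a single date, reproduces the kinds of input combinations compared in the experiments.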