
CropClassification

Crop classification is a crucial process in agriculture that involves identifying and categorizing crop types based on features such as soil nutrients, temperature, humidity, pH, and rainfall. By applying machine learning algorithms to these parameters, a crop classification system can predict the most suitable crop for a given environment. This helps farmers make informed decisions about crop selection, improves agricultural productivity and sustainability, supports efficient resource management, and contributes to addressing global food security challenges.
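As a rough illustration of the idea (not the project's actual training code), the sketch below fits a simple scikit-learn classifier on a few hand-made samples with this kind of feature; the feature names, values, and model choice are assumptions.

    # Toy sketch: predicting a crop from soil and weather features (values are made up).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    X = pd.DataFrame({
        "N": [90, 20, 60], "P": [42, 67, 55], "K": [43, 20, 44],
        "temperature": [20.8, 23.6, 26.1], "humidity": [82.0, 60.3, 71.4],
        "ph": [6.5, 7.8, 6.9], "rainfall": [202.9, 65.2, 120.0],
    })
    y = ["rice", "chickpea", "maize"]

    model = RandomForestClassifier(random_state=42).fit(X, y)
    print(model.predict(X.iloc[[0]]))   # predicts a crop label, e.g. 'rice'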

Data card:

Information about the data can be found here.

Model card:

Information about the model can be found here.

How to

To use the system, we suggest the following steps:

  1. Depending on your OS, create and activate a Python environment:

    python3 -m venv name_of_your_env
    WINDOWS USERS:  call name_of_your_env/Scripts/activate
    MACOS USERS:    source name_of_your_env/bin/activate
    
  2. Install requirements:

    pip install -r requirements.txt
    
  3. Start the MLflow tracking server (a sketch of a pipeline stage logging to it follows these steps):

    mlflow ui
    
  4. Start the DVC pipeline:

    dvc repro
    
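As a rough idea of how the pipeline ties into the tracking server from step 3, the sketch below shows a hypothetical training stage logging parameters and metrics to MLflow; the experiment name, target column, and model choice are assumptions, not the project's actual stage code.

    # Hypothetical sketch of a pipeline stage logging a run to the local MLflow server.
    import mlflow
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    mlflow.set_tracking_uri("http://127.0.0.1:5000")   # default address of `mlflow ui`
    mlflow.set_experiment("crop-classification")

    train = pd.read_csv("data/processed/train.csv")
    test = pd.read_csv("data/processed/test.csv")
    X_train, y_train = train.drop(columns=["label"]), train["label"]   # "label" is an assumed column name
    X_test, y_test = test.drop(columns=["label"]), test["label"]

    with mlflow.start_run():
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)
        accuracy = accuracy_score(y_test, model.predict(X_test))
        mlflow.log_param("n_estimators", 100)
        mlflow.log_metric("accuracy", accuracy)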

Testing

The project has been tested with pytest and Great Expectations.
To use these tools in the project, you can type in your terminal:

pytest *path_of_the_module_containing_your_testing_functions*
 - and/or -
python tests/data_expecations.py
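As a rough sketch of what such a test can look like, the example below loads the serialized model and label encoder and checks a minimum accuracy; the target column name, the use of the encoder, and the threshold are assumptions for illustration, not the project's actual tests.

    # Hypothetical test sketch, not one of the project's test files.
    import pickle

    import pandas as pd

    def test_model_accuracy_above_threshold():
        with open("models/model.pkl", "rb") as f:
            model = pickle.load(f)
        with open("models/label_encoder.pkl", "rb") as f:
            encoder = pickle.load(f)
        test = pd.read_csv("data/processed/test.csv")
        X, y = test.drop(columns=["label"]), test["label"]   # "label" column is an assumption
        accuracy = (model.predict(X) == encoder.transform(y)).mean()
        assert accuracy > 0.8   # illustrative threshold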

Code quality has been assessed with Pylint, with an average score of 8.6/10 on the non-autogenerated modules.
You can check the code quality with the command:

pylint *path_of_the_module_or_folder_you_want_to_check*

APIs (local)

The project also includes a module that implements a set of APIs.
To try them out, start the uvicorn server by running the module apis/main.py with the command:

python apis/main.py

The server will be accessible at http://127.0.0.1:8000.
You can also interact with the APIs through the Swagger interface by appending "/docs" to the localhost address: http://127.0.0.1:8000/docs.
Alternatively, you can explore the automatically generated documentation via ReDoc by appending "/redoc" instead: http://127.0.0.1:8000/redoc.
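For orientation, below is a minimal sketch of what such an API module can look like; the endpoint name, request schema, and model path are assumptions for illustration, not necessarily what apis/main.py actually implements.

    # Hypothetical sketch of a prediction API; field names and paths are assumptions.
    import pickle

    import pandas as pd
    import uvicorn
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class CropFeatures(BaseModel):
        N: float
        P: float
        K: float
        temperature: float
        humidity: float
        ph: float
        rainfall: float

    @app.post("/predict")
    def predict(features: CropFeatures):
        # Load the trained model and predict the most suitable crop.
        with open("models/model.pkl", "rb") as f:
            model = pickle.load(f)
        X = pd.DataFrame([features.dict()])
        return {"crop": str(model.predict(X)[0])}

    if __name__ == "__main__":
        uvicorn.run(app, host="127.0.0.1", port=8000)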

As with the rest of the project, the APIs have also been tested with pytest and assessed with Pylint (average score of 8.2/10).

Orchestration

The project includes a Dockerfile and a docker-compose.yaml.
The Dockerfile builds the image for the main application container, and a second container handles the front end of the application. The Docker Compose file acts as the services orchestrator, managing the current containers and any future ones.

To bring up the orchestrated services, you can use the command:

docker-compose up

GitHub Actions

The project defines the following GitHub Actions:

  • Pylint Action: checks the non-autogenerated files for code correctness. It is triggered on every push, across all directories.
  • Pytest Action: automatically runs the tests with pytest whenever code is pushed or a pull request is opened. It is triggered on every push, across all directories.
  • Build and Deploy Job: checks out the code, sets up Python, installs dependencies, and runs the tests. It deploys the application after a successful build.
  • Tests Action: uses a GitHub Secret containing the credentials for accessing the remote repository (in this case, DagsHub) through DVC. This action replicates every test of our pipeline and can only be triggered manually.

Code Carbon

We integrated CodeCarbon to monitor and assess the environmental impact of our model training, providing valuable insights into sustainability. You can access the detailed results in the associated model card.
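As a rough sketch of how such tracking can be wired around a training step (the wrapped function and the reporting are assumptions for illustration):

    # Hypothetical sketch of measuring emissions around a training step with CodeCarbon.
    from codecarbon import EmissionsTracker

    def train_model():
        # Placeholder for the project's actual training code.
        ...

    tracker = EmissionsTracker(project_name="crop-classification")
    tracker.start()
    train_model()
    emissions_kg = tracker.stop()   # estimated emissions in kg of CO2-equivalent
    print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")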

Deployment & Monitoring

  • Prometheus
    Prometheus has been installed locally and is run with the command

    prometheus --config.file=prometheus.yml

    This command starts the Prometheus server, collecting the essential metrics needed for Grafana visualization.

  • Locust
    To simulate web traffic and gather additional data for Grafana, use Locust (a minimal locustfile sketch follows this list).
    The Locust web interface is available at http://localhost:8089/ after starting it with the command

    locust
    
  • Grafana
    For a comprehensive visualization of the data generated by Locust and Prometheus, we use a locally installed Grafana instance. This allows us to customize and explore the metric dashboards to gain insight into the performance of our application.

  • Alibi
    With Alibi we can also keep data drift under control, checking, whenever necessary, whether the data the model predicts on causes a drift.
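Below is a minimal sketch of what such a locustfile can look like for this kind of API; the endpoint, payload fields, and wait times are illustrative assumptions, not necessarily the contents of the project's locustfile.py.

    # Hypothetical locustfile sketch; the endpoint and payload fields are assumptions.
    from locust import HttpUser, task, between

    class ApiUser(HttpUser):
        host = "http://127.0.0.1:8000"   # local API started with python apis/main.py
        wait_time = between(1, 3)        # wait 1-3 seconds between simulated requests

        @task
        def predict(self):
            # Send one prediction request with made-up feature values.
            self.client.post("/predict", json={
                "N": 90, "P": 42, "K": 43,
                "temperature": 20.8, "humidity": 82.0,
                "ph": 6.5, "rainfall": 202.9,
            })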

Project Organization

├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── github             <- folder containing all the github actions
│   ├── tests.yaml
│   ├── deploys.yaml
│   └── pylint.yaml
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   │    ├── train.csv
│   │    └── test.csv
│   └── raw            <- The original, immutable data dump.
│        └── Crop_Recommendation.csv   <- Dataset
│
├── docs                <- A default Sphinx project; see sphinx-doc.org for details
│
├── apis/               <- Contains all the APIs
│     ├── main.py
│     ├── schemas.py
│     └── test_api.py
├── models             <- Trained and serialized models, model predictions, or model summaries
│     ├── model.pkl            <- Trained model
│     └── label_encoder.pkl    <- Trained label encoder
│
├── tests
│     ├── data_expectaions.py   <- Great Expectations file to check data integrity
│     ├── test_evalute.py       <- Tests for evaluating the model
│     └── test_train_model.py   <- Tests for model training
│
├── locustfile.py       <- Configuration file for Locust tests
│
├── prometheus.yml     <- Configuration file for Prometheus
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
├── great_expectations.yml
├── docker-compose.yml
├── Dockerfile                   <- File used by Docker
├── dvc.lock                     <- File used by DVC
├── dvc.yaml
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io

Project based on the cookiecutter data science project template. #cookiecutterdatascience