
# MultimodalTransformer-CRP
# Repository Disclaimer

**Disclaimer:**  
This repository is a work in progress, and both the file structure and README are currently under development. For now, if you’re focusing on model training with the dataset, please direct your attention to the following key files. Other files can be disregarded for the time being.

### Files of Interest:


- **improved_resnet**  
  This folder contains everything needed for the improved training setup, which uses cropped patches of the mammograms.
    - **container_setup**  
      This folder contains all scripts for training the models on the cluster.
        - **standard_setup**  
          This folder contains the scripts for training all models on the cluster with the standard setup (mammograms only).
        - **teacher_student**  
          This folder contains the scripts for training all models on the cluster with the privileged knowledge setup (mammograms plus privileged knowledge from the random forest models).

- **student_model**  
  This folder includes the training scripts for all models trained on full mammograms with the standard setup.

- **student_teacher_model**  
  This folder includes the training scripts for all models trained on full mammograms with the privileged knowledge setup.

---

## Repository Structure

```
multimodaltransformer-crp/
├── README.md
├── data/
├── models/
├── student_model/
├── student_teacher_model/
├── improved_resnet/
├── teacher_model/
├── figures/
├── confusion_matrices/
├── scripts/
├── experiments/
└── utils/
```

---

## 1. Project Overview

This repository focuses on **Multimodal Transformer models for breast cancer detection** using the **privileged general distillation technique**. The project is structured around training various deep learning models (GoogleNet, ResNet, ViT) on **mammogram datasets**, specifically leveraging privileged information from **random forest models**. The ultimate goal is to improve explainability and performance using **Concept Relevance Propagation (CRP)**.

### Key Features:
- **Multimodal Transformer Training**: Standard vs. privileged knowledge training.
- **Dataset Handling**: Processing full mammograms vs. cropped patches.
- **Cluster Training Setup**: Scripts for executing jobs on a computing cluster.
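
The difference between the standard and the privileged knowledge setups can be illustrated with a small loss function. The following is a minimal sketch, not the code used in this repository: it assumes the teacher's output is available as a vector of class probabilities (for example from a random forest), and the names `distillation_loss`, `alpha`, and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy H(target, predicted) for two probability vectors."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, hard_label, teacher_probs,
                      alpha=0.5, temperature=1.0):
    """Blend the hard-label loss with the loss against the teacher's soft labels.

    alpha = 1.0 uses the ground-truth label only (standard setup);
    alpha = 0.0 uses the teacher's probabilities only.
    """
    num_classes = len(student_logits)
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(num_classes)]
    hard_term = cross_entropy(one_hot, softmax(student_logits))
    soft_term = cross_entropy(teacher_probs, softmax(student_logits, temperature))
    return alpha * hard_term + (1.0 - alpha) * soft_term


if __name__ == "__main__":
    # Hypothetical two-class example: the student favours class 0,
    # and the teacher assigns class 0 a probability of 0.9.
    print(distillation_loss([2.0, 0.0], hard_label=0, teacher_probs=[0.9, 0.1]))
```

With `alpha=1.0` the function reduces to ordinary cross-entropy on the label, which corresponds to the standard setup; lowering `alpha` gives the teacher's soft labels more weight.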

---

## 2. Key Directories and Files

### 🔹 `improved_resnet/`
This folder contains all relevant files for the **improved training setup**. Unlike the other setups, which train on full mammograms, this one trains on **cropped patches extracted from the region of interest (ROI)**.

- **`container_setup/`**  
  - `standard_setup/`: Scripts for training models using only mammogram patches.
  - `teacher_student/`: Scripts for **privileged knowledge training** (mammogram patches + additional RF-based features).
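
To make the idea of an ROI-centred patch concrete, here is a minimal sketch of how a fixed-size crop box could be computed. It is an illustration under assumptions, not the logic of the scripts in this folder: the function name `roi_crop_box`, the square patch, and the shift-at-the-border behaviour are all hypothetical choices.

```python
def roi_crop_box(center_x, center_y, patch_size, image_width, image_height):
    """Return (left, top, right, bottom) of a square patch centred on the ROI.

    The box is shifted, not shrunk, when it would cross the image border,
    so every patch has the same size.
    """
    if patch_size > image_width or patch_size > image_height:
        raise ValueError("patch_size must fit inside the image")
    half = patch_size // 2
    # Clamp the top-left corner so the whole patch stays inside the image.
    left = min(max(center_x - half, 0), image_width - patch_size)
    top = min(max(center_y - half, 0), image_height - patch_size)
    return left, top, left + patch_size, top + patch_size


if __name__ == "__main__":
    # Hypothetical ROI centre and image size, in pixels.
    print(roi_crop_box(500, 400, 224, image_width=1000, image_height=800))
```

The returned tuple follows the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, so it could be passed to that method directly if the images are loaded with Pillow.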

### 🔹 `student_model/`
This folder contains all **training scripts for models trained on full mammograms** using the **standard setup**.

- **Key files**:
  - `ViT/`: Vision Transformer training and evaluation scripts.
  - `datasets/`: Dataset and preprocessing scripts.
  - `experiments/`: Scripts for running evaluation and model comparisons.
  - `models/`: Saved trained models for standard training.

### 🔹 `student_teacher_model/`
This folder contains all **training scripts for models trained on full mammograms** with **privileged knowledge setup**.

- **Key components**:
  - `GoogleNet/`, `ResNet/`, `ViT/`: Training scripts for each architecture.
  - `experiments/`: Scripts for various training and evaluation methods.
  - `models/`: Fine-tuned and pretrained models.

### 🔹 `teacher_model/`
This folder includes the **training scripts for the decision tree and random forest models** used to generate the privileged knowledge.

- **Key components**:
  - `decision_tree/`: Pretrained decision tree models for extracting features.
  - `random_forest/`: Random forest models used for feature extraction.

### 🔹 `data/`
This folder contains **mammogram datasets and preprocessing scripts**.

- **`cropped_mass_train_data/` and `cropped_mass_test_data/`**: Cropped patches for training/testing (used only for experimental purposes).
- **`original_csv_files/`**: Original dataset metadata.
- **`final_Full_mam_csv_files/`**: CSV files containing training/testing splits.
- **Preprocessing scripts**:
  - `cropp_all_img.py`: Cropping mammograms into patches (used only for experimental purposes).
  - `merg_data.py`: Merging and processing datasets.
  - `dataloader.py`: Data loading scripts.

### 🔹 `figures/` (used only for experimental purposes)
This folder stores **visualizations**, including:
- Confusion matrices
- Feature importance plots
- Training progress metrics

### 🔹 `confusion_matrices/` (used only for experimental purposes)
Contains stored confusion matrices for trained models.
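
For readers unfamiliar with the format, the sketch below shows how a confusion matrix and two metrics derived from it can be computed. It is a hedged illustration rather than the evaluation code of this repository: it assumes integer class labels and, for the metrics, a binary task with class `1` as the positive class; both function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def sensitivity_specificity(matrix, positive=1):
    """Sensitivity and specificity from a 2x2 matrix, with `positive` as the positive class."""
    negative = 1 - positive
    tp = matrix[positive][positive]
    fn = matrix[positive][negative]
    tn = matrix[negative][negative]
    fp = matrix[negative][positive]
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical labels for five test images.
    matrix = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], num_classes=2)
    print(matrix)
    print(sensitivity_specificity(matrix))
```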

---

## 3. Setup & Installation

To run this project, ensure you have the necessary dependencies installed.

### 🔹 1. Clone the Repository:
Clone this repository with `git clone` and change into the project root (`multimodaltransformer-crp/`) before running any of the commands in this README.

### 🔹 2. Install Dependencies: (needs to be updated)
```bash
pip install -r requirements.txt
```

---

## 4. Training Workflow

### 🔹 Standard Training (Full Mammograms)
To train a **ViT** model on **full mammograms**, run:
```bash
python student_model/ViT/train_vit.py
```

### 🔹 Privileged Knowledge Training (Cropped ROI-Based Setup)
To train a **ResNet** model using **privileged knowledge on ROI-based cropped patches**, run:
```bash
python improved_resnet/container_setup/teacher_student/train_resnet.py
```

### 🔹 Evaluate a Model:
```bash
python experiments/eval_resnet.py --model_path models/best_model_resnet.pth
```

---