Projects — Mohamed Jalal Baim

V-EditR Python

A reasoning-first image editor powered by Vision–Language Models

An advanced AI image editing pipeline that understands complex natural language instructions and applies precise, context-aware edits. Unlike traditional tools, V-EditR first reasons about scene context, spatial relationships, and object identities before making any modification — handling requests like "remove the chair behind the table" or "make the person holding the phone wear a black jacket".

Tech Stack

Python GroundingDINO SAM InstructPix2Pix ControlNet Stable Diffusion LLM Parser

Key Features

Multi-stage pipeline: text → plan generation → object grounding → edit → validation
Spatial and relational reasoning ("next to", "behind", "holding")
Object grounding with GroundingDINO + SAM segmentation masks
Modular architecture with separate planners, validators, and verifiers
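
As a rough illustration of the relational step, the sketch below resolves a "next to" relation over detected bounding boxes by picking the target closest to the anchor object. `Box` and `resolve_next_to` are hypothetical names, the coordinates are made up, and a real pipeline would take its boxes from GroundingDINO and its masks from SAM; relations such as "behind" need depth or occlusion cues that this toy version does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A detected object: a label plus (x1, y1, x2, y2) pixel corners."""
    label: str
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def resolve_next_to(detections: list[Box], target: str, anchor: str) -> Box:
    """Pick the `target` box whose center is closest to the first `anchor` box.

    Hypothetical helper: a toy stand-in for relational reasoning that
    only covers the "next to" relation.
    """
    anchors = [b for b in detections if b.label == anchor]
    targets = [b for b in detections if b.label == target]
    if not anchors or not targets:
        raise ValueError(f"need at least one {target!r} and one {anchor!r}")
    ax, ay = anchors[0].center

    def squared_distance(box: Box) -> float:
        bx, by = box.center
        return (bx - ax) ** 2 + (by - ay) ** 2

    return min(targets, key=squared_distance)


# Made-up detections for "remove the chair next to the table".
detections = [
    Box("table", 200, 300, 400, 450),
    Box("chair", 410, 320, 480, 450),  # right beside the table
    Box("chair", 700, 310, 770, 450),  # across the room
]
chosen = resolve_next_to(detections, target="chair", anchor="table")
print(chosen)
```

The chosen box would then be passed to the segmentation stage so that only that object's mask is edited.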

TrustRAG Python

Trustworthy Retrieval-Augmented Generation for the medical domain

A medical-domain QA system built on a hybrid dense + sparse retrieval pipeline with hallucination prevention. It draws on ~230K medical passages from Wikipedia and refuses to return answers it cannot verify: hallucinated content is replaced with an explicit refusal message.

Tech Stack

Python FAISS BM25 e5-small-v2 MiniLM cross-encoder gemma3:4b (Ollama) nli-deberta-v3-base RAGAS

Key Features

Hybrid retrieval: dense (FAISS) + sparse (BM25) with score fusion
Cross-encoder reranking for improved passage relevance
NLI-based faithfulness verification — hallucinated sentences are blocked
Evaluated with RAGAS (Faithfulness, Answer Relevancy, Context Precision)
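
The fusion and verification steps can be sketched in a few lines. This is a minimal sketch under assumptions, not the project's code: min-max normalization with a weighted sum is one common way to fuse dense and sparse scores, and `alpha`, `REFUSAL`, `fuse_scores`, and `keep_supported` are illustrative names. The `entails` callable stands in for an NLI model such as nli-deberta-v3-base; the substring check used in the example is only a placeholder for it.

```python
from typing import Callable

# Assumed wording; the actual refusal text is not given in the source.
REFUSAL = "I cannot verify an answer to this question from the retrieved passages."


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so dense and sparse scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(dense: dict[str, float], sparse: dict[str, float],
                alpha: float = 0.5) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a passage absent from one list scores 0 there."""
    d, s = min_max(dense), min_max(sparse)
    fused = {doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
             for doc in set(d) | set(s)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


def keep_supported(sentences: list[str], passages: list[str],
                   entails: Callable[[str, str], bool]) -> str:
    """Drop sentences no passage entails; refuse if nothing survives."""
    kept = [s for s in sentences if any(entails(p, s) for p in passages)]
    return " ".join(kept) if kept else REFUSAL


# Made-up scores: cosine similarities (dense) and BM25 scores (sparse).
dense = {"p1": 0.82, "p2": 0.75, "p3": 0.40}
sparse = {"p2": 12.0, "p3": 9.0, "p4": 3.0}
ranked = fuse_scores(dense, sparse)
print([doc for doc, _ in ranked])


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder for an NLI model: plain substring matching."""
    return hypothesis.lower() in premise.lower()


passages = ["Aspirin reduces fever. It also relieves pain."]
answer = keep_supported(["Aspirin reduces fever.", "Aspirin cures cancer."],
                        passages, toy_entails)
print(answer)
```

Passing the entailment check in as a callable keeps the gate testable without loading a model.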

FocusFlow Python

Localized Image Editing via Masked Velocity Blending

A novel image editing method built on top of Stable Diffusion 3 that enables precise, localized edits using only text prompts — no manual masks required. Extends the FlowEdit technique by automatically identifying which regions to modify via velocity field analysis, then applying masked velocity blending to confine edits to the relevant area.

Tech Stack

Python 3.10+ PyTorch 2.x + CUDA Stable Diffusion 3 diffusers CLIP LPIPS

Key Features

Automatic mask generation via velocity field differencing between source/target prompts
Masked velocity blending: V_blend = M · V_target + (1 − M) · V_source
Evaluated on 40 test cases: pose changes, background replacement, material/style edits
Highest CLIP-T score (0.296) in the evaluation, ahead of the FlowEdit and SDEdit baselines
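
The two core steps can be illustrated with NumPy on toy arrays. The blending function follows the formula listed above directly; the mask construction is an assumption, since the source only says the mask comes from differencing the source and target velocity fields. Here the per-pixel norm of the difference is normalized and thresholded, and `velocity_mask`, `blend_velocities`, `threshold`, and the array shapes are all illustrative choices.

```python
import numpy as np


def velocity_mask(v_source: np.ndarray, v_target: np.ndarray,
                  threshold: float = 0.5) -> np.ndarray:
    """Binary mask from the per-pixel norm of the velocity difference.

    Inputs have shape (C, H, W); the mask has shape (1, H, W).
    The normalize-then-threshold rule is an assumed heuristic.
    """
    diff = np.linalg.norm(v_target - v_source, axis=0, keepdims=True)
    peak = diff.max()
    if peak == 0:
        return np.zeros_like(diff)
    return (diff / peak > threshold).astype(v_source.dtype)


def blend_velocities(mask: np.ndarray, v_source: np.ndarray,
                     v_target: np.ndarray) -> np.ndarray:
    """V_blend = M * V_target + (1 - M) * V_source."""
    return mask * v_target + (1.0 - mask) * v_source


# Toy velocity fields that disagree only inside a 3x3 patch.
rng = np.random.default_rng(0)
v_source = rng.normal(size=(4, 8, 8))
v_target = v_source.copy()
v_target[:, 2:5, 2:5] += 3.0

mask = velocity_mask(v_source, v_target)
v_blend = blend_velocities(mask, v_source, v_target)
print(int(mask.sum()))
```

Outside the mask the blended field equals the source velocity, which is what keeps the edit from spreading into regions the prompts agree on.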

Direct Preference Optimization Notebook

DPO paper implementation for LLM alignment

A clean implementation of the Direct Preference Optimization algorithm for aligning language models with human preferences. Fine-tunes TinyLlama-1.1B on a sentiment classification task using preference pairs (chosen vs. rejected outputs) generated by gemma3:4b via Ollama.

Tech Stack

Python TinyLlama-1.1B Ollama + gemma3:4b DPO loss (β=0.1) PyTorch

Key Features

Full DPO training pipeline: data prep → SFT → DPO fine-tuning → evaluation
Preference pair generation using a larger LLM (gemma3:4b) as a judge
Faithful reproduction of the original DPO paper methodology
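
For reference, the per-pair DPO objective is the negative log-sigmoid of the scaled difference between the policy's and the reference model's log-ratios on the chosen and rejected responses. The sketch below computes it in plain Python for scalar sequence log-probabilities, using the β = 0.1 from the stack above; the function name and the example numbers are illustrative, and a training loop would compute the same quantity on batched PyTorch tensors.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is zero.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# Policy has shifted probability toward the chosen response.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

When the policy matches the reference the loss is log 2, and it falls as the policy raises the chosen response's likelihood relative to the rejected one.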

FlowEdit Python

FlowEdit paper implementation — text-guided image editing with flow-matching diffusion

A faithful implementation of the FlowEdit paper, enabling precise text-guided image transformations using Stable Diffusion 3 via source and target text prompts — without re-training or fine-tuning any model. Used as the baseline in the FocusFlow project above.

Tech Stack

Python PyTorch Stable Diffusion 3 diffusers transformers NumPy PyYAML

Key Features

Reproduces the FlowEdit paper's delta velocity blending approach
Configurable editing parameters (timesteps, guidance scales, averaging steps)
Batch editing via YAML config, plus reproductions of the paper's figures
Served as the baseline for the FocusFlow quantitative evaluation
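
The delta velocity idea can be sketched with toy velocity fields in place of Stable Diffusion 3. This is a simplified reading of the approach, not the project's code: the edited latent is integrated with Euler steps along the difference between the target-prompt and source-prompt velocities, averaged over `n_avg` noise samples. The `flowedit` function, its arguments, and the toy fields are assumptions, and guidance scales are omitted.

```python
import numpy as np


def flowedit(x_src, v_src, v_tar, timesteps, n_avg=1, seed=0):
    """Euler integration of the velocity difference from t_max down to 0.

    `timesteps` must be decreasing and end at 0.0. `v_src` and `v_tar`
    map (z, t) to a velocity and stand in for the model conditioned on
    the source and target prompts.
    """
    rng = np.random.default_rng(seed)
    z_edit = x_src.copy()
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        delta = np.zeros_like(x_src)
        for _ in range(n_avg):
            noise = rng.normal(size=x_src.shape)
            z_src = (1.0 - t) * x_src + t * noise  # noisy source sample
            z_tar = z_edit + z_src - x_src         # paired target sample
            delta += v_tar(z_tar, t) - v_src(z_src, t)
        z_edit = z_edit + (t_next - t) * delta / n_avg
    return z_edit


# Toy setup: each "model" has a single data point, so its exact
# velocity field is (z - x) / t.
x_src = np.array([1.0, -2.0])
x_tar = np.array([3.0, 0.5])


def v_src(z, t):
    return (z - x_src) / t


def v_tar(z, t):
    return (z - x_tar) / t


edited = flowedit(x_src, v_src, v_tar, np.linspace(1.0, 0.0, 11), n_avg=2)
print(edited)
```

In this toy case the noise cancels in the velocity difference, so the trajectory moves straight from the source point to the target point without passing through an inverted noise map.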

RAG Assistant Agent Python

Multi-agent RAG system powered by LLMs

A multi-agent document assistant that answers questions from internal documents (PDFs, HTML, emails) with guaranteed citations and built-in PII/secrets safeguards. Exposes a FastAPI REST interface for ingestion and querying.

Tech Stack

Python FastAPI sentence-transformers ChromaDB BM25 Tesseract OCR OpenAI / Ollama

Key Features

Multi-agent pipeline: Retrieval → Reranker → QA → Citation/Verifier → Safety/PII → Composer
Supports PDF, HTML, TXT, and .eml (email) file ingestion
Hybrid retrieval: vector similarity + BM25 keyword search
Automatic PII/secret detection during ingestion and response generation
REST API (POST /ingest, POST /ask) for easy integration
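
A PII/secrets safeguard of the kind listed above could, at its simplest, be a regex pass that runs over text at ingestion and again over generated responses. The sketch below is a minimal illustration under assumptions: the three patterns, the `redact` helper, and the placeholder format are all hypothetical, and a production detector would need much broader coverage than this.

```python
import re

# Illustrative patterns only (assumed, not taken from the project).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # A made-up key shape: "sk-" followed by 20+ alphanumerics.
    "SECRET": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}


def redact(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace each match with a [LABEL] placeholder and report what was found."""
    findings = []
    for label, pattern in PATTERNS.items():
        findings.extend((label, m.group()) for m in pattern.finditer(text))
        text = pattern.sub(f"[{label}]", text)
    return text, findings


raw = "Contact jane.doe@example.com, key sk-abcdefghijklmnopqrstuvwx."
clean, findings = redact(raw)
print(clean)
```

Returning the findings alongside the redacted text lets the caller log or block a document instead of silently rewriting it.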