Advanced-GenAI

GenAI

Overview

This project focuses on building a Retrieval-Augmented Generation (RAG) system designed for news retrieval and question answering. It is a collaboration between ETH Zurich, Lucerne University of Applied Sciences and Arts (HSLU), and Google DeepMind, as part of the HSLU Applied Information and Data Science Master’s Program.

The system leverages state-of-the-art retrieval and generation techniques to provide accurate responses to user queries based on a structured dataset of ETH Zurich news articles.

Project Goals

The primary objective is to develop an end-to-end multilingual RAG pipeline capable of efficiently retrieving and synthesizing relevant answers from news documents. The system is structured into three main phases:

Data Preparation – Extract, clean, and structure news articles from HTML files while enriching metadata to support retrieval.
Building the RAG System – Implement multiple retrieval strategies, including BM25, dense embedding search, GraphRAG, and hybrid approaches. Enhance response quality through advanced post-retrieval techniques like re-ranking.
Evaluation – Assess answer accuracy using automated metrics (e.g., semantic F1 score, BLEU, ROUGE) and human evaluation (relevance, correctness, clarity).

Deliverables

Participants will submit:

Codebase – A well-documented implementation (preferably in Python).
Evaluation Report – A comprehensive analysis of retrieval and generation performance.

This project provides an opportunity to explore the latest advancements in retrieval-based NLP, contributing to the development of trustworthy, efficient, and scalable AI-driven Q&A systems.

For further details, refer to the project documentation.

Milestone Checklist

Phase 1: Data Preparation (10 points)

Loading, Parsing, and Cleaning HTML Files

Use BeautifulSoup to extract main text from .html files
Utilize Docling for advanced document parsing
Implement a hybrid approach (BeautifulSoup + Docling) for comparison

Multilingual Text Preprocessing and Cleaning

Remove extra spaces, redundant line breaks, and normalize Unicode characters
Standardize date formats from different sources
Handle German-specific text processing (e.g., compound words, umlaut normalization)
Adds fields such as language, title, date, source
Adds main content, named entities, topics, keywords, summary
Compare both summaries and choose the best one
Store cleaned text and metadata in a structured format (JSON, CSV, or database) Explore which format is best for the project
Create additional metadata to support semantic search and context filtering

Deliverables

Write a brief report comparing BeautifulSoup and Docling in a Python notebook
Submit a structured multilingual dataset with enriched metadata

Phase 2.1: Research Agents - Retrieval Strategies (30 points)

Data Preprocessing and Benchmark Construction

Chunk news text data using fixed-size segmentation or semantic segmentation
Use GPT-4o to verify paragraph relevance to a given question (Pascal)
Assign ground-truth relevance labels (1.0, 0.5, 0.0) for evaluation (Pascal)

Implement Retrieval Strategies

Baseline: BM25 Keyword-Based Retrieval
Semantic Search: Dense Vector Retrieval (using mBERT, Sentence-BERT, etc.) -> Sentence-BERT
GraphRAG-Based Retrieval (Microsoft Local-to-Global GraphRAG)
Hybrid Retrieval (BM25 + Dense + GraphRAG)

Evaluate Retrieval Performance

Analyze retrieval methods using Precision@k, Recall@k, and MRR metrics
Examine query routing, query rewriting, and expansion methods

Deliverables

Submit a benchmark dataset with relevance scores
Implement and integrate a hybrid retrieval pipeline
Write a comparative report on retrieval strategies and performance

Phase 2.2: Aggregation & Response Synthesis (30 points)

Implement Re-ranking Models

Apply re-ranking techniques (EcoRank, Set-Encoder, List-Aware Reranking, etc.)
Compare performance with pre-built re-ranking solutions (e.g., OpenAI, Cohere)
Integrate summarization and fusion techniques

Evaluate Re-ranking Performance

Assess improvements in Precision@k, Recall@k, MRR after re-ranking
Evaluate computational efficiency of re-ranking models
Conduct qualitative analysis on document alignment with queries

Deliverables

Submit an integrated re-ranking model within the RAG pipeline
Provide performance metrics and visualization
Conduct qualitative analysis on relevance ordering

Phase 3: Evaluation (5 points)

Automated Metrics Calculation

Compute Semantic Exact Match, Semantic F1 Score, BLEU/ROUGE
Compare generated answers with ground truth using structured evaluation

Human Evaluation

Rate answers based on Relevance, Correctness, and Clarity (scale 1-5)
Provide a written analysis summarizing human evaluation results

Reporting and Presentation

Present results using tables and charts within a Python notebook
Compare automated metrics with human evaluation
Summarize system performance and identify areas for improvement

Deliverables

Submit calculated automated metrics for RAG-generated responses
Conduct human assessment of answer quality
Provide a final report on evaluation results