Advanced-GenAI

GenAI

Overview

This project focuses on building a Retrieval-Augmented Generation (RAG) system designed for news retrieval and question answering. It is a collaboration between ETH Zurich, Lucerne University of Applied Sciences and Arts (HSLU), and Google DeepMind, as part of the HSLU Applied Information and Data Science Master’s Program.

The system combines retrieval and generation techniques to answer user queries accurately from a structured dataset of ETH Zurich news articles.

Project Goals

The primary objective is to develop an end-to-end multilingual RAG pipeline that retrieves relevant news documents efficiently and synthesizes answers from them. The project is structured into three main phases:

  1. Data Preparation – Extract, clean, and structure news articles from HTML files while enriching metadata to support retrieval.
  2. Building the RAG System – Implement multiple retrieval strategies, including BM25, dense embedding search, GraphRAG, and hybrid approaches. Improve response quality with post-retrieval techniques such as re-ranking.
  3. Evaluation – Assess answer accuracy using automated metrics (e.g., semantic F1 score, BLEU, ROUGE) and human evaluation (relevance, correctness, clarity).
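
As a rough illustration of the retrieval strategies in phase 2, the sketch below scores documents with BM25 and then merges several rankings with reciprocal rank fusion (RRF), one common way to build a hybrid retriever. RRF is an assumption here; the project does not prescribe a fusion method. The function names, the whitespace tokenizer, the parameter defaults (`k1`, `b`, `k`), and the sample documents are all hypothetical and not taken from the project code.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # A multilingual pipeline would need something more careful.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids; each list is ordered best first."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document id.
    return [doc_id for doc_id, _ in sorted(fused.items(), key=lambda x: (-x[1], x[0]))]


# Illustrative usage with made-up documents.
docs = [
    "ETH Zurich opens new robotics lab",
    "Researchers publish climate study",
    "New robotics course at ETH",
]
scores = bm25_scores("robotics lab", docs)
bm25_ranking = sorted(range(len(docs)), key=lambda i: -scores[i])

# Stand-in for a ranking produced by a dense embedding retriever.
dense_ranking = [2, 0, 1]
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
```

RRF uses only rank positions, so the BM25 and dense scores do not need to be on a comparable scale before fusion. This is one reason it is a convenient baseline for hybrid retrieval.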

Deliverables

Participants will submit:

This project provides an opportunity to explore the latest advancements in retrieval-based NLP, contributing to the development of trustworthy, efficient, and scalable AI-driven Q&A systems.

For further details, refer to the project documentation.

Milestone Checklist

Phase 1: Data Preparation (10 points)

Loading, Parsing, and Cleaning HTML Files

Multilingual Text Preprocessing and Cleaning

Deliverables

Phase 2.1: Research Agents - Retrieval Strategies (30 points)

Data Preprocessing and Benchmark Construction

Implement Retrieval Strategies

Evaluate Retrieval Performance

Deliverables

Phase 2.2: Aggregation & Response Synthesis (30 points)

Implement Re-ranking Models

Evaluate Re-ranking Performance

Deliverables

Phase 3: Evaluation (5 points)

Automated Metrics Calculation

Human Evaluation

Reporting and Presentation

Deliverables