This project focuses on building a Retrieval-Augmented Generation (RAG)
system designed for news retrieval and question answering. It is a
collaboration between ETH Zurich, Lucerne University of Applied Sciences
and Arts (HSLU), and Google DeepMind, as part of the HSLU Applied
Information and Data Science Master’s Program.
The system combines retrieval and generation techniques to answer user
queries accurately, grounding each response in a structured dataset of
ETH Zurich news articles.
Project Goals
The primary objective is to develop an end-to-end multilingual RAG pipeline
that efficiently retrieves relevant passages from news documents and
synthesizes answers from them. The work is organized into three main phases:
Data Preparation – Extract, clean, and structure news articles from HTML
files while enriching metadata to support retrieval.
Building the RAG System – Implement multiple retrieval strategies,
including BM25, dense embedding search, GraphRAG, and hybrid approaches.
Improve response quality with post-retrieval techniques such as
re-ranking.
Evaluation – Assess answer accuracy using automated metrics (e.g.,
semantic F1 score, BLEU, ROUGE) and human evaluation (relevance,
correctness, clarity).
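The first two phases can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation: the helper names (`html_to_text`, `bm25_scores`, `reciprocal_rank_fusion`), the sample pages, and the parameter values `k1=1.5`, `b=0.75`, and `k=60` are all hypothetical. Reciprocal rank fusion is only one possible way to build a hybrid approach, and the dense ranking is hard-coded as a stand-in for a real embedding search.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    # \w is Unicode-aware in Python 3, so umlauts and accents are kept.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """One BM25 score per document (non-negative IDF variant, non-empty corpus)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(scores):
    """Document indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings (lists of document indices, best first)."""
    fused = Counter()
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    # Sort by fused score, breaking ties by document index.
    return [d for d, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical sample pages standing in for the news article HTML files.
pages = [
    "<html><body><h1>New battery material</h1>"
    "<p>ETH researchers report a new cathode.</p></body></html>",
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Students win a robotics competition.</p></body></html>",
    "<html><body><h1>Glacier study</h1>"
    "<p>Glacier melt in the Swiss Alps is accelerating.</p></body></html>",
]
docs = [html_to_text(page) for page in pages]
query = "glacier melt study"

bm25_ranking = rank(bm25_scores(query, docs))
dense_ranking = [2, 1, 0]  # placeholder for the output of an embedding search
hybrid_ranking = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(hybrid_ranking)
```

In a fuller pipeline the dense ranking would come from an embedding index, and a re-ranker could reorder the top fused results before they are passed to the generator.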
Deliverables
Participants will submit:
Codebase – A well-documented implementation (preferably in Python).
Evaluation Report – A comprehensive analysis of retrieval and generation
performance.
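As a possible starting point for the retrieval and generation analysis in the report, the sketch below computes two simple scores. The function names and example strings are hypothetical. Note that `token_f1` is a lexical token-overlap F1, used here as a lightweight stand-in; it is not the semantic F1 score named above, and BLEU and ROUGE would normally be taken from an existing evaluation library rather than reimplemented.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def token_f1(prediction, reference):
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


# Generation quality: 4 of 5 predicted tokens match, all 4 reference tokens found.
answer_score = token_f1("The glacier is melting fast", "The glacier is melting")

# Retrieval quality: one of the two relevant documents is in the top 2.
retrieval_score = recall_at_k([2, 0, 1], relevant_ids=[2, 1], k=2)

print(round(answer_score, 3), retrieval_score)
```

Reporting retrieval and generation scores separately makes it easier to tell whether a wrong answer came from a missed document or from the generation step.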
This project offers an opportunity to explore recent advances in
retrieval-based NLP and to contribute to the development of trustworthy,
efficient, and scalable AI-driven Q&A systems.