VerseVault: A Lyrics-Based Search Engine
VerseVault is a lyrics-based search engine that lets users explore songs through free-text queries. It provides a simple interface for searching song lyrics and exploring the themes they cover.
Key Features
- Text-Based Querying: Users can search for specific lyrical content or themes within songs.
- Data Aggregation: Collects and organizes extensive metadata about songs, artists, and albums.
- Semantic Search Capabilities: Implements semantic search techniques to enhance the relevance of search results.
Data Collection & Processing
The data collection process involves:
- Data Sources: Last.fm for track metadata and the Lyrist API for retrieving accurate lyrics.
- Data Pipeline: A Python-based pipeline that scrapes and processes data, including Named Entity Recognition using spaCy.
- Domain Model: Core entities include Track, LyricSection, Album, and Artist, each with specific attributes.
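The domain model above could be sketched with Python dataclasses. This is a minimal illustration, not the project's actual schema: only the four entity names come from this README, and every attribute shown (such as `release_year`, `kind`, `entities`, and `genres`) is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Artist:
    name: str


@dataclass
class Album:
    title: str
    artist: Artist
    release_year: Optional[int] = None  # hypothetical attribute


@dataclass
class LyricSection:
    kind: str  # hypothetical labels such as "verse" or "chorus"
    text: str
    # Named entities found in this section, e.g. produced by a spaCy NER pass
    entities: List[str] = field(default_factory=list)


@dataclass
class Track:
    title: str
    artist: Artist
    album: Optional[Album] = None
    genres: List[str] = field(default_factory=list)  # hypothetical attribute
    sections: List[LyricSection] = field(default_factory=list)

    def full_lyrics(self) -> str:
        """Join all section texts, separated by blank lines."""
        return "\n\n".join(section.text for section in self.sections)
```

Keeping `LyricSection` as its own entity, rather than storing lyrics as one string on `Track`, is what makes section-level queries (for example, a theme that appears only in a chorus) possible.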
[Figure: data collection pipeline]
Information Needs
VerseVault addresses various user scenarios, such as:
- Finding tracks discussing specific subjects in particular lyrical sections.
- Discovering songs on given topics.
- Searching for songs within specific genres that address certain themes.
Information Indexing
The project uses Apache Solr to index song data. Each document holds a track together with its associated lyrical sections. The schema was refined with custom field types and analyzers to improve query matching.
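One plausible way to represent a track and its lyrical sections is as a parent document with nested children, which Solr accepts in JSON when the schema supports nested documents. The sketch below only builds the JSON payload and sends nothing to Solr. Whether the project uses nested documents is an assumption, and all field names (`doc_type`, `section_type`, `lyrics`, `sections`) and the helper `build_track_document` are hypothetical.

```python
import json
from typing import Dict, List


def build_track_document(
    track_id: str, title: str, artist: str, sections: List[Dict[str, str]]
) -> dict:
    """Build a parent track document with one child document per lyric section."""
    children = [
        {
            "id": f"{track_id}-s{i}",  # child ids derived from the parent id
            "doc_type": "section",
            "section_type": section["type"],
            "lyrics": section["text"],
        }
        for i, section in enumerate(sections)
    ]
    return {
        "id": track_id,
        "doc_type": "track",
        "title": title,
        "artist": artist,
        "sections": children,  # nested child documents (hypothetical field name)
    }


doc = build_track_document(
    "t1",
    "Example Song",
    "Example Artist",
    [
        {"type": "verse", "text": "first verse text"},
        {"type": "chorus", "text": "chorus text"},
    ],
)
# A JSON array of documents is the body an update request would carry.
payload = json.dumps([doc], indent=2)
```

Giving each section its own document lets a query match a single section while still returning the parent track.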
Information Retrieval
Two search systems were developed:
- Refined Search System: Utilizes advanced query structures for improved accuracy.
- Semantic Search System: Employs embeddings and K-Nearest Neighbours for context-aware searches.
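The idea behind the semantic system can be illustrated with a brute-force K-Nearest Neighbours search over embedding vectors. This is a simplified sketch: it assumes cosine similarity as the distance measure and uses tiny hand-written vectors in place of real embeddings. The function names and the in-memory `index` are hypothetical and do not reflect how the project actually runs the search.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k documents whose embeddings are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = knn([1.0, 0.0], index, k=2)
```

Here `top` ranks `"a"` first (identical direction to the query) and `"c"` second, while `"b"` is orthogonal to the query and is left out.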
Evaluation
The search systems were evaluated based on precision and recall metrics. The semantic search system outperformed other approaches, achieving a Mean Average Precision of 73%, while the synonym-based system showed limited effectiveness.
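For reference, Mean Average Precision can be computed as in the sketch below. It uses the common definition in which each query's average precision is divided by the total number of relevant documents; the project's evaluation may have used a variant. The function names and the toy rankings are illustrative only.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average of precision@rank taken at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    scores = [
        average_precision(ranked, qrels.get(query, set()))
        for query, ranked in runs.items()
    ]
    return sum(scores) / len(scores)


# Toy data: ranked result ids per query, and the relevant ids per query.
runs = {"q1": ["a", "x", "b"], "q2": ["x", "c"]}
qrels = {"q1": {"a", "b"}, "q2": {"c"}}
score = mean_average_precision(runs, qrels)
```

For `q1` the relevant documents appear at ranks 1 and 3, giving (1/1 + 2/3) / 2 = 5/6. For `q2` the only relevant document is at rank 2, giving 1/2. The mean of the two is 2/3.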
GUI - Web App
A minimalist web app built with Next.js gives users a simple interface for querying the VerseVault database. The app calls a Flask API that handles search requests and generates the query embeddings used by the semantic search.
Team
- André Lima (Programmer)
- Guilherme Almeida (Programmer)
- Jorge Sousa (Programmer)
- José Castro (Programmer)