Efficient Content Processing and Recommendation System for Kyiv Independent News Journal

Data Science
OpenAI
Recommendation System

Client

The Kyiv Independent is a leading English-language media outlet based in Ukraine, providing independent, fact-based journalism on Ukrainian and global affairs. Known for its in-depth investigations, political analysis, and war reporting, the publication has gained international recognition for its commitment to press freedom and transparency. Since its founding in 2021, Kyiv Independent has delivered accurate news from Ukraine to a global audience.

We collaborated with Fourth Estate (our direct client), which provides the technological platform for Kyiv Independent, to develop a recommendation system for the publication. This solution enhances user experience by personalising content and improving audience engagement.

Review

Postdata's automation work allowed the client to improve their website's UX and popularity on Google. The team was responsive, completed all tasks on time, and communicated via virtual meetings, email, and messaging apps. Overall, the client was highly satisfied with the project's results.

Review from Fourth Estate

Project overview

The project involved implementing AI-driven solutions for automating content processing tasks:

  1. Summary Generation:

    • Arbitrary Summary: An advanced LLM-powered tool for summarising articles in a free-form way. This model has been specifically trained on summaries written by Kyiv Independent editors, ensuring high-quality, editorial-style condensations that preserve the essence and nuance of the original content. The resulting summaries successfully pass ZeroGPT detection, which is crucial for SEO, as Google prioritises human-written content and may downrank AI-generated text.

    • Newsletter Summary: A refined approach using few-shot learning to align summaries with Kyiv Independent’s newsletter style.

  2. Tags Generation:

    • Arbitrary Tags Without Policy: The model generates tags freely without following any predefined rules or structure. This allows for flexibility but may result in inconsistent tagging.

    • Arbitrary Tags With Policy: A predefined set of guidelines ensures that tags follow a consistent format and style. This improves organisation and makes the tags more useful for categorisation.

    • Static Tags With Schema: Tags are selected strictly from a fixed list, ensuring uniform terminology across all articles. This helps maintain consistency and prevents irrelevant or overly broad tags.

    • Static Tags With Schema and Policy: A hybrid approach that combines a predefined list of tags with additional rules to refine tag selection.

  3. Cost Reduction:

    • Combined summary and tag generation in a single model call to optimise performance, reduce API usage costs, and streamline the workflow for editors and SEO specialists.

  4. Recommendation System:

    • Content-based recommendations were generated using embeddings and stored in a vector database.

    • Articles were suggested based on similarity scores while filtering out irrelevant categories.

Results

We successfully automated content processing, improving efficiency and reducing costs. The solution integrates easily into the existing system.

  • Tags Generation: The model generates tags in different ways – fully AI-driven, based on specific guidelines, or from a predefined list. Tags play a crucial role in site search functionality and SEO optimisation, improving content discoverability both within the website and on search engines.

  • Summary Generation: Summaries can be free-form or follow the newsletter style for better readability and AI detection rates.

  • Cost Optimization: Tags and summaries are generated in a single call, cutting processing costs.

  • Recommendations: A content-based system finds similar articles using embeddings, ensuring relevant suggestions.

Project duration:

3 weeks

Team

1 person

1 Senior Data Scientist

Technologies

SQL, Python, OpenAI, Docker, Langchain

Tech challenge

  • Alignment with editorial standards: The AI model’s outputs needed constant adjustments and fine-tuning to meet Kyiv Independent’s editorial guidelines.

  • Complex tagging systems: Managing the complexity of multiple tagging formats and ensuring consistency across different articles posed significant challenges.

  • Consistent summary generation: Ensuring that summaries were clear, relevant, and aligned with the desired editorial tone also required substantial effort.

  • Iterative testing and refinement: These challenges were tackled through continuous testing and model refinements, ensuring the solution met Kyiv Independent’s specific needs while maintaining high accuracy.

Solution

The solution used few-shot learning to improve summary detection, standardised tagging with predefined guidelines, and optimised embeddings for better recommendations. Combining summary and tag generation in one call reduced costs, streamlining Kyiv Independent’s content automation while maintaining quality and efficiency.

Let's talk about your case

Email: andrii.rohovyi@postdata.ai

Let's talk about your case

Email: andrii.rohovyi@postdata.ai

Let's talk about your case

Email: andrii.rohovyi@postdata.ai