Building a Python-Powered Content Aggregator for a Multimedia Startup

Some stress the elegance of the Python programming language, others marvel at its modularity and extensive standard library, while we at Agiliway tip our hat to Python for the productivity gains it delivers.

Once again Python played to its strengths in the hands of our engineers, and this time an American multimedia startup received a perfect tool for conducting market research easily and effectively.

Challenges for the Startup in the Digital Age

Digital data comes thick and fast every day, leaving readers overwhelmed and huddled over their laptops for hours on end. Every day about 4 million hours of content are uploaded to YouTube, an average of 3.6 billion Google searches are conducted, and more than 2 million articles are published on the web. The Washington Post alone publishes around 1,200 staff-produced articles, wire stories, graphics, and videos a day, which amounts to a story every couple of minutes. Who would have thought that 600 new page edits are published on Wikipedia every minute? Highlights of last year's reports on the world's data generation are enough to make anyone scratch their head in confusion.

Given the all-time high pace at which events hit headlines, news goes viral, and updates are published, our client, the multimedia startup, urgently needed its routine daily work expedited and leveled up. The company serves the enterprise software industry by delivering detailed industry analyses, raising awareness of information technologies, creating webcasts, preparing articles, and researching programs and white papers. Heavily reliant on the Internet for the latest industry news, updates, and analyses, the client could spend hours collecting relevant data on a single topic. At times the search proved ineffective for lack of the necessary skills, time, or resources. With the advent of digital media came continuous access to more incoming information than any person could spot, process, and absorb.

Getting a Handle on Information Overload

To keep its business alive and thriving, the company could not afford to ignore the glut of information arriving fast and furious from a variety of sources. Only after data on a topic have been collected, filtered, classified, and checked for accuracy and reliability does the resulting analysis merit notice.

The solution was to build a third-party app, a Python-powered content aggregator, to do all the dirty work of searching for and collecting niche-relevant content from multiple sources. It crawls the Internet (social media platforms, RSS feeds, news and company websites, online editions of journals and newspapers, forums, blogs, etc.), checks for all sorts of updates, filters media outlets against set criteria, and automatically uploads the results to a repository that an admin can access at any time for further work.
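To make the crawling-and-filtering idea concrete, here is a minimal sketch of one step of such an aggregator: parsing an RSS feed and keeping only the items that match configured keywords. The feed text and the `KEYWORDS` set are illustrative stand-ins; a real crawler would fetch many feeds over HTTP on a schedule before filtering.

```python
# Sketch: filter RSS items by keyword, as a content aggregator might.
import xml.etree.ElementTree as ET

KEYWORDS = {"python", "enterprise"}  # hypothetical source configuration

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <item><title>Python tops enterprise software surveys</title>
        <link>https://example.com/a</link></item>
  <item><title>Celebrity gossip roundup</title>
        <link>https://example.com/b</link></item>
</channel></rss>"""

def matching_items(feed_xml, keywords):
    """Return (title, link) pairs whose title mentions any keyword."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if any(kw in title.lower() for kw in keywords):
            results.append((title, link))
    return results

print(matching_items(SAMPLE_FEED, KEYWORDS))
# keeps only the enterprise-software item, drops the gossip one
```

In the full system this filtering stage would feed the matched items into the repository for the admin to review, rather than printing them.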

The implementation of the Python-powered content aggregator brought the following advantages:

  • relevant content is pulled from all corners of the Internet without human intervention
  • information is received automatically
  • the value of updates is preserved through timely delivery
  • the admin can scan information quickly without visiting each source site individually
  • the Python-based platform is easy to use, with powerful filtering and source-configuration capabilities
  • highly customizable, the aggregator grabs exactly the information the webmaster configures it for
  • new posts can be categorized by subject, easily sorted, added, deleted, and commented on
  • analysis has higher intrinsic value when based on a rich, informative channel

The client expected full control: clicking only through items of interest and spending time on relevant posts. To meet these expectations, Agiliway engineers developed an aggregation strategy and turned an informative channel into a hub of community information with editorial comments. The process included a few steps:

  • creating a web crawler in Python to parse source sites and fetch data
  • recording the data in a non-relational MongoDB database for fast and convenient processing
  • entering the crawler's metadata in a MySQL database for better and faster indexing
  • creating a WordPress plugin to configure sources, keywords, and categories for WordPress posts
  • having the crawler categorize and filter content and automatically create posts in WordPress as well as in other CMSs
  • designing the user interface with a popular CMS as a prototype; interactive and user-friendly, it lets users with limited expertise review, edit, modify, or delete content from a website
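The publishing step above (crawler output becoming CMS posts) can be sketched as follows. This is an assumption-laden illustration, not the client's actual code: the keyword-to-category rules and category ids are hypothetical, and in the real setup they would come from the plugin's MySQL metadata rather than a hard-coded dict. The payload shape follows the WordPress REST API's `POST /wp/v2/posts` endpoint.

```python
# Sketch: categorize a crawled item and shape it into a WordPress
# REST API post payload, created as a draft for admin review.
CATEGORY_RULES = {        # keyword -> WordPress category id (illustrative)
    "security": 12,
    "cloud": 7,
}
DEFAULT_CATEGORY = 1      # WordPress's "Uncategorized"

def categorize(title):
    """Pick category ids whose keyword appears in the title."""
    ids = [cid for kw, cid in CATEGORY_RULES.items() if kw in title.lower()]
    return ids or [DEFAULT_CATEGORY]

def to_wp_payload(item):
    """Shape a crawled item into a WordPress post payload."""
    return {
        "title": item["title"],
        "content": f'{item["summary"]}\n\nSource: {item["link"]}',
        "categories": categorize(item["title"]),
        "status": "draft",   # an admin reviews before publishing
    }

crawled = {"title": "Cloud security trends",
           "summary": "A roundup of enterprise reports.",
           "link": "https://example.com/report"}
payload = to_wp_payload(crawled)
print(payload["categories"], payload["status"])  # [12, 7] draft
```

An HTTP client would then send this payload to the site's REST endpoint with appropriate authentication; keeping new posts as drafts preserves the editorial-review step the client asked for.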

Less time spent searching for relevant content means more time to analyze it, draw conclusions, and prepare data-driven reports and high-powered analytics. With the content aggregator written in high-level, dynamic Python and backed by full-featured MongoDB, parsing, searching, and grouping take only a moment.

Whether or not you decide to implement one, a content aggregator's applications are diverse (news, reviews, analysis, research, price comparisons, etc.), and most likely your competitors are already making the most of one.