Data analytics across multiple decentralized data sources

Blossom Sky utilizes AI for sentiment analytics and NLP transcription using decentralized data processing, enabling efficient training.

Many applications today rely on non-relational data stores alongside relational ones, as there is no single data store that can handle all aspects of modern analytics. For example, a user might employ an RDBMS to store a dataset but then need to perform a cluster analysis. Because relational databases are not optimized for such tasks, the data is moved to another system for processing, usually through an ETL (extract, transform, load) job. ETL has its disadvantages: it is geared towards a batch mode of working and is typically not ideal for near-real-time or on-demand data access. This means that ETL is not the best choice for situations where fast response times are required or where data can't be transported into a central data analytics store, which is typically the case for data regulated under HIPAA, PHI, SOX, and PSD2, and partially for PCI-DSS.

In data-driven organizations and institutions, data from different sources needs to be analyzed together to get a holistic view of what is happening. As data resides in different places, it needs to be moved from one place to another so that it can be analyzed on the same platform. Many organizations therefore face the tedious task of manually moving data between different storage platforms and integrating different processing platforms. It is not rare for organizations to write ad hoc programs or ETL scripts to move data out of databases, data warehouses, or data lakes, and to integrate different ETL platforms to deal with this problem semi-automatically.

Complex data analytics in retail

DataBloom's Blossom Sky is a data management platform that breaks up complex data analytics into multiple pieces and selects the right processing platform to execute each of them. It also takes care of any data movement and transformation required to run these pieces of computation on the right processing platform. Importantly, Blossom Sky lets data processing platforms complement each other's capabilities so that together they can perform complex analytics. For example, it can complement a DBMS with the ML capabilities of Apache Spark to perform a clustering task over the DBMS data. All of this happens transparently for the data and BI teams.
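
To make the idea of splitting a pipeline and choosing a platform per step more concrete, here is a minimal sketch in Python. It is not Blossom Sky's optimizer or API. The step names, platform names, cost numbers, and the flat data-movement cost are all hypothetical, and the pipeline is assumed to be a simple linear chain. The sketch uses dynamic programming to find the assignment that minimizes execution cost plus the cost of moving data between platforms.

```python
from typing import Dict, List, Tuple

# One pipeline step: (name, {platform: estimated execution cost}).
Step = Tuple[str, Dict[str, float]]

# Hypothetical cost estimates; none of these values come from Blossom Sky.
STEPS: List[Step] = [
    ("filter_orders", {"dbms": 1.0, "spark": 4.0}),
    ("join_customers", {"dbms": 2.0, "spark": 5.0}),
    ("kmeans_clustering", {"dbms": 50.0, "spark": 6.0}),
]
MOVE_COST = 3.0  # assumed flat cost of moving intermediate data across platforms


def assign_platforms(steps: List[Step], move_cost: float) -> Tuple[float, List[str]]:
    """Return (total cost, platform per step) for the cheapest assignment."""
    # best[p] holds the cheapest plan for the steps seen so far that ends on platform p.
    _, first_costs = steps[0]
    best = {p: (cost, [p]) for p, cost in first_costs.items()}
    for _, costs in steps[1:]:
        new_best = {}
        for platform, run_cost in costs.items():
            # Extend every previous plan; pay move_cost only when the platform changes.
            candidates = [
                (prev_cost + (0.0 if prev == platform else move_cost) + run_cost,
                 path + [platform])
                for prev, (prev_cost, path) in best.items()
            ]
            new_best[platform] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


total_cost, plan = assign_platforms(STEPS, MOVE_COST)
print(total_cost, plan)
```

With these made-up numbers, the relational steps stay in the DBMS and only the clustering step moves to Spark, because paying the movement cost once is cheaper than clustering inside the DBMS.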

Enable data-driven sales with Blossom Sky

One of the world's largest retailers has increased sales by more than 22% per year using data-driven AI and ML models. To build these models, the company's data team designs, builds, and constantly maintains large datasets used to predict which products customers might be interested in, based on their search and purchase history combined with data acquired in local stores. For example, the team can consider an order's entire history (from the moment it was placed until any return or exchange), as well as customer relations from the company's CRM system, returns from its sales system, quality measurements from the stores' quality management system, and supplier data from its financial obligations system.

Different data pools are needed to build a model that identifies patterns by considering all customers' past purchases as well as customer profiles (e.g., country/city of residence), combined with forecasting and warehouse delivery predictions to enable a nearly perfect customer experience. The data team updates the model as new purchases, products, and customers arrive every day, or even every few hours, depending on the size of the customer base. Data analysts can perform further analytics on the database itself as well as on a machine learning platform to understand current trends in product purchases, both for marketing campaigns and for supply chain logistics. To achieve this, data analysts run analytics over databases (e.g., data warehouses) and move data out of those databases to a machine learning platform (e.g., PyTorch) to learn predictive models.

The picture below illustrates how Blossom Sky enables retailers to acquire more customers, operate their stores efficiently, and improve the customer experience with the brand.

Federated AI in action: textual sentiment analysis
Generative AI with Blossom Sky for Customer Centric Sales Prediction

An AI-Powered Virtual Data Lakehouse in action: Textual Sentiment Analysis

Sentiment analysis, also known as opinion mining, is a natural language processing task that aims to determine the sentiment or emotional tone of a given text. It is a popular application in today's digital world, where people express their opinions and emotions on a wide range of platforms, including social media, online reviews, and forums.

Textual sentiment analysis is a specific type of sentiment analysis that focuses on understanding the underlying emotions and opinions expressed in a given text. This can include identifying positive, negative, or neutral sentiment, as well as more specific emotions such as joy, anger, or sadness. Let's look at how this use case applies to the retail industry.

A retail company can use sentiment analysis to gauge how satisfied its users and customers are with the products and services it provides. To build such a model, the data team first categorizes data points based on the insights it wants to gain. For example, time is relevant when discussing user activity patterns; frequency is relevant when talking about product views; rapidity of sale is relevant when discussing sales velocity. The team must then extract user and comment data from the databases, data warehouses, or data lakes and create a sentiment analysis model for each user group in order to classify each comment as positive or negative. Such sentiment analysis enables organizations and agencies to assess what customers think about their products, services, and sales processes.
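
As a rough illustration of classifying comments as positive or negative per user group, the following Python sketch may help. It is a toy under stated assumptions, not the approach used by Blossom Sky or by any retailer: a tiny hand-written word lexicon stands in for a trained sentiment model, one shared classifier is used for all groups rather than one model per group, and the group names and comments are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical lexicon; a real system would use a trained NLP model instead.
POSITIVE = {"great", "love", "fast", "excellent", "good"}
NEGATIVE = {"broken", "slow", "bad", "refund", "terrible"}


def classify(comment: str) -> str:
    """Label a comment by counting positive and negative lexicon words."""
    words = [w.strip(".,!?").lower() for w in comment.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Ties are arbitrarily treated as negative in this sketch.
    return "positive" if score > 0 else "negative"


def positive_share_by_group(comments: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each user group to the fraction of its comments labelled positive."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positive count, total count]
    for group, text in comments:
        counts[group][1] += 1
        if classify(text) == "positive":
            counts[group][0] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


# Invented sample data: (user group, comment text).
comments = [
    ("loyalty_members", "Love the fast delivery!"),
    ("loyalty_members", "Item arrived broken, want a refund."),
    ("new_customers", "Great price and excellent service"),
    ("new_customers", "Checkout was slow and the app is bad"),
]
shares = positive_share_by_group(comments)
print(shares)
```

In practice the comments would come from the databases, data warehouses, or data lakes mentioned above, and the lexicon lookup would be replaced by a model trained for each user group.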
Sentiment Analysis with Blossom Sky's Virtual Data Lakehouse

Textual sentiment analysis can be used in the financial industry to analyze large amounts of text data to uncover evidence of sentiment or affect within the text. This can help researchers understand how sentiment impacts individual decision-makers, institutions, and markets. Textual sentiment analysis can be used to forecast financial trends, analyze risk, and automate tasks. For example, machine learning algorithms can be used to accurately predict whether a person or company will be a risky investment based on their credit score and financial transactions.


Textual sentiment analysis is a powerful tool for understanding the emotions and opinions expressed in text. With advances in natural language processing, machine learning, and deep learning techniques, it has become more accurate and efficient at understanding the context of a text and providing meaningful insights from it. However, there is still room for improvement in this field, and researchers are working to make it more accurate and efficient.

Thanks to its design, optimizer, and executor, Blossom Sky can provide a real federated data framework from the beginning and shorten the path to a working NLP AI:

  • Heterogeneous Data Sources: Blossom Sky processes data from (or over) multiple data sources in a seamless manner
  • Multi-Platform and Hybrid Cloud Execution: Blossom Sky automatically deploys each sub-part of a pipeline to the most relevant cloud provider and processing platform in a seamless manner to reduce costs and improve performance
  • Federated Machine Learning and AI: Blossom Sky provides a layer to automate the complex process of data processing integration. This allows organizations to train machine learning models on data from multiple sources while keeping the data secure and private
  • Ease of use: Blossom Sky hides the complexity of data processing systems. Developers write their applications on top of Blossom Sky and let it take care of executing the applications over multiple platforms without changing the code.
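
The federated machine learning point in the list above can be sketched with a small federated averaging example in plain Python. This is an illustrative assumption about how such training can work in general, not Blossom Sky's implementation: the site datasets, the one-parameter linear model y = w * x, the learning rate, and the function names are all hypothetical. Each site trains on its own private data and shares only its updated weight and sample count with the coordinator.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # private (x, y) pairs held by one site


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 20) -> float:
    """Run gradient descent on one site's data for the model y = w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(sites: List[Dataset], rounds: int = 30) -> float:
    """Coordinator loop: only weights and sample counts leave each site."""
    w = 0.0
    for _ in range(rounds):
        updates = [(local_update(w, data), len(data)) for data in sites]
        total = sum(n for _, n in updates)
        # Average the local weights, weighted by each site's sample count.
        w = sum(w_i * n for w_i, n in updates) / total
    return w


# Invented data generated from y = 2 * x and split across two sites.
site_a: Dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
site_b: Dataset = [(4.0, 8.0), (5.0, 10.0)]
global_w = federated_average([site_a, site_b])
print(global_w)
```

The raw (x, y) pairs never appear in the coordinator loop, which is the property that lets models be trained across sources while the data stays where it is.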

Data mesh and data platform abstraction are not silver bullets or one-size-fits-all solutions. They require careful planning, design, implementation, and governance. They also require a cultural shift from centralized to decentralized data ownership and collaboration. DataBloom's Virtual Data Lakehouse offers a promising vision for how organizations can harness the power of data to deliver better value for their providers, partners, and stakeholders. A brief consultation with your DataBloom AI representative can help address the challenges of integrating Blossom Sky into your data strategy.

Research reference:
Prescriptive Learning for Air-Cargo Revenue Management (under participation of Walmart Global Tech)

About Databloom

DataBloom AI is a distributed data access and analytics startup that provides "Blossom Sky," an AI-powered Virtual Data Lakehouse that allows machine learning, AI models, and data analytics to operate at the data source rather than in a central data lake, consequently avoiding difficult data management processes.
Blossom Sky stands for federated data lake technology, data collaboration, increased efficiency, and the creation of new insights by breaking down data silos in a unified manner through a single system view. The platform is designed to adapt to a wide variety of AI algorithms and models. Blossom Sky integrates with major data processing and streaming frameworks such as Databricks, Snowflake, Cloudera, Hadoop, Teradata, Oracle, and Apache Flink, as well as AI and data science tools such as TensorFlow, pandas, and PyTorch.

Want to learn more? Please get in touch with us or write to us directly: [email protected]
Ready to join the AI Powered Data Revolution? Get a quote today!