Blossom Sky: The Virtual Data Lakehouse

February 25, 2023
Alexander Alten

An overview of DataBloom AI

DataBloom AI provides a federated data lakehouse and data analytics solutions. The company was founded by the original creators of Apache Wayang, the framework for cross-platform data processing. We aim to help data teams solve their problems and to empower businesses with truly decentralized data infrastructures and homogenized data. We are also one of the pioneers of data mesh, a concept that advocates distributed, domain-oriented data ownership and governance, and we pioneered federated data processing (FDP) and unified data lakes (ULC) with our technology in several high-profile research papers.

Let me give a short overview of Blossom Sky and its history, and explain why we believe federated data lakes are the next frontier of AI.

Apache Wayang

Apache Wayang is a framework for cross-platform data processing. It allows users to write data applications that run on different platforms without changing the code³. The project started in 2015 under the name Rheem, as a cross-platform data processing system that decouples applications from the underlying platforms. Rheem was renamed to Apache Wayang in 2020, when it joined the Apache Software Foundation as an incubating project¹²⁴.

Apache Wayang differs from other data processing frameworks in that the same application can run on different, federated data processing platforms without code changes¹. It provides a three-layer data processing abstraction that sits between user applications and data processing platforms such as Hadoop and Spark³. Apache Wayang treats data quanta as the smallest processing units of the input datasets and allows users to specify their quality requirements for each task¹.
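To make the idea of decoupling an application from its execution platform more concrete, here is a minimal toy sketch in Python. It does not use the Apache Wayang API: the Plan class and the two backends, run_sequential and run_threaded, are hypothetical names invented for this illustration. The plan only records operators over individual data items, and each backend decides how to execute them.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Tuple

# An operator is a (kind, function) pair applied to each data item.
Operator = Tuple[str, Callable[[Any], Any]]


class Plan:
    """Hypothetical platform-agnostic pipeline: records operators, executes nothing."""

    def __init__(self, source: Iterable[Any]) -> None:
        self.source = list(source)
        self.operators: List[Operator] = []

    def map(self, fn: Callable[[Any], Any]) -> "Plan":
        self.operators.append(("map", fn))
        return self

    def filter(self, fn: Callable[[Any], bool]) -> "Plan":
        self.operators.append(("filter", fn))
        return self


def run_sequential(plan: Plan) -> List[Any]:
    """Hypothetical backend 1: a plain single-threaded loop."""
    data = plan.source
    for kind, fn in plan.operators:
        if kind == "map":
            data = [fn(x) for x in data]
        else:
            data = [x for x in data if fn(x)]
    return data


def run_threaded(plan: Plan) -> List[Any]:
    """Hypothetical backend 2: the same plan, evaluated through a thread pool."""
    data = plan.source
    with ThreadPoolExecutor(max_workers=4) as pool:
        for kind, fn in plan.operators:
            results = list(pool.map(fn, data))  # pool.map preserves input order
            if kind == "map":
                data = results
            else:
                data = [x for x, keep in zip(data, results) if keep]
    return data


def build_plan() -> Plan:
    """The application code: written once, independent of any backend."""
    return Plan(range(10)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
```

Calling run_sequential(build_plan()) and run_threaded(build_plan()) yields the same result from the same application code. A real cross-platform system would additionally choose the platform for each operator, which this sketch leaves to the caller.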

(1) RHEEM: Enabling Cross-Platform Data Processing - The Apache Software Foundation
(2) Using Groovy with Apache Wayang and Apache Spark
(3) GitHub - rheem-ecosystem/rheem: Rheem - a cross-platform data processing framework
(4) Apache Wayang - Home
(5) Rheem: Enabling Multi-Platform Task Execution - The Apache Software Foundation

Blossom Sky, the Virtual Data Lakehouse

Blossom Sky is our answer to ever-increasing data velocity and a growing number of processing platforms and data lakes. DataBloom AI is a pioneer in federated data lakes, data platform integration and data analytics. A virtual data lakehouse allows you to train machine learning models on distributed data sources without moving the data to central data lakes or data warehouses and without exposing it there. This way, users can meet data privacy and security requirements while still gaining valuable insights.
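As a rough illustration of training on distributed data without moving it, the following Python sketch uses federated averaging, a generic and widely known technique. It is not a description of how Blossom Sky works internally. The sites, the one-parameter model y ≈ w * x and all function names are assumptions made for this example. Each site updates the model on its own records and shares only the resulting weight, never the records themselves.

```python
from typing import List, Tuple

# (x, y) records that stay at the site that owns them.
Site = List[Tuple[float, float]]


def local_update(w: float, data: Site, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for y ≈ w * x on one site's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, sites: List[Site]) -> float:
    """Average the locally updated weights, weighted by each site's record count."""
    total = sum(len(site) for site in sites)
    return sum(local_update(w, site) * len(site) for site in sites) / total


def train(sites: List[Site], rounds: int = 50) -> float:
    """Coordinator loop: only model weights travel between coordinator and sites."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, sites)
    return w


# Three hypothetical sites whose records all follow y = 3x.
SITES: List[Site] = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]
```

With these toy sites, train(SITES) converges towards a weight of 3. Production systems typically add safeguards such as secure aggregation or differential privacy, because shared model updates can still leak information about the underlying data.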

Blossom Sky is a hybrid cloud platform that makes federated data access easy. You can use Blossom Sky to create, manage, and monitor federated data lakes and data silos across different domains and applications. Blossom Sky uses Apache Wayang at its core and enhances it with features such as data preprocessing, model selection, optimization, evaluation, and deployment.

It is important to note that Blossom Sky is more than just another machine learning platform. It is a framework and platform for solving real-world problems that require data collaboration across different data legislations and regulations, for example between California (CCPA) and France (GDPR). Some of the use cases of Blossom Sky are:

  • Healthcare: train models on medical data from different hospitals and clinics without violating patient privacy or regulations.
  • Finance: detect fraud and money laundering by analyzing transactions from different banks without revealing sensitive information.
  • Education: personalize learning experiences by using data from different schools and universities without exposing student identities or grades.

Our goal is to democratize data and make it available to everyone in the easiest possible way. Whether you are a data scientist, a developer, or a business user, you should be able to use Blossom Sky to create data solutions that benefit your organization and support your business objectives. Are you looking for a way to harness the power of data without compromising privacy and security? Do you want to collaborate with other parties on machine learning projects without sharing your data? If yes, then you need Blossom Sky.

About DataBloom AI

DataBloom AI is a distributed data access and analytics startup that provides "Blossom Sky," an AI-powered Virtual Data Lakehouse that allows machine learning, AI models, and data analytics to operate at the data source rather than in a central data lake, thereby avoiding difficult data management processes.
Blossom Sky stands for federated data lake technology, data collaboration, and increased efficiency, and it helps to create new insights by breaking up data silos and presenting them through a single, unified system view. The platform is designed to adapt to a wide variety of AI algorithms and models. Blossom Sky integrates with all major data processing and streaming frameworks, such as Databricks, Snowflake, Cloudera, Hadoop, Teradata, Oracle, and Apache Flink, as well as AI systems and libraries such as TensorFlow, pandas, and PyTorch.

Want to learn more? Please get in touch with us or write to us directly: [email protected]