Blossom Sky: The Virtual Data Lakehouse
February 25, 2023
Alexander Alten

An overview of DataBloom AI

DataBloom AI provides federated data lakehouse and data analytics solutions and was founded by the original creators of Apache Wayang, the framework for cross-platform data processing. We aim to help data teams solve their problems and to empower businesses with truly decentralized data infrastructures and homogenized data. We are also one of the pioneers of data mesh, a concept that advocates distributed, domain-oriented data ownership and governance, and with our technology we pioneered federated data processing (FDP) and unified data lakes in several high-profile research papers.

Let me give a short overview of Blossom Sky, its history, and why we believe federated data lakes are the next frontier of AI.

Apache Wayang

Apache Wayang is a framework for cross-platform data processing. It lets users write data applications that can run on different platforms without changing the code³. The project started in 2015 under the name Rheem, as a cross-platform data processing system that decouples applications from the underlying platforms. In 2020 Rheem joined the Apache Software Foundation as an incubating project and was renamed Apache Wayang¹²⁴.

Apache Wayang differs from other data processing frameworks in that it is a cross-platform system: it can run the same application on different federated data processing platforms without code changes¹. It provides a three-layer data processing abstraction that sits between user applications and data processing platforms such as Hadoop and Spark³. Apache Wayang uses data quanta, the smallest processing units of the input datasets, and lets users specify their quality requirements for each task¹.
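To make the idea of decoupling an application from its execution platform more concrete, here is a toy sketch in Python. It is not Apache Wayang's actual API or cost model: the operator names, the two platforms, and their linear cost formulas are hypothetical assumptions, chosen only to show how a cost-based optimizer might assign each operator of one platform-agnostic plan to the cheapest engine.

```python
from dataclasses import dataclass

# Hypothetical cost models: estimated cost of processing n data quanta.
# The numbers are illustrative assumptions, not Wayang's real cost model.
COST_MODELS = {
    "java_streams": lambda n: 1.0 * n,      # no startup cost, slower per item
    "spark": lambda n: 5_000.0 + 0.1 * n,   # high startup cost, faster per item
}


@dataclass
class Operator:
    name: str
    input_cardinality: int  # estimated number of data quanta flowing in


def choose_platforms(plan):
    """Assign each operator of a platform-agnostic plan to its cheapest platform."""
    assignment = {}
    for op in plan:
        costs = {p: model(op.input_cardinality) for p, model in COST_MODELS.items()}
        assignment[op.name] = min(costs, key=costs.get)
    return assignment


# The same logical plan, written once, with no platform-specific code.
plan = [
    Operator("read_and_filter", 10_000_000),
    Operator("aggregate", 100_000),
    Operator("top_k", 1_000),
]
print(choose_platforms(plan))
```

With these made-up numbers, the two large steps go to Spark, where the startup overhead is amortized, while the small final step stays on a single-node Java engine. A real cross-platform optimizer also has to weigh the cost of moving data between platforms, which this sketch ignores.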

(1) RHEEM: Enabling Cross-Platform Data Processing - The Apache Software Foundation
(2) Using Groovy with Apache Wayang and Apache Spark
(3) GitHub - rheem-ecosystem/rheem: Rheem - a cross-platform data processing framework
(4) Apache Wayang - Home
(5) Rheem: Enabling Multi-Platform Task Execution - The Apache Software Foundation

Blossom Sky, the Virtual Data Lakehouse

Blossom Sky is our answer to ever-increasing data velocity and the growing number of processing platforms and data lakes. DataBloom AI is the pioneer in federated data lakes, data platform integration and data analytics. A virtual data lakehouse is a cutting-edge technology that allows you to train machine learning models on distributed data sources without moving the data into, or exposing it to, central data lakes or data warehouses. This way, users can meet data privacy and security requirements while still gaining valuable insights.
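One widely used technique for training a model on distributed data without centralizing it is federated averaging (FedAvg). The minimal sketch below assumes a one-parameter linear model y ≈ w * x and two hypothetical hospitals with made-up data; it illustrates the general idea only and does not describe how Blossom Sky itself trains models. Each site runs a few gradient steps on its own records, and only the resulting weight is sent back and averaged, weighted by each site's number of samples.

```python
# Federated averaging (FedAvg) sketch for a one-parameter model y ≈ w * x.
# The sites and their data are hypothetical; raw records are only read inside local_update.


def local_update(w, data, lr=0.01, epochs=20):
    """Run gradient descent on one site's private (x, y) pairs; return only the new weight."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(sites, rounds=10):
    """Coordinator loop: broadcast w, collect local weights, average them by sample count."""
    w = 0.0
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        local_weights = [local_update(w, data) for data in sites]
        w = sum(lw * len(data) for lw, data in zip(local_weights, sites)) / total
    return w


hospital_a = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
hospital_b = [(4.0, 8.1), (5.0, 9.8)]

global_w = federated_average([hospital_a, hospital_b])
print(f"global weight: {global_w:.3f}")
```

In a real deployment each local_update would run inside the data owner's own infrastructure. Model updates can still leak some information, which is why techniques such as secure aggregation or differential privacy are often layered on top.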

Blossom Sky is a hybrid cloud platform that makes federated data access easy. You can use Blossom Sky to create, manage, and monitor federated data lakes and data silos across different domains and applications. Blossom Sky uses Apache Wayang at its core and extends it with features such as data preprocessing, model selection, optimization, evaluation, and deployment.

It is important to note that Blossom Sky is more than just another machine learning platform. It is a framework and platform that enables users to solve real-world problems that require data collaboration across different data laws and regulations, for example between California (CCPA) and France (GDPR). Some of the use cases of Blossom Sky are:

  • Healthcare: train models on medical data from different hospitals and clinics without violating patient privacy or regulations.
  • Finance: detect fraud and money laundering by analyzing transactions from different banks without revealing sensitive information.
  • Education: personalize learning experiences by using data from different schools and universities without exposing student identities or grades.

Our goal is to democratize data and make it available to everyone in the easiest possible way. Whether you are a data scientist, a developer, or a business user, we want you to be able to use Blossom Sky to create data solutions that serve your organization and business objectives. Are you looking for a way to harness the power of data without compromising privacy and security? Do you want to collaborate with other parties on machine learning projects without sharing your data? If so, then you need Blossom Sky.

About DataBloom

Blossom Sky is all about taking data collaboration and efficiency to the next level. Our platform tackles the big challenge of data silos, bringing everything together in one easy-to-use system. It's built to work smoothly with a whole range of AI algorithms and models.

The cool part? Blossom Sky works hand-in-hand with top data frameworks like Databricks, Snowflake, Cloudera, and others, including Hadoop, Teradata, and Oracle. Plus, it's fully compatible with AI favorites like TensorFlow, Pandas, and PyTorch. We've made sure it fits right into your existing setup.

Blossom Sky is the commercial version of Apache Wayang, and we're proud to offer it as open source software. You can check out our public GitHub repository. If you're enjoying our software, we'd love your support: a star ⭐ would mean a lot to us!

If you need professional support from our team of industry-leading experts, you can always reach out to us via Slack or email.