Reduce data costs by up to 35% with Blossom Sky
March 29, 2023
Vatsal Shah

Blossom Sky’s AI-powered Virtual Data Lakehouse lets you run analytics, machine learning (ML), and AI on federated data sources and formats without moving or copying data to central data stores or data lakes. A Virtual Data Lakehouse enables decentralized processing and federated data access across a wide variety of sources and formats, so analytics and AI tasks run directly where the data lives. Most of our clients use the Blossom Sky platform to unify their existing data platforms and processing engines.
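To make "running analytics where the data lives" more concrete, here is a minimal sketch, assuming a simple average over one numeric column. Each source computes a small partial aggregate (sum and count) locally, and only those partials travel to the coordinator, so no raw rows are copied. The `Source` class and `federated_average` function are hypothetical illustrations of the general idea, not Blossom Sky's or Apache Wayang's API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Source:
    """Hypothetical stand-in for one independent data source (e.g. a warehouse table)."""
    name: str
    rows: list[dict]

    def local_partial_avg(self, column: str) -> tuple[float, int]:
        """Runs next to the data: returns only (sum, count), never the raw rows."""
        values = [row[column] for row in self.rows]
        return sum(values), len(values)


def federated_average(sources: Iterable[Source], column: str) -> float:
    """Combine per-source partial aggregates into one global average."""
    total, count = 0.0, 0
    for source in sources:
        partial_sum, partial_count = source.local_partial_avg(column)
        total += partial_sum
        count += partial_count
    return total / count if count else float("nan")


# Usage: two hypothetical sources that are never merged into one store.
warehouse = Source("warehouse", [{"amount": 10.0}, {"amount": 30.0}])
lake = Source("lake", [{"amount": 20.0}])
print(federated_average([warehouse, lake], "amount"))  # 20.0
```

Shipping (sum, count) pairs instead of averages matters here: averaging per-source averages would weight a small source as heavily as a large one.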

DataBloom’s Blossom Sky can help companies optimize their current data stack and recover and redeploy as much as 35 percent of their data spend by applying more rigorous management to their data architecture, sourcing, and usage practices. Many of the recommended improvements can be implemented quickly with Blossom Sky, with businesses capturing double-digit savings within six months.

CapEx reduction by unifying existing data platforms

We compared Blossom Sky against Apache Spark, the state of the art for big data analytics. In several research articles, we have reported that Apache Spark is, on average, the fastest big data system, which makes it the natural baseline for Blossom Sky's time and cost benefits. The time savings reported here come from those research articles. To represent typical user workloads, we consider three main categories:

  • text analytics (e.g., word frequency, word synonyms, inverted index creation) 
  • data analytics (e.g., aggregate queries and join queries)
  • machine learning (e.g., SGD, K-Means, and cross-community PageRank)

For this comparison, we considered a single AWS cloud instance of one of two popular types: m4 ($2.42/h) and t3 ($8.786/h). We assume the user runs the instance 8 hours per day for data analytics. The table below shows Blossom Sky’s benefits in terms of time and monetary savings. In every workload, Blossom Sky delivers both time and cost savings; in this setting, the machine learning workload on the t3 instance alone saves over $200,000 per year.


Time savings and yearly cost savings (USD) of Blossom Sky compared with Apache Spark, per workload and instance type (instance running 8 h/day):

| Workload              | Time savings | Yearly savings, m4 | Yearly savings, t3 |
|-----------------------|--------------|--------------------|--------------------|
| Text analytics        | 5x           | $27,878.40         | $101,214.72        |
| Data analytics        | 2x           | $6,969.60          | $25,303.68         |
| Machine learning (AI) | 10x          | $62,726.40         | $227,733.12        |
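The published figures are consistent with a simple formula: yearly savings = (speedup - 1) × 8 h/day × 360 days/year × hourly price. One reading is that this counts the extra instance hours Spark would need to finish the work Blossom Sky completes in 8 hours. The 360-day year and this interpretation are our assumptions, inferred from the numbers rather than stated in the post. A minimal sketch that reproduces the table:

```python
# Hourly instance prices as stated in the post (USD per hour).
HOURLY_PRICE = {"m4": 2.42, "t3": 8.786}

# Speedup of Blossom Sky over Apache Spark per workload, as stated in the post.
SPEEDUP = {
    "Text analytics": 5,
    "Data analytics": 2,
    "Machine learning (AI)": 10,
}

HOURS_PER_DAY = 8
DAYS_PER_YEAR = 360  # assumption: inferred from the published figures


def yearly_savings(speedup: float, price_per_hour: float) -> float:
    """Cost of the extra instance hours Spark would need each year (assumed model)."""
    extra_hours_per_day = (speedup - 1) * HOURS_PER_DAY
    return round(extra_hours_per_day * DAYS_PER_YEAR * price_per_hour, 2)


for workload, speedup in SPEEDUP.items():
    for instance, price in HOURLY_PRICE.items():
        print(f"{workload:<22} {instance}: ${yearly_savings(speedup, price):,.2f}")
```

Changing `DAYS_PER_YEAR` to 365 or `HOURS_PER_DAY` to match your own usage lets you re-estimate the savings for a different schedule.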

Reduce OpEx, cut CapEx costs by reusing your current data stack

In our experience, organizations can free up about one-third of their staff and redeploy that workforce immediately, even before counting savings on hiring IT staff. For instance, operating a 25-node Spark cluster on AWS while running around five AI consulting projects typically requires a team of 14:

  • Backend developers (5)
  • System specialists (2)
  • Data scientists / data analysts (4)
  • Project managers (3)

Implementing Blossom Sky reduces the typical team size to 7:

  • Backend developers (2)
  • System specialist (1)
  • Data scientists / data analysts (2)
  • Project managers (2)

This also leads to significant cost reductions for the entire firm. Because they can keep using the data processing platforms they already have, such as Hadoop or Spark and their commercial versions, our clients often save 35 to 40% on OpEx and, on average, more than 50% on CapEx when using Blossom Sky. Note that the OpEx savings can be redeployed immediately to run more projects in parallel.

About DataBloom

Blossom Sky is all about taking data collaboration and efficiency to the next level. Our platform tackles the big challenge of data silos, bringing everything together in one easy-to-use system. It's built to work smoothly with a whole range of AI algorithms and models.

The cool part? Blossom Sky works hand-in-hand with top data frameworks like Databricks, Snowflake, Cloudera, and others, including Hadoop, Teradata, and Oracle. Plus, it's fully compatible with AI favorites like TensorFlow, Pandas, and PyTorch. We've made sure it fits right into your existing setup.

Blossom Sky is the commercial version of Apache Wayang, which we're proud to offer as open source software. You can check out our public GitHub repository. If you're enjoying our software, we'd love your support - a star ⭐ would mean a lot to us!

If you need professional support from our team of industry-leading experts, you can always reach out to us via Slack or email.
