Blossom Studio is included in every license at no additional cost. Powered by Blossom Core, it provides a dynamic cloud-native interface that streamlines the creation, execution, and monitoring of data transformation tasks, ML and AI pipelines, and data modification tasks. With Apache Wayang (incubating) at its core, Blossom Studio manages these processes across diverse platforms. The result? Insight into schema and data outcomes at every workflow step.
A notable advantage of Blossom Core is its built-in support for table and file formats such as Iceberg, Parquet, JSON, and CSV, which enables data integration and transformation across a broader spectrum of datasets. Blossom Studio isn't just designed for structured data; it also handles semi-structured data, which many conventional interfaces find challenging.
Blossom Studio is a cloud-native add-on to Blossom Core for developing data processing (ETL) pipelines in a low-code way.
Features of Blossom Studio
When you initiate a task in Blossom Studio, there's an array of data sources to choose from, including PostgreSQL, local file systems, and distributed systems like HDFS. This allows rapid data preparation for further analysis in diverse data landscapes. Beyond this, Blossom Studio provides tools to oversee ETL workflows and verify that they run as intended. The ability to preview datasets at each phase significantly aids in troubleshooting ETL tasks.
With Blossom Studio's intuitive interface, users can:
- Extract data from sources like PostgreSQL or distributed filesystems like HDFS.
- Set up diverse data transformations, including mapping, filtering, grouping, and joining.
- Choose the execution platform, be it Java 8 Streams or Apache Spark.
- Inspect dataset schemas or samples at every step of a task.
- Initiate, oversee, and manage tasks integrated into Blossom Studio.
- Share pipelines and processors with other users.
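The transformation types listed above (mapping, filtering, grouping, and joining) can be illustrated with a minimal plain-Python sketch. This is not the Blossom Studio or Apache Wayang API; the `orders` and `customers` records, the `run_pipeline` function, and the `min_amount` parameter are all hypothetical and only show what each step does to the data.

```python
from collections import defaultdict

# Hypothetical in-memory inputs standing in for two separate sources
# (for example, a PostgreSQL table and a file on HDFS).
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "amount": 35.5},
    {"order_id": 3, "customer_id": 10, "amount": 80.0},
    {"order_id": 4, "customer_id": 12, "amount": 5.0},
]
customers = [
    {"customer_id": 10, "country": "DE"},
    {"customer_id": 11, "country": "US"},
    {"customer_id": 12, "country": "DE"},
]


def run_pipeline(orders, customers, min_amount=10.0):
    """Filter, join, map, and group the two inputs (illustrative only)."""
    # Filter: drop orders below the threshold.
    kept = [o for o in orders if o["amount"] >= min_amount]

    # Join: attach each customer's country via a lookup on customer_id.
    country_by_customer = {c["customer_id"]: c["country"] for c in customers}
    joined = [
        {**o, "country": country_by_customer[o["customer_id"]]}
        for o in kept
        if o["customer_id"] in country_by_customer
    ]

    # Map: keep only the fields the aggregation needs.
    pairs = [(row["country"], row["amount"]) for row in joined]

    # Group: total order amount per country.
    totals = defaultdict(float)
    for country, amount in pairs:
        totals[country] += amount
    return dict(totals)


result = run_pipeline(orders, customers)
print(result)
```

In a visual editor, each of these four steps would correspond to one node in the workflow, which is where a per-node preview becomes useful.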
Rooted in Blossom Core, Blossom Studio excels at building and managing tasks that gather, refine, and unify data from multiple data sources without moving the data to a central place. For those with intricate requirements, Blossom Studio also serves as a tool to diagnose and tailor job scripts. Blossom Studio includes a graphical user interface that lets you connect to and query different data sources, or join data across multiple data sources, in an intuitive way. It also supports complex data transformations that can be processed on the platform of the user's choice.
The platform’s visual job editor presents users with a plethora of features:
- The ability to incorporate multiple data sources and targets.
- Preview data at each workflow node.
- Implement various data transformations, from simple mappings to complex joins.
- Switch data processing frameworks instantly, enabling rapid testing and fast deployment.
- Data platform independence: switch from any supported platform to another (e.g., Spark to Flink).
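The idea behind platform independence and per-node preview can be sketched as a pipeline that is defined once as a list of logical steps and then handed to interchangeable execution backends. The sketch below is a toy model in plain Python, not how Blossom Studio or Apache Wayang is implemented; the `eager_backend`, `lazy_backend`, `run`, and `preview` names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# A logical step is a (kind, function) pair; this sketch supports "map" and "filter".
Step = Tuple[str, Callable]


def eager_backend(data: Iterable, steps: List[Step]) -> list:
    """Materialise a full list after every step (stand-in for one engine)."""
    rows = list(data)
    for kind, fn in steps:
        if kind == "map":
            rows = [fn(r) for r in rows]
        elif kind == "filter":
            rows = [r for r in rows if fn(r)]
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return rows


def lazy_backend(data: Iterable, steps: List[Step]) -> list:
    """Chain iterators and evaluate only at the end (stand-in for another engine)."""
    rows = iter(data)
    for kind, fn in steps:
        if kind == "map":
            rows = map(fn, rows)
        elif kind == "filter":
            rows = filter(fn, rows)
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return list(rows)


BACKENDS = {"eager": eager_backend, "lazy": lazy_backend}


def run(data: Iterable, steps: List[Step], platform: str) -> list:
    """Execute the same logical steps on the chosen backend."""
    return BACKENDS[platform](data, steps)


def preview(data: Iterable, steps: List[Step], platform: str, upto: int, limit: int = 3) -> list:
    """Return a small sample of the data as it looks after the first `upto` steps."""
    return run(data, steps[:upto], platform)[:limit]


steps = [("map", lambda x: x * 2), ("filter", lambda x: x % 3 == 0)]
data = range(1, 10)

print(run(data, steps, "eager"))
print(run(data, steps, "lazy"))
print(preview(data, steps, "lazy", upto=1))
```

Because the step list never mentions a backend, switching platforms is a one-argument change, and a preview is just the same pipeline truncated at the node of interest.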
Further, the script editor in Blossom Studio is well suited to writing or amending the ETL code for your tasks. After laying down the initial design, you can fine-tune the generated script to match the specifics of your task. Blossom Studio's performance dashboard offers a detailed view of your ETL tasks, including key information about job runs over selected timeframes.
Support for Dataset Partitioning
Blossom Studio supports partitioned datasets. You can process, filter, and transform partitioned data efficiently, without listing or loading data that a task does not need.
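To show why partitioning avoids unnecessary listing and loading, here is a minimal sketch of partition pruning in plain Python. It assumes a Hive-style `key=value` directory layout, which the text above does not specify; the paths and the `parse_partition` and `prune` helpers are hypothetical and not part of Blossom Studio.

```python
# Hypothetical Hive-style partition layout (an assumption for illustration).
partitions = [
    "events/year=2023/month=11/part-0.parquet",
    "events/year=2023/month=12/part-0.parquet",
    "events/year=2024/month=01/part-0.parquet",
    "events/year=2024/month=02/part-0.parquet",
]


def parse_partition(path):
    """Extract the key=value segments of a path into a dict of strings."""
    return dict(seg.split("=", 1) for seg in path.split("/") if "=" in seg)


def prune(paths, predicate):
    """Keep only the paths whose partition keys satisfy the predicate.

    The decision uses the path alone, so skipped files are never opened.
    """
    return [p for p in paths if predicate(parse_partition(p))]


# Only the 2024 partitions need to be read for a query restricted to that year.
selected = prune(partitions, lambda keys: keys["year"] == "2024")
print(selected)
```

The filter is evaluated against partition keys rather than file contents, so the cost of the pruning step depends on the number of partitions, not on the volume of data they hold.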
Why Choose Blossom Studio?
Blossom Studio, integrated with Blossom Core, offers a streamlined avenue for crafting ETL workflows. With its capabilities and the muscle of Apache Wayang, it becomes an essential tool for ETL developers aiming for reliable processes to manage expansive, semi-structured datasets and deposit them into structured data environments. The culmination of user-centric design, coupled with the versatility of Blossom Core's advanced processing engine, makes Blossom Studio an indispensable tool in modern data management.
With Blossom Studio, not only do you get a simplified job management experience, but also a comprehensive view of your tasks and their interrelations. The platform's consolidated interface presents a continually refreshed perspective on ETL operations and resource allocations. This makes it an invaluable asset for anyone looking to optimize their data processing workflows.
The cool part? Blossom Sky works hand-in-hand with top data frameworks like Databricks, Snowflake, Cloudera, and others, including Hadoop, Teradata, and Oracle. Plus, it's fully compatible with AI favorites like TensorFlow, Pandas, and PyTorch. We've made sure it fits right into your existing setup.
Blossom Sky is the commercial version of Apache Wayang, and we're proud to offer it as Open Source Software. You can check out our public GitHub repo. If you're enjoying our software, we'd love your support - a star ⭐ would mean a lot to us!
If you need professional support from our team of industry-leading experts, you can always reach out to us via Slack or email.