Core Capabilities
Advanced data partitioning technology designed for modern cloud infrastructure
Cloud-Native
Specifically targets cold raw data in object storage (e.g., Amazon S3), exploiting S3 byte-range reads for parallel, high-bandwidth access.
Read-Only Processing
Pre-processes data in a read-only fashion: indexes and metadata are stored decoupled from the raw objects, keeping cold data as-is.
Zero-Cost Partitioning
Lazy-evaluated partitions with zero-cost re-partitioning. Partitions are serializable for distributed computing frameworks such as PySpark, Dask, and Ray.
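The idea behind lazy, zero-cost partitions can be sketched in plain Python. This is a simplified illustration, not DataPlug's actual API: a partition is just serializable (offset, length) metadata, so re-partitioning only rewrites metadata, and bytes are fetched only when a partition is evaluated.

```python
from dataclasses import dataclass

# Simplified stand-in for an object in cloud storage.
BLOB = bytes(range(256)) * 4  # 1024 "cold" bytes, never copied below


@dataclass(frozen=True)
class Partition:
    """Serializable partition descriptor: offset/length metadata only."""
    offset: int
    length: int

    def evaluate(self, storage: bytes) -> bytes:
        # Data is read only here, when the partition is actually needed
        # (in DataPlug this would correspond to an S3 byte-range GET).
        return storage[self.offset:self.offset + self.length]


def partition(total_size: int, n: int) -> list[Partition]:
    """Re-partitioning is 'zero-cost': it only recomputes metadata."""
    chunk = -(-total_size // n)  # ceiling division
    return [Partition(i * chunk, min(chunk, total_size - i * chunk))
            for i in range(n) if i * chunk < total_size]


parts = partition(len(BLOB), 4)   # 4 partitions
parts = partition(len(BLOB), 8)   # re-partition: no data moved
first = parts[0].evaluate(BLOB)   # lazy: bytes are touched only now
```

Because each `Partition` is a small frozen dataclass, it pickles cheaply, which is what makes shipping partitions to PySpark, Dask, or Ray workers practical.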
Meet Data Cockpit
An interactive IPython widget built on top of the DataPlug framework. Upload, browse, benchmark, and partition your scientific data with a beautiful interface.
Interactive Jupyter Widget
Built on top of DataPlug's cloud-aware partitioning, Data Cockpit provides an end-to-end Jupyter UI for seamless data processing.
pip install cloud_data_cockpit
What Data Cockpit Adds
Upload & Browse
Upload local files directly into any S3 bucket and browse existing datasets from the AWS Open Data Registry
Explore Collections
Explore curated public and METASPACE collections for scientific data discovery
Performance Benchmarking
Run benchmarks across configurable batch sizes to discover optimal throughput
One-Click Partitioning
Partition a variety of scientific data types into chunks or batches with one click
Jupyter Integration
Integrate seamlessly into Jupyter notebooks for elastic, parallel workloads
PyRun Cloud Platform
Platform
Effortless Cloud Computing for Python. Experience true Serverless Python. Run scalable workloads for data processing, AI, and distributed computing without managing complex cloud infrastructure. Data Cockpit automatically obtains credentials for DataPlug to access your data, so you can focus purely on processing data without any configuration overhead.
Serverless Python Execution
Focus on your code, not the setup. PyRun provides an integrated environment with automated scaling and powerful framework support.
Why Choose PyRun?
Effortless Execution
Write standard Python and run it seamlessly in the cloud. PyRun automatically handles server management, scaling, and resource optimization.
Integrated & Automated
A VS Code-like web interface with automatic credential management. Data Cockpit handles all AWS/S3 configuration, so you can focus solely on data processing.
Scalable & Versatile
Built-in, first-class support for powerful frameworks like Lithops (FaaS) and Dask. Scale from simple scripts to massively parallel computations.
Real-Time Monitoring
Gain instant insights into job performance with detailed metrics for CPU, memory, disk, network usage, and task execution timelines.
Seamless Integration with DataPlug & Data Cockpit
DataPlug Integration
Seamless integration with DataPlug for efficient data partitioning and processing
Data Cockpit Interface
Built-in Data Cockpit widget with automatic credential management for seamless data access
Cloud-Native Execution
Execute DataPlug workflows directly in the cloud with automatic scaling
Real-Time Monitoring
Monitor DataPlug and Data Cockpit operations with detailed performance metrics
Complete Workflow
- Write Python code with DataPlug and Data Cockpit
- Deploy to PyRun cloud platform
- Execute with automatic scaling
- Monitor performance in real-time
- Scale from simple scripts to massive computations
Your Workflow with Data Cockpit
Upload
Upload your local files directly into any S3 bucket
Browse
Browse existing buckets or public datasets from the AWS Open Data Registry
Benchmark
Run benchmarks across configurable batch sizes to find optimal throughput
Process & Partition
Process & partition your data with one click, displaying progress entirely in-notebook
Retrieve Slices
Retrieve partitions via get_data_slices() for downstream processing
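In a notebook, the last step might look like the sketch below. The widget calls are shown only as comments (they need a live Jupyter session and S3 access), and `FakeSlice` is a local stand-in for the slice objects returned by `get_data_slices()`; the exact slice interface shown here is an assumption, not the documented API.

```python
# Illustrative notebook usage (commented out: requires Jupyter + S3):
#
#   from cloud_data_cockpit import DataCockpit  # hypothetical import path
#   cockpit = DataCockpit()                     # renders the widget in-notebook
#   slices = cockpit.get_data_slices()          # after partitioning in the UI


class FakeSlice:
    """Local stand-in: assumes each slice exposes get() -> bytes."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def get(self) -> bytes:
        return self._payload


slices = [FakeSlice(b"chunk-%d" % i) for i in range(4)]


def process(s) -> int:
    # Downstream pattern: pull one slice's bytes and do work on them.
    return len(s.get())


results = [process(s) for s in slices]
```

The same loop body is what you would hand to a parallel `map` in PySpark, Dask, or Ray, with the real slices in place of `FakeSlice`.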
Why Data Cockpit?
- Built on DataPlug's Cloud-Aware Partitioning
- Pre-processes data in read-only fashion
- Exploits S3 byte-range reads for parallel access
- Supports multiple scientific domains
- Allows re-partitioning with different strategies
- Zero-cost data movement
Supported Domains
Genomics
DNA/RNA sequencing data processing
FASTA
FASTQ
VCF
Geospatial
Spatial data and point clouds
LiDAR
Cloud-Optimized Point Cloud
COG (Cloud-Optimized GeoTIFF)
Metabolomics
Imaging mass spectrometry data
ImzML
Generic
Standard data formats
CSV
Raw Text
Astronomics
Astronomical measurement data
MeasurementSet
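Format awareness is what separates these partitioners from naive byte-splitting: a FASTA file, for instance, must be cut only at record boundaries (lines starting with `>`). The sketch below shows that idea in plain Python; it is a toy illustration, not DataPlug's implementation.

```python
def chunk_fasta(text: str, n_chunks: int) -> list[str]:
    """Split FASTA content into ~n_chunks pieces, never mid-record."""
    # Parse records: each starts with a '>' header line.
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    # Assign whole records to contiguous chunks.
    per_chunk = -(-len(records) // n_chunks)  # ceiling division
    return ["".join(records[i:i + per_chunk])
            for i in range(0, len(records), per_chunk)]


fasta = ">seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n>seq4\nCATG\n"
chunks = chunk_fasta(fasta, 2)
```

Every chunk starts at a `>` header, so each one is independently parseable by any FASTA reader, which is the property a downstream worker needs.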
Format Examples
Explore real examples for each supported format. Each example includes working code and sample data.
Genomics
3 formats available
Geospatial
3 formats available
Metabolomics
1 format available
Generic
2 formats available
Astronomics
1 format available
How It Works
Pre-processing
Build lightweight indexes decoupled from raw objects
Data Slicing
Create lazy-evaluated partitions with metadata
Parallel Access
Multiple workers issue parallel HTTP GET byte-range requests
Evaluation
Data accessed only when needed, not before
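The four steps above can be simulated locally. Against real S3, each worker would issue an HTTP GET with a `Range: bytes=start-end` header (e.g., via boto3's `get_object(..., Range=...)`); here an in-memory blob stands in for the object so the pattern is runnable anywhere.

```python
from concurrent.futures import ThreadPoolExecutor

OBJECT = b"0123456789" * 100  # stand-in for a 1000-byte S3 object


def byte_ranges(size: int, workers: int) -> list[tuple[int, int]]:
    """Inclusive (start, end) ranges, mirroring HTTP Range semantics."""
    step = -(-size // workers)  # ceiling division
    return [(s, min(s + step, size) - 1) for s in range(0, size, step)]


def ranged_get(rng: tuple[int, int]) -> bytes:
    # Real code would be roughly:
    #   s3.get_object(Bucket=b, Key=k,
    #                 Range=f"bytes={rng[0]}-{rng[1]}")["Body"].read()
    start, end = rng
    return OBJECT[start:end + 1]


ranges = byte_ranges(len(OBJECT), 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    pieces = list(pool.map(ranged_get, ranges))
```

Because every range read is independent, bandwidth scales with the number of workers; the object itself is never rewritten or copied server-side.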
Compatible Frameworks
PySpark
Dask
Ray
Any Python
10+ Data Formats · 100% Zero Cost · ∞ Parallel Access
Ready to Get Started?
Join the community of scientists and engineers using DataPlug, Data Cockpit, and PyRun for efficient data partitioning.
DataPlug provides the core engine for efficient data partitioning, Data Cockpit offers the user-friendly interface, and PyRun delivers the cloud platform for seamless execution.