Overview
lakeFS transforms object storage buckets into data lake repositories that expose a Git-like interface. By design, it works with data of any size.
The Git-like interface means users of lakeFS can use the same development workflows for code and data. Git workflows greatly improved software development practices; we designed lakeFS to bring the same benefits to data.
In this way, lakeFS brings a unique combination of performance and manageability to data lakes.
The move to data lakes, with their virtually unlimited scale and low costs, also introduced a new challenge: keeping the data in the lake resilient and reliable over time. Naturally, the quality of the data we introduce determines the overall reliability of our data lake. Despite the scalability and performance advantages of running a data lake on top of object stores, enforcing best practices, ensuring high data quality, and recovering quickly from errors remain extremely challenging. The data ingestion stage in particular is critical for ensuring the soundness of our service and data.
What are the lakeFS use cases? Data engineers should continuously test newly ingested data against their data quality requirements, much as software engineers automatically test new code. Then, when a mistake happens and 'bad data' is ingested into the lake, they have a feasible way to reproduce the ingestion error as of the time of failure and to roll back to the previous high-quality snapshot of their data. Sounds right, doesn't it? To bring these best practices from the world of code into the world of data engineering, the lakeFS versioning engine provides the following built-in operations, familiar from Git:
- branch: a consistent copy of a repository, isolated from other branches and their changes. Initial creation of a branch is a metadata operation that does not duplicate objects.
- commit: an immutable checkpoint containing a complete snapshot of a repository.
- merge: performed between two branches; a merge atomically updates one branch with the changes from another.
- revert: return a repository to the exact state of a previous commit.
- tag: a pointer to a single immutable commit with a readable, meaningful name.
Incorporating these operations into your data lake pipelines provides the same collaboration and organizational benefits you get when managing application code with source control.
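To make the semantics of these five operations concrete, here is a minimal in-memory sketch. It is a toy model written for illustration only: `ToyRepo`, its method names, and the `s3://bucket/...` addresses are hypothetical and are not the lakeFS API or storage format. The merge is deliberately simplified (source entries overwrite destination entries, with no conflict detection or delete handling), and revert is modeled as a new commit that restores an earlier snapshot.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable checkpoint: a complete path -> object-address mapping."""
    id: int
    parent: Optional[int]
    snapshot: Dict[str, str]
    message: str


class ToyRepo:
    """Hypothetical toy model of the operations above; NOT the lakeFS API."""

    def __init__(self) -> None:
        self.commits: Dict[int, Commit] = {}
        self.heads: Dict[str, int] = {}                # branch -> head commit id
        self.working: Dict[str, Dict[str, str]] = {}   # uncommitted state per branch
        self.tags: Dict[str, int] = {}
        self.heads["main"] = self._new_commit(None, {}, "initial commit")
        self.working["main"] = {}

    def _new_commit(self, parent: Optional[int], snapshot: Dict[str, str], message: str) -> int:
        cid = len(self.commits)
        self.commits[cid] = Commit(cid, parent, dict(snapshot), message)
        return cid

    def branch(self, name: str, source: str) -> None:
        # Metadata only: copies object addresses, never the objects themselves.
        self.heads[name] = self.heads[source]
        self.working[name] = dict(self.commits[self.heads[source]].snapshot)

    def put(self, branch: str, path: str, address: str) -> None:
        self.working[branch][path] = address

    def commit(self, branch: str, message: str) -> int:
        cid = self._new_commit(self.heads[branch], self.working[branch], message)
        self.heads[branch] = cid
        return cid

    def merge(self, source: str, dest: str) -> int:
        # Simplified: source entries win; no conflict detection or deletes.
        merged = {**self.commits[self.heads[dest]].snapshot,
                  **self.commits[self.heads[source]].snapshot}
        cid = self._new_commit(self.heads[dest], merged, f"merge {source} into {dest}")
        self.heads[dest] = cid  # readers see the change via one pointer update
        self.working[dest] = dict(merged)
        return cid

    def revert(self, branch: str, commit_id: int) -> int:
        # Modeled as a new commit that restores the old snapshot.
        old = self.commits[commit_id].snapshot
        cid = self._new_commit(self.heads[branch], old, f"revert to {commit_id}")
        self.heads[branch] = cid
        self.working[branch] = dict(old)
        return cid

    def tag(self, name: str, commit_id: int) -> None:
        self.tags[name] = commit_id

    def read(self, ref: str) -> Dict[str, str]:
        """Return the committed snapshot behind a tag or branch name."""
        cid = self.tags.get(ref, self.heads.get(ref))
        return dict(self.commits[cid].snapshot)


# Ingest on an isolated branch, merge, then roll back to a known-good commit.
repo = ToyRepo()
repo.put("main", "events/day1.parquet", "s3://bucket/obj-a")
good = repo.commit("main", "day 1")
repo.tag("v1", good)

repo.branch("ingest", "main")
repo.put("ingest", "events/day2.parquet", "s3://bucket/obj-b")
repo.commit("ingest", "day 2")
main_before_merge = repo.read("main")   # ingest changes are not visible here

repo.merge("ingest", "main")
main_after_merge = repo.read("main")    # now includes day 2

repo.revert("main", good)
main_after_revert = repo.read("main")   # back to the day 1 snapshot
```

In this sketch a data quality check would run against the `ingest` branch before the merge, so that `main` only ever receives data that passed validation; the tag `v1` keeps a readable name for the last known-good snapshot.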
What are the benefits of using lakeFS with data lakes? When using lakeFS on your object store, you improve the entire process of data management within your organization and enjoy the following benefits:
- Data team efficiency - lakeFS automates many of the repetitive, labor-heavy manual tasks that data engineers deal with on a daily basis, such as manually rolling back production data (have you ever tried to restore data that was accidentally deleted by some retention algorithm?) or debugging production issues without a stable version of the data as it was at the time of failure. Freed from these tasks, your data engineers can focus on what they really know and love to do: developing richer and more efficient data sources and algorithms for your organization.
- High quality data products - lakeFS enables validating the data coming into the data lake before it is exposed to external users. Being able to prevent inconsistencies and errors before they reach consumers is one of the strongest capabilities of lakeFS. It enables organizations to gain more trust in their ever-growing and ever more complex data estates, which is of great value to any organization that relies on its data.
- Data resilience - At lakeFS, we believe that data resilience means that even when mistakes and inconsistencies happen, we can quickly recover from them. One of the core capabilities of lakeFS is the ability to roll back the entire data lake to its previous consistent state, which helps organizations eliminate data downtime. In addition, keeping versions of the data and being able to time travel between them lets data engineers check the data as it was at the time of failure, dramatically reducing the time they spend investigating and fixing bugs, errors and inconsistencies.
For custom pricing, a EULA, or a private contract, please contact support@treeverse.io for a private offer.
Highlights
- Data team efficiency - eliminates repetitive manual tasks such as manually rolling back production data or reproducing data. Saves data engineers time through automation.
- High quality data products - validate the data coming into or analyzed within the lake before it is exposed to external users, taking advantage of a CI/CD pipeline for your data and preventing inconsistencies and errors.
- Data resilience - quickly recover from mistakes and inconsistencies by rolling back the entire data lake to its previous consistent state.
Pricing
Dimension | Description | Cost/12 months |
---|---|---|
lakeFS Managed Service | Git-like version control for a data lake, in a fully managed service. | $40,000.00 |
The following dimensions are not included in the contract terms and are charged based on your usage.
Dimension | Cost/unit |
---|---|
Additional cost per API call to lakeFS | $0.002 |
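As a rough worked example of how the two tables combine, the sketch below adds the 12-month contract price to the per-call usage charge. It assumes every API call is billed at the listed rate, since the listing does not mention any included quota; the function name and the call volume used in the example are hypothetical.

```python
from decimal import Decimal

ANNUAL_CONTRACT_USD = Decimal("40000.00")  # lakeFS Managed Service, per 12 months
COST_PER_API_CALL_USD = Decimal("0.002")   # additional cost per API call


def estimate_annual_cost(api_calls: int) -> Decimal:
    """Estimate 12-month cost, assuming every API call is billed (an assumption)."""
    if api_calls < 0:
        raise ValueError("api_calls must be non-negative")
    return ANNUAL_CONTRACT_USD + COST_PER_API_CALL_USD * api_calls


# Hypothetical volume: one million API calls over the contract year.
example_total = estimate_annual_cost(1_000_000)
```

At that hypothetical volume the usage charge is $2,000 on top of the $40,000 contract. `Decimal` is used rather than floating point so that currency arithmetic stays exact.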
Vendor refund policy
We do not currently support refunds.
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Support
Vendor support
Reach support by email, through our website, or by reporting an issue in the built-in chat in lakeFS Cloud at https://lakefs.cloud.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

