[English] BlazingSQL: A RAPIDS-Based SQL Engine That Runs on the GPU

Original article: blog.blazingdb.com

BlazingSQL, the GPU-accelerated SQL engine of the RAPIDS ecosystem, is now 100% open source, licensed under Apache 2.0!

Check out the code on our GitHub page.

BlazingSQL is not a database, which is why we changed our original name of BlazingDB to BlazingSQL. It is a SQL engine that processes (almost) any data you want.

Working within RAPIDS has been game-changing. There are now over 100 developers contributing to our community. Most of these developers come from enterprises, and their contributions add valuable features to BlazingSQL, such as support for more file formats.

As RAPIDS adoption continues to explode, open-sourcing BlazingSQL accelerates our development cycle, gets our product in the hands of more users, and aligns our licensing and messaging with the greater RAPIDS.ai ecosystem.

“NVIDIA and the RAPIDS ecosystem are delighted that BlazingSQL is open-sourcing their SQL engine built on RAPIDS,” said Josh Patterson, GM of data science at NVIDIA. “By leveraging Apache Arrow on GPUs and integrating with Dask, BlazingSQL will extend open-source functionality, and drive the next wave of interoperability in the accelerated data science ecosystem.”

We went all-in on RAPIDS before it had a name. Now, open-sourcing is the culmination of a strategy by NVIDIA and BlazingSQL.

NVIDIA stepped up to ensure RAPIDS would solve customer problems at scale. BlazingSQL, in addition to contributing heavily to the RAPIDS ecosystem, will focus on the services and support agreements necessary to make RAPIDS + BlazingSQL deployments successful and accessible to all.

Customer Challenges

When we talk with customers about the challenges they face in their analytics pipelines, we hear the same complaints over and over: processing data at scale is expensive, slow, and incredibly complex.

  • Expensive: Customers cluster thousands of servers together for data science at scale. BlazingSQL + RAPIDS requires a small fraction of the infrastructure to run at an equivalent scale.
  • Slow: Workloads and queries can take hours or days on large data sets. BlazingSQL + RAPIDS provides GPU-accelerated results in seconds, allowing data scientists to iterate quickly over new models.
  • Complex: Workloads are prototyped at small scale and then rebuilt for distributed systems. BlazingSQL + RAPIDS enables users to write code once and dynamically change the scale of distribution with a single line of code.

BlazingSQL addresses these customer concerns not only with an incredibly fast, distributed GPU SQL engine, but also with a zealous focus on simplicity.

With a few lines of code, BlazingSQL can query your raw data wherever it resides and interoperate with your existing analytics stack and RAPIDS.

The Future of Analytics

RAPIDS is the next-generation analytics ecosystem. SQL forms a fundamental pillar of every major analytics ecosystem to date, and BlazingSQL is the SQL standard for RAPIDS.

For this reason, we are fully integrated with the greater RAPIDS team and contribute heavily to cuDF. BlazingSQL is built entirely on top of cuDF and cuIO. New features pushed to these projects directly improve BlazingSQL's features and performance, and because BlazingSQL runs on GPU DataFrames (GDFs), it is 100% interoperable with all of RAPIDS.

Something we wish to make very clear: if you are a user of RAPIDS, or are considering RAPIDS (which you honestly should), you need to check out BlazingSQL and add it to your stack. BlazingSQL offers RAPIDS users countless benefits, including but not limited to:

  • Reducing code complexity: SQL is easy and can replace dozens to hundreds of cuDF function calls with a single statement.
  • Connecting to data lakes: never sync another database; BlazingSQL can query raw files in your cloud or networked file system.
  • Making RAPIDS faster: advanced SQL optimizers help the RAPIDS stack run smarter, not just harder.
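To make the "one statement instead of many function calls" point concrete, here is a small CPU-only sketch. It uses Python's built-in sqlite3 module purely as a stand-in; it is not BlazingSQL or cuDF code, and the `trips` table, its columns, and the sample rows are hypothetical. The same result is computed once with a single SQL query and once with explicit filter, group, aggregate, and sort steps.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (passenger_count, fare) for a handful of trips.
ROWS = [(1, 10.0), (2, 25.0), (1, 7.5), (3, 40.0), (2, 5.0), (1, 12.5)]


def with_sql(rows):
    """One declarative statement: filter, group, aggregate, and sort."""
    con = sqlite3.connect(":memory:")  # in-memory stand-in, not a GPU engine
    con.execute("CREATE TABLE trips (passengers INTEGER, fare REAL)")
    con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT passengers, COUNT(*) AS n, AVG(fare) AS avg_fare
        FROM trips
        WHERE fare > 6.0
        GROUP BY passengers
        ORDER BY passengers
        """
    ).fetchall()
    con.close()
    return result


def by_hand(rows):
    """The same query spelled out as separate imperative steps."""
    filtered = [(p, f) for p, f in rows if f > 6.0]  # WHERE fare > 6.0
    groups = defaultdict(list)
    for p, f in filtered:  # GROUP BY passengers
        groups[p].append(f)
    # COUNT(*) and AVG(fare) per group
    out = [(p, len(fs), sum(fs) / len(fs)) for p, fs in groups.items()]
    out.sort(key=lambda r: r[0])  # ORDER BY passengers
    return out


if __name__ == "__main__":
    print(with_sql(ROWS))
    print(by_hand(ROWS))
```

Even in this toy case, each SQL clause maps to a separate hand-written step; on real pipelines with joins and window functions the gap grows quickly, which is the kind of reduction the bullet above describes for cuDF code.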

“Open-sourcing redefines what’s possible, and now partners, like NVIDIA, are contributing code to the BlazingSQL codebase to provide customers with holistic data science solutions,” said Felipe Aramburu, CTO.

Time to Roll Up Your Sleeves

So if it isn’t abundantly clear, this is an open-source project. The only thing left to do is try BlazingSQL out, work with it, BREAK it (because you will), and maybe even help fix it.

You can get started easily, and on free GPUs, through our Google Colab demos. You can also install it on any device of your choosing through our Docker Hub container, or if you really want the guts, you can build from the source code on our GitHub page.