[VOTE] Accept Crail into the Apache Incubator

Luciano Resende
Now that the discussion thread on the Crail proposal has ended, please vote
on accepting Crail into the Apache Incubator.

The ASF voting rules are described at:
   http://www.apache.org/foundation/voting.html

A vote for accepting a new Apache Incubator podling is a majority vote
for which only Incubator PMC member votes are binding.

Votes from other people are also welcome as an indication of people's
enthusiasm (or lack thereof).

Please do not use this VOTE thread for discussions.
If needed, start a new thread instead.

This vote will run for at least 72 hours. Please VOTE as follows:
[] +1 Accept Crail into the Apache Incubator
[] +0 Abstain.
[] -1 Do not accept Crail into the Apache Incubator because ...

The proposal below is also on the wiki:
https://wiki.apache.org/incubator/CrailProposal

===

Abstract

Crail is a storage platform for sharing performance-critical data in
distributed data processing jobs at very high speed. Crail is built
entirely upon principles of user-level I/O and specifically targets data
center deployments with fast network and storage hardware (e.g., 100Gbps
RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation
such as resource disaggregation or serverless computing. Crail is written in
Java and integrates seamlessly with the Apache data processing ecosystem.
It can be used as a backbone to accelerate high-level data operations such
as shuffle or broadcast, or as a cache to store hot data that is queried
repeatedly, or as a storage platform for sharing inter-job data in complex
multi-job pipelines, etc.

Proposal

Crail enables Apache data processing frameworks to run efficiently in next
generation data centers using fast storage and network hardware in
combination with resource (e.g., DRAM, Flash) disaggregation.

Background

Crail started as a research project at the IBM Zurich Research Laboratory
around 2014 aiming to integrate high-speed I/O hardware effectively into
large scale data processing systems.

Rationale

During the last decade, I/O hardware has undergone rapid performance
improvements, typically by orders of magnitude. Modern networking
and storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with access
latencies of a few microseconds. However, despite such progress in raw I/O
performance, effectively leveraging modern hardware in data processing
frameworks remains challenging. In most cases, upgrading to high-end
networking or storage hardware has very little effect on the performance of
analytics workloads. The problem comes from heavily layered software
imposing overheads such as deep call stacks, unnecessary data copies,
thread contention, etc. These problems have already been addressed at the
operating system level with new I/O APIs such as RDMA verbs, NVMe, etc.,
allowing applications to bypass software layers during I/O operations.
Distributed data processing frameworks, on the other hand, are typically
implemented on legacy I/O interfaces such as sockets or block
storage. These interfaces have been shown to be insufficient to deliver the
full hardware performance. Yet, to the best of our knowledge, there are no
active and systematic efforts to integrate these new user-level I/O APIs
into Apache software frameworks. This problem affects all end-users and
organizations that use Apache software. We expect them to see
unsatisfactorily small performance gains when upgrading their networking and
storage hardware.

Crail solves this problem by providing an efficient storage platform built
upon user-level I/O, thus bypassing layers such as the JVM and OS during I/O
operations. Moreover, Crail directly leverages the specific hardware
features of RDMA and NVMe to provide a better integration with high-level
data operations in Apache compute frameworks. As a consequence, Crail
enables users to run larger, more complex queries against ever increasing
amounts of data at a speed largely determined by the deployed hardware.
Crail is a generic solution that integrates well with the Apache ecosystem
including frameworks like Spark, Hadoop, Hive, etc.

Initial Goals

The initial goals of moving Crail to the Apache Incubator are to broaden the
community and foster contributions from developers to leverage Crail in
various data processing frameworks and workloads. Ultimately, the goal for
Crail is to become the de facto standard platform for storing temporary,
performance-critical data in distributed data processing systems.

Current Status

The initial code has been developed at the IBM Zurich Research Center and
has recently been made available on GitHub under the Apache Software
License 2.0. The project currently has explicit support for Spark and
Hadoop. Project documentation is available on the website www.crail.io.
There is also a public forum for discussions related to Crail available at
https://groups.google.com/forum/#!forum/zrlio-users.

Meritocracy

The current developers are familiar with the meritocratic open source
development process at Apache. Over the last year, the project has gathered
interest on GitHub and several companies have already expressed interest in
the project. We plan to invest in supporting a meritocracy by inviting
additional developers to participate.

Community

The need for a generic solution to integrate high-performance I/O hardware
in open source is tremendous, so there is a potential for a very large
community. We believe that Crail’s extensible architecture and its
alignment with the Apache Ecosystem will further encourage community
participation. We expect that over time Crail will attract a large
community.

Alignment

Crail is written in Java and is built for the Apache data processing
ecosystem. The basic storage services of Crail can be used seamlessly from
Spark, Hadoop, and Storm. The enhanced storage services require dedicated
bindings specific to each data processing framework, which are currently
available only for Spark.
We think that moving Crail to the Apache incubator will help to extend
Crail’s support for different data processing frameworks.

Known Risks

To date, development has been sponsored by IBM and coordinated mostly by
the core team of researchers at the IBM Zurich Research Center. For Crail
to fully transition to an "Apache Way" governance model, it needs to start
embracing the meritocracy-centric way of growing the community of
contributors.

Orphaned Products

The Crail developers have a long-term interest in use and maintenance of
the code and there is also hope that growing a diverse community around the
project will become a guarantee against the project becoming orphaned. We
feel that it is also important to put formal governance in place both for
the project and the contributors as the project expands. We feel ASF is the
best location for this.

Inexperience with Open Source

Several of the initial committers are experienced open source developers
(Linux Kernel, DPDK, etc.).

Relationships with Other Apache Products

As of now, Crail has been tested with Spark, Hadoop and Hive, but it is
designed to integrate with any of the Apache data processing frameworks.

Homogeneous Developers

The project already has a diverse developer base including contributions
from organizations and public developers.

An Excessive Fascination with the Apache Brand

Crail solves a real need for a generic approach to leverage modern network
and storage hardware effectively in the Apache Hadoop and Spark ecosystems.
Our rationale for developing Crail as an Apache project is detailed in the
Rationale section. We believe that the Apache brand and community process
will help us to engage a larger community and facilitate closer ties
with various Apache data processing projects.

Documentation

Documentation regarding Crail is available at www.crail.io

Initial Source

Initial source is available on GitHub under the Apache License 2.0:

https://github.com/zrlio/crail

External Dependencies

Crail is written in Java and currently supports Apache Hadoop MapReduce and
Apache Spark runtimes. To the best of our knowledge, all dependencies of
Crail are distributed under Apache compatible licenses.

Required Resources

Mailing lists

[hidden email]
[hidden email]
[hidden email]

Git repository

https://git-wip-us.apache.org/repos/asf/incubator-crail.git

Issue Tracking

JIRA (Crail)

Initial Committers

Patrick Stuedi <stu AT ibm DOT zurich DOT com>
Animesh Trivedi <atr AT ibm DOT zurich DOT com>
Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
Bernard Metzler <bmt AT ibm DOT zurich DOT com>
Michael Kaufmann <kau AT ibm DOT zurich DOT com>
Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
Patrick McArthur <patrick AT patrickmcarthur DOT net>
Ana Klimovic <anakli AT stanford DOT edu>
Yuval Degani <yuvaldeg AT mellanox DOT com>
Vu Pham <vuhuong AT mellanox DOT com>

Affiliations

IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
Michael Kaufmann, Adrian Schuepbach)
University of New Hampshire (Patrick McArthur)
Stanford University (Ana Klimovic)
Mellanox (Yuval Degani, Vu Pham)

Sponsors

Champion

Luciano Resende <lresende AT apache DOT org>

Nominated Mentors

Luciano Resende <lresende AT apache DOT org>

Raphael Bircher <rbircher AT apache DOT org>

Julian Hyde <jhyde AT apache DOT org>

Sponsoring Entity

We would like to propose the Apache Incubator to sponsor this project.


--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [VOTE] Accept Crail into the Apache Incubator

Luciano Resende
Of course, my +1

On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <[hidden email]>
wrote:

> Now that the discussion thread on the Crail proposal has ended, please
> vote on accepting Crail into the Apache Incubator.



--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [VOTE] Accept Crail into the Apache Incubator

Clebert Suconic
+1

On Thu, Oct 26, 2017 at 12:01 PM, Luciano Resende <[hidden email]> wrote:

> Of course, my +1



--
Clebert Suconic

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [VOTE] Accept Crail into the Apache Incubator

Gang(Gary) Wang
+1


On Thu, Oct 26, 2017 at 9:25 AM, Clebert Suconic <[hidden email]>
wrote:

> +1

Re: [VOTE] Accept Crail into the Apache Incubator

Debo Dutta (dedutta)
+1

On 10/26/17, 9:30 AM, "Gang(Gary) Wang" <[hidden email]> wrote:

    +1
   
   
   



Re: [VOTE] Accept Crail into the Apache Incubator

Kacie Karo
In reply to this post by Clebert Suconic
One

kacie karo

> On Oct 26, 2560 BE, at 11:25 AM, Clebert Suconic <[hidden email]> wrote:
>
> apache


Re: [VOTE] Accept Crail into the Apache Incubator

Julian Hyde-3
+1 binding

Julian


On Thu, Oct 26, 2017 at 10:30 AM, Kacie Karo <[hidden email]> wrote:

> One
>
> kacie karo


Re: [VOTE] Accept Crail into the Apache Incubator

Dave Fisher-5
In reply to this post by Luciano Resende
+1 Accept - Binding

> On Oct 26, 2017, at 8:31 AM, Luciano Resende <[hidden email]> wrote:
>
> Now that the discussion thread on the Crail proposal has ended, please vote
> on accepting Crail into the Apache Incubator.



Re: [VOTE] Accept Crail into the Apache Incubator

Kacie Karo
In reply to this post by Gang(Gary) Wang
#1

kacie karo

> On Oct 26, 2017, at 11:30 AM, Gang(Gary) Wang <[hidden email]> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [VOTE] Accept Crail into the Apache Incubator

Raphael Bircher-2
In reply to this post by Luciano Resende
+1 (binding)

On .10.2017, at 18:01, Luciano Resende <[hidden email]> wrote:

> Of course, my +1


--
My introduction https://youtu.be/Ln4vly5sxYU



Re: [VOTE] Accept Crail into the Apache Incubator

Pierre Smits
+1

Best regards

Pierre

On Fri, 27 Oct 2017 at 13:57 Raphael Bircher <[hidden email]>
wrote:

> +1 (binding)
> --
Pierre Smits

ORRTIZ.COM <http://www.orrtiz.com>
OFBiz based solutions & services

OFBiz Extensions Marketplace
http://oem.ofbizci.net/oci-2/

Re: [VOTE] Accept Crail into the Apache Incubator

Willem Jiang
+1 (binding)


Willem Jiang

Blog: http://willemjiang.blogspot.com (English)
          http://jnn.iteye.com  (Chinese)
Twitter: willemjiang
Weibo: 姜宁willem

On Sat, Oct 28, 2017 at 2:12 AM, Pierre Smits <[hidden email]>
wrote:

> +1
>
> Best regards
>
> Pierre
>
> On Fri, 27 Oct 2017 at 13:57 Raphael Bircher <[hidden email]>
> wrote:
>
> > +1 (binding)
> >
> > Am .10.2017, 18:01 Uhr, schrieb Luciano Resende <[hidden email]>:
> >
> > > Off course, my + 1
> > >
> > > On Thu, Oct 26, 2017 at 12:31 PM, Luciano Resende <
> [hidden email]>
> > > wrote:
> > >
> > >> Now that the discussion thread on the Crail proposal has ended, please
> > >> vote on accepting Crail into into the Apache Incubator.
> > >>
> > >> The ASF voting rules are described at:
> > >>    http://www.apache.org/foundation/voting.html
> > >>
> > >> A vote for accepting a new Apache Incubator podling is a majority vote
> > >> for which only Incubator PMC member votes are binding.
> > >>
> > >> Votes from other people are also welcome as an indication of peoples
> > >> enthusiasm (or lack thereof).
> > >>
> > >> Please do not use this VOTE thread for discussions.
> > >> If needed, start a new thread instead.
> > >>
> > >> This vote will run for at least 72 hours. Please VOTE as follows
> > >> [] +1 Accept Crail into the Apache Incubator
> > >> [] +0 Abstain.
> > >> [] -1 Do not accept Crail into the Apache Incubator because ...
> > >>
> > >> The proposal below is also on the wiki:
> > >> https://wiki.apache.org/incubator/CrailProposal
> > >>
> > >> ===
> > >>
> > >> Abstract
> > >>
> > >> Crail is a storage platform for sharing performance critical data in
> > >> distributed data processing jobs at very high speed. Crail is built
> > >> entirely upon principles of user-level I/O and specifically targets
> data
> > >> center deployments with fast network and storage hardware (e.g.,
> 100Gbps
> > >> RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of
> > >> operation
> > >> such resource disaggregation or serverless computing. Crail is written
> > >> in
> > >> Java and integrates seamlessly with the Apache data processing
> > >> ecosystem.
> > >> It can be used as a backbone to accelerate high-level data operations
> > >> such
> > >> as shuffle or broadcast, or as a cache to store hot data that is
> queried
> > >> repeatedly, or as a storage platform for sharing inter-job data in
> > >> complex
> > >> multi-job pipelines, etc.
> > >>
> > >> Proposal
> > >>
> > >> Crail enables Apache data processing frameworks to run efficiently in
> > >> next
> > >> generation data centers using fast storage and network hardware in
> > >> combination with resource (e.g., DRAM, Flash) disaggregation.
> > >>
> > >> Background
> > >>
> > >> Crail started as a research project at the IBM Zurich Research
> > >> Laboratory
> > >> around 2014 aiming to integrate high-speed I/O hardware effectively
> into
> > >> large scale data processing systems.
> > >>
> > >> Rational
> > >>
> > >> During the last decade, I/O hardware has undergone rapid performance
> > >> improvements, typically in the order of magnitudes. Modern day
> > >> networking
> > >> and storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with a
> > >> few
> > >> microseconds of access latencies. However, despite such progress in
> raw
> > >> I/O
> > >> performance, effectively leveraging modern hardware in data processing
> > >> frameworks remains challenging. In most of the cases, upgrading to
> > >> high-end
> > >> networking or storage hardware has very little effect on the
> > >> performance of
> > >> analytics workloads. The problem comes from heavily layered software
> > >> imposing overheads such as deep call stacks, unnecessary data copies,
> > >> thread contention, etc. These problems have already been addressed at
> > >> the
> > >> operating system level with new I/O APIs such as RDMA verbs, NVMe,
> etc.,
> > >> allowing applications to bypass software layers during I/O operations.
> > >> Distributed data processing frameworks on the other hand, are
> typically
> > >> implemented on legacy I/O interfaces such as such as sockets or block
> > >> storage. These interfaces have been shown to be insufficient to
> deliver
> > >> the
> > >> full hardware performance. Yet, to the best of our knowledge, there
> are
> > >> no
> > >> active and systematic efforts to integrate these new user level I/O
> APIs
> > >> into Apache software frameworks. This problem affects all end-users
> and
> > >> organizations that use Apache software. We expect them to see
> > >> unsatisfactory small performance gains when upgrading their networking
> > >> and
> > >> storage hardware.
> > >>
> > >> Crail solves this problem by providing an efficient storage platform
> > >> built
> > >> upon user-level I/O, thus, bypassing layers such as JVM and OS during
> > >> I/O
> > >> operations. Moreover, Crail directly leverages the specific hardware
> > >> features of RDMA and NVMe to provide a better integration with
> > >> high-level
> > >> data operations in Apache compute frameworks. As a consequence, Crail
> > >> enables users to run larger, more complex queries against ever
> > >> increasing
> > >> amounts of data at a speed largely determined by the deployed
> hardware.
> > >> Crail is generic solution that integrates well with the Apache
> ecosystem
> > >> including frameworks like Spark, Hadoop, Hive, etc.

Re: [VOTE] Accept Crail into the Apache Incubator

Luciano Resende
In reply to this post by Luciano Resende
On Thu, Oct 26, 2017 at 8:31 AM, Luciano Resende <[hidden email]>
wrote:

> Now that the discussion thread on the Crail proposal has ended, please
> vote on accepting Crail into the Apache Incubator.
>
> The ASF voting rules are described at:
>    http://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote
> for which only Incubator PMC member votes are binding.
>
> Votes from other people are also welcome as an indication of people's
> enthusiasm (or lack thereof).
>
> Please do not use this VOTE thread for discussions.
> If needed, start a new thread instead.
>
> This vote will run for at least 72 hours. Please VOTE as follows:
> [] +1 Accept Crail into the Apache Incubator
> [] +0 Abstain.
> [] -1 Do not accept Crail into the Apache Incubator because ...

The vote has passed with 5 binding +1 votes from:

Luciano Resende
Julian Hyde
Raphael Bircher
Willem Jiang
Dave Fisher

And 5 non-binding +1 votes from:

Clebert Suconic
Gang(Gary) Wang
Debo Dutta (dedutta)
Kacie Karo
Pierre Smits

Thanks, and welcome to the Apache Incubator.

--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [VOTE] Accept Crail into the Apache Incubator

patrick stuedi
Wow, great! On behalf of the Crail team, I want to thank Luciano,
Raphael and Julian for supporting us along the way and bringing the
project to a vote. I also want to thank everyone who voted for
Crail. We are looking forward to becoming an incubator project, and we
invite new contributors to join the project; there is plenty of
interesting work to be done!

-Patrick

On Wed, Nov 1, 2017 at 2:42 PM, Luciano Resende <[hidden email]> wrote:

> The vote has passed with 5 binding +1 votes from:
>
> Luciano Resende
> Julian Hyde
> Raphael Bircher
> Willem Jiang
> Dave Fisher
>
> And 5 non-binding +1 votes from:
>
> Clebert Suconic
> Gang(Gary) Wang
> Debo Dutta (dedutta)
> Kacie Karo
> Pierre Smits
>
> Thanks, and welcome to the Apache Incubator.
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


[RESULT] [VOTE] Accept Crail into the Apache Incubator

Craig Russell-3
In reply to this post by Luciano Resende
Changing the subject line to close the vote.

> On Nov 1, 2017, at 6:42 AM, Luciano Resende <[hidden email]> wrote:
>
> On Thu, Oct 26, 2017 at 8:31 AM, Luciano Resende <[hidden email]>
> wrote:
>
>> Now that the discussion thread on the Crail proposal has ended, please
>> vote on accepting Crail into the Apache Incubator.
>>
>> The ASF voting rules are described at:
>>   http://www.apache.org/foundation/voting.html
>>
>> A vote for accepting a new Apache Incubator podling is a majority vote
>> for which only Incubator PMC member votes are binding.
>>
>> Votes from other people are also welcome as an indication of people's
>> enthusiasm (or lack thereof).
>>
>> Please do not use this VOTE thread for discussions.
>> If needed, start a new thread instead.
>>
>> This vote will run for at least 72 hours. Please VOTE as follows:
>> [] +1 Accept Crail into the Apache Incubator
>> [] +0 Abstain.
>> [] -1 Do not accept Crail into the Apache Incubator because ...
>>
>> The proposal below is also on the wiki:
>> https://wiki.apache.org/incubator/CrailProposal
>>
>> ===
>>
>> Abstract
>>
>> Crail is a storage platform for sharing performance critical data in
>> distributed data processing jobs at very high speed. Crail is built
>> entirely upon principles of user-level I/O and specifically targets data
>> center deployments with fast network and storage hardware (e.g., 100Gbps
>> RDMA, plenty of DRAM, NVMe flash, etc.) as well as new modes of operation
>> such resource disaggregation or serverless computing. Crail is written in
>> Java and integrates seamlessly with the Apache data processing ecosystem.
>> It can be used as a backbone to accelerate high-level data operations such
>> as shuffle or broadcast, or as a cache to store hot data that is queried
>> repeatedly, or as a storage platform for sharing inter-job data in complex
>> multi-job pipelines, etc.
>>
>> Proposal
>>
>> Crail enables Apache data processing frameworks to run efficiently in next
>> generation data centers using fast storage and network hardware in
>> combination with resource (e.g., DRAM, Flash) disaggregation.
>>
>> Background
>>
>> Crail started as a research project at the IBM Zurich Research Laboratory
>> around 2014 aiming to integrate high-speed I/O hardware effectively into
>> large scale data processing systems.
>>
>> Rational
>>
>> During the last decade, I/O hardware has undergone rapid performance
>> improvements, typically in the order of magnitudes. Modern day networking
>> and storage hardware can deliver 100+ Gbps (10+ GBps) bandwidth with a few
>> microseconds of access latencies. However, despite such progress in raw I/O
>> performance, effectively leveraging modern hardware in data processing
>> frameworks remains challenging. In most of the cases, upgrading to high-end
>> networking or storage hardware has very little effect on the performance of
>> analytics workloads. The problem comes from heavily layered software
>> imposing overheads such as deep call stacks, unnecessary data copies,
>> thread contention, etc. These problems have already been addressed at the
>> operating system level with new I/O APIs such as RDMA verbs, NVMe, etc.,
>> allowing applications to bypass software layers during I/O operations.
>> Distributed data processing frameworks on the other hand, are typically
>> implemented on legacy I/O interfaces such as such as sockets or block
>> storage. These interfaces have been shown to be insufficient to deliver the
>> full hardware performance. Yet, to the best of our knowledge, there are no
>> active and systematic efforts to integrate these new user level I/O APIs
>> into Apache software frameworks. This problem affects all end-users and
>> organizations that use Apache software. We expect them to see
>> unsatisfactory small performance gains when upgrading their networking and
>> storage hardware.
>>
>> Crail solves this problem by providing an efficient storage platform built
>> upon user-level I/O, thus, bypassing layers such as JVM and OS during I/O
>> operations. Moreover, Crail directly leverages the specific hardware
>> features of RDMA and NVMe to provide a better integration with high-level
>> data operations in Apache compute frameworks. As a consequence, Crail
>> enables users to run larger, more complex queries against ever increasing
>> amounts of data at a speed largely determined by the deployed hardware.
>> Crail is generic solution that integrates well with the Apache ecosystem
>> including frameworks like Spark, Hadoop, Hive, etc.
>>
>> Initial Goals
>>
>> The initial goals to move Crail to the Apache Incubator is to broaden the
>> community, and foster contributions from developers to leverage Crail in
>> various data processing frameworks and workloads. Ultimately, the goal for
>> Crail is to become the de-facto standard platform for storing temporary
>> performance critical data in distributed data processing systems.
>>
>> Current Status
>>
>> The initial code has been developed at the IBM Zurich Research Center and
>> has recently been made available on GitHub under the Apache Software
>> License 2.0. The project currently has explicit support for Spark and
>> Hadoop. Project documentation is available on the website www.crail.io.
>> There is also a public forum for discussions related to Crail available at
>> https://groups.google.com/forum/#!forum/zrlio-users.
>>
>> Meritocracy
>>
>> The current developers are familiar with the meritocratic open source
>> development process at Apache. Over the last year, the project has gathered
>> interest on GitHub, and several companies have already expressed interest in
>> the project. We plan to invest in supporting a meritocracy by inviting
>> additional developers to participate.
>>
>> Community
>>
>> The need for a generic solution to integrate high-performance I/O hardware
>> in open source is tremendous, so there is potential for a very large
>> community. We believe that Crail’s extensible architecture and its
>> alignment with the Apache Ecosystem will further encourage community
>> participation. We expect that over time Crail will attract a large
>> community.
>>
>> Alignment
>>
>> Crail is written in Java and is built for the Apache data processing
>> ecosystem. The basic storage services of Crail can be used seamlessly from
>> Spark, Hadoop, and Storm. The enhanced storage services require dedicated,
>> data-processing-specific bindings, which are currently available only for Spark.
>> We think that moving Crail to the Apache Incubator will help to extend
>> Crail’s support for different data processing frameworks.
>>
>> Known Risks
>>
>> To date, development has been sponsored by IBM and coordinated mostly by
>> the core team of researchers at the IBM Zurich Research Center. For Crail
>> to fully transition to an "Apache Way" governance model, it needs to start
>> embracing the meritocracy-centric way of growing the community of
>> contributors.
>>
>> Orphaned Products
>>
>> The Crail developers have a long-term interest in use and maintenance of
>> the code and there is also hope that growing a diverse community around the
>> project will become a guarantee against the project becoming orphaned. We
>> feel that it is also important to put formal governance in place both for
>> the project and the contributors as the project expands. We feel ASF is the
>> best location for this.
>>
>> Inexperience with Open Source
>>
>> Several of the initial committers are experienced open source developers
>> (Linux Kernel, DPDK, etc.).
>>
>> Relationships with Other Apache Products
>>
>> As of now, Crail has been tested with Spark, Hadoop and Hive, but it is
>> designed to integrate with any of the Apache data processing frameworks.
>>
>> Homogeneous Developers
>>
>> The project already has a diverse developer base including contributions
>> from organizations and public developers.
>>
>> An Excessive Fascination with the Apache Brand
>>
>> Crail solves a real need for a generic approach to leverage modern network
>> and storage hardware effectively in the Apache Hadoop and Spark ecosystems.
>> Our rationale for developing Crail as an Apache project is detailed in the
>> Rationale section. We believe that the Apache brand and community process
>> will help us to engage a larger community and facilitate closer ties
>> with various Apache data processing projects.
>>
>> Documentation
>>
>> Documentation regarding Crail is available at www.crail.io
>>
>> Initial Source
>>
>> Initial source is available on GitHub under the Apache License 2.0:
>>
>> https://github.com/zrlio/crail
>>
>> External Dependencies
>>
>> Crail is written in Java and currently supports Apache Hadoop MapReduce
>> and Apache Spark runtimes. To the best of our knowledge, all dependencies
>> of Crail are distributed under Apache compatible licenses.
>>
>> Required Resources
>>
>> Mailing lists
>>
>> [hidden email]
>> [hidden email]
>> [hidden email]
>>
>> Git repository
>>
>> https://git-wip-us.apache.org/repos/asf/incubator-crail.git
>>
>> Issue Tracking
>>
>> JIRA (Crail)
>>
>> Initial Committers
>>
>> Patrick Stuedi <stu AT ibm DOT zurich DOT com>
>> Animesh Trivedi <atr AT ibm DOT zurich DOT com>
>> Jonas Pfefferle <jpf AT ibm DOT zurich DOT com>
>> Bernard Metzler <bmt AT ibm DOT zurich DOT com>
>> Michael Kaufmann <kau AT ibm DOT zurich DOT com>
>> Adrian Schuepbach <dri AT ibm DOT zurich DOT com>
>> Patrick McArthur <patrick AT patrickmcarthur DOT net>
>> Ana Klimovic <anakli AT stanford DOT edu>
>> Yuval Degani <yuvaldeg AT mellanox DOT com>
>> Vu Pham <vuhuong AT mellanox DOT com>
>>
>> Affiliations
>>
>> IBM (Patrick Stuedi, Animesh Trivedi, Jonas Pfefferle, Bernard Metzler,
>> Michael Kaufmann, Adrian Schuepbach)
>> University of New Hampshire (Patrick McArthur)
>> Stanford University (Ana Klimovic)
>> Mellanox (Yuval Degani, Vu Pham)
>>
>> Sponsors
>>
>> Champion
>>
>> Luciano Resende <lresende AT apache DOT org>
>>
>> Nominated Mentors
>>
>> Luciano Resende <lresende AT apache DOT org>
>>
>> Raphael Bircher <rbircher AT apache DOT org>
>>
>> Julian Hyde <jhyde AT apache DOT org>
>>
>> Sponsoring Entity
>>
>> We would like to propose the Apache Incubator to sponsor this project.
>>
>>
>>
>
> The vote has passed with 5 binding +1 from:
>
> Luciano Resende
> Julian Hyde
> Raphael Bircher
> Willem Jiang
> Dave Fisher
>
> And 5 non-binding +1 from
>
> Clebert Suconic
> Gang(Gary) Wang
> Debo Dutta (dedutta)
> Kacie Karo
> Pierre Smits
>
> Thanks and Welcome to the Apache Incubator.
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

Craig L Russell
Secretary, Apache Software Foundation
[hidden email] http://db.apache.org/jdo


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [RESULT] [VOTE] Accept Crail into the Apache Incubator

Jitendra Pandey
+1

On 11/1/17, 7:40 AM, "Craig Russell" <[hidden email]> wrote:

    Subject line change to close the vote.
   
    > On Nov 1, 2017, at 6:42 AM, Luciano Resende <[hidden email]> wrote:
    >
    > On Thu, Oct 26, 2017 at 8:31 AM, Luciano Resende <[hidden email]>
    > wrote:
    >
    > The vote has passed with 5 binding +1 from:
    >
    > Luciano Resende
    > Julian Hyde
    > Raphael Bircher
    > Willem Jiang
    > Dave Fisher
    >
    > And 5 non-binding +1 from
    >
    > Clebert Suconic
    > Gang(Gary) Wang
    > Debo Dutta (dedutta)
    > Kacie Karo
    > Pierre Smits
    >
    > Thanks and Welcome to the Apache Incubator.
    >
    > --
    > Luciano Resende
    > http://twitter.com/lresende1975
    > http://lresende.blogspot.com/
   
    Craig L Russell
    Secretary, Apache Software Foundation
    [hidden email] http://db.apache.org/jdo
   
   