[DISCUSS] Druid incubation proposal

Gian Merlino
Hi all,

I would like to open up a discussion about incubating Druid at Apache. I've
included a proposal in this mail and have also posted a draft at
https://wiki.apache.org/incubator/DruidProposal. More information about
Druid is also available on our project web site at: http://druid.io/

Thanks for your consideration!

Gian

= Druid Proposal =

== Abstract ==

Druid is a high-performance, column-oriented, distributed data store.

== Proposal ==

Druid is an open source data store designed for real-time exploratory
analytics on large data sets. Druid's key features are a column-oriented
storage layout, a distributed shared-nothing architecture, and the ability to
generate and leverage indexing and caching structures. Druid is typically
deployed in clusters of tens to hundreds of nodes, and has the ability to
load data from Apache Kafka and Apache Hadoop, among other data sources.
Druid offers two query languages: a SQL dialect (powered by Apache Calcite)
and a JSON-over-HTTP API.
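
To make the two query paths concrete, here is a minimal sketch in Python that
only builds the HTTP requests and does not send them. The broker address, the
"events" data source, and the helper names are hypothetical. The endpoint
paths and query fields reflect a common reading of Druid's HTTP API and should
be checked against the documentation for the Druid version in use.

```python
import json

# Hypothetical values for illustration only.
BROKER_URL = "http://localhost:8082"
DATA_SOURCE = "events"


def build_native_query(data_source, interval):
    """Build a native (JSON-over-HTTP) timeseries query counting rows per day."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "day",
        "intervals": [interval],
        "aggregations": [{"type": "count", "name": "row_count"}],
    }


def build_sql_query(data_source):
    """Build the JSON envelope for a SQL query over the same data source."""
    return {"query": f'SELECT COUNT(*) AS row_count FROM "{data_source}"'}


def as_http_request(path, payload):
    """Describe the POST request without performing any network I/O."""
    return {
        "method": "POST",
        "url": BROKER_URL + path,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


# Assumed endpoint paths: /druid/v2/ for native queries, /druid/v2/sql/ for SQL.
native_request = as_http_request(
    "/druid/v2/", build_native_query(DATA_SOURCE, "2018-01-01/2018-02-01")
)
sql_request = as_http_request("/druid/v2/sql/", build_sql_query(DATA_SOURCE))
```

Either request description could then be handed to any HTTP client. Both ask
the same question, one as a JSON query object and one as SQL text.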

Druid was originally developed to power a slice-and-dice analytical UI
built on top of large event streams. The original use case for Druid
targeted ingest rates of millions of records/sec, retention of over a year
of data, and query latencies of sub-second to a few seconds. Many people
can benefit from such capability, and many already have (see
http://druid.io/druid-powered.html). In addition, new use cases have
emerged since Druid's original development, such as OLAP acceleration of
data warehouse tables and more highly concurrent applications operating
with relatively narrower queries.

== Background ==

Druid is a data store designed for fast analytics. It would typically be
used in lieu of more general purpose query systems like Hadoop !MapReduce
or Spark when query latency is of the utmost importance. Druid is often
used as a data store for powering GUI analytical applications.

The buzzwordy description of Druid is a high-performance, column-oriented,
distributed data store. What we mean by this is:

 * "high performance": Druid aims to provide low query latency and high
ingest rates.
 * "column-oriented": Druid stores data in a column-oriented format, like
most other systems designed for analytics. It can also store indexes along
with the columns.
 * "distributed": Druid is deployed in clusters, typically of tens to
hundreds of nodes.
 * "data store": Druid loads your data and stores a copy of it on the
cluster's local disks (and may cache it in memory). It doesn't query your
data from some other storage system.
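
As a toy illustration of the "column-oriented" point, the sketch below keeps
each column in its own list and stores an inverted index next to selected
columns, so a filtered aggregation touches only the rows and columns it needs.
The class and method names are hypothetical, and this is not Druid's actual
segment format.

```python
from collections import defaultdict


class ColumnTable:
    """Toy column-oriented table with optional per-column inverted indexes."""

    def __init__(self, column_names, indexed=()):
        # One list per column instead of one record per row.
        self.columns = {name: [] for name in column_names}
        # For each indexed column: value -> set of row ids holding that value.
        self.indexes = {name: defaultdict(set) for name in indexed}
        self.row_count = 0

    def append(self, row):
        """Append one row, updating every column and any index on it."""
        for name, values in self.columns.items():
            value = row[name]
            values.append(value)
            if name in self.indexes:
                self.indexes[name][value].add(self.row_count)
        self.row_count += 1

    def sum_where(self, filter_column, filter_value, sum_column):
        """Sum one column over rows whose indexed column equals a value."""
        row_ids = self.indexes[filter_column].get(filter_value, set())
        values = self.columns[sum_column]
        return sum(values[i] for i in row_ids)


table = ColumnTable(["country", "clicks"], indexed=["country"])
table.append({"country": "US", "clicks": 3})
table.append({"country": "FR", "clicks": 5})
table.append({"country": "US", "clicks": 4})

us_clicks = table.sum_where("country", "US", "clicks")
```

The filter is answered from the index alone, and only the "clicks" column is
read for the matching rows.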

== Rationale ==

Druid is a mature, active project with a large number of production
installations, dozens of contributors to each release, and multiple vendors
offering professional support. Given Druid's strong community, its close
integration with many other Apache projects (such as Kafka, Hadoop, and
Calcite), and its pre-existing Apache-inspired governance structure, we
feel that Apache is the best home for the project on a long-term basis.

== Current Status ==

=== Meritocracy ===
Since Druid was first open sourced, the original developers have solicited
contributions from others, including through our blog, the project mailing
lists, and through accepting !GitHub pull requests. We have an
Apache-inspired governance structure with a PMC and committers, and our
committer ranks include a good number of people from outside the original
development team.

=== Community ===

The Druid core developers have sought to nurture a community throughout the
life of the project. We use !GitHub as the focal point for bug reports and
code contributions, and the mailing lists for most other discussion. To try
to make people feel welcome, we've also spelled this out on a "CONTRIBUTE"
link from the project page: http://druid.io/community/. Today we have an
active contributor base (a typical release has ~40 contributors) and an
active mailing list.

=== Core Developers ===

Druid enjoys good diversity of committer affiliation. The most active
developers over the past year are affiliated with four different companies:
Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
committers on other ASF projects, including Apache Airflow, Apache
Curator, and Apache Calcite. The original developers of Druid remain
involved in the project.

=== Alignment ===

Druid's current governance structure is Apache-inspired with a PMC and
committers chosen by a meritocratic process. Additionally, Druid integrates
with a number of other Apache projects, including Kafka, Hadoop, Hive,
Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.

== Known Risks ==

=== Orphaned products ===

The risk of Druid becoming orphaned is low, due to a diverse committer base
that is invested in the future of the project.

=== Inexperience with Open Source ===

Druid's core developers have been running it as a community-oriented open
source project for some time now, and many of them are committers on other
open source projects as well, including Apache Airflow, Apache Curator, and
Apache Calcite.

=== Homogenous Developers ===

Druid's current diversity of committer affiliation means that we have
become accustomed to working collaboratively and in the open. We hope that
a transition to the ASF helps Druid's contributor base become even more
diverse.

=== Reliance on Salaried Developers ===

Druid's user base and contributor base skew heavily towards salaried
developers. We believe this is natural since Druid is a technology designed
to be deployed on large clusters, and therefore tends to be deployed by
organizations rather than by individuals. Nevertheless, many current Druid
developers have continued working on the project even through job changes,
which we take to be a good sign of developer commitment and personal
interest.

=== Relationships with Other Apache Products ===

Druid integrates with a number of other Apache projects. Druid internally
uses Calcite for SQL planning, and Curator and !ZooKeeper for coordination.
Druid can read data in Avro or Parquet format. Druid can load data from
streams in Kafka or from files in Hadoop. Druid integrates with Hive as an
option for SQL query acceleration. Druid data can be visualized by Superset
(incubating).

=== An Excessive Fascination with the Apache Brand ===

Druid is a successful project with a diverse community. The main reason for
pursuing incubation is to find a stable, long-term home for the project
with a well-known governance philosophy.

== Required Resources ==

=== Mailing lists ===

We would like to migrate the existing Druid mailing lists from Google
Groups to Apache.

 * druid-user@googlegroups -> [hidden email]
 * druid-development@googlegroups -> [hidden email]

=== Source control ===

Druid development currently takes place on !GitHub. We would like to
continue using !GitHub, if possible, in order to preserve the workflows the
community has developed around !GitHub pull requests.

=== Issue tracking ===
Druid currently uses !GitHub issues for issue tracking. We would like to
migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.

== Documentation ==

Druid's documentation can be found at http://druid.io/docs/latest/.

== Initial Source ==

Druid was initially open-sourced by Metamarkets in 2012 and has been run in
a community-governed fashion since then. The code is currently hosted at
https://github.com/druid-io/ and includes the following repositories:

 * druid (primary repository)
 * druid-console (web console for Druid)
 * druid-io.github.io (source for Druid's website at http://druid.io/)
 * tranquility (realtime stream push client for Druid)
 * docker-druid (Docker image for Druid)
 * pydruid (Python library)
 * RDruid (R library)
 * oss-parent (Maven POM files)

== Source and Intellectual Property Submission Plan ==

A complete set of the open source code needs to be licensed from the owning
organization to the Foundation. Commercial legal counsel for the owning
organization will review the standard Foundation licensing paperwork and
propose any updates as needed. This license will enable Apache to incubate
and manage the Druid project moving forward.

Other Druid paraphernalia to be transferred to Apache consists of:

 * !GitHub organization at https://github.com/druid-io/
 * Twitter account at https://twitter.com/druidio
 * "druid.io" domain name
 * "Druid" trademark assignment per Foundation standard paperwork. The
trademark assignment paperwork shall be reviewed by the owning
organization's commercial and IP counsel
 * CLAs - all rights in the code licensed above should encompass the CLAs
that existed between developers and the owning organization

A copyright license to the code, trademark assignment of Druid, and
transfer of other paraphernalia to Apache should be sufficient to cover all
rights required by Apache to operate the project.

== External Dependencies ==
External dependencies distributed with Druid currently all have one of the
following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with one
exception: the optional Druid MySQL metadata store extension depends on
MySQL Connector/J, which is GPL licensed. Druid currently packages this as
a separate download; see our current presentation on:
http://druid.io/downloads.html. As part of incubation we intend to
determine the best strategy for handling the MySQL extension.

== Cryptography ==
Not applicable.

== Initial Committers ==

The initial committers for incubation are the current set of committers on
Druid who have expressed interest in being involved in Apache incubation.
Affiliations are listed where relevant. We may seek to add other committers
during incubation; for example, we would want to add any current Druid
committers who express an interest after incubation begins.

 * Charles Allen ([hidden email]) (Snap)
 * David Lim ([hidden email]) (Imply)
 * Eric Tschetter ([hidden email]) (Splunk)
 * Fangjin Yang ([hidden email]) (Imply)
 * Gian Merlino ([hidden email]) (Imply)
 * Himanshu Gupta ([hidden email]) (Oath)
 * Jihoon Son ([hidden email]) (Imply)
 * Jonathan Wei ([hidden email]) (Imply)
 * Maxime Beauchemin ([hidden email]) (Lyft)
 * Mohamed Slim Bouguerra ([hidden email]) (Hortonworks)
 * Nishant Bangarwa ([hidden email]) (Hortonworks)
 * Parag Jain ([hidden email]) (Oath)
 * Roman Leventov ([hidden email]) (Metamarkets)
 * Xavier Léauté ([hidden email]) (Confluent)

== Sponsors ==

 * Champion: Julian Hyde
 * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
 * Sponsoring entity: Apache Incubator

Re: [DISCUSS] Druid incubation proposal

Julian Hyde-3
As Champion for this proposal, let me say that the Druid project will be an excellent addition to the ASF. I have been an observer of the project for a couple of years, and in many respects it is already operating in the Apache Way. Druid had paid developers from a number of companies, some of whom were in competition, and its governance was strong enough to navigate the choppy waters that that can create.

A number of Druid committers subsequently started to work on Apache projects (Gian on Calcite, and Slim and Nishant on Hive) and so already know what to expect.

You can get a sense of the project dynamic by reading the archives of their dev list: https://groups.google.com/forum/#!forum/druid-development

Julian


> On Feb 16, 2018, at 12:15 PM, Gian Merlino <[hidden email]> wrote:


Re: [DISCUSS] Druid incubation proposal

Tom Barber-3
I can second most of that from the peanut gallery, based on my high-level
interactions with a few Druid folk and on keeping a watchful eye on a very
exciting project over the last few years.

I think the Druid project would make an excellent addition to the ASF
portfolio.

Tom

On 16/02/18 22:17, Julian Hyde wrote:

> As Champion for this proposal, let me say that the Druid project will be an excellent addition to the ASF. I have been an observer of the project for a couple of years, and in many respects it is already operating in the Apache Way. Druid had paid developers from a number of companies, some of whom were in competition, and its governance was strong enough to navigate the choppy waters that that can create.
>
> A number of Druid committers subsequently started to work on Apache projects (Gian on Calcite, and Slim and Nishant on Hive) and so already know what to expect.
>
> You can get a sense of the project dynamic by reading the archives of their dev list: https://groups.google.com/forum/#!forum/druid-development <https://groups.google.com/forum/#!forum/druid-development>
>
> Julian
>
>
>> On Feb 16, 2018, at 12:15 PM, Gian Merlino <[hidden email]> wrote:
>>
>> Hi all,
>>
>> I would like to open up a discussion about incubating Druid at Apache. I've
>> included a proposal in this mail and have also posted a draft at
>> https://wiki.apache.org/incubator/DruidProposal. More information about
>> Druid is also available on our project web site at: http://druid.io/
>>
>> Thanks for your consideration!
>>
>> Gian
>>
>> = Druid Proposal =
>>
>> == Abstract ==
>>
>> Druid is a high-performance, column-oriented, distributed data store.
>>
>> == Proposal ==
>>
>> Druid is an open source data store designed for real-time exploratory
>> analytics on large data sets. Druid's key features are a column-oriented
>> storage layout, a distributed shared-nothing architecture, and ability to
>> generate and leverage indexing and caching structures. Druid is typically
>> deployed in clusters of tens to hundreds of nodes, and has the ability to
>> load data from Apache Kafka and Apache Hadoop, among other data sources.
>> Druid offers two query languages: a SQL dialect (powered by Apache Calcite)
>> and a JSON-over-HTTP API.
>>
>> Druid was originally developed to power a slice-and-dice analytical UI
>> built on top of large event streams. The original use case for Druid
>> targeted ingest rates of millions of records/sec, retention of over a year
>> of data, and query latencies of sub-second to a few seconds. Many people
>> can benefit from such capability, and many already have (see
>> http://druid.io/druid-powered.html). In addition, new use cases have
>> emerged since Druid's original development, such as OLAP acceleration of
>> data warehouse tables and more highly concurrent applications operating
>> with relatively narrower queries.
>>
>> == Background ==
>>
>> Druid is a data store designed for fast analytics. It would typically be
>> used in lieu of more general purpose query systems like Hadoop !MapReduce
>> or Spark when query latency is of the utmost importance. Druid is often
>> used as a data store for powering GUI analytical applications.
>>
>> The buzzwordy description of Druid is a high-performance, column-oriented,
>> distributed data store. What we mean by this is:
>>
>> * "high performance": Druid aims to provide low query latency and high
>> ingest rates possible.
>> * "column-oriented": Druid stores data in a column-oriented format, like
>> most other systems designed for analytics. It can also store indexes along
>> with the columns.
>> * "distributed": Druid is deployed in clusters, typically of tens to
>> hundreds of nodes.
>> * "data store": Druid loads your data and stores a copy of it on the
>> cluster's local disks (and may cache it in memory). It doesn't query your
>> data from some other storage system.
>>
>> == Rationale ==
>>
>> Druid is a mature, active project with a large number of production
>> installations, dozens of contributors to each release, and multiple vendors
>> offering professional support. Given Druid's strong community, its close
>> integration with many other Apache projects (such as Kafka, Hadoop, and
>> Calcite), and its pre-existing Apache-inspired governance structure, we
>> feel that Apache is the best home for the project on a long-term basis.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>> Since Druid was first open sourced the original developers have solicited
>> contributions from others, including through our blog, the project mailing
>> lists, and through accepting !GitHub pull requests. We have an
>> Apache-inspired governance structure with a PMC and committers, and our
>> committer ranks include a good number of people from outside the original
>> development team.
>>
>> === Community ===
>>
>> The Druid core developers have sought to nurture a community throughout the
>> life of the project. We use !GitHub as the focal point for bug reports and
>> code contributions, and the mailing lists for most other discussion. To try
>> to make people feel welcome, we've also spelled this out on a "CONTRIBUTE"
>> link from the project page: http://druid.io/community/. Today we have an
>> active contributor base (a typical release has ~40 contributors) and
>> mailing list.
>>
>> === Core Developers ===
>>
>> Druid enjoys good diversity of committer affiliation. The most active
>> developers over the past year are affiliated with four different companies:
>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
>> committers on other ASF projects as well, including Apache Airflow, Apache
>> Curator, and Apache Calcite. The original developers of Druid remain
>> involved in the project.
>>
>> === Alignment ===
>>
>> Druid's current governance structure is Apache-inspired with a PMC and
>> committers chosen by a meritocratic process. Additionally, Druid integrates
>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
>>
>> == Known Risks ==
>>
>> === Orphaned products ===
>>
>> The risk of Druid becoming orphaned is low, due to a diverse committer base
>> that is invested in the future of the project.
>>
>> === Inexperience with Open Source ===
>>
>> Druid's core developers have been running it as a community-oriented open
>> source project for some time now, and many of them are committers on other
>> open source projects as well, including Apache Airflow, Apache Curator, and
>> Apache Calcite.
>>
>> === Homogenous Developers ===
>>
>> Druid's current diversity of committer affiliation means that we have
>> become accustomed to working collaboratively and in the open. We hope that
>> a transition to the ASF helps Druid's contributor base become even more
>> diverse.
>>
>> === Reliance on Salaried Developers ===
>>
>> Druid's user base and contributor base skews heavily towards salaried
>> developers. We believe this is natural since Druid is a technology designed
>> to be deployed on large clusters, and due to this, tends to be deployed by
>> organizations rather than by individuals. Nevertheless, many current Druid
>> developers have continued working on the project even through job changes,
>> which we take to be a good sign of developer commitment and personal
>> interest.
>>


--


Spicule Limited is registered in England & Wales. Company Number: 09954122.
Registered office: First Floor, Telecom House, 125-135 Preston Road,
Brighton, England, BN1 6AF. VAT No. 251478891.


All engagements are subject to Spicule Terms and Conditions of Business.
This email and its contents are intended solely for the individual to whom
it is addressed and may contain information that is confidential,
privileged or otherwise protected from disclosure, distributing or copying.
Any views or opinions presented in this email are solely those of the
author and do not necessarily represent those of Spicule Limited. The
company accepts no liability for any damage caused by any virus transmitted
by this email. If you have received this message in error, please notify us
immediately by reply email before deleting it from your system. Service of
legal notice cannot be effected on Spicule Limited by email.



Re: [DISCUSS] Druid incubation proposal

Ashutosh Chauhan-2
+1 for Druid in ASF.
I have been involved with the Hive-Druid integration. If you are looking for
mentors, I am happy to help.

Thanks,
Ashutosh


Re: [DISCUSS] Druid incubation proposal

Jitendra Pandey
+1
Druid will be a great addition to the ASF.

On 2/21/18, 5:06 PM, "Ashutosh Chauhan" <[hidden email]> wrote:

    +1 for Druid in ASF.
    I have been involved with Hive Druid integration. If you are looking for
    mentors, happy to help.
   
    Thanks,
    Ashutosh
   
    On Fri, Feb 16, 2018 at 2:20 PM, Tom Barber <[hidden email]> wrote:
   
    > I can second most of that from the peanut gallery, my high level
    > interactions with a few Druid folk and keeping a watchful eye on a very
    > exciting project over the last few years.
    >
    > I think the Druid project would make an excellent addition to the ASF
    > portfolio.
    >
    > Tom
    >
    >
    > On 16/02/18 22:17, Julian Hyde wrote:
    >
    >> As Champion for this proposal, let me say that the Druid project will be
    >> an excellent addition to the ASF. I have been an observer of the project
    >> for a couple of years, and in many respects it is already operating in the
    >> Apache Way. Druid had paid developers from a number of companies, some of
    >> whom were in competition, and its governance was strong enough to navigate
    >> the choppy waters that that can create.
    >>
    >> A number of Druid committers subsequently started to work on Apache
    >> projects (Gian on Calcite, and Slim and Nishant on Hive) and so already
    >> know what to expect.
    >>
    >> You can get a sense of the project dynamic by reading the archives of
    >> their dev list: https://groups.google.com/forum/#!forum/druid-development
    >> <https://groups.google.com/forum/#!forum/druid-development>
    >>
    >> Julian
    >>
    >>
    >> On Feb 16, 2018, at 12:15 PM, Gian Merlino <[hidden email]> wrote:
    >>>
    >>> Hi all,
    >>>
    >>> I would like to open up a discussion about incubating Druid at Apache.
    >>> I've
    >>> included a proposal in this mail and have also posted a draft at
    >>> https://wiki.apache.org/incubator/DruidProposal. More information about
    >>> Druid is also available on our project web site at: http://druid.io/
    >>>
    >>> Thanks for your consideration!
    >>>
    >>> Gian
    >>>
    >>> = Druid Proposal =
    >>>
    >>> == Abstract ==
    >>>
    >>> Druid is a high-performance, column-oriented, distributed data store.
    >>>
    >>> == Proposal ==
    >>>
    >>> Druid is an open source data store designed for real-time exploratory
    >>> analytics on large data sets. Druid's key features are a column-oriented
    >>> storage layout, a distributed shared-nothing architecture, and ability to
    >>> generate and leverage indexing and caching structures. Druid is typically
    >>> deployed in clusters of tens to hundreds of nodes, and has the ability to
    >>> load data from Apache Kafka and Apache Hadoop, among other data sources.
    >>> Druid offers two query languages: a SQL dialect (powered by Apache
    >>> Calcite)
    >>> and a JSON-over-HTTP API.
    >>>
    >>> Druid was originally developed to power a slice-and-dice analytical UI
    >>> built on top of large event streams. The original use case for Druid
    >>> targeted ingest rates of millions of records/sec, retention of over a
    >>> year
    >>> of data, and query latencies of sub-second to a few seconds. Many people
    >>> can benefit from such capability, and many already have (see
    >>> http://druid.io/druid-powered.html). In addition, new use cases have
    >>> emerged since Druid's original development, such as OLAP acceleration of
    >>> data warehouse tables and more highly concurrent applications operating
    >>> with relatively narrower queries.
    >>>
    >>> == Background ==
    >>>
    >>> Druid is a data store designed for fast analytics. It would typically be
    >>> used in lieu of more general purpose query systems like Hadoop !MapReduce
    >>> or Spark when query latency is of the utmost importance. Druid is often
    >>> used as a data store for powering GUI analytical applications.
    >>>
    >>> The buzzwordy description of Druid is a high-performance,
    >>> column-oriented,
    >>> distributed data store. What we mean by this is:
    >>>
    >>> * "high performance": Druid aims to provide low query latency and high
    >>> ingest rates possible.
    >>> * "column-oriented": Druid stores data in a column-oriented format, like
    >>> most other systems designed for analytics. It can also store indexes
    >>> along
    >>> with the columns.
    >>> * "distributed": Druid is deployed in clusters, typically of tens to
    >>> hundreds of nodes.
    >>> * "data store": Druid loads your data and stores a copy of it on the
    >>> cluster's local disks (and may cache it in memory). It doesn't query your
    >>> data from some other storage system.
    >>>
    >>> == Rationale ==
    >>>
    >>> Druid is a mature, active project with a large number of production
    >>> installations, dozens of contributors to each release, and multiple
    >>> vendors
    >>> offering professional support. Given Druid's strong community, its close
    >>> integration with many other Apache projects (such as Kafka, Hadoop, and
    >>> Calcite), and its pre-existing Apache-inspired governance structure, we
    >>> feel that Apache is the best home for the project on a long-term basis.
    >>>
    >>> == Current Status ==
    >>>
    >>> === Meritocracy ===
    >>> Since Druid was first open sourced the original developers have solicited
    >>> contributions from others, including through our blog, the project
    >>> mailing
    >>> lists, and through accepting !GitHub pull requests. We have an
    >>> Apache-inspired governance structure with a PMC and committers, and our
    >>> committer ranks include a good number of people from outside the original
    >>> development team.
    >>>
    >>> === Community ===
    >>>
    >>> The Druid core developers have sought to nurture a community throughout
    >>> the
    >>> life of the project. We use !GitHub as the focal point for bug reports
    >>> and
    >>> code contributions, and the mailing lists for most other discussion. To
    >>> try
    >>> to make people feel welcome, we've also spelled this out on a
    >>> "CONTRIBUTE"
    >>> link from the project page: http://druid.io/community/. Today we have an
    >>> active contributor base (a typical release has ~40 contributors) and
    >>> mailing list.
    >>>
    >>> === Core Developers ===
    >>>
    >>> Druid enjoys good diversity of committer affiliation. The most active
    >>> developers over the past year are affiliated with four different
    >>> companies:
    >>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are
    >>> also
    >>> committers on other ASF projects as well, including Apache Airflow,
    >>> Apache
    >>> Curator, and Apache Calcite. The original developers of Druid remain
    >>> involved in the project.
    >>>
    >>> === Alignment ===
    >>>
    >>> Druid's current governance structure is Apache-inspired with a PMC and
    >>> committers chosen by a meritocratic process. Additionally, Druid
    >>> integrates
    >>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
    >>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
    >>>
    >>> == Known Risks ==
    >>>
    >>> === Orphaned products ===
    >>>
    >>> The risk of Druid becoming orphaned is low, due to a diverse committer
    >>> base
    >>> that is invested in the future of the project.
    >>>
    >>> === Inexperience with Open Source ===
    >>>
    >>> Druid's core developers have been running it as a community-oriented open
    >>> source project for some time now, and many of them are committers on
    >>> other
    >>> open source projects as well, including Apache Airflow, Apache Curator,
    >>> and
    >>> Apache Calcite.
    >>>
    >>> === Homogenous Developers ===
    >>>
    >>> Druid's current diversity of committer affiliation means that we have
    >>> become accustomed to working collaboratively and in the open. We hope
    >>> that
    >>> a transition to the ASF helps Druid's contributor base become even more
    >>> diverse.
    >>>
    >>> === Reliance on Salaried Developers ===
    >>>
    >>> Druid's user base and contributor base skews heavily towards salaried
    >>> developers. We believe this is natural since Druid is a technology
    >>> designed
    >>> to be deployed on large clusters, and due to this, tends to be deployed
    >>> by
    >>> organizations rather than by individuals. Nevertheless, many current
    >>> Druid
    >>> developers have continued working on the project even through job
    >>> changes,
    >>> which we take to be a good sign of developer commitment and personal
    >>> interest.
    >>>
    >>> === Relationships with Other Apache Products ===
    >>>
    >>> Druid integrates with a number of other Apache projects. Druid internally
    >>> uses Calcite for SQL planning, and Curator and !ZooKeeper for
    >>> coordination.
    >>> Druid can read data in Avro or Parquet format. Druid can load data from
    >>> streams in Kafka or from files in Hadoop. Druid integrates with Hive as
    >>> an
    >>> option for SQL query acceleration. Druid data can be visualized by
    >>> Superset
    >>> (incubating).
    >>>
    >>> === A Excessive Fascination with the Apache Brand ===
    >>>
    >>> Druid is a successful project with a diverse community. The main reason
    >>> for
    >>> pursuing incubation is to find a stable, long term home for the project
    >>> with a well known governance philosophy.
    >>>
    >>> == Required Resources ==
    >>>
    >>> === Mailing lists ===
    >>>
    >>> We would like to migrate the existing Druid mailing lists from Google
    >>> Groups to Apache.
    >>>
    >>> * druid-user@googlegroups -> [hidden email]
    >>> * druid-development@googlegroups -> [hidden email]
    >>>
    >>> === Source control ===
    >>>
    >>> Druid development currently takes place on !GitHub. We would like to
    >>> continue using !GitHub, if possible, in order to preserve the workflows
    >>> the
    >>> community has developed around !GitHub pull requests.
    >>>
    >>> === Issue tracking ===
    >>> Druid currently uses !GitHub issues for issue tracking. We would like to
    >>> migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
    >>>
    >>> == Documentation ==
    >>>
    >>> Druid's documentation can be found at http://druid.io/docs/latest/.
    >>>
    >>> == Initial Source ==
    >>>
    >>> Druid was initially open-sourced by Metamarkets in 2012 and has been run
    >>> in
    >>> a community-governed fashion since then. The code is currently hosted at
    >>> https://github.com/druid-io/ and includes the following repositories:
    >>>
    >>> * druid (primary repository)
    >>> * druid-console (web console for Druid)
    >>> * druid-io.github.io (source for Druid's website at http://druid.io/)
    >>> * tranquility (realtime stream push client for Druid)
    >>> * docker-druid (Docker image for Druid)
    >>> * pydruid (Python library)
    >>> * RDruid (R library)
    >>> * oss-parent (Maven POM files)
    >>>
    >>> == Source and Intellectual Property Submission Plan ==
    >>>
    >>> A complete set of the open source code needs to be licensed from the
    >>> owning



Re: [DISCUSS] Druid incubation proposal

Selvamohan Neethiraj-2
+1 for adding Druid to ASF

Thanks,
Selva-

> On Feb 21, 2018, at 10:03 PM, Jitendra Pandey <[hidden email]> wrote:
>
> +1
> Druid will be a great addition to ASF.
>
> On 2/21/18, 5:06 PM, "Ashutosh Chauhan" <[hidden email]> wrote:
>
>    +1 for Druid in ASF.
>    I have been involved with Hive Druid integration. If you are looking for
>    mentors, happy to help.
>
>    Thanks,
>    Ashutosh
>
>    On Fri, Feb 16, 2018 at 2:20 PM, Tom Barber <[hidden email]> wrote:
>
>> I can second most of that from the peanut gallery, based on my high-level
>> interactions with a few Druid folk and on keeping a watchful eye on a very
>> exciting project over the last few years.
>>
>> I think the Druid project would make an excellent addition to the ASF
>> portfolio.
>>
>> Tom
>>
>>
>> On 16/02/18 22:17, Julian Hyde wrote:
>>
>>> As Champion for this proposal, let me say that the Druid project will be
>>> an excellent addition to the ASF. I have been an observer of the project
>>> for a couple of years, and in many respects it is already operating in the
>>> Apache Way. Druid had paid developers from a number of companies, some of
>>> whom were in competition, and its governance was strong enough to navigate
>>> the choppy waters that that can create.
>>>
>>> A number of Druid committers subsequently started to work on Apache
>>> projects (Gian on Calcite, and Slim and Nishant on Hive) and so already
>>> know what to expect.
>>>
>>> You can get a sense of the project dynamic by reading the archives of
>>> their dev list: https://groups.google.com/forum/#!forum/druid-development
>>>
>>> Julian
>>>
>>>



Re: [DISCUSS] Druid incubation proposal

Chris Mattmann-4
+1 from me...

Chris

On 2/21/18, 7:08 PM, "Selvamohan Neethiraj" <[hidden email]> wrote:

    +1 for adding Druid to ASF
   
    Thanks,
    Selva-
   

Re: [DISCUSS] Druid incubation proposal

Jordan Zimmerman-2
In reply to this post by Gian Merlino
+1 for Druid at ASF - great project for Apache

> On Feb 16, 2018, at 3:15 PM, Gian Merlino <[hidden email]> wrote:
>
> Hi all,
>
> I would like to open up a discussion about incubating Druid at Apache. I've
> included a proposal in this mail and have also posted a draft at
> https://wiki.apache.org/incubator/DruidProposal. More information about
> Druid is also available on our project web site at: http://druid.io/
>
> Thanks for your consideration!
>
> Gian

Re: [DISCUSS] Druid incubation proposal

Henning Schmiedehausen-4
In reply to this post by Gian Merlino
Woot!

+1 for druid incubation.

-h



On Fri, Feb 16, 2018 at 12:15 PM, Gian Merlino <[hidden email]> wrote:

> Hi all,
>
> I would like to open up a discussion about incubating Druid at Apache. I've
> included a proposal in this mail and have also posted a draft at
> https://wiki.apache.org/incubator/DruidProposal. More information about
> Druid is also available on our project web site at: http://druid.io/
>
> Thanks for your consideration!
>
> Gian

Re: [DISCUSS] Druid incubation proposal

Brian McCallister
+1 - glad to see Druid finally (hopefully) landing here!

On Wed, Feb 21, 2018 at 10:57 PM, Henning Schmiedehausen <
[hidden email]> wrote:

> Woot!
>
> +1 for druid incubation.
>
> -h
>
>
>
> On Fri, Feb 16, 2018 at 12:15 PM, Gian Merlino <[hidden email]> wrote:
>
> > Hi all,
> >
> > I would like to open up a discussion about incubating Druid at Apache. I've
> > included a proposal in this mail and have also posted a draft at
> > https://wiki.apache.org/incubator/DruidProposal. More information about
> > Druid is also available on our project web site at: http://druid.io/
> >
> > Thanks for your consideration!
> >
> > Gian
> >  * oss-parent (Maven POM files)
> >
> > == Source and Intellectual Property Submission Plan ==
> >
> > A complete set of the open source code needs to be licensed from the
> owning
> > organization to the Foundation. Commercial legal counsel for the owning
> > organization will review the standard Foundation licensing paperwork and
> > propose any updates as needed. This license will enable Apache to
> incubate
> > and manage the Druid project moving forward.
> >
> > Other Druid paraphernalia to be transferred to Apache consists of:
> >
> >  * !GitHub organization at https://github.com/druid-io/
> >  * Twitter account at https://twitter.com/druidio
> >  * "druid.io" domain name
> >  * "Druid" trademark assignment per Foundation standard paper.  The
> > trademark assignment paperwork shall be reviewed by the owning
> > organization's commercial and IP counsel
> >  * CLAs - all rights in the code licensed above should encompass the CLAs
> > that existed between developers and owning organization
> >
> > A copyright license to the code, trademark assignment of Druid, and
> > transfer of other paraphernalia to Apache should be sufficient to cover
> all
> > rights required by Apache to operate the project.
> >
> > == External Dependencies ==
> > External dependencies distributed with Druid currently all have one of
> the
> > following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with
> one
> > exception: the optional Druid MySQL metadata store extension depends on
> > MySQL Connector/J, which is GPL licensed. Druid currently packages this
> as
> > a separate download; see our current presentation on:
> > http://druid.io/downloads.html. As part of incubation we intend to
> > determine the best strategy for handling the MySQL extension.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Initial Committers ==
> >
> > The initial committers for incubation are the current set of committers
> on
> > Druid who have expressed interest in being involved in Apache incubation.
> > Affiliations are listed where relevant. We may seek to add other
> committers
> > during incubation; for example, we would want to add any current Druid
> > committers who express an interest after incubation begins.
> >
> >  * Charles Allen ([hidden email]) (Snap)
> >  * David Lim ([hidden email]) (Imply)
> >  * Eric Tschetter ([hidden email]) (Splunk)
> >  * Fangjin Yang ([hidden email]) (Imply)
> >  * Gian Merlino ([hidden email]) (Imply)
> >  * Himanshu Gupta ([hidden email]) (Oath)
> >  * Jihoon Son ([hidden email]) (Imply)
> >  * Jonathan Wei ([hidden email]) (Imply)
> >  * Maxime Beauchemin ([hidden email]) (Lyft)
> >  * Mohamed Slim Bouguerra ([hidden email]) (Hortonworks)
> >  * Nishant Bangarwa ([hidden email]) (Hortonworks)
> >  * Parag Jain ([hidden email]) (Oath)
> >  * Roman Leventov ([hidden email]) (Metamarkets)
> >  * Xavier Léauté ([hidden email]) (Confluent)
> >
> > == Sponsors ==
> >
> >  * Champion: Julian Hyde
> >  * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
> >  * Sponsoring entity: Apache Incubator
> >
>

Re: [DISCUSS] Druid incubation proposal

Amol Kekre
+1. Great to see Druid joining ASF.


Thks,
Amol





On Thu, Feb 22, 2018 at 8:57 AM, Brian McCallister <[hidden email]> wrote:

> +1 - glad to see Druid finally (hopefully) landing here!
>
> On Wed, Feb 21, 2018 at 10:57 PM, Henning Schmiedehausen <
> [hidden email]> wrote:
>
> > Woot!
> >
> > +1 for druid incubation.
> >
> > -h

Re: [DISCUSS] Druid incubation proposal

Pramod Immaneni
In reply to this post by Gian Merlino
+1


Re: [DISCUSS] Druid incubation proposal

Julian Hyde
It seems that we have consensus (and indeed, an ectopic vote is
happening in this discuss thread). I will start a formal vote. All of
you who replied '+1' on this thread, thanks for your support, and
please cast your vote on the formal thread.

Julian


On Thu, Feb 22, 2018 at 9:36 AM, Pramod Immaneni <[hidden email]> wrote:

> +1
>
> On Fri, Feb 16, 2018 at 12:15 PM, Gian Merlino <[hidden email]> wrote:
>
>> Hi all,
>>
>> I would like to open up a discussion about incubating Druid at Apache. I've
>> included a proposal in this mail and have also posted a draft at
>> https://wiki.apache.org/incubator/DruidProposal. More information about
>> Druid is also available on our project web site at: http://druid.io/
>>
>> Thanks for your consideration!
>>
>> Gian
>>
>> = Druid Proposal =
>>
>> == Abstract ==
>>
>> Druid is a high-performance, column-oriented, distributed data store.
>>
>> == Proposal ==
>>
>> Druid is an open source data store designed for real-time exploratory
>> analytics on large data sets. Druid's key features are a column-oriented
>> storage layout, a distributed shared-nothing architecture, and ability to
>> generate and leverage indexing and caching structures. Druid is typically
>> deployed in clusters of tens to hundreds of nodes, and has the ability to
>> load data from Apache Kafka and Apache Hadoop, among other data sources.
>> Druid offers two query languages: a SQL dialect (powered by Apache Calcite)
>> and a JSON-over-HTTP API.
>>
>> Druid was originally developed to power a slice-and-dice analytical UI
>> built on top of large event streams. The original use case for Druid
>> targeted ingest rates of millions of records/sec, retention of over a year
>> of data, and query latencies of sub-second to a few seconds. Many people
>> can benefit from such capability, and many already have (see
>> http://druid.io/druid-powered.html). In addition, new use cases have
>> emerged since Druid's original development, such as OLAP acceleration of
>> data warehouse tables and more highly concurrent applications operating
>> with relatively narrower queries.
>>
>> == Background ==
>>
>> Druid is a data store designed for fast analytics. It would typically be
>> used in lieu of more general purpose query systems like Hadoop !MapReduce
>> or Spark when query latency is of the utmost importance. Druid is often
>> used as a data store for powering GUI analytical applications.
>>
>> The buzzwordy description of Druid is a high-performance, column-oriented,
>> distributed data store. What we mean by this is:
>>
>>  * "high performance": Druid aims to provide low query latency and high
>> ingest rates possible.
>>  * "column-oriented": Druid stores data in a column-oriented format, like
>> most other systems designed for analytics. It can also store indexes along
>> with the columns.
>>  * "distributed": Druid is deployed in clusters, typically of tens to
>> hundreds of nodes.
>>  * "data store": Druid loads your data and stores a copy of it on the
>> cluster's local disks (and may cache it in memory). It doesn't query your
>> data from some other storage system.
>>
>> == Rationale ==
>>
>> Druid is a mature, active project with a large number of production
>> installations, dozens of contributors to each release, and multiple vendors
>> offering professional support. Given Druid's strong community, its close
>> integration with many other Apache projects (such as Kafka, Hadoop, and
>> Calcite), and its pre-existing Apache-inspired governance structure, we
>> feel that Apache is the best home for the project on a long-term basis.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>>
>> Since Druid was first open sourced, the original developers have solicited
>> contributions from others, including through our blog, the project mailing
>> lists, and through accepting !GitHub pull requests. We have an
>> Apache-inspired governance structure with a PMC and committers, and our
>> committer ranks include a good number of people from outside the original
>> development team.
>>
>> === Community ===
>>
>> The Druid core developers have sought to nurture a community throughout the
>> life of the project. We use !GitHub as the focal point for bug reports and
>> code contributions, and the mailing lists for most other discussion. To try
>> to make people feel welcome, we've also spelled this out on a "CONTRIBUTE"
>> link from the project page: http://druid.io/community/. Today we have an
>> active contributor base (a typical release has ~40 contributors) and
>> mailing list.
>>
>> === Core Developers ===
>>
>> Druid enjoys good diversity of committer affiliation. The most active
>> developers over the past year are affiliated with four different companies:
>> Imply, Metamarkets, Yahoo, and Hortonworks. Many Druid committers are also
>> committers on other ASF projects, including Apache Airflow, Apache
>> Curator, and Apache Calcite. The original developers of Druid remain
>> involved in the project.
>>
>> === Alignment ===
>>
>> Druid's current governance structure is Apache-inspired with a PMC and
>> committers chosen by a meritocratic process. Additionally, Druid integrates
>> with a number of other Apache projects, including Kafka, Hadoop, Hive,
>> Calcite, Superset (incubating), Spark, Curator, and !ZooKeeper.
>>
>> == Known Risks ==
>>
>> === Orphaned products ===
>>
>> The risk of Druid becoming orphaned is low, due to a diverse committer base
>> that is invested in the future of the project.
>>
>> === Inexperience with Open Source ===
>>
>> Druid's core developers have been running it as a community-oriented open
>> source project for some time now, and many of them are committers on other
>> open source projects as well, including Apache Airflow, Apache Curator, and
>> Apache Calcite.
>>
>> === Homogeneous Developers ===
>>
>> Druid's current diversity of committer affiliation means that we have
>> become accustomed to working collaboratively and in the open. We hope that
>> a transition to the ASF helps Druid's contributor base become even more
>> diverse.
>>
>> === Reliance on Salaried Developers ===
>>
>> Druid's user base and contributor base skew heavily towards salaried
>> developers. We believe this is natural since Druid is a technology designed
>> to be deployed on large clusters and therefore tends to be deployed by
>> organizations rather than by individuals. Nevertheless, many current Druid
>> developers have continued working on the project even through job changes,
>> which we take to be a good sign of developer commitment and personal
>> interest.
>>
>> === Relationships with Other Apache Products ===
>>
>> Druid integrates with a number of other Apache projects. Druid internally
>> uses Calcite for SQL planning, and Curator and !ZooKeeper for coordination.
>> Druid can read data in Avro or Parquet format. Druid can load data from
>> streams in Kafka or from files in Hadoop. Druid integrates with Hive as an
>> option for SQL query acceleration. Druid data can be visualized by Superset
>> (incubating).
>>
>> === An Excessive Fascination with the Apache Brand ===
>>
>> Druid is a successful project with a diverse community. The main reason for
>> pursuing incubation is to find a stable, long-term home for the project
>> with a well-known governance philosophy.
>>
>> == Required Resources ==
>>
>> === Mailing lists ===
>>
>> We would like to migrate the existing Druid mailing lists from Google
>> Groups to Apache.
>>
>>  * druid-user@googlegroups -> [hidden email]
>>  * druid-development@googlegroups -> [hidden email]
>>
>> === Source control ===
>>
>> Druid development currently takes place on !GitHub. We would like to
>> continue using !GitHub, if possible, in order to preserve the workflows the
>> community has developed around !GitHub pull requests.
>>
>> === Issue tracking ===
>>
>> Druid currently uses !GitHub issues for issue tracking. We would like to
>> migrate to Apache JIRA at http://issues.apache.org/jira/browse/DRUID.
>>
>> == Documentation ==
>>
>> Druid's documentation can be found at http://druid.io/docs/latest/.
>>
>> == Initial Source ==
>>
>> Druid was initially open-sourced by Metamarkets in 2012 and has been run in
>> a community-governed fashion since then. The code is currently hosted at
>> https://github.com/druid-io/ and includes the following repositories:
>>
>>  * druid (primary repository)
>>  * druid-console (web console for Druid)
>>  * druid-io.github.io (source for Druid's website at http://druid.io/)
>>  * tranquility (realtime stream push client for Druid)
>>  * docker-druid (Docker image for Druid)
>>  * pydruid (Python library)
>>  * RDruid (R library)
>>  * oss-parent (Maven POM files)
>>
>> == Source and Intellectual Property Submission Plan ==
>>
>> A complete set of the open source code needs to be licensed from the owning
>> organization to the Foundation. Commercial legal counsel for the owning
>> organization will review the standard Foundation licensing paperwork and
>> propose any updates as needed. This license will enable Apache to incubate
>> and manage the Druid project moving forward.
>>
>> Other Druid paraphernalia to be transferred to Apache consists of:
>>
>>  * !GitHub organization at https://github.com/druid-io/
>>  * Twitter account at https://twitter.com/druidio
>>  * "druid.io" domain name
>>  * "Druid" trademark assignment per Foundation standard paperwork. The
>> trademark assignment paperwork shall be reviewed by the owning
>> organization's commercial and IP counsel
>>  * CLAs - all rights in the code licensed above should encompass the CLAs
>> that existed between developers and owning organization
>>
>> A copyright license to the code, trademark assignment of Druid, and
>> transfer of other paraphernalia to Apache should be sufficient to cover all
>> rights required by Apache to operate the project.
>>
>> == External Dependencies ==
>>
>> External dependencies distributed with Druid currently all have one of the
>> following Category A or B licenses: ASL, BSD, CDDL, EPL, MIT, MPL; with one
>> exception: the optional Druid MySQL metadata store extension depends on
>> MySQL Connector/J, which is GPL licensed. Druid currently packages this as
>> a separate download; see our current presentation on:
>> http://druid.io/downloads.html. As part of incubation we intend to
>> determine the best strategy for handling the MySQL extension.
>>
>> == Cryptography ==
>>
>> Not applicable.
>>
>> == Initial Committers ==
>>
>> The initial committers for incubation are the current set of committers on
>> Druid who have expressed interest in being involved in Apache incubation.
>> Affiliations are listed where relevant. We may seek to add other committers
>> during incubation; for example, we would want to add any current Druid
>> committers who express an interest after incubation begins.
>>
>>  * Charles Allen ([hidden email]) (Snap)
>>  * David Lim ([hidden email]) (Imply)
>>  * Eric Tschetter ([hidden email]) (Splunk)
>>  * Fangjin Yang ([hidden email]) (Imply)
>>  * Gian Merlino ([hidden email]) (Imply)
>>  * Himanshu Gupta ([hidden email]) (Oath)
>>  * Jihoon Son ([hidden email]) (Imply)
>>  * Jonathan Wei ([hidden email]) (Imply)
>>  * Maxime Beauchemin ([hidden email]) (Lyft)
>>  * Mohamed Slim Bouguerra ([hidden email]) (Hortonworks)
>>  * Nishant Bangarwa ([hidden email]) (Hortonworks)
>>  * Parag Jain ([hidden email]) (Oath)
>>  * Roman Leventov ([hidden email]) (Metamarkets)
>>  * Xavier Léauté ([hidden email]) (Confluent)
>>
>> == Sponsors ==
>>
>>  * Champion: Julian Hyde
>>  * Nominated mentors: Julian Hyde, P. Taylor Goetz, Jun Rao
>>  * Sponsoring entity: Apache Incubator
>>
