[PROPOSAL] Onyx - proposal for Apache Incubation

[PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
Dear Apache Incubator Community,

Please accept the following proposal for presentation and discussion:
https://wiki.apache.org/incubator/OnyxProposal

Onyx is a data processing system that aims to flexibly control the runtime
behaviors of a job to adapt to varying deployment characteristics (e.g.,
harnessing transient resources in datacenters, cross-datacenter deployment,
or changing the runtime based on job characteristics). Onyx provides ways to
extend the system’s capabilities and incorporate the extensions into
flexible job execution.
Onyx translates a user program (e.g., one written for Apache Beam or Apache
Spark) into an Intermediate Representation (IR) DAG, which Onyx optimizes
and deploys based on a deployment policy.

I've attached the proposal below.

Best regards,
Byung-Gon Chun

= OnyxProposal =

== Abstract ==
Onyx is a data processing system that flexibly supports different
execution scenarios for clusters with varying deployment
characteristics.

== Proposal ==
Today, there is a wide variety of data processing systems with
different designs for better performance and datacenter efficiency.
Some are designed to process data in specific resource environments,
and others to run jobs with specific attributes. Although each system
successfully solves the problems it targets, most systems are designed
so that runtime behaviors are built tightly into the system core to
hide the complexity of distributed computing. This makes it hard for a
single system to support different deployment characteristics with
different runtime behaviors without substantial effort.

Onyx is a data processing system that aims to flexibly control the
runtime behaviors of a job to adapt to varying deployment
characteristics. Moreover, it provides a means of extending the
system’s capabilities and incorporating the extensions into flexible
job execution.

To make runtime behaviors easy to modify for varying deployment
characteristics, Onyx exposes them so that they can be configured and
modified at both compile time and runtime through a set of high-level
graph pass interfaces.

We hope to contribute to the big data processing community by enabling
more flexibility and extensibility in job execution. Furthermore, we
can all benefit when we work together as a community to mature the
system with more use cases and a better understanding of diverse
deployment characteristics. The Apache Software Foundation is the
perfect place to achieve these aspirations.

== Background ==
Many data processing systems have distinctive runtime behaviors
optimized and configured for specific deployment characteristics like
different resource environments and for handling special job
attributes.

For example, much research has been conducted to overcome the
challenge of running data processing jobs on cheap, unreliable
transient resources. Likewise, techniques for disaggregating different
types of resources, like memory, CPU, and GPU, are being actively
developed to use datacenter resources more efficiently. Many
researchers are also working to run data processing jobs in even more
diverse environments, such as across distant datacenters. Similarly,
for special job attributes, many studies take different approaches, such
as runtime optimization, to solve problems like data skew and to
optimize systems for data processing jobs with small-scale input data.

Although each of these systems performs well with the jobs and in the
environments it targets, it performs poorly in cases it was not
designed for, and its design does not consider supporting multiple
deployment characteristics on a single system.

For an application writer, optimizing an application to perform well
on a system with hard-wired underlying behaviors requires a deep
understanding of the system itself, which often takes a lot of time
and effort. Moreover, for a developer, modifying such system behaviors
requires modifying the system core, which demands an even deeper
understanding of the system.

With this background, Onyx is designed to represent all of its jobs as
an Intermediate Representation (IR) DAG. User applications written in
various programming models (e.g., Apache Beam) are submitted to the
Onyx compiler, transformed into an IR DAG, and optimized and customized
for the deployment characteristics. In the IR DAG optimization phase,
the DAG is modified through a series of compiler “passes” which reshape
the DAG or annotate it with an expression of the underlying runtime
behaviors. The IR DAG is then submitted as an execution plan to the
Onyx runtime. The runtime consists of a backbone containing the parts
of data processing that stay unmodified, transparently integrated with
configurable components that are exposed for further extension.
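
As a rough illustration of the pass-based design described above, the
following Python sketch models a job as a small DAG and runs two passes
over it: one that annotates vertices with an execution property and one
that reshapes the graph. All names here (IRVertex, IRDag,
annotate_placement, insert_relay, the "placement" property, and the
transient/reserved policy) are hypothetical and chosen only for
illustration; they are not Onyx's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IRVertex:
    """One operator in the job; `properties` holds execution annotations."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class IRDag:
    """A job as a DAG: vertices plus directed (source, destination) edges."""
    vertices: Dict[str, IRVertex] = field(default_factory=dict)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add_vertex(self, name: str) -> None:
        self.vertices[name] = IRVertex(name)

    def add_edge(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))


# A pass takes a DAG and returns a (possibly modified) DAG.
Pass = Callable[[IRDag], IRDag]


def annotate_placement(dag: IRDag) -> IRDag:
    """Annotating pass (hypothetical policy): source vertices run on
    transient resources, all other vertices on reserved resources."""
    has_input = {dst for _, dst in dag.edges}
    for name, vertex in dag.vertices.items():
        vertex.properties["placement"] = (
            "reserved" if name in has_input else "transient"
        )
    return dag


def insert_relay(dag: IRDag) -> IRDag:
    """Reshaping pass: add a relay vertex on every edge that goes from a
    transient vertex to a reserved vertex."""
    new_edges: List[Tuple[str, str]] = []
    for src, dst in dag.edges:
        src_place = dag.vertices[src].properties.get("placement")
        dst_place = dag.vertices[dst].properties.get("placement")
        if src_place == "transient" and dst_place == "reserved":
            relay = f"relay({src}->{dst})"
            dag.add_vertex(relay)
            dag.vertices[relay].properties["placement"] = "reserved"
            new_edges.extend([(src, relay), (relay, dst)])
        else:
            new_edges.append((src, dst))
    dag.edges = new_edges
    return dag


def optimize(dag: IRDag, passes: List[Pass]) -> IRDag:
    """Apply the passes in order to produce the final execution plan."""
    for compiler_pass in passes:
        dag = compiler_pass(dag)
    return dag


def build_example() -> IRDag:
    """Build a three-vertex job and run both passes over it."""
    dag = IRDag()
    for name in ("read", "map", "reduce"):
        dag.add_vertex(name)
    dag.add_edge("read", "map")
    dag.add_edge("map", "reduce")
    return optimize(dag, [annotate_placement, insert_relay])
```

In this sketch a different deployment scenario would be expressed by
swapping the list of passes given to optimize(), leaving the DAG
classes untouched.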

== Rationale ==
Onyx’s vision lies in providing the means to flexibly support a wide
variety of job execution scenarios for users, while at the same time
making it easy for system developers to extend the execution framework
with various functionalities. The capabilities of the system can be
extended as it grows to meet a wider variety of execution scenarios.
We need input from users and developers from diverse domains in
order to make it a more thriving and useful project. The Apache
Software Foundation provides the best tools and community to support
this vision.

== Initial Goals ==
The initial goals are to move the existing codebase to Apache and
integrate with the Apache development process. We further plan to
develop our system to meet the needs of more execution scenarios for
a wider variety of deployment characteristics.

== Current Status ==
The Onyx codebase is currently hosted in a repository at github.com. The
current version has been developed by system developers at Seoul
National University, Viva Republica, Samsung, and LG.

== Meritocracy ==
We plan to strongly support meritocracy. We will discuss the
requirements in an open forum, and those who continuously contribute
to Onyx with the passion to strengthen the system will be invited to
become committers. Contributors who enrich Onyx by providing various use
cases and various implementations of the configurable components,
including ideas for optimization techniques, will be especially
welcome. Committers with a deep understanding of the system’s
technical aspects as a whole and of its philosophy will definitely be
voted into the PMC. We will monitor community participation so that
privileges can be extended to those who contribute.

== Community ==
We hope to expand our contributor community by becoming an Apache
Incubator project. Contributions will come from both users and
system developers interested in the flexibility and extensibility of job
execution that Onyx can support. We expect users to contribute mainly
by diversifying the use cases and deployment characteristics, and
developers to contribute by implementing them.

== Alignment ==
Apache Spark is one of many popular data processing frameworks. The
system is designed around optimizing jobs using in-memory RDDs, with
many other optimizations built tightly into the framework. In
contrast to Spark, Onyx aims to provide more flexibility for job
execution in a way that is easy to use.

Apache Tez enables developers to build complex task DAGs with control
over the control plane of job execution. In Onyx, a program written in
a high-level programming layer (e.g., Apache Beam) is automatically
converted to a basic IR DAG, which can then be converted to any IR DAG
through a series of passes that are easy for users to write and that
can both reshape the DAG and modify its annotations (of execution
properties). Moreover, Onyx leaves more parts of the job execution
configurable, such as the scheduler and the data plane. Rather than
providing a fixed set of properties for optimization, Onyx lets its
configurable parts be easily extended and explored by implementing
the pre-defined interfaces. For example, an arbitrary intermediate
data store can be added.
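
To make the idea of extending a configurable part more concrete, here
is a minimal Python sketch of adding an intermediate data store by
implementing a pre-defined interface. The interface
(IntermediateDataStore), both implementations, the StoreRegistry, and
the "data_store" edge property are assumptions made up for this
example and do not describe Onyx's real interfaces.

```python
import json
import zlib
from abc import ABC, abstractmethod
from typing import Dict, List


class IntermediateDataStore(ABC):
    """Hypothetical interface for where data between vertices is kept."""

    @abstractmethod
    def write(self, block_id: str, records: List[str]) -> None:
        """Store the records produced for one block."""

    @abstractmethod
    def read(self, block_id: str) -> List[str]:
        """Return the records previously written for one block."""


class MemoryStore(IntermediateDataStore):
    """Built-in style store: keeps records as plain Python lists."""

    def __init__(self) -> None:
        self._blocks: Dict[str, List[str]] = {}

    def write(self, block_id: str, records: List[str]) -> None:
        self._blocks[block_id] = list(records)

    def read(self, block_id: str) -> List[str]:
        return list(self._blocks[block_id])


class CompressedMemoryStore(IntermediateDataStore):
    """User-added store: keeps each block as zlib-compressed JSON."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def write(self, block_id: str, records: List[str]) -> None:
        payload = json.dumps(records).encode("utf-8")
        self._blocks[block_id] = zlib.compress(payload)

    def read(self, block_id: str) -> List[str]:
        payload = zlib.decompress(self._blocks[block_id])
        return json.loads(payload.decode("utf-8"))


class StoreRegistry:
    """Maps the store name annotated on an edge to a store instance."""

    def __init__(self) -> None:
        self._stores: Dict[str, IntermediateDataStore] = {}

    def register(self, name: str, store: IntermediateDataStore) -> None:
        self._stores[name] = store

    def for_edge(self, edge_properties: Dict[str, str]) -> IntermediateDataStore:
        # The "data_store" key is an assumed execution property name.
        return self._stores[edge_properties["data_store"]]
```

In this sketch the new store is added without touching the code that
reads and writes blocks: it is registered under a name, and an edge
selects it through its annotation, for example
registry.for_edge({"data_store": "compressed"}).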

Onyx currently supports Apache Beam programs, and we are working on
supporting Apache Spark programs as well. Onyx also uses Apache
REEF for container management, which allows Onyx to run in Apache YARN
and Apache Mesos clusters. If necessary, we plan to contribute to and
collaborate with these other Apache projects for the benefit of all.
We plan to extend such integrations to more Apache software. The Apache
Software Foundation already hosts many major big-data systems, and we
expect to help the big-data community grow further by having Onyx
within the Apache Software Foundation.

== Known Risks ==
=== Orphaned Products ===
The risk of the Onyx project being orphaned is minimal. There is
already plenty of work that arduously supports different deployment
characteristics, and we propose a general way to implement them with
flexible and extensible configuration knobs. The domain of data
processing is already of high interest, and it is expected to
evolve continuously to serve various other purposes, such as resource
disaggregation and using transient resources for better datacenter
resource utilization.

=== Inexperience with Open Source ===
The initial committers include PMC members and committers of other
Apache projects. They have experience with open source projects,
from incubation through to top-level status. They have been
involved in the open source development process and are familiar with
releasing code under an open source license.

=== Homogeneous Developers ===
The initial set of committers is from a limited set of organizations,
but we expect to attract new contributors from diverse organizations
and will thus grow organically once approved for incubation. Our prior
experience with other open source projects will help various
contributors to actively participate in our project.

=== Reliance on Salaried Developers ===
Many developers are from Seoul National University, so this risk is not applicable.

=== Relationships with Other Apache Products ===
Onyx positions itself among multiple Apache products. It runs on
Apache REEF for container management. It also utilizes many useful
development tools including Apache Maven, Apache Log4J, and multiple
Apache Commons components. Onyx supports the Apache Beam programming
model for user applications. We are currently working on supporting
the Apache Spark programming APIs as well.

=== An Excessive Fascination with the Apache Brand ===
We hope to make Onyx a powerful system for data processing, meeting
various needs for different deployment characteristics in a wider
variety of environments. We see the limitations of simply putting code
on GitHub, and we believe the Apache community will help Onyx grow
into impactful and innovative open source software. We believe Onyx is
a great fit for the Apache Software Foundation because of the
collaboration it aims to foster within the big data processing
community.

== Documentation ==
The current documentation for Onyx is at https://snuspl.github.io/onyx/.

== Initial Source ==
The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.

== External Dependencies ==
To the best of our knowledge, all Onyx dependencies are distributed
under Apache-compatible licenses. Upon acceptance to the incubator, we
would begin a thorough analysis of all transitive dependencies to
verify this and would further introduce license checking into the build
and release process.

== Cryptography ==
Not applicable.

== Required Resources ==
=== Mailing Lists ===
We will operate two mailing lists as follows:
   * Onyx PMC discussions: [hidden email]
   * Onyx developers: [hidden email]

=== Git Repositories ===
Upon incubation: https://github.com/apache/incubator-onyx.
After incubation, we would like to move the existing repo
https://github.com/snuspl/onyx to the Apache infrastructure.

=== Issue Tracking ===
Onyx currently tracks its issues using the GitHub issue tracker:
https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
JIRA.

== Initial Committers ==
  * Byung-Gon Chun
  * Jeongyoon Eo
  * Geon-Woo Kim
  * Joo Yeon Kim
  * Gyewon Lee
  * Jung-Gil Lee
  * Sanha Lee
  * Wooyeon Lee
  * Yunseong Lee
  * JangHo Seo
  * Won Wook Song
  * Taegeon Um
  * Youngseok Yang

== Affiliations ==
  * SNU (Seoul National University)
    * Byung-Gon Chun
    * Jeongyoon Eo
    * Geon-Woo Kim
    * Gyewon Lee
    * Sanha Lee
    * Wooyeon Lee
    * Yunseong Lee
    * JangHo Seo
    * Won Wook Song
    * Taegeon Um
    * Youngseok Yang

  * LG
    * Jung-Gil Lee

  * Samsung
    * Joo Yeon Kim

  * Viva Republica
    * Geon-Woo Kim

== Sponsors ==
=== Champions ===
Byung-Gon Chun

=== Mentors ===
  * Hyunsik Choi
  * Byung-Gon Chun
  * Markus Weimer
  * Reynold Xin

=== Sponsoring Entity ===
The Apache Incubator



--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Davor Bonaci-2
Great work -- I think this technology has a lot of promise, and I'd love to
see its evolution inside the Foundation.

Parts of it, like the Onyx Intermediate Representation [1], overlap with
the work-in-progress inside the Apache Beam project ("portability"). We'd
love to work together on this -- would you be open to such collaboration?
If so, it may not be necessary to start from scratch; we could leverage the
work already done.

Regarding the name, Onyx would likely have to be renamed, due to a conflict
with a related technology [2].

Davor

[1] https://snuspl.github.io/onyx/docs/ir/
[2] http://www.onyxplatform.org/


Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:

> Great work -- I think this technology has a lot of promise, and I'd love to
> see its evolution inside the Foundation.
>
>
Thanks, Davor!


> Parts of it, like the Onyx Intermediate Representation [1], overlap with
> the work-in-progress inside the Apache Beam project ("portability"). We'd
> love to work together on this -- would you be open to such collaboration?
> If so, it may not be necessary to start from scratch, and leverage the work
> already done.
>
>
Sure. We're open to collaboration.


> Regarding the name, Onyx would likely have to be renamed, due to a conflict
> with a related technology [2].
>
>
Thanks for pointing it out. It's difficult to come up with a good short
name. :)
Do you have any suggestions?

Thanks!
-Gon

---
Byung-Gon Chun



> Davor
>
> [1] https://snuspl.github.io/onyx/docs/ir/
> [2] http://www.onyxplatform.org/
>
> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <[hidden email]> wrote:
>
> > Dear Apache Incubator Community,
> >
> > Please accept the following proposal for presentation and discussion:
> > https://wiki.apache.org/incubator/OnyxProposal
> >
> > Onyx is a data processing system that aims to flexibly control the
> runtime
> > behaviors of a job to adapt to varying deployment characteristics (e.g.,
> > harnessing transient resources in datacenters, cross-datacenter
> deployment,
> > changing runtime based on job characteristics, etc.). Onyx provides ways
> to
> > extend the system’s capabilities and incorporate the extensions to the
> > flexible job execution.
> > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
> > Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
> > based on a deployment policy.
> >
> > I've attached the proposal below.
> >
> > Best regards,
> > Byung-Gon Chun
> >
> > = OnyxProposal =
> >
> > == Abstract ==
> > Onyx is a data processing system for flexible employment with
> > different execution scenarios for various deployment characteristics
> > on clusters.
> >
> > == Proposal ==
> > Today, there is a wide variety of data processing systems with
> > different designs for better performance and datacenter efficiency.
> > They include processing data on specific resource environments and
> > running jobs with specific attributes. Although each system
> > successfully solves the problems it targets, most systems are designed
> > in the way that runtime behaviors are built tightly inside the system
> > core to hide the complexity of distributed computing. This makes it
> > hard for a single system to support different deployment
> > characteristics with different runtime behaviors without substantial
> > effort.
> >
> > Onyx is a data processing system that aims to flexibly control the
> > runtime behaviors of a job to adapt to varying deployment
> > characteristics. Moreover, it provides a means of extending the
> > system’s capabilities and incorporating the extensions to the flexible
> > job execution.
> >
> > In order to be able to easily modify runtime behaviors to adapt to
> > varying deployment characteristics, Onyx exposes runtime behaviors to
> > be flexibly configured and modified at both compile-time and runtime
> > through a set of high-level graph pass interfaces.
> >
> > We hope to contribute to the big data processing community by enabling
> > more flexibility and extensibility in job executions. Furthermore, we
> > can benefit more together as a community when we work together as a
> > community to mature the system with more use cases and understanding
> > of diverse deployment characteristics. The Apache Software Foundation
> > is the perfect place to achieve these aspirations.
> >
> > == Background ==
> > Many data processing systems have distinctive runtime behaviors
> > optimized and configured for specific deployment characteristics like
> > different resource environments and for handling special job
> > attributes.
> >
> > For example, much research have been conducted to overcome the
> > challenge of running data processing jobs on cheap, unreliable
> > transient resources. Likewise, techniques for disaggregating different
> > types of resources, like memory, CPU and GPU, are being actively
> > developed to use datacenter resources more efficiently. Many
> > researchers are also working to run data processing jobs in even more
> > diverse environments, such as across distant datacenters. Similarly,
> > for special job attributes, many works take different approaches, such
> > as runtime optimization, to solve problems like data skew, and to
> > optimize systems for data processing jobs with small-scale input data.
> >
> > Although each of the systems performs well with the jobs and in the
> > environments they target, they perform poorly with unconsidered cases,
> > and do not consider supporting multiple deployment characteristics on
> > a single system in their designs.
> >
> > For an application writer, optimizing an application to perform well
> > on a system with such built-in behaviors requires a deep
> > understanding of the system itself, which often takes a lot of time
> > and effort. Moreover, for a developer, modifying such system behaviors
> > requires changes to the system core, which demands an even deeper
> > understanding of the system.
> >
> > With this background, Onyx is designed to represent all of its jobs as
> > an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> > applications from various programming models (e.g., Apache Beam) are
> > submitted, transformed to an IR DAG, and optimized/customized for the
> > deployment characteristics. In the IR DAG optimization phase, the DAG
> > is modified through a series of compiler “passes” which reshape or
> > annotate the DAG with an expression of the underlying runtime
> > behaviors. The IR DAG is then submitted as an execution plan for the
> > Onyx runtime. The runtime includes the unmodified parts of data
> > processing in the backbone which is transparently integrated with
> > configurable components exposed for further extension.
> >
> > == Rationale ==
> > Onyx’s vision lies in providing means for flexibly supporting a wide
> > variety of job execution scenarios for users while facilitating system
> > developers to extend the execution framework with various
> > functionalities at the same time. The capabilities of the system can
> > be extended as it grows to meet a wider variety of execution scenarios.
> > We require inputs from users and developers from diverse domains in
> > order to make it a more thriving and useful project. The Apache
> > Software Foundation provides the best tools and community to support
> > this vision.
> >
> > == Initial Goals ==
> > Initial goals will be to move the existing codebase to Apache and
> > integrate with the Apache development process. We further plan to
> > develop our system to meet the needs of more execution scenarios for
> > a wider variety of deployment characteristics.
> >
> > == Current Status ==
> > The Onyx codebase is currently hosted in a repository at github.com. The
> > current version has been developed by system developers at Seoul
> > National University, Viva Republica, Samsung, and LG.
> >
> > == Meritocracy ==
> > We plan to strongly support meritocracy. We will discuss the
> > requirements in an open forum, and those who continuously contribute
> > to Onyx with the passion to strengthen the system will be invited as
> > committers. Contributors who enrich Onyx by providing various use
> > cases and various implementations of the configurable components,
> > including ideas for optimization techniques, will be especially
> > welcome. Committers with a deep understanding of the system’s
> > technical aspects as a whole and its philosophy will definitely be
> > voted into the PMC. We will monitor community participation so that
> > privileges can be extended to those who contribute.
> >
> > == Community ==
> > We hope to expand our contribution community by becoming an Apache
> > incubator project. The contributions will come from both users and
> > system developers interested in flexibility and extensibility of job
> > executions that Onyx can support. We expect users to contribute mainly
> > by diversifying the use cases and deployment characteristics, and
> > developers to contribute by implementing them.
> >
> > == Alignment ==
> > Apache Spark is one of many popular data processing frameworks. The
> > system is designed towards optimizing jobs using RDDs in memory and
> > many other optimizations built tightly within the framework. In
> > contrast to Spark, Onyx aims to provide more flexibility for job
> > execution in an easy manner.
> >
> > Apache Tez enables developers to build complex task DAGs with control
> > over the control plane of job execution. In Onyx, a high-level
> > programming layer (e.g., Apache Beam) is automatically converted to a
> > basic IR DAG and can be converted to any IR DAG through a series of
> > easy-to-write user passes that can both reshape the DAG and modify
> > its annotations (of execution properties). Moreover, Onyx leaves
> > more parts of the job execution configurable, such as the scheduler
> > and the data plane. As opposed to providing a set of properties for
> > solid optimization, Onyx’s configurable parts can be easily extended
> > and explored by implementing the pre-defined interfaces. For example,
> > an arbitrary intermediate data store can be added.
> >
> > Onyx currently supports Apache Beam programs and we are working on
> > supporting Apache Spark programs as well. Onyx also utilizes Apache
> > REEF for container management, which allows Onyx to run in Apache YARN
> > and Apache Mesos clusters. If necessary, we plan to contribute to and
> > collaborate with these other Apache projects for the benefit of all.
> > We plan to extend such integrations with more Apache software. The
> > Apache Software Foundation already hosts many major big-data systems,
> > and we expect to help further the growth of the big-data community by
> > having Onyx within the Apache Software Foundation.
> >
> > == Known Risks ==
> > === Orphaned Products ===
> > The risk of the Onyx project being orphaned is minimal. There is
> > already plenty of work that arduously supports different deployment
> > characteristics, and we propose a general way to implement them with
> > flexible and extensible configuration knobs. The domain of data
> > processing is already of high interest, and this domain is expected to
> > evolve continuously with various other purposes, such as resource
> > disaggregation and using transient resources for better datacenter
> > resource utilization.
> >
> > === Inexperience with Open Source ===
> > The initial committers include PMC members and committers of other
> > Apache projects. They have experience with open source projects,
> > starting from their incubation to the top-level. They have been
> > involved in the open source development process, and are familiar with
> > releasing code under an open source license.
> >
> > === Homogeneous Developers ===
> > The initial set of committers is from a limited set of organizations,
> > but we expect to attract new contributors from diverse organizations
> > and will thus grow organically once approved for incubation. Our prior
> > experience with other open source projects will help various
> > contributors to actively participate in our project.
> >
> > === Reliance on Salaried Developers ===
> > Many developers are from Seoul National University. This is not
> > applicable.
> >
> > === Relationships with Other Apache Products ===
> > Onyx positions itself among multiple Apache products. It runs on
> > Apache REEF for container management. It also utilizes many useful
> > development tools including Apache Maven, Apache Log4J, and multiple
> > Apache Commons components. Onyx supports the Apache Beam programming
> > model for user applications. We are currently working on supporting
> > the Apache Spark programming APIs as well.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > We hope to make Onyx a powerful system for data processing, meeting
> > various needs for different deployment characteristics, under a wider
> > variety of environments. We see the limitations of simply putting code
> > on GitHub, and we believe the Apache community will help Onyx grow
> > into a positively impactful and innovative open source project. We
> > believe Onyx is a great fit for the Apache
> > Software Foundation due to the collaboration it aims to achieve from
> > the big data processing community.
> >
> > == Documentation ==
> > The current documentation for Onyx is at https://snuspl.github.io/onyx/.
> >
> > == Initial Source ==
> > The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
> >
> > == External Dependencies ==
> > To the best of our knowledge, all Onyx dependencies are distributed
> > under Apache compatible licenses. Upon acceptance to the incubator, we
> > would begin a thorough analysis of all transitive dependencies to
> > verify this fact and further introduce license checking into the build
> > and release process.
> >
> > == Cryptography ==
> > Not applicable.
> >
> > == Required Resources ==
> > === Mailing Lists ===
> > We will operate two mailing lists as follows:
> >    * Onyx PMC discussions: [hidden email]
> >    * Onyx developers: [hidden email]
> >
> > === Git Repositories ===
> > Upon incubation: https://github.com/apache/incubator-onyx.
> > After the incubation, we would like to move the existing repo
> > https://github.com/snuspl/onyx to the Apache infrastructure.
> >
> > === Issue Tracking ===
> > Onyx currently tracks its issues using the GitHub issue tracker:
> > https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> > JIRA.
> >
> > == Initial Committers ==
> >   * Byung-Gon Chun
> >   * Jeongyoon Eo
> >   * Geon-Woo Kim
> >   * Joo Yeon Kim
> >   * Gyewon Lee
> >   * Jung-Gil Lee
> >   * Sanha Lee
> >   * Wooyeon Lee
> >   * Yunseong Lee
> >   * JangHo Seo
> >   * Won Wook Song
> >   * Taegeon Um
> >   * Youngseok Yang
> >
> > == Affiliations ==
> >   * SNU (Seoul National University)
> >     * Byung-Gon Chun
> >     * Jeongyoon Eo
> >     * Geon-Woo Kim
> >     * Gyewon Lee
> >     * Sanha Lee
> >     * Wooyeon Lee
> >     * Yunseong Lee
> >     * JangHo Seo
> >     * Won Wook Song
> >     * Taegeon Um
> >     * Youngseok Yang
> >
> >   * LG
> >     * Jung-Gil Lee
> >
> >   * Samsung
> >     * Joo Yeon Kim
> >
> >   * Viva Republica
> >     * Geon-Woo Kim
> >
> > == Sponsors ==
> > === Champions ===
> > Byung-Gon Chun
> >
> > === Mentors ===
> >   * Hyunsik Choi
> >   * Byung-Gon Chun
> >   * Markus Weimer
> >   * Reynold Xin
> >
> > === Sponsoring Entity ===
> > The Apache Incubator
> >
> >
> >
> > --
> > Byung-Gon Chun
> >
>



--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Romain Manni-Bucau
Why not make it a Beam subproject? Any blockers?

Otherwise +1 to have it @asf, makes a lot of sense.

On Jan 26, 2018 at 20:58, "Byung-Gon Chun" <[hidden email]> wrote:

> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
>
> > Great work -- I think this technology has a lot of promise, and I'd
> > love to see its evolution inside the Foundation.
> >
> >
> Thanks, Davor!
>
>
> > Parts of it, like the Onyx Intermediate Representation [1], overlap
> > with the work-in-progress inside the Apache Beam project
> > ("portability"). We'd love to work together on this -- would you be
> > open to such collaboration? If so, it may not be necessary to start
> > from scratch, and leverage the work already done.
> >
> >
> Sure. We're open to collaboration.
>
>
> > Regarding the name, Onyx would likely have to be renamed, due to a
> > conflict with a related technology [2].
> >
> >
> Thanks for pointing it out. It's difficult to come up with a good short
> name. :)
> Do you have any suggestion?
>
> Thanks!
> -Gon
>
> ---
> Byung-Gon Chun
>
>
>
> > Davor
> >
> > [1] https://snuspl.github.io/onyx/docs/ir/
> > [2] http://www.onyxplatform.org/
> >
>
>
>
> --
> Byung-Gon Chun
>

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
On Sat, Jan 27, 2018 at 5:41 AM, Romain Manni-Bucau <[hidden email]> wrote:

> Why not make it a Beam subproject? Any blockers?
>
>
Thanks for the question, Romain.

We have a flexible, efficient runtime that supports various user programs
(e.g., Beam and Spark programs).
We are taking advantage of Beam as a programming layer, but our focus is
more on optimizing execution for various deployment scenarios.
We also plan to support other programming layers.


> Otherwise +1 to have it @asf, makes a lot of sense.
>
>
Thanks for the support!

-Gon


> On Jan 26, 2018 at 20:58, "Byung-Gon Chun" <[hidden email]> wrote:
>
> > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
> >
> > > Great work -- I think this technology has a lot of promise, and I'd
> > > love to see its evolution inside the Foundation.
> > >
> > >
> > Thanks, Davor!
> >
> >
> > > Parts of it, like the Onyx Intermediate Representation [1], overlap
> > > with the work-in-progress inside the Apache Beam project
> > > ("portability"). We'd love to work together on this -- would you be
> > > open to such collaboration? If so, it may not be necessary to start
> > > from scratch, and leverage the work already done.
> > >
> > >
> > Sure. We're open to collaboration.
> >
> >
> > > Regarding the name, Onyx would likely have to be renamed, due to a
> > > conflict with a related technology [2].
> > >
> > >
> > Thanks for pointing it out. It's difficult to come up with a good short
> > name. :)
> > Do you have any suggestion?
> >
> > Thanks!
> > -Gon
> >
> > ---
> > Byung-Gon Chun
> >
> >
> >
> > > Davor
> > >
> > > [1] https://snuspl.github.io/onyx/docs/ir/
> > > [2] http://www.onyxplatform.org/
> > >
> > > On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <[hidden email]> wrote:
> > >
> > > > Dear Apache Incubator Community,
> > > >
> > > > Please accept the following proposal for presentation and discussion:
> > > > https://wiki.apache.org/incubator/OnyxProposal
> > > >
> > > > Onyx is a data processing system that aims to flexibly control the
> > > > runtime behaviors of a job to adapt to varying deployment
> > > > characteristics (e.g., harnessing transient resources in
> > > > datacenters, cross-datacenter deployment, changing runtime based on
> > > > job characteristics, etc.). Onyx provides ways to extend the
> > > > system’s capabilities and incorporate the extensions to the
> > > > flexible job execution.
> > > > Onyx translates a user program (e.g., Apache Beam, Apache Spark)
> > > > into an Intermediate Representation (IR) DAG, which Onyx optimizes
> > > > and deploys based on a deployment policy.
> > > >
> > > > I've attached the proposal below.
> > > >
> > > > Best regards,
> > > > Byung-Gon Chun
> > > >
> > > > = OnyxProposal =
> > > >
> > > > == Abstract ==
> > > > Onyx is a data processing system for flexible employment with
> > > > different execution scenarios for various deployment characteristics
> > > > on clusters.
> > > >
> > > > == Proposal ==
> > > > Today, there is a wide variety of data processing systems with
> > > > different designs for better performance and datacenter efficiency.
> > > > They include processing data on specific resource environments and
> > > > running jobs with specific attributes. Although each system
> > > > successfully solves the problems it targets, most systems are
> > > > designed in the way that runtime behaviors are built tightly inside
> > > > the system
> > > > core to hide the complexity of distributed computing. This makes it
> > > > hard for a single system to support different deployment
> > > > characteristics with different runtime behaviors without substantial
> > > > effort.
> > > >
> > > > Onyx is a data processing system that aims to flexibly control the
> > > > runtime behaviors of a job to adapt to varying deployment
> > > > characteristics. Moreover, it provides a means of extending the
> > > > system’s capabilities and incorporating the extensions to the
> > > > flexible job execution.
> > > >
> > > > In order to be able to easily modify runtime behaviors to adapt to
> > > > varying deployment characteristics, Onyx exposes runtime behaviors to
> > > > be flexibly configured and modified at both compile-time and runtime
> > > > through a set of high-level graph pass interfaces.
> > > >
> > > > We hope to contribute to the big data processing community by
> > > > enabling more flexibility and extensibility in job executions.
> > > > Furthermore, we can all benefit when we work together as a
> > > > community to mature the system with more use cases and a better
> > > > understanding of diverse deployment characteristics. The Apache
> > > > Software Foundation
> > > > is the perfect place to achieve these aspirations.
> > > >
> > > > == Background ==
> > > > Many data processing systems have distinctive runtime behaviors
> > > > optimized and configured for specific deployment characteristics like
> > > > different resource environments and for handling special job
> > > > attributes.
> > > >
> > > > For example, much research has been conducted to overcome the
> > > > challenge of running data processing jobs on cheap, unreliable
> > > > transient resources. Likewise, techniques for disaggregating
> > > > different types of resources, like memory, CPU and GPU, are being
> > > > actively
> > > > developed to use datacenter resources more efficiently. Many
> > > > researchers are also working to run data processing jobs in even more
> > > > diverse environments, such as across distant datacenters. Similarly,
> > > > for special job attributes, many works take different approaches,
> such
> > > > as runtime optimization, to solve problems like data skew, and to
> > > > optimize systems for data processing jobs with small-scale input
> data.
> > > >
> > > > Although each of the systems performs well with the jobs and in the
> > > > environments they target, they perform poorly with unconsidered
> cases,
> > > > and do not consider supporting multiple deployment characteristics on
> > > > a single system in their designs.
> > > >
> > > > For an application writer to optimize an application to perform well
> > > > on a certain system engraved with its underlying behaviors, it
> > > > requires a deep understanding of the system itself, which is an
> > > > overhead that often requires a lot of time and effort. Moreover, for
> a
> > > > developer to modify such system behaviors, it requires modifications
> > > > of the system core, which requires an even deeper understanding of
> the
> > > > system itself.
> > > >
> > > > With this background, Onyx is designed to represent all of its jobs
> as
> > > > an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> > > > applications from various programming models (ex. Apache Beam) are
> > > > submitted, transformed to an IR DAG, and optimized/customized for the
> > > > deployment characteristics. In the IR DAG optimization phase, the DAG
> > > > is modified through a series of compiler “passes” which reshape or
> > > > annotate the DAG with an expression of the underlying runtime
> > > > behaviors. The IR DAG is then submitted as an execution plan for the
> > > > Onyx runtime. The runtime includes the unmodified parts of data
> > > > processing in the backbone which is transparently integrated with
> > > > configurable components exposed for further extension.
> > > >
> > > > == Rationale ==
> > > > Onyx’s vision lies in providing means for flexibly supporting a wide
> > > > variety of job execution scenarios for users while facilitating
> system
> > > > developers to extend the execution framework with various
> > > > functionalities at the same time. The capabilities of the system can
> > > > be extended as it grows to meet a more variety of execution
> scenarios.
> > > > We require inputs from users and developers from diverse domains in
> > > > order to make it a more thriving and useful project. The Apache
> > > > Software Foundation provides the best tools and community to support
> > > > this vision.
> > > >
> > > > == Initial Goals ==
> > > > Initial goals will be to move the existing codebase to Apache and
> > > > integrate with the Apache development process. We further plan to
> > > > develop our system to meet the needs for more execution scenarios for
> > > > a more variety of deployment characteristics.
> > > >
> > > > == Current Status ==
> > > > Onyx codebase is currently hosted in a repository at github.com. The
> > > > current version has been developed by system developers at Seoul
> > > > National University, Viva Republica, Samsung, and LG.
> > > >
> > > > == Meritocracy ==
> > > > We plan to strongly support meritocracy. We will discuss the
> > > > requirements in an open forum, and those that continuously contribute
> > > > to Onyx with the passion to strengthen the system will be invited as
> > > > committers. Contributors that enrich Onyx by providing various use
> > > > cases, various implementations of the configurable components
> > > > including ideas for optimization techniques will be especially
> > > > welcome. Committers with a deep understanding of the system’s
> > > > technical aspects as a whole and its philosophy will definitely be
> > > > voted as the PMC. We will monitor community participation so that
> > > > privileges can be extended to those that contribute.
> > > >
> > > > == Community ==
> > > > We hope to expand our contribution community by becoming an Apache
> > > > incubator project. The contributions will come from both users and
> > > > system developers interested in flexibility and extensibility of job
> > > > executions that Onyx can support. We expect users to mainly
> contribute
> > > > to diversify the use cases and deployment characteristics, and
> > > > developers to  contribute to implement them.
> > > >
> > > > == Alignment ==
> > > > Apache Spark is one of many popular data processing frameworks. The
> > > > system is designed towards optimizing jobs using RDDs in memory and
> > > > many other optimizations built tightly within the framework. In
> > > > contrast to Spark, Onyx aims to provide more flexibility for job
> > > > execution in an easy manner.
> > > >
> > > > Apache Tez enables developers to build complex task DAGs with control
> > > > over the control plane of job execution. In Onyx, a high-level
> > > > programming layer (ex. Apache Beam) is automatically converted to a
> > > > basic IR DAG and can be converted to any IR DAG through a series of
> > > > easy user writable passes, that can both reshape and modify the
> > > > annotation (of execution properties) of the DAG. Moreover, Onyx
> leaves
> > > > more parts of the job execution configurable, such as the scheduler
> > > > and the data plane. As opposed to providing a set of properties for
> > > > solid optimization, Onyx’s configurable parts can be easily extended
> > > > and explored by implementing the pre-defined interfaces. For example,
> > > > an arbitrary intermediate data store can be added.
> > > >
> > > > Onyx currently supports Apache Beam programs and we are working on
> > > > supporting Apache Spark programs as well. Onyx also utilizes Apache
> > > > REEF for container management, which allows Onyx to run in Apache
> YARN
> > > > and Apache Mesos clusters. If necessary, we plan to contribute to and
> > > > collaborate with these other Apache projects for the benefit of all.
> > > > We plan to extend such integrations with more Apache softwares.
> Apache
> > > > software foundation already hosts many major big-data systems, and we
> > > > expect to help further growth of the big-data community by having
> Onyx
> > > > within the Apache foundation.
> > > >
> > > > == Known Risks ==
> > > > === Orphaned Products ===
> > > > The risk of the Onyx project being orphaned is minimal. There is
> > > > already plenty of work that arduously support different deployment
> > > > characteristics, and we propose a general way to implement them with
> > > > flexible and extensible configuration knobs. The domain of data
> > > > processing is already of high interest, and this domain is expected
> to
> > > > evolve continuously with various other purposes, such as resource
> > > > disaggregation and using transient resources for better datacenter
> > > > resource utilization.
> > > >
> > > > === Inexperience with Open Source ===
> > > > The initial committers include PMC members and committers of other
> > > > Apache projects. They have experience with open source projects,
> > > > starting from their incubation to the top-level. They have been
> > > > involved in the open source development process, and are familiar
> with
> > > > releasing code under an open source license.
> > > >
> > > > === Homogeneous Developers ===
> > > > The initial set of committers is from a limited set of organizations,
> > > > but we expect to attract new contributors from diverse organizations
> > > > and will thus grow organically once approved for incubation. Our
> prior
> > > > experience with other open source projects will help various
> > > > contributors to actively participate in our project.
> > > >
> > > > === Reliance on Salaried Developers ===
> > > > Many developers are from Seoul National University. This is not
> > > applicable.
> > > >
> > > > === Relationships with Other Apache Products ===
> > > > Onyx positions itself among multiple Apache products. It runs on
> > > > Apache REEF for container management. It also utilizes many useful
> > > > development tools including Apache Maven, Apache Log4J, and multiple
> > > > Apache Commons components. Onyx supports the Apache Beam programming
> > > > model for user applications. We are currently working on supporting
> > > > the Apache Spark programming APIs as well.
> > > >
> > > > === An Excessive Fascination with the Apache Brand ===
> > > > We hope to make Onyx a powerful system for data processing, meeting
> > > > various needs for different deployment characteristics, under a more
> > > > variety of environments. We see the limitations of simply putting
> code
> > > > on GitHub, and we believe the Apache community will help the growth
> of
> > > > Onyx for the project to become a positively impactful and innovative
> > > > open source software. We believe Onyx is a great fit for the Apache
> > > > Software Foundation due to the collaboration it aims to achieve from
> > > > the big data processing community.
> > > >
> > > > == Documentation ==
> > > > The current documentation for Onyx is at
> > https://snuspl.github.io/onyx/.
> > > >
> > > > == Initial Source ==
> > > > The Onyx codebase is currently hosted at
> > https://github.com/snuspl/onyx.
> > > >
> > > > == External Dependencies ==
> > > > To the best of our knowledge, all Onyx dependencies are distributed
> > > > under Apache compatible licenses. Upon acceptance to the incubator,
> we
> > > > would begin a thorough analysis of all transitive dependencies to
> > > > verify this fact and further introduce license checking into the
> build
> > > > and release process.
> > > >
> > > > == Cryptography ==
> > > > Not applicable.
> > > >
> > > > == Required Resources ==
> > > > === Mailing Lists ===
> > > > We will operate two mailing lists as follows:
> > > >    * Onyx PMC discussions: [hidden email]
> > > >    * Onyx developers: [hidden email]
> > > >
> > > > === Git Repositories ===
> > > > Upon incubation: https://github.com/apache/incubator-onyx.
> > > > After the incubation, we would like to move the existing repo
> > > > https://github.com/snuspl/onyx to the Apache infrastructure
> > > >
> > > > === Issue Tracking ===
> > > > Onyx currently tracks its issues using the Github issue tracker:
> > > > https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> > > > JIRA.
> > > >
> > > > == Initial Committers ==
> > > >   * Byung-Gon Chun
> > > >   * Jeongyoon Eo
> > > >   * Geon-Woo Kim
> > > >   * Joo Yeon Kim
> > > >   * Gyewon Lee
> > > >   * Jung-Gil Lee
> > > >   * Sanha Lee
> > > >   * Wooyeon Lee
> > > >   * Yunseong Lee
> > > >   * JangHo Seo
> > > >   * Won Wook Song
> > > >   * Taegeon Um
> > > >   * Youngseok Yang
> > > >
> > > > == Affiliations ==
> > > >   * SNU (Seoul National University)
> > > >     * Byung-Gon Chun
> > > >     * Jeongyoon Eo
> > > >     * Geon-Woo Kim
> > > >     * Gyewon Lee
> > > >     * Sanha Lee
> > > >     * Wooyeon Lee
> > > >     * Yunseong Lee
> > > >     * JangHo Seo
> > > >     * Won Wook Song
> > > >     * Taegeon Um
> > > >     * Youngseok Yang
> > > >
> > > >   * LG
> > > >     * Jung-Gil Lee
> > > >
> > > >   * Samsung
> > > >     * Joo Yeon Kim
> > > >
> > > >   * Viva Republica
> > > >     * Geon-Woo Kim
> > > >
> > > > == Sponsors ==
> > > > === Champions ===
> > > > Byung-Gon Chun
> > > >
> > > > === Mentors ===
> > > >   * Hyunsik Choi
> > > >   * Byung-Gon Chun
> > > >   * Markus Weimer
> > > >   * Reynold Xin
> > > >
> > > > === Sponsoring Entity ===
> > > > The Apache Incubator
> > > >
> > > >
> > > >
> > > > --
> > > > Byung-Gon Chun
> > > >
> > >
> >
> >
> >
> > --
> > Byung-Gon Chun
> >
>



--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Romain Manni-Bucau
On Jan 26, 2018 at 21:53, "Byung-Gon Chun" <[hidden email]> wrote:

On Sat, Jan 27, 2018 at 5:41 AM, Romain Manni-Bucau <[hidden email]>
wrote:

> Why not make it a Beam subproject? Any blockers?
>
>
Thanks for the question, Romain.

We have a flexible, efficient runtime that supports various user programs
(e.g., Beam and Spark programs).
We are taking advantage of Beam as a programming layer, but our focus is
more on optimizing execution on various deployment scenarios.
We also plan to support other programming layers.



I tend to think the two can converge, since Beam is about portability and
is complementary, IMHO. It may be worth a PoC.



> Otherwise, +1 to having it at the ASF; it makes a lot of sense.
>
>
Thanks for the support!

-Gon


> On Jan 26, 2018 at 20:58, "Byung-Gon Chun" <[hidden email]> wrote:
>
> > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
> >
> > > Great work -- I think this technology has a lot of promise, and I'd love
> > > to see its evolution inside the Foundation.
> > >
> > >
> > Thanks, Davor!
> >
> >
> > > Parts of it, like the Onyx Intermediate Representation [1], overlap with
> > > the work-in-progress inside the Apache Beam project ("portability"). We'd
> > > love to work together on this -- would you be open to such collaboration?
> > > If so, it may not be necessary to start from scratch; we could leverage
> > > the work already done.
> > >
> > >
> > Sure. We're open to collaboration.
> >
> >
> > > Regarding the name, Onyx would likely have to be renamed, due to a
> > > conflict with a related technology [2].
> > >
> > >
> > Thanks for pointing it out. It's difficult to come up with a good short
> > name. :)
> > Do you have any suggestions?
> >
> > Thanks!
> > -Gon
> >
> > ---
> > Byung-Gon Chun
> >
> >
> >
> > > Davor
> > >
> > > [1] https://snuspl.github.io/onyx/docs/ir/
> > > [2] http://www.onyxplatform.org/
> > >

--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
In reply to this post by Byung-Gon Chun
Since we cannot use the name Onyx, we would like to change the project name
to Surf.
I hope that this name works.

-Gon

---
Byung-Gon Chun


On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:

>
>
> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
>
>> Great work -- I think this technology has a lot of promise, and I'd love
>> to see its evolution inside the Foundation.
>>
>>
> Thanks, Davor!
>
>
>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
>> the work-in-progress inside the Apache Beam project ("portability"). We'd
>> love to work together on this -- would you be open to such collaboration?
>> If so, it may not be necessary to start from scratch; we could leverage
>> the work already done.
>>
>>
> Sure. We're open to collaboration.
>
>
>> Regarding the name, Onyx would likely have to be renamed, due to a
>> conflict with a related technology [2].
>>
>>
> Thanks for pointing it out. It's difficult to come up with a good short
> name. :)
> Do you have any suggestions?
>
> Thanks!
> -Gon
>
> ---
> Byung-Gon Chun
>
>
>
>> Davor
>>
>> [1] https://snuspl.github.io/onyx/docs/ir/
>> [2] http://www.onyxplatform.org/
>>
>
>
>
> --
> Byung-Gon Chun
>



--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

sebb-2-2
A brief search for 'Surf Software' shows quite a few hits.
I have not looked to see if they would be likely to be confused with
this project or cause problems for others.

But it looks as though there might be a problem:
Surfer -  Golden Software
surf @ sourceforge
Surf Software company


On 27 January 2018 at 08:03, Byung-Gon Chun <[hidden email]> wrote:

> Since we cannot use the name Onyx, we would like to change the project name
> to Surf.
> I hope that this name works.
>
> -Gon
>
> ---
> Byung-Gon Chun
>
>
> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:
>
>>
>>
>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
>>
>>> Great work -- I think this technology has a lot of promise, and I'd love to
>>> see its evolution inside the Foundation.
>>>
>>>
>> Thanks, Davor!
>>
>>
>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
>>> the work-in-progress inside the Apache Beam project ("portability"). We'd
>>> love to work together on this -- would you be open to such collaboration?
>>> If so, it may not be necessary to start from scratch, and the work already
>>> done could be leveraged instead.
>>>
>>>
>> Sure. We're open to collaboration.
>>
>>
>>> Regarding the name, Onyx would likely have to be renamed, due to a conflict
>>> with a related technology [2].
>>>
>>>
>> Thanks for pointing it out. It's difficult to come up with a good short name. :)
>> Do you have any suggestions?
>>
>> Thanks!
>> -Gon
>>
>> ---
>> Byung-Gon Chun
>>
>>
>>
>>> Davor
>>>
>>> [1] https://snuspl.github.io/onyx/docs/ir/
>>> [2] http://www.onyxplatform.org/
>>>
>>
>>
>>
>> --
>> Byung-Gon Chun
>>
>
>
>
> --
> Byung-Gon Chun



Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Dave Fisher-5
Checking “Serf Software” which sounds the same.

(1) there is already Apache Serf
(2) Serf is a product from HashiCorp at https://www.serf.io/. This would definitely cause confusion, as it is apparently comparable to ZooKeeper.

Regards,
Dave

> On Jan 27, 2018, at 3:12 AM, sebb <[hidden email]> wrote:
>
> A brief search for 'Surf Software' shows quite a few hits.
> I have not looked to see if they would be likely to be confused with
> this project or cause problems for others.
>
> But it looks as though there might be a problem:
> Surfer -  Golden Software
> surf @ sourceforge
> Surf Software company
>
>
>> On 27 January 2018 at 08:03, Byung-Gon Chun <[hidden email]> wrote:
>> Since we cannot use the name Onyx, we would like to change the project name
>> to Surf.
>> I hope that this name works.
>>
>> -Gon
>>
>> ---
>> Byung-Gon Chun
>>
>>
>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:
>>>
>>>
>>>
>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
>>>>
>>>> Great work -- I think this technology has a lot of promise, and I'd love to
>>>> see its evolution inside the Foundation.
>>>>
>>>>
>>> Thanks, Davor!
>>>
>>>
>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
>>>> the work-in-progress inside the Apache Beam project ("portability"). We'd
>>>> love to work together on this -- would you be open to such collaboration?
>>>> If so, it may not be necessary to start from scratch, and the work already
>>>> done could be leveraged instead.
>>>>
>>>>
>>> Sure. We're open to collaboration.
>>>
>>>
>>>> Regarding the name, Onyx would likely have to be renamed, due to a conflict
>>>> with a related technology [2].
>>>>
>>>>
>>> Thanks for pointing it out. It's difficult to come up with a good short name. :)
>>> Do you have any suggestions?
>>>
>>> Thanks!
>>> -Gon
>>>
>>> ---
>>> Byung-Gon Chun
>>>
>>>
>>>
>>>> Davor
>>>>
>>>> [1] https://snuspl.github.io/onyx/docs/ir/
>>>> [2] http://www.onyxplatform.org/
>>>>
>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <[hidden email]> wrote:
>>>>>
>>>>> Dear Apache Incubator Community,
>>>>>
>>>>> Please accept the following proposal for presentation and discussion:
>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the runtime
>>>>> behaviors of a job to adapt to varying deployment characteristics (e.g.,
>>>>> harnessing transient resources in datacenters, cross-datacenter deployment,
>>>>> changing runtime based on job characteristics, etc.). Onyx provides ways to
>>>>> extend the system’s capabilities and incorporate the extensions to the
>>>>> flexible job execution.
>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>>>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>>>>> based on a deployment policy.
>>>>>
>>>>> I've attached the proposal below.
>>>>>
>>>>> Best regards,
>>>>> Byung-Gon Chun
>>>>>
>>>>> --
>>>>> Byung-Gon Chun
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Byung-Gon Chun
>>>
>>
>>
>>
>> --
>> Byung-Gon Chun
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Leif Hedstrom-3
Did we rule out Onyx for sure? Just because some other project might use the name on, say, GitHub doesn’t necessarily exclude us from having an Apache Onyx.

FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :)

Cheers,

— Leif

> On Jan 27, 2018, at 07:31, Dave Fisher <[hidden email]> wrote:
>
> Checking “Serf Software” which sounds the same.
>
> (1) there is already Apache Serf
> (2) Serf is a product from Hashicorp at https://www.serf.io/. This would definitely confuse as it is apparently comparable to ZooKeeper.
>
> Regards,
> Dave
>
> Sent from my iPhone
>
>> On Jan 27, 2018, at 3:12 AM, sebb <[hidden email]> wrote:
>>
>> A brief search for 'Surf Software' shows quite a few hits.
>> I have not looked to see if they would be likely to be confused with
>> this project or cause problems for others.
>>
>> But it looks as though there might be a problem:
>> Surfer -  Golden Software
>> surf @ sourceforge
>> Surf Software company
>>
>>
>>> On 27 January 2018 at 08:03, Byung-Gon Chun <[hidden email]> wrote:
>>> Since we cannot use the name Onyx, we would like to change the project name
>>> to Surf.
>>> I hope that this name works.
>>>
>>> -Gon
>>>
>>> ---
>>> Byung-Gon Chun
>>>
>>>
>>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:
>>>>
>>>>
>>>>
>>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
>>>>>
>>>>> Great work -- I think this technology has a lot of promise, and I'd love to
>>>>> see its evolution inside the Foundation.
>>>>>
>>>>>
>>>> Thanks, Davor!
>>>>
>>>>
>>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
>>>>> the work-in-progress inside the Apache Beam project ("portability"). We'd
>>>>> love to work together on this -- would you be open to such collaboration?
>>>>> If so, it may not be necessary to start from scratch, and leverage the work
>>>>> already done.
>>>>>
>>>>>
>>>> Sure. We're open to collaboration.
>>>>
>>>>
>>>>> Regarding the name, Onyx would likely have to be renamed, due to a conflict
>>>>> with a related technology [2].
>>>>>
>>>>>
>>>> Thanks for pointing it out. It's difficult to come up with a good short
>>>> name. :)
>>>> Do you have any suggestion?
>>>>
>>>> Thanks!
>>>> -Gon
>>>>
>>>> ---
>>>> Byung-Gon Chun
>>>>
>>>>
>>>>
>>>>> Davor
>>>>>
>>>>> [1] https://snuspl.github.io/onyx/docs/ir/
>>>>> [2] http://www.onyxplatform.org/
>>>>>
>>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <[hidden email]> wrote:
>>>>>>
>>>>>> Dear Apache Incubator Community,
>>>>>>
>>>>>> Please accept the following proposal for presentation and discussion:
>>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>>
>>>>>> Onyx is a data processing system that aims to flexibly control the runtime
>>>>>> behaviors of a job to adapt to varying deployment characteristics (e.g.,
>>>>>> harnessing transient resources in datacenters, cross-datacenter deployment,
>>>>>> changing runtime based on job characteristics, etc.). Onyx provides ways to
>>>>>> extend the system’s capabilities and incorporate the extensions to the
>>>>>> flexible job execution.
>>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>>>>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>>>>>> based on a deployment policy.
>>>>>>
>>>>>> I've attached the proposal below.
>>>>>>
>>>>>> Best regards,
>>>>>> Byung-Gon Chun
>>>>>>
>>>>>> --
>>>>>> Byung-Gon Chun
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Byung-Gon Chun
>>>>
>>>
>>>
>>>
>>> --
>>> Byung-Gon Chun
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>



Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
Thank you for all the information! It looks like Surf doesn't work.

If possible, we'd like to keep Onyx.
Another name we came up with is Coral.

Thanks!
-Gon


On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom <[hidden email]> wrote:

> Did we rule out Onyx for sure? Just because some other project might use
> it on, say, GitHub doesn’t necessarily exclude us from having an Apache Onyx.
>
> FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :)
>
> Cheers,
>
> — Leif
>
> > On Jan 27, 2018, at 07:31, Dave Fisher <[hidden email]> wrote:
> >
> > Checking “Serf Software” which sounds the same.
> >
> > (1) there is already Apache Serf
> > (2) Serf is a product from HashiCorp at https://www.serf.io/. This would
> > definitely cause confusion, as it is apparently comparable to ZooKeeper.
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> >> On Jan 27, 2018, at 3:12 AM, sebb <[hidden email]> wrote:
> >>
> >> A brief search for 'Surf Software' shows quite a few hits.
> >> I have not looked to see if they would be likely to be confused with
> >> this project or cause problems for others.
> >>
> >> But it looks as though there might be a problem:
> >> Surfer -  Golden Software
> >> surf @ sourceforge
> >> Surf Software company
> >>
> >>
> >>> On 27 January 2018 at 08:03, Byung-Gon Chun <[hidden email]> wrote:
> >>> Since we cannot use the name Onyx, we would like to change the project
> >>> name to Surf.
> >>> I hope that this name works.
> >>>
> >>> -Gon
> >>>
> >>> ---
> >>> Byung-Gon Chun
> >>>
> >>>
> >>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
> >>>>>
> >>>>> Great work -- I think this technology has a lot of promise, and I'd
> >>>>> love to see its evolution inside the Foundation.
> >>>>>
> >>>>>
> >>>> Thanks, Davor!
> >>>>
> >>>>
> >>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap
> >>>>> with the work-in-progress inside the Apache Beam project
> >>>>> ("portability"). We'd love to work together on this -- would you be
> >>>>> open to such collaboration? If so, it may not be necessary to start
> >>>>> from scratch, and we could leverage the work already done.
> >>>>>
> >>>>>
> >>>> Sure. We're open to collaboration.
> >>>>
> >>>>
> >>>>> Regarding the name, Onyx would likely have to be renamed, due to a
> >>>>> conflict with a related technology [2].
> >>>>>
> >>>>>
> >>>> Thanks for pointing it out. It's difficult to come up with a good
> >>>> short name. :)
> >>>> Do you have any suggestions?
> >>>>
> >>>> Thanks!
> >>>> -Gon
> >>>>
> >>>> ---
> >>>> Byung-Gon Chun
> >>>>
> >>>>
> >>>>
> >>>>> Davor
> >>>>>
> >>>>> [1] https://snuspl.github.io/onyx/docs/ir/
> >>>>> [2] http://www.onyxplatform.org/
> >>>>>
>


--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Jean-Baptiste Onofré
In reply to this post by Byung-Gon Chun
Hi,

Sorry to be a little late on this.

It's a very interesting proposal. It sounds pretty close to the portability
layer we want to add in Apache Beam. I would love to see interaction between the
two communities.

I have two minor questions:

1. About the name: Onyx sounds very generic, and the name is already used by other
technologies. A more distinctive name might be a better fit.
2. The Onyx code is on GitHub right now, under the Apache 2.0 license. Is this
code affiliated with any companies? If so, we would need an SGA for
the code donation.

If you need any help with the incubation, I would be more than happy to help!

Regards
JB

On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:

> Dear Apache Incubator Community,
>
> Please accept the following proposal for presentation and discussion:
> https://wiki.apache.org/incubator/OnyxProposal
>
> Onyx is a data processing system that aims to flexibly control the runtime
> behaviors of a job to adapt to varying deployment characteristics (e.g.,
> harnessing transient resources in datacenters, cross-datacenter deployment,
> changing runtime based on job characteristics, etc.). Onyx provides ways to
> extend the system’s capabilities and incorporate the extensions to the
> flexible job execution.
> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
> based on a deployment policy.
>
> I've attached the proposal below.
>
> Best regards,
> Byung-Gon Chun
>
> = OnyxProposal =
>
> == Abstract ==
> Onyx is a data processing system for flexible employment with
> different execution scenarios for various deployment characteristics
> on clusters.
>
> == Proposal ==
> Today, there is a wide variety of data processing systems with
> different designs for better performance and datacenter efficiency.
> They include processing data on specific resource environments and
> running jobs with specific attributes. Although each system
> successfully solves the problems it targets, most systems are designed
> in the way that runtime behaviors are built tightly inside the system
> core to hide the complexity of distributed computing. This makes it
> hard for a single system to support different deployment
> characteristics with different runtime behaviors without substantial
> effort.
>
> Onyx is a data processing system that aims to flexibly control the
> runtime behaviors of a job to adapt to varying deployment
> characteristics. Moreover, it provides a means of extending the
> system’s capabilities and incorporating the extensions to the flexible
> job execution.
>
> In order to be able to easily modify runtime behaviors to adapt to
> varying deployment characteristics, Onyx exposes runtime behaviors to
> be flexibly configured and modified at both compile-time and runtime
> through a set of high-level graph pass interfaces.
>
> We hope to contribute to the big data processing community by enabling
> more flexibility and extensibility in job executions. Furthermore, we
> can benefit more together as a community when we work together as a
> community to mature the system with more use cases and understanding
> of diverse deployment characteristics. The Apache Software Foundation
> is the perfect place to achieve these aspirations.
>
> == Background ==
> Many data processing systems have distinctive runtime behaviors
> optimized and configured for specific deployment characteristics like
> different resource environments and for handling special job
> attributes.
>
> For example, much research has been conducted to overcome the
> challenge of running data processing jobs on cheap, unreliable
> transient resources. Likewise, techniques for disaggregating different
> types of resources, like memory, CPU and GPU, are being actively
> developed to use datacenter resources more efficiently. Many
> researchers are also working to run data processing jobs in even more
> diverse environments, such as across distant datacenters. Similarly,
> for special job attributes, many works take different approaches, such
> as runtime optimization, to solve problems like data skew, and to
> optimize systems for data processing jobs with small-scale input data.
>
> Although each of the systems performs well with the jobs and in the
> environments they target, they perform poorly with unconsidered cases,
> and do not consider supporting multiple deployment characteristics on
> a single system in their designs.
>
> For an application writer to optimize an application to perform well
> on a certain system engraved with its underlying behaviors, it
> requires a deep understanding of the system itself, which is an
> overhead that often requires a lot of time and effort. Moreover, for a
> developer to modify such system behaviors, it requires modifications
> of the system core, which requires an even deeper understanding of the
> system itself.
>
> With this background, Onyx is designed to represent all of its jobs as
> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> applications from various programming models (e.g., Apache Beam) are
> submitted, transformed to an IR DAG, and optimized/customized for the
> deployment characteristics. In the IR DAG optimization phase, the DAG
> is modified through a series of compiler “passes” which reshape or
> annotate the DAG with an expression of the underlying runtime
> behaviors. The IR DAG is then submitted as an execution plan for the
> Onyx runtime. The runtime includes the unmodified parts of data
> processing in the backbone which is transparently integrated with
> configurable components exposed for further extension.
>
> == Rationale ==
> Onyx’s vision lies in providing means for flexibly supporting a wide
> variety of job execution scenarios for users while facilitating system
> developers to extend the execution framework with various
> functionalities at the same time. The capabilities of the system can
> be extended as it grows to meet a wider variety of execution scenarios.
> We require inputs from users and developers from diverse domains in
> order to make it a more thriving and useful project. The Apache
> Software Foundation provides the best tools and community to support
> this vision.
>
> == Initial Goals ==
> Initial goals will be to move the existing codebase to Apache and
> integrate with the Apache development process. We further plan to
> develop our system to meet the needs of more execution scenarios for
> a wider variety of deployment characteristics.
>
> == Current Status ==
> The Onyx codebase is currently hosted in a repository at github.com. The
> current version has been developed by system developers at Seoul
> National University, Viva Republica, Samsung, and LG.
>
> == Meritocracy ==
> We plan to strongly support meritocracy. We will discuss the
> requirements in an open forum, and those that continuously contribute
> to Onyx with the passion to strengthen the system will be invited as
> committers. Contributors that enrich Onyx by providing various use
> cases, various implementations of the configurable components
> including ideas for optimization techniques will be especially
> welcome. Committers with a deep understanding of the system’s
> technical aspects as a whole and its philosophy will definitely be
> voted into the PMC. We will monitor community participation so that
> privileges can be extended to those that contribute.
>
> == Community ==
> We hope to expand our contribution community by becoming an Apache
> incubator project. The contributions will come from both users and
> system developers interested in flexibility and extensibility of job
> executions that Onyx can support. We expect users to mainly contribute
> to diversify the use cases and deployment characteristics, and
> developers to contribute to implementing them.
>
> == Alignment ==
> Apache Spark is one of many popular data processing frameworks. The
> system is designed towards optimizing jobs using RDDs in memory and
> many other optimizations built tightly within the framework. In
> contrast to Spark, Onyx aims to provide more flexibility for job
> execution in an easy manner.
>
> Apache Tez enables developers to build complex task DAGs with control
> over the control plane of job execution. In Onyx, a high-level
> programming layer (e.g., Apache Beam) is automatically converted to a
> basic IR DAG and can be converted to any IR DAG through a series of
> easy, user-writable passes that can both reshape the DAG and modify its
> annotations (of execution properties). Moreover, Onyx leaves
> more parts of the job execution configurable, such as the scheduler
> and the data plane. As opposed to providing a set of properties for
> solid optimization, Onyx’s configurable parts can be easily extended
> and explored by implementing the pre-defined interfaces. For example,
> an arbitrary intermediate data store can be added.
>
> Onyx currently supports Apache Beam programs and we are working on
> supporting Apache Spark programs as well. Onyx also utilizes Apache
> REEF for container management, which allows Onyx to run in Apache YARN
> and Apache Mesos clusters. If necessary, we plan to contribute to and
> collaborate with these other Apache projects for the benefit of all.
> We plan to extend such integrations with more Apache software. The Apache
> Software Foundation already hosts many major big-data systems, and we
> expect to help further growth of the big-data community by having Onyx
> within the Apache Software Foundation.
>
> == Known Risks ==
> === Orphaned Products ===
> The risk of the Onyx project being orphaned is minimal. There is
> already plenty of work that arduously supports different deployment
> characteristics, and we propose a general way to implement them with
> flexible and extensible configuration knobs. The domain of data
> processing is already of high interest, and this domain is expected to
> evolve continuously with various other purposes, such as resource
> disaggregation and using transient resources for better datacenter
> resource utilization.
>
> === Inexperience with Open Source ===
> The initial committers include PMC members and committers of other
> Apache projects. They have experience with open source projects,
> starting from their incubation to the top-level. They have been
> involved in the open source development process, and are familiar with
> releasing code under an open source license.
>
> === Homogeneous Developers ===
> The initial set of committers is from a limited set of organizations,
> but we expect to attract new contributors from diverse organizations
> and will thus grow organically once approved for incubation. Our prior
> experience with other open source projects will help various
> contributors to actively participate in our project.
>
> === Reliance on Salaried Developers ===
> Many developers are from Seoul National University. This is not applicable.
>
> === Relationships with Other Apache Products ===
> Onyx positions itself among multiple Apache products. It runs on
> Apache REEF for container management. It also utilizes many useful
> development tools including Apache Maven, Apache Log4J, and multiple
> Apache Commons components. Onyx supports the Apache Beam programming
> model for user applications. We are currently working on supporting
> the Apache Spark programming APIs as well.
>
> === An Excessive Fascination with the Apache Brand ===
> We hope to make Onyx a powerful system for data processing, meeting
> various needs for different deployment characteristics, under a wider
> variety of environments. We see the limitations of simply putting code
> on GitHub, and we believe the Apache community will help the growth of
> Onyx so that the project becomes a positively impactful and innovative
> piece of open source software. We believe Onyx is a great fit for the Apache
> Software Foundation due to the collaboration it aims to achieve from
> the big data processing community.
>
> == Documentation ==
> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
>
> == Initial Source ==
> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
>
> == External Dependencies ==
> To the best of our knowledge, all Onyx dependencies are distributed
> under Apache-compatible licenses. Upon acceptance to the incubator, we
> would begin a thorough analysis of all transitive dependencies to
> verify this fact and further introduce license checking into the build
> and release process.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
> We will operate two mailing lists as follows:
>    * Onyx PMC discussions: [hidden email]
>    * Onyx developers: [hidden email]
>
> === Git Repositories ===
> Upon incubation: https://github.com/apache/incubator-onyx.
> After the incubation, we would like to move the existing repo
> https://github.com/snuspl/onyx to the Apache infrastructure.
>
> === Issue Tracking ===
> Onyx currently tracks its issues using the GitHub issue tracker:
> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> JIRA.
>
> == Initial Committers ==
>   * Byung-Gon Chun
>   * Jeongyoon Eo
>   * Geon-Woo Kim
>   * Joo Yeon Kim
>   * Gyewon Lee
>   * Jung-Gil Lee
>   * Sanha Lee
>   * Wooyeon Lee
>   * Yunseong Lee
>   * JangHo Seo
>   * Won Wook Song
>   * Taegeon Um
>   * Youngseok Yang
>
> == Affiliations ==
>   * SNU (Seoul National University)
>     * Byung-Gon Chun
>     * Jeongyoon Eo
>     * Geon-Woo Kim
>     * Gyewon Lee
>     * Sanha Lee
>     * Wooyeon Lee
>     * Yunseong Lee
>     * JangHo Seo
>     * Won Wook Song
>     * Taegeon Um
>     * Youngseok Yang
>
>   * LG
>     * Jung-Gil Lee
>
>   * Samsung
>     * Joo Yeon Kim
>
>   * Viva Republica
>     * Geon-Woo Kim
>
> == Sponsors ==
> === Champions ===
> Byung-Gon Chun
>
> === Mentors ===
>   * Hyunsik Choi
>   * Byung-Gon Chun
>   * Markus Weimer
>   * Reynold Xin
>
> === Sponsoring Entity ===
> The Apache Incubator
>
>
>

--
Jean-Baptiste Onofré
[hidden email]
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
Thanks for the comments, JB!
My replies are inlined below.

On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <[hidden email]>
wrote:

> Hi,
>
> sorry to be a little bit late on this.
>
> It's a very interesting proposal. It sounds pretty close to the portability
> layer we want to add in Apache Beam. I would love to see interaction
> between the two communities.
>
> I have two minor questions:
>
> 1. About the name: Onyx sounds very generic, and the name is used in other
> technologies. Maybe another, more distinctive name would be more appropriate.
>

We proposed Coral instead. How does this sound?


> 2. The Onyx code is on GitHub right now, under the Apache 2.0 license.
> Does this code have any affiliation with companies? If so, we would need
> an SGA for the code donation.
>

It does not. The developers are affiliated with Seoul National University.
In this case, do we still need an SGA?


> If you need any help with the incubation, I would be more than happy to
> help!
>
>
Thanks for the offer. Would you be interested in being a mentor of the
project?

Thanks.
-Gon



> Regards
> JB
>
> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>
> --
> Jean-Baptiste Onofré
> [hidden email]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>


--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
If Coral is fine as our project name, I will start the vote in a couple of
days.
Let me know if you have any concerns.

Thanks.
-Gon

On Tue, Jan 30, 2018 at 6:17 PM, Byung-Gon Chun <[hidden email]> wrote:

> Thanks for the comments, JB!
> My replies are inlined below.
>
> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <[hidden email]>
> wrote:
>
>> Hi,
>>
>> sorry to be a little bit late on this.
>>
>> It's a very interesting proposal. It sounds pretty close to the
>> portability layer we want to add in Apache Beam. I would love to see
>> interaction between the two communities.
>>
>> I have two minor questions:
>>
>> 1. About the name: Onyx sounds very generic, and the name is used in other
>> technologies. Maybe another, more distinctive name would be more appropriate.
>>
>
> We proposed Coral instead. How does this sound?
>
>
>> 2. The Onyx code is on GitHub right now, under the Apache 2.0 license.
>> Does this code have any affiliation with companies? If so, we would need
>> an SGA for the code donation.
>>
>
> It does not. The developers are affiliated with Seoul National University.
> In this case, do we still need an SGA?
>
>
>> If you need any help with the incubation, I would be more than happy to
>> help!
>>
>>
> Thanks for the offer. Would you be interested in being a mentor of the
> project?
>
> Thanks.
> -Gon
>
>
>
>> Regards
>> JB
>>
>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>>
>> --
>> Jean-Baptiste Onofré
>> [hidden email]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
>
> --
> Byung-Gon Chun
>



--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Jean-Baptiste Onofré
In reply to this post by Byung-Gon Chun
Hi,

Coral is a good name!

Does the code belong to Seoul National University? In that case, in addition to
your ICLA, we would need an SGA (it's not a blocker for the project bootstrapping
or code donation, but we will at least need it later for graduation). On the
other hand, if the committers are all part of the university, you can also sign
a CCLA.

Happy to be a mentor on the project if you want me! ;)

Thanks,
Regards
JB

On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:

> Thanks for the comments, JB!
> My replies are inlined below.
>
> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <[hidden email]>
> wrote:
>
>> Hi,
>>
>> sorry to be a little bit late on this.
>>
>> It's a very interesting proposal. It sounds pretty close to the portability
>> layer we want to add in Apache Beam. I would love to see interaction
>> between the two communities.
>>
>> I have two minor questions:
>>
>> 1. about the name: Onyx sounds very generic and the name is used in other
>> technologies. Maybe another unique name would be more accurate.
>>
>
> We proposed Coral instead. How does this sound?
>
>
>> 2. the Onyx code is on github right now, under the Apache 2.0 license.
>> Does this code has any affiliation with companies ? Meaning that we would
>> need a SGA for the code donation.
>>
> It does not. The developers are affiliated with Seoul National University.
> In this case, do we still need a SGA?
>
>
>> If you need any help for the incubation, I would be more than happy to
>> help !
>>
>>
> Thanks for the offer. Would you be interested in being a mentor of the
> project?
>
> Thanks.
> -Gon
>
>
>
>> Regards
>> JB
>>
>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>>> Dear Apache Incubator Community,
>>>
>>> Please accept the following proposal for presentation and discussion:
>>> https://wiki.apache.org/incubator/OnyxProposal
>>>
>>> Onyx is a data processing system that aims to flexibly control the runtime
>>> behaviors of a job to adapt to varying deployment characteristics (e.g.,
>>> harnessing transient resources in datacenters, cross-datacenter deployment,
>>> changing runtime based on job characteristics, etc.). Onyx provides ways to
>>> extend the system’s capabilities and incorporate the extensions to the
>>> flexible job execution.
>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>>> based on a deployment policy.
>>>
>>> I've attached the proposal below.
>>>
>>> Best regards,
>>> Byung-Gon Chun
>>>
>>> = OnyxProposal =
>>>
>>> == Abstract ==
>>> Onyx is a data processing system for flexible employment with
>>> different execution scenarios for various deployment characteristics
>>> on clusters.
>>>
>>> == Proposal ==
>>> Today, there is a wide variety of data processing systems with
>>> different designs for better performance and datacenter efficiency.
>>> They include processing data on specific resource environments and
>>> running jobs with specific attributes. Although each system
>>> successfully solves the problems it targets, most systems are designed
>>> in the way that runtime behaviors are built tightly inside the system
>>> core to hide the complexity of distributed computing. This makes it
>>> hard for a single system to support different deployment
>>> characteristics with different runtime behaviors without substantial
>>> effort.
>>>
>>> Onyx is a data processing system that aims to flexibly control the
>>> runtime behaviors of a job to adapt to varying deployment
>>> characteristics. Moreover, it provides a means of extending the
>>> system’s capabilities and incorporating the extensions to the flexible
>>> job execution.
>>>
>>> In order to be able to easily modify runtime behaviors to adapt to
>>> varying deployment characteristics, Onyx exposes runtime behaviors to
>>> be flexibly configured and modified at both compile-time and runtime
>>> through a set of high-level graph pass interfaces.
>>>
>>> We hope to contribute to the big data processing community by enabling
>>> more flexibility and extensibility in job executions. Furthermore, we
>>> can benefit more together as a community when we work together as a
>>> community to mature the system with more use cases and understanding
>>> of diverse deployment characteristics. The Apache Software Foundation
>>> is the perfect place to achieve these aspirations.
>>>
>>> == Background ==
>>> Many data processing systems have distinctive runtime behaviors
>>> optimized and configured for specific deployment characteristics like
>>> different resource environments and for handling special job
>>> attributes.
>>>
>>> For example, much research have been conducted to overcome the
>>> challenge of running data processing jobs on cheap, unreliable
>>> transient resources. Likewise, techniques for disaggregating different
>>> types of resources, like memory, CPU and GPU, are being actively
>>> developed to use datacenter resources more efficiently. Many
>>> researchers are also working to run data processing jobs in even more
>>> diverse environments, such as across distant datacenters. Similarly,
>>> for special job attributes, many works take different approaches, such
>>> as runtime optimization, to solve problems like data skew, and to
>>> optimize systems for data processing jobs with small-scale input data.
>>>
>>> Although each of the systems performs well with the jobs and in the
>>> environments they target, they perform poorly with unconsidered cases,
>>> and do not consider supporting multiple deployment characteristics on
>>> a single system in their designs.
>>>
>>> For an application writer to optimize an application to perform well
>>> on a certain system engraved with its underlying behaviors, it
>>> requires a deep understanding of the system itself, which is an
>>> overhead that often requires a lot of time and effort. Moreover, for a
>>> developer to modify such system behaviors, it requires modifications
>>> of the system core, which requires an even deeper understanding of the
>>> system itself.
>>>
>>> With this background, Onyx is designed to represent all of its jobs as
>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
>>> applications from various programming models (ex. Apache Beam) are
>>> submitted, transformed to an IR DAG, and optimized/customized for the
>>> deployment characteristics. In the IR DAG optimization phase, the DAG
>>> is modified through a series of compiler “passes” which reshape or
>>> annotate the DAG with an expression of the underlying runtime
>>> behaviors. The IR DAG is then submitted as an execution plan for the
>>> Onyx runtime. The runtime includes the unmodified parts of data
>>> processing in the backbone which is transparently integrated with
>>> configurable components exposed for further extension.
>>>
>>> == Rationale ==
>>> Onyx’s vision lies in providing means for flexibly supporting a wide
>>> variety of job execution scenarios for users while facilitating system
>>> developers to extend the execution framework with various
>>> functionalities at the same time. The capabilities of the system can
>>> be extended as it grows to meet a more variety of execution scenarios.
>>> We require inputs from users and developers from diverse domains in
>>> order to make it a more thriving and useful project. The Apache
>>> Software Foundation provides the best tools and community to support
>>> this vision.
>>>
>>> == Initial Goals ==
>>> Initial goals will be to move the existing codebase to Apache and
>>> integrate with the Apache development process. We further plan to
>>> develop our system to meet the needs for more execution scenarios for
>>> a more variety of deployment characteristics.
>>>
>>> == Current Status ==
>>> Onyx codebase is currently hosted in a repository at github.com. The
>>> current version has been developed by system developers at Seoul
>>> National University, Viva Republica, Samsung, and LG.
>>>
>>> == Meritocracy ==
>>> We plan to strongly support meritocracy. We will discuss the
>>> requirements in an open forum, and those that continuously contribute
>>> to Onyx with the passion to strengthen the system will be invited as
>>> committers. Contributors that enrich Onyx by providing various use
>>> cases, various implementations of the configurable components
>>> including ideas for optimization techniques will be especially
>>> welcome. Committers with a deep understanding of the system’s
>>> technical aspects as a whole and its philosophy will definitely be
>>> voted as the PMC. We will monitor community participation so that
>>> privileges can be extended to those that contribute.
>>>
>>> == Community ==
>>> We hope to expand our contribution community by becoming an Apache
>>> incubator project. The contributions will come from both users and
>>> system developers interested in flexibility and extensibility of job
>>> executions that Onyx can support. We expect users to mainly contribute
>>> to diversify the use cases and deployment characteristics, and
>>> developers to  contribute to implement them.
>>>
>>> == Alignment ==
>>> Apache Spark is one of many popular data processing frameworks. The
>>> system is designed towards optimizing jobs using RDDs in memory and
>>> many other optimizations built tightly within the framework. In
>>> contrast to Spark, Onyx aims to provide more flexibility for job
>>> execution in an easy manner.
>>>
>>> Apache Tez enables developers to build complex task DAGs with control
>>> over the control plane of job execution. In Onyx, a high-level
>>> programming layer (ex. Apache Beam) is automatically converted to a
>>> basic IR DAG and can be converted to any IR DAG through a series of
>>> easy user writable passes, that can both reshape and modify the
>>> annotation (of execution properties) of the DAG. Moreover, Onyx leaves
>>> more parts of the job execution configurable, such as the scheduler
>>> and the data plane. As opposed to providing a set of properties for
>>> solid optimization, Onyx’s configurable parts can be easily extended
>>> and explored by implementing the pre-defined interfaces. For example,
>>> an arbitrary intermediate data store can be added.
>>>
>>> Onyx currently supports Apache Beam programs and we are working on
>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
>>> REEF for container management, which allows Onyx to run in Apache YARN
>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
>>> collaborate with these other Apache projects for the benefit of all.
>>> We plan to extend such integrations with more Apache softwares. Apache
>>> software foundation already hosts many major big-data systems, and we
>>> expect to help further growth of the big-data community by having Onyx
>>> within the Apache foundation.
>>>
>>> == Known Risks ==
>>> === Orphaned Products ===
>>> The risk of the Onyx project being orphaned is minimal. There is
>>> already plenty of work that arduously support different deployment
>>> characteristics, and we propose a general way to implement them with
>>> flexible and extensible configuration knobs. The domain of data
>>> processing is already of high interest, and this domain is expected to
>>> evolve continuously with various other purposes, such as resource
>>> disaggregation and using transient resources for better datacenter
>>> resource utilization.
>>>
>>> === Inexperience with Open Source ===
>>> The initial committers include PMC members and committers of other
>>> Apache projects. They have experience with open source projects,
>>> starting from their incubation to the top-level. They have been
>>> involved in the open source development process, and are familiar with
>>> releasing code under an open source license.
>>>
>>> === Homogeneous Developers ===
>>> The initial set of committers is from a limited set of organizations,
>>> but we expect to attract new contributors from diverse organizations
>>> and will thus grow organically once approved for incubation. Our prior
>>> experience with other open source projects will help various
>>> contributors to actively participate in our project.
>>>
>>> === Reliance on Salaried Developers ===
>>> Many developers are from Seoul National University. This is not
>>> applicable.
>>>
>>> === Relationships with Other Apache Products ===
>>> Onyx positions itself among multiple Apache products. It runs on
>>> Apache REEF for container management. It also utilizes many useful
>>> development tools including Apache Maven, Apache Log4J, and multiple
>>> Apache Commons components. Onyx supports the Apache Beam programming
>>> model for user applications. We are currently working on supporting
>>> the Apache Spark programming APIs as well.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>> We hope to make Onyx a powerful system for data processing, meeting
>>> various needs for different deployment characteristics, under a more
>>> variety of environments. We see the limitations of simply putting code
>>> on GitHub, and we believe the Apache community will help the growth of
>>> Onyx for the project to become a positively impactful and innovative
>>> open source software. We believe Onyx is a great fit for the Apache
>>> Software Foundation due to the collaboration it aims to achieve from
>>> the big data processing community.
>>>
>>> == Documentation ==
>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
>>>
>>> == Initial Source ==
>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
>>>
>>> == External Dependencies ==
>>> To the best of our knowledge, all Onyx dependencies are distributed
>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>> would begin a thorough analysis of all transitive dependencies to
>>> verify this fact and further introduce license checking into the build
>>> and release process.
>>>
>>> == Cryptography ==
>>> Not applicable.
>>>
>>> == Required Resources ==
>>> === Mailing Lists ===
>>> We will operate two mailing lists as follows:
>>>    * Onyx PMC discussions: [hidden email]
>>>    * Onyx developers: [hidden email]
>>>
>>> === Git Repositories ===
>>> Upon incubation: https://github.com/apache/incubator-onyx.
>>> After the incubation, we would like to move the existing repo
>>> https://github.com/snuspl/onyx to the Apache infrastructure
>>>
>>> === Issue Tracking ===
>>> Onyx currently tracks its issues using the Github issue tracker:
>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
>>> JIRA.
>>>
>>> == Initial Committers ==
>>>   * Byung-Gon Chun
>>>   * Jeongyoon Eo
>>>   * Geon-Woo Kim
>>>   * Joo Yeon Kim
>>>   * Gyewon Lee
>>>   * Jung-Gil Lee
>>>   * Sanha Lee
>>>   * Wooyeon Lee
>>>   * Yunseong Lee
>>>   * JangHo Seo
>>>   * Won Wook Song
>>>   * Taegeon Um
>>>   * Youngseok Yang
>>>
>>> == Affiliations ==
>>>   * SNU (Seoul National University)
>>>     * Byung-Gon Chun
>>>     * Jeongyoon Eo
>>>     * Geon-Woo Kim
>>>     * Gyewon Lee
>>>     * Sanha Lee
>>>     * Wooyeon Lee
>>>     * Yunseong Lee
>>>     * JangHo Seo
>>>     * Won Wook Song
>>>     * Taegeon Um
>>>     * Youngseok Yang
>>>
>>>   * LG
>>>     * Jung-Gil Lee
>>>
>>>   * Samsung
>>>     * Joo Yeon Kim
>>>
>>>   * Viva Republica
>>>     * Geon-Woo Kim
>>>
>>> == Sponsors ==
>>> === Champions ===
>>> Byung-Gon Chun
>>>
>>> === Mentors ===
>>>   * Hyunsik Choi
>>>   * Byung-Gon Chun
>>>   * Markus Weimer
>>>   * Reynold Xin
>>>
>>> === Sponsoring Entity ===
>>> The Apache Incubator
>>>
>>>
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> [hidden email]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
>

--
Jean-Baptiste Onofré
[hidden email]
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <[hidden email]>
wrote:

> Hi,
>
> Coral is a good name!
>

Thanks!


>
> Does the code belong to Seoul National University? In that case, in
> addition to your ICLA, we would need an SGA (it's not a blocker for the
> project bootstrapping or code donation, but we will at least need it later
> for graduation). On the other hand, if the committers are all part of the
> university, you can also sign a CCLA.
>

I will figure this out.


>
> Happy to be a mentor on the project if you want me! ;)
>
>
Thanks! I will add you to the mentor list.

-Gon


> Thanks,
> Regards
> JB
>
> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> > Thanks for the comments, JB!
> > My replies are inlined below.
> >
> > On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <[hidden email]>
> > wrote:
> >
> >> Hi,
> >>
> >> sorry to be a little bit late on this.
> >>
> >> It's a very interesting proposal. It sounds pretty close to the
> >> portability layer we want to add in Apache Beam. I would love to see
> >> interaction between the two communities.
> >>
> >> I have two minor questions:
> >>
> >> 1. about the name: Onyx sounds very generic and the name is used in
> >> other technologies. Maybe another unique name would be more accurate.
> >>
> >
> > We proposed Coral instead. How does this sound?
> >
> >
> >> 2. the Onyx code is on github right now, under the Apache 2.0 license.
> >> Does this code has any affiliation with companies ? Meaning that we would
> >> need a SGA for the code donation.
> >>
> > It does not. The developers are affiliated with Seoul National University.
> > In this case, do we still need a SGA?
> >
> >
> >> If you need any help for the incubation, I would be more than happy to
> >> help !
> >>
> >>
> > Thanks for the offer. Would you be interested in being a mentor of the
> > project?
> >
> > Thanks.
> > -Gon
> >
> >
> >
> >> Regards
> >> JB
> >>
> >> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> >>> Dear Apache Incubator Community,
> >>>
> >>> Please accept the following proposal for presentation and discussion:
> >>> https://wiki.apache.org/incubator/OnyxProposal
> >>>
> >>> Onyx is a data processing system that aims to flexibly control the
> >>> runtime behaviors of a job to adapt to varying deployment
> >>> characteristics (e.g., harnessing transient resources in datacenters,
> >>> cross-datacenter deployment, changing runtime based on job
> >>> characteristics, etc.). Onyx provides ways to extend the system’s
> >>> capabilities and incorporate the extensions to the flexible job
> >>> execution.
> >>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
> >>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
> >>> deploys based on a deployment policy.
> >>>
> >>> I've attached the proposal below.
> >>>
> >>> Best regards,
> >>> Byung-Gon Chun
> >>>
> >>> = OnyxProposal =
> >>>
> >>> == Abstract ==
> >>> Onyx is a data processing system for flexible employment with
> >>> different execution scenarios for various deployment characteristics
> >>> on clusters.
> >>>
> >>> == Proposal ==
> >>> Today, there is a wide variety of data processing systems with
> >>> different designs for better performance and datacenter efficiency.
> >>> They include processing data on specific resource environments and
> >>> running jobs with specific attributes. Although each system
> >>> successfully solves the problems it targets, most systems are designed
> >>> in the way that runtime behaviors are built tightly inside the system
> >>> core to hide the complexity of distributed computing. This makes it
> >>> hard for a single system to support different deployment
> >>> characteristics with different runtime behaviors without substantial
> >>> effort.
> >>>
> >>> Onyx is a data processing system that aims to flexibly control the
> >>> runtime behaviors of a job to adapt to varying deployment
> >>> characteristics. Moreover, it provides a means of extending the
> >>> system’s capabilities and incorporating the extensions to the flexible
> >>> job execution.
> >>>
> >>> In order to be able to easily modify runtime behaviors to adapt to
> >>> varying deployment characteristics, Onyx exposes runtime behaviors to
> >>> be flexibly configured and modified at both compile-time and runtime
> >>> through a set of high-level graph pass interfaces.
> >>>
> >>> We hope to contribute to the big data processing community by enabling
> >>> more flexibility and extensibility in job executions. Furthermore, we
> >>> can benefit more together as a community when we work together as a
> >>> community to mature the system with more use cases and understanding
> >>> of diverse deployment characteristics. The Apache Software Foundation
> >>> is the perfect place to achieve these aspirations.
> >>>
> >>> == Background ==
> >>> Many data processing systems have distinctive runtime behaviors
> >>> optimized and configured for specific deployment characteristics like
> >>> different resource environments and for handling special job
> >>> attributes.
> >>>
> >>> For example, much research have been conducted to overcome the
> >>> challenge of running data processing jobs on cheap, unreliable
> >>> transient resources. Likewise, techniques for disaggregating different
> >>> types of resources, like memory, CPU and GPU, are being actively
> >>> developed to use datacenter resources more efficiently. Many
> >>> researchers are also working to run data processing jobs in even more
> >>> diverse environments, such as across distant datacenters. Similarly,
> >>> for special job attributes, many works take different approaches, such
> >>> as runtime optimization, to solve problems like data skew, and to
> >>> optimize systems for data processing jobs with small-scale input data.
> >>>
> >>> Although each of the systems performs well with the jobs and in the
> >>> environments they target, they perform poorly with unconsidered cases,
> >>> and do not consider supporting multiple deployment characteristics on
> >>> a single system in their designs.
> >>>
> >>> For an application writer to optimize an application to perform well
> >>> on a certain system engraved with its underlying behaviors, it
> >>> requires a deep understanding of the system itself, which is an
> >>> overhead that often requires a lot of time and effort. Moreover, for a
> >>> developer to modify such system behaviors, it requires modifications
> >>> of the system core, which requires an even deeper understanding of the
> >>> system itself.
> >>>
> >>> With this background, Onyx is designed to represent all of its jobs as
> >>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>> applications from various programming models (ex. Apache Beam) are
> >>> submitted, transformed to an IR DAG, and optimized/customized for the
> >>> deployment characteristics. In the IR DAG optimization phase, the DAG
> >>> is modified through a series of compiler “passes” which reshape or
> >>> annotate the DAG with an expression of the underlying runtime
> >>> behaviors. The IR DAG is then submitted as an execution plan for the
> >>> Onyx runtime. The runtime includes the unmodified parts of data
> >>> processing in the backbone which is transparently integrated with
> >>> configurable components exposed for further extension.
> >>>
> >>> == Rationale ==
> >>> Onyx’s vision lies in providing means for flexibly supporting a wide
> >>> variety of job execution scenarios for users while facilitating system
> >>> developers to extend the execution framework with various
> >>> functionalities at the same time. The capabilities of the system can
> >>> be extended as it grows to meet a more variety of execution scenarios.
> >>> We require inputs from users and developers from diverse domains in
> >>> order to make it a more thriving and useful project. The Apache
> >>> Software Foundation provides the best tools and community to support
> >>> this vision.
> >>>
> >>> == Initial Goals ==
> >>> Initial goals will be to move the existing codebase to Apache and
> >>> integrate with the Apache development process. We further plan to
> >>> develop our system to meet the needs for more execution scenarios for
> >>> a more variety of deployment characteristics.
> >>>
> >>> == Current Status ==
> >>> Onyx codebase is currently hosted in a repository at github.com. The
> >>> current version has been developed by system developers at Seoul
> >>> National University, Viva Republica, Samsung, and LG.
> >>>
> >>> == Meritocracy ==
> >>> We plan to strongly support meritocracy. We will discuss the
> >>> requirements in an open forum, and those that continuously contribute
> >>> to Onyx with the passion to strengthen the system will be invited as
> >>> committers. Contributors that enrich Onyx by providing various use
> >>> cases, various implementations of the configurable components
> >>> including ideas for optimization techniques will be especially
> >>> welcome. Committers with a deep understanding of the system’s
> >>> technical aspects as a whole and its philosophy will definitely be
> >>> voted as the PMC. We will monitor community participation so that
> >>> privileges can be extended to those that contribute.
> >>>
> >>> == Community ==
> >>> We hope to expand our contribution community by becoming an Apache
> >>> incubator project. The contributions will come from both users and
> >>> system developers interested in flexibility and extensibility of job
> >>> executions that Onyx can support. We expect users to mainly contribute
> >>> to diversify the use cases and deployment characteristics, and
> >>> developers to  contribute to implement them.
> >>>
> >>> == Alignment ==
> >>> Apache Spark is one of many popular data processing frameworks. The
> >>> system is designed towards optimizing jobs using RDDs in memory and
> >>> many other optimizations built tightly within the framework. In
> >>> contrast to Spark, Onyx aims to provide more flexibility for job
> >>> execution in an easy manner.
> >>>
> >>> Apache Tez enables developers to build complex task DAGs with control
> >>> over the control plane of job execution. In Onyx, a high-level
> >>> programming layer (ex. Apache Beam) is automatically converted to a
> >>> basic IR DAG and can be converted to any IR DAG through a series of
> >>> easy user writable passes, that can both reshape and modify the
> >>> annotation (of execution properties) of the DAG. Moreover, Onyx leaves
> >>> more parts of the job execution configurable, such as the scheduler
> >>> and the data plane. As opposed to providing a set of properties for
> >>> solid optimization, Onyx’s configurable parts can be easily extended
> >>> and explored by implementing the pre-defined interfaces. For example,
> >>> an arbitrary intermediate data store can be added.
> >>>
> >>> Onyx currently supports Apache Beam programs and we are working on
> >>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>> REEF for container management, which allows Onyx to run in Apache YARN
> >>> and Apache Mesos clusters. If necessary, we plan to contribute to and
> >>> collaborate with these other Apache projects for the benefit of all.
> >>> We plan to extend such integrations with more Apache softwares. Apache
> >>> software foundation already hosts many major big-data systems, and we
> >>> expect to help further growth of the big-data community by having Onyx
> >>> within the Apache foundation.
> >>>
> >>> == Known Risks ==
> >>> === Orphaned Products ===
> >>> The risk of the Onyx project being orphaned is minimal. There is
> >>> already plenty of work that arduously support different deployment
> >>> characteristics, and we propose a general way to implement them with
> >>> flexible and extensible configuration knobs. The domain of data
> >>> processing is already of high interest, and this domain is expected to
> >>> evolve continuously with various other purposes, such as resource
> >>> disaggregation and using transient resources for better datacenter
> >>> resource utilization.
> >>>
> >>> === Inexperience with Open Source ===
> >>> The initial committers include PMC members and committers of other
> >>> Apache projects. They have experience with open source projects,
> >>> starting from their incubation to the top-level. They have been
> >>> involved in the open source development process, and are familiar with
> >>> releasing code under an open source license.
> >>>
> >>> === Homogeneous Developers ===
> >>> The initial set of committers is from a limited set of organizations,
> >>> but we expect to attract new contributors from diverse organizations
> >>> and will thus grow organically once approved for incubation. Our prior
> >>> experience with other open source projects will help various
> >>> contributors to actively participate in our project.
> >>>
> >>> === Reliance on Salaried Developers ===
> >>> Many developers are from Seoul National University. This is not
> >>> applicable.
> >>>
> >>> === Relationships with Other Apache Products ===
> >>> Onyx positions itself among multiple Apache products. It runs on
> >>> Apache REEF for container management. It also utilizes many useful
> >>> development tools including Apache Maven, Apache Log4J, and multiple
> >>> Apache Commons components. Onyx supports the Apache Beam programming
> >>> model for user applications. We are currently working on supporting
> >>> the Apache Spark programming APIs as well.
> >>>
> >>> === An Excessive Fascination with the Apache Brand ===
> >>> We hope to make Onyx a powerful system for data processing, meeting
> >>> various needs for different deployment characteristics in a wider
> >>> variety of environments. We see the limitations of simply putting code
> >>> on GitHub, and we believe the Apache community will help Onyx grow
> >>> into an impactful and innovative open source project. We believe Onyx
> >>> is a great fit for the Apache Software Foundation because of the
> >>> collaboration it aims to achieve with the big data processing
> >>> community.
> >>>
> >>> == Documentation ==
> >>> The current documentation for Onyx is at
> >>> https://snuspl.github.io/onyx/.
> >>>
> >>> == Initial Source ==
> >>> The Onyx codebase is currently hosted at
> >>> https://github.com/snuspl/onyx.
> >>>
> >>> == External Dependencies ==
> >>> To the best of our knowledge, all Onyx dependencies are distributed
> >>> under Apache compatible licenses. Upon acceptance to the incubator, we
> >>> would begin a thorough analysis of all transitive dependencies to
> >>> verify this fact and further introduce license checking into the build
> >>> and release process.
> >>>
> >>> == Cryptography ==
> >>> Not applicable.
> >>>
> >>> == Required Resources ==
> >>> === Mailing Lists ===
> >>> We will operate two mailing lists as follows:
> >>>    * Onyx PMC discussions: [hidden email]
> >>>    * Onyx developers: [hidden email]
> >>>
> >>> === Git Repositories ===
> >>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>> After the incubation, we would like to move the existing repo
> >>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>
> >>> === Issue Tracking ===
> >>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>> JIRA.
> >>>
> >>> == Initial Committers ==
> >>>   * Byung-Gon Chun
> >>>   * Jeongyoon Eo
> >>>   * Geon-Woo Kim
> >>>   * Joo Yeon Kim
> >>>   * Gyewon Lee
> >>>   * Jung-Gil Lee
> >>>   * Sanha Lee
> >>>   * Wooyeon Lee
> >>>   * Yunseong Lee
> >>>   * JangHo Seo
> >>>   * Won Wook Song
> >>>   * Taegeon Um
> >>>   * Youngseok Yang
> >>>
> >>> == Affiliations ==
> >>>   * SNU (Seoul National University)
> >>>     * Byung-Gon Chun
> >>>     * Jeongyoon Eo
> >>>     * Geon-Woo Kim
> >>>     * Gyewon Lee
> >>>     * Sanha Lee
> >>>     * Wooyeon Lee
> >>>     * Yunseong Lee
> >>>     * JangHo Seo
> >>>     * Won Wook Song
> >>>     * Taegeon Um
> >>>     * Youngseok Yang
> >>>
> >>>   * LG
> >>>     * Jung-Gil Lee
> >>>
> >>>   * Samsung
> >>>     * Joo Yeon Kim
> >>>
> >>>   * Viva Republica
> >>>     * Geon-Woo Kim
> >>>
> >>> == Sponsors ==
> >>> === Champions ===
> >>> Byung-Gon Chun
> >>>
> >>> === Mentors ===
> >>>   * Hyunsik Choi
> >>>   * Byung-Gon Chun
> >>>   * Markus Weimer
> >>>   * Reynold Xin
> >>>
> >>> === Sponsoring Entity ===
> >>> The Apache Incubator
> >>>
> >>>
> >>>
> >>
> >> --
> >> Jean-Baptiste Onofré
> >> [hidden email]
> >> http://blog.nanthrax.net
> >> Talend - http://www.talend.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >>
> >
> >
>
> --
> Jean-Baptiste Onofré
> [hidden email]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>


--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

John D. Ament-2
In reply to this post by Byung-Gon Chun
Sorry for mid-posting.

This isn't the list to determine if a project name is suitable.  There's a
JIRA project dedicated to that, and if you need a quick answer, it is better to
email trademarks@.

The real question is whether "Apache Onyx" would be easily confused with
something else.

John

On Sun, Jan 28, 2018 at 4:50 AM Byung-Gon Chun <[hidden email]> wrote:

> Thank you for all the information! It looks like Surf doesn't work.
>
> If possible, we'd like to keep Onyx.
> Another name we came up with is Coral.
>
> Thanks!
> -Gon
>
>
> On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom <[hidden email]> wrote:
>
> > Did we rule out Onyx for sure? Just because some other project might use
> > it on say github doesn’t necessarily exclude us from having an Apache
> > Onyx?
> >
> > FWIW, I agree that surf is too similar in pronunciation to Apache serf. :)
> >
> > Cheers,
> >
> > — Leif
> >
> > > On Jan 27, 2018, at 07:31, Dave Fisher <[hidden email]> wrote:
> > >
> > > Checking “Serf Software” which sounds the same.
> > >
> > > (1) there is already Apache Serf
> > > (2) Serf is a product from Hashicorp at https://www.serf.io/. This
> > > would definitely confuse as it is apparently comparable to ZooKeeper.
> > >
> > > Regards,
> > > Dave
> > >
> > > Sent from my iPhone
> > >
> > >> On Jan 27, 2018, at 3:12 AM, sebb <[hidden email]> wrote:
> > >>
> > >> A brief search for 'Surf Software' shows quite a few hits.
> > >> I have not looked to see if they would be likely to be confused with
> > >> this project or cause problems for others.
> > >>
> > >> But it looks as though there might be a problem:
> > >> Surfer -  Golden Software
> > >> surf @ sourceforge
> > >> Surf Software company
> > >>
> > >>
> > >>> On 27 January 2018 at 08:03, Byung-Gon Chun <[hidden email]> wrote:
> > >>> Since we cannot use the name Onyx, we would like to change the project
> > >>> name to Surf.
> > >>> I hope that this name works.
> > >>>
> > >>> -Gon
> > >>>
> > >>> ---
> > >>> Byung-Gon Chun
> > >>>
> > >>>
> > >>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
> > >>>>>
> > >>>>> Great work -- I think this technology has a lot of promise, and I'd
> > >>>>> love to see its evolution inside the Foundation.
> > >>>>>
> > >>>>>
> > >>>> Thanks, Davor!
> > >>>>
> > >>>>
> > >>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap
> > >>>>> with the work-in-progress inside the Apache Beam project
> > >>>>> ("portability"). We'd love to work together on this -- would you be
> > >>>>> open to such collaboration? If so, it may not be necessary to start
> > >>>>> from scratch; we could leverage the work already done.
> > >>>>>
> > >>>>>
> > >>>> Sure. We're open to collaboration.
> > >>>>
> > >>>>
> > >>>>> Regarding the name, Onyx would likely have to be renamed, due to a
> > >>>>> conflict with a related technology [2].
> > >>>>>
> > >>>>>
> > >>>> Thanks for pointing it out. It's difficult to come up with a good
> > >>>> short name. :)
> > >>>> Do you have any suggestion?
> > >>>>
> > >>>> Thanks!
> > >>>> -Gon
> > >>>>
> > >>>> ---
> > >>>> Byung-Gon Chun
> > >>>>
> > >>>>
> > >>>>
> > >>>>> Davor
> > >>>>>
> > >>>>> [1] https://snuspl.github.io/onyx/docs/ir/
> > >>>>> [2] http://www.onyxplatform.org/
> > >>>>>
> > >>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <[hidden email]> wrote:
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Byung-Gon Chun
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Byung-Gon Chun
> > >>
> > >>
> > >
> > >
> > >
> >
> >
> >
> >
>
>
> --
> Byung-Gon Chun
>

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Jean-Baptiste Onofré
In reply to this post by Byung-Gon Chun
Thanks, much appreciated !

Regards
JB

On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:

> On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <[hidden email]>
> wrote:
>
>> Hi,
>>
>> Coral is a good name !
>>
>
> Thanks!
>
>
>>
>> Does the code belong to Seoul National University? In that case, in
>> addition to your ICLA, we would need an SGA (it's not a blocker for project
>> bootstrapping or code donation, but we will at least need it later, for
>> graduation). On the other hand, if the committers are all part of the
>> university, you can also sign a CCLA.
>>
>
> I will figure this out.
>
>
>>
>> Happy to be a mentor on the project if you want me! ;)
>>
>>
> Thanks! I will add you to the mentor list.
>
> -Gon
>
>
>> Thanks,
>> Regards
>> JB
>>
>> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
>>> Thanks for the comments, JB!
>>> My replies are inlined below.
>>>
>>> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <[hidden email]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> sorry to be a little bit late on this.
>>>>
>>>> It's a very interesting proposal. It sounds pretty close to the
>>>> portability layer we want to add in Apache Beam. I would love to see
>>>> interaction between the two communities.
>>>>
>>>> I have two minor questions:
>>>>
>>>> 1. about the name: Onyx sounds very generic and the name is used in
>>>> other technologies. Maybe another, more unique name would be more
>>>> appropriate.
>>>>
>>>
>>> We proposed Coral instead. How does this sound?
>>>
>>>
>>>> 2. the Onyx code is on github right now, under the Apache 2.0 license.
>>>> Does this code have any affiliation with companies? If so, we would
>>>> need an SGA for the code donation.
>>>>
>>> It does not. The developers are affiliated with Seoul National University.
>>> In this case, do we still need an SGA?
>>>
>>>
>>>> If you need any help for the incubation, I would be more than happy to
>>>> help !
>>>>
>>>>
>>> Thanks for the offer. Would you be interested in being a mentor of the
>>> project?
>>>
>>> Thanks.
>>> -Gon
>>>
>>>
>>>
>>>> Regards
>>>> JB
>>>>
>>>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
>>>>> Dear Apache Incubator Community,
>>>>>
>>>>> Please accept the following proposal for presentation and discussion:
>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>> runtime behaviors of a job to adapt to varying deployment characteristics
>>>>> (e.g., harnessing transient resources in datacenters, cross-datacenter
>>>>> deployment, changing runtime based on job characteristics, etc.). Onyx
>>>>> provides ways to extend the system’s capabilities and incorporate the
>>>>> extensions to the flexible job execution.
>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>>>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>>>>> based on a deployment policy.
>>>>>
>>>>> I've attached the proposal below.
>>>>>
>>>>> Best regards,
>>>>> Byung-Gon Chun
>>>>>
>>>>> = OnyxProposal =
>>>>>
>>>>> == Abstract ==
>>>>> Onyx is a data processing system for flexible employment with
>>>>> different execution scenarios for various deployment characteristics
>>>>> on clusters.
>>>>>
>>>>> == Proposal ==
>>>>> Today, there is a wide variety of data processing systems with
>>>>> different designs for better performance and datacenter efficiency.
>>>>> They include processing data on specific resource environments and
>>>>> running jobs with specific attributes. Although each system
>>>>> successfully solves the problems it targets, most systems are designed
>>>>> in the way that runtime behaviors are built tightly inside the system
>>>>> core to hide the complexity of distributed computing. This makes it
>>>>> hard for a single system to support different deployment
>>>>> characteristics with different runtime behaviors without substantial
>>>>> effort.
>>>>>
>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>> characteristics. Moreover, it provides a means of extending the
>>>>> system’s capabilities and incorporating the extensions to the flexible
>>>>> job execution.
>>>>>
>>>>> In order to be able to easily modify runtime behaviors to adapt to
>>>>> varying deployment characteristics, Onyx exposes runtime behaviors to
>>>>> be flexibly configured and modified at both compile-time and runtime
>>>>> through a set of high-level graph pass interfaces.
>>>>>
>>>>> We hope to contribute to the big data processing community by enabling
>>>>> more flexibility and extensibility in job executions. Furthermore, we
>>>>> can benefit more together as a community when we work together as a
>>>>> community to mature the system with more use cases and understanding
>>>>> of diverse deployment characteristics. The Apache Software Foundation
>>>>> is the perfect place to achieve these aspirations.
>>>>>
>>>>> == Background ==
>>>>> Many data processing systems have distinctive runtime behaviors
>>>>> optimized and configured for specific deployment characteristics like
>>>>> different resource environments and for handling special job
>>>>> attributes.
>>>>>
>>>>> For example, much research has been conducted to overcome the
>>>>> challenge of running data processing jobs on cheap, unreliable
>>>>> transient resources. Likewise, techniques for disaggregating different
>>>>> types of resources, like memory, CPU and GPU, are being actively
>>>>> developed to use datacenter resources more efficiently. Many
>>>>> researchers are also working to run data processing jobs in even more
>>>>> diverse environments, such as across distant datacenters. Similarly,
>>>>> for special job attributes, many works take different approaches, such
>>>>> as runtime optimization, to solve problems like data skew, and to
>>>>> optimize systems for data processing jobs with small-scale input data.
>>>>>
>>>>> Although each of the systems performs well with the jobs and in the
>>>>> environments they target, they perform poorly with unconsidered cases,
>>>>> and do not consider supporting multiple deployment characteristics on
>>>>> a single system in their designs.
>>>>>
>>>>> For an application writer to optimize an application to perform well
>>>>> on a certain system engraved with its underlying behaviors, it
>>>>> requires a deep understanding of the system itself, which is an
>>>>> overhead that often requires a lot of time and effort. Moreover, for a
>>>>> developer to modify such system behaviors, it requires modifications
>>>>> of the system core, which requires an even deeper understanding of the
>>>>> system itself.
>>>>>
>>>>> With this background, Onyx is designed to represent all of its jobs as
>>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
>>>>> applications from various programming models (e.g., Apache Beam) are
>>>>> submitted, transformed to an IR DAG, and optimized/customized for the
>>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
>>>>> is modified through a series of compiler “passes” which reshape or
>>>>> annotate the DAG with an expression of the underlying runtime
>>>>> behaviors. The IR DAG is then submitted as an execution plan for the
>>>>> Onyx runtime. The runtime includes the unmodified parts of data
>>>>> processing in the backbone which is transparently integrated with
>>>>> configurable components exposed for further extension.
>>>>>
>>>>> == Rationale ==
>>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
>>>>> variety of job execution scenarios for users while facilitating system
>>>>> developers to extend the execution framework with various
>>>>> functionalities at the same time. The capabilities of the system can
>>>>> be extended as it grows to meet a wider variety of execution scenarios.
>>>>> We need input from users and developers from diverse domains in
>>>>> order to make it a more thriving and useful project. The Apache
>>>>> Software Foundation provides the best tools and community to support
>>>>> this vision.
>>>>>
>>>>> == Initial Goals ==
>>>>> Initial goals will be to move the existing codebase to Apache and
>>>>> integrate with the Apache development process. We further plan to
>>>>> develop our system to meet the needs of more execution scenarios for
>>>>> a wider variety of deployment characteristics.
>>>>>
>>>>> == Current Status ==
>>>>> The Onyx codebase is currently hosted in a repository at github.com. The
>>>>> current version has been developed by system developers at Seoul
>>>>> National University, Viva Republica, Samsung, and LG.
>>>>>
>>>>> == Meritocracy ==
>>>>> We plan to strongly support meritocracy. We will discuss the
>>>>> requirements in an open forum, and those who continuously contribute
>>>>> to Onyx with the passion to strengthen the system will be invited as
>>>>> committers. Contributors who enrich Onyx by providing use cases and
>>>>> implementations of the configurable components, including ideas for
>>>>> optimization techniques, will be especially welcome. Committers with
>>>>> a deep understanding of the system’s technical aspects as a whole and
>>>>> its philosophy will definitely be voted into the PMC. We will monitor
>>>>> community participation so that privileges can be extended to those
>>>>> who contribute.
>>>>>
>>>>> == Community ==
>>>>> We hope to expand our contribution community by becoming an Apache
>>>>> incubator project. The contributions will come from both users and
>>>>> system developers interested in flexibility and extensibility of job
>>>>> executions that Onyx can support. We expect users to contribute mainly
>>>>> by diversifying the use cases and deployment characteristics, and
>>>>> developers to contribute by implementing them.
>>>>>
>>>>> == Alignment ==
>>>>> Apache Spark is one of many popular data processing frameworks. The
>>>>> system is designed towards optimizing jobs using RDDs in memory and
>>>>> many other optimizations built tightly within the framework. In
>>>>> contrast to Spark, Onyx aims to provide more flexibility for job
>>>>> execution in an easy manner.
>>>>>
>>>>> Apache Tez enables developers to build complex task DAGs with control
>>>>> over the control plane of job execution. In Onyx, a high-level
>>>>> programming layer (e.g., Apache Beam) is automatically converted to a
>>>>> basic IR DAG and can be converted to any IR DAG through a series of
>>>>> easy, user-writable passes that can both reshape the DAG and modify
>>>>> its annotations (execution properties). Moreover, Onyx leaves
>>>>> more parts of the job execution configurable, such as the scheduler
>>>>> and the data plane. As opposed to providing a set of properties for
>>>>> solid optimization, Onyx’s configurable parts can be easily extended
>>>>> and explored by implementing the pre-defined interfaces. For example,
>>>>> an arbitrary intermediate data store can be added.
>>>>>
>>>>> Onyx currently supports Apache Beam programs and we are working on
>>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
>>>>> REEF for container management, which allows Onyx to run in Apache YARN
>>>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
>>>>> collaborate with these other Apache projects for the benefit of all.
>>>>> We plan to extend such integrations to more Apache software. The
>>>>> Apache Software Foundation already hosts many major big-data systems,
>>>>> and we expect to help further growth of the big-data community by
>>>>> having Onyx within the Apache Software Foundation.
>>>>>
>>>>> == Known Risks ==
>>>>> === Orphaned Products ===
>>>>> The risk of the Onyx project being orphaned is minimal. There is
>>>>> already plenty of work that arduously supports different deployment
>>>>> characteristics, and we propose a general way to implement them with
>>>>> flexible and extensible configuration knobs. The domain of data
>>>>> processing is already of high interest, and this domain is expected to
>>>>> evolve continuously with various other purposes, such as resource
>>>>> disaggregation and using transient resources for better datacenter
>>>>> resource utilization.
>>>>>
>>>>> === Inexperience with Open Source ===
>>>>> The initial committers include PMC members and committers of other
>>>>> Apache projects. They have experience with open source projects,
>>>>> starting from their incubation to the top-level. They have been
>>>>> involved in the open source development process, and are familiar with
>>>>> releasing code under an open source license.
>>>>>
>>>>> === Homogeneous Developers ===
>>>>> The initial set of committers is from a limited set of organizations,
>>>>> but we expect to attract new contributors from diverse organizations
>>>>> and will thus grow organically once approved for incubation. Our prior
>>>>> experience with other open source projects will help various
>>>>> contributors to actively participate in our project.
>>>>>
>>>>> === Reliance on Salaried Developers ===
>>>>> Many developers are from Seoul National University. This is not
>>>>> applicable.
>>>>>
>>>>> === Relationships with Other Apache Products ===
>>>>> Onyx positions itself among multiple Apache products. It runs on
>>>>> Apache REEF for container management. It also utilizes many useful
>>>>> development tools including Apache Maven, Apache Log4J, and multiple
>>>>> Apache Commons components. Onyx supports the Apache Beam programming
>>>>> model for user applications. We are currently working on supporting
>>>>> the Apache Spark programming APIs as well.
>>>>>
>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>> We hope to make Onyx a powerful system for data processing, meeting
>>>>> various needs for different deployment characteristics, under a wider
>>>>> variety of environments. We see the limitations of simply putting code
>>>>> on GitHub, and we believe the Apache community will help Onyx grow into
>>>>> a positively impactful and innovative open source project. We believe
>>>>> Onyx is a great fit for the Apache
>>>>> Software Foundation due to the collaboration it aims to achieve from
>>>>> the big data processing community.
>>>>>
>>>>> == Documentation ==
>>>>> The current documentation for Onyx is at
>>>>> https://snuspl.github.io/onyx/.
>>>>>
>>>>> == Initial Source ==
>>>>> The Onyx codebase is currently hosted at
>>>>> https://github.com/snuspl/onyx.
>>>>>
>>>>> == External Dependencies ==
>>>>> To the best of our knowledge, all Onyx dependencies are distributed
>>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>>>> would begin a thorough analysis of all transitive dependencies to
>>>>> verify this fact and further introduce license checking into the build
>>>>> and release process.
>>>>>
>>>>> == Cryptography ==
>>>>> Not applicable.
>>>>>
>>>>> == Required Resources ==
>>>>> === Mailing Lists ===
>>>>> We will operate two mailing lists as follows:
>>>>>    * Onyx PMC discussions: [hidden email]
>>>>>    * Onyx developers: [hidden email]
>>>>>
>>>>> === Git Repositories ===
>>>>> Upon incubation: https://github.com/apache/incubator-onyx.
>>>>> After the incubation, we would like to move the existing repo
>>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
>>>>>
>>>>> === Issue Tracking ===
>>>>> Onyx currently tracks its issues using the GitHub issue tracker:
>>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
>>>>> JIRA.
>>>>>
>>>>> == Initial Committers ==
>>>>>   * Byung-Gon Chun
>>>>>   * Jeongyoon Eo
>>>>>   * Geon-Woo Kim
>>>>>   * Joo Yeon Kim
>>>>>   * Gyewon Lee
>>>>>   * Jung-Gil Lee
>>>>>   * Sanha Lee
>>>>>   * Wooyeon Lee
>>>>>   * Yunseong Lee
>>>>>   * JangHo Seo
>>>>>   * Won Wook Song
>>>>>   * Taegeon Um
>>>>>   * Youngseok Yang
>>>>>
>>>>> == Affiliations ==
>>>>>   * SNU (Seoul National University)
>>>>>     * Byung-Gon Chun
>>>>>     * Jeongyoon Eo
>>>>>     * Geon-Woo Kim
>>>>>     * Gyewon Lee
>>>>>     * Sanha Lee
>>>>>     * Wooyeon Lee
>>>>>     * Yunseong Lee
>>>>>     * JangHo Seo
>>>>>     * Won Wook Song
>>>>>     * Taegeon Um
>>>>>     * Youngseok Yang
>>>>>
>>>>>   * LG
>>>>>     * Jung-Gil Lee
>>>>>
>>>>>   * Samsung
>>>>>     * Joo Yeon Kim
>>>>>
>>>>>   * Viva Republica
>>>>>     * Geon-Woo Kim
>>>>>
>>>>> == Sponsors ==
>>>>> === Champions ===
>>>>> Byung-Gon Chun
>>>>>
>>>>> === Mentors ===
>>>>>   * Hyunsik Choi
>>>>>   * Byung-Gon Chun
>>>>>   * Markus Weimer
>>>>>   * Reynold Xin
>>>>>
>>>>> === Sponsoring Entity ===
>>>>> The Apache Incubator
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Jean-Baptiste Onofré
>>>> [hidden email]
>>>> http://blog.nanthrax.net
>>>> Talend - http://www.talend.com
>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> [hidden email]
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
>
>

--
Jean-Baptiste Onofré
[hidden email]
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
In reply to this post by John D. Ament-2
Thanks for the information, John!


On Wed, Jan 31, 2018 at 9:50 PM, John D. Ament <[hidden email]>
wrote:

> Sorry for mid-posting.
>
> This isn't the list to determine if a project name is suitable.  There's a
> JIRA project dedicated to that, and if you need a quick answer better to
> email trademarks@ to get a more precise answer.
>
> The question is really going to be, is "Apache Onyx" going to be easily
> confused with something else.
>
> John
>
> On Sun, Jan 28, 2018 at 4:50 AM Byung-Gon Chun <[hidden email]> wrote:
>
> > Thank you for all the information! It looks like Surf doesn't work.
> >
> > If possible, we'd like to keep Onyx.
> > Another name we came up with is Coral.
> >
> > Thanks!
> > -Gon
> >
> >
> > On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom <[hidden email]> wrote:
> >
> > > Did we rule out Onyx for sure? Just because some other project might use
> > > it on say github doesn’t necessarily exclude us from having an Apache
> > > Onyx?
> > >
> > > FWIW, I agree that surf is too similar in pronunciation to Apache serf. :)
> > >
> > > Cheers,
> > >
> > > — Leif
> > >
> > > > On Jan 27, 2018, at 07:31, Dave Fisher <[hidden email]> wrote:
> > > >
> > > > Checking “Serf Software” which sounds the same.
> > > >
> > > > (1) there is already Apache Serf
> > > > (2) Serf is a product from Hashicorp at https://www.serf.io/. This
> > > > would definitely confuse as it is apparently comparable to ZooKeeper.
> > > >
> > > > Regards,
> > > > Dave
> > > >
> > > > Sent from my iPhone
> > > >
> > > >> On Jan 27, 2018, at 3:12 AM, sebb <[hidden email]> wrote:
> > > >>
> > > >> A brief search for 'Surf Software' shows quite a few hits.
> > > >> I have not looked to see if they would be likely to be confused with
> > > >> this project or cause problems for others.
> > > >>
> > > >> But it looks as though there might be a problem:
> > > >> Surfer -  Golden Software
> > > >> surf @ sourceforge
> > > >> Surf Software company
> > > >>
> > > >>
> > > >>> On 27 January 2018 at 08:03, Byung-Gon Chun <[hidden email]> wrote:
> > > >>> Since we cannot use the name Onyx, we would like to change the project
> > > >>> name to Surf.
> > > >>> I hope that this name works.
> > > >>>
> > > >>> -Gon
> > > >>>
> > > >>> ---
> > > >>> Byung-Gon Chun
> > > >>>
> > > >>>
> > > >>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <[hidden email]> wrote:
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <[hidden email]> wrote:
> > > >>>>>
> > > >>>>> Great work -- I think this technology has a lot of promise, and I'd love
> > > >>>>> to see its evolution inside the Foundation.
> > > >>>>>
> > > >>>>>
> > > >>>> Thanks, Davor!
> > > >>>>
> > > >>>>
> > > >>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
> > > >>>>> the work-in-progress inside the Apache Beam project ("portability"). We'd
> > > >>>>> love to work together on this -- would you be open to such collaboration?
> > > >>>>> If so, it may not be necessary to start from scratch, and leverage the
> > > >>>>> work already done.
> > > >>>>>
> > > >>>>>
> > > >>>> Sure. We're open to collaboration.
> > > >>>>
> > > >>>>
> > > >>>>> Regarding the name, Onyx would likely have to be renamed, due to a
> > > >>>>> conflict with a related technology [2].
> > > >>>>>
> > > >>>>>
> > > >>>> Thanks for pointing it out. It's difficult to come up with a good short
> > > >>>> name. :)
> > > >>>> Do you have any suggestion?
> > > >>>>
> > > >>>> Thanks!
> > > >>>> -Gon
> > > >>>>
> > > >>>> ---
> > > >>>> Byung-Gon Chun
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>> Davor
> > > >>>>>
> > > >>>>> [1] https://snuspl.github.io/onyx/docs/ir/
> > > >>>>> [2] http://www.onyxplatform.org/
> > > >>>>>
> > > >>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <[hidden email]> wrote:
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Byung-Gon Chun
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Byung-Gon Chun
> > > >>
> > > >>
> >
> >
> > --
> > Byung-Gon Chun
> >
>



--
Byung-Gon Chun

Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun
In reply to this post by Jean-Baptiste Onofré
Thank you all for the feedback.
We changed the project name to Coral.
You can find the proposal at https://wiki.apache.org/incubator/CoralProposal.

I will soon send out a voting email.

Thanks.
-Gon




On Wed, Jan 31, 2018 at 11:54 PM, Jean-Baptiste Onofré <[hidden email]>
wrote:

> Thanks, much appreciated !
>
> Regards
> JB
>
> On 01/31/2018 09:50 AM, Byung-Gon Chun wrote:
> > On Wed, Jan 31, 2018 at 4:04 PM, Jean-Baptiste Onofré <[hidden email]>
> > wrote:
> >
> >> Hi,
> >>
> >> Coral is a good name !
> >>
> >
> > Thanks!
> >
> >
> >>
> >> Does the code belong to Seoul National University? In that case, in
> >> addition to your ICLA, we would need an SGA (it's not a blocker for the
> >> project bootstrapping or code donation, but we will at least need it
> >> later for graduation). On the other hand, if the committers are all part
> >> of the university, you can also sign a CCLA.
> >>
> >
> > I will figure this out.
> >
> >
> >>
> >> Happy to be mentor on the project if you want me ! ;)
> >>
> >>
> > Thanks! I will add you to the mentor list.
> >
> > -Gon
> >
> >
> >> Thanks,
> >> Regards
> >> JB
> >>
> >> On 01/30/2018 10:17 AM, Byung-Gon Chun wrote:
> >>> Thanks for the comments, JB!
> >>> My replies are inlined below.
> >>>
> >>> On Tue, Jan 30, 2018 at 5:52 PM, Jean-Baptiste Onofré <[hidden email]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> sorry to be a little bit late on this.
> >>>>
> >>>> It's a very interesting proposal. It sounds pretty close to the
> >>>> portability layer we want to add in Apache Beam. I would love to see
> >>>> interaction between the two communities.
> >>>>
> >>>> I have two minor questions:
> >>>>
> >>>> 1. about the name: Onyx sounds very generic and the name is used in
> >>>> other technologies. Maybe another unique name would be more accurate.
> >>>>
> >>>
> >>> We proposed Coral instead. How does this sound?
> >>>
> >>>
> >>>> 2. the Onyx code is on GitHub right now, under the Apache 2.0 license.
> >>>> Does this code have any affiliation with companies? Meaning that we
> >>>> would need an SGA for the code donation.
> >>>>
> >>> It does not. The developers are affiliated with Seoul National
> >>> University. In this case, do we still need an SGA?
> >>>
> >>>
> >>>> If you need any help for the incubation, I would be more than happy to
> >>>> help !
> >>>>
> >>>>
> >>> Thanks for the offer. Would you be interested in being a mentor of the
> >>> project?
> >>>
> >>> Thanks.
> >>> -Gon
> >>>
> >>>
> >>>
> >>>> Regards
> >>>> JB
> >>>>
> >>>> On 01/26/2018 12:28 AM, Byung-Gon Chun wrote:
> >>>>> Dear Apache Incubator Community,
> >>>>>
> >>>>> Please accept the following proposal for presentation and discussion:
> >>>>> https://wiki.apache.org/incubator/OnyxProposal
> >>>>>
> >>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>> runtime behaviors of a job to adapt to varying deployment
> >>>>> characteristics (e.g., harnessing transient resources in datacenters,
> >>>>> cross-datacenter deployment, changing runtime based on job
> >>>>> characteristics, etc.). Onyx provides ways to extend the system’s
> >>>>> capabilities and incorporate the extensions to the flexible job
> >>>>> execution.
> >>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
> >>>>> an Intermediate Representation (IR) DAG, which Onyx optimizes and
> >>>>> deploys based on a deployment policy.
> >>>>>
> >>>>> I've attached the proposal below.
> >>>>>
> >>>>> Best regards,
> >>>>> Byung-Gon Chun
> >>>>>
> >>>>> = OnyxProposal =
> >>>>>
> >>>>> == Abstract ==
> >>>>> Onyx is a data processing system for flexible employment with
> >>>>> different execution scenarios for various deployment characteristics
> >>>>> on clusters.
> >>>>>
> >>>>> == Proposal ==
> >>>>> Today, there is a wide variety of data processing systems with
> >>>>> different designs for better performance and datacenter efficiency.
> >>>>> They include processing data on specific resource environments and
> >>>>> running jobs with specific attributes. Although each system
> >>>>> successfully solves the problems it targets, most systems are designed
> >>>>> in such a way that runtime behaviors are built tightly inside the system
> >>>>> core to hide the complexity of distributed computing. This makes it
> >>>>> hard for a single system to support different deployment
> >>>>> characteristics with different runtime behaviors without substantial
> >>>>> effort.
> >>>>>
> >>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>> runtime behaviors of a job to adapt to varying deployment
> >>>>> characteristics. Moreover, it provides a means of extending the
> >>>>> system’s capabilities and incorporating the extensions into the
> >>>>> flexible job execution.
> >>>>>
> >>>>> In order to be able to easily modify runtime behaviors to adapt to
> >>>>> varying deployment characteristics, Onyx exposes runtime behaviors to
> >>>>> be flexibly configured and modified at both compile-time and runtime
> >>>>> through a set of high-level graph pass interfaces.
> >>>>>
> >>>>> We hope to contribute to the big data processing community by enabling
> >>>>> more flexibility and extensibility in job executions. Furthermore, we
> >>>>> can all benefit when we work together as a community to mature the
> >>>>> system with more use cases and a better understanding of diverse
> >>>>> deployment characteristics. The Apache Software Foundation
> >>>>> is the perfect place to achieve these aspirations.
> >>>>>
> >>>>> == Background ==
> >>>>> Many data processing systems have distinctive runtime behaviors
> >>>>> optimized and configured for specific deployment characteristics like
> >>>>> different resource environments and for handling special job
> >>>>> attributes.
> >>>>>
> >>>>> For example, much research has been conducted to overcome the
> >>>>> challenge of running data processing jobs on cheap, unreliable
> >>>>> transient resources. Likewise, techniques for disaggregating different
> >>>>> types of resources, like memory, CPU and GPU, are being actively
> >>>>> developed to use datacenter resources more efficiently. Many
> >>>>> researchers are also working to run data processing jobs in even more
> >>>>> diverse environments, such as across distant datacenters. Similarly,
> >>>>> for special job attributes, many works take different approaches, such
> >>>>> as runtime optimization, to solve problems like data skew, and to
> >>>>> optimize systems for data processing jobs with small-scale input data.
> >>>>>
> >>>>> Although each of the systems performs well with the jobs and in the
> >>>>> environments it targets, they perform poorly in cases they did not
> >>>>> consider, and their designs do not support multiple deployment
> >>>>> characteristics on a single system.
> >>>>>
> >>>>> For an application writer to optimize an application to perform well
> >>>>> on a certain system with its underlying behaviors built in, a deep
> >>>>> understanding of the system itself is required, which is an
> >>>>> overhead that often takes a lot of time and effort. Moreover, for a
> >>>>> developer to modify such system behaviors, the system core must be
> >>>>> modified, which requires an even deeper understanding of the
> >>>>> system itself.
> >>>>>
> >>>>> With this background, Onyx is designed to represent all of its jobs as
> >>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>>>> applications from various programming models (e.g., Apache Beam) are
> >>>>> submitted, transformed to an IR DAG, and optimized/customized for the
> >>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
> >>>>> is modified through a series of compiler “passes” which reshape or
> >>>>> annotate the DAG with an expression of the underlying runtime
> >>>>> behaviors. The IR DAG is then submitted as an execution plan for the
> >>>>> Onyx runtime. The runtime includes the unmodified parts of data
> >>>>> processing in the backbone which is transparently integrated with
> >>>>> configurable components exposed for further extension.
> >>>>>
> >>>>> == Rationale ==
> >>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
> >>>>> variety of job execution scenarios for users while helping system
> >>>>> developers extend the execution framework with various
> >>>>> functionalities at the same time. The capabilities of the system can
> >>>>> be extended as it grows to meet a wider variety of execution scenarios.
> >>>>> We require input from users and developers from diverse domains in
> >>>>> order to make it a more thriving and useful project. The Apache
> >>>>> Software Foundation provides the best tools and community to support
> >>>>> this vision.
> >>>>>
> >>>>> == Initial Goals ==
> >>>>> Initial goals will be to move the existing codebase to Apache and
> >>>>> integrate with the Apache development process. We further plan to
> >>>>> develop our system to meet the needs of more execution scenarios for
> >>>>> a wider variety of deployment characteristics.
> >>>>>
> >>>>> == Current Status ==
> >>>>> The Onyx codebase is currently hosted in a repository at github.com. The
> >>>>> current version has been developed by system developers at Seoul
> >>>>> National University, Viva Republica, Samsung, and LG.
> >>>>>
> >>>>> == Meritocracy ==
> >>>>> We plan to strongly support meritocracy. We will discuss the
> >>>>> requirements in an open forum, and those who continuously contribute
> >>>>> to Onyx with the passion to strengthen the system will be invited as
> >>>>> committers. Contributors who enrich Onyx by providing various use
> >>>>> cases and various implementations of the configurable components,
> >>>>> including ideas for optimization techniques, will be especially
> >>>>> welcome. Committers with a deep understanding of the system’s
> >>>>> technical aspects as a whole and its philosophy will definitely be
> >>>>> voted into the PMC. We will monitor community participation so that
> >>>>> privileges can be extended to those who contribute.
> >>>>>
> >>>>> == Community ==
> >>>>> We hope to expand our contribution community by becoming an Apache
> >>>>> incubator project. The contributions will come from both users and
> >>>>> system developers interested in flexibility and extensibility of job
> >>>>> executions that Onyx can support. We expect users to mainly contribute
> >>>>> by diversifying the use cases and deployment characteristics, and
> >>>>> developers to contribute by implementing them.
> >>>>>
> >>>>> == Alignment ==
> >>>>> Apache Spark is one of many popular data processing frameworks. The
> >>>>> system is designed towards optimizing jobs using RDDs in memory and
> >>>>> many other optimizations built tightly within the framework. In
> >>>>> contrast to Spark, Onyx aims to provide more flexibility for job
> >>>>> execution in an easy manner.
> >>>>>
> >>>>> Apache Tez enables developers to build complex task DAGs with control
> >>>>> over the control plane of job execution. In Onyx, a high-level
> >>>>> programming layer (e.g., Apache Beam) is automatically converted to a
> >>>>> basic IR DAG and can be converted to any IR DAG through a series of
> >>>>> easy, user-writable passes that can both reshape the DAG and modify
> >>>>> its annotations (of execution properties). Moreover, Onyx leaves
> >>>>> more parts of the job execution configurable, such as the scheduler
> >>>>> and the data plane. As opposed to providing a set of properties for
> >>>>> solid optimization, Onyx’s configurable parts can be easily extended
> >>>>> and explored by implementing the pre-defined interfaces. For example,
> >>>>> an arbitrary intermediate data store can be added.
> >>>>>
> >>>>> Onyx currently supports Apache Beam programs and we are working on
> >>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>>>> REEF for container management, which allows Onyx to run in Apache YARN
> >>>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
> >>>>> collaborate with these other Apache projects for the benefit of all.
> >>>>> We plan to extend such integrations to more Apache software. The Apache
> >>>>> Software Foundation already hosts many major big-data systems, and we
> >>>>> expect to help further growth of the big-data community by having Onyx
> >>>>> within the Apache Software Foundation.
> >>>>>
> >>>>> == Known Risks ==
> >>>>> === Orphaned Products ===
> >>>>> The risk of the Onyx project being orphaned is minimal. There is
> >>>>> already plenty of work that arduously supports different deployment
> >>>>> characteristics, and we propose a general way to implement them with
> >>>>> flexible and extensible configuration knobs. The domain of data
> >>>>> processing is already of high interest, and this domain is expected to
> >>>>> evolve continuously with various other purposes, such as resource
> >>>>> disaggregation and using transient resources for better datacenter
> >>>>> resource utilization.
> >>>>>
> >>>>> === Inexperience with Open Source ===
> >>>>> The initial committers include PMC members and committers of other
> >>>>> Apache projects. They have experience with open source projects,
> >>>>> from incubation through graduation to top-level status. They have been
> >>>>> involved in the open source development process, and are familiar with
> >>>>> releasing code under an open source license.
> >>>>>
> >>>>> === Homogeneous Developers ===
> >>>>> The initial set of committers is from a limited set of organizations,
> >>>>> but we expect to attract new contributors from diverse organizations
> >>>>> and will thus grow organically once approved for incubation. Our prior
> >>>>> experience with other open source projects will help various
> >>>>> contributors to actively participate in our project.
> >>>>>
> >>>>> === Reliance on Salaried Developers ===
> >>>>> Many developers are from Seoul National University. This is not
> >>>>> applicable.
> >>>>>
> >>>>> === Relationships with Other Apache Products ===
> >>>>> Onyx positions itself among multiple Apache products. It runs on
> >>>>> Apache REEF for container management. It also utilizes many useful
> >>>>> development tools including Apache Maven, Apache Log4J, and multiple
> >>>>> Apache Commons components. Onyx supports the Apache Beam programming
> >>>>> model for user applications. We are currently working on supporting
> >>>>> the Apache Spark programming APIs as well.
> >>>>>
> >>>>> === An Excessive Fascination with the Apache Brand ===
> >>>>> We hope to make Onyx a powerful system for data processing, meeting
> >>>>> various needs for different deployment characteristics, under a wider
> >>>>> variety of environments. We see the limitations of simply putting code
> >>>>> on GitHub, and we believe the Apache community will help Onyx grow
> >>>>> into a positively impactful and innovative open source project.
> >>>>> We believe Onyx is a great fit for the Apache
> >>>>> Software Foundation due to the collaboration it aims to achieve from
> >>>>> the big data processing community.
> >>>>>
> >>>>> == Documentation ==
> >>>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
> >>>>>
> >>>>> == Initial Source ==
> >>>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
> >>>>>
> >>>>> == External Dependencies ==
> >>>>> To the best of our knowledge, all Onyx dependencies are distributed
> >>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
> >>>>> would begin a thorough analysis of all transitive dependencies to
> >>>>> verify this fact and further introduce license checking into the build
> >>>>> and release process.
> >>>>>
> >>>>> == Cryptography ==
> >>>>> Not applicable.
> >>>>>
> >>>>> == Required Resources ==
> >>>>> === Mailing Lists ===
> >>>>> We will operate two mailing lists as follows:
> >>>>>    * Onyx PMC discussions: [hidden email]
> >>>>>    * Onyx developers: [hidden email]
> >>>>>
> >>>>> === Git Repositories ===
> >>>>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>>>> After the incubation, we would like to move the existing repo
> >>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>>>
> >>>>> === Issue Tracking ===
> >>>>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>>>> JIRA.
> >>>>>
> >>>>> == Initial Committers ==
> >>>>>   * Byung-Gon Chun
> >>>>>   * Jeongyoon Eo
> >>>>>   * Geon-Woo Kim
> >>>>>   * Joo Yeon Kim
> >>>>>   * Gyewon Lee
> >>>>>   * Jung-Gil Lee
> >>>>>   * Sanha Lee
> >>>>>   * Wooyeon Lee
> >>>>>   * Yunseong Lee
> >>>>>   * JangHo Seo
> >>>>>   * Won Wook Song
> >>>>>   * Taegeon Um
> >>>>>   * Youngseok Yang
> >>>>>
> >>>>> == Affiliations ==
> >>>>>   * SNU (Seoul National University)
> >>>>>     * Byung-Gon Chun
> >>>>>     * Jeongyoon Eo
> >>>>>     * Geon-Woo Kim
> >>>>>     * Gyewon Lee
> >>>>>     * Sanha Lee
> >>>>>     * Wooyeon Lee
> >>>>>     * Yunseong Lee
> >>>>>     * JangHo Seo
> >>>>>     * Won Wook Song
> >>>>>     * Taegeon Um
> >>>>>     * Youngseok Yang
> >>>>>
> >>>>>   * LG
> >>>>>     * Jung-Gil Lee
> >>>>>
> >>>>>   * Samsung
> >>>>>     * Joo Yeon Kim
> >>>>>
> >>>>>   * Viva Republica
> >>>>>     * Geon-Woo Kim
> >>>>>
> >>>>> == Sponsors ==
> >>>>> === Champions ===
> >>>>> Byung-Gon Chun
> >>>>>
> >>>>> === Mentors ===
> >>>>>   * Hyunsik Choi
> >>>>>   * Byung-Gon Chun
> >>>>>   * Markus Weimer
> >>>>>   * Reynold Xin
> >>>>>
> >>>>> === Sponsoring Entity ===
> >>>>> The Apache Incubator
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Jean-Baptiste Onofré
> >>>> [hidden email]
> >>>> http://blog.nanthrax.net
> >>>> Talend - http://www.talend.com
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [hidden email]
> >>>> For additional commands, e-mail: [hidden email]
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
>
>
>


--
Byung-Gon Chun