[DISCUSS] Rya Incubator Proposal

Adina Crainiceanu
Hi,

We would like to start a discussion on accepting Rya, a scalable RDF data
management system built on top of Accumulo, into the Apache Incubator.

The proposal is available online at
https://wiki.apache.org/incubator/RyaProposal and also at the end of this
email.

We are looking for additional mentors to help us with the project. Any
advice and help will be appreciated.

Thank you very much,
Adina



= Rya Proposal =

== Abstract ==

Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that
supports SPARQL queries.

== Proposal ==

Rya is a scalable RDF data management system built on top of Accumulo. Rya
uses novel storage methods, indexing schemes, and query processing
techniques that scale to billions of triples across multiple nodes. Rya
provides fast and easy access to the data through SPARQL, a conventional
query mechanism for RDF data.

== Background ==

RDF is a World Wide Web Consortium (W3C) standard used in describing
resources on the Web. The smallest data unit is a triple consisting of
subject, predicate, and object. Using this framework, it is very easy to
describe any resource, not just Web-related ones. For example, if you want to
say that Alice is a professor, you can represent this as an RDF triple like
(Alice, rdf:type, Professor). In general, RDF is an open world framework
that allows anyone to make any statement about any resource, which makes it
a popular choice for expressing a large variety of data.

RDF is used in conjunction with the Web Ontology Language (OWL). OWL is a
framework for describing models or ontologies for RDF. It defines concepts,
relationships, and/or structure of RDF documents. These models can be used
to 'reason/infer' information about entities within a given domain. For
example, you can express that Professor is a subclass of Faculty,
(Professor, rdfs:subClassOf, Faculty), and knowing that (Alice, rdf:type,
Professor), it can be inferred that (Alice, rdf:type, Faculty).
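
The inference step in the example above can be sketched in a few lines of
Python. This is only an illustration of the idea, not Rya's implementation:
triples are modelled as plain tuples, and the function name infer_types is a
hypothetical helper introduced here. It derives only the rdf:type triples
implied by rdfs:subClassOf, repeating until nothing new is produced.

```python
# Triples as (subject, predicate, object) tuples; names mirror the example above.
RDF_TYPE = "rdf:type"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(triples):
    """Return the input triples plus rdf:type triples implied by rdfs:subClassOf.

    Hypothetical helper for illustration only; not part of Rya.
    """
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple is produced (a fixed point)
        changed = False
        subclass_pairs = [(s, o) for (s, p, o) in inferred if p == RDFS_SUBCLASS_OF]
        for (s, p, o) in list(inferred):  # iterate over a snapshot of the set
            if p != RDF_TYPE:
                continue
            for (sub, sup) in subclass_pairs:
                candidate = (s, RDF_TYPE, sup)
                if o == sub and candidate not in inferred:
                    inferred.add(candidate)
                    changed = True
    return inferred


data = {
    ("Alice", RDF_TYPE, "Professor"),
    ("Professor", RDFS_SUBCLASS_OF, "Faculty"),
}
result = infer_types(data)
print(("Alice", RDF_TYPE, "Faculty") in result)  # prints True
```

A real reasoner covers many more rules than this one; the sketch only shows
how one stated fact and one ontology statement combine into a new triple.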

SPARQL is an RDF query language. Similar to SQL, SPARQL has SELECT and
WHERE clauses; however, it is based on querying and retrieving RDF triples.
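
To make the SELECT/WHERE idea concrete, here is a rough Python sketch of how
triple patterns are matched against a set of triples. It is an assumption-laden
toy, not how Rya or any SPARQL engine is implemented: the helpers
match_pattern and select, the names Bob and Carol, and the predicate
"teaches" are all hypothetical and exist only for this example. Terms that
start with "?" play the role of SPARQL variables.

```python
def match_pattern(triples, pattern):
    """Yield variable bindings for one triple pattern ('?x' terms are variables)."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def select(triples, variables, where):
    """Join the patterns in `where` and project `variables` (toy SELECT ... WHERE)."""
    solutions = [{}]
    for pattern in where:
        next_solutions = []
        for solution in solutions:
            # Substitute variables that are already bound before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match_pattern(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return sorted({tuple(s[v] for v in variables) for s in solutions})


triples = [
    ("Alice", "rdf:type", "Professor"),
    ("Bob", "rdf:type", "Professor"),      # hypothetical extra data
    ("Carol", "rdf:type", "Student"),      # hypothetical extra data
    ("Alice", "teaches", "Databases"),     # hypothetical predicate
]

# Roughly: SELECT ?who WHERE { ?who rdf:type Professor }
print(select(triples, ["?who"], [("?who", "rdf:type", "Professor")]))

# Roughly: SELECT ?who ?course WHERE { ?who rdf:type Professor . ?who teaches ?course }
print(select(triples, ["?who", "?course"],
             [("?who", "rdf:type", "Professor"), ("?who", "teaches", "?course")]))
```

The second query shows the part that differs most from SQL: instead of naming
tables, the WHERE clause lists triple patterns that share variables, and the
shared variable acts as the join condition.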

Work on Rya, a large-scale distributed system for storing and querying RDF
data, started in 2010.

== Rationale ==

With the increase in data size, there is a need for scalable systems for
storing and retrieving RDF data in a cluster of nodes. We believe that Rya
can fulfil that role. We expect that communities within government, health
care, finance, and others who generate large amounts of RDF data will be
most interested in this project.

From its inception, the project operated with an Apache-style license, but
it was open mostly to US government-related projects. We believe that
having the project and the development open for all will benefit both the
project and the interested communities.

== Current Status ==

The project source code and documentation are currently hosted in a private
repository on GitHub. New users are added to the repository upon request.

=== Meritocracy ===

Meritocracy is the model that we currently follow, and we want to build a
larger and more diverse developer community by becoming an Apache project.

=== Community ===

Rya has been building a community of users and developers for the past 3
years. There is currently an active workgroup with monthly meetings, and the
number of participants in the meetings is increasing.

=== Core Developers ===

The core developers are a diverse group of people who are either government
employees or former / current government contractors from different
companies.

=== Alignment ===

Rya is built on top of Accumulo, an Apache project.

== Known Risks ==

=== Orphaned Products ===

There is a very small risk of becoming orphaned. The current contributors
are strongly committed to the project, there is a large enough number of
developers interested in contributing to the project, and we believe that
the support for the project will continue to grow from the interested
communities.

=== Inexperience with Open Source ===

The initial committers have various degrees of experience with open source
projects - from very new to experienced. This project was open source
within government from the beginning. We do not expect to have difficulties
in operating under Apache's development process.

=== Homogenous Developers ===

The current developers form a heterogeneous group, with people from
academia, government, and industry, collaborating from distributed
geographic locations. We aim to expand the list of contributors with the
help of the Apache incubation process.

=== Reliance on Salaried Developers ===

Many but not all of the developers working on the project are salaried
employees, paid to work on this project. They will continue to contribute
to the open source project. Some of the initial committers continued as
volunteers even after they were no longer employed to work on this project,
and they plan to continue supporting the project.

=== Relationships with Other Apache Products ===

Rya uses Apache Accumulo, Hadoop, ZooKeeper, and Maven.

=== Apache Brand ===

Rya has generated interest in the government. It also generated interest
within academia and industry. We believe that everyone could benefit from
having Rya as an open source project. Due to its strong ties to Accumulo,
an Apache project, and due to the values of the Apache Software Foundation,
we believe that the Apache Incubator is the right place for Rya.

== Documentation ==

Two peer-reviewed publications [1,2] about Rya were published in 2012 and
2015. More documentation is available in the code.

[1] Roshan Punnoose, Adina Crainiceanu, David Rapp. Rya: A Scalable RDF
Triple Store for the Clouds. Proceedings of the 1st International Workshop
on Cloud Intelligence, Pages 4:1-4:8, August 2012

[2] Roshan Punnoose, Adina Crainiceanu, David Rapp. SPARQL in the Clouds
Using Rya. Information Systems, Volume 48, Pages 181-195, March 2015
(Available online 23 July 2013)

== Initial Source ==

The code is currently available in a private GitHub repository.
https://github.com/LAS-NCSU/rya

== Source and Intellectual Property Submission Plan ==

The source code has been released under the Apache License, Version 2.0.
A software grant and CCLAs have been submitted. ICLAs for initial committers
have been submitted or are in progress.

== External Dependencies ==

 * Open RDF (BSD license)
 * GeoMesa (Apache License, Version 2.0)
 * Accumulo (Apache License, Version 2.0)
 * Hadoop (Apache License, Version 2.0)
 * TinkerPop (Apache License, Version 2.0)
 * IndexingSail (Apache License, Version 2.0)

== Cryptography ==

The proposal does not involve any cryptographic code.

== Required Resources ==

=== Mailing lists ===

 * [hidden email]
 * [hidden email]
 * [hidden email]

=== Git Repository ===

https://git-wip-us.apache.org/repos/asf/incubator-rya.git

=== Issue Tracking ===

JIRA Rya

== Initial Committers ==

 * Roshan Punnoose, roshanp at gmail dot com
 * David Rapp, dnrapp at ncsu dot edu
 * Adina Crainiceanu, adinancr at gmail dot com
 * Aaron Mihalik, aaron.mihalik at gmail dot com
 * Puja Valiyil, pujav65 at gmail dot com
 * Jennifer Brown, jennifer.brown at parsons dot com
 * Steve Wagner, steve.r.wagner at gmail dot com

== Affiliations ==

 * Roshan Punnoose, Enlighten IT Consulting
 * David Rapp, North Carolina State University
 * Adina Crainiceanu, US Naval Academy
 * Aaron Mihalik, Parsons
 * Puja Valiyil, Parsons
 * Jennifer Brown, Parsons
 * Steve Wagner, Enlighten IT Consulting

== Sponsors ==

=== Champion ===

Adam Fuchs, ASF Member, afuchs at apache dot org

=== Nominated Mentors ===

Josh Elser, josh dot elser at gmail dot com

We are seeking additional mentors.

=== Sponsoring Entity ===

Apache Incubator

Re: [DISCUSS] Rya Incubator Proposal

Sean Busbey
This sounds like a great addition.

Given this proposal, why keep the repo private?

How has community growth/outreach worked to date?

Any documented forward-looking plans for the community?

re: mentors, I'd love to see some mentors with strong ties outside of
Accumulo for the project. That said, I'd be happy to help.


On Thu, Sep 3, 2015 at 8:03 AM, Adina Crainiceanu <[hidden email]> wrote:

> Hi,
>
> We would like to start a discussion on accepting Rya, a scalable RDF data
> management system built on top of Accumulo, into the Apache Incubator.
>
> The proposal is available online at
> https://wiki.apache.org/incubator/RyaProposal and also at the end of this
> email.
>
> We are looking for additional mentors to help us with the project. Any
> advice and help will be appreciated.
>
> Thank you very much,
> Adina
>
> <snip>



--
Sean

Re: [DISCUSS] Rya Incubator Proposal

Josh Elser
In reply to this post by Adina Crainiceanu
As specified in the proposal, I'm happy to volunteer as a mentor
(currently a member, haven't officially requested to join IPMC).

<snip>

>
> === Inexperience with Open Source ===
>
> The initial committers have various degrees of experience with open source
> projects - from very new to experienced. This project was open source
> within government from the beginning. We do not expect to have difficulties
> in operating under Apache's development process.

This statement struck me as a little odd. While I understand that you
tried to operate as an open source project, it seems impossible by
definition to be open source. My biggest concern would just be that you
are aware that you will have difficulties transitioning to a real open
codebase and growing a community in the open.


> == Initial Source ==
>
> The code is currently available in a private GitHub repository.
> https://github.com/LAS-NCSU/rya

What's the plan to make this a non-private repo?



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [DISCUSS] Rya Incubator Proposal

Adina Crainiceanu
In reply to this post by Sean Busbey
Sean,

Thank you for your comments. I'll try to answer your questions below.

>> Given this proposal, why keep the repo private?

We are in the process of making the repository public.

>> How has community growth/outreach worked to date?

We received many emails of interest after the first Rya article was
published in 2012 in a peer-reviewed venue. We started collaborating with
people outside the original group (Roshan, Dave and I) and in time, the
interest in the project grew. More people started using the project, and
some contributed to the project. The people who contributed the most are
now part of the initial committers list on this proposal. We published
another article and we gave presentations at a Semantic Web meetup and
Accumulo Summit 2015. They all generated interest, hence the proposal to
have Rya become part of the incubator.

>> Any documented forward looking plans for the community?

The PMC members listed in the proposal represent a variety of organizations
and they plan to stay involved in the project long-term. This will form a
solid foundation, and we expect the community to grow as we open it up
under the ASF structure.

>> re: mentors, I'd love to see some mentors with strong ties outside of
>> Accumulo for the project. That said, I'd be happy to help.

Thank you for the offer.

Adina

On Thu, Sep 3, 2015 at 12:17 PM, Sean Busbey <[hidden email]> wrote:

> This sounds like a great addition.
>
> Given this proposal, why keep the repo private?
>
> How has community growth/outreach worked to date?
>
> Any documented forward looking plans for the community?
>
> re: mentors, I'd love to see some mentors with strong ties outside of
> Accumulo for the project. That said, I'd be happy help.
>
> <snip>



--
Dr. Adina Crainiceanu
http://www.usna.edu/Users/cs/adina/

Re: [DISCUSS] Rya Incubator Proposal

Adina Crainiceanu
In reply to this post by Josh Elser
Josh,

Thank you for volunteering to be our mentor.


> As specified in the proposal, I'm happy to volunteer as a mentor
> (currently a member, haven't officially requested to join IPMC).
>
> <snip>
>
>
>> === Inexperience with Open Source ===
>>
>> The initial committers have various degrees of experience with open source
>> projects - from very new to experienced. This project was open source
>> within government from the beginning. We do not expect to have
>> difficulties
>> in operating under Apache's development process.
>>
>
> This statement struck me as a little odd. While I understand that you
> tried to operate as an open source project, it seems impossible by
> definition to be open source. My biggest concern would just be that you are
> aware that you will have difficulties transitioning to a real open codebase
> and growing a community in the open.


We are aware that it will be different and more difficult functioning in a
real open source environment, but we are enthusiastic and committed to
making it work. With the help of our mentors, we hope to be successful.


>> == Initial Source ==
>>
>> The code is currently available in a private GitHub repository.
>> https://github.com/LAS-NCSU/rya
>
> What's the plan to make this a non-private repo?

We are working on it right now. Initially we thought that the code would be
made publicly available by bringing it to the ASF, but we are working to make
the current repository public.


Thank you,
Adina


--
Dr. Adina Crainiceanu
http://www.usna.edu/Users/cs/adina/

Re: [DISCUSS] Rya Incubator Proposal

Edward J. Yoon-2
Hello, I would like to help this project as a mentor, if that is OK with
you. I have researched this topic for a long time.

On Fri, Sep 4, 2015 at 6:08 AM, Adina Crainiceanu <[hidden email]> wrote:

> Josh,
>
> Thank you for volunteering to be our mentor.
>
> <snip>



--
Best Regards, Edward J. Yoon

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [DISCUSS] Rya Incubator Proposal

Phillip Rhodes-8
In reply to this post by Adina Crainiceanu
I just want to say, I'm very excited about this project, and am happy
to contribute any way I can.  I've been thinking a lot lately
about how to build a scalable RDF system using something
like Spark, so this definitely intrigues me.


Phil


This message optimized for indexing by NSA PRISM


Re: [DISCUSS] Rya Incubator Proposal

Adina Crainiceanu
In reply to this post by Edward J. Yoon-2
Edward,

Sure, we would be happy to have you as a mentor.

Adina



--
Dr. Adina Crainiceanu
http://www.usna.edu/Users/cs/adina/

Re: [DISCUSS] Rya Incubator Proposal

Rob Vesse
In reply to this post by Adina Crainiceanu
Interesting proposal

As someone already involved with other open source RDF projects both
inside and outside Apache it would be nice to add to the Relationships
with Other Apache Products a bit about what (if anything) you expect the
relationship to other RDF related Apache projects (Jena, Clerezza,
Stanbol, Marmotta, Commons RDF (incubating)) to be?

Apache is about community over code and so never mandates any particular
technical choices (that is always up to the individual communities) but it
would be useful to understand if you see any overlap with existing
projects or any collaboration opportunities.  The latter are particularly
interesting because one way you can help grow a new community is by
attracting interested users in pre-existing communities who want to work
on the specific problems you are aiming to tackle where their existing
options don't address the problems while your approach does.

It would also be nice to see some discussion in that section about things
like versioning of your major dependencies.  In particular you build on
Accumulo so do you require specific version(s) thereof (since they appear
to maintain 3 release lines currently) or simply require a version with a
specific subset of Accumulo functionality?  How (if at all) does this
translate into risks in terms of adoption, community traction etc e.g.
what happens if you rely on version X and the Accumulo community abandons
that in favour of version Y or if you rely on a specific experimental
feature that never makes it into Accumulo releases?

Also it would be nice if the external dependencies section properly linked
to relevant web pages as right now it has several dead links and in some
cases outdated naming.  For example by Open RDF I assume you mean OpenRDF
Sesame which now lives at rdf4j.org (though I'll admit to not
understanding what I'm supposed to call it anymore either!)

Similarly the documentation section mentions papers but doesn't provide
links, while both can be found online easily enough it would be nice to
add the links in.

Regards,

Rob



Re: [DISCUSS] Rya Incubator Proposal

Adina Crainiceanu
Rob,

Thank you very much for your comments.

>
> As someone already involved with other open source RDF projects both
> inside and outside Apache it would be nice to add to the Relationships
> with Other Apache Products a bit about what (if anything) you expect the
> relationship to other RDF related Apache projects (Jena, Clerezza,
> Stanbol, Marmotta, Commons RDF (incubating)) to be?
>
>
The Jena API or the Commons RDF API could become the RDF API used by Rya,
but no such decision has been made yet. Clerezza is database/triple store
agnostic, and as such could be complementary to Rya. Stanbol focuses on
providing semantic services, while Rya focuses on providing a distributed
triple store solution, with support for SPARQL and OWL reasoning. Marmotta
provides an implementation of a Linked Data Platform, and overlaps with Rya
in some goals and functionality (RDF triple store and SPARQL support, among
others). There are many opportunities for collaboration with these projects
and we are looking forward to such collaboration.

> Apache is about community over code and so never mandates any particular
> technical choices (that is always up to the individual communities) but it
> would be useful to understand if you see any overlap with existing
> projects or any collaboration opportunities.  The latter are particularly
> interesting because one way you can help grow a new community is by
> attracting interested users in pre-existing communities who want to work
> on the specific problems you are aiming to tackle where their existing
> options don't address the problems while your approach does.
>
There are indeed many opportunities for collaboration with the other
projects.


> It would also be nice to see some discussion in that section about things
> like versioning of your major dependencies.  In particular you build on
> Accumulo so do you require specific version(s) thereof (since they appear
> to maintain 3 release lines currently) or simply require a version with a
> specific subset of Accumulo functionality?  How (if at all) does this
> translate into risks in terms of adoption, community traction etc e.g.
> what happens if you rely on version X and the Accumulo community abandons
> that in favour of version Y or if you rely on a specific experimental
> feature that never makes it into Accumulo releases?
>
>
Rya is built on top of Accumulo, and uses features standard in all current
versions of Accumulo. We are not relying on any experimental feature. As
the Rya community evolves, we expect Rya to change to take advantage of
new/improved features in Accumulo or the other dependencies, if those lead
to an improvement in Rya.



> Also it would be nice if the external dependencies section properly linked
> to relevant web pages as right now it has several dead links and in some
> cases outdated naming.  For example by Open RDF I assume you mean OpenRDF
> Sesame which now lives at rdf4j.org (though I'll admit to not
> understanding what I'm supposed to call it anymore either!)
>
We have fixed the links now.


> Similarly the documentation section mentions papers but doesn't provide
> links, while both can be found online easily enough it would be nice to
> add the links in
>
We have added the links.


Thank you very much,
Adina






>
> >We would like to start a discussion on accepting Rya, a scalable RDF data
> >management system built on top of Accumulo. into Apache Incubator.
> >
> >The proposal is available online at
> >https://wiki.apache.org/incubator/RyaProposal and also at the end of this
> >email.
> >
> >We are looking for additional mentors to help us with the project. Any
> >advice and help will be appreciated.
> >
> >Thank you very much,
> >Adina
> >
> >
> >
> >= Rya Proposal =
> >
> >== Abstract ==
> >
> >Rya (pronounced "ree-uh" /rēə/) is a cloud-based RDF triple store that
> >supports SPARQL queries.
> >
> >== Proposal ==
> >
> >Rya is a scalable RDF data management system built on top of Accumulo. Rya
> >uses novel storage methods, indexing schemes, and query processing
> >techniques that scale to billions of triples across multiple nodes. Rya
> >provides fast and easy access to the data through SPARQL, a conventional
> >query mechanism for RDF data.
> >
> >== Background ==
> >
> >RDF is a World Wide Web Consortium (W3C) standard used in describing
> >resources on the Web. The smallest data unit is a triple consisting of
> >subject, predicate, and object. Using this framework, it is very easy to
> >describe any resource, not just Web related. For example, if you want to
> >say that Alice is a professor, you can represent this as an RDF triple
> >like
> >(Alice, rdf:type, Professor). In general, RDF is an open world framework
> >that allows anyone to make any statement about any resource, which makes
> >it
> > a popular choice for expressing a large variety of data.
> >
> >RDF is used in conjunction with the Web Ontology Language (OWL). OWL is a
> >framework for describing models or ontologies for RDF. It defines
> >concepts,
> >relationships, and/or structure of RDF documents. These models can be used
> >to 'reason/infer' information about entities within a given domain. For
> >example, you can express that a Professor is a sub class of Faculty,
> >(Professor, rdfs:subClassOf, Faculty) and knowing that (Alice, rdf:type,
> >Professor), it can be inferred that (Alice, rdf:type, Faculty).
> >
> >SPARQL is an RDF query language. Similar with SQL, SPARQL has SELECT and
> >WHERE clauses; however, it is based on querying and retrieving RDF
> >triples.
> >
> >Work on Rya, a large scale distributed system for  storing and querying
> >RDF
> >data, started in 2010.
> >
> >== Rationale ==
> >
> >With the increase in data size, there is a need for scalable systems for
> >storing and retrieving RDF data in a cluster of nodes. We believe that Rya
> >can fulfil that role. We expect that communities within government, health
> >care, finance, and others who generate large amounts of RDF data will be
> >most interested in this project.
> >
> >From its inception, the project operated with an Apache-style license, but
> >it was open to mostly US government-related projects only. We believe that
> >having the project and the development open for all will benefit both the
> >project and the interested communities.
> >
> >== Current Status ==
> >
> >The project source code and documentation are currently hosted in a
> >private
> >repository on Github. New users are added to the repository upon request.
> >
> >=== Meritocracy ===
> >
> >Meritocracy is the model that we currently follow, and we want to build a
> >larger and more diverse developer community by becoming an Apache project.
> >
> >=== Community ===
> >
> >Rya has being building a community of users and developers for the past 3
> >years. There is currently an active workgroup with monthly meetings and
> >the
> >number of participants in the meeting is increasing.
> >
> >=== Core Developers ===
> >
> >The core developers are a diverse group of people who are either
> >government
> >employees or former / current government contractors from different
> >companies.
> >
> >=== Alignment ===
> >
> >Rya is built on top of Accumulo, an Apache project.
> >
> >== Known Risks ==
> >
> >=== Orphaned Products ===
> >
> >There is a very small risk of becoming orphaned. The current contributors
> >are strongly committed to the project, there is a large enough number of
> >developers interested in contributing to the project, and we believe that
> >the support for the project will continue to grow from the interested
> >communities.
> >
> >=== Inexperience with Open Source ===
> >
> >The initial committers have various degrees of experience with open source
> >projects - from very new to experienced. This project was open source
> >within government from the beginning. We do not expect to have
> >difficulties
> >in operating under Apache's development process.
> >
> >=== Homogenous Developers ===
> >
> >The current list of developers form a heterogeneous group, with people for
> >academia, government, and industry, collaborating from distributed
> >geographic locations. We aim to expand the list of contributors with the
> >help of the Apache incubation process.
> >
> >=== Reliance on Salaried Developers ===
> >
> >Many but not all of the developers working on the project are salaried
> >employees, paid to work on this project. They will continue to contribute
> >to the open source project. Some of the initial committers continued as
> >volunteers even if no longer employed to work on this project and they
> >plan
> >to continue supporting the project.
> >
> >=== Relationships with Other Apache Products ===
> >
> >Rya uses Apache Accumulo, Hadoop, Zookeeper, Maven.
> >
> >=== Apache Brand ===
> >
> >Rya has generated interest in the government. It also generated interest
> >within academia and industry. We believe that everyone could benefit from
> >having Rya as an open source project. Due to its strong ties to Accumulo,
> >an Apache project, and due to the values of the Apache Foundation, we
> >believe that Apache incubator is the right place for Rya.
> >
> >== Documentation ==
> >
> >Two peer-reviewed publications [1,2] about Rya were published in 2012 and
> >2015. More documentation is available in the code.
> >
> >[1] Roshan Punnoose, Adina Crainiceanu, David Rapp. Rya: A Scalable RDF
> >Triple Store for the Clouds. Proceedings of the 1st International Workshop
> >on Cloud Intelligence, Pages 4:1-4:8, August 2012
> >
> >[2] Roshan Punnoose, Adina Crainiceanu, David Rapp. SPARQL in the Clouds
> >Using Rya. Information Systems, Volume 48, Pages 181-195, March 2015
> >(Available online 23 July 2013)
> >
> >== Initial Source ==
> >
> >The code is currently available in a private Github repository.
> >https://github.com/LAS-NCSU/rya
> >
> >== Source and Intellectual Property Submission Plan ==
> >
> >The source code has been released under the Apache License, Version 2.0.
> >The software grant and CCLAs have been submitted. ICLAs for initial
> >committers have been submitted or are in progress.
> >
> >== External Dependencies ==
> >
> > * Open RDF (BSD license)
> > * GeoMesa (Apache License, Version 2.0)
> > * Accumulo (Apache License, Version 2.0)
> > * Hadoop (Apache License, Version 2.0)
> > * TinkerPop (Apache License, Version 2.0)
> > * IndexingSail (Apache License, Version 2.0)
> >
> >== Cryptography ==
> >
> >The proposal does not involve any cryptographic code.
> >
> >== Required Resources ==
> >
> >=== Mailing lists ===
> >
> > * [hidden email]
> > * [hidden email]
> > * [hidden email]
> >
> >=== Git Repository ===
> >
> >https://git-wip-us.apache.org/repos/asf/incubator-rya.git
> >
> >=== Issue Tracking ===
> >
> >JIRA Rya
> >
> >== Initial Committers ==
> >
> > * Roshan Punnoose, roshanp at gmail dot com
> > * David Rapp, dnrapp at ncsu dot edu
> > * Adina Crainiceanu, adinancr at gmail dot com
> > * Aaron Mihalik, aaron.mihalik at gmail dot com
> > * Puja Valiyil, pujav65 at gmail dot com
> > * Jennifer Brown, jennifer.brown at parsons dot com
> > * Steve Wagner, steve.r.wagner at gmail dot com
> >
> >== Affiliations ==
> >
> > * Roshan Punnoose, Enlighten IT Consulting
> > * David Rapp, North Carolina State University
> > * Adina Crainiceanu, US Naval Academy
> > * Aaron Mihalik, Parsons
> > * Puja Valiyil, Parsons
> > * Jennifer Brown, Parsons
> > * Steve Wagner, Enlighten IT Consulting
> >
> >== Sponsors ==
> >
> >=== Champion ===
> >
> >Adam Fuchs, ASF Member, afuchs at apache dot org
> >
> >=== Nominated Mentors ===
> >
> >Josh Elser josh dot elser at gmail dot com
> >
> >We are seeking additional mentors.
> >
> >=== Sponsoring Entity ===
> >
> >Apache Incubator


--
Dr. Adina Crainiceanu
http://www.usna.edu/Users/cs/adina/

Re: [DISCUSS] Rya Incubator Proposal

Adam Fuchs
Folks,

Thanks for the great conversation around bringing Rya into the incubator.
If there are no other questions, and if people are happy with the answers
so far, I will announce the vote later today.

Cheers,
Adam


On Tue, Sep 8, 2015 at 9:06 PM, Adina Crainiceanu <[hidden email]> wrote:

> Rob,
>
> Thank you very much for your comments.
>
> >
> > As someone already involved with other open source RDF projects both
> > inside and outside Apache it would be nice to add to the Relationships
> > with Other Apache Products a bit about what (if anything) you expect the
> > relationship to other RDF related Apache projects (Jena, Clerezza,
> > Stanbol, Marmotta, Commons RDF (incubating)) to be?
> >
> >
> Jena API or Commons RDF API could become the RDF API used by Rya, but no
> such decision has been made. Clerezza is database/triple store agnostic, and as
> such could be complementary to Rya. Stanbol focuses on providing semantic
> services, while Rya focuses on providing a distributed triple store
> solution, with support for SPARQL and OWL reasoning. Marmotta provides an
> implementation of a Linked Data Platform, and overlaps in some of the goals
> and functionality with Rya (RDF triple store, SPARQL support among others).
> There are many opportunities for collaboration with these projects and we
> are looking forward to such a collaboration.
>
> > Apache is about community over code and so never mandates any particular
> > technical choices (that is always up to the individual communities) but it
> > would be useful to understand if you see any overlap with existing
> > projects or any collaboration opportunities.  The latter are particularly
> > interesting because one way you can help grow a new community is by
> > attracting interested users in pre-existing communities who want to work
> > on the specific problems you are aiming to tackle where their existing
> > options don't address the problems while your approach does.
> >
> There are indeed many opportunities for collaboration with the other
> projects.
>
>
> > It would also be nice to see some discussion in that section about things
> > like versioning of your major dependencies.  In particular you build on
> > Accumulo so do you require specific version(s) thereof (since they appear
> > to maintain 3 release lines currently) or simply require a version with a
> > specific subset of Accumulo functionality?  How (if at all) does this
> > translate into risks in terms of adoption, community traction etc e.g.
> > what happens if you rely on version X and the Accumulo community abandons
> > that in favour of version Y or if you rely on a specific experimental
> > feature that never makes it into Accumulo releases?
> >
> >
> Rya is built on top of Accumulo, and uses features standard in all current
> versions of Accumulo. We are not relying on any experimental feature. As
> the Rya community evolves, we expect Rya to change to take advantage of
> new/improved features in Accumulo or the other dependencies, if those lead
> to an improvement in Rya.
>
>
>
> > Also it would be nice if the external dependencies section properly linked
> > to relevant web pages as right now it has several dead links and in some
> > cases outdated naming.  For example by Open RDF I assume you mean OpenRDF
> > Sesame which now lives at rdf4j.org (though I'll admit to not
> > understanding what I'm supposed to call it anymore either!)
> >
> We fixed the links now.
>
>
> > Similarly the documentation section mentions papers but doesn't provide
> > links, while both can be found online easily enough it would be nice to
> > add the links in
> >
> We added the links.
>
>
> Thank you very much,
> Adina
>
>
> --
> Dr. Adina Crainiceanu
> http://www.usna.edu/Users/cs/adina/
>