Hello World / CRUNCH Framework

8 messages

Julian Feinauer
Hi all,

I just joined the Incubator mailing list and wanted to introduce myself and start a discussion about a software project we have been developing.
But first things first. My name is Julian Feinauer, and I come from Germany, where I run two “start-up” companies that work a lot on “industrial IoT” topics, data science and processing of “larger amounts of data”. We love open source, and so we love the ASF. Most notably, I closely follow the Apache Calcite project and hope to find some time soon to contribute a bit more than in the last months. Furthermore, I am engaged in the (incubating) PLC4X project as a (P)PMC member and in the (incubating) Edgent project, where I am trying to “revive” the community as a new (P)PMC member together with Christofer Dutz.

Now to the real topic. Over the last three years I have been developing a “framework/library” (currently a set of jars) to facilitate processing of time series data. The focus is mostly on processing data from test stands, e.g., automotive tests, driving profiles and so on. Furthermore, in the past year we added a lot of functionality for processing “industrial data”. This means that we want to make it easy to answer questions like “how long did the machine spend in this state?”, “when is the following set of bits set?” or “notify me when the following condition is true for the first time”.
It is a bit technical and I don’t want to go too deep into it, but generally speaking we try to introduce the “right” semantics to answer the typical questions that come up when analyzing machine or test data. This project is called “CRUNCH”, and we are in the process of making it open source under the Apache 2.0 License (it will be moved to a public GitHub repo this year).
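The three kinds of questions above can be illustrated with a minimal Python sketch. It is not CRUNCH's API: the sample format (timestamp, integer value), the step-interpolation assumption (a value holds until the next sample arrives) and the function names `time_in_state`, `bits_set` and `first_true` are all hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical sample type: (timestamp in seconds, raw integer value).
Sample = Tuple[float, int]


def time_in_state(samples: List[Sample], state: int, end: float) -> float:
    """Seconds the signal spent in `state`. Each value is assumed to hold
    until the next sample (step interpolation); the last one until `end`."""
    total = 0.0
    for i, (t, value) in enumerate(samples):
        t_next = samples[i + 1][0] if i + 1 < len(samples) else end
        if value == state:
            total += t_next - t
    return total


def bits_set(value: int, mask: int) -> bool:
    """True if every bit of `mask` is set in `value`."""
    return (value & mask) == mask


def first_true(samples: Iterable[Sample],
               condition: Callable[[int], bool]) -> Optional[float]:
    """Timestamp of the first sample for which `condition` holds, else None."""
    for t, value in samples:
        if condition(value):
            return t
    return None


signal = [(0.0, 0b0001), (2.0, 0b0011), (5.0, 0b0001), (6.0, 0b0111)]
print(time_in_state(signal, 0b0001, end=10.0))           # 3.0
print(first_true(signal, lambda v: bits_set(v, 0b0110)))  # 6.0
```

In a streaming setting the same logic would run incrementally per sample instead of over a finished list; the batch form just keeps the semantics easy to see.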

As we see a close relationship to other (incubating or top-level) Apache projects, we are wondering whether this project could fit into the Incubator. Examples of Apache projects that we see as “related” are Apache Flink (which we can use as the streaming engine to process the stream) and the (incubating) Edgent project, which we can also support as a streaming engine and for which we are currently trying to find a suitable project goal and community, as some of its (P)PMC members have retired or gone inactive. Finally, CRUNCH is a very natural fit with PLC4X because it can directly process the data gathered from PLCs (in fact, we are already using it that way in some of our projects). I had several discussions with some of the PLC4X (P)PMC members, namely Sebastian Rühl and Christofer Dutz, who encouraged me to introduce the project to the Incubator because they also see potential for it to enrich the OSS ecosystem with regard to edge/stream processing of (I)IoT data.

So please feel free to ask questions or share your views on this topic, as I would like to find out whether this project could fit into the Apache ecosystem and the Incubator.

Thanks in advance!
Julian

Re: Hello World / CRUNCH Framework

Christofer Dutz
Hi Julian,

To me, CRUNCH never felt directly comparable to the other "streaming engines", because I see it as a bundle of a streaming engine and a higher-level framework for typical industrial operations on top of it.

I think the higher-level library should be able to run on top of any of the other streaming engines we have. Does such a split make sense, or did I get something wrong? Perhaps it would make sense to evaluate Edgent's stream processing and possibly merge in improvements. I don't see a need for multiple edge stream-processing frameworks, especially since we have to revive the existing one.

I think an engine for higher-level functions on top of a streaming engine of choice would be a great addition, because adding such logic to only one of the existing engines would be a waste.
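The split described above could look roughly like the following sketch, assuming a hypothetical minimal `Engine` contract; none of these class or method names come from CRUNCH, Edgent or Flink. The higher-level operation `over_threshold` is written once, and each backend only implements a handful of primitives.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Iterable, List


class Engine(ABC):
    """Hypothetical minimal contract the higher-level library needs
    from an underlying streaming engine."""

    @abstractmethod
    def source(self, items: Iterable[Any]) -> Any: ...

    @abstractmethod
    def map(self, stream: Any, fn: Callable[[Any], Any]) -> Any: ...

    @abstractmethod
    def filter(self, stream: Any, pred: Callable[[Any], bool]) -> Any: ...

    @abstractmethod
    def collect(self, stream: Any) -> List[Any]: ...


class EagerEngine(Engine):
    """Materializes every intermediate result as a list (batch-like)."""
    def source(self, items): return list(items)
    def map(self, stream, fn): return [fn(x) for x in stream]
    def filter(self, stream, pred): return [x for x in stream if pred(x)]
    def collect(self, stream): return list(stream)


class LazyEngine(Engine):
    """Chains generators so elements flow through one at a time."""
    def source(self, items): return iter(items)
    def map(self, stream, fn): return (fn(x) for x in stream)
    def filter(self, stream, pred): return (x for x in stream if pred(x))
    def collect(self, stream): return list(stream)


def over_threshold(engine: Engine, readings: Iterable[float],
                   threshold: float) -> List[float]:
    """Higher-level operation written once against the Engine contract:
    keep readings above `threshold` and report by how much they exceed it."""
    stream = engine.source(readings)
    stream = engine.filter(stream, lambda r: r > threshold)
    stream = engine.map(stream, lambda r: r - threshold)
    return engine.collect(stream)


readings = [1.0, 5.5, 3.0, 7.25]
for engine in (EagerEngine(), LazyEngine()):
    print(over_threshold(engine, readings, 3.0))  # [2.5, 4.25] for both
```

Keeping the contract this small is what makes additional backends plausible; windows, joins and state would have to be added to it, which is likely where most of the real integration effort lies.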

Chris

________________________________
From: Julian Feinauer <[hidden email]>
Sent: Friday, December 14, 2018 11:11:40 AM
To: [hidden email]
Subject: Hello World / CRUNCH Framework

Re: Hello World / CRUNCH Framework

Julian Feinauer
Hi Chris,

yes, you got it right.
We do not care about "how does this message get from this processing node to that".
We "transpile" the higher-level input into a DAG which can then run on basically every streaming engine (I agree, we do NOT need yet another one); in that sense it is a bit like Apache Beam.
Thus, I do not see it as a competitor to Edgent but rather as a complement: Edgent's focus is on the engine and cloud communication, while CRUNCH's focus is on "what exactly does the pipeline do".
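The "transpile into a DAG" idea can be sketched as follows (Python 3.9+ for `graphlib`). Assumptions: the logical node format, the operator kinds and the names `transpile` and `LIST_RULES` are hypothetical; a real backend would map each operator kind onto the target engine's operators rather than onto Python lists, where translating a node amounts to evaluating it directly.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical logical node: (operator kind, operator argument, input node ids).
Node = Tuple[str, Any, List[str]]


def transpile(dag: Dict[str, Node],
              rules: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Visit the logical DAG in dependency order and build every node with
    the target engine's rule for its operator kind."""
    deps = {name: set(inputs) for name, (_, _, inputs) in dag.items()}
    built: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        kind, arg, inputs = dag[name]
        built[name] = rules[kind](arg, *(built[i] for i in inputs))
    return built


# One possible "backend": plain Python lists. Another engine would supply
# its own rules for the same operator kinds.
LIST_RULES: Dict[str, Callable[..., Any]] = {
    "source": lambda data: list(data),
    "union": lambda _, left, right: left + right,
    "filter": lambda pred, xs: [x for x in xs if pred(x)],
    "map": lambda fn, xs: [fn(x) for x in xs],
}

dag: Dict[str, Node] = {
    "line1": ("source", [20.5, 80.0], []),
    "line2": ("source", [95.5, 60.0], []),
    "all": ("union", None, ["line1", "line2"]),
    "hot": ("filter", lambda t: t > 75.0, ["all"]),
    "alarm": ("map", lambda t: f"ALARM {t}", ["hot"]),
}
print(transpile(dag, LIST_RULES)["alarm"])  # ['ALARM 80.0', 'ALARM 95.5']
```

The pipeline description (`dag`) says only "what the pipeline does"; everything engine-specific lives in the rule table.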

Julian

On 14.12.18, 11:54, "Christofer Dutz" <[hidden email]> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Hello World / CRUNCH Framework

Brian Devins
Hi Julian,

This seems like a cool project. I'd like to point out one thing about your
name: there is already a project called Apache Crunch: https://crunch.apache.org

On a couple of collaborative notes: as an (incubating) Zipkin developer, I could
see some use for your project in our community. We've had trouble building
reusable analytics components, so if your framework helps facilitate
that, we might be able to get community buy-in for adopting it.

My other collaboration point is that it would be cool if it could be used with
the newly incubating IoTDB once all of its source is available.

- Brian

On Fri, Dec 14, 2018 at 5:59 AM Julian Feinauer <
[hidden email]> wrote:

Re: Hello World / CRUNCH Framework

Julian Feinauer
Hi Brian,

thanks for your email.
Regarding your hint, we know about the "real" Apache Crunch project and are willing to change our project's name if we enter the Incubator (sadly, we are massively lacking creativity...).
 
Regarding Zipkin, I could imagine that some of the things we are doing fit your use cases well, but I have to admit that I never went too deep into Zipkin because what you are doing there looks too crazy.
And in fact, this is what we are mostly looking for,

Regarding IoTDB... I was very excited when I heard of the project (and also looked through the codebase in the "old" GitHub repo), and I really like it. We use Parquet in some situations, but IoTDB is definitely better suited for the specific workloads we have in mind. Thus, I am really looking forward to the project getting started, from a PLC4X, Edgent and CRUNCH dev perspective.

Julian

PS: Do you have some sample questions or analytics in mind for Zipkin, so I can get a feeling for it?

On 14.12.18, 16:30, "Brian Devins-Suresh" <[hidden email]> wrote:

Re: Hello World / CRUNCH Framework

Julian Feinauer
In reply to this post by Brian Devins
Hi Brian,

thanks for your email!
Regarding your hint, we know about the "real" Apache Crunch project and are willing to change our project's name if we enter the Incubator (sadly, we are massively lacking creativity...).

Regarding Zipkin, I could imagine that some of the things we are doing fit your use cases well, but I have to admit that I never went too deep into Zipkin because what you are doing there looks too crazy.
And in fact, this is exactly what we are looking for most: allies and people who bring in different perspectives. We already learned so much while discussing it with some PLC4X folks.

Regarding IoTDB... I was very excited when I heard of the project (and also looked through the codebase in the "old" GitHub repo), and I really like it. We use Parquet in some situations, but IoTDB is definitely better suited for the specific workloads we have in mind. Thus, I am really looking forward to the project getting started, from a PLC4X, Edgent and CRUNCH dev perspective.

Julian

On 14.12.18, 16:30, "Brian Devins-Suresh" <[hidden email]> wrote:

    Hi Julian,
   
    This seems like a cool project. I'd like to point out one thing with your
    name, there is already an Apache Crunch project: https://crunch.apache.org
   
    On a couple collaborative notes, as a (incubating) Zipkin developer I could
    see some use for your project in our community. We've had trouble building
    reusable analytics components so if your framework would help facilitate
    that then we might be able to get community buy-in for adopting.
   
    My other collaboration point is it would be cool if it could be used with
    the newly incubating IoTDB once that has all of its source available.
   
    - Brian
   
    On Fri, Dec 14, 2018 at 5:59 AM Julian Feinauer <
    [hidden email]> wrote:
   
    > Hi Chris,
    >
    > yes, you got it right.
    > We do not care about "how does this message get from this processing node
    > to that".
    > We "transpile" the higher level input into a DAG which can then run on
    > basically every streaming engine (I agree, we do NOT need yet another one),
    > in that sense it is a bit like Apache Beam.
    > Thus, I do not see it as a contender to Edgent but more as a
    > complementary, because edgents focus is more the engine and Cloud
    > Communication and CRUNCHs focus is more of "what exactly does the pipeline
    > do".
    >
    > Julian
    >
    > Am 14.12.18, 11:54 schrieb "Christofer Dutz" <[hidden email]>:
    >
    >     Hi Julian,
    >
    >     For me it always felt like crunch can't directly be compared to the
    > other "streaming engines" as I see it as a bundle of a streaming engine and
    > a higher level framework for doing typical industry operations on top of
    > that.
    >
    >     I think the higher level library should actually be able to run on top
    > of any of the other streaming engines we have. Does such a split make
    > sense, or did I get something wrong? Perhaps it would make sense to
    > evaluate Edgents stream processing and eventually merge in improvements. I
    > don't see a need multiple edge stream frameworks especially if we have to
    > revive the existing one.
    >
    >     I think an engine for higher level functions on top of a streaming
    > engine of choice would be a great addition, because adding such logic to
    > only one of the existing seems to be a waste.
    >
    >     Chris
    >
    >     Outlook for Android<https://aka.ms/ghei36> herunterladen
    >
    >     ________________________________
    >     From: Julian Feinauer <[hidden email]>
    >     Sent: Friday, December 14, 2018 11:11:40 AM
    >     To: [hidden email]
    >     Subject: Hello World / CRUNCH Framework
    >
    >     Hi all,
    >
    >     I just joined the incubator ML and wanted to present myself and
    > possibly also start a discussion about a software project we developed in
    > the past.
    >     But first things first. My name is Julian Feinauer and I come from
    > Germany where I run two “start-up” companies where we work a lot on the
    > “industrial IoT” topics, data science and processing of “larger amounts of
    > data”. We love open source and so we love the ASF. Most notably, I closely
    > follow the Apache Calcite project and hopefully find some time soon to
    > contribute a bit more than in the last monts. Futhermore, I am engaged in
    > the (incubating) PLC4X project as (P)PMC and in the  (incubating) Edgent
    > project where I try to “revive” the community as new (P)PMC together with
    > Christopher Dutz.
    >
    >     Now to the real topic. Over the last 3 years I started to develop a
    > “Framework/Library” (currently a set of jars) to facilitate processing of
    > timeseries data. The focus is mostly on processing of data from test
    > stands, e.g., automotive tests, driving profiles and so on. Furthermore, in
    > the recent year we added a lot of functionality for processing of
    > “industrial data”. This means that we want to make it easy to analyze
    > things like “how long did the machine spend in this state”, “when are the
    > following set of bits set” or “nofity when the following conditions is true
    > for the first time”.
    >     It is a bit technical and I don’t want to go too deep into it, but
    > generally speaking we try to introduce the “right” semantics to answer the
    > typical questions when analyzing machine or test data. This project is
    > called “CRUNCH” and we are in the process of making it open source (will be
    > moved to a public github repo in this year) under the Apache 2.0 License.
    >
    >     As there can be seen a close relationship to other (incubating or TLP)
    > projects we are thinking about if this project could fit into the
    > incubator. Some examples for Apache projects that we see as “related” are
    > Apache Flink (which we can use as the Streaming Engine to process the
    > stream), (incubating) Edgent which we also can support as Streaming Engine
    > and where we try to find a suitable project goal and community currently as
    > some of the (P)PMC members retired or went inactive. Finally, CRUNCH has a
    > very natural fit with PLC4X because it can directly process the data
    > gathered form PLCs (and in fact we are already using it in some of our
    > projects that way). I had several discussions with some of the (P)PMCs of
    > PLC4X, namely Sebastian Rühl and Christpher Dutz wo encouraged me to
    > introduce the project to the incubator because they also see some potential
    > for the project to enrich the OSS ecosystem with regards to edge / stream
    > processing of (I)IoT data.
    >
    >     So please feel free to ask questions or discuss your view on this
    > topic as I would like to find out if this project could fit in the Apache
    > Ecosystem and the Incubator or not.
    >
    >     Thank you already!
    >     Julian
    >
    >
    >
   


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Hello World / CRUNCH Framework

Julian Hyde
In reply to this post by Julian Feinauer
Hi Julian,

Regarding whether to do this as a streaming engine (with its own query language) or as a framework above a streaming engine, I’d say that’s a false choice. If there is relational algebra inside your system, you can provide a high-level query language that can be translated to a lower-level query language in a streaming engine.

This approach of “layered” databases has worked well for me for several projects, and is ever more applicable these days as data is becoming federated.
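A toy example can illustrate the “layered” idea: a small relational expression (scan, filter, project) that renders itself as SQL for an underlying engine. This is a hypothetical Java sketch. The `Rel`, `Scan`, `Filter` and `Project` classes are far simpler than Calcite's own relational algebra (`RelNode`), and they only show how a higher layer can emit a lower-level query language. The table and column names are made up.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical three-operator relational algebra that renders itself as SQL. */
final class LayeredSql {

    /** A relational expression that can be translated to SQL text. */
    interface Rel {
        String toSql();
    }

    /** Reads every row of a named table (or stream). */
    static final class Scan implements Rel {
        final String table;

        Scan(String table) { this.table = table; }

        public String toSql() { return "SELECT * FROM " + table; }
    }

    /** Keeps rows satisfying a condition, given here directly as SQL text. */
    static final class Filter implements Rel {
        final Rel input;
        final String condition;

        Filter(Rel input, String condition) {
            this.input = input;
            this.condition = condition;
        }

        public String toSql() {
            return "SELECT * FROM (" + input.toSql() + ") AS t WHERE " + condition;
        }
    }

    /** Keeps only the listed columns. */
    static final class Project implements Rel {
        final Rel input;
        final List<String> columns;

        Project(Rel input, List<String> columns) {
            this.input = input;
            this.columns = columns;
        }

        public String toSql() {
            return "SELECT " + String.join(", ", columns) + " FROM (" + input.toSql() + ") AS t";
        }
    }

    public static void main(String[] args) {
        Rel plan = new Project(
                new Filter(new Scan("machine_events"), "state = 'RUNNING'"),
                Arrays.asList("ts", "temperature"));
        System.out.println(plan.toSql());
    }
}
```

The plan in `main` prints `SELECT ts, temperature FROM (SELECT * FROM (SELECT * FROM machine_events) AS t WHERE state = 'RUNNING') AS t`. A production translator would typically also flatten the nested subqueries.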

You and I have discussed SQL’s MATCH_RECOGNIZE clause as a way to build complex time-based logic. You have probably noticed that it is now in Flink, I am working on it in Calcite, and Beam will probably get it at some point. Even if MATCH_RECOGNIZE doesn’t solve your problem, let’s follow the same approach: convert your problem to a DSL that maps to or extends relational algebra, and then figure out how to translate that to SQL in an underlying engine. Calcite is a very good platform for building new “data languages”, so let’s carry on talking.
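For readers unfamiliar with `MATCH_RECOGNIZE`, the clause classifies ordered rows with `DEFINE` conditions and then matches a regular-expression `PATTERN` over the sequence of classified rows. The Java sketch below imitates that idea in a deliberately simplified way, using `java.util.regex` on a string of per-row symbols. In this sketch each row gets exactly one symbol, and conditions only compare a row with the previous one. The example pattern, a rise followed by a fall, is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified imitation of SQL MATCH_RECOGNIZE: each row is mapped to one symbol
 * (U = up, D = down, S = same, relative to the previous row) and a regex over
 * the symbol string plays the role of PATTERN.
 */
final class MiniMatchRecognize {

    /** Plays the role of DEFINE: classifies row i by comparing it with row i - 1. */
    static char classify(double[] values, int i) {
        if (i == 0 || values[i] == values[i - 1]) {
            return 'S';
        }
        return values[i] > values[i - 1] ? 'U' : 'D';
    }

    /** Returns {startRow, endRow} (inclusive) for every non-overlapping match. */
    static List<int[]> match(double[] values, String pattern) {
        StringBuilder symbols = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            symbols.append(classify(values, i));
        }
        List<int[]> matches = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(symbols);
        while (m.find()) {
            matches.add(new int[] {m.start(), m.end() - 1});
        }
        return matches;
    }

    public static void main(String[] args) {
        double[] temperature = {20, 21, 23, 22, 20, 20, 21, 19};
        // Roughly PATTERN (UP+ DOWN+): one or more rises followed by one or more falls.
        for (int[] r : match(temperature, "U+D+")) {
            System.out.println("peak from row " + r[0] + " to row " + r[1]);
        }
    }
}
```

For the sample data the symbol string is `SUUDDSUD`, so the sketch reports rows 1 to 4 and rows 6 to 7. Real `MATCH_RECOGNIZE` is more expressive. For example, `DEFINE` conditions can use navigation functions such as `PREV` and aggregates, and a row may satisfy several pattern variables at once.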

Julian


> On Dec 14, 2018, at 2:11 AM, Julian Feinauer <[hidden email]> wrote:
>
> Hi all,
>
> I just joined the incubator ML and wanted to present myself and possibly also start a discussion about a software project we developed in the past.
> But first things first. My name is Julian Feinauer and I come from Germany, where I run two “start-up” companies that work a lot on “industrial IoT” topics, data science, and processing of “larger amounts of data”. We love open source and so we love the ASF. Most notably, I closely follow the Apache Calcite project and hope to find some time soon to contribute a bit more than in the last months. Furthermore, I am engaged in the (incubating) PLC4X project as a (P)PMC member and in the (incubating) Edgent project, where I am trying to “revive” the community as a new (P)PMC member together with Christopher Dutz.
>
> Now to the real topic. Over the last 3 years I have been developing a “Framework/Library” (currently a set of jars) to facilitate processing of timeseries data. The focus is mostly on processing data from test stands, e.g., automotive tests, driving profiles and so on. Furthermore, in the past year we added a lot of functionality for processing “industrial data”. This means that we want to make it easy to analyze things like “how long did the machine spend in this state”, “when is the following set of bits set” or “notify when the following condition is true for the first time”.
> It is a bit technical and I don’t want to go too deep into it, but generally speaking we try to introduce the “right” semantics to answer the typical questions that arise when analyzing machine or test data. This project is called “CRUNCH” and we are in the process of making it open source (it will be moved to a public GitHub repo this year) under the Apache 2.0 License.
>
> As we see a close relationship to other (incubating or TLP) projects, we are wondering whether this project could fit into the incubator. Some examples of Apache projects that we see as “related” are Apache Flink (which we can use as the Streaming Engine to process the stream) and (incubating) Edgent, which we can also support as a Streaming Engine and for which we are currently trying to find a suitable project goal and community, as some of its (P)PMC members have retired or gone inactive. Finally, CRUNCH has a very natural fit with PLC4X because it can directly process the data gathered from PLCs (and in fact we are already using it in some of our projects that way). I had several discussions with some of the (P)PMC members of PLC4X, namely Sebastian Rühl and Christopher Dutz, who encouraged me to introduce the project to the incubator because they also see some potential for the project to enrich the OSS ecosystem with regard to edge / stream processing of (I)IoT data.
>
> So please feel free to ask questions or share your view on this topic, as I would like to find out whether this project could fit into the Apache ecosystem and the Incubator or not.
>
> Thank you already!
> Julian
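The questions about bit patterns and first-time notifications in the quoted post can also be sketched in a few lines of Java. This is a hypothetical illustration, not CRUNCH code. It assumes the PLC status arrives as an integer bit field. It also interprets “for the first time” as a rising edge, i.e. the condition is true for the current sample and was false for the previous one. A literal one-shot trigger would additionally stop after the first notification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: bit-mask conditions on PLC status words plus rising-edge detection. */
final class BitTrigger {

    /** True when every bit of mask is set in word. */
    static boolean allBitsSet(int word, int mask) {
        return (word & mask) == mask;
    }

    /** Indexes of samples where the condition switches from false to true (rising edges). */
    static List<Integer> risingEdges(int[] statusWords, int mask) {
        List<Integer> edges = new ArrayList<>();
        boolean previous = false; // assume the condition was false before the first sample
        for (int i = 0; i < statusWords.length; i++) {
            boolean current = allBitsSet(statusWords[i], mask);
            if (current && !previous) {
                edges.add(i);
            }
            previous = current;
        }
        return edges;
    }

    public static void main(String[] args) {
        int mask = 0b0101; // hypothetical: bit 0 = "motor on", bit 2 = "door closed"
        int[] words = {0b0000, 0b0101, 0b0111, 0b0100, 0b0101};
        System.out.println("condition becomes true at samples " + risingEdges(words, mask));
    }
}
```

With the mask `0b0101`, the condition becomes true at samples 1 and 4. Sample 2 is not reported because the condition was already true.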



Re: Hello World / CRUNCH Framework

Julian Feinauer
Hi Julian,

thanks for your answer and your insights.
I agree with you on many points (our last discussion on the Calcite ML in particular made me think a lot).
So I agree with your "layered" approach, and in fact this is what we currently do (without stating it explicitly enough, I think).

Basically, I guess we do two things. First, we provide a (Java) DSL that makes it easy to write specific operations (and we do some very limited optimization, not at all comparable to what Calcite does).
Second, we provide functions that are useful or necessary for signal processing (smoothing, filtering, etc.), and we plan to extend them soon with things like short- or long-term predictions and anomaly detection.
By providing suitable wrappers for all of this, we are able to translate it to "real" streaming engines (currently Flink and Akka Streams) and run it there.
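Here is a minimal Java sketch of what such a DSL layer might look like. Processing steps are composable operators over a list of samples, and a moving-average smoother and a threshold filter serve as examples. The `Operator` interface and its helpers are hypothetical and are not CRUNCH's actual API. In the setup described above, a wrapper would translate the same chain into Flink or Akka Streams operators instead of running it on an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mini DSL: composable operators over an in-memory list of samples. */
final class SignalDsl {

    /** One processing step; andThen chains steps into a pipeline. */
    interface Operator {
        List<Double> apply(List<Double> input);

        default Operator andThen(Operator next) {
            return input -> next.apply(apply(input));
        }
    }

    /** Trailing moving average; the first window - 1 outputs average over fewer samples. */
    static Operator movingAverage(int window) {
        return input -> {
            List<Double> out = new ArrayList<>();
            double sum = 0;
            for (int i = 0; i < input.size(); i++) {
                sum += input.get(i);
                if (i >= window) {
                    sum -= input.get(i - window); // drop the sample that left the window
                }
                out.add(sum / Math.min(i + 1, window));
            }
            return out;
        };
    }

    /** Keeps only values strictly above the threshold. */
    static Operator above(double threshold) {
        return input -> {
            List<Double> out = new ArrayList<>();
            for (double v : input) {
                if (v > threshold) {
                    out.add(v);
                }
            }
            return out;
        };
    }

    public static void main(String[] args) {
        List<Double> raw = Arrays.asList(1.0, 3.0, 5.0, 7.0, 9.0);
        Operator pipeline = movingAverage(2).andThen(above(4.0));
        System.out.println(pipeline.apply(raw));
    }
}
```

With a window of 2, the smoothed series is `[1.0, 2.0, 4.0, 6.0, 8.0]`, so the pipeline prints `[6.0, 8.0]`. Keeping operators as values rather than executing them immediately is what leaves room to hand the same chain to a different engine later.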

And indeed, MATCH_RECOGNIZE could be a good implementation for many situations (definitely not all), and I hope that I can soon contribute to your recent work (I will continue the discussion on the Calcite list). But overall I'm really unsure whether our problem can be seen as a problem of relational algebra. I know and like the overall framework very much (I would even say it's one of the most elegant applications of math I've seen so far). But it feels like it doesn’t fit that well. As soon as you have a problem where the rows of a relation are related to each other (e.g., through their order in time), even simple things like the LAG or LEAD window functions get pretty complicated and unnatural with regard to the definition of the algebra. But, as I'm lacking a lot of expertise there, I would love to discuss the matter further with you (but again, I think we should do it on the Calcite list).
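The LAG/LEAD point can be made concrete. `LAG(value, 1)` is trivial when the rows already form an ordered sequence. Relational algebra, however, works on unordered relations. One common workaround is to number the rows, which is itself an order-dependent step, and then self-join each row with the row whose number is one smaller. The Java sketch below computes the same result both ways, with a hash lookup standing in for the equi-join. It illustrates that encoding only and is not how any particular engine implements `LAG`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Computes LAG(value, 1) two ways: sequentially, and as a self-join on row numbers. */
final class LagTwoWays {

    /** Sequential view: the previous element, or null for the first row. */
    static List<Double> lagSequential(List<Double> values) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            out.add(i == 0 ? null : values.get(i - 1));
        }
        return out;
    }

    /**
     * Relational view: assign row numbers (like ROW_NUMBER() OVER (ORDER BY ts), assuming
     * the list is already sorted by timestamp), then LEFT JOIN the relation with itself
     * on left.rn = right.rn + 1.
     */
    static List<Double> lagBySelfJoin(List<Double> values) {
        Map<Integer, Double> byRowNumber = new HashMap<>();
        for (int i = 0; i < values.size(); i++) {
            byRowNumber.put(i + 1, values.get(i)); // row numbers start at 1, as in SQL
        }
        List<Double> out = new ArrayList<>();
        for (int rn = 1; rn <= values.size(); rn++) {
            out.add(byRowNumber.get(rn - 1)); // no join partner gives null, like a LEFT JOIN
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(10.0, 12.0, 11.0); // assumed sorted by timestamp
        System.out.println(lagSequential(values) + " == " + lagBySelfJoin(values));
    }
}
```

Both methods return `[null, 10.0, 12.0]` for the sample input. The relational version, however, needs an extra numbering step and a join to recover information that the sequential view has for free.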

The following small ASCII image depicts how I think about these "layers". From our perspective, MATCH_RECOGNIZE is one way to solve the problem; we can also provide "native" blocks that run directly on a streaming engine. There are surely pros and cons on both sides:

                O CRUNCH Evaluation
                |
        -----------------------
        |                     |
     STREAM    Rel. Expression with MATCH_RECOGNIZE
        |                     |
  Streaming Engines           |
                              |
                      SQL-based Engines

So, from your mail I'm not exactly sure which approach you would prefer, but my suggestion for the next steps with CRUNCH would be to enrich the DSL, add more domain-specific functions, find more use cases, and get more users on board; so to speak, to work on the semantics side of things. In parallel, we should work towards a better separation of "business logic" and execution, with support for multiple frameworks and especially for the relational algebra side. Perhaps we will conclude at some point that Calcite can cover everything (I'm skeptical right now), but I think what's needed for this discussion is a solid basis that also shows you Calcite devs exactly what we are doing in depth.
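The separation of "business logic" and execution described above, together with the two branches of the diagram, can be pictured as a single logical rule handed to interchangeable backends. In the hypothetical Java sketch below, a threshold rule is either evaluated natively on an array of values or rendered as a SQL query with `MATCH_RECOGNIZE` for a relational engine. The class names, the table `sensor` and its columns are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one rule ("value above threshold"), two execution backends. */
final class TwoBackends {

    /** The "business logic": a column and a threshold, independent of any engine. */
    static final class AboveThreshold {
        final String column;
        final double threshold;

        AboveThreshold(String column, double threshold) {
            this.column = column;
            this.threshold = threshold;
        }
    }

    /** Native backend: returns {start, end} index pairs of maximal runs above the threshold. */
    static List<int[]> runNative(AboveThreshold rule, double[] values) {
        List<int[]> runs = new ArrayList<>();
        int start = -1;
        // Iterate one step past the end so that a run reaching the last row is closed too.
        for (int i = 0; i <= values.length; i++) {
            boolean above = i < values.length && values[i] > rule.threshold;
            if (above && start < 0) {
                start = i; // a run begins
            } else if (!above && start >= 0) {
                runs.add(new int[] {start, i - 1}); // the run ended at the previous row
                start = -1;
            }
        }
        return runs;
    }

    /** Relational backend: renders the same rule as a MATCH_RECOGNIZE query over a table. */
    static String toMatchRecognizeSql(AboveThreshold rule, String table, String timeColumn) {
        return "SELECT * FROM " + table + "\n"
                + "MATCH_RECOGNIZE (\n"
                + "  ORDER BY " + timeColumn + "\n"
                + "  MEASURES FIRST(H." + timeColumn + ") AS start_ts, LAST(H." + timeColumn + ") AS end_ts\n"
                + "  ONE ROW PER MATCH\n"
                + "  PATTERN (H+)\n"
                + "  DEFINE H AS H." + rule.column + " > " + rule.threshold + "\n"
                + ")";
    }

    public static void main(String[] args) {
        AboveThreshold rule = new AboveThreshold("temperature", 80.0);
        for (int[] r : runNative(rule, new double[] {75, 82, 85, 79, 90})) {
            System.out.println("native run: rows " + r[0] + ".." + r[1]);
        }
        System.out.println(toMatchRecognizeSql(rule, "sensor", "ts"));
    }
}
```

For the sample values, the native backend finds rows 1..2 and 4..4. The generated SQL follows the SQL:2016 `MATCH_RECOGNIZE` syntax, but engines differ in which parts of the clause they support, so the query may need adjusting for each engine.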

Julian


On 16.12.18, 08:20, "Julian Hyde" <[hidden email]> wrote:

    Hi Julian,
   
    Regarding whether to do this as a streaming engine (with its own query language) or as a framework above a streaming engine, I’d say that’s a false choice. If there is relational algebra inside your system, you can provide a high-level query language that can be translated to a lower-level query language in a streaming engine.
   
    This approach of “layered” databases has worked well for me for several projects, and is ever more applicable these days as data is becoming federated.
   
    You and I have discussed SQL’s MATCH_RECOGNIZE clause as a way to build complex time-based logic. You have probably noticed that it is now in Flink, I am working on it in Calcite, and Beam will probably get it at some point. Even if MATCH_RECOGNIZE doesn’t solve your problem, let’s follow the same approach: convert your problem to a DSL that maps to or extends relational algebra, and then figure out how to translate that to SQL in an underlying engine. Calcite is a very good platform for building new “data languages”, so let’s carry on talking.
   
    Julian
   
   