Lemonade (Live Exploration and Mining Of a Non-trivial Amount of Data from Everywhere) is an analytics platform that supports the intuitive definition of tasks for knowledge discovery, mining, and learning from large amounts of data coming from a wide range of scenarios. The platform interface is a web application in which users define analytics workflows visually by dragging and dropping operations and data sources and connecting them. Lemonade is being developed by UFMG as part of the EUBra-BIGSEA project and targets users who need to develop analytics workflows but do not want to learn a programming language.
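To make the idea of connected operations concrete, the sketch that follows models a workflow as a directed acyclic graph: each node is an operation or data source, and each edge is a connection a user would draw in the editor. This is a hypothetical illustration, not Lemonade's internal workflow format; the task names and the `execution_order` helper are assumptions. It relies only on Python's standard `graphlib` module (Python 3.9+) to find an order in which every task runs after all of its inputs.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: keys are tasks, values are the tasks feeding into them
# (the connections drawn in a visual editor). Names are illustrative only.
workflow = {
    "read_csv": set(),                   # data source, no inputs
    "clean": {"read_csv"},               # transformation fed by the data source
    "split": {"clean"},                  # train/test split of the cleaned data
    "train_model": {"split"},            # learning step on the training part
    "evaluate": {"train_model", "split"},  # needs the model and the test part
}


def execution_order(graph):
    """Return an order in which every task runs after all of its inputs.

    Raises CycleError if the connections form a cycle.
    """
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(workflow))
```

If a connection closed a loop (for example, feeding `evaluate` back into `clean`), `static_order()` would raise `CycleError`; rejecting such connections is one plausible way an editor could keep workflows executable.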
Lemonade provides a rich web interface that is both accessible to learners and powerful enough for experts. Lemonade's planned scope comprises more than 30 operations for data mining, machine learning, and the extraction, transformation and loading (ETL) of data. The platform is also capable of processing massive amounts of data (“Big Data”), since it is being built on top of three scalable processing and storage technologies: Apache Spark, CMCC Ophidia and BSC COMPSs, the last two of which are developed by partners of the EUBra-BIGSEA project.
Users will be able to upload data sets using a service provided by Lemonade. Data are kept in a redundant file system designed to provide high availability and high throughput. Data storage requirements depend on the use cases and the installation. Users may process terabytes of data, and the data volume directly affects storage and processing costs.
Lemonade can be scaled to support hundreds of users by increasing cluster capacity. A modest cluster of commodity computers can support a large number of users working with the data volumes found in most organizations.
Essential information for potential users
Lemonade is currently under development; UFMG demonstrated a first prototype to EUBra-BIGSEA partners in September 2016. An alpha version is planned for January 2017.
Lemonade is an open-source solution. All dependencies (operating system, processing frameworks, infrastructure technologies) are also open source, so there are no licensing costs. The licensing scheme is under discussion and will be finalised for the first release.
To run, Lemonade requires a cluster of processing nodes and data storage. The size and capacity of the cluster depend on the number of users, the data volume and the complexity of the workflows/tasks.
Lemonade depends on Apache Mesos (standalone mode) or a distributed processing technology (Apache Spark, BSC COMPSs or CMCC Ophidia), the Oracle MySQL database server, and a Linux operating system distribution.
Lemonade requires a reliable infrastructure to run, which may be provided either by platform-as-a-service (PaaS) companies such as Google, Amazon or Microsoft, or by the organization using Lemonade.
Three user roles are supported in Lemonade: system administrator, data scientist and data explorer. The system administrator is responsible for keeping Lemonade running, adding new users, setting permissions and security, and managing data sets. Data scientists must know Lemonade's operations in order to create processing workflows, and must understand the data being processed, its characteristics, and how their results can be applied in a real scenario. Data explorers are the users of existing models.
Lemonade targets users from areas such as Mathematics, Statistics and Business Administration, as well as Data Science practitioners from any field.