Platform Features

Data management and analytics functions | Project & components | BIGBOX Data Platform

Distributed batch processing of large data sets

Apache Hadoop

Database for unstructured & structured data storage of large tables

Apache HBase +conn, +indx

Reliably store very large files across machines in a large cluster.

HDFS

Data warehouse summarization & ad hoc querying

Apache Hive

Metadata store for Hive tables

Hive Metastore (HMS)

Workflow scheduler to manage Hadoop jobs

Apache Oozie

Columnar storage format for Hadoop ecosystem

Apache Parquet

Fast compute engine for ETL, ML, DL, stream processing

Apache Spark

Integration capabilities with various types of DBMS including MySQL, Microsoft SQL Server, Oracle, PostgreSQL, MongoDB, and the Microsoft APS (Analytics Platform System) data warehouse

BigAction, Apache NiFi

Bulk data transfer between Hadoop and structured datastores (e.g., MySQL, Microsoft SQL Server, Oracle, PostgreSQL, MongoDB and others)

Apache Sqoop

Job scheduling and cluster resource management

YARN

Coordination service for distributed applications

Apache ZooKeeper

Store and manage large data sets across a cluster

Apache Accumulo

Metadata management, data lineage, governance & data catalog

Apache Atlas

Migrate and replicate data to other ecosystems

Apache Ranger

Smallest, fastest columnar storage for Hadoop

Apache ORC

Data-flow framework for batch and interactive use cases

Apache Tez

Fast analytical queries on event-driven data

Apache Druid

Perimeter security governing access to Hadoop

Apache Knox

Cryptographic key management

Ranger KMS

Notebook for interactive analytics

Apache Zeppelin

Distributed processing for stateful computations

Apache Flink

Data serialization system

Apache Avro

Provisioning, managing, and monitoring Apache Hadoop clusters

Ambari

SQL workbench for data warehouses

Hue

Distributed MPP SQL query engine for Hadoop

Impala

Column-oriented data store for fast data analytics

Apache Kudu

Enterprise search & index platform

Apache Solr

Real-time streaming data pipelines, analytics, and apps

Apache Kafka, Spark Streaming, Storm, Flink

Distributed object store for Hadoop

Apache Ozone

Scalable directed graphs of data routing, transformation, and system mediation logic

BigAction, Apache NiFi

Interactive analytics tools for collaboration and sharing

Jupyter

Manage resource and data security across the Hadoop ecosystem

Apache Ranger

Integrate with various formats, data sources, protocols and other environments

BigAction, Apache NiFi

Strong authentication for client/server applications

Kerberos

Keeping track of the running applications

Ambari

Distributed realtime computation system

Apache Storm

Data scraping and crawling engine

BigSpider

Faceted search engine

BigSearch

The Enterprise Business Intelligence for All Your Needs

BigQuery

Powerful Business Dashboard Software for Everyone

BigBuilder

API Management platform

BigEnvelope

Build model evaluations (e.g., confusion matrix, cross-validation, AUC)

Apache Spark, Jupyter, Zeppelin

Run change data capture (CDC)

BigAction, Apache NiFi

Data migration and replication between Data Centers

Apache Hadoop

Interoperable with other big data ecosystems

BigAction, Apache NiFi

Capability to scale out and apply patches, with auto-configuration features, without causing service downtime or impact on the ecosystem

HDFS

Capability to develop modules or jobs, with jobs resuming when a failure occurs

YARN

Provides a data security module including authentication, access authorization, audit trail, data masking, encryption and others

Ranger, Knox, Kerberos

Data security management not only on data at rest, but also on data in motion

Ranger, Knox, Kerberos

Object storage capabilities with support for the S3 protocol

Apache Ozone

Build a comprehensive data lineage from the start of data collection to the aggregation stage

Apache Atlas

Integration with LDAP or Active Directory authentication services

Kerberos

Manage the allocation of computing resources by user or group (e.g., CPU, storage, memory requirements)

YARN

Easy-to-use ETL tools with a GUI workflow designer

BigAction, Apache NiFi

Ability to keep running when one of the nodes is not functioning (Fault Tolerance)

HDFS

Scaling out capabilities to serve increasing data processing needs

BigAction, Apache NiFi

ETL supports parallel processing with big data frameworks including MapReduce, Spark, Storm, Flink, Tez and others

BigAction, Apache NiFi

Supports object reusability in ETL development

BigAction, Apache NiFi

Unlimited ETL users

BigAction, Apache NiFi

Supporting data analytics, collaborative development & modelling with Machine Learning and Deep Learning with Python, R programming and others

Spark, Jupyter, Zeppelin

Analytics platform that can be integrated with big data clusters in the context of authentication, authorization, and resource management

Spark, Ranger

Sharing, publishing, and collaborating on data analytics projects

Spark, Jupyter, Zeppelin

Supports parallel processing of big data frameworks including MapReduce, Spark, Jupyter, Zeppelin

MapReduce, Jupyter, Zeppelin, Spark

Data analysis, model training, model deployment in the form of APIs, and collaboration facilities

Spark, Jupyter, Zeppelin

Capability to monitor the machine learning models that have been deployed

Spark, Jupyter, Zeppelin

Unlimited analytics platform users

Spark, Zeppelin, Jupyter

The analytics platform can interact with Hive, HDFS and Solr data sources

Hive, HDFS, Solr, Spark, Jupyter, Zeppelin

The analytics platform can integrate with Oozie, YARN and HDFS Browser

Oozie, YARN, HDFS Browser, Spark, Jupyter, Zeppelin

Wizard features to configure landing zone, integration zone, analytics zone and others

BigLake

Capability to export data from Hive/Impala/Kudu to a target database

BigAction

Process data in both batch and real-time/streaming modes (publish-subscribe messaging)

BigAction, Kafka

Multidimensional data storage capability based on OLAP

Apache Druid

Ability to process graph data structure and run graph analytics

BigLake, Giraph

Facilitates full-text search with a choice of several databases

Apache Solr, Elasticsearch

ACID compliant (insert-update-select process against one table simultaneously)

BigLake

Distributed storage for real time analytics

BigLake

Retrieval and utilization of SQL and NoSQL based data

BigLake

Data lineage tracking, both within the datalake and from the data source

BigLake

AI/Machine Learning is easy to upgrade or integrate with other libraries/tools

BigLake

Interactions with multiple data sources including Spark and Impala

BigBuilder

Capability to build analytical data models (e.g., descriptive analytics, predictive analytics, prescriptive analytics, path analytics, text analytics and others)

Spark, Jupyter, Zeppelin

Perform MLOps, including the development flow, operations, and monitoring of deployed machine learning models

Spark, Jupyter, Zeppelin

Graph data processing and analysis can run on the installed platform with separately sized virtual nodes, without reducing the minimum usable capacity

BigLake, Giraph

Read and write access for ETL tools such as InnoQuartz and for data integration solutions such as Talend, among others

BigLake, HDFS

Easy self-service analytics tools to upload, explore, and analyze data, and to create visualizations with charts or dashboards

BigQuery, BigBuilder

Able to extract and transform data using different methods in an integrated platform, without writing a line of code/script

BigAction

Supports correlation of data from different data sources to generate new data sets, without writing a line of code/script

BigAction, BigBuilder

Supports data aggregation schemes (calculation, summation, average, maximum and minimum values, and others) without writing a line of code/script

BigAction, BigQuery, BigBuilder

Supports execution of one query set in one operation to generate several new data sets, without writing a line of code/script (Single Query Multiple Result)

BigAction, BigBuilder, BigEnvelope

Artificial Intelligence (AI) text processing using Indonesian-based Natural Language Processing

BigLake, BigSearch

Internet crawler and media analytics support in one system; it can perform sentiment analysis and determine keywords from the information to be retrieved, with no restriction on the number of keywords

BigSpider

Supports exporting data to other systems through an API. The resulting charts and dashboards can be accessed from a web browser using the generated URL.

BigEnvelope, BigQuery, BigBuilder

Can process data aggregation at 1 MB/s per CPU thread, and can optimize queries so that CPU usage reaches more than 90%

BigLake

Supports data processing and storage components with “Shard and Replica” and high availability

BigLake, HDFS

Distributed parallel processing, with support for multi-node processing that can be set up as a masterless cluster

BigLake

Can process data in gzip format (compressed data) without writing a line of code/script

BigLake

Supports a data virtualization schema (data is queried directly at the origin)

BigBuilder
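
The table above lists model evaluation with a confusion matrix and AUC as a capability delivered through Spark, Jupyter, and Zeppelin. As a rough illustration of what those two metrics compute, here is a minimal sketch in plain Python that could be pasted into a notebook cell. It is not BigBox's implementation: the function names and the toy data are hypothetical, and on a real cluster the equivalent would normally come from the evaluation utilities of whichever ML library is in use.

```python
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["fn" if actual == 1 else "tn"] += 1
    return counts


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

cm = confusion_matrix(labels, preds)
area = auc(labels, scores)
print(cm, round(area, 3))
```

The pairwise AUC here is quadratic in the number of examples, which is fine for a small notebook check but would need a rank-based or distributed version for large data sets.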