- Mahout – a machine learning library with tools for clustering, classification, and several types of recommenders, including tools to calculate most-similar items or build item recommendations for users.
- Spark – a powerful open-source unified analytics engine. It handles streams with micro-batching and can guarantee exactly-once delivery if configured. Four modules: MLlib, Spark SQL, Spark Streaming, and GraphX. Don't use it for batch processing or multi-user reporting with many concurrent requests.
- Livy – enables interaction over a REST interface with an EMR cluster running Spark.
- Ganglia – monitors clusters and grids while minimizing the impact on their performance.
- Jupyter Notebook – create and share documents that contain live code, equations, visualizations, and narrative text. Two options to work with Jupyter notebooks: EMR Notebooks and JupyterHub.
- ZooKeeper – a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
- HCatalog – access Hive metastore tables within Pig, Spark SQL, and/or custom MapReduce applications. HCatalog has a REST interface and a command line client that allow you to create tables or do other operations.
- HBase – runs on top of HDFS to provide non-relational database capabilities.
- Hive – uses an SQL-like language called HiveQL (Hive Query Language) that abstracts programming models and supports typical data warehouse interactions.
- Tez – creates a complex directed acyclic graph (DAG) of tasks for processing data.
- Pig – use SQL-like Pig Latin commands that run on top of Hadoop to transform large data sets without having to write complex code. Pig converts those commands into Tez jobs based on directed acyclic graphs (DAGs) or into MapReduce programs.
- Presto – a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources.
- Phoenix – use standard SQL queries and JDBC APIs to work with an Apache HBase backing store for OLTP and operational analytics.
- Flink – a streaming dataflow engine that you can use to run real-time stream processing on high-throughput data sources.
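To make the Livy entry above a little more concrete, here is a minimal sketch of what "interaction over a REST interface" can look like from Python. It only builds the HTTP requests and does not send them. The host name, the port 8998 (Livy's usual default), and the helper names `build_session_request` and `build_statement_request` are assumptions for illustration, not something taken from the list above.

```python
import json
import urllib.request

# Assumption: Livy is reachable on its usual default port on the cluster's primary node.
LIVY_URL = "http://localhost:8998"


def build_session_request(kind: str = "pyspark", base_url: str = LIVY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST that asks Livy to start an interactive session."""
    payload = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_statement_request(session_id: int, code: str, base_url: str = LIVY_URL) -> urllib.request.Request:
    """Build (but do not send) a POST that runs `code` inside an existing Livy session."""
    payload = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/sessions/{session_id}/statements",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a reachable cluster, each request could then be passed to `urllib.request.urlopen(...)`; the session id to use in the second call would come from the JSON response to the first.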