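PiFlow is a big data pipeline system built on the distributed computing framework Spark. It packages each stage of data processing (collection, cleaning, computation, and storage) into components, and pipelines are configured in a WYSIWYG manner. It is easy to use and powerful.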
Trial address: http://piflow.ml/piflow-web/login (username/password: admin/admin)
Installation and usage instructions: https://github.com/cas-bigdatalab/piflow
The supported data processing components are listed below:
| Group | Component |
| --- | --- |
| Hive | cn.piflow.bundle.hive.SelectHiveQL |
| Hive | cn.piflow.bundle.hive.PutHiveStreaming |
| Hive | cn.piflow.bundle.hive.PutHiveQL |
| Hdfs | cn.piflow.bundle.hdfs.PutHdfs |
| Hdfs | cn.piflow.bundle.hdfs.DeleteHdfs |
| Hdfs | cn.piflow.bundle.hdfs.UnzipFilesOnHDFS |
| Hdfs | cn.piflow.bundle.hdfs.GetHdfs |
| Hdfs | cn.piflow.bundle.hdfs.ListHdfs |
| Http | cn.piflow.bundle.http.InvokeUrl |
| Http | cn.piflow.bundle.http.GetUrl |
| Http | cn.piflow.bundle.http.UnGZip |
| Http | cn.piflow.bundle.http.PostUrl |
| Http | cn.piflow.bundle.http.LoadZipFromUrl |
| Http | cn.piflow.bundle.http.FileDownHDFS |
| RDF | cn.piflow.bundle.rdf.CsvToNeo4J |
| RDF | cn.piflow.bundle.rdf.RdfToDF |
| Spider | cn.piflow.bundle.internetWorm.spider |
| Jdbc | cn.piflow.bundle.jdbc.JdbcRead |
| Jdbc | cn.piflow.bundle.jdbc.JdbcReadFromOracle |
| Jdbc | cn.piflow.bundle.jdbc.JdbcWrite |
| Jdbc | cn.piflow.bundle.jdbc.JdbcWriteToOracle |
| Streaming | cn.piflow.bundle.streaming.FlumeStream |
| Streaming | cn.piflow.bundle.streaming.KafkaStream |
| Streaming | cn.piflow.bundle.streaming.SocketTextStreamByWindow |
| Streaming | cn.piflow.bundle.streaming.SocketTextStream |
| Streaming | cn.piflow.bundle.streaming.TextFileStream |
| MongoDB | cn.piflow.bundle.impala.SelectImpala |
| MongoDB | cn.piflow.bundle.mongodb.GetMongo |
| MongoDB | cn.piflow.bundle.mongodb.PutMongo |
| CSV | cn.piflow.bundle.csv.FolderCsvParser |
| CSV | cn.piflow.bundle.csv.CsvSave |
| CSV | cn.piflow.bundle.csv.CsvParser |
| CSV | cn.piflow.bundle.csv.CsvStringParser |
| File | cn.piflow.bundle.file.PutFile |
| File | cn.piflow.bundle.file.FetchFile |
| File | cn.piflow.bundle.file.RegexTextProcess |
| Script | cn.piflow.bundle.script.ShellExecutor |
| Script | cn.piflow.bundle.script.DataFrameRowParser |
| Common | cn.piflow.bundle.common.Distinct |
| Common | cn.piflow.bundle.common.ConvertSchema |
| Common | cn.piflow.bundle.common.Fork |
| Common | cn.piflow.bundle.common.SelectField |
| Common | cn.piflow.bundle.common.Join |
| Common | cn.piflow.bundle.common.DoFlatMapStop |
| Common | cn.piflow.bundle.common.ExecuteSQLStop |
| Common | cn.piflow.bundle.common.Merge |
| Common | cn.piflow.bundle.common.DoMapStop |
| Common | cn.piflow.bundle.common.Subtract |
| Data Clean | cn.piflow.bundle.clean.IdentityNumberClean |
| Data Clean | cn.piflow.bundle.clean.PhoneNumberClean |
| Data Clean | cn.piflow.bundle.clean.EmailClean |
| Data Clean | cn.piflow.bundle.clean.TitleClean |
| Message Queue | cn.piflow.bundle.kafka.WriteToKafka |
| Message Queue | cn.piflow.bundle.kafka.ReadFromKafka |
| Microorganism | cn.piflow.bundle.microorganism.Ensembl_gff3Parser |
| Microorganism | cn.piflow.bundle.microorganism.GeneParser |
| Microorganism | cn.piflow.bundle.microorganism.RefseqParser |
| Microorganism | cn.piflow.bundle.microorganism.GoDataParse |
| Microorganism | cn.piflow.bundle.microorganism.PfamDataParser |
| Microorganism | cn.piflow.bundle.microorganism.GoldDataParse |
| Microorganism | cn.piflow.bundle.microorganism.Swissprot_TrEMBLDataParser |
| Microorganism | cn.piflow.bundle.microorganism.EmblParser |
| Microorganism | cn.piflow.bundle.microorganism.PDBParser |
| Microorganism | cn.piflow.bundle.microorganism.GenBankParse |
| Microorganism | cn.piflow.bundle.microorganism.TaxonomyParse |
| Microorganism | cn.piflow.bundle.microorganism.BioProjetDataParse |
| Microorganism | cn.piflow.bundle.microorganism.BioSampleParse |
| Microorganism | cn.piflow.bundle.microorganism.InterprodataParse |
| Microorganism | cn.piflow.bundle.microorganism.MicrobeGenomeDataParser |
| Memcache | cn.piflow.bundle.memcache.ComplementByMemcache |
| Memcache | cn.piflow.bundle.memcache.PutMemcache |
| Memcache | cn.piflow.bundle.memcache.GetMemcache |
| Machine Learning | cn.piflow.bundle.ml_clustering.GaussianMixtureTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.LogisticRegressionTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.RandomForestTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.DecisionTreePrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.RandomForestPrediction |
| Machine Learning | cn.piflow.bundle.ml_clustering.BisectingKMeansPrediction |
| Machine Learning | cn.piflow.bundle.ml_clustering.LDAPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.MultilayerPerceptronTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.GBTTraining |
| Machine Learning | cn.piflow.bundle.ml_clustering.BisectingKMeansTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.MultilayerPerceptronPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.GBTPrediction |
| Machine Learning | cn.piflow.bundle.ml_clustering.KmeansTraining |
| Machine Learning | cn.piflow.bundle.ml_classification.NaiveBayesPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.DecisionTreeTraining |
| Machine Learning | cn.piflow.bundle.ml_clustering.LDATraining |
| Machine Learning | cn.piflow.bundle.ml_classification.LogisticRegressionPrediction |
| Machine Learning | cn.piflow.bundle.ml_feature.WordToVec |
| Machine Learning | cn.piflow.bundle.ml_clustering.KmeansPrediction |
| Machine Learning | cn.piflow.bundle.ml_classification.NaiveBayesTraining |
| Machine Learning | cn.piflow.bundle.ml_clustering.GaussianMixturePrediction |
| ElasticSearch | cn.piflow.bundle.es.PutEs |
| ElasticSearch | cn.piflow.bundle.es.QueryEs |
| ElasticSearch | cn.piflow.bundle.es.FetchEs |
| Redis | cn.piflow.bundle.redis.WriteToRedis |
| Redis | cn.piflow.bundle.redis.ReadFromRedis |
| Xml | cn.piflow.bundle.xml.XmlParser |
| Xml | cn.piflow.bundle.xml.XmlStringParser |
| Xml | cn.piflow.bundle.xml.FlattenXmlParser |
| Xml | cn.piflow.bundle.xml.XmlSave |
| Xml | cn.piflow.bundle.xml.FolderXmlParser |
| Ftp | cn.piflow.bundle.hdfs.SelectFilesByName |
| Ftp | cn.piflow.bundle.ftp.UploadToFtp |
| Ftp | cn.piflow.bundle.ftp.LoadFromFtp |
| Ftp | cn.piflow.bundle.ftp.LoadFromFtpUrl |
| Ftp | cn.piflow.bundle.ftp.LoadFromFtpToHDFS |
| Ftp | cn.piflow.bundle.ftp.UnGz |
| Ftp | cn.piflow.bundle.ftp.NewLoadFromFtp |
| Excel | cn.piflow.bundle.excel.ExcelParser |
| Solr | cn.piflow.bundle.solr.PutIntoSolr |
| Solr | cn.piflow.bundle.solr.GetFromSolr |
| Json | cn.piflow.bundle.json.JsonStringParser |
| Json | cn.piflow.bundle.json.MultiFolderJsonParser |
| Json | cn.piflow.bundle.json.FolderJsonParser |
| Json | cn.piflow.bundle.json.JsonSave |
| Json | cn.piflow.bundle.json.JsonParser |
| Json | cn.piflow.bundle.json.EvaluateJsonPath |
| GraphX | cn.piflow.bundle.graphx.LoadGraph |
| GraphX | cn.piflow.bundle.graphx.LabelPropagation |
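
Every component name listed above appears to follow the pattern `cn.piflow.bundle.<package>.<Component>`. As a rough illustration, the Python sketch below splits such names into their package and component parts and indexes them by package. The helper names `split_bundle` and `index_by_package` are hypothetical and are not part of PiFlow; the sample names are copied from the list above. Note that the package segment does not always match the listed group: `cn.piflow.bundle.impala.SelectImpala`, for example, is listed under MongoDB.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Common prefix shared by the component names in the list above.
BUNDLE_PREFIX = "cn.piflow.bundle."


def split_bundle(name: str) -> Tuple[str, str]:
    """Split 'cn.piflow.bundle.<package>.<Component>' into (package, component).

    Hypothetical helper for illustration only; not a PiFlow API.
    """
    if not name.startswith(BUNDLE_PREFIX):
        raise ValueError(f"unexpected prefix: {name}")
    package, _, component = name[len(BUNDLE_PREFIX):].rpartition(".")
    if not package or not component:
        raise ValueError(f"expected <package>.<Component>: {name}")
    return package, component


def index_by_package(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group component names by their package segment, keeping input order."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        package, component = split_bundle(name)
        index[package].append(component)
    return dict(index)


# A few names copied from the component list above.
SAMPLE = [
    "cn.piflow.bundle.hive.SelectHiveQL",
    "cn.piflow.bundle.hive.PutHiveQL",
    "cn.piflow.bundle.impala.SelectImpala",
    "cn.piflow.bundle.mongodb.GetMongo",
]

if __name__ == "__main__":
    for package, components in index_by_package(SAMPLE).items():
        print(package, components)
```

Indexing by package rather than by the listed group keeps the sketch independent of how groups are assigned, which the list alone does not explain.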