Pentaho Data Mining, based on the Weka project, is a comprehensive set of tools for machine learning and data mining. Its broad suite of
classification, regression, association rule and clustering algorithms can help you understand your business better and can
also be used to improve future performance through predictive analytics.
Recent News and Releases
- 02/18/08 English documentation for Weka PDI Plugins is now available.
- 12/19/07 Weka 3.5.7 is now available.
- 12/19/07 Weka 3.4.12 is now available.
- 12/19/07 English documentation for Weka 3.5.7 is now available.
- 12/19/07 English documentation for Weka 3.4.12 is now available.
- 12/06/07 Weka Plugins for Pentaho Data Integration 3.0 are now available.
- 12/06/07 Pentaho streamlines delivery of predictive analytics (press release).
Stable
Weka 3.4.12 (GA) (Release Notes)
This is a patch release to Weka 3.4 containing a number of bug fixes. For a detailed list of improvements, please refer to the release notes.
New Features since 3.2:
- ARFF Viewer
- General-purpose graph visualizer
- Improvements to the Knowledge Flow, including support for ROC curves, a CSV data sink, clustering, database connectivity and a prediction appender step
- 10 new algorithms
- XML serialization support
Click here for a detailed list of new features.
In Development
Weka 3.5.7 (GA) (Release Notes)
This is a patch release to Weka 3.5 containing a number of bug fixes. For a detailed list of improvements, please refer to the release notes.
New Features since 3.4:
- 23 new learning schemes
- 16 new filters
- Grouping of steps (MetaBean) in the Knowledge Flow
- New SQL viewer and visualization plugin support in the Explorer
- Area under ROC (AUC) evaluation type
- Relation-valued attributes (supports multi-instance learning)
- Support for incremental clusterers
- XML format for instances
- Text directory to ARFF tool
- Several new data generators
Click here for a detailed list of new features.
To suggest a new feature or view our roadmap, click here.
Major features planned in future releases:
- PMML support (import/export)
- Execution of Kettle transforms in the Knowledge Flow
- Cost-sensitive feature selection
- Per-instance costs in cost-sensitive learning (two-class case)
- Export of serialized classifiers from the Knowledge Flow
- Incorporation of a pairwise coupling routine into the MultiClassClassifier
- Integration of Weka filters into Kettle
- Enhancements to the Kettle Scoring Plugin, including:
  • Support for training/updating incremental Weka models