JCP version 2.7

JSPA version 2.0

Reason: Withdrawn at the request of the Specification Lead.

Proposal

This JSR has been Withdrawn.

Updates to the Java Specification Request (JSR)

The following information has been updated from the original JSR:

2006.10.25: Schedule Update
Expert Group formation by 9/2004
Early Draft Review by 12/2005
Public Draft by 10/2006
Proposed Final Draft by 4/2007
Final Release by 6/2007

Java Data Mining Public Page on java.net

Expert Group Private Page on java.net

Patent Notifications on java.net

Original Java Specification Request (JSR)

Section 1: Identification

Submitting Member: Oracle Corporation

Name of Contact Person: Mark F. Hornick

E-Mail Address: mark.hornick@oracle.com

Telephone Number: +1 781 744 0315

Fax Number: +1 781 238 9857

Specification Lead: Mark F. Hornick

E-Mail Address: mark.hornick@oracle.com

Telephone Number: +1 781 744 0315

Fax Number: +1 781 238 9857

Initial Expert Group Membership:

SPSS
Hyperion Solutions
IBM
KXEN
Computer Associates

Supporting this JSR:

SAP AG

Section 2: Request

2.1 Please describe the proposed Specification:

JDM addresses the need for a pure Java^TM API that supports data mining operations and activities. JDM 2.0 extends JDM 1.0 (JSR-73) with requested functionality: new mining functions, new mining algorithms, and the corresponding web services specification. Features that should be considered in JDM 2.0 include, but are not limited to, the following:

Sequential Patterns / Time Series - mining functions to address forecasting and modeling seasonal or periodic fluctuations in data.
Transformations interface - data preparation is a key aspect of any data mining solution. A separate JSR for transformations is likely warranted; close integration with such a JSR, and support for transformations in the next version, are a high priority.
Ensemble models - define composite models structured with logic, e.g., boosting and bagging approaches.
Apply for Association - augment specification to enable prediction based on association rules.
Text Mining - enable mining of unstructured text data, both through explicit feature extraction and by accepting text attributes as model predictors.
Model Comparison - introduce ability to compare multiple models according to various quality metrics, e.g., accuracy and lift for classification.
Multi-record real-time scoring - enable scoring of multiple records in the record apply task as a performance optimization for applications.
Multi-target models - enable the specification of multiple targets for supervised models as a model performance and representation optimization.
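The Model Comparison item names accuracy and lift as example quality metrics. The Java sketch below shows, independently of any JDM interface, what such a comparison might compute for two binary classifiers scored on the same labeled test set. The class `ModelComparisonSketch`, its methods, the 0.5 decision threshold, and the sample data are all illustrative assumptions, not part of JDM. Lift is computed in the common way: the positive rate among the top-scored fraction of records divided by the overall positive rate.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper (not part of JDM): compares binary classifiers on a
// shared labeled test set using accuracy and lift.
public class ModelComparisonSketch {

    // Fraction of records where (score >= threshold) matches the actual label.
    static double accuracy(double[] scores, boolean[] actual, double threshold) {
        int correct = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean predicted = scores[i] >= threshold;
            if (predicted == actual[i]) {
                correct++;
            }
        }
        return (double) correct / scores.length;
    }

    // Lift for the top `fraction` of records ranked by descending score:
    // (positive rate among those records) / (positive rate over all records).
    static double lift(double[] scores, boolean[] actual, double fraction) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Stable sort by descending score; tied records keep their original order.
        Arrays.sort(order, Comparator.comparingDouble((Integer idx) -> scores[idx]).reversed());
        int top = Math.max(1, (int) Math.round(n * fraction));
        int topPositives = 0;
        int totalPositives = 0;
        for (int i = 0; i < n; i++) {
            if (actual[i]) {
                totalPositives++;
            }
            if (i < top && actual[order[i]]) {
                topPositives++;
            }
        }
        if (totalPositives == 0) {
            return 0.0; // lift is undefined when the test set has no positives
        }
        double topRate = (double) topPositives / top;
        double baseRate = (double) totalPositives / n;
        return topRate / baseRate;
    }

    public static void main(String[] args) {
        // Illustrative labeled test set with 3 positives out of 10 records.
        boolean[] actual = {true, false, true, false, false, true, false, false, false, false};
        double[] modelA = {0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.2, 0.1, 0.3};
        double[] modelB = {0.6, 0.7, 0.4, 0.2, 0.5, 0.8, 0.3, 0.55, 0.1, 0.2};
        System.out.printf("Model A: accuracy=%.2f, lift@30%%=%.2f%n",
                accuracy(modelA, actual, 0.5), lift(modelA, actual, 0.3));
        System.out.printf("Model B: accuracy=%.2f, lift@30%%=%.2f%n",
                accuracy(modelB, actual, 0.5), lift(modelB, actual, 0.3));
    }
}
```

On this sample, model A ranks all three positives first, giving a top-30% lift of about 3.33. Model B places two positives in the top 30%, giving a lift of about 2.22.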

The goal of the JDM 2.0 Expert Group will be to investigate these features and to identify and pursue others needed by the data mining and Java communities.

2.2 What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)

Desktop and server

2.3 The Executive Committees would like to ensure JSR submitters think about how their proposed technology relates to all of the Java platform editions. Please provide details here for which platform editions are being targeted by this JSR, and how this JSR has considered the relationship with the other platform editions.

J2SE^TM and J2EE^TM

2.4 Should this JSR be voted on by both Executive Committees?

2.5 What need of the Java community will be addressed by the proposed specification?

The Java community needs a standard way to create, store, access, and maintain data and metadata supporting data mining models, data scoring, and data mining results, serving both J2EE-compliant application servers and J2SE environments. JDM laid the groundwork for a standard API for data mining. By using JDM, implementers of data mining applications can expose a single, standard API that will be understood by a wide variety of client applications and components running on the J2EE/J2SE Platform.

By extending the existing JDM standard with new mining functions and algorithms, data mining clients can be coded against a single API that is independent of the underlying data mining system. The goal of JDM is to provide for data mining systems what JDBC^TM did for relational databases.

2.6 Why isn't this need met by existing specifications?

The proposed features for JDM 2.0 are specific to data mining and highly valuable to data mining users. These features fit well within the framework provided by JDM 1.0 but are not currently provided by it.

2.7 Please give a short description of the underlying technology or technologies:

Like JDM 1.0, JDM 2.0 will be based on a highly generalized, object-oriented data mining conceptual model that leverages emerging data mining standards such as OMG's CWM, SQL/MM for Data Mining, and DMG's PMML. The JDM model will support four conceptual areas that are generally of key interest to users of data mining systems: settings, models, transformations, and results. The object model provides a core layer of services and interfaces that are available to all clients. Clients consistently see the same interfaces and semantics and are coded to these interfaces. Vendor implementations of JDM will likely not support all interfaces and services defined by JDM. However, JDM will provide mechanisms for client discovery of supported interfaces, capabilities, and constraints.
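Client discovery of supported capabilities can be illustrated with a minimal Java sketch. All names here (`CapabilityAwareConnection`, `MiningFunctionKind`, `ExampleVendorConnection`, `chooseFunction`) are hypothetical and are not the `javax.datamining` API. The sketch only shows how a client might query an implementation for its supported mining functions before relying on one, rather than failing on an unsupported interface at run time.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

public class CapabilityDiscoverySketch {

    // Hypothetical enumeration of mining functions; not the javax.datamining type.
    enum MiningFunctionKind { CLASSIFICATION, REGRESSION, CLUSTERING, ASSOCIATION, TIME_SERIES }

    // Hypothetical connection interface that exposes capability discovery.
    interface CapabilityAwareConnection {
        // Mining functions this implementation supports.
        Set<MiningFunctionKind> getSupportedFunctions();

        default boolean supportsFunction(MiningFunctionKind function) {
            return getSupportedFunctions().contains(function);
        }
    }

    // Hypothetical vendor implementation that supports only a subset of functions.
    static class ExampleVendorConnection implements CapabilityAwareConnection {
        @Override
        public Set<MiningFunctionKind> getSupportedFunctions() {
            return EnumSet.of(MiningFunctionKind.CLASSIFICATION, MiningFunctionKind.CLUSTERING);
        }
    }

    // Client logic: return the first preferred function the implementation supports.
    static Optional<MiningFunctionKind> chooseFunction(CapabilityAwareConnection conn,
                                                       MiningFunctionKind... preferred) {
        for (MiningFunctionKind f : preferred) {
            if (conn.supportsFunction(f)) {
                return Optional.of(f);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        CapabilityAwareConnection conn = new ExampleVendorConnection();
        // Prefer time-series forecasting, falling back to regression, then classification.
        Optional<MiningFunctionKind> chosen = chooseFunction(conn,
                MiningFunctionKind.TIME_SERIES, MiningFunctionKind.REGRESSION,
                MiningFunctionKind.CLASSIFICATION);
        System.out.println(chosen.map(MiningFunctionKind::name).orElse("no supported function"));
    }
}
```

A client structured this way degrades gracefully when an implementation supports only part of the specification, which is the situation the paragraph above anticipates.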

It is up to each vendor to decide how to implement JDM. Some vendors may decide to implement JDM as the native API of their product. Others may opt to develop a driver/adapter that mediates between a core JDM layer and multiple vendor products. JDM does not prescribe any particular implementation strategy.
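The driver/adapter option can be made concrete with a minimal Java sketch of the adapter pattern. This rests on some assumptions: `ApplyService` is a hypothetical stand-in for a vendor-neutral JDM layer, and `VendorAEngine` and `VendorBEngine` are invented placeholders for two vendor products with different native scoring APIs. None of these types come from JDM. Client code is written only against `ApplyService`, and each adapter translates a neutral record into its vendor's native form.

```java
import java.util.HashMap;
import java.util.Map;

public class VendorAdapterSketch {

    // Hypothetical vendor-neutral interface standing in for a core JDM layer;
    // client code depends only on this type.
    interface ApplyService {
        double apply(Map<String, Double> record);
    }

    // Hypothetical vendor A engine: expects features as a positional array [age, income].
    static class VendorAEngine {
        double predict(double[] features) {
            return 0.5 * features[0] + 0.25 * features[1];
        }
    }

    // Hypothetical vendor B engine: expects named inputs with upper-case keys.
    static class VendorBEngine {
        double evaluate(Map<String, Double> inputs) {
            return inputs.get("AGE") - 0.5 * inputs.get("INCOME");
        }
    }

    // Adapter: maps the neutral record onto vendor A's positional API.
    static class VendorAAdapter implements ApplyService {
        private final VendorAEngine engine = new VendorAEngine();

        public double apply(Map<String, Double> record) {
            return engine.predict(new double[] {record.get("age"), record.get("income")});
        }
    }

    // Adapter: renames attributes to the keys vendor B expects.
    static class VendorBAdapter implements ApplyService {
        private final VendorBEngine engine = new VendorBEngine();

        public double apply(Map<String, Double> record) {
            Map<String, Double> inputs = new HashMap<>();
            inputs.put("AGE", record.get("age"));
            inputs.put("INCOME", record.get("income"));
            return engine.evaluate(inputs);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> record = new HashMap<>();
        record.put("age", 40.0);
        record.put("income", 8.0);
        // The same client loop runs unchanged against either vendor product.
        for (ApplyService service : new ApplyService[] {new VendorAAdapter(), new VendorBAdapter()}) {
            System.out.println(service.getClass().getSimpleName() + " -> " + service.apply(record));
        }
    }
}
```

A vendor choosing to make JDM its native API would implement the neutral interface directly and would need no adapter layer.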

To ensure J2EE compatibility and avoid duplication of effort, JDM leverages existing specifications. In particular, JDM relies on the J2EE Connector Architecture (JSR-000016) to provide resource management, transaction management, security, record mapping, and result set management.

2.8 Is there a proposed package name for the API Specification? (i.e., `javapi.something`, `org.something`, etc.)

javax.datamining

2.9 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?

2.10 Are there any security issues that cannot be addressed by the current security model?

2.11 Are there any internationalization or localization issues?

2.12 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

JSR-73 (JDM 1.0) will be extended to include new functionality.

2.13 Please describe the anticipated schedule for the development of this specification.

Expert Group formation by 9/2004
Early Draft Review by 5/2005
Public Draft by 9/2005
Proposed Final Draft by 2/2006
Final Release by 6/2006

Note that this information has been updated from the original JSR.

2.14 Please describe the anticipated working model for the Expert Group working on developing this specification.

As with JSR-73, work on JDM 2.0 will involve periodic face-to-face meetings, usually every 2-3 months. There will also be weekly one-hour conference calls to review proposals and address issues. We are looking for members who are willing to contribute new mining function and algorithm specifications and to contribute to the implementation of the TCK and RI. JDM already has a private project (javadatamining) and a public project (datamining) on java.net.

2.15 It is important to the success of the community and each JSR that the work of the Expert Group be handled in a manner which provides the community and the public with insight into the work the Expert Group is doing, and the decisions that the Expert Group has made. The Executive Committees would like to ensure Spec Leads understand the value of this transparency and ask that each JSR have an operating plan in place for how their JSR will address the involvement of the community and the public. Please provide your plan here, and refer to the Spec Lead Guide for a more detailed description and a set of example questions you may wish to answer in your plan.

With the introduction of the java.net public project (datamining), we have added a discussion forum and have the standard tools for communicating with and responding to the public. Members of the JCP community may also obtain interim materials on official request. With JSR-73, community involvement relied on the Community Draft, and public involvement began after the Public Draft. Because the new features are being proposed for an existing framework, earlier feedback is welcome.

2.16 Please describe how the RI and TCK will be delivered, i.e. as part of a profile or platform edition, or stand-alone, or both. Include version information for the profile or platform in your answer.

The RI and TCK will be delivered on top of J2EE 1.5, with a stand-alone version that excludes web services.

2.17 Please state the rationale if previous versions are available stand-alone and you are now proposing in 2.16 to only deliver RI and TCK as part of a profile or platform edition (See sections 1.1.5 and 1.1.6 of the JCP 2 document).

JDM 1.0 provides the specification for web services but does not include them in the RI or TCK. JDM 2.0 will include them in the RI and TCK. As a result, JDM 2.0 must use the J2EE web services XML APIs and will require a J2EE container for the RI and TCK.

2.18 Please provide a description of the business terms for the Specification, RI and TCK that will apply when this JSR is final.

The specification, RI, and TCK will be made available free of charge with similar licensing terms to JSR-73.

Section 3: Contributions

3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.

The following specifications serve (in part) as design references for JDM:

* Java Data Mining 1.0 (JSR-73)
* Common Warehouse Metamodel (CWM)

http://www.omg.org/techprocess/faxvotes/CWMI_RFP.html

* CWM Specification, Volume 1 (ad/2000-01-01)

CWM Specification, Volume 1, Chapter 14, "Data Mining", conveys the overall structure of the metadata that the metadata-oriented interfaces of JDMAPI will support.

* CWM Specification, Volume 2 (ad/2000-01-02)

CWM Specification, Volume 2, Section 2.14, "DataMining.idl", gives a general idea of how the metadata-oriented interfaces of JDMAPI might be structured (again, generally extending the appropriate JSR-000040 interfaces).

* DMG PMML

http://www.dmg.org

PMML provides an XML-based representation for mining models and facilitates the interchange of model results among vendors.

* ISO SQL/MM Part 6: Data Mining

SQL/MM Part 6: Data Mining provides a standard interface to RDBMSs for performing data mining. Concepts from this approach may prove useful in the overall JDMAPI design.

3.2 Explanation of how these items might be used as a starting point for the work.

JDM 1.0 provides the conceptual framework and necessary infrastructure for JDM 2.0. The PMML and SQL/MM standards continue to evolve with data mining model specifications and functionality that will be valuable to data mining users. In JDM 2.0, we will continue to leverage the latest PMML and SQL/MM specifications.