Java Data Mining

Last updated: March 9, 2005

Change Log for JDM 1.1

No.
Packages & Interfaces
Issue Changes Comment
1
javax.datamining.base.
BuildSettings

public void setOutlierTreatment (String logicalAttrName, OutlierTreatment treatment) 
public void setOutlierIdentification(String logicalAttrName, Interval bounds)

The method descriptions are inconsistent: they state both that name existence is verified by the verify method and that the methods throw an exception if the attribute does not exist.

  • setOutlierTreatment does not throw JDMException if the attribute does not exist.
  • setOutlierIdentification does not throw JDMException if the attribute does not exist.

2
javax.datamining.base.
BuildSettings
With BuildSettings, attributes can be specified with usage, weight, outlier treatment and outlier identification. However, the API does not provide means to retrieve these attributes. This is needed when dealing with a restored BuildSettings.
  • New enumeration - AttributeRetrievalType {usage, weight, outlierTreatment, outlierIdentification}
  • New method - public String[] getAttributeNames(AttributeRetrievalType)
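A rough sketch of how such a lookup might behave on a restored settings object. The class below is a simplified stand-in written for this illustration, not the javax.datamining interface; only the enumeration values and the getAttributeNames signature come from the entry above, and the record helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Stand-in for the new enumeration; the values are taken from the change entry.
enum AttributeRetrievalType { usage, weight, outlierTreatment, outlierIdentification }

// Hypothetical, simplified stand-in for BuildSettings (not the JDM interface).
class SketchBuildSettings {
    private final Map<AttributeRetrievalType, List<String>> names =
            new EnumMap<>(AttributeRetrievalType.class);

    // Hypothetical helper: remembers that an attribute has a setting of the given kind.
    void record(AttributeRetrievalType type, String logicalAttrName) {
        List<String> list = names.computeIfAbsent(type, t -> new ArrayList<>());
        if (!list.contains(logicalAttrName)) {
            list.add(logicalAttrName);
        }
    }

    // Mirrors the new method: names of the attributes that have a setting of this kind.
    public String[] getAttributeNames(AttributeRetrievalType type) {
        List<String> list = names.get(type);
        return list == null ? new String[0] : list.toArray(new String[0]);
    }
}
```

A GUI could call getAttributeNames once per retrieval type to rebuild its view of a restored BuildSettings without knowing the attribute names in advance.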

3
javax.datamining.resource.
Connection
There are methods that return named objects, but none that return only the names of those objects. Returning just the object names is important for efficiency when displaying objects in a GUI.
New methods added to Connection:
  • public Collection getObjectNames(Date, Date, NamedObject)
  • public Collection getObjectNames(Date, Date, NamedObject, Enum)
  • public Collection getModelNames(MiningFunction, MiningAlgorithm, Date, Date)
Note: getModelNames - function cannot be null.
Under consideration for JDM 2.0 is a more general and powerful ObjectFilter-based interface. This requires a proposal with use cases, and agreement from vendors / users on need / uptake.
4
javax.datamining.resource.
Connection
Allow users to explicitly inform the DME to load data as an optimization hint. This is analogous to models that can be loaded into memory upon request by the user. One use case is where different logical data and algorithms are used to build different models on the same data.
If not supported by the DME, the methods must be a no-op, as for models.
New methods:
  • public String[] getLoadedData() - Returns an array of data URIs that are currently loaded. The result is an empty array if data loading is not supported.
  • public void requestDataLoad(String) - Requests the DME to load the specified data in memory to enhance efficiency and performance. The intent is for the data to remain in memory until requestDataUnloaded is invoked for the same data, or the connection terminates and no other connections are using the data.
    This may be a no-op if the vendor need not load data into memory or does not support the capability. It is an idempotent operation if the data has not changed.
    This method can be invoked on multiple data. If the specified data does not exist or cannot be located, an exception is thrown.
  • public void requestDataUnloaded(String) - Informs the DME that the specified data is no longer needed and that the data may be removed from memory if necessary.
    This may be a no-op if the vendor does not require loading data into memory. It is an idempotent operation.
    If the requested data does not exist or cannot be located, an exception is thrown.
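The load and unload semantics described above can be sketched with a small in-memory stand-in. This is not the JDM Connection interface: the class name, the set of known URIs, and the use of IllegalArgumentException in place of whatever exception a real DME raises are all assumptions made for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a connection whose DME supports data loading.
class SketchConnection {
    private final Set<String> available;                       // data URIs the DME can locate
    private final Set<String> loaded = new LinkedHashSet<>();  // data URIs currently loaded

    SketchConnection(Set<String> availableUris) {
        this.available = availableUris;
    }

    // Idempotent: requesting the same URI twice leaves a single loaded entry.
    public void requestDataLoad(String uri) {
        requireKnown(uri);
        loaded.add(uri);
    }

    // Idempotent: unloading data that is not loaded changes nothing.
    public void requestDataUnloaded(String uri) {
        requireKnown(uri);
        loaded.remove(uri);
    }

    // Returns the URIs that are currently loaded, possibly an empty array.
    public String[] getLoadedData() {
        return loaded.toArray(new String[0]);
    }

    // Stand-in for the "does not exist or cannot be located" check.
    private void requireKnown(String uri) {
        if (!available.contains(uri)) {
            throw new IllegalArgumentException("Data not found: " + uri);
        }
    }
}
```

A DME that does not support loading would instead implement both request methods as no-ops and always return an empty array from getLoadedData.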

5
javax.datamining.
modeldetail.tree.
TreeModelDetail
Tree model details were omitted from the API. Add new methods for more information about the decision tree model: the number of nodes, the number of leaf nodes, and the tree depth.
New methods:
  • public int getTreeDepth()
  • public int getNumberOfNodes()
  • public int getNumberOfLeafNodes()
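As an illustration of what the three values mean, the sketch below computes them on a minimal hand-built tree. SketchNode and SketchTreeMetrics are hypothetical helpers, not JDM types, and the sketch assumes that the root sits at level 0 and that depth is the largest level of any node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node; not the JDM TreeNode interface.
class SketchNode {
    final List<SketchNode> children = new ArrayList<>();

    // Adds a child and returns this node so calls can be chained.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

class SketchTreeMetrics {
    // Total number of nodes in the subtree, counting the given node.
    static int numberOfNodes(SketchNode n) {
        int count = 1;
        for (SketchNode c : n.children) {
            count += numberOfNodes(c);
        }
        return count;
    }

    // Number of nodes in the subtree that have no children.
    static int numberOfLeafNodes(SketchNode n) {
        if (n.children.isEmpty()) {
            return 1;
        }
        int count = 0;
        for (SketchNode c : n.children) {
            count += numberOfLeafNodes(c);
        }
        return count;
    }

    // Assumption: depth is the largest level of any node, with the root at level 0.
    static int treeDepth(SketchNode n) {
        int depth = 0;
        for (SketchNode c : n.children) {
            depth = Math.max(depth, 1 + treeDepth(c));
        }
        return depth;
    }
}
```

Under these assumptions, the smallest useful tree (a root with two leaves) has three nodes, two leaf nodes, and depth 1.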

6
javax.datamining.modeldetail.tree.
TreeNode
TreeModelDetail has getRules() and getRule(int nodeId) methods, but there is no way to get the rule from a TreeNode itself. Such functionality is present in Clustering, which also supports rules. New method:
  • public Rule getRule() - Javadoc: Returns the rule associated with the node. Any node in the tree can return its associated rule.

7
javax.datamining.algorithm.tree.
TreeSettings
Currently only one kind of minimum node size is allowed, which precludes vendors from accepting both kinds (count and percent).
Deprecated methods:
  • public double getMinNodeSize()
  • public SizeUnit getMinNodeSizeUnit()
Note: These two methods return the value last set.

New method in TreeSettings:
  • public double getMinNodeSize(SizeUnit)
Note: the semantics when both count and percent are specified is that node split does not happen when either criterion is satisfied.

New method in TreeSettingsFactory:
  • public boolean supportsMinNodeSizeUnit(SizeUnit)
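One possible reading of the combined semantics, sketched under stated assumptions: a criterion is treated as "satisfied" when the node is smaller than the corresponding minimum, and the percent minimum is expressed on a 0 to 100 scale relative to the total number of cases. The class and method names are hypothetical and not part of TreeSettings.

```java
// Hypothetical helper illustrating the count-and-percent rule; not part of the JDM API.
class SketchMinNodeSize {
    // Returns true if a node with nodeCases cases may be split.
    // Assumptions: minPercent is on a 0..100 scale, totalCases is greater than 0,
    // and a split is blocked when the node falls below either minimum.
    static boolean maySplit(long nodeCases, long totalCases, double minCount, double minPercent) {
        boolean belowCount = nodeCases < minCount;
        boolean belowPercent = 100.0 * nodeCases / totalCases < minPercent;
        return !(belowCount || belowPercent);
    }
}
```

With a minimum count of 20 and a minimum of 10 percent, a node holding 50 of 1000 cases passes the count check but not the percent check, so under this reading it is not split.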

8
javax.datamining.
modeldetail.tree.
TreeModelDetail
clustering.
ClusteringModel
Clarify the ranges of the values for tree depth, level, and number of clusters returned from the models. Javadoc change:
Explicitly specify that all hierarchies start from level 0: Tree, clustering, taxonomy
Tree depth > 0
Number of nodes in the tree > 1
Number of leaf nodes in the tree > 1
Number of clusters > 0

9
javax.datamining.clustering.
ClusteringSettingsFactory
The aggregation function and the attribute comparison function need to be coupled since they are not independent, but there is no capability check that takes both parameters.
Deprecated methods:
  • public boolean supportsCapability(AggregationFunction)
  • public boolean supportsCapability(AttributeComparisonFunction)
New method:
  • public boolean supportsCapability(AggregationFunction, AttributeComparisonFunction)
Note: Both arguments for the new method must be non-null. Either or both could be systemDefault or systemDetermined.

10
javax.datamining.clustering.
ClusteringApplySettings
Unify descriptions for create methods on apply settings:
  • ClassificationApplySettings.create() - Creates an instance of ClassificationApplySettings initialized to vendor-specific default values
  • RegressionApplySettings.create() - Creates an instance of RegressionApplySettings initialized to vendor-specific default values
  • ClusteringApplySettings.create() - Creates an empty instance of ClusteringApplySettings
ClusteringApplySettingsFactory.create() - Javadoc changed as follows:
Creates an instance of ClusteringApplySettings initialized to vendor-specific default values.

11
javax.datamining.clustering.
ClusteringApplyCapability
Correct the Javadoc, which uses ClusteringApplyContentCapability instead of ClusteringApplyCapability. Javadoc changed to use ClusteringApplyCapability.
12
javax.datamining.
supervised.classification.
ClassificationApplySettings
clustering.
ClusteringApplySettings

Once an array of destination attribute names is mapped to an apply content by the mapByRank method, there is no way to get those attribute names for inspection. The same is true of ClusteringApplySettings.
The following methods have been added:
ClassificationApplySettings
  • public String[] getMappedDestinationAttributeNames(ClassificationApplyContent)
ClusteringApplySettings
  • public String[] getMappedDestinationAttributeNames(ClusteringApplyContent)


13
javax.datamining.
clustering.
ClusteringApplySettings
supervised.classification.
ClassificationApplySettings
Clarify in Javadoc that the cardinality of the destination attribute names specified with the mapByRank method must be the same across different invocations (with different apply contents).
Also clarify the effect of invoking the mapByCategory and mapTopPrediction methods with the apply content.
State clearly that the map methods cannot be used together, e.g., mapByRank and mapPredictions cannot be used with the same apply settings.
Make changes to the Javadoc accordingly, including the following descriptions at interface level for ClusteringApplySettings and ClassificationApplySettings:
  • The map methods in this interface are to be used mutually exclusively. If a different kind of mapping is needed, then the previous settings must be reset by the resetMapping method.
ClassificationApplySettings
  • mapByRank - If this method is invoked on the same content, the previous mapping is replaced with the new one.
    The cardinality of the destination attribute names specified with this method must be the same across multiple invocations with different apply contents. If the cardinality is different for an apply content that has already been specified previously, then all previous settings become nullified and the current invocation creates a new setting.
  • mapByCategory - If this method is invoked on the same pair of apply content and category, then the previous setting is replaced with the new one.
  • mapPredictions - If the same content is used, the previous mapping is replaced with the new one.
  • mapTopPrediction - If this method is invoked on the same apply content, then the previous setting is replaced with the new one.
ClusteringApplySettings
  • mapByClusterIdentifier - If this method is invoked on the same pair of apply content and cluster identifier, then the previous setting is replaced with the new one.
  • mapByRank - same as ClassificationApplySettings.mapByRank
  • mapClusters - same as ClassificationApplySettings.mapPredictions
  • mapTopCluster - same as ClassificationApplySettings.mapTopPrediction
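The replacement, cardinality, and mutual-exclusion rules described above can be sketched with a simplified stand-in. This is not the JDM interface: apply contents are represented as plain strings, only two of the map methods are modelled, and rejecting a mixed use with IllegalStateException and returning an empty array for unmapped contents are assumptions of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for an apply settings object (not the JDM interface).
class SketchApplySettings {
    private enum Kind { BY_RANK, TOP_PREDICTION }

    private Kind active;  // mapping kind currently in use, or null after a reset
    private final Map<String, String[]> byRank = new LinkedHashMap<>();
    private final Map<String, String> topPrediction = new LinkedHashMap<>();

    public void mapByRank(String content, String[] destAttrNames) {
        require(Kind.BY_RANK);
        // A different cardinality nullifies all previous rank mappings.
        boolean sameCardinality = byRank.values().stream()
                .allMatch(existing -> existing.length == destAttrNames.length);
        if (!sameCardinality) {
            byRank.clear();
        }
        byRank.put(content, destAttrNames.clone());  // same content: mapping replaced
    }

    public void mapTopPrediction(String content, String destAttrName) {
        require(Kind.TOP_PREDICTION);
        topPrediction.put(content, destAttrName);    // same content: mapping replaced
    }

    // Assumption: an unmapped content yields an empty array.
    public String[] getMappedDestinationAttributeNames(String content) {
        String[] names = byRank.get(content);
        return names == null ? new String[0] : names.clone();
    }

    public void resetMapping() {
        active = null;
        byRank.clear();
        topPrediction.clear();
    }

    // Assumption: switching mapping kinds without a reset is rejected with an exception.
    private void require(Kind kind) {
        if (active != null && active != kind) {
            throw new IllegalStateException("Call resetMapping() before switching mapping kind");
        }
        active = kind;
    }
}
```

The Javadoc wording only says that the previous settings must be reset before a different kind of mapping is used; a vendor could equally well ignore or overwrite the earlier settings rather than throw.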

14
javax.datamining.base.
Task
Each Task subtype has a verify method. Produce a cleaner design by moving the method to Task. Each child interface must still implement the method.
The Javadoc of each child interface must describe how verification can be done, and that verification is vendor specific.
Methods moved to Task:
  • public VerificationReport BuildTask.verify()
  • public VerificationReport ImportTask.verify()
  • public VerificationReport ExportTask.verify()
  • public VerificationReport ComputeStatisticsTask.verify()
  • public VerificationReport ApplyTask.verify()
  • public VerificationReport ClassificationTestTask.verify()
  • public VerificationReport ClassificationTestMetricsTask.verify()

15
javax.datamining.task.apply.
ApplyTask
The Javadoc for verify() says at the end: 

On execute, if a signature attribute does not have a mapped input, an exception is raised if synchronous, or an error status if asynchronous.

This means that an exception is thrown if an attribute is missing in the apply data. However, this should depend on the implementation. For example, a record apply may contain only a partial set of attributes in the record. The same should hold for a data set apply.

Javadoc change:

ApplyTask description augmented with:

If a signature attribute does not have a mapped attribute in the input data, it is the vendor's choice to regard it as a missing value and continue the apply operation. An exception may also be thrown if the vendor does not support such a feature.
Related to change #13
16
javax.datamining.
supervised.classification.
ClassificationTestTask
This interface omits a method to set the description for the test metrics object it creates. Such a method is needed because test metrics is a named object.
New methods added:
  • public String getTestMetricsDescription()
  • public void setTestMetricsDescription(String description)

17
javax.datamining.algorithm.svm.
classification.
SVMClassificationSettingsFactory
regression.
SVMRegressionSettingsFactory
SVM classification and regression need a supportsCapability method to check which kernel functions are supported by the implementation. New methods:

SVMClassificationSettingsFactory
  • public boolean supportsCapability(KernelFunction)
SVMRegressionSettingsFactory
  • public boolean supportsCapability(KernelFunction)
Note: The argument kernelFunction cannot be null.

18
javax.datamining.algorithm.
svm.classification.
SVMClassificationSettings
svm.regression.
SVMRegressionSettings
The following parameters for SVM should not allow 0: Complexity factor, Tolerance, Epsilon. For example, accepting 0 for tolerance precludes convergence. 

Javadoc changed:

SVMClassificationSettings
  • setComplexityFactor - The factor must be a positive number.
  • setTolerance - The value must be greater than 0 and less than 1.
SVMRegressionSettings
  • setComplexityFactor - The factor must be a positive number.
  • setTolerance - The value must be greater than 0 and less than 1.
  • setEpsilon - The value must be a positive number that is less than 1.
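A minimal sketch of setters that enforce the ranges listed above. The class is a hypothetical stand-in, not the JDM settings interface; the initial field values are placeholders, and the use of IllegalArgumentException is an assumption about how a violation might be reported.

```java
// Hypothetical stand-in for SVM regression settings; not the JDM interface.
class SketchSvmSettings {
    // Placeholder initial values chosen for illustration, not JDM defaults.
    private double complexityFactor = 1.0;
    private double tolerance = 0.001;
    private double epsilon = 0.1;

    // The factor must be a positive number.
    public void setComplexityFactor(double value) {
        if (!(value > 0)) {
            throw new IllegalArgumentException("Complexity factor must be positive: " + value);
        }
        complexityFactor = value;
    }

    // The value must be greater than 0 and less than 1.
    public void setTolerance(double value) {
        if (!(value > 0 && value < 1)) {
            throw new IllegalArgumentException("Tolerance must be in (0, 1): " + value);
        }
        tolerance = value;
    }

    // The value must be a positive number that is less than 1.
    public void setEpsilon(double value) {
        if (!(value > 0 && value < 1)) {
            throw new IllegalArgumentException("Epsilon must be in (0, 1): " + value);
        }
        epsilon = value;
    }

    public double getComplexityFactor() { return complexityFactor; }
    public double getTolerance() { return tolerance; }
    public double getEpsilon() { return epsilon; }
}
```

Writing each check as a negated positive condition also rejects NaN, which would otherwise slip past a test such as value <= 0.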

19
javax.datamining.
modeldetail.svm.
SVMClassificationModelDetail
SVMRegressionModelDetail
modeldetail.naivebayes.
NaiveBayesModelDetail
It is difficult to get coefficients or target probabilities when logical data is not available with the model, because the existing methods take attribute values as arguments and therefore require knowledge of the data.
New methods:

SVMClassificationModelDetail
  • public java.util.Map getCoefficients(Object targetValue, String attrName) - returns a Map of pairs of attribute value and its coefficient associated with the specified target value
SVMRegressionModelDetail
  • public java.util.Map getCoefficients(String attrName) - returns a Map of pairs of attribute value and its coefficient
NaiveBayesModelDetail
  • public java.util.Map getPairProbabilities(String attrName, Object targetValue)

20
javax.datamining.association.
AssociationSettings
AssociationModel

Unify the range specification of support and confidence. Some use [0..1] and some use [0..100].  In addition, both boundary values must be allowed.
Javadoc descriptions have been changed to use [0..100] in interfaces:

AssociationModel
  • getMaxConfidence
  • getMinConfidence
AssociationSettings
  • setMinConfidence
  • setMinSupport

21
javax.datamining.association.
AssociationRule
A rule ID is necessary in preparation for apply with association rules (AR), to provide the rule ID associated with the prediction. New method:
  • public int getRuleIdentifier()

22
javax.datamining.data.
CategoryMatrix
public Double getValue(Object rowCategoryValue, Object columnCategoryValue) throws JDMException

This method returns Double, but its behavior is not clear when a non-existent entry is specified. If it returns null for such entries, then it would not be able to support sparse representation of matrices. Also, it looks as though an exception does not need to be thrown because it returns null for non-existent entries.

Need to introduce a method with a new name, such as getCellValue, since method overloading is not possible. Then, a new set method also needs to be introduced for completeness.

It is also noted that CategoryMatrix is a common super interface of three other interfaces, but they share little in common; they are tied together as a CategoryMatrix simply because they bear a name that includes Matrix.
CategoryMatrix is deprecated.
CategoryMatrix: deprecated (along with getValue method)

New methods in SimilarityMatrix:
  • public double getCellValue(Object category1, Object category2)
  • public void setCellValue(Object category1, Object category2, double similarityValue)
New methods in CostMatrix:
  • public double getCellValue(Object actualTarget, Object predictedTarget)
  • public double setCellValue(Object actualTarget, Object predictedTarget, double cost)
Note: The new get methods now return a default value that depends on the type of the interface if the entry does not exist, and do not throw an exception. For example, CostMatrix getCellValue returns 0 for diagonal entries, even if they are not specified. The new set methods also do not throw an exception because they return null if the cell is not found.
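The default-value behavior of getCellValue can be sketched with a sparse map. The class below is a hypothetical stand-in rather than the JDM CostMatrix: it declares the setter as void for simplicity, and the default of 1 for unspecified off-diagonal entries is an assumption of this sketch (the note above only fixes the diagonal default at 0).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sparse stand-in for a cost matrix; not the JDM CostMatrix interface.
class SketchCostMatrix {
    // Only explicitly set cells are stored, keyed by the (actual, predicted) pair.
    private final Map<List<Object>, Double> cells = new HashMap<>();

    public void setCellValue(Object actualTarget, Object predictedTarget, double cost) {
        cells.put(Arrays.asList(actualTarget, predictedTarget), cost);
    }

    // Returns the stored cost, or a default if the cell was never set; never throws.
    public double getCellValue(Object actualTarget, Object predictedTarget) {
        Double stored = cells.get(Arrays.asList(actualTarget, predictedTarget));
        if (stored != null) {
            return stored;
        }
        // Diagonal default of 0 follows the note; off-diagonal default of 1 is assumed.
        return Objects.equals(actualTarget, predictedTarget) ? 0.0 : 1.0;
    }
}
```

Returning a primitive double with a type-specific default is what allows the sparse representation: an absent cell no longer needs a null return or an exception to be distinguishable.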

Note:
ConfusionMatrix already has a method getNumberOfPredictions that is equivalent to getCellValue.
As of JDM 2.0, the two remaining methods in CategoryMatrix will be moved down to CostMatrix, ConfusionMatrix and SimilarityMatrix, and the deprecated CategoryMatrix will be removed entirely.
23
javax.datamining.data.
PhysicalDataSet
The API does not provide a means to inspect the physical attributes based on data type or role. For example, if a physical data set is created by metadata import, attributes with an unsupported data type are marked as unknown. New methods:
  • public Collection getAttributeNames(AttributeDataType dataType)
  • public Collection getAttributeNames(PhysicalAttributeRole role)
Note: These methods return a collection of attribute names.