@WeichenXu123
Contributor

What changes were proposed in this pull request?

Add a mask parameter to the MultivariateOnlineSummarizer constructor. It currently accepts the following values:
  • meanMask
  • varianceMask
  • minMask
  • maxMask
  • numNonZerosMask

This makes the summarized targets configurable by OR-ing masks together, for example:
new MultivariateOnlineSummarizer(meanMask | varianceMask)
This summarizer computes only the mean and the variance.
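A minimal sketch of the idea, assuming bit-flag masks combined with `|`. The class name `MaskedSummarizer`, the mask constants, and the fail-fast accessors are hypothetical illustrations, not the code in this PR or Spark's `MultivariateOnlineSummarizer` API. The sketch works on dense `Array[Double]` rows and leaves out sample weights, sparse vectors, and merging of partial summaries.

```scala
// Hypothetical sketch of a mask-configurable online summarizer.
// MaskedSummarizer and its mask constants are illustrative names, not Spark's API.
object MaskedSummarizer {
  val meanMask: Int = 1 << 0
  val varianceMask: Int = 1 << 1
  val minMask: Int = 1 << 2
  val maxMask: Int = 1 << 3
  val numNonZerosMask: Int = 1 << 4
}

class MaskedSummarizer(mask: Int) extends Serializable {
  import MaskedSummarizer._

  private def has(bit: Int): Boolean = (mask & bit) != 0

  // Welford's variance update needs the running mean, so varianceMask implies it.
  private val trackMean = has(meanMask) || has(varianceMask)

  private var n = 0
  private var count = 0L
  // Arrays for disabled statistics are never allocated and stay null.
  private var currMean: Array[Double] = null
  private var currM2n: Array[Double] = null
  private var currMin: Array[Double] = null
  private var currMax: Array[Double] = null
  private var currNnz: Array[Long] = null

  private def init(dim: Int): Unit = {
    n = dim
    if (trackMean) currMean = new Array[Double](dim)
    if (has(varianceMask)) currM2n = new Array[Double](dim)
    if (has(minMask)) currMin = Array.fill(dim)(Double.MaxValue)
    if (has(maxMask)) currMax = Array.fill(dim)(Double.MinValue)
    if (has(numNonZerosMask)) currNnz = new Array[Long](dim)
  }

  // Folds one dense row into the enabled statistics only.
  def add(x: Array[Double]): this.type = {
    if (count == 0) init(x.length)
    require(x.length == n, s"Dimension mismatch: expected $n, got ${x.length}")
    count += 1
    var i = 0
    while (i < n) {
      val v = x(i)
      if (trackMean) {
        val delta = v - currMean(i)
        currMean(i) += delta / count
        if (currM2n != null) currM2n(i) += delta * (v - currMean(i))
      }
      if (currMin != null && v < currMin(i)) currMin(i) = v
      if (currMax != null && v > currMax(i)) currMax(i) = v
      if (currNnz != null && v != 0.0) currNnz(i) += 1
      i += 1
    }
    this
  }

  // Asking for a statistic that was not enabled fails fast.
  private def check(bit: Int, name: String): Unit = {
    require(has(bit), s"$name was not enabled in the mask")
    require(count > 0, "no samples have been added")
  }

  def mean: Array[Double] = { check(meanMask, "mean"); currMean.clone() }

  // Unbiased sample variance; 0.0 per feature when only one sample was added.
  def variance: Array[Double] = {
    check(varianceMask, "variance")
    if (count > 1) currM2n.map(_ / (count - 1)) else Array.fill(n)(0.0)
  }

  def min: Array[Double] = { check(minMask, "min"); currMin.clone() }
  def max: Array[Double] = { check(maxMask, "max"); currMax.clone() }
  def numNonzeros: Array[Long] = { check(numNonZerosMask, "numNonzeros"); currNnz.clone() }
}
```

For instance, a summarizer built with `meanMask | varianceMask` and fed the rows `Array(1.0, 0.0)` and `Array(3.0, 4.0)` reports a mean of (2.0, 2.0) and a sample variance of (2.0, 8.0), while calling `min` on it throws an `IllegalArgumentException`. That exception is the flip side of the design: a caller has to know which statistics an instance was configured with.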

How was this patch tested?

Unit tests added.

@SparkQA

SparkQA commented Sep 3, 2016

Test build #64902 has finished for PR 14950 at commit dc44bb9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultivariateOnlineSummarizer(mask: Int)

@srowen
Member

srowen commented Sep 3, 2016

Hm, how much does this really save? These are pretty cheap sufficient statistics. Now you have to know whether your particular object was configured or not to return the answer you want.

@WeichenXu123 force-pushed the optimize_MultivariantOnlineSummerizer branch from dc44bb9 to be286eb on September 3, 2016 at 16:26
@SparkQA

SparkQA commented Sep 3, 2016

Test build #64904 has finished for PR 14950 at commit be286eb.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultivariateOnlineSummarizer(mask: Int)

@WeichenXu123
Contributor Author

@srowen It is not only CPU cost: if the data dimension is large, the serialization cost is large as well (see #14109).
Also, computing every target unconditionally seems inappropriate if we may add more summary targets in the future.
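To put rough numbers on the serialization argument, here is a back-of-the-envelope estimate under a simplified, hypothetical model: each enabled per-feature statistic is one dense array of 8-byte values, the variance additionally needs the running mean, and Java serialization overhead is ignored. `SummarizerPayload.estimatedBytes` is an illustrative helper, not part of Spark, and the model does not reflect the exact fields of Spark's summarizer.

```scala
// Back-of-the-envelope payload estimate (hypothetical model, not Spark's layout):
// each enabled per-feature statistic is one dense array of 8-byte values
// (Double or Long); serialization overhead is ignored.
object SummarizerPayload {
  def estimatedBytes(
      dim: Long,
      mean: Boolean,
      variance: Boolean,
      min: Boolean,
      max: Boolean,
      numNonzeros: Boolean): Long = {
    val arrays = Seq(
      mean || variance, // running mean (also needed by the variance update)
      variance,         // running sum of squared deviations
      min,
      max,
      numNonzeros
    ).count(identity)
    arrays * dim * 8L
  }
}
```

Under this model, with 1,000,000 features, enabling all five statistics gives 40,000,000 bytes (about 40 MB) for every partial summary that is shipped during aggregation, while `meanMask | varianceMask` gives 16,000,000 bytes (about 16 MB).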

@WeichenXu123 force-pushed the optimize_MultivariantOnlineSummerizer branch from be286eb to 3029468 on September 3, 2016 at 16:39
@SparkQA

SparkQA commented Sep 3, 2016

Test build #64905 has finished for PR 14950 at commit 3029468.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class MultivariateOnlineSummarizer(mask: Int)

@WeichenXu123 changed the title from "[SPARK-17390][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable" to "[WIP][SPARK-17390][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable" on Sep 13, 2016
@WeichenXu123
Contributor Author

When the benchmark is done, I will reopen it.

@WeichenXu123 changed the title from "[WIP][SPARK-17390][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable" to "[WIP][SPARK-14523][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable" on Jul 14, 2017
@WeichenXu123 changed the title from "[WIP][SPARK-14523][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable" to "[WIP][SPARK-19208][ML][MLLib] Optimize MultivariantOnlineSummerizer by making the summarized target configurable" on Jul 14, 2017
@WeichenXu123 deleted the optimize_MultivariantOnlineSummerizer branch on April 24, 2019 at 21:19

3 participants