[WIP] Use MKL method to implement a 'lazy' gradient zero #1561
Conversation
@yiheng Please take a look at this.
@@ -793,13 +813,22 @@ class SpatialConvolution[T: ClassTag](
val sWFloat = ev.toType[Float](scaleW)
val sBFloat = ev.toType[Float](scaleB)
val gradBFloat = gradBias.asInstanceOf[Tensor[Float]]
val update = if (zeroGradFlag) {
yiheng
Sep 25, 2017
Contributor
do not use function object in computing intensive code
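The reviewer's point can be illustrated outside Scala: instead of binding a function object such as `update` inside the compute-intensive path and invoking it per call, the branch can be folded into a plain scalar chosen once. The sketch below is hypothetical (the name `acc_grad_bias` and its arguments are made up), and NumPy stands in for the MKL gemv call.

```python
import numpy as np

def acc_grad_bias(grad_bias, grad_out_t, ones_batch, zero_grad_flag):
    """Accumulate the bias gradient with a gemv-style update.

    The conditional is reduced to the scalar beta rather than a
    function object: beta = 0 overwrites grad_bias (lazy zeroing),
    beta = 1 accumulates into it.
    """
    beta = 0.0 if zero_grad_flag else 1.0
    if beta == 0.0:
        # Real BLAS gemv with beta = 0 never reads y, so stale
        # gradients are overwritten rather than accumulated.
        grad_bias[:] = grad_out_t @ ones_batch
    else:
        grad_bias[:] = beta * grad_bias + grad_out_t @ ones_batch
    return grad_bias
```

With the flag set, old values in `grad_bias` are discarded in the same pass that writes the new gradient, so no separate fill-with-zero traversal is needed.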
@@ -471,9 +471,14 @@ class SpatialConvolution[T: ClassTag](
val gradView = gradWeightMMInBatch.view(batchSize,
  nOutputPlane * nInputPlane * kernelH * kernelW / nGroup).t
val grad = gradWeight.view(nOutputPlane * nInputPlane * kernelH * kernelW / nGroup)
grad.addmv(ev.fromType(1.0), ev.fromType(1.0), gradView, onesBatch)
val beta = if (zeroGradFlag) {
  ev.fromType(0.0)
yiheng
Sep 25, 2017
Contributor
ev.zero
val beta = if (zeroGradFlag) {
  ev.fromType(0.0)
} else {
  ev.fromType(1.0)
yiheng
Sep 25, 2017
Contributor
ev.one
if (withBias) {
  gradBias.addmv(ev.fromType(1.0), ev.fromType(1.0), gradientBiasMT.t, onesBatch)
  gradBias.addmv(beta, ev.fromType(1.0), gradientBiasMT.t, onesBatch)
yiheng
Sep 25, 2017
Contributor
ev.one
}
if (withBias && scaleB != 0) {
  gradBias.addmv(ev.fromType[Double](scaleB), gradOutput.t, addBuffer)
  if (zeroGradFlag) {
    gradBias.addmv(ev.fromType[Double](0.0),
yiheng
Sep 25, 2017
Contributor
ev.zero
  }
}
else if (input.dim() == 2) {
  if (scaleW != 0) {
    gradWeight.addmm(ev.fromType[Double](scaleW), gradOutput.t, input)
    if (zeroGradFlag) {
      gradWeight.addmm(ev.fromType[Double](0.0),
yiheng
Sep 25, 2017
Contributor
ev.zero
@@ -141,20 +141,38 @@ class Linear[T: ClassTag](
if (input.dim() == 1) {
  if (scaleW != 0) {
    gradWeight.addr(ev.fromType[Double](scaleW), gradOutput, input)
    if (zeroGradFlag) {
      gradWeight.addr(ev.fromType[Double](0.0), gradOutput, ev.fromType[Double](scaleW), input)
yiheng
Sep 25, 2017
Contributor
ev.zero
  }
}
else if (input.dim() == 2) {
  if (scaleW != 0) {
    gradWeight.addmm(ev.fromType[Double](scaleW), gradOutput.t, input)
    if (zeroGradFlag) {
      gradWeight.addmm(ev.zero,
yiheng
Sep 26, 2017
Contributor
Why? Can you dig deeper? These two methods call the same method with only one parameter difference.
yiheng
Sep 26, 2017
Contributor
You may need to run some micro-benchmark on this method


What changes were proposed in this pull request?
The gradient tensors are no longer cleared with tensor.fill(0) when zeroGradParameters is called; instead, only a flag named zeroGradFlag is set to true and the method returns instantly. The actual clearing is deferred to accGradParameters, where different MKL math functions are called with different parameters according to zeroGradFlag. zeroGradFlag is set back to false after each accGradParameters.

How was this patch tested?

unit tests
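The flag lifecycle described in this PR can be sketched as follows. This is a hypothetical Python/NumPy model of the protocol, not the BigDL code: the class and method names are invented, and the in-place array updates stand in for the MKL calls.

```python
import numpy as np

class LazyZeroLinear:
    """Toy model of the lazy-zero protocol from this PR.

    zeroGradParameters only raises a flag; the clearing is folded
    into the next accGradParameters via the BLAS beta scalar.
    """

    def __init__(self, in_features, out_features):
        self.grad_weight = np.zeros((out_features, in_features))
        self.zero_grad_flag = False

    def zero_grad_parameters(self):
        # No O(n) fill here: just mark the stored gradient as stale.
        self.zero_grad_flag = True

    def acc_grad_parameters(self, inp, grad_output):
        # beta = 0 makes the update overwrite grad_weight, which both
        # writes the new gradient and discards the stale one.
        beta = 0.0 if self.zero_grad_flag else 1.0
        update = grad_output.T @ inp
        if beta == 0.0:
            self.grad_weight[:] = update
        else:
            self.grad_weight += update
        # Flag is reset after each accGradParameters, as in the PR.
        self.zero_grad_flag = False
```

A caller sees the usual semantics: gradients accumulate across calls until `zero_grad_parameters` is invoked, after which the next accumulation starts from zero, yet no separate zero-fill pass ever runs.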