[WIP] Use MKL method to implement a 'lazy' gradient zero #1561
Conversation
@yiheng Please take a look at this.
@@ -793,13 +813,22 @@ class SpatialConvolution[T: ClassTag](
val sWFloat = ev.toType[Float](scaleW)
val sBFloat = ev.toType[Float](scaleB)
val gradBFloat = gradBias.asInstanceOf[Tensor[Float]]
val update = if (zeroGradFlag) {
yiheng
Sep 25, 2017
Contributor
do not use function object in computing intensive code
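The reviewer's point can be illustrated outside Scala: instead of binding a function object such as `update` inside the compute-intensive path and invoking it per call, the branch can be folded into a plain scalar chosen once. The sketch below is hypothetical (the name `acc_grad_bias` and its arguments are made up), and NumPy stands in for the MKL gemv call.

```python
import numpy as np

def acc_grad_bias(grad_bias, grad_out_t, ones_batch, zero_grad_flag):
    """Accumulate the bias gradient with a gemv-style update.

    The conditional is reduced to the scalar beta rather than a
    function object: beta = 0 overwrites grad_bias (lazy zeroing),
    beta = 1 accumulates into it.
    """
    beta = 0.0 if zero_grad_flag else 1.0
    if beta == 0.0:
        # Real BLAS gemv with beta = 0 never reads y, so stale
        # gradients are overwritten rather than accumulated.
        grad_bias[:] = grad_out_t @ ones_batch
    else:
        grad_bias[:] = beta * grad_bias + grad_out_t @ ones_batch
    return grad_bias
```

With the flag set, old values in `grad_bias` are discarded in the same pass that writes the new gradient, so no separate fill-with-zero traversal is needed.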
@@ -471,9 +471,14 @@ class SpatialConvolution[T: ClassTag](
val gradView = gradWeightMMInBatch.view(batchSize,
  nOutputPlane * nInputPlane * kernelH * kernelW / nGroup).t
val grad = gradWeight.view(nOutputPlane * nInputPlane * kernelH * kernelW / nGroup)
grad.addmv(ev.fromType(1.0), ev.fromType(1.0), gradView, onesBatch)
val beta = if (zeroGradFlag) {
  ev.fromType(0.0)
yiheng
Sep 25, 2017
Contributor
ev.zero
val beta = if (zeroGradFlag) {
  ev.fromType(0.0)
} else {
  ev.fromType(1.0)
yiheng
Sep 25, 2017
Contributor
ev.one
if (withBias) {
  gradBias.addmv(ev.fromType(1.0), ev.fromType(1.0), gradientBiasMT.t, onesBatch)
  gradBias.addmv(beta, ev.fromType(1.0), gradientBiasMT.t, onesBatch)
yiheng
Sep 25, 2017
Contributor
ev.one
}
if (withBias && scaleB != 0) {
  gradBias.addmv(ev.fromType[Double](scaleB), gradOutput.t, addBuffer)
  if (zeroGradFlag) {
    gradBias.addmv(ev.fromType[Double](0.0),
yiheng
Sep 25, 2017
Contributor
ev.zero
  }
}
else if (input.dim() == 2) {
  if (scaleW != 0) {
    gradWeight.addmm(ev.fromType[Double](scaleW), gradOutput.t, input)
    if (zeroGradFlag) {
      gradWeight.addmm(ev.fromType[Double](0.0),
yiheng
Sep 25, 2017
Contributor
ev.zero
@@ -141,20 +141,38 @@ class Linear[T: ClassTag](
if (input.dim() == 1) {
  if (scaleW != 0) {
    gradWeight.addr(ev.fromType[Double](scaleW), gradOutput, input)
    if (zeroGradFlag) {
      gradWeight.addr(ev.fromType[Double](0.0), gradOutput, ev.fromType[Double](scaleW), input)
yiheng
Sep 25, 2017
Contributor
ev.zero
  }
}
else if (input.dim() == 2) {
  if (scaleW != 0) {
    gradWeight.addmm(ev.fromType[Double](scaleW), gradOutput.t, input)
    if (zeroGradFlag) {
      gradWeight.addmm(ev.zero,
yiheng
Sep 26, 2017
Contributor
Why? Can you dig deeper? These two methods call the same method with only one parameter difference.
yiheng
Sep 26, 2017
Contributor
You may need to run some micro-benchmark on this method


What changes were proposed in this pull request?
The gradient tensors are no longer cleared with tensor.fill(0) when zeroGradParameters is called; instead, only a flag named zeroGradFlag is set to true and the method returns instantly. The actual clearing is deferred to accGradParameters, where different MKL math functions are called with different parameters according to zeroGradFlag. zeroGradFlag is set back to false after each accGradParameters.

How was this patch tested?

unit tests
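The flag lifecycle described in this PR can be sketched as follows. This is a hypothetical Python/NumPy model of the protocol, not the BigDL code: the class and method names are invented, and the in-place array updates stand in for the MKL calls.

```python
import numpy as np

class LazyZeroLinear:
    """Toy model of the lazy-zero protocol from this PR.

    zeroGradParameters only raises a flag; the clearing is folded
    into the next accGradParameters via the BLAS beta scalar.
    """

    def __init__(self, in_features, out_features):
        self.grad_weight = np.zeros((out_features, in_features))
        self.zero_grad_flag = False

    def zero_grad_parameters(self):
        # No O(n) fill here: just mark the stored gradient as stale.
        self.zero_grad_flag = True

    def acc_grad_parameters(self, inp, grad_output):
        # beta = 0 makes the update overwrite grad_weight, which both
        # writes the new gradient and discards the stale one.
        beta = 0.0 if self.zero_grad_flag else 1.0
        update = grad_output.T @ inp
        if beta == 0.0:
            self.grad_weight[:] = update
        else:
            self.grad_weight += update
        # Flag is reset after each accGradParameters, as in the PR.
        self.zero_grad_flag = False
```

A caller sees the usual semantics: gradients accumulate across calls until `zero_grad_parameters` is invoked, after which the next accumulation starts from zero, yet no separate zero-fill pass ever runs.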