Neurons vs Weights Pruning in Artificial Neural Networks

Andrey Bondarenko, Arkady Borisov, Ludmila Alekseeva

Abstract


Artificial neural networks (ANNs) are well known for their good classification abilities. Recent advances in deep learning have sparked a second ANN renaissance. However, neural networks suffer from problems such as the choice of hyperparameters, including the number and sizes of hidden layers, which can greatly influence classification accuracy. Pruning techniques were therefore developed to reduce network size, improve generalization ability and overcome overfitting. Pruning approaches, in contrast to growing neural network approaches, assume that a sufficiently large ANN has already been trained and can be simplified with an acceptable loss of classification accuracy.

This paper compares node pruning and weight pruning algorithms and reports experimental accuracy results for pruned networks versus their non-pruned counterparts. We conclude that node pruning is the preferable solution, with some caveats.
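The two pruning styles contrasted in the abstract can be sketched with a toy NumPy example. The magnitude-based saliency criterion used here is an illustrative assumption, not necessarily the criterion used in the paper: weight pruning zeroes individual small-magnitude connections while keeping the layer shape, whereas node pruning removes whole hidden units, shrinking the layer itself.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # hypothetical hidden-layer weight matrix (4 hidden units)

def prune_weights(W, fraction=0.5):
    """Weight pruning: zero the smallest-magnitude weights individually."""
    threshold = np.quantile(np.abs(W), fraction)
    return np.where(np.abs(W) < threshold, 0.0, W)

def prune_neurons(W, n_remove=1):
    """Node pruning: drop whole hidden units with the smallest weight norm."""
    norms = np.linalg.norm(W, axis=0)      # one saliency score per hidden unit
    keep = np.sort(np.argsort(norms)[n_remove:])
    return W[:, keep]                      # the layer actually shrinks

Wp = prune_weights(W)        # same shape, many zeros
Wn = prune_neurons(W, 1)     # one column (hidden unit) removed
print(Wp.shape, Wn.shape)
```

Note the practical difference: weight pruning yields a sparse matrix that still occupies the original dimensions, while node pruning produces a genuinely smaller network, which is one reason it can be the more attractive simplification.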


Keywords


artificial neural networks; generalization; overfitting; pruning


DOI: http://dx.doi.org/10.17770/etr2015vol3.166
