Research on Approximate Dynamic Programming
and
Control of Queueing Networks with Re-entrant Lines


Researchers:

José A. Ramírez-Hernández
(email: ramirejs@ececs.uc.edu)

Dr. Emmanuel Fernandez
(email: emmanuel@ececs.uc.edu)


Approximate Dynamic Programming/Reinforcement Learning
It is well known that Bellman’s curse of dimensionality preclude the direct application of the Dynamic Programming algorithm, especially when both the state and action spaces become too large. Approximate Dynamic Programming (ADP), also known as Reinforcement Learning, is an optimization approach conceived to deal with such type of scenarios.  ADP algorithms provide an alternative to obtain near-optimal control policies in systems with high dimensional state and action spaces. By using simulation models instead of analytical ones, ADP algorithms provide reasonable approximations of the optimal cost functions, and consequently, approximations of the optimal control policies. There is a wide spectrum of ADP algorithms that vary from methods based on lookup table representations of the so-called optimal Q-factors, such as Q-learning, to those that utilize parametric functions to approximate both the optimal cost and optimal policies. ADP approaches have been developed for different types of optimization performances including total cost, discounted cost, and average cost criterion, as well as for Markov and semi-Markov decision problems.



Current Research and Future Work
Our research on ADP has focused on the control optimization problem of simple benchmark queueing networks with re-entrant lines and under the discounted cost criterion. Different experiments have been performed with several ADP algorithms including Q-learning, SARSA(lambda), Temporal Difference Learning or TD(lambda), and Optimistic Policy Iteration algorithms with linear approximation structures adjusted with a gradient-descent approach. Experiments with these algorithms have demonstrated that the ADP algorithms are able to provide close approximations to the optimal policies. This has also served as a proof of concept in the applicability of ADP in the optimization of control policies in re-entrant lines. Moreover, these experiments have allowed the identification of key features of certain ADP algorithms that seem to be more amenable for large scale re-entrant lines.

Thus, our future work will encompass the design, implementation, and testing of ADP algorithms that may be suitable for large scale and more realistic re-entrant line models. Doing so could provide opportunities for the application of ADP in industrial settings such as in semiconductor manufacturing.


Virtual Depository of DP and ADP Programming Tools
For programming tools developed for DP and ADP please click here.



Some Publications on Fundamentals of ADP/Reinforcement Learning:

D. P. Bertsekas, "Neuro-Dynamic Programming," pdf format version, Encyclopedia of Optimization, Kluwer, 2001. (pdf file).

D.P. Bertsekas, presentation: "Neuro-Dynamic Programming: An Overview."
(pdf file).

B. Van Roy, "Neuro-Dynamic Programming: Overview and Recent Trends," in Handbook of Markov Decision Processes: Methods and Applications, edited by E. Feinberg and A. Shwartz, Kluwer, 2001. (pdf file).

D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Athena Scientific, Bellmont, MA, 1996.

R.S. Sutton and A.G. Barto, "Reinforcement Learning: An Introduction",  MIT Press, Cambridge, MA, 1998.

J. N. Tsitsiklis and B. Van Roy, "An Analysis of Temporal–Difference Learning with Function Approximation." IEEE Transactions on Automatic Control, 42(5):674–690, 1997. (pdf file).


Some Lectures on ADP and a List of Some Applications:


Neuro-Dynamic Programming (power point presentation)

Application Examples



Additional Literature on ADP and Control of Queueing Networks with Re-entrant Lines:

Control of Re-entrant lines/Queueing Networks ::top::

[1] L. M. Wein, Scheduling semiconductor wafer fabrication, IEEE Transactions on Semiconductor Manufacturing, vol. 1, pp. 115-130, 1988.

[2] P. R. Kumar, Re-entrant lines, Queueing Systems: Theory and Applications, vol. 13, pp. 87-110, 1993.

[3] R. Uzsoy, C. Lee, and L. A. Martin-Vega, A review of production planning and scheduling models in the semiconductor industry part II: Shop-floor control, IIE Transactions, vol. 26, no. 6, pp. 44-55, 1994.

[4] P. R. Kumar and S. P. Meyn, Stability of queueing networks and scheduling policies, IEEE Transactions on Automatic Control, vol. 40, no. 2, pp. 251-260, 1995.

[5] S. Kumar and P. R. Kumar, Queueing network models in the design and analysis of semiconductor wafer fabs, IEEE Transactions on Robotics and Automation, vol. 17, no. 5, pp. 548-561, 2001.

[6] R.-R. Chen and S. Meyn, Value iteration and optimization of multiclass queueing networks, Queueing Systems, vol. 32, pp. 65-97, 1999.

[10] C. H. Papadimitriou and J. N. Tsitsiklis, The complexity of optimal queueing network control, Mathematics of Operations Research, vol. 24, no. 2, pp. 293-305, 1999.

[16] J. A. Ramírez-Hernández and E. Fernandez, Optimal job sequencing in a benchmark reentrant line with finite capacity buffers, in Proceedings of the 17th International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, July 24-28, 2006, pp. 1214-1219.

[17] , Optimal job releasing and sequencing for a reentrant manufacturing line with finite capacity buffers, in Proceedings of the 45th IEEE Conference on Decision and Control, San Diego, CA, December 13-15 2006, pp. 6654-6659.

[23] J.-B. Suk and C. G. Cassandras, Optimal control of a storage-retrieval queuing system, in Proceedings of the 28th IEEE Conference on Decision and Control, 1989, pp. 1093-1098.

[28] S. S. Panwalker and W. Iskander, A survey of scheduling rules, Operation Research, vol. 25, pp. 45-61, 1977.

[29] P. R. Kumar, Scheduling semiconductor manufacturing plants, IEEE Control Systems Magazine, vol. 39, no. 11, pp. 33-40, 1994.

[30] C. H. Lu, D. Ramaswamy, and P. R. Kumar, E cient scheduling policies to reduce mean and variance of cycle-time in semiconductor manufacturing plants, IEEE Transactions on Semiconductor Manufacturing, vol. 7, no. 3, pp. 374-388, 1994.

[31] Y. Shen and C., Stochastic wafer fabrication scheduling, IEEE Transactions on Semiconductor Manufacturing, vol. 16, no. 1, pp. 2-14, 2003.

[32] F. D. Vargas-Villamil, Hierarchical production optimization and inventory control of semiconductor reentrant manufacturing lines, Ph.D. Dissertation, Arizona State University, Tempe, AZ, December 1998.

[34] R. Alvarez-Vargas, Y. Dallery, and R. David, A study of continuous ow model of production lines with unreliable machines and finite buffers, Journal of Manufacturing Systems, vol. 13, no. 3, pp. 221-234, 2001.

[35] J. W. M. Bertrand and J. C. Wortmann, Production Control and Information Systems of Component-Manufacturing Shops. Amsterdam: Elsevier Publishing Company, 1981.

[36] F. D. Vargas-Villamil, D. E. Rivera, and K. G. Kempf, A hierarchical approach to production control of reentrant semiconductor manufacturing lines, IEEE Transactions on Control Systems Technology, vol. 11, no. 4, pp. 578-587, July 2003.

[37] B. Hajek, Optimal control of two interacting service stations, IEEE Transactions on Automatic Control, vol. 29, no. 6, pp. 491-499, June 1984.

[38] J. Walrand, An Introduction to Queueing Networks. Englewood Cli s, NJ: Prentice Hall, 1988.

[39] C. R. Glassey and J. G. Shanthikumar, Linear control rules for production control of semiconductor fabs, IEEE Transactions on Semiconductor Manufacturing Systems, vol. 9, no. 4, pp. 536-549, 1996.

[40] L. M. Wein, Scheduling networks of queues: heavy traffic analysis of two-station network with controllable inputs, Operations Research, vol. 38, no. 6, pp. 1065-1078, 1990.

[41] J. W. Fowler, G. L. Hogg, and S. J. Mason, Workload control in the semiconductor industry, Production Planning & Control, vol. 13, no. 7, pp. 568-578, 2002.

[42] M. L. Spearman, D. L. Woodru , and W. J. Hopp, CONWIP: a pull alternative to kanban, International Journal of Production Research, vol. 28, no. 5, pp. 879-894, 1990.

[43] M. Land and G. Gaalman, Workload control concepts in job shops: A critical assessment, International Journal of Production Economics, vol. 45-47, pp. 535-548, 1996.

[44] C. R. Glassey and M. G. C. Resende, Closed-loop job release control for VLSI circuit manufacturing, IEEE Transactions on Semiconductor Manufacturing, vol. 1, no. 1, pp. 36-46, 1988.

[45] J. M. Harrison, Dynamic scheduling of a multiclass queue: Discount optimality, Operations Research, vol. 23, no. 2, pp. 270-282, 1975.

[46] J. S. Baras, A. J. Dorsey, and A. M. Makowski, Two competing queues with linear costs and geometric service requirements: the ¹c-rule is often optimal, Advances of Applied Probability, vol. 17, pp. 186-209, 1985.

[47] J. S. Baras, D. J. Ma, and A. M. Makowski, K competing queues with geometric service requirements and linear costs: The ¹c-rule is always optimal, Systems & Control Letters, vol. 6, pp. 173 180, 1985.

[48] C. Buyukkoc, P. Varaiya, and J. Walrand, The $\mu$c-rule revisited, Advances of Applied Probability, vol. 17, pp. 237 -238, 1985.

[49] S. Sethi, W. Suo, M. Taksar, and H. Yan, Optimal production planning in a multi-product stochastic manufacturing system with long-run average cost, Discrete Event Dynamic Systems: Theory and Applications, vol. 8, pp. 37-54, 1998.

[50] X. Chao, On klimov's model with two job classes and exponential processing times, Journal of Applied Probability, vol. 30, pp. 716-724, 1993.

[51] J. Qiu and R. Loulou, Multiproduct production inventory control under random demands, IEEE Transactions on Automatic Control, vol. 40, no. 2, pp. 350-356, 1995.

[52] M. H. Veatch and L. M. Wein, Scheduling a make-to-stock queue: Index policies and hedging points, Operations Research, vol. 44, no. 4, pp. 634-647, 1996.

[53] A. Y. Ha, Optimal dynamic scheduling policy for a make-to-stock production system, Operations Research, vol. 45, no. 1, pp. 42-53, 1997.

[54] S. Carr and I. Duenyas, Optimal admission control and sequencing in a make-to-stock/maketo-order production system, Operations Research, vol. 48, no. 5, pp. 709-720, 2000.

[55] B. L. Miller, A queueing reward system with several customer classes, Management Science, vol. 16, no. 3, pp. 234 -245, 1969.

[81] J.-W. Breithaupt, M. Land, and P. Nyhuis, The workload control concept: theory and practical extensions of load oriented order release, Production Planning & Control, vol. 13, no. 7, pp. 625-638, 2002.

[82] L. M. Roderick, D. T. Phillips, and G. L. Hogg, A comparison of order release strategies in production control systems, International Journal of Production Research, vol. 30, no. 3, pp. 611-626, 1992.

[83] S. T. Enns and M. Prongué Costa, The e ectiveness of input control based on aggregate versus bottleneck work loads, Production Planning & Control, vol. 13, no. 7, pp. 614-624, 2002.

[84] Z. Rosberg, P. P. Varaiya, and J. C. Walrand, Optimal control of service in tandem queues, IEEE Transactions on Automatic Control, vol. 27, no. 3, pp. 600-610, 1982.

[85] R. Uzsoy, C. Lee, and L. A. Martin-Vega, A review of production planning and scheduling models in the semiconductor industry part ii: shop-foor control, IIE Transactions, vol. 24, pp. 45-55, 1994.

[86] B. Ehteshami, R. G. Pétrakian, and P. M. Shabe, Trade-o s in cycle time management: Hot lots, IEEE Transactions on Semiconductor Manufacturing, vol. 5, no. 2, pp. 101-106, 1992.

[87] Y. Narahari and L. M. Khan, Modeling the e ect of hot lots in semiconductor manufacturing systems, IEEE Transactions on Semiconductor Manufacturing, vol. 10, no. 1, pp. 185-188, 1997.

[88] A. Gupta, V. Ganesan, and A. Sivakumar, Hot lot management: Minimizing cycle time in batch processes, in Proceedings of International Engineering Management Conference, 2004, pp. 1217-1221.

[89] M. Venables, Small is beautiful: small low volume semiconductor manufacturing plants, IE Review, pp. 26 27, March 2005.

[90] R. Sturm, J. Dorner, K. Reddig, and J. Seidelmann, Simulation-based evaluation of the ramp-up behavior of waferfabs, in Advanced Semiconductor Manufacturing Conference and Workshop, March 31st-April 1st 2003, pp. 111-117.

[91] S. Sethi, W. Suo, M. Taksar, and H. Yan, Optimal production planning in a multi-product stochastic manufacturing system with long-run average cost, Discrete Event Dynamic Systems: Theory and Applications, vol. 8, pp. 37-54, 1988.

[92] D. Benavides, J. R. Duley, and B. E. Johnson, As good as it gets: Optimal fab design and deployment, IEEE Transactions on Semiconductor manufacturing Systems, vol. 12, no. 3, pp. 281-287, 1999.

[93] Y. Mikata, K. Tanimoto, and S. Oka, Improvement of mini-fab uptime by using multitask and multifunctional tools, IEEE Transactions on Semiconductor Manufacturing Systems, vol. 17, no. 3, pp. 273-279, 2004.


Simulation Tools/Simulation of Semiconductor Manufacturing Systems
::top::

[13] J. A. Ramírez-Hernández, H. Li, E. Fernandez, C. R. McLean, and S. Leong, A framework for standard modular simulation in semiconductor wafer fabrication systems, in Proceedings of the 2005 Winter Simulation Conference, Orlando, FL, December 2005, pp. 2162-2171.

[14] Semiconductor Industry Association (SIA). (2006) International Technology Road Map for Semiconductors (ITRS) 2005. [Online]. Available: http://public.itrs.net

[95] W. D. Kelton, R. P. Sadowski, and D. T. Sturrock, Simulation with Arena, 4th ed. USA: McGraw-Hill, 2006.


Dynamic Programming
::top::

[7] R. E. Bellman, Dynamic Programming. Pricenton: Princeton University Press, 1957.

[98] D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models. Englewood Cli s, NJ: Prentice-Hall, 1987.

[33] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Bellmont, MA: Athena Scienti c, 2000, vol. I.

[19] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Bellmont, MA: Athena Scienti c, 2000, vol. II.

[79] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: John Wiley & Sons, Inc, 1994.

[75] S. P. Meyn, The policy iteration algorithm for average reward markov decision processes with general state space, IEEE Transactions on Automatic Control, vol. 42, no. 12, pp. 1663-1680, 1997.


Approximate Dynamic Programming and ADP for Control of Re-entrant Lines ::top::

[8] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Bellmont, MA: Athena Scienti c, 1996.

[9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.

[11] J. Si, A. G. Barto, W. B. Powell, and D. W. II, Eds., Handbook of Learning and Approximate Dynamic Programming. IEEE Press, Wiley-Interscience, 2004.

[12] A. Gosavi, Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning, 1st ed. Norwell, MA: Kluwer Academic Publishers, 2003.

J. A. Ramírez-Hernández and E. Fernandez, Control of a Re-entrant Line Manufacturing Model with a Reinforcement Learning Approach, in Proceedings of The Sixth International Conference on Machine Learning and Applications (ICMLA’07), Cincinnati, Ohio, December 13-15, 2007, pp. 330-335.

J. A. Ramírez-Hernández and E. Fernandez, An approximate dynamic programming approach for job releasing and sequencing in a reentrant manufacturing line, in Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), Honolulu, HI, April 2007, pp. 201-208.

J. A. Ramírez-Hernández and E. Fernandez, A Case Study in Scheduling Reentrant Manufacturing Lines: Optimal and Simulation-Based Approaches, in Proceedings of the 44th IEEE Conference on Decision and Control, Seville, Spain, December 12-15, 2005, pp. 2158-2163.

[78] J.-Y. Choi and S. Reveliotis, Relative value function approximation for the capacitated reentrant line scheduling problem, IEEE Transactions on Automation Science and Engineering, vol. 2, no. 3, pp. 285-299, 2005.

[77] C. Liu, H. Jin, Y. Tian, and H. Yu, A reinforcement learning approach to re-entrant manufacturing system scheduling, in Proceedings of ICII 2001 International Conferences on Info-tech and Info-net, vol. 3, Beijing, China, October 29th to November 1st 2001, pp. 280-285.

[27] G. G. Lendaris and J. C. Neidhoefer, Guidance in the Use of Adaptive Critics for Control: in Handbook of Learning and Approximate Dynamic Programming. IEEE Press, Wiley- Interscience, 2004, pp. 97-119.

[56] B. Van Roy, Handbook of Markov Decision Processes: Methods and Applications. Kluwer, 2001, ch. Neuro-Dynamic Programming: Overview and Recent Trends.

[57] D. P. Bertsekas, Dynamic programming and suboptimal control: A survey from ADP to MPC, European Journal of Control, vol. 11, no. 4-5, pp. 310-334, 2005.

[58] R. S. Sutton, Temporal credit assignment in reinforcement learning, Ph.D. thesis, University of Massachusetts, Amherst, Amherst, MA, 1984.

[59] , Learning to predict by the methods of temporal di erences, Machine Learning, vol. 3, pp. 9-44, 1988.

[60] G. J. Tesauro, Practical issues in temporal di erence learning, Machine Learning, vol. 8, pp. 257-277, 1992.

[61] , TDGammon, a self-teaching backgammon program, achieves masterlevel play, Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.

[62] , Temporal di erence learning and TDGammon, Communications of the ACM, vol. 38, pp. 58-68, 1995.

[63] C. J. C. H. Watkins, Learning from delayed rewards, Ph.D. thesis, Cambridge University, Cambridge, UK, 1989.

[64] C. J. C. H. Watkins and P. Dayan, Q-learning, Machine Learning, vol. 8, pp. 279-292, 1992.

[67] B. Van Roy, Learning and value function approximation in complex decision processes, Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA, 1998.

[68] J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-di erence learning with function approximation, IEEE Transactions on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997.

[69] , Average-cost temporal-di erence learning, Automatica, vol. 35, no. 11, pp. 1799-1808, 1999.

[70] J. N. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning, vol. 16, pp. 185-202, 1994.

[71] J. N. Tsitsiklis and B. Van Roy, Feature-based methods for large scale dynamic programming, Machine Learning, vol. 22, pp. 59-94, 1996.

[72] D. P. de Farias and B. Van Roy, On the existence of xed points for approximate value iteration and temporal di erence learning, Journal of Optimization Theory and Applications, vol. 105, no. 3, pp. 589 608, June 2000.

[94] G. A. Rummery, Problem solving with reinforcement learning, Ph.D. thesis, Cambridge University, 1995.


Uniformization/Queueing Systems with Exponential Servers
::top::

[21] S. A. Lippman, Applying a new device in the optimization of exponential queuing systems, Operations Research, vol. 23, pp. 687-710, 1975.

[22] R. Seforzo, An equivalence between discrete and continuous time markov decision processes, Operations Research, vol. 27, pp. 616-620, 1979.


Other Research Related to Semiconductor Manufacturing ::top::

[24] J. A. Ramírez-Hernández and E. Fernandez, An algorithm to convert wafer to calendar-based preventive maintenance schedules for semiconductor manufacturing systems, in Proceedings of the 42th IEEE Conference on Decision and Control, Maui, HI, December, pp. 5926-5931.

[25] J. A. Ramírez-Hernández, E. Fernandez, M. O'Connor, and N. Patel, Conversion of noncalendar to calendar-time based preventive maintenance schedules for semiconductor manufacturing systems, Journal of Quality Maintenance in Engineering, vol. 13, no. 3., pp. 259-275.

[26] J. Crabtree, J. A. Ramírez-Hernández, X. Yao, E. Fernandez, M. C. Fu, M. Janakiram, S. I. Marcus, M. O'Connor, and N. Patel, Optimal preventive maintenance scheduling in semiconductor manufacturing systems: Software tool & simulation case studies, submitted for publication to IEEE Transactions on Semiconductor Manufacturing, 2006.





| Home | Introduction | Research Projects | Articles | Links | Contact us |

SMITLab © 2001-2003
Webmaster