Why multiply by the action and use reduce_sum instead of argmax?