Skip to content

Conversation

@mamgainanoop
Copy link

@atheiman
I made a few changes to the code. Took out the command block out of the if block .
Added few more conditions to the elif statements because few functionalities do not exist for few services.

stale_config_only = select_from_items('Restart roles on nodes with stale config? Default is false', options) # Needs to be a user selection

# rr_command = service_rr.rolling_restart(batch_size, fail_count_threshold, None, stale_config_only, None, None, None).wait()
# return rr_command
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented for now since I need to confirm which service can be given a shot before executing this.
Here I am getting few inputs from the user to set the parameters as per their wish. The ones I have passed as None will be defaulted to whatever they are.
Still a test needs to be done on this. But once this piece is merged, dont wanna take risk to lose any code . lol

import os
import sys

import smtplib
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont see this being used anywhere..? i assume you're planning to send update emails

@atheiman
Copy link
Owner

can you rename the script to cm_service_manager.py?


elif action == defined_actions[2].lower() and ('hue' in service.name.lower() or 'hive' in service.name.lower() or \
'oozie' in service.name.lower() or 'spark' in service.name.lower()): # RR unavailable for these
print "Rolling Restart is unavailable in " + service.name.upper()
Copy link
Owner

@atheiman atheiman Jul 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i would rather just call it and then catch the exception - i think that's more pythonic.
edit: actually yea do this then you get rid of the if here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct and agree with that . I am making the script bit descriptive right now and once the functionalities work as expected , we can put the exception handling and reduce any extra lines of code.

@mamgainanoop
Copy link
Author

Renamed to cm_service_manager.py

elif action == defined_actions[1].lower(): # Deploy client configs
command = service.deploy_client_config().wait()

elif action == defined_actions[2].lower() and ('hue' in service.name.lower() or 'hive' in service.name.lower() or \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id rather you get rid of the if here with checking the service name against a list like i mentioned below https://github.com/atheiman/cm-automation-scripts-devops-hadoop/pull/2/files#r127274598

but for future reference i think this would work better

elif action == defined_actions[2].lower() and service_name in ['hue', 'hive', 'spark', 'oozie']
    print "Rolling Restart is unavailable in " + service_name
    # ....
elif action == defined_actions[2].lower() and service_name in ['hdfs', 'mapreduce', 'hbase', 'yarn', 'zookeeper']
    print "Rolling Restart is unavailable in " + service_name

elif action == defined_actions[1].lower(): # Deploy client configs
command = service.deploy_client_config().wait()

elif action == defined_actions[2].lower() and ('hue' in service.name.lower() or 'hive' in service.name.lower() or \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still rather just call it and then catch the exception - its more pythonic and it handles the services that you havent listed here. you shouldnt be listing out all these services like this.

you can even make it so that if rolling restart is selected and not available then you just do a restart

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that does make sense. Cool, I will see what I can work out on these.
I was also feeling weird to code all these services explicitly.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple example trying to illustrate what im describing

service = 'hive'

try:
    print "executing rolling restart on service '%s'" % service
    service.rolling_lower() # think of this as rolling_restart()
except AttributeError:
    print "rolling restart not available on service '%s'" % service
    print "executing restart on service '%s'" % service
    service.lower() # think of this as restart()
~ $ ~/tmp/test.py
executing rolling restart on service 'hive'
rolling restart not available on service 'hive'
executing restart on service 'hive'

print "WARNING: Rolling Restart is not available for '%s' " % service.name.lower()
print "Executing Restart on service '%s' " % service.name.lower()
rr_command = service.restart().wait()
return rr_command
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the try block.
But one thing that I think is messing up would be : when service = 'hive/oozie/hue', then the batch_size etc would still be asked to input from the user, which would be unnecessary and which I do not want.
So i was thinking , inside of try block, can a single 'if' statement be used to verify the service and raise exception if service is one those? (I have commented out in line 46-47). This way if the service doesn't support RR, then user would not need to unnecessarily input those details and except block will be executed.

server.quit()
except:
print "Invalid email information. Please check SMTP server info or mailing address"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mail block added. Mails are being sent successfully but not the the DL.
I still need to modify and generalize this part. Currently its a bit hard coded.

@atheiman
Copy link
Owner

I guess I wasn't clear on this

I still want the user to choose deploy client config, rolling restart, or restart. The logic I'm describing to you is for falling back to restart when a user selects rolling restart but it's not available. You're right, a user shouldn't select rolling restart for a non-distributed service.

I was suggesting this try catch logic because it is a better approach than the checking you were doing for "can this service do a rolling restart".

So yes, have the user select restart / rolling restart / deploy client config. then if it is rolling restart, try to run what the user selected. But if it fails then just restart.

If you want to leave that out I'm fine with it. But definitely don't define the list of services that are eligible for certain actions.

Actually - probably the very best solution would be to get the list of actions available to the service the user chooses, then offer up the actions available on that service to the user (this is how the cm ui does it).

server = smtplib.SMTP(server, 25) # SMTP server at port 25

from_email = '[email protected]'
server.sendmail(from_email, '[email protected], [email protected]', message) # Send mail
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any configuration in this repo needs to be, well, configurable.
This is for use outside of just Cerner.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, i have to make it dynamic. the user mail id and receiver id both will be asked at runtime.

@mamgainanoop
Copy link
Author

Regarding the try block. I have applies those changes already :)
So the current block runs like this The user selects RR and it goes to the rolling_restart method. There I have applied the try block , and in case there is an exception due to any non-distributed service, then except block executes and the service just restarts.

I had made these changes today, but had only one thing in my mind, that in case the user selects Hue (or oozie etc) , then the try block will definitely execute till the actual api method call, which includes the user inputs for number of nodes in batch, stale config etc.
So i need to find a way that the user should not get these input requests for batch number, stale config etc in case its a non-distributed service.

@mamgainanoop
Copy link
Author

@atheiman , need review on this sir

@atheiman
Copy link
Owner

atheiman commented Jul 27, 2017

Copy link
Owner

@atheiman atheiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our back and forth has been a little slow (mostly because ive been super busy).

Im gonna rebase on these changes and make a few adjustments and then push that rather than merging this guy. youll still have the commit here tho.

stale_config_only = select_from_items('option to Restart roles on nodes with stale config?(Default is false) \n', options) # Needs to be a user selection

rr_command = service_rr.rolling_restart(batch_size, fail_count_threshold, None, stale_config_only, None, None, None).wait()
except:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont blindly catch exceptions - if youre gonna have a try / except narrow down the exception you are looking for.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, agreed. I tried the attribute exception, but that didn't work tho.
Need to find out the exact one

try:
print "Deploying Client Configurations on service '%s'... \n" % service.name.lower()
command = service.deploy_client_config().wait()
except:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here, handle specific exceptions

rr_command = service_rr.rolling_restart(batch_size, fail_count_threshold, None, stale_config_only, None, None, None).wait()
except:
print "WARNING: Rolling Restart is not available for '%s'. Exiting... " % service.name.lower()
print "Executing Restart on service '%s' " % service.name.lower()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets not fall back to hard restart from rolling restart. we'll just error out

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error for what @atheiman ? Didn't get that

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right below this you do service.restart()

id rather just call service.rolling_restart() and the api call will fail (error out)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wasn't that the plan? Where the services which do not have RR functionality, will be just restarted over here.
I am coming to this , since the user is selecting that particular service and aims to cycle it.
So instead, can it be like follows:

User selects a service which doesn't support RR, so it will error out in the try block.
When it goes to the except block, that time we give a prompt "Instead do a Normal restart? Yes or No"
The user has the option here only to continue or not. Instead of starting from the beginning.

options = ["true", "false"] # options for stale config
stale_config_only = select_from_items('option to Restart roles on nodes with stale config?(Default is false) \n', options) # Needs to be a user selection

rr_command = service_rr.rolling_restart(batch_size, fail_count_threshold, None, stale_config_only, None, None, None).wait()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i want to look at the doc to see what all these args are. can you link me?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail_count_threshold = raw_input("Stop rolling restart if more than this number of worker batches fail \n")

options = ["true", "false"] # options for stale config
stale_config_only = select_from_items('option to Restart roles on nodes with stale config?(Default is false) \n', options) # Needs to be a user selection
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this wording is a bit misleading... maybe:

Restart roles only on instances with stale config?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took from CM, but yea this looks good

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh so i am misinterpretting it then i think. true is yes do include stale config instances, false is restart on every instance except those with stale config?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the doc:

@param stale_configs_only: Restart roles with stale configs only. Default is false.

https://github.com/cloudera/cm_api/blob/master/python/src/cm_api/endpoints/services.py#L1458

so i was understanding it correctly. we'll want to prompt with Restart roles with stale configs only? yes (true) or no (false)

Copy link
Author

@mamgainanoop mamgainanoop Jul 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup correct.
Option yes(true) will restart the roles only on the nodes having stale configs
Option No(false), which is default, will restart the roles on all the nodes.

@atheiman
Copy link
Owner

atheiman commented Aug 2, 2017

Im not having time to look at this RN, but I'll merge because @mamgainanoop confirmed he has used this to rolling restart.

@atheiman atheiman merged commit c109c8b into atheiman:master Aug 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants