I have the following shell script that starts my Rails application; let's say it is called start-app.sh:
#!/bin/bash
cd /var/www/project/current
. /home/user/.rvm/environments/ruby-2.3.3
RAILS_SERVE_STATIC_FILES=true RAILS_ENV=production nohup bundle exec rails s -e production -p 4445 > /var/www/project/log/production.log 2>&1 &
The file above has the following permissions:
-rwxr-xr-x 1 user user 410 Mar 21 10:00 start-app.sh*
To check the process, I run:
ps aux | grep -v grep | grep ":4445"
which gives me the following output:
user 2960 0.0 7.0 975160 144408 ? Sl 10:37 0:07 puma 3.12.0 (tcp://0.0.0.0:4445) [20180809094218]
PS: I grep for ":4445" because I have a few processes running on different ports (for different projects).
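Now on to monit. I installed it with apt-get, and the latest version in the repo is 5.16, since I am running Ubuntu 16.04. Also note that monit runs as root, which is why I set the uid and gid in the configuration below (the start script was previously executed as "user", not "root").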
Here is the monit configuration:
set daemon 20   # check services at 20 seconds interval
set logfile /var/log/monit.log
set idfile /var/lib/monit/id
set statefile /var/lib/monit/state
set eventqueue
    basedir /var/lib/monit/events   # set the base directory where events will be stored
    slots 100                       # optionally limit the queue size
set mailserver xx.com port xxx
    username "xx@xx.com" password "xxxxxx"
    using tlsv12 with timeout 20 seconds
set alert xx@xx.com
set mail-format {
    from: xx@xx.com
    subject: monit alert -- $EVENT $SERVICE
    message: $EVENT Service $SERVICE
        Date: $DATE
        Action: $ACTION
        Host: $HOST
        Description: $DESCRIPTION
}
set limits {
    programOutput: 51200 B
    sendExpectBuffer: 25600 B
    fileContentBuffer: 51200 B
    networktimeout: 10 s
}
check system $HOST
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if cpu usage > 90% for 10 cycles then alert
    if memory usage > 85% then alert
    if swap usage > 35% then alert
check process nginx with pidfile /var/run/nginx.pid
    start program = "/bin/systemctl start nginx"
    stop program = "/bin/systemctl stop nginx"
check process redis matching "redis"
    start program = "/bin/systemctl start redis"
    stop program = "/bin/systemctl stop redis"
check process myapp matching ":4445"
    start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
    stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
include /etc/monit/conf.d/*
include /etc/monit/conf-enabled/*
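Monit now detects and alerts me when the process goes down (if I kill it manually) and when it recovers after I restart it manually, but it does not start the shell script automatically. /var/log/monit.log shows the following: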
[UTC Aug 13 10:16:41] info : Starting Monit 5.16 daemon
[UTC Aug 13 10:16:41] info : 'production-server' Monit 5.16 started
[UTC Aug 13 10:16:43] error : 'myapp' process is not running
[UTC Aug 13 10:16:46] info : 'myapp' trying to restart
[UTC Aug 13 10:16:46] info : 'myapp' start: /bin/bash
[UTC Aug 13 10:17:17] error : 'myapp' failed to start (exit status 0) -- no output
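So far, what I see is that when monit tries to execute the script, it does attempt to load it (I can see it with ps aux | grep -v grep | grep ":4445" for less than 3 seconds), but the output differs from the one I showed above. Instead of the puma process, it shows the content of the shell script being executed, specifically this: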
blablalba... nohup bundle exec rails s -e production -p 4445
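Then it disappears, and monit tries to execute the shell script again, over and over. What am I missing, and what is wrong with my configuration? Please note that I cannot change anything in start-app.sh, because it is in production and works 100%. (I just want to monitor it.)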
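EDIT: From my understanding and experience, this looks like an environment variable issue or a PATH issue, but I am not sure how to solve it. Putting the environment variables inside monit does not make sense to me: what if someone else wants to edit that shell script or add something new? I hope you get my point.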
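A quick way to convince yourself that this kind of failure can come from the environment is to compare what a child shell sees normally with what it sees under a stripped environment. This is only a sketch: MY_APP_SETTING is a made-up variable, and the PATH value is my assumption about the minimal environment a supervisor like monit hands to its programs, not something taken from the configuration above.

```shell
#!/bin/bash
# MY_APP_SETTING is a hypothetical variable standing in for whatever
# the login shell normally provides (rvm paths, HOME, and so on).
export MY_APP_SETTING="from-login-shell"

# A child shell started normally inherits the variable.
inherited=$(/bin/bash -c 'echo "${MY_APP_SETTING:-unset}"')

# env -i starts the child with an empty environment plus the PATH given here.
# Assumption: this PATH approximates what a supervisor passes to its programs.
stripped=$(env -i PATH=/bin:/usr/bin:/sbin:/usr/sbin /bin/bash -c 'echo "${MY_APP_SETTING:-unset}"')

echo "normal child:   $inherited"
echo "stripped child: $stripped"
```

If a start script only works in the first situation, it probably depends on something that the user's environment provides and the supervisor does not.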
As I expected, it was a user environment issue. I solved it by editing the monit configuration as follows:
Before (not working)
check process myapp matching ":4445"
    start program = "/bin/bash -c '/home/user/start-app.sh'" as uid "user" and gid "user"
    stop program = "/bin/bash -c /home/user/stop-app.sh" as uid "user" and gid "user"
After (working)
check process myapp matching ":4445"
    start program = "/bin/su -s /bin/bash -c '/home/user/start-app.sh' user"
    stop program = "/bin/su -s /bin/bash -c '/home/user/stop-app.sh' user"
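Explanation: I removed the (uid and gid) as "user" clause from the monit configuration, because it only executes the shell script on behalf of "user" but does not get/import/use the user's environment, neither the PATH nor the other environment variables.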
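If you want to see the difference for yourself, a small helper that records the identity and environment a program actually receives can help. The sketch below is hypothetical (dump_env and the output path are names I made up); the idea would be to call it from a throwaway script started once through "as uid" and once through su, then diff the two files.

```shell
#!/bin/bash
# Hypothetical helper: write the caller's identity and environment to a file
# so that two launch methods can be compared with diff.
dump_env() {
    local out="$1"
    {
        echo "uid=$(id -u) user=$(id -un 2>/dev/null || echo unknown)"
        echo "HOME=${HOME:-unset}"
        echo "PATH=${PATH:-unset}"
        echo "--- full environment ---"
        env | sort
    } > "$out"
}

# Example call (the path is an assumption, pick any writable location):
# dump_env /tmp/env-seen-by-program.txt
```

Variables such as HOME and PATH are the ones I would look at first, since the start script sources an rvm environment file and runs bundle.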