www.BinaryAlchemy.de :: View topic - Completed Jobs being reset after RR server crash
Completed Jobs being reset after RR server crash

 
mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Fri Mar 01, 2013 8:28 pm    Post subject: Completed Jobs being reset after RR server crash

Holger,

This is something that happened once a few years ago, and that we had not seen in a long time. However, since last week it has happened to us 3 times.

The first 2 times were caused by a power outage in our building during the night: the UPS drained and the RR server lost power. Then today, the RR server application crashed on the server, and we had to restart it.

Our problem is that when RR is brought back online (after power outage or crash), then it will automatically restart random jobs that are already completed. The first thing it will also do is DELETE the existing (good) frames.

The only workaround we have at the moment is to clear out the old jobs after an outage/crash. Is there ANY way we can disable this, or prevent it from occurring? We are often afraid to restart the RR server, since we are losing good frames and render time.

(This may not be related, but when we look at the Jobs in the rrServer, many of the lines contain question marks ?? ?? ??)

Thanks.

[RR Ver. 6.02.08, Server Linux CentOS, Win7 clients]

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Sat Mar 02, 2013 8:50 pm

Hi

There are not many areas that delete files.
The rrServer deletes:
- placeholder files with a size less than 160 bytes
- <imagename>.lock files
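
For anyone who wants to see which files in an output folder would match the two rules listed above, here is a minimal dry-run sketch. It is not rrServer code: the function name and the sample file names are made up for illustration, and only the 160-byte threshold and the `.lock` suffix come from this post. It lists candidates and deletes nothing.

```python
import os
import tempfile

PLACEHOLDER_MAX_BYTES = 160  # threshold quoted in the post above


def find_server_delete_candidates(folder):
    """Return (placeholders, locks) found directly in `folder`; nothing is deleted."""
    placeholders, locks = [], []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        if name.endswith(".lock"):
            locks.append(name)
        elif os.path.getsize(path) < PLACEHOLDER_MAX_BYTES:
            placeholders.append(name)
    return placeholders, locks


if __name__ == "__main__":
    # Hypothetical file names, created in a throwaway folder for the demo.
    with tempfile.TemporaryDirectory() as folder:
        for name, size in [("shot.0001.exr", 4096), ("shot.0002.exr", 128)]:
            with open(os.path.join(folder, name), "wb") as handle:
                handle.write(b"x" * size)
        open(os.path.join(folder, "shot.0003.exr.lock"), "wb").close()
        print(find_server_delete_candidates(folder))
```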

The post-script sequence check will rename files to "broken" if the image cannot be read.

But otherwise, there is no function in the rrServer to delete frames.
I have myself restarted the server after a manual forced or development crash on our farm and the images have not been lost.
(we have 1000 jobs on the farm, 3/4 finished, they would kill me).

Do you use any custom pre or post-scripts or plugins?

The server placeholder and lock deletes are logged in the job log table.
Also, the job log includes a list of frames missing at every log entry.
Please check the log table of a job after the restart, is there any entry?



>This may not be related, but when we look at the Jobs in the rrServer, many of the lines contain question marks ???
There should not be question marks.
Please export the whole job database via rrControl menu debug and upload it via www.RoyalRender.de/upload.php
_________________
Holger Schönberger
Binary Alchemy - digital materialization

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue Mar 05, 2013 6:06 pm

Hi,

Yes, we have custom post and done scripts in place, but nothing that would delete frames. We have a done/approve script that copies frames, but it's a copy, not a move.

We have already cleared out any jobs (and job logs) this happened to (we were in a panic since frames were deleting). But I did manage to see one, which showed that the job was complete, then it basically started it again, as if a "Reset and delete existing frames" had automatically occurred.

The only explanation I have is that "somehow" the "reset and delete existing frames" code is being triggered automatically upon an ungraceful server shutdown. That said, we have had server crashes in the past, and all was fine.

Any ideas?

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue Mar 05, 2013 6:33 pm

If there is a "reset and delete existing frames", then please take a look at the row with the machine name.
If it was the rrServer, then it should be named in that row.
What is displayed as machine name?

But there is no function in the source of the server that runs that command. But I can recheck.

Please select the job in rrControl.
Then execute "Collect debug information - selected job(s)" via the menu "Debug".
Please upload the zipped information via www.RoyalRender.de/upload.php

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue Mar 05, 2013 7:13 pm

The logs from those jobs are gone, but if this happens again I'll be sure to capture the log from one of those jobs to try to determine its origin.

Yes, please double check if the server can issue a "reset and delete frames" command. Thanks.

pbillet
Joined: 24 May 2012
Posts: 155
Location/Company/Country: Paris/CGEV Studio/France

Posted: Wed Mar 13, 2013 6:30 pm

Hey, I second that.
I've had a crash today, and many jobs restarted (as I was on a remote site, I was able to see what was happening, but not to discuss directly with the artist the jobs that were re-running).

It is not the first time I have noticed that behaviour after a restart or a crash.
I suspect it is linked to using the "wait for approval" feature (post jobs won't delete anything anyway).

Most of our jobs stay for days in "wait for approval" state, because our post done job is to create avid dnxHD quicktime files for delivery, once shots are approved by vfx director (sometimes days after they are rendered).

I suspect that some of those jobs, which were completed and "waiting for approval", become active again and, for some reason, start rendering the same frames again (though the server is supposed to check for frame existence before making any attempt).

I can't really explain what happens in those cases, but I'm sure there is something too.

Will try to investigate more next time.

Rgds

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Wed May 22, 2013 11:24 pm

I have an update to this...

Last week, we had another bad power outage. Once power was restored I restarted the RR server (linux rrServerconsole), and noticed yet again that some old jobs had reset themselves. I quickly selected "all jobs" and disabled them to halt this. We are now missing frames, and it will take many days' worth of work to get them back.

We did manage to locate one of the jobs in RR that had been reset, with the frames removed. The log file has no record of the job being reset by the server. Below is the end of the log, which shows the job finished on May 2, and then it received the "Disable & Abort" on May 13, which is what I issued to all jobs after power was restored.

05.02 16:49:21 Render WINSTON Successful 179-205,1
05.02 16:49:47 Post Render RENDER08 Receives 0-213,1
05.02 16:49:59 Post Render RENDER08 Successful 0-213,1
05.02 16:49:59 Post Render RENDER08 Client Waits
05.13 11:07:02 Approval Wait -rrControl Disable & Abort
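
As a rough way to double-check an exported job log for reset entries, the excerpt above can be scanned with a few lines of Python. This is only a sketch: it assumes each line starts with a `MM.DD HH:MM:SS` timestamp (which matches the May 2 and May 13 dates mentioned above) and it simply searches the rest of the line for the word "reset". The function names are made up and the real rrControl log table may have more columns.

```python
import re

# The five log lines quoted in the post above.
LOG_EXCERPT = """\
05.02 16:49:21 Render WINSTON Successful 179-205,1
05.02 16:49:47 Post Render RENDER08 Receives 0-213,1
05.02 16:49:59 Post Render RENDER08 Successful 0-213,1
05.02 16:49:59 Post Render RENDER08 Client Waits
05.13 11:07:02 Approval Wait -rrControl Disable & Abort
"""

# Assumed layout: month.day, then hour:minute:second, then free text.
LINE_RE = re.compile(r"^(\d{2})\.(\d{2}) (\d{2}):(\d{2}):(\d{2}) (.*)$")


def parse_entries(text):
    """Return a list of ((month, day, hour, minute, second), message) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stamp = tuple(int(part) for part in match.groups()[:5])
            entries.append((stamp, match.group(6)))
    return entries


def find_reset_entries(entries):
    """Return the entries whose message mentions a reset (case-insensitive)."""
    return [entry for entry in entries if "reset" in entry[1].lower()]


if __name__ == "__main__":
    entries = parse_entries(LOG_EXCERPT)
    print(len(entries), "entries,", len(find_reset_entries(entries)), "mention a reset")
```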


We have been users of RR since 2006, but this behavior of RR is unacceptable. We are even considering switching our render manager because of this problem, since we are losing days of work every time this occurs.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Thu May 23, 2013 12:33 pm

Hi

As I said, RR does not delete files if not requested.
I have myself terminated the server application many times without any issue or missing frames.
Perhaps there is a special configuration.

Do you still have that one job in the list?
So I can take a look via remote?

Please open a ticket in www.RoyalRender.de/support, then we can arrange a time.

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Thu May 23, 2013 4:21 pm

ok, done. Ticket 418474.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Fri May 24, 2013 3:38 pm

@pbillet
Ok, I have done a few investigations on mdreams's farm.
Like you, I also thought that it could have something to do with jobs that are fully rendered but not approved. Finished jobs are never touched, but these jobs are not set to finished.



Anyway, here is a server build that is ONLY for debugging, not for running for a long time. It is just meant to find and fix the bug.
Instead of deleting files, it will print messages in the terminal (some of them into the saved render log RR/sub/server...).
www.RoyalRender.de/download/130524___6.02.12del.zip

Copy the new executable right beside your existing rrServerconsole in [RR]/bin/.../
The intended procedure is to shut down the current rrServer and rrAutostartService first.
Then open a terminal and execute the new rrServerconsole_del.
Check for "Command to delete file" lines during the startup.

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Fri May 24, 2013 4:53 pm

We are rendering right now so I cannot stop our server. However, I will run the _del version as soon as I can find the opportunity, and let you know how that goes.

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue May 28, 2013 5:12 pm

Ok, so here is the latest...

Nothing was rendering this morning (5/28), so I stopped our current rrServerconsole.exe, and launched the rrServerconsole_del.exe from the same folder.

Immediately, upon starting the server multiple jobs began rendering. We are not reporting any deleted frames at the moment. Upon investigation, it would appear that any jobs in "Finished" status, were not affected. We believe at this point that only jobs marked as "Waiting for Approval" began rendering.

As I "abort/disabled" the jobs that began rendering, other jobs would start rendering. The status of these jobs before they started rendering were "Render" and the font was Black, as if the jobs had been reset upon starting the server.

(It's worth noting, once our Softimage renders are complete, they are often moved into different folders. So if those jobs were reset, then it would start rendering again, because no frames would be present.)
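
To illustrate the point in the note above, here is a small sketch of a frame-existence check. It is not how rrServer implements its job check; the function, the folder layout and the file naming pattern are hypothetical. It only shows that a check of this kind reports every frame as missing once the rendered files have been moved out of the output folder.

```python
import os
import shutil
import tempfile


def missing_frames(folder, name_pattern, start, end, step=1):
    """List frame numbers in [start, end] whose file is absent from `folder`."""
    return [
        frame
        for frame in range(start, end + 1, step)
        if not os.path.isfile(os.path.join(folder, name_pattern.format(frame=frame)))
    ]


def demo():
    """Render five dummy frames, move them away, and compare the check results."""
    pattern = "comp.{frame:04d}.exr"  # hypothetical naming scheme
    with tempfile.TemporaryDirectory() as root:
        render_dir = os.path.join(root, "render")
        moved_dir = os.path.join(root, "moved")
        os.makedirs(render_dir)
        os.makedirs(moved_dir)
        for frame in range(0, 5):
            with open(os.path.join(render_dir, pattern.format(frame=frame)), "wb") as handle:
                handle.write(b"x" * 1024)
        before = missing_frames(render_dir, pattern, 0, 4)
        # Simulate the studio workflow: finished frames are moved elsewhere.
        for name in os.listdir(render_dir):
            shutil.move(os.path.join(render_dir, name), os.path.join(moved_dir, name))
        after = missing_frames(render_dir, pattern, 0, 4)
        return before, after


if __name__ == "__main__":
    print(demo())
```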

At the moment, our main issue is... how do we stop these "Wait for Approval" jobs from being reset upon restarting the server? Why is this happening, and how do we prevent it?

Thanks.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue May 28, 2013 5:29 pm

Ok, I will investigate more into the issue why the jobs are rendering again.
Still, although the status changes, it should not delete any files. It is executing a command like pressing the check button for a job.

Did you have any "Command to delete file" in the terminal?

Please upload your log file RR/sub/log/server via www.RoyalRender.de/upload_r.php

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue May 28, 2013 5:50 pm

We are not running "rrServerconsole_del" from the terminal, so we are not seeing any terminal output. We just clicked on it to launch, and it is running in the background.

I will upload the server log to you. However, our server log seems to be filling up pretty quick with the following line...

Warning - Object::connect: No such slot rrServerTCP::SlotLogUI(int , int , const QString &, const QString &) rrServer console 6.02.12del []

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue May 28, 2013 6:00 pm

>Object::connect: No such slot rrServerTCP::SlotLog
That is an issue created by the new features I am currently building into RR. It is already fixed in the internal development version.


> from the terminal, so we are not seeing any terminal output.
Ok, so you do not get any information on whether there was a delete after startup.

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue May 28, 2013 6:06 pm

Correct, we are not getting any output information about deletes occurring.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue May 28, 2013 7:39 pm

Ok, the only thing that is logged are "placeholder delete" messages for files with 128 bytes size.
You probably have to delete the files yourself if you keep the new server running. (e.g. via rrViewer, sort by file size)
And you have to delete the .lock files left by Softimage.

If you can again restart the server, but this time in a terminal or at best in a terminal and piped into a file, then I can check for more messages.

RR/bin/lx64/rrServerconsole_del > RR/sub/log/_newServerLog.txt

And then send me that file.
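
Before sending the file, it can also be scanned locally. The sketch below assumes the debug server writes the literal text "Command to delete file" (the phrase mentioned earlier in this thread) on each relevant line; the function name is made up, and the exact wording and layout of the real output are not confirmed here.

```python
def find_delete_commands(log_path, marker="Command to delete file"):
    """Return (line_number, line) pairs for every line containing `marker`."""
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if marker in line:
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_log.py RR/sub/log/_newServerLog.txt
    for number, line in find_delete_commands(sys.argv[1]):
        print(f"{number}: {line}")
```

An empty result would suggest that the debug server did not attempt any deletes during startup, at least none that it logged.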

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue May 28, 2013 8:21 pm

Done.

We are running it now in the terminal. I've also sent you the _newServerLog.txt file.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Wed May 29, 2013 3:57 pm

There is no message about deleting a frame on startup.
So I will concentrate my investigation on why the job was in state "main render" instead of "wait for approval".

Last question: none of you are using "Do not check for frames" as a job setting in rrSubmitter, are you?

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Wed May 29, 2013 5:22 pm

No, we are not using the "Do not check for frames" feature.

I've stopped running the rrServerconsole_del now, and also updated to the latest 6.02.12. I will upload you the latest version of the _newServerLog.txt (piped output from _del).

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Wed May 29, 2013 8:31 pm

I also have some screen captures from yesterday, from when the server was restarted and multiple jobs started to render. They may be helpful to you.

No jobs were rendering at the time the server was stopped. The images show that the status of many jobs has been reset to "Render" (if they had already been in that state, they would have been rendering before the server was stopped).

I'll upload those now.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue Jun 04, 2013 9:06 pm

Hi

I have tested restarting the rrServer with many jobs in approval state after the main render.
I terminated the process and restarted the rrServer: no change.
I only saw a change if one or more finished frames had been removed.

Therefore I now forbid the rrServer to check any jobs in that approval state on startup.

Just curious: Why do you have so many jobs waiting in approval state?

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Wed Jun 05, 2013 4:32 pm

Hi,

Why do we have so many jobs waiting for approval? That's a good question! The truth is that we are generating comps quicker than they can be approved. We have several compositors, and one QC person. Just yesterday, over 100 comps were submitted to RR, and at this point zero have been approved.

Actually yesterday, we had yet another complete power outage (yup, they happen way too often here!). Upon server restart, we had several jobs again resetting themselves.

How do we get this new version of the server that forbids checks of jobs in approval state?

Thanks.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Wed Jun 05, 2013 4:39 pm

I am finishing some last changes. The plan is to release a new version tonight or tomorrow.

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Wed Jun 05, 2013 4:44 pm

Great! Thanks.

Just as an afterthought, is there any way you could add an option that would disable ALL job checks on startup for the server? Are the job checks necessary?

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Wed Jun 05, 2013 7:35 pm

The job checks are necessary. For example, in the case where the clients continued to render frames while the rrServer machine was shut down/restarted for maintenance.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Thu Jun 06, 2013 3:13 pm

Still working on a bug in a new feature. But the update will definitely be released this week, as next week is Annecy.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Fri Jun 07, 2013 9:45 pm

Ok, the new server 6.2.13 will not check jobs that are finished, disabled or in "wait for done-approval".
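
The rule described above can be pictured as a simple filter over the job list. This is only an illustrative sketch, not rrServer source code: the `JobState` names, the dictionary of jobs and the function name are all hypothetical, and in the real server "disabled" may well be a separate flag rather than a state.

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical state names for illustration only.
    RENDER = "render"
    WAIT_FOR_APPROVAL = "wait for done-approval"
    FINISHED = "finished"
    DISABLED = "disabled"


# States that are left untouched on server startup, per the post above.
SKIP_ON_STARTUP = {JobState.WAIT_FOR_APPROVAL, JobState.FINISHED, JobState.DISABLED}


def jobs_to_check_on_startup(jobs):
    """`jobs` maps a job name to its JobState; return the names still checked."""
    return sorted(name for name, state in jobs.items() if state not in SKIP_ON_STARTUP)


if __name__ == "__main__":
    farm = {
        "shot_010": JobState.RENDER,
        "shot_020": JobState.WAIT_FOR_APPROVAL,
        "shot_030": JobState.FINISHED,
        "shot_040": JobState.DISABLED,
    }
    print(jobs_to_check_on_startup(farm))
```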

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Mon Jun 17, 2013 4:42 pm

Just installed 6.2.13, but I am receiving the following error in Linux, and unable to start the server...

"error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory"

I'm not seeing that file in the lx64\lib folder, any idea?

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue Jun 18, 2013 10:49 am

I am checking the issue.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue Jun 18, 2013 1:55 pm

Edit: I found the issue.

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue Jun 18, 2013 3:16 pm

We did manage to make it work, by creating symbolic links to libssl.so.6 and libcrypt.so.6 from the usr/lib64 folder. Probably not the best solution though.

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue Jun 18, 2013 3:19 pm

Please copy the files from
www.RoyalRender.de/download/libssl.zip to RR/bin/lx/lib/

Which symbolic links did you create?
If you have openSSL installed on the machine, then it should work on the machine.
Or did you have openssl version 7?

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue Jun 18, 2013 4:33 pm

.so.6 > .so.10, which I'm guessing is version 10.
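
For reference, the kind of link described above could be created with a short script like the following. It is a sketch under assumptions: the file names `libssl.so.6` and `libssl.so.10` are inferred from the ".so.6 > .so.10" note, the helper name is made up, and the demo runs in a temporary folder rather than a real library directory. Pointing an old soname at a newer library is a workaround at best, since the two versions are not guaranteed to be compatible.

```python
import os
import tempfile


def link_soname(lib_dir, wanted, existing):
    """Create `wanted` as a relative symlink to `existing` inside `lib_dir`.

    Returns True if a link was created, False if `wanted` already exists.
    """
    target = os.path.join(lib_dir, existing)
    link = os.path.join(lib_dir, wanted)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.lexists(link):
        return False  # never overwrite an existing file or link
    os.symlink(existing, link)
    return True


if __name__ == "__main__":
    # Demo in a throwaway folder with an empty stand-in for the real library.
    with tempfile.TemporaryDirectory() as lib_dir:
        open(os.path.join(lib_dir, "libssl.so.10"), "wb").close()
        print(link_soname(lib_dir, "libssl.so.6", "libssl.so.10"))
        print(os.readlink(os.path.join(lib_dir, "libssl.so.6")))
```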

On another note, when we restarted the server yesterday, the jobs in "Approval Wait" status did not reset, but some of the old jobs did redo the post scripts that were enabled for those jobs (in this case, they remade DNxHD movies).

schoenberger
Site Admin
Joined: 02 Mar 2005
Posts: 3786

Posted: Tue Jun 18, 2013 4:37 pm

>old jobs did redo the post scripts
To confirm: were they in approval state?

mdreams
Joined: 28 Sep 2006
Posts: 215
Location/Company/Country: US

Posted: Tue Jun 18, 2013 4:46 pm

I cannot confirm this, but I'm fairly certain that they were in "Approval Wait" status.

All times are GMT + 1 Hour