applications/system

condor - HTCondor: High Throughput Computing

Website: https://htcondor.org/
License: Apache-2.0
Description:
HTCondor is a specialized workload management system for
compute-intensive jobs. Like other full-featured batch systems, HTCondor
provides a job queuing mechanism, scheduling policy, priority scheme,
resource monitoring, and resource management. Users submit their
serial or parallel jobs to HTCondor, which places them into a queue,
chooses when and where to run them based upon a policy, carefully
monitors their progress, and ultimately informs the user upon
completion.
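
The queue-then-place flow described above can be illustrated with a toy
sketch. This is not HTCondor code and does not use any HTCondor API; the
names (Job, run_queue), the "lower number runs first" priority convention,
and the "first machine with enough free CPUs" policy are all assumptions
made purely for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    # Hypothetical toy job record; only priority and seq take part in ordering.
    priority: int                       # lower value is dispatched first (assumed convention)
    seq: int                            # submission order, used to break ties
    name: str = field(compare=False)
    cpus: int = field(compare=False)


def run_queue(jobs, machines):
    """Queue jobs, then place each on a machine with enough free CPUs.

    jobs: list of (name, priority, cpus) tuples in submission order.
    machines: dict mapping machine name to free CPU count.
    Returns a list of (job_name, machine_name_or_None) in dispatch order.
    """
    queue = [Job(prio, i, name, cpus) for i, (name, prio, cpus) in enumerate(jobs)]
    heapq.heapify(queue)
    free = dict(machines)               # copy so the caller's dict is untouched
    placements = []
    while queue:
        job = heapq.heappop(queue)
        # Toy policy: first machine (by name) that still has enough free CPUs.
        target = next((m for m in sorted(free) if free[m] >= job.cpus), None)
        if target is not None:
            free[target] -= job.cpus
        placements.append((job.name, target))
    return placements


if __name__ == "__main__":
    submitted = [("a", 5, 2), ("b", 1, 4), ("c", 1, 1)]
    for name, where in run_queue(submitted, {"m1": 4, "m2": 2}):
        print(name, "->", where if where else "no machine available")
```

A real batch system keeps unplaced jobs queued and retries them as resources
free up; the sketch simply reports them as unplaced.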

Packages

condor-23.10.2-1.el8.src [11.8 MiB] Changelog by Tim Theisen (2024-10-30):
- Fix for output file transfer errors obscuring input file transfer errors
condor-23.10.1-1.el8.src [11.8 MiB] Changelog by Tim Theisen (2024-10-03):
- Improvements to disk usage enforcement when using LVM
  - Can encrypt job sandboxes when using LVM
  - More precise tracking of disk usage when using LVM
  - Reduced disk usage tracking overhead
- Improvements tracking CPU and memory usage with cgroup v2 (on EL9)
  - Don't count kernel cache pages against job's memory usage
  - Avoid rare inclusion of previous job's CPU and peak memory usage
- HTCondor now re-checks DNS before re-connecting to a collector
- HTCondor now writes out per job epoch history
- HTCondor can encrypt network connections without requiring authentication
- htcondor CLI can now show status for local server, AP, and CM
- htcondor CLI can now display OAUTH2 credentials
- Uses job's sandbox to convert image format for Singularity/Apptainer
- Bug fix to not lose GPUs in Docker job on systemd reconfig
- Bug fix for PID namespaces and condor_ssh_to_job on EL9
condor-23.9.6-1.el8.src [12.2 MiB] Changelog by Tim Theisen (2024-08-08):
- Add config knob to not have cgroups count kernel memory for jobs on EL9
- Remove support for numeric unit suffixes (k,M,G) in ClassAd expressions
- In submit files, request_disk & request_memory still accept unit suffixes
- Hide GPUs not allocated to the job on cgroup v2 systems such as EL9
- DAGMan can now produce credentials when using direct submission
- Singularity jobs have a contained home directory when file transfer is on
- Avoid using IPv6 link local addresses when resolving hostname to IP addr
- New 'htcondor credential' command to aid in debugging
condor-23.8.1-1.el8.src [11.6 MiB] Changelog by Tim Theisen (2024-06-27):
- Add new condor-ap package to facilitate Access Point installation
- HTCondor Docker images are now based on Alma Linux 9
- HTCondor Docker images are now available for the arm64 CPU architecture
- The user can now choose which submit method DAGMan will use
- Can add custom attributes to the User ClassAd with condor_qusers -edit
- Add use-projection option to condor_gangliad to reduce memory footprint
- Fix bug where interactive submit does not work on cgroup v2 systems (EL9)
condor-23.7.2-1.el8.src [11.6 MiB] Changelog by Tim Theisen (2024-05-16):
- Warns about deprecated multiple queue statements in a submit file
- The semantics of 'skip_if_dataflow' have been improved
- Removing large DAGs is now non-blocking, preserving schedd performance
- Periodic policy expressions are now checked during input file transfer
- Local universe jobs can now specify a container image
- File transfer plugins can now advertise extra attributes
- DAGMan can rescue and abort if pending jobs are missing from the job queue
- Fix so 'condor_submit -interactive' works on cgroup v2 execution points
condor-23.6.2-1.el8.src [11.6 MiB] Changelog by Tim Theisen (2024-04-16):
- Fix bug where file transfer plugin error was not in hold reason code
condor-23.6.1-1.el8.src [11.6 MiB] Changelog by Tim Theisen (2024-04-15):
- Add the ability to force vanilla universe jobs to run in a container
- Add the ability to override the entrypoint for a Docker image
- condor_q -better-analyze includes units for memory and disk quantities
condor-23.5.2-1.el8.src [11.6 MiB] Changelog by Tim Theisen (2024-03-14):
- Old ClassAd based syntax is disabled by default for the job router
- Can efficiently manage/enforce disk space using LVM partitions
- GPU discovery is enabled on all Execution Points by default
- Prevents accessing unallocated GPUs using cgroup v1 enforcement
- New condor_submit commands for constraining GPU properties
- Add ability to transfer EP's starter log back to the Access Point
- Can use VOMS attributes when mapping identities of SSL connections
- The CondorVersion string contains the source git SHA
condor-23.4.0-1.el8.src [11.6 MiB] Changelog by Tim Theisen (2024-02-08):
- condor_submit warns about unit-less request_disk and request_memory
- Separate condor-credmon-local RPM package provides local SciTokens issuer
- Fix bug where NEGOTIATOR_SLOT_CONSTRAINT was ignored since version 23.3.0
- The htcondor command line tool can process multiple event logs at once
- Prevent Docker daemon from keeping a duplicate copy of the job's stdout
condor-23.3.0-1.el8.src [13.0 MiB] Changelog by Tim Theisen (2024-01-04):
- Restore limited support for Enterprise Linux 7 systems
- Additional assistance converting old syntax job routes to new syntax
- Able to capture output to debug DAGMan PRE and POST scripts
- Execution Points advertise when jobs are running with cgroup enforcement
condor-23.2.0-1.el8.src [13.0 MiB] Changelog by Tim Theisen (2023-11-29):
- Add 'periodic_vacate' submit command to restart jobs that are stuck
- EPs now advertise whether the execute directory is on rotational storage
- Add two log events for the time a job was running and occupied a slot
- Files written by HTCondor are now written in binary mode on Windows
- HTCondor now uses the Pelican Platform for OSDF file transfers
condor-23.1.0-1.el8.src [13.1 MiB] Changelog by Tim Theisen (2023-10-31):
- Enhanced filtering with 'condor_watch_q'
- Can specify alternate ssh port with 'condor_remote_cluster'
- Performance improvement for the 'condor_schedd' and other daemons
- Jobs running on cgroup v2 systems can subdivide their cgroup
- The curl plugin can now find CA certificates via an environment variable
condor-23.0.0-1.el8.src [12.2 MiB] Changelog by Tim Theisen (2023-09-29):
- Absent slot configuration, execution points will use a partitionable slot
- Linux cgroups enforce maximum memory utilization by default
- Can now define DAGMan save points to be able to rerun DAGs from there
- Much better control over environment variables when using DAGMan
- Administrators can enable and disable job submission for a specific user
- Can set a minimum number of CPUs allocated to a user
- condor_status -gpus shows nodes with GPUs and the GPU properties
- condor_status -compact shows a row for each slot type
- Container images may now be transferred via a file transfer plugin
- Support for Enterprise Linux 9, Amazon Linux 2023, and Debian 12
- Can write job information in AP history file for every execution attempt
- Can run defrag daemons with different policies on distinct sets of nodes
- Add condor_test_token tool to generate a short lived SciToken for testing
- The job’s executable is no longer renamed to ‘condor_exec.exe’
