Done issues

files uploaded to action directory with http have .bin extension added, causing heritrix to ignore them
HER-1907
Ambiguity between srcset urls and data:image base64 encoded image
HER-2097
Add support for extracting URLs from img srcset attribute
HER-2094
why do we write the header "WARC-Truncated: length" in warc revisit records?
HER-1701
NPE in BdbMultipleWorkQueues.delete() -- queue stuck?
HER-507
Webserver response 307 to 302 causes infinite redirect
HER-1560
Crawl M3U8 files and capture resources they describe
HER-2095
Are URLs including 'Japanese Full Space' supported?
HER-2089
java 8 keytool issue
HER-2085
CoderMalfunctionError: java.nio.BufferOverflowException
HER-527
Allow submission of non-login GET forms
HER-2078
checkpointing gives error on Windows
HER-1906
HostsReport issues
HER-2084
JVM terminated without running Heritrix.
HER-2087
ServerNotModified WARC revisit records incorrectly record WARC-Payload-Digest
HER-2080
Expand hosts-report.txt with novel bytes, novel urls counts
HER-1500
WARCWriterProcessor no longer prints hop path and link context for outlinks in meta data records
HER-2082
spam
HER-2081
Seeds Report missing redirect URLs for 301 / 302 responses
HER-2076
IllegalStateException "got suspicious value" in IpAddressSetDecideRule when
HER-2074
NullPointerException when getting cookies
HER-2070
Support 'Crawl-Delay' and 'Allow' robots.txt directives
HER-1
WarcWriterProcessor writes full body of revisited items
HER-2071
[junit] [ERROR] TEST org.archive.util.ms.DocTest FAILED (due to testdata folder missing in pre-release tarball)
HER-1092
ByteReplayCharSequence ignores specified/non-default character encodings
HER-1506
Investigate alt parsers (JTidy, HTMLParser, etc.)
HER-751
NPE in FetchHTTP.saveCookies
HER-572
Heritrix robot violating robots.txt and robots META tags at French National Library
HER-1263
StripSessonCFIDs missing from modules/BaseRule.options
HER-1140
heritrix fails to save accept-headers in an override
HER-580
[maven] [build] maven2 build on Mac/Windows may not find libraries, classes necessary (esp. JSPC JSP precompiling step)
HER-1186
move from Filters to DecideRules is done, but still no replacement for ContentTypeRegExpFilter exists
HER-1095
Adding decide rules in an override (via UI, or via an already-existing settings tree) does not work.
HER-1139
extend WARC metadata to contain source-seed
HER-1093
BdbModule triggers an IllegalStateException("Database already exists : uri_history") when used with PersistStoreProcessor/PersistLoadProcessor for recrawl
HER-1673
NoClassDefFoundError when starting a job
HER-104
reports (web ui and to disk) don't scale
HER-443
Cannot set cachePercentage in bdbje JMX bean
HER-461
Single settings change causes two versions to be created
HER-120
FCE on creation of new job based on job w/ overrides
HER-389
je 'log buffers are 2730 bytes' /totalBufferBytes alert
HER-428
Carry forward (& log) 'originating URL/seed' for all URLs
HER-893
default launch should nohup, save stdout/stderr
HER-681
UnsupportedCharsetException handled awkwardly
HER-153
Empty log percentages displayed as NaN%
HER-192
maven-only build rather than ant & maven
HER-744
[uuri] String index out of range: 0
HER-309
NoSuchElementException in URI queues halts crawling
HER-175
Heritrix crawl by link depth from root page
HER-656
ExtractorCSS regexp taking 'forever' on small document
HER-342

files uploaded to action directory with http have .bin extension added, causing heritrix to ignore them

Description

Attempting to PUT a .seeds file to the action directory of a running crawl using curl:

curl -v -k -u 'admin:PASSWORD' --anyauth --location -H "Accept: application/xml" -T /tmp/foo.seeds https://localhost:8443/engine/job/test-job/jobdir/action/foo.seeds

This works, except that the file gets saved as foo.seeds.bin. According to the -v output, curl is not adding the extension, so I'm guessing it's the Restlet library. The result from Heritrix is "WARNING action file ignored".
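
To illustrate why the extra extension matters, here is a minimal sketch of suffix-based dispatch. It assumes the action directory decides what to do with a file purely from its trailing extension. The function name classify_action_file is hypothetical, and only the .seeds case from this report is modeled; real Heritrix recognizes other action file types that are not shown here.

```shell
#!/bin/sh
# Hypothetical illustration only: NOT Heritrix's actual code.
# Assumption: an action file is handled according to its trailing suffix.
classify_action_file() {
    case "$1" in
        *.seeds) echo "seeds" ;;    # name ends in .seeds: treated as a seeds file
        *)       echo "ignored" ;;  # anything else: "WARNING action file ignored"
    esac
}

classify_action_file foo.seeds      # prints: seeds
classify_action_file foo.seeds.bin  # prints: ignored
```

Under that assumption, foo.seeds.bin no longer ends in .seeds, so it falls through to the ignored case even though its contents are unchanged.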

Environment

None

Status

Assignee

Noah Levitt

Reporter

Noah Levitt

Labels

None

Group Assignee

None

ZendeskID

None

Estimated Difficulty

None

Actual Difficulty

None

Priority

Major