
Get http directory size

PostPosted: Jul 18th, '11, 08:06
by msdobrescu
Hello,

Assuming I have access to a directory shared over HTTP (with directory listing enabled), how could I determine its total size, including all subdirectories?
Is there a tool or a script?

Thank you,
Mike

Re: Get http directory size

PostPosted: Aug 4th, '11, 00:56
by dotmil
I don't know of any way to do this easily or reliably over HTTP. If you have SSH or shell access, you can use the du command:
http://www.linfo.org/du.html
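For example (a minimal sketch; the path is just a placeholder), du -sh prints a human-readable total for a directory and everything below it:
Code:
du -sh /path/to/directory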

Re: Get http directory size

PostPosted: Aug 4th, '11, 06:22
by msdobrescu
I don't have shell access.
A script that recursively fetches the file listing and sums the file sizes would be good.

Re: Get http directory size

PostPosted: Aug 4th, '11, 09:48
by doktor5000
Well, as a workaround you can mirror the directory with wget -mk and then measure the size of the local copy; that can easily be scripted.
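A minimal sketch of that workaround (the URL is a placeholder; wget -m mirrors into a local directory named after the host):
Code:
# Mirror the remote directory, converting links for local use
wget -mk http://some-web-page/dir/
# Measure the size of the local copy
du -sh some-web-page/dir/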

Re: Get http directory size

PostPosted: Aug 4th, '11, 09:51
by msdobrescu
The idea is to estimate the size before I download it.
What if it's 250 GB?

Re: Get http directory size

PostPosted: Aug 4th, '11, 23:36
by noneco
You could wget the directory with --spider -S and sum up the Content-Length headers.
This should work:
Code:
# Sum the Content-Length headers reported by wget in spider mode
sum_bytes=0
for content_length in $(wget --spider -S -q http://some-web-page/ 2>&1 | grep Content-Length: | grep -o "[0-9]\+")
do
    sum_bytes=$(($sum_bytes + $content_length))
done
# Print the total in MiB, with two decimal places
echo "scale=2; $sum_bytes/1024/1024" | bc

If you add -r and -l, wget will recurse and you can limit the depth.

Re: Get http directory size

PostPosted: Aug 4th, '11, 23:52
by doktor5000
The wget manpage says the --spider function needs much more work to behave like a real spider,
and also that the Content-Length provided by some web servers is sometimes bogus and makes wget go wild:

man wget wrote:--ignore-length
Unfortunately, some HTTP servers (CGI programs, to be more precise) send out bogus "Content-Length" headers, which makes Wget go wild, as it thinks not all the document was retrieved. You can spot this syndrome if Wget retries getting the same document again and again, each time claiming that the (otherwise normal) connection has closed on the very same byte.

With this option, Wget will ignore the "Content-Length" header---as if it never existed.


But I don't really know; if you say it works, that would be the solution to the problem.

Re: Get http directory size

PostPosted: Aug 5th, '11, 00:03
by noneco
I just tested it, but you are right, it may not always work.
Code:
[noneco@nyra2 ~]$ sum_bytes=0
[noneco@nyra2 ~]$ for content_length in $(wget --spider -S -q -r -l2 http://ftp.mandrivauser.de/magazin/ 2>&1 | grep Content-Length: | grep -o "[0-9]\+")
> do
> sum_bytes=$(($sum_bytes+$content_length))
> done
[noneco@nyra2 ~]$ echo "scale=2; $sum_bytes/1024/1024 " | bc
456.02

Re: Get http directory size

PostPosted: Aug 5th, '11, 00:46
by dotmil
Another thing to take into account is the server you are spidering. Some software (and hardware) firewalls will temporarily block your IP if you exceed a specified number of TCP requests in a given time frame, or permanently block you if you trigger the limit repeatedly. I have personally seen some that are overly restrictive and allow only, say, 8-10 requests per 4-5 seconds. Not many are that tight, but 15-20 connections in a 5-second period seems to be at the strict end of normal. Of course, from an end-user perspective there is no way to see this until it happens to you.
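If rate limits are a concern, wget's standard --wait and --random-wait options can space out the requests; for example, a variation on the spider command above (the URL is a placeholder):
Code:
# Wait roughly one second (randomized) between requests to avoid tripping rate limits
wget --spider -S -q -r -l2 --wait=1 --random-wait http://some-web-page/ 2>&1 | grep Content-Length: | grep -o "[0-9]\+"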

Re: Get http directory size

PostPosted: Sep 16th, '11, 18:58
by msdobrescu
Thank you, guys; in my case the script works brilliantly.
I am a total newbie in sh.
Is there a way to make it a script that can be run like:

Code:
script.sh http://sample/


Also, how could I run some command from a path relative to the script?

Thank you,
Mike
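
For reference, a minimal sketch of such a wrapper around the script above (the script name, URL, and "somecommand" are placeholders; "$1" is the first command-line argument and dirname "$0" gives the directory the script lives in):
Code:
#!/bin/sh
# Usage: ./script.sh http://sample/
url="$1"                      # first command-line argument: the URL to measure
script_dir=$(dirname "$0")    # directory the script itself lives in
# A program sitting next to the script could be run as: "$script_dir"/somecommand

sum_bytes=0
for content_length in $(wget --spider -S -q -r -l2 "$url" 2>&1 | grep Content-Length: | grep -o "[0-9]\+")
do
    sum_bytes=$(($sum_bytes + $content_length))
done
echo "scale=2; $sum_bytes/1024/1024" | bc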